Scaling Ortholearners using Ray #793

Closed
v-shaal opened this issue Jul 17, 2023 · 6 comments

@v-shaal
Contributor

v-shaal commented Jul 17, 2023

Currently it is challenging to scale Ortholearners to large datasets, because the current implementation of _crossfit fits the K folds sequentially, which can be inefficient when the number of data points is large.
To overcome this, we can use Ray remote functions (Ray Tasks) to invoke the fit for each of the K folds remotely and asynchronously, so that the folds run simultaneously on separate Python workers.
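
A minimal sketch of the idea described above, for illustration only. It is not the code from the linked gist or from EconML's `_crossfit`: the helper names (`make_folds`, `fit_fold`, `crossfit_sequential`, `crossfit_ray`) are hypothetical, and a plain least-squares fit stands in for the nuisance model. It assumes NumPy is available, and Ray only for the parallel path.

```python
import numpy as np


def make_folds(n_samples, n_splits, seed=0):
    """Return (train_idx, test_idx) pairs for K-fold cross-fitting (n_splits >= 2)."""
    rng = np.random.default_rng(seed)
    chunks = np.array_split(rng.permutation(n_samples), n_splits)
    return [
        (np.concatenate(chunks[:k] + chunks[k + 1:]), chunks[k])
        for k in range(n_splits)
    ]


def fit_fold(X, y, train_idx, test_idx):
    """Fit a stand-in nuisance model (least squares) on the training rows and
    return the held-out indices with their out-of-fold residuals."""
    Xb = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    coef, *_ = np.linalg.lstsq(Xb[train_idx], y[train_idx], rcond=None)
    return test_idx, y[test_idx] - Xb[test_idx] @ coef


def crossfit_sequential(X, y, folds):
    """Baseline: fit the folds one after another."""
    residuals = np.empty(len(y))
    for train_idx, test_idx in folds:
        idx, res = fit_fold(X, y, train_idx, test_idx)
        residuals[idx] = res
    return residuals


def crossfit_ray(X, y, folds):
    """Same computation, with one Ray task per fold."""
    import ray  # imported lazily so the sequential path runs without Ray

    if not ray.is_initialized():
        ray.init()
    remote_fit = ray.remote(fit_fold)
    # Put the data in the object store once instead of shipping it per task.
    X_ref, y_ref = ray.put(X), ray.put(y)
    futures = [remote_fit.remote(X_ref, y_ref, tr, te) for tr, te in folds]
    residuals = np.empty(len(y))
    for idx, res in ray.get(futures):  # blocks until every fold has finished
        residuals[idx] = res
    return residuals
```

Both functions take the same arguments, for example `crossfit_ray(X, y, make_folds(len(y), n_splits=5))`, and should return the same residuals. Any speedup depends on how expensive each fold's fit is relative to the cost of scheduling tasks and moving data.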

This can be done by simply modifying _crossfit in _ortho_learner.py.
We conducted a performance analysis of the EconML implementation of DML and our version, DML_Ray, at varying scales (10k, 100k, and 1 million treated units) with approximately 500 covariates, generated by the synthetic data generator API from https://github.com/py-why/dowhy/blob/main/dowhy/datasets.py

Here's a link to the implementation of DML scaled via Ray that I have created. Let me know your thoughts.
@amit-sharma @emrekiciman

https://gist.github.com/vishal-d11/cd886eb6bdff96ad5a04711cb18339ed#file-dml_ray-ipynb

@fverac
Collaborator

fverac commented Jul 19, 2023

Thanks for sharing. Would you be able to share your findings from the performance analysis?

@v-shaal
Contributor Author

v-shaal commented Jul 21, 2023

@fverac yes, we have done the performance analysis. We were able to run 1M units with about 500 covariates in roughly 7-8 minutes with the Ray-based implementation, versus more than 40 minutes with the current implementation, on an EC2 high-memory node.
[Image: Screenshot 2023-07-21 at 1:03:31 PM]

@vsyrgkanis
Collaborator

This is a great achievement, @v-shaal! I think if there is a way to seamlessly incorporate this Ray remote function framework, we should strongly consider it!

Do you know what it would take to incorporate it into the library? Would you be willing to submit a PR with this improvement?

@v-shaal
Contributor Author

v-shaal commented Jul 21, 2023

@fverac @vsyrgkanis, I would be glad to work on this and raise a PR. I am currently going over the existing code structure to figure out the best way to incorporate this with minimal changes. Let me know if you have any suggestions.

@v-shaal
Contributor Author

v-shaal commented Jul 26, 2023

@vsyrgkanis could you please assign this to me?

@v-shaal
Contributor Author

v-shaal commented Aug 2, 2023

@vsyrgkanis @fverac @kbattocchi, I've raised a PR for this. Kindly review it and let me know your feedback.

v-shaal closed this as completed Oct 28, 2023