
Adding SimpleImputer, IterativeImputer and KNNImputer to the config space. #142

Merged: 5 commits merged into EpistasisLab:dev on Jul 29, 2024


gketronDS (Member)

What does this PR do?

Adds the sklearn.impute imputers (SimpleImputer, IterativeImputer, and KNNImputer) to TPOT2's configuration search space.
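To illustrate what "adding imputers to the search space" means, here is a minimal, dependency-free sketch of a hierarchical space: first pick an imputer, then pick a value for each of its hyperparameters. The parameter names are real scikit-learn arguments, but `IMPUTER_SPACE`, `sample_imputer`, and every value range are hypothetical and are not taken from TPOT2's imputers.py.

```python
import random

# Hypothetical search space. The keys are real sklearn.impute parameter names,
# but the value ranges are illustrative assumptions, not TPOT2's actual values.
IMPUTER_SPACE = {
    "SimpleImputer": {
        "strategy": ["mean", "median", "most_frequent"],
    },
    "KNNImputer": {
        "n_neighbors": list(range(1, 11)),
        "weights": ["uniform", "distance"],
    },
    "IterativeImputer": {
        "initial_strategy": ["mean", "median", "most_frequent"],
        "imputation_order": ["ascending", "descending", "random"],
        "max_iter": [5, 10, 20],
    },
}


def sample_imputer(rng: random.Random) -> tuple[str, dict]:
    """Pick an imputer class name, then one value for each of its hyperparameters."""
    name = rng.choice(sorted(IMPUTER_SPACE))
    params = {key: rng.choice(values) for key, values in IMPUTER_SPACE[name].items()}
    return name, params


name, params = sample_imputer(random.Random(0))
print(name, params)
```

With scikit-learn installed, the sampled pair could be turned into an estimator with the matching `sklearn.impute` class; note that `IterativeImputer` additionally requires `from sklearn.experimental import enable_iterative_imputer` before it can be imported.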

Where should the reviewer start?

See the changes to get_configspace.py and imputers.py below; there are not many.

How should this PR be tested?

The Simple, Iterative, and KNN imputers pass pytest for the config folder. I wasn't able to install scikit-learn-intelex, so I could not confirm that the full TPOT2 test suite passes.

Any background context you want to provide?

Allows us to soft-code preprocessing and enables additional preprocessing optimization within TPOT.

What are the relevant issues?

N/A

Questions:

  • Do the docs need to be updated?
    Possibly; this may need to be added in the next version update.
  • Does this PR add new (Python) dependencies?
    No, it only uses scikit-learn.

perib (Collaborator) left a comment

Some estimators in the IterativeImputer search space fail when sample_posterior=True: the estimator's predict method must support return_std for sample_posterior=True to work.
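The requirement above can be checked before building a pipeline by inspecting the signature of `predict`. This is a minimal sketch; `supports_return_std` is a hypothetical helper, not part of TPOT2 or scikit-learn, and it only looks for an explicit `return_std` parameter.

```python
import inspect


def supports_return_std(estimator) -> bool:
    """Return True if estimator.predict declares a `return_std` parameter.

    With sample_posterior=True, IterativeImputer calls
    predict(X, return_std=True), so estimators whose predict lacks this
    parameter cannot be used with that setting.
    """
    predict = getattr(estimator, "predict", None)
    if predict is None:
        return False
    # For a bound method, the signature excludes `self`.
    return "return_std" in inspect.signature(predict).parameters
```

With scikit-learn installed, `supports_return_std(BayesianRidge())` should return True, since `BayesianRidge.predict` takes `return_std`, while `supports_return_std(RandomForestRegressor())` should return False.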

I would recommend either adding a conditional so that sample_posterior=True is only searched when BayesianRidge is selected, or simply keeping sample_posterior at its default value of False.
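The first option, a conditional hyperparameter, can be sketched in plain Python as follows. The estimator list is an assumption for illustration (BayesianRidge plus three estimators whose `predict` has no `return_std`), and `sample_iterative_imputer_params` is a hypothetical name; the PR's actual implementation is not shown here.

```python
import random

# Assumed estimator choices for the IterativeImputer search space.
ESTIMATORS = ["BayesianRidge", "RandomForestRegressor", "KNeighborsRegressor", "ExtraTreesRegressor"]
# Estimators assumed to support predict(..., return_std=True).
SUPPORTS_RETURN_STD = {"BayesianRidge"}


def sample_iterative_imputer_params(rng: random.Random) -> dict:
    """Sample IterativeImputer settings, searching sample_posterior only when valid."""
    estimator = rng.choice(ESTIMATORS)
    params = {"estimator": estimator}
    if estimator in SUPPORTS_RETURN_STD:
        # Conditional hyperparameter: only active for estimators with return_std.
        params["sample_posterior"] = rng.choice([True, False])
    else:
        # Otherwise fall back to scikit-learn's default of False.
        params["sample_posterior"] = False
    return params
```

The second option amounts to dropping the `if` branch and always using False, which is simpler but never explores posterior sampling.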

gketronDS (Member, Author)

Comments from Pedro: "Putting the conditional in the parser is tricky because sample_posterior=True and sample_posterior=False are equivalent for the other three estimators, but TPOT may not know this and could evaluate the same pipeline twice. Instead, I would recommend putting it in the config_space function, similar to how it was done in the logistic regression example."
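A toy enumeration makes the duplicate-evaluation concern concrete. Assuming the same four hypothetical estimator names as a stand-in for the real choices, conditioning in the parser yields 8 configurations of which only 5 are distinct pipelines, while conditioning in the configuration space yields exactly the 5 distinct ones. TPOT samples rather than enumerates, but the wasted evaluations arise in the same way.

```python
from itertools import product

# Assumed estimator names; only BayesianRidge supports return_std here.
ESTIMATORS = ["BayesianRidge", "RandomForestRegressor", "KNeighborsRegressor", "ExtraTreesRegressor"]


def parser_level_configs():
    """sample_posterior is searched for every estimator, then overridden in the parser."""
    configs = []
    for estimator, posterior in product(ESTIMATORS, [True, False]):
        # The parser forces False for estimators without return_std,
        # so (estimator, True) and (estimator, False) build the same pipeline.
        effective = posterior if estimator == "BayesianRidge" else False
        configs.append((estimator, effective))
    return configs


def config_space_level_configs():
    """sample_posterior only exists in the space when BayesianRidge is chosen."""
    configs = []
    for estimator in ESTIMATORS:
        options = [True, False] if estimator == "BayesianRidge" else [False]
        configs.extend((estimator, p) for p in options)
    return configs


parser = parser_level_configs()
space = config_space_level_configs()
print(len(parser), len(set(parser)))  # 8 configurations, 5 distinct
print(len(space), len(set(space)))    # 5 configurations, 5 distinct
```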

Fixed in commit "Conditional Sample Posterior Added for Iterative Imputer".

perib merged commit ed95419 into EpistasisLab:dev on Jul 29, 2024
1 check passed