multioutput problem #971

Open
kradant opened this issue Dec 3, 2019 · 11 comments

kradant commented Dec 3, 2019

I am working on a multi-output regression problem, that is, the target values have more than one dimension.
A number of regressors from scikit-learn can only be used for multi-output problems when wrapped in the class MultiOutputRegressor (see especially https://scikit-learn.org/stable/modules/multiclass.html#multioutput-regression).

MultiOutputRegressor takes a regressor as an argument and then fits one regressor per target. In this way most single-output regressors can handle multidimensional output. So I want to use it with TPOT.
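
For context, this minimal plain-scikit-learn sketch (independent of TPOT, with an illustrative toy dataset and estimator) shows what MultiOutputRegressor does:

from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.multioutput import MultiOutputRegressor

# Toy data with a 2-D target (two target columns).
X, y = make_regression(n_samples=200, n_features=5, n_targets=2, random_state=0)

# MultiOutputRegressor clones the wrapped single-output estimator
# and fits one clone per target column.
model = MultiOutputRegressor(ElasticNetCV())
model.fit(X, y)
print(model.predict(X[:3]).shape)  # (3, 2): one column of predictions per target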

Issue #903 deals with the changes that must be applied to base.py in order to work with multiple outputs (I applied them and they worked fine). But neither #747, #810 nor #903 clarifies how to actually use MultiOutputRegressor with several regressors.

This is my config dictionary:
custom_regressor_config_dict = {
    'sklearn.multioutput.MultiOutputRegressor': {
        'estimator': {
            'sklearn.linear_model.ElasticNetCV': {
                'l1_ratio': np.arange(0.0, 1.01, 0.05),
                'tol': [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
            }
        }
    }
}

Now I want to insert more regressors. I tried putting them in a list or in a dictionary, for example:

'sklearn.multioutput.MultiOutputRegressor': {
    'estimator': [
        {'sklearn.linear_model.ElasticNetCV': {
            'l1_ratio': np.arange(0.0, 1.01, 0.05),
            'tol': [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
        }},
        {'sklearn.ensemble.AdaBoostRegressor': {
            'n_estimators': [100],
            'learning_rate': [1e-3, 1e-2, 1e-1, 0.5, 1.],
            'loss': ["linear", "square", "exponential"]
        }}
    ]
}

and it always throws an error like:

RuntimeError: There was an error in the TPOT optimization process. This could be because the data was not formatted properly[...]

How do I format it properly so that I can use more models? Is there a workaround? Thank you!

weixuanfu (Contributor) commented:

TPOT currently does not support multi-output regression, and the current configuration does not support more than one estimator option within this kind of meta-estimator (similar to #956). But we are working on adding support for more than one estimator, and I think that is the first step toward supporting multi-output regression.

kradant commented Dec 3, 2019

Do you know of an alternative tool like TPOT that can do multi-output regression? Or a workaround, like running TPOT in a loop and changing the estimator each time?
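
I mean something like this rough sketch (assuming the base.py change from #903 is applied so that TPOT accepts a 2-D target; the candidate estimators, parameter grids, and search settings are only illustrative):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from tpot import TPOTRegressor

X, y = make_regression(n_samples=300, n_features=5, n_targets=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# One candidate inner estimator per run; each config wraps exactly one
# estimator in MultiOutputRegressor, which the current config format allows.
candidates = {
    'sklearn.linear_model.ElasticNetCV': {
        'l1_ratio': np.arange(0.0, 1.01, 0.05),
        'tol': [1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
    },
    'sklearn.ensemble.AdaBoostRegressor': {
        'n_estimators': [100],
        'learning_rate': [1e-3, 1e-2, 1e-1, 0.5, 1.0],
        'loss': ['linear', 'square', 'exponential'],
    },
}

best_score, best_tpot = -np.inf, None
for name, params in candidates.items():
    config = {'sklearn.multioutput.MultiOutputRegressor': {'estimator': {name: params}}}
    tpot = TPOTRegressor(generations=5, population_size=20, cv=3,
                         scoring='neg_mean_squared_error',
                         config_dict=config, random_state=42, verbosity=2)
    tpot.fit(X_train, y_train)          # needs the 2-D-target patch from #903
    score = tpot.score(X_test, y_test)
    if score > best_score:
        best_score, best_tpot = score, tpot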

jhmenke (Contributor) commented Dec 3, 2019

So using only the regressors that natively perform multi-output regression is not viable for now? Because that seems to work already with some slight modifications (changing some sklearn metric, IIRC).
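
For illustration, a config restricted to regressors that natively accept a 2-D target might look like this sketch (the estimator list and parameter grids are illustrative, not TPOT's built-in defaults):

import numpy as np

# Sketch of a TPOT config limited to scikit-learn regressors that handle
# 2-D targets natively (no MultiOutputRegressor wrapper needed).
native_multioutput_config = {
    'sklearn.ensemble.ExtraTreesRegressor': {
        'n_estimators': [100],
        'max_features': np.arange(0.05, 1.01, 0.05),
    },
    'sklearn.ensemble.RandomForestRegressor': {
        'n_estimators': [100],
        'max_features': np.arange(0.05, 1.01, 0.05),
    },
    'sklearn.tree.DecisionTreeRegressor': {
        'max_depth': range(1, 11),
    },
    'sklearn.neighbors.KNeighborsRegressor': {
        'n_neighbors': range(1, 101),
        'weights': ['uniform', 'distance'],
    },
}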

weixuanfu (Contributor) commented:

> So using only the regressors that natively perform multi-output regression is not viable for now? Because that seems to work already with some slight modifications (changing some sklearn metric, IIRC).

Hmm, maybe that is a practical workaround. Could you please share the modifications with a demo via a pull request?

kradant commented Dec 3, 2019

> So using only the regressors that natively perform multi-output regression is not viable for now? Because that seems to work already with some slight modifications (changing some sklearn metric, IIRC).

Now I am a bit confused :) I already implemented the changes to base.py as suggested in #903, so I am already able to run TPOT with the regressors that natively perform multi-output regression. I didn't need to adjust any metrics. Am I doing this wrong?
And well, yes, I wanted to compare most regressors; I thought that was the whole point of TPOT and automated ML: comparing a large set of different pipelines/algorithms.

jhmenke (Contributor) commented Dec 4, 2019

> So using only the regressors that natively perform multi-output regression is not viable for now? Because that seems to work already with some slight modifications (changing some sklearn metric, IIRC).

> Hmm, maybe that is a practical workaround. Could you please share the modifications with a demo via a pull request?

Sure, I'll look it up when I find the time. But I think there were no changes necessary in TPOT directly, just a slight modification of a sklearn metric. I will post once I get around to it.

> Now I am a bit confused :) I already implemented the changes to base.py as suggested in #903, so I am already able to run TPOT with the regressors that natively perform multi-output regression.

Yes, that was my question. At least it works for those.

kradant commented Dec 6, 2019

I am still not getting it :)

  1. In "Flag to allow multioutput" (#903), the code changes to use the "native" multi-output regressors are already presented. So why is there a need to

  > share the modifications with a demo via a pull request?

  when that has already happened?

  2. And if multi-output regressors can be used with the adjustments from "Flag to allow multioutput" (#903), and we can also make use of MultiOutputRegressor (currently with just one estimator), why @weixuanfu are you saying that multi-output isn't supported?

jhmenke (Contributor) commented Dec 6, 2019

That pull request is not merged, so it is not native to TPOT and therefore unsupported.

Also, if there is a solution that does not require a flag, that would of course be better (again, I'm looking into it when I find the time).

weixuanfu (Contributor) commented Dec 6, 2019

Yes, one of the reasons we did not merge #903 is that we hoped for a nice solution without the flag. I forgot to post a comment on that PR.

jhmenke (Contributor) commented Jan 14, 2020

Can someone confirm that changing this line:

tpot/tpot/base.py, line 1160 in aea42a5:

X, y = check_X_y(features, target, accept_sparse=True, dtype=None)

to

X, y = check_X_y(features, target, accept_sparse=True, dtype=None, multi_output=len(target.shape) > 1 and target.shape[1] > 1)

multi-output targets are supported correctly? It seems to work for me and it was the only change I made, but I'd rather see it confirmed by someone before making a proper PR.

Edit: I just looked up PR #903 and it seems to be doing the same change, albeit with a manual flag.
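
For context, here is a standalone sketch of what the multi_output argument changes in scikit-learn's validation, independent of TPOT:

import numpy as np
from sklearn.utils import check_X_y

X = np.random.rand(10, 5)
y = np.random.rand(10, 2)   # 2-D target: two columns

# Without multi_output, check_X_y insists on a 1-D y and raises a ValueError.
try:
    check_X_y(X, y, accept_sparse=True, dtype=None)
except ValueError as err:
    print('rejected:', err)

# With the conditional multi_output flag, a 2-D y passes through
# while a 1-D y still takes the stricter code path.
X_checked, y_checked = check_X_y(
    X, y, accept_sparse=True, dtype=None,
    multi_output=len(y.shape) > 1 and y.shape[1] > 1)
print(y_checked.shape)   # (10, 2)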

windowshopr commented:

Just found this thread myself, and I'm getting the same error as in #747, which is:

  File "C:\Users\...\tpot\base.py", line 1393, in _check_dataset
    "Error: Input data is not in a valid format. Please confirm "
ValueError: Error: Input data is not in a valid format. Please confirm that the input data is scikit-learn compatible. For example, the features must be a 2-D array and target labels must be a 1-D array.

I know it's not officially supported, but would love to be able to use TPOT for a multi-output regression problem.

For a simple reproducible problem, use this code:

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from tpot import TPOTRegressor
from numpy import arange
from sklearn.metrics import r2_score  # needed for the r2_score call below

RANDOM_SEED = 42

X, y = make_regression(n_samples=500,
                       n_features=5,
                       n_informative=2,
                       n_targets=2,
                       shuffle=True,
                       random_state=RANDOM_SEED)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10, random_state=RANDOM_SEED)

regressor_config_dict = {
    'sklearn.multioutput.MultiOutputRegressor': {
        'estimator': {
            'sklearn.ensemble.ExtraTreesRegressor': {
                'n_estimators': [100],
                'max_features': arange(0.05, 1.01, 0.05)
            }
        }
    }
}

tpot = TPOTRegressor(generations=100, 
                     population_size=100,
                     offspring_size=None, 
                     mutation_rate=0.9,
                     crossover_rate=0.1,
                     scoring='neg_mean_squared_error', 
                     cv=3,
                     subsample=1.0, 
                     n_jobs=4,
                     max_time_mins=None, 
                     max_eval_time_mins=5,
                     random_state=None, 
                     config_dict=regressor_config_dict,
                     template=None,
                     warm_start=False,
                     memory=None,
                     use_dask=True,
                     periodic_checkpoint_folder=None,
                     early_stop=2,
                     verbosity=2,
                     disable_update_check=False)

tpot.fit(X_train, y_train)

preds = tpot.predict(X_test)  # predict on the held-out test set (2-D array, 5 features per row)
print(r2_score(y_test, preds))
print(preds)

How can we make this work? Hacky solutions are welcome! :D
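
One hacky sketch along those lines, reusing the data, imports, and RANDOM_SEED from the snippet above: fit a separate TPOTRegressor per target column and stack the column-wise predictions. Each individual search then sees a 1-D target, so no changes to base.py are needed, at the cost of one full TPOT run per target (search settings trimmed for brevity):

import numpy as np
from sklearn.metrics import r2_score

per_target_preds = []
for col in range(y_train.shape[1]):
    # Each column gets its own search; the 1-D target keeps check_X_y happy.
    tpot_col = TPOTRegressor(generations=5, population_size=20, cv=3,
                             scoring='neg_mean_squared_error',
                             random_state=RANDOM_SEED, verbosity=2)
    tpot_col.fit(X_train, y_train[:, col])
    per_target_preds.append(tpot_col.predict(X_test))

preds = np.column_stack(per_target_preds)
print(r2_score(y_test, preds))   # multi-output R^2, averaged over targets by default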
