Description
I have a scikit-learn pipeline defined in the code below.
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import OneHotEncoder, Binarizer
from sklearn.impute import SimpleImputer
from skrub import SelectCols
from sklearn.ensemble import HistGradientBoostingClassifier

feat_pipe = make_union(
    make_pipeline(
        SelectCols(["pclass", "sex"]),
        OneHotEncoder(sparse_output=False)
    ),
    SelectCols(["fare", "age"])
)

pipe = make_pipeline(
    feat_pipe,
    HistGradientBoostingClassifier()
)
When I ask for the params of said pipeline I can see a long list of names that I can refer to when I do hyperparameter tuning.
pipe.get_params()
The list is long, but that is because it is nice and elaborate.
{'memory': None,
'steps': [('featureunion',
FeatureUnion(transformer_list=[('pipeline',
Pipeline(steps=[('selectcols',
SelectCols(cols=['pclass',
'sex'])),
('onehotencoder',
OneHotEncoder(sparse_output=False))])),
('selectcols',
SelectCols(cols=['fare', 'age']))])),
('histgradientboostingclassifier', HistGradientBoostingClassifier())],
'verbose': False,
'featureunion': FeatureUnion(transformer_list=[('pipeline',
Pipeline(steps=[('selectcols',
SelectCols(cols=['pclass',
'sex'])),
('onehotencoder',
OneHotEncoder(sparse_output=False))])),
('selectcols',
SelectCols(cols=['fare', 'age']))]),
'histgradientboostingclassifier': HistGradientBoostingClassifier(),
'featureunion__n_jobs': None,
'featureunion__transformer_list': [('pipeline',
Pipeline(steps=[('selectcols', SelectCols(cols=['pclass', 'sex'])),
('onehotencoder', OneHotEncoder(sparse_output=False))])),
('selectcols', SelectCols(cols=['fare', 'age']))],
'featureunion__transformer_weights': None,
'featureunion__verbose': False,
'featureunion__verbose_feature_names_out': True,
'featureunion__pipeline': Pipeline(steps=[('selectcols', SelectCols(cols=['pclass', 'sex'])),
('onehotencoder', OneHotEncoder(sparse_output=False))]),
'featureunion__selectcols': SelectCols(cols=['fare', 'age']),
'featureunion__pipeline__memory': None,
'featureunion__pipeline__steps': [('selectcols',
SelectCols(cols=['pclass', 'sex'])),
('onehotencoder', OneHotEncoder(sparse_output=False))],
'featureunion__pipeline__verbose': False,
'featureunion__pipeline__selectcols': SelectCols(cols=['pclass', 'sex']),
'featureunion__pipeline__onehotencoder': OneHotEncoder(sparse_output=False),
'featureunion__pipeline__selectcols__cols': ['pclass', 'sex'],
'featureunion__pipeline__onehotencoder__categories': 'auto',
'featureunion__pipeline__onehotencoder__drop': None,
'featureunion__pipeline__onehotencoder__dtype': numpy.float64,
'featureunion__pipeline__onehotencoder__feature_name_combiner': 'concat',
'featureunion__pipeline__onehotencoder__handle_unknown': 'error',
'featureunion__pipeline__onehotencoder__max_categories': None,
'featureunion__pipeline__onehotencoder__min_frequency': None,
'featureunion__pipeline__onehotencoder__sparse_output': False,
'featureunion__selectcols__cols': ['fare', 'age'],
'histgradientboostingclassifier__categorical_features': 'warn',
'histgradientboostingclassifier__class_weight': None,
'histgradientboostingclassifier__early_stopping': 'auto',
'histgradientboostingclassifier__interaction_cst': None,
'histgradientboostingclassifier__l2_regularization': 0.0,
'histgradientboostingclassifier__learning_rate': 0.1,
'histgradientboostingclassifier__loss': 'log_loss',
'histgradientboostingclassifier__max_bins': 255,
'histgradientboostingclassifier__max_depth': None,
'histgradientboostingclassifier__max_features': 1.0,
'histgradientboostingclassifier__max_iter': 100,
'histgradientboostingclassifier__max_leaf_nodes': 31,
'histgradientboostingclassifier__min_samples_leaf': 20,
'histgradientboostingclassifier__monotonic_cst': None,
'histgradientboostingclassifier__n_iter_no_change': 10,
'histgradientboostingclassifier__random_state': None,
'histgradientboostingclassifier__scoring': 'loss',
'histgradientboostingclassifier__tol': 1e-07,
'histgradientboostingclassifier__validation_fraction': 0.1,
'histgradientboostingclassifier__verbose': 0,
'histgradientboostingclassifier__warm_start': False}
The reason why this is nice is that it allows me to be very specific. I can tune each input argument of every component, like featureunion__pipeline__selectcols__cols or featureunion__pipeline__onehotencoder__sparse_output. This is very nice for grid search!
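For example, the same double-underscore names also work with set_params, so a single nested argument can be changed in place (a quick sketch, not part of the original snippet):

pipe.set_params(featureunion__pipeline__onehotencoder__min_frequency=5)

# get_params confirms the change without rebuilding the pipeline
assert pipe.get_params()["featureunion__pipeline__onehotencoder__min_frequency"] == 5

In a grid search, these names become the keys of param_grid: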
from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(
    pipe,
    param_grid={
        "featureunion__pipeline__onehotencoder__min_frequency": [None, 1, 5, 10]
    }
)
# X, y: the feature dataframe (with the pclass, sex, fare and age columns
# selected above) and the target labels
grid.fit(X, y)
The cool thing about this is that I am able to get a nice table as output too.
import pandas as pd
pd.DataFrame(grid.cv_results_).to_markdown()
|    | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_featureunion__pipeline__onehotencoder__min_frequency | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.557284 | 0.0364319 | 0.0053968 | 0.00091813 | nan | {'featureunion__pipeline__onehotencoder__min_frequency': None} | 0.515267 | 0.774809 | 0.637405 | 0.709924 | 0.636015 | 0.654684 | 0.0866783 | 1 |
| 1 | 0.567849 | 0.0222483 | 0.00532556 | 0.000495336 | 1 | {'featureunion__pipeline__onehotencoder__min_frequency': 1} | 0.515267 | 0.774809 | 0.637405 | 0.709924 | 0.636015 | 0.654684 | 0.0866783 | 1 |
| 2 | 0.567496 | 0.00920872 | 0.00557318 | 0.000404766 | 5 | {'featureunion__pipeline__onehotencoder__min_frequency': 5} | 0.515267 | 0.774809 | 0.637405 | 0.709924 | 0.636015 | 0.654684 | 0.0866783 | 1 |
| 3 | 0.553523 | 0.023475 | 0.0052145 | 0.000855578 | 10 | {'featureunion__pipeline__onehotencoder__min_frequency': 10} | 0.515267 | 0.774809 | 0.637405 | 0.709924 | 0.636015 | 0.654684 | 0.0866783 | 1 |
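As a small aside (not in the original snippet), the same double-underscore names also show up in the fitted grid object, which is handy for a quick check alongside the table:

# best parameter combination and its mean cross-validated score
print(grid.best_params_)
print(grid.best_score_)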
But when I look at IbisML I wonder if I am able to do the same thing.
import ibis_ml as iml

tfm = iml.Recipe(
    iml.ExpandDateTime(iml.date())
)
In IbisML it is the Recipe object that is scikit-learn compatible, not the ExpandDateTime object. So let's inspect.
tfm.get_params()
This yields the following.
{'steps': (ExpandDateTime(date(),
components=['dow', 'month', 'year', 'hour', 'minute', 'second']),),
'expanddatetime': ExpandDateTime(date(),
components=['dow', 'month', 'year', 'hour', 'minute', 'second'])}
In fairness, this is not completely unlike what scikit-learn does natively. In a scikit-learn pipeline you also have access to the steps argument, and you could theoretically make all the changes there directly by passing in new sub-pipelines. But there is a reason why scikit-learn does not stop there! It can go deeper into all the input arguments of all the estimators in the pipeline, which makes the final cv_results_ output a lot nicer. And this is where I worry whether IbisML can do the same thing. It seems that I need to pass full objects, instead of being able to pluck out the individual attributes that I care about.
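For contrast, "passing in a new sub-pipeline" in scikit-learn would look roughly like the sketch below, reusing the pipe and imports defined earlier; the double-underscore name addresses the named step inside the FeatureUnion:

# swap out the entire inner pipeline in one go (illustrative sketch)
pipe.set_params(
    featureunion__pipeline=make_pipeline(
        SelectCols(["pclass"]),
        OneHotEncoder(sparse_output=False)
    )
)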
In this particular case, what if I want to measure the effect of including/excluding dow or hour? Is that possible? Can I have an underscore-syntax-like string just like in scikit-learn to configure that? Or do I need to overwrite the steps object?
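To make the last option concrete, overwriting the steps object would presumably look something like the sketch below. This assumes that Recipe.set_params accepts the steps / expanddatetime entries shown in get_params() above, and that ExpandDateTime takes the components argument from its repr; I have not verified either against IbisML.

# untested sketch: since only whole objects are exposed, one would have to
# swap the entire ExpandDateTime step rather than a single attribute
tfm.set_params(
    steps=(iml.ExpandDateTime(iml.date(), components=["month", "year"]),)
)

# and in a grid search over a pipeline containing the recipe, the grid values
# would likewise have to be full step objects instead of scalars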