Skip to content

[OTHER] Feedback on reduction to regression get_sklearn_wrapper #46

@mloning

Description

@mloning

Hi, following our conversion yesterday, I have two questions about the get_sklearn_wrapper:

1. API: Why do you dynamically create the init and pass the class rather than simply using composition and passing the object?

forecaster = get_sklearn_wrapper(DummyRegressor, lags=3)  # you do this
forecaster = get_sklearn_wrapper(DummyRegressor(), lags=3)  # why not this?
  • This would avoid the issues you seem to have with local class definitions and pickling
  • This would allow you to pass composite models like pipelines or GridSearchCV into the wrapper
  • I really like the dynamic init creator, but my intuition tells me it's misapplied here.

2. Algorithm: Why not use standard recursive strategy?

from hcrystalball.wrappers import get_sklearn_wrapper
from sklearn.dummy import DummyRegressor
import pandas as pd
import numpy as np

index = pd.date_range("2000", periods=13, freq="Y")
y_train = pd.Series(np.arange(10), index=index[:-3])
X_train = pd.DataFrame(index=y_train.index)
X_test = pd.DataFrame(index=index[-3:])

model = get_sklearn_wrapper(DummyRegressor, lags=3)
model.fit(X_train, y_train)
model.predict(X_test)
# >>> 2010-12-31	7.0
# >>> 2011-12-31	7.0
# >>> 2012-12-31	7.0

# you use the first 3 values as lagged variables, the DummyRegressor simply computes the mean of the rest
# so shouldn't the result be the following?
y_train.iloc[3:].mean()  # >>> 6.0

# the problem seems to be in the way you generate the target series 
# you increase the gap between lagged variables and target to match the length of 
# forecasting horizon 
X, y = model._transform_data_to_tsmodel_input_format(X_train, y_train, len(X_test))
pd.concat([X, pd.Series(y, index=X.index, name="y")], axis=1).head()
#	lag_0	lag_1	lag_2	y
#5	2.0	1.0	0.0	5
#6	3.0	2.0	1.0	6
#7	4.0	3.0	2.0	7
#8	5.0	4.0	3.0	8
#9	6.0	5.0	4.0	9

Hope this helps!

Metadata

Metadata

Labels

questionFurther information is requested

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions