[ODSC-76829/76830] : improve auto-select logic and handle missing data #1259
250585d to a57b872
ahosler left a comment:
A couple of comments. It just needs a bit of polish.
if target_col not in data.columns:
    raise ValueError(f"Target column '{target_col}' not found in DataFrame")

data[target_col] = data[target_col].fillna(0)
Why fill NaNs with 0? Why not backfill? Did we discuss this?
Don't we already cover this in the pre-processing steps? What do we gain from doing it here?
This is independent of pre-processing: we want it applied even when pre-processing is not set or enabled. It lets us compute the meta-features when the dataset contains NaNs. Filling with other values has not been validated yet; that can be experimented with and the impact documented.
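To illustrate why the fill matters for meta-features, here is a minimal sketch. It assumes a single illustrative meta-feature, lag-1 autocorrelation, which is not taken from build_fforms_meta_features. The helper names lag1_autocorrelation and fill_target, and the strategy parameter, are hypothetical. The sketch compares the zero fill used in this PR with the backfill the reviewer asked about.

```python
import numpy as np
import pandas as pd


def lag1_autocorrelation(values: np.ndarray) -> float:
    """Hypothetical stand-in for one meta-feature: lag-1 autocorrelation."""
    # np.corrcoef propagates NaN, so a single gap makes the whole feature NaN.
    return float(np.corrcoef(values[:-1], values[1:])[0, 1])


def fill_target(data: pd.DataFrame, target_col: str, strategy: str = "zero") -> pd.DataFrame:
    """Fill NaNs in the target column before meta-feature extraction (sketch)."""
    if target_col not in data.columns:
        raise ValueError(f"Target column '{target_col}' not found in DataFrame")
    # Work on a copy so the caller's frame is left untouched.
    data = data.copy()
    if strategy == "zero":
        # The approach taken in this PR.
        data[target_col] = data[target_col].fillna(0)
    elif strategy == "bfill":
        # Backfill, then forward-fill so a trailing gap is not left as NaN.
        data[target_col] = data[target_col].bfill().ffill()
    else:
        raise ValueError(f"Unknown fill strategy: {strategy}")
    return data


df = pd.DataFrame({"y": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0]})
raw = lag1_autocorrelation(df["y"].to_numpy())
zero = lag1_autocorrelation(fill_target(df, "y", "zero")["y"].to_numpy())
back = lag1_autocorrelation(fill_target(df, "y", "bfill")["y"].to_numpy())
print(np.isnan(raw), round(zero, 3), round(back, 3))
```

The unfilled series yields a NaN feature, while both filled variants yield finite values. They differ from each other, which is why the choice of fill value deserves the validation mentioned above. Unlike the diff, this sketch copies the frame instead of mutating it in place.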
):
operator_config.spec.model = AUTO_SELECT
model = ForecastOperatorModelFactory.get_model(operator_config, datasets)
Nice!
Can we reflect this in the report? Make sure it still says "auto-select-series".
Can we also add a unit test for this?
I added a log message saying that we are falling back to auto-select. Populating the report with the method as auto-select-series would give the user the wrong impression, so I think that should be avoided. I also enabled the test case.
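The fallback discussed in this thread can be sketched as follows, under stated assumptions. The function resolve_model is hypothetical, and the string values given to AUTO_SELECT and AUTO_SELECT_SERIES are assumptions standing in for the operator's real constants. The sketch logs the fallback and returns the model actually used, so a report built from its return value would name that model rather than "auto-select-series".

```python
import logging
from typing import List, Optional

logger = logging.getLogger("forecast_operator_sketch")

# Assumed string values; the real constants live in the operator's code.
AUTO_SELECT = "auto-select"
AUTO_SELECT_SERIES = "auto-select-series"


def resolve_model(model: str, target_category_columns: Optional[List[str]]) -> str:
    """Hypothetical helper: fall back from per-series auto-select to global
    auto-select when no target category columns are configured."""
    if model == AUTO_SELECT_SERIES and not target_category_columns:
        # Log the fallback instead of reporting a method that was not used.
        logger.info(
            "No target_category_columns set; falling back from %s to %s.",
            AUTO_SELECT_SERIES,
            AUTO_SELECT,
        )
        return AUTO_SELECT
    return model


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(resolve_model(AUTO_SELECT_SERIES, None))
    print(resolve_model(AUTO_SELECT_SERIES, ["store_id"]))
```

Treating both None and an empty list as "not specified" is a design choice of this sketch, not something the PR confirms.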
Improve auto-select logic and handle missing data
This commit introduces the following changes to the forecasting operator:
- Improved AUTO_SELECT_SERIES logic: a fallback to the AUTO_SELECT model has been implemented for cases where AUTO_SELECT_SERIES is used without specifying target_category_columns.
- Missing data handling: the build_fforms_meta_features function now fills missing values in the target column with zeros. This prevents errors during meta-feature calculation when the data contains NaNs.
- New test case: checks that the auto-select-series model works correctly with datasets containing missing values.