I'm encountering a ValueError when using a custom function with add_encoders that returns a pandas DataFrame with multiple components (20 features in my case). I believe the custom function itself is correctly producing the DataFrame, but an issue arises during Darts' internal conversion to a TimeSeries.
Error Message:
ValueError: conflicting sizes for dimension 'component': length 20 on the data but length 1 on coordinate 'component'
This error occurs during model.fit(), originating from xarray.core.dataarray.py when a TimeSeries is being created internally from my custom encoder's output.
Setup:
I'm using add_encoders with a 'custom' key, providing a list of callable functions. One of these functions, let's call it encode_calendar_features, generates 20 calendar-related features.
# Simplified add_encoders structure in my model configuration
self.add_encoders = {
    # ... other potential built-in encoders like 'cyclic', 'datetime_attribute' ...
    'custom': {
        'past': [self.encode_calendar_features, self.encode_weekend_features],  # encode_calendar_features returns 20 components
        'future': [self.encode_calendar_features, self.encode_weekend_features]
    },
    'transformer': Scaler()  # Global scaler for all generated covariates
}

# My custom encoder function structure
def encode_calendar_features(self, idx: pd.DatetimeIndex) -> pd.DataFrame:
    df = pd.DataFrame(index=idx)
    # ... logic to create 20 feature columns ...
    # Example: df['feature_1'] = idx.year
    #          df['feature_2'] = idx.month
    # ... (up to 20 features)
    return df
Debugging Done So Far:
Verified Custom Function Output: I've confirmed that my encode_calendar_features function correctly returns a pandas.DataFrame with the expected shape (e.g., (num_samples, 20)) and df.columns is a pandas.Index of 20 unique string names.
--- Debugging encode_calendar_features ---
df.shape: (1008, 20)
df.columns: Index(['is_holiday', 'holiday_new_year', ..., 'is_post_thanksgiving_week'], dtype='object')
type(df.columns): <class 'pandas.core.indexes.base.Index'>
len(df.columns): 20
df.columns.tolist(): ['is_holiday', ..., 'is_post_thanksgiving_week']
--- End Debugging ---
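For anyone who wants to run the same check, here is a minimal standalone sketch of it. The encoder body is a hypothetical stand-in with three placeholder columns (my real function produces 20 holiday-related columns that are not shown here), and describe_encoder_output is a helper name made up for this example.

```python
import pandas as pd


def encode_calendar_features(idx: pd.DatetimeIndex) -> pd.DataFrame:
    """Hypothetical stand-in for the real encoder (3 placeholder columns, not 20)."""
    df = pd.DataFrame(index=idx)
    df["year"] = idx.year
    df["month"] = idx.month
    df["is_weekend"] = (idx.dayofweek >= 5).astype(int)
    return df


def describe_encoder_output(df: pd.DataFrame) -> dict:
    """Collect the same facts that the debugging printout reports."""
    return {
        "shape": df.shape,
        "columns_type": type(df.columns).__name__,
        "n_columns": len(df.columns),
        "columns_unique": df.columns.is_unique,
        "columns": df.columns.tolist(),
    }


# 1008 samples, matching the number of rows in the debugging output
idx = pd.date_range("2024-01-01", periods=1008, freq="D")
report = describe_encoder_output(encode_calendar_features(idx))

for key, value in report.items():
    print(f"{key}: {value}")
```

This only exercises the custom function on its own, so it says nothing about what Darts does with the returned DataFrame afterwards.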
The error suggests that while the data for 20 components is present, xarray (via Darts' TimeSeries.from_times_and_values) is being told that the 'component' coordinate only has a length of 1. This occurs even though Darts' CallableStaticTransformer (which I believe wraps DataFrame-returning custom functions) seems designed to use df.values and df.columns correctly.
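To illustrate the kind of mismatch I suspect, here is a sketch that uses xarray directly rather than Darts. It assumes (I have not confirmed this in the Darts source) that the 20-column values end up paired with a single component name. The dimension names, the placeholder name "custom", and the zero-filled array are all illustrative choices of mine.

```python
import numpy as np
import pandas as pd
import xarray as xr

idx = pd.date_range("2024-01-01", periods=1008, freq="D")

# Same layout as a 20-component series: (time, component, sample)
values = np.zeros((len(idx), 20, 1))


def build(component_names):
    """Build a DataArray whose 'component' coordinate uses the given names."""
    return xr.DataArray(
        values,
        dims=("time", "component", "sample"),
        coords={"time": idx, "component": component_names},
    )


# Assumption: one name for 20 columns of data, which xarray rejects
try:
    build(["custom"])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

print(mismatch_error)

# With one name per column, the same data is accepted
ok = build([f"feature_{i}" for i in range(20)])
print(ok.shape)
```

If that assumption holds, the DataFrame's own column names would not be reaching the point where the component coordinate is built, which would be consistent with the "length 1 on coordinate" part of the message.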
Question:
Is there a known issue or a specific way a custom function in add_encoders should return a multi-column DataFrame to ensure Darts correctly interprets it as a multi-component TimeSeries without this coordinate mismatch?