[QUESTION] ValueError with add_encoders Custom Function Returning Multiple Components #2806

@Terrevue

Description

I'm encountering a ValueError when using a custom function with add_encoders that returns a pandas DataFrame with multiple components (20 features in my case). I believe the custom function itself is correctly producing the DataFrame, but an issue arises during Darts' internal conversion to a TimeSeries.

Error Message:

ValueError: conflicting sizes for dimension 'component': length 20 on the data but length 1 on coordinate 'component'

This error occurs during model.fit(). It originates in xarray (xarray/core/dataarray.py) when a TimeSeries is created internally from my custom encoder's output.

Setup:
I'm using add_encoders with a 'custom' key, providing a list of callable functions. One of these functions, let's call it encode_calendar_features, generates 20 calendar-related features.

# Imports assumed by the snippets below
import pandas as pd
from darts.dataprocessing.transformers import Scaler

# Simplified add_encoders structure in my model configuration
self.add_encoders = {
    # ... other potential built-in encoders like 'cyclic', 'datetime_attribute' ...
    'custom': {
        # encode_calendar_features returns 20 components
        'past': [self.encode_calendar_features, self.encode_weekend_features],
        'future': [self.encode_calendar_features, self.encode_weekend_features]
    },
    'transformer': Scaler() # Global scaler for all generated covariates
}

# My custom encoder function structure
def encode_calendar_features(self, idx: pd.DatetimeIndex) -> pd.DataFrame:
    df = pd.DataFrame(index=idx)
    # ... logic to create 20 feature columns ...
    # Example: df['feature_1'] = idx.year
    #          df['feature_2'] = idx.month
    #          ... (up to 20 features)
    return df

Debugging Done So Far:

Verified Custom Function Output: I've confirmed that my encode_calendar_features function correctly returns a pandas.DataFrame with the expected shape (e.g., (num_samples, 20)) and df.columns is a pandas.Index of 20 unique string names.

--- Debugging encode_calendar_features ---
df.shape: (1008, 20)
df.columns: Index(['is_holiday', 'holiday_new_year', ..., 'is_post_thanksgiving_week'], dtype='object')
type(df.columns): <class 'pandas.core.indexes.base.Index'>
len(df.columns): 20
df.columns.tolist(): ['is_holiday', ..., 'is_post_thanksgiving_week']
--- End Debugging ---
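
A check of this kind could be sketched as follows. This is only an illustration: `check_encoder_output` is a hypothetical helper name, and the stand-in encoder builds two columns rather than the 20 real calendar features.

```python
import pandas as pd


def encode_calendar_features(idx: pd.DatetimeIndex) -> pd.DataFrame:
    # Stand-in for the real encoder: two illustrative columns instead of 20.
    df = pd.DataFrame(index=idx)
    df["year"] = idx.year
    df["month"] = idx.month
    return df


def check_encoder_output(df: pd.DataFrame, idx: pd.DatetimeIndex, expected_components: int):
    # Hypothetical helper: verify the shape and column labels of an encoder's output.
    if not isinstance(df, pd.DataFrame):
        raise TypeError(f"expected a DataFrame, got {type(df)}")
    if df.shape != (len(idx), expected_components):
        raise ValueError(f"unexpected shape {df.shape}")
    if not df.columns.is_unique:
        raise ValueError("column names are not unique")
    if not all(isinstance(c, str) for c in df.columns):
        raise ValueError("column names are not all strings")
    return df.shape


idx = pd.date_range("2024-01-01", periods=1008, freq="D")
shape = check_encoder_output(encode_calendar_features(idx), idx, expected_components=2)
print("df.shape:", shape)
```

Passing this check only shows that the function's own output is well formed. It says nothing about how the caller labels the components afterwards.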

The error suggests that while the data for 20 components is present, xarray (via Darts' TimeSeries.from_times_and_values) is being told that the 'component' coordinate only has a length of 1. This occurs even though Darts' CallableStaticTransformer (which I believe wraps DataFrame-returning custom functions) seems designed to use df.values and df.columns correctly.
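
One possible workaround is to expose each DataFrame column as its own one-dimensional callable. This assumes that each callable under 'custom' is expected to produce exactly one component, which the length-1 'component' coordinate in the error suggests but which is not confirmed here. In the sketch below, `split_into_column_encoders` is a hypothetical helper, not part of Darts, and the three columns stand in for the 20 real features.

```python
from typing import Callable, List

import pandas as pd


def encode_calendar_features(idx: pd.DatetimeIndex) -> pd.DataFrame:
    # Stand-in for the real multi-column encoder (three columns instead of 20).
    df = pd.DataFrame(index=idx)
    df["year"] = idx.year
    df["month"] = idx.month
    df["is_weekend"] = (idx.dayofweek >= 5).astype(int)
    return df


def split_into_column_encoders(
    multi_fn: Callable[[pd.DatetimeIndex], pd.DataFrame],
    column_names: List[str],
) -> List[Callable]:
    # Hypothetical helper: build one single-column callable per DataFrame column.
    encoders = []
    for name in column_names:
        # Bind the column name as a default argument so each closure keeps its own.
        def encoder(idx, _name=name):
            return multi_fn(idx)[_name].to_numpy(dtype=float)

        # A readable name helps when debugging; whether Darts uses it is not assumed.
        encoder.__name__ = f"calendar_{name}"
        encoders.append(encoder)
    return encoders


column_encoders = split_into_column_encoders(
    encode_calendar_features, ["year", "month", "is_weekend"]
)

# Same dictionary layout as in the setup above, one callable per component.
add_encoders = {
    "custom": {"past": column_encoders, "future": column_encoders},
}
```

This recomputes the full DataFrame once per column, so caching the result per index may be worthwhile if the feature logic is expensive.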

Question:

Is there a known issue or a specific way a custom function in add_encoders should return a multi-column DataFrame to ensure Darts correctly interprets it as a multi-component TimeSeries without this coordinate mismatch?
