Summary
TimeSeriesDataSet currently applies scaling to encoder continuous variables internally, but the scaling parameters (e.g., means, stds, per‑sample scales) are not exposed in a way that allows users to invert the transformation. This limits interpretability, debugging, and downstream analysis.
I propose adding:
1. Public access to encoder scaling parameters
2. Support for user‑provided per‑feature scalers
3. A clean inverse‑scaling utility that works for both dataset indexing and dataloader batches
I have a working implementation and a demonstration notebook and can open a PR once the design is approved.
Motivation
Many forecasting workflows require inspecting model inputs and outputs in the original data space.
Currently:
- TimeSeriesDataSet applies scaling (e.g., EncoderNormalizer, StandardScaler, or custom scalers) to real inputs, but the resulting scale parameters are not exposed in a structured way.
- Users cannot reliably invert the transformation for encoder continuous variables.
This makes it difficult to:
- debug model behavior
- visualize inputs/outputs
- compare predictions to raw data
Proposed Enhancement
1. Expose encoder scaling parameters
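For each continuous variable, expose:
- the computed scale parameters (if using internal normalizers)
- the index mapping between features and scale tensors
This could be added as extra entries in the dict returned by the dataset, e.g.: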
    def __getitem__(self, idx):
        ...
        return (
            dict(
                x_cat=data_cat,
                x_cont=data_cont,
                encoder_length=encoder_length,
                decoder_length=decoder_length,
                encoder_target=encoder_target,
                encoder_time_idx_start=time[0],
                groups=groups,
                target_scale=target_scale,
                # proposed: add input scales as tensors
                x_scale_idx=data_scale_idx,  # which x_cont columns are scaled
                x_scale=data_scale,  # scale parameters for those columns
            ),
            (target, weight),
        )
and on the dataloader:
    def _collate_fn(batches):
        ...
        return (
            dict(
                encoder_cat=encoder_cat,
                encoder_cont=encoder_cont,
                encoder_target=encoder_target,
                encoder_lengths=encoder_lengths,
                decoder_cat=decoder_cat,
                decoder_cont=decoder_cont,
                decoder_target=target,
                decoder_lengths=decoder_lengths,
                decoder_time_idx=decoder_time_idx,
                groups=groups,
                target_scale=target_scale,
                # proposed: add encoder scale
                encoder_scale_idx=encoder_scale_idx,
                encoder_scale=encoder_scale,
            ),
            (target, weight),
        )
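To make the intended relationship between the scale tensor and the index mapping concrete, here is a minimal sketch using NumPy arrays in place of tensors. The feature names, the `make_item` and `collate` helpers, and the `[center, scale]` layout of `x_scale` are all assumptions for illustration; they are not the library's current behavior or the final design.

```python
import numpy as np

# Hypothetical feature layout: only some columns of x_cont are scaled.
feature_names = ["price", "volume", "is_holiday"]
scaled_features = ["price", "volume"]

# Index mapping between scaled features and columns of x_cont.
x_scale_idx = np.array([feature_names.index(name) for name in scaled_features])


def make_item(raw):
    """Scale the selected columns per sample and keep the parameters used."""
    raw = np.asarray(raw, dtype=float)
    center = raw[:, x_scale_idx].mean(axis=0)
    scale = raw[:, x_scale_idx].std(axis=0) + 1e-8  # avoid division by zero
    x_cont = raw.copy()
    x_cont[:, x_scale_idx] = (raw[:, x_scale_idx] - center) / scale
    # Assumed layout: one [center, scale] pair per scaled feature.
    x_scale = np.stack([center, scale], axis=-1)  # shape (n_scaled, 2)
    return dict(x_cont=x_cont, x_scale=x_scale, x_scale_idx=x_scale_idx)


def collate(items):
    """Stack per-sample scales so each batch row keeps its own parameters."""
    return dict(
        encoder_cont=np.stack([item["x_cont"] for item in items]),
        encoder_scale=np.stack([item["x_scale"] for item in items]),
        encoder_scale_idx=items[0]["x_scale_idx"],  # identical for all items
    )


items = [
    make_item([[10.0, 100.0, 0.0], [20.0, 300.0, 1.0]]),
    make_item([[1.0, 5.0, 0.0], [3.0, 7.0, 0.0]]),
]
batch = collate(items)
```

In this sketch the index mapping is shared across the batch, while the scale parameters gain a leading batch dimension, which is what an inverse transform would need to treat items and batches uniformly.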
2. Provide an inverse‑scaling utility (optional)
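A function such as TimeSeriesDataSet.inverse_transform(batch_or_item) that handles:
- StandardScaler
- EncoderNormalizer
- per‑sample scale parameters
- both dataset indexing and dataloader batches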
I have implemented a working version and validated it across all cases.
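As a rough illustration of what such a utility has to do (this is not the implementation mentioned above), the sketch below inverts a center/scale transform for selected columns. The function name `inverse_scale`, its arguments, and the assumption that each scaled feature carries a `[center, scale]` pair are hypothetical; NumPy arrays stand in for tensors.

```python
import numpy as np


def inverse_scale(x_cont, x_scale, x_scale_idx):
    """Undo y = (x - center) / scale for the columns listed in x_scale_idx.

    x_cont:      (..., time, n_features) scaled values
    x_scale:     (..., n_scaled, 2) with an assumed [center, scale] per feature
    x_scale_idx: (n_scaled,) column positions of the scaled features in x_cont
    """
    x_cont = np.asarray(x_cont, dtype=float)
    x_scale = np.asarray(x_scale, dtype=float)
    out = x_cont.copy()
    # Insert a time axis so the parameters broadcast over every time step.
    center = x_scale[..., 0][..., None, :]
    scale = x_scale[..., 1][..., None, :]
    out[..., x_scale_idx] = x_cont[..., x_scale_idx] * scale + center
    return out


# Single dataset item: column 0 was scaled with center=20, scale=10.
item_scaled = np.array([[-1.0, 1.0], [0.0, 2.0], [1.0, 3.0]])
item_scale = np.array([[20.0, 10.0]])
restored_item = inverse_scale(item_scaled, item_scale, [0])

# Dataloader batch: each row carries its own per-sample parameters.
batch_scaled = np.stack([item_scaled, item_scaled])
batch_scale = np.array([[[20.0, 10.0]], [[40.0, 20.0]]])
restored_batch = inverse_scale(batch_scaled, batch_scale, [0])
```

Because the leading dimensions are handled by broadcasting, the same function covers a dataset item and a collated batch. Unscaled columns pass through unchanged.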
Example Notebook
I prepared a small notebook demonstrating:
- a toy dataset with mixed scaling
- a modified TimeSeriesDataSet supporting per‑feature scalers
- a robust inverse‑scaling function
- validation that inverse‑scaling reconstructs the original data
- correct behavior for both dataset items and dataloader batches
I can attach the notebook if helpful.
Benefits
- Enables full interpretability of model inputs/outputs
- Makes debugging significantly easier
- Provides a clean API for advanced users
- No breaking changes to existing code
Request
Please let me know whether the proposed API direction aligns with project goals.
I’m happy to refine the design based on your feedback and/or submit a PR.