[ENH] Expose Encoder Scaling Parameters in TimeSeriesDataSet + Provide Inverse‑Scaling Utility #2271

@cngmid

Summary

TimeSeriesDataSet currently applies scaling to encoder continuous variables internally, but the scaling parameters (e.g., means, stds, per‑sample scales) are not exposed in a way that allows users to invert the transformation. This limits interpretability, debugging, and downstream analysis.

I propose adding:

1. Public access to encoder scaling parameters

2. Support for user‑provided per‑feature scalers

3. A clean inverse‑scaling utility that works for both dataset indexing and dataloader batches

I have a working implementation and a demonstration notebook and can open a PR once the design is approved.


Motivation

Many forecasting workflows require inspecting model inputs and outputs in the original data space.

Currently:

  • TimeSeriesDataSet applies scaling (e.g., EncoderNormalizer, StandardScaler, or custom scalers) to continuous (real-valued) inputs, but the resulting scale parameters are not exposed in a structured way.

  • Users cannot reliably invert the transformation for encoder continuous variables.

This makes it difficult to:

  • debug model behavior

  • visualize inputs/outputs

  • compare predictions to raw data


Proposed Enhancement

1. Expose encoder scaling parameters

For each continuous variable, expose:

  • the computed scale parameters (if using internal normalizers)

  • the index mapping between features and scale tensors

This could be added as extra keys in the dictionary returned for each dataset item, e.g.:

    def __getitem__(self, idx):
        ...
        return (
            dict(
                x_cat=data_cat,
                x_cont=data_cont,
                encoder_length=encoder_length,
                decoder_length=decoder_length,
                encoder_target=encoder_target,
                encoder_time_idx_start=time[0],
                groups=groups,
                target_scale=target_scale,
                
                # add input scales as tensors
                x_scale_idx=data_scale_idx,
                x_scale=data_scale,
            ),
            (target, weight),
        )
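
To make the intent of the two extra keys concrete, here is a minimal NumPy sketch of how a consumer might use them on a single item. The layout is an assumption, since it is not fixed above: `x_scale` is taken to hold one `(center, scale)` row per scaled feature, and `x_scale_idx` the column of `x_cont` that each row belongs to. `invert_item` is a hypothetical helper name, and NumPy arrays stand in for the tensors.

```python
import numpy as np


def invert_item(x_cont, x_scale, x_scale_idx):
    """Undo `(x - center) / scale` for the columns listed in x_scale_idx.

    Assumed (hypothetical) layout:
      x_cont      : (time, n_features) scaled continuous inputs
      x_scale     : (n_scaled, 2) rows of (center, scale)
      x_scale_idx : (n_scaled,) column of x_cont for each row of x_scale
    """
    restored = np.array(x_cont, dtype=float)  # copy, leave the input untouched
    center, scale = x_scale[:, 0], x_scale[:, 1]
    # Broadcasting applies each feature's parameters down the time axis.
    restored[:, x_scale_idx] = restored[:, x_scale_idx] * scale + center
    return restored


# Toy item: column 0 is standardized, column 1 is passed through unscaled.
raw = np.array([[10.0, 1.0], [20.0, 0.0], [30.0, 1.0]])
center, scale = raw[:, 0].mean(), raw[:, 0].std()
x_cont = raw.copy()
x_cont[:, 0] = (raw[:, 0] - center) / scale

x_scale = np.array([[center, scale]])
x_scale_idx = np.array([0])
restored = invert_item(x_cont, x_scale, x_scale_idx)
```

Keeping the index mapping separate from the parameters means unscaled features need no placeholder row in `x_scale`.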

and in the batch dictionary built by the dataloader's collate function:

    def _collate_fn(batches):
        ...
        return (
            dict(
                encoder_cat=encoder_cat,
                encoder_cont=encoder_cont,
                encoder_target=encoder_target,
                encoder_lengths=encoder_lengths,
                decoder_cat=decoder_cat,
                decoder_cont=decoder_cont,
                decoder_target=target,
                decoder_lengths=decoder_lengths,
                decoder_time_idx=decoder_time_idx,
                groups=groups,
                target_scale=target_scale,
                
                # add encoder scale
                encoder_scale_idx=encoder_scale_idx,
                encoder_scale=encoder_scale,
            ),
            (target, weight),
        )
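
At batch level, each sample may carry its own scale parameters, so the inversion has to broadcast per sample. The following is a sketch under assumed shapes, not the proposed implementation: `encoder_scale` is taken to be the per-item `(center, scale)` rows stacked along a new batch axis, and `encoder_scale_idx` is assumed to be identical for every sample. `invert_batch` is a hypothetical name, and NumPy arrays stand in for tensors.

```python
import numpy as np


def invert_batch(encoder_cont, encoder_scale, encoder_scale_idx):
    """Undo per-sample scaling on a collated batch.

    Assumed (hypothetical) layout:
      encoder_cont      : (batch, time, n_features)
      encoder_scale     : (batch, n_scaled, 2) per-sample (center, scale)
      encoder_scale_idx : (n_scaled,) feature columns, shared by all samples
    """
    restored = np.array(encoder_cont, dtype=float)  # copy of the input
    # Insert a time axis so each sample's parameters broadcast over its steps.
    center = encoder_scale[:, None, :, 0]  # (batch, 1, n_scaled)
    scale = encoder_scale[:, None, :, 1]   # (batch, 1, n_scaled)
    restored[:, :, encoder_scale_idx] = (
        restored[:, :, encoder_scale_idx] * scale + center
    )
    return restored


# Two series at very different levels, each normalized by its own statistics.
raw = np.array([[[1.0], [2.0], [3.0]], [[100.0], [200.0], [300.0]]])
center = raw.mean(axis=1)  # (batch, n_features)
scale = raw.std(axis=1)    # (batch, n_features)
encoder_cont = (raw - center[:, None, :]) / scale[:, None, :]

encoder_scale = np.stack([center, scale], axis=-1)  # (batch, n_scaled, 2)
encoder_scale_idx = np.array([0])
restored = invert_batch(encoder_cont, encoder_scale, encoder_scale_idx)
```

After per-sample normalization the two series look identical in `encoder_cont`, which is why the original levels cannot be recovered without the scale parameters travelling with the batch.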

2. Provide an inverse‑scaling utility (optional)

A function such as TimeSeriesDataSet.inverse_transform(batch_or_item) that handles:

  • StandardScaler

  • EncoderNormalizer

  • per‑sample scale parameters

  • both dataset indexing and dataloader batches

I have implemented a working version and validated it against each of the cases listed above.
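
As a rough illustration of the API shape (not the implementation mentioned above), a single entry point could tell items and batches apart by the key names proposed in this issue and route both through one code path. The sketch assumes every scaler reduces to a `(center, scale)` pair; normalizers configured with an additional nonlinear transformation would need extra handling that is not shown. The `(center, scale)` layout and the use of NumPy arrays are assumptions.

```python
import numpy as np


def inverse_transform(batch_or_item):
    """Return unscaled continuous inputs for a dataset item or a batch.

    Hypothetical sketch: accepts the `(x, y)` tuple from either source.
    """
    x, _ = batch_or_item
    if "encoder_cont" in x:  # collated batch
        keys = ("encoder_cont", "encoder_scale", "encoder_scale_idx")
    else:  # single dataset item
        keys = ("x_cont", "x_scale", "x_scale_idx")
    cont = np.asarray(x[keys[0]], dtype=float)
    params = np.asarray(x[keys[1]], dtype=float)
    idx = np.asarray(x[keys[2]])

    is_item = cont.ndim == 2
    if is_item:  # add a batch axis so one code path serves both inputs
        cont, params = cont[None], params[None]

    restored = cont.copy()
    center = params[:, None, :, 0]  # (batch, 1, n_scaled)
    scale = params[:, None, :, 1]   # (batch, 1, n_scaled)
    restored[:, :, idx] = restored[:, :, idx] * scale + center
    return restored[0] if is_item else restored


# The same toy series, once as a dataset item and once as a batch of one.
raw = np.array([[5.0], [7.0], [9.0]])
scaled = (raw - 7.0) / 2.0
scale_rows = np.array([[7.0, 2.0]])
item = (
    {"x_cont": scaled, "x_scale": scale_rows, "x_scale_idx": np.array([0])},
    (None, None),
)
batch = (
    {
        "encoder_cont": scaled[None],
        "encoder_scale": scale_rows[None],
        "encoder_scale_idx": np.array([0]),
    },
    (None, None),
)
item_restored = inverse_transform(item)
batch_restored = inverse_transform(batch)
```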


Example Notebook

I prepared a small notebook demonstrating:

  • a toy dataset with mixed scaling

  • a modified TimeSeriesDataSet supporting per‑feature scalers

  • a robust inverse‑scaling function

  • validation that inverse‑scaling reconstructs the original data

  • correct behavior for both dataset items and dataloader batches

I can attach the notebook if helpful.


Benefits

  • Enables full interpretability of model inputs/outputs

  • Makes debugging significantly easier

  • Provides a clean API for advanced users

  • No breaking changes to existing code


Request

Please let me know whether the proposed API direction aligns with project goals.

I’m happy to refine the design based on your feedback and/or submit a PR.
