[BUG] Potential sequence length bug in preprocessed record #2265

@Chetansahney

Description

Describe the bug

In pytorch_forecasting/data/_tslib_data_module.py, _preprocess_data() sets:

"length": len(series)

But series is a dict-like record, so len(series) returns the number of keys in that record, not the sequence length.
This can produce incorrect metadata for downstream consumers that rely on length.
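The difference can be seen without the library at all. The sketch below uses a plain dict as a stand-in for the dict-like record; every key name other than "t" is an assumption made for illustration, not the library's actual record layout.

```python
# Hypothetical stand-in for the dict-like record returned by the dataset.
# Key names other than "t" are assumptions made for illustration.
series = {
    "t": list(range(20)),             # 20 time steps
    "y": [0.0] * 20,                  # target values
    "x": [[0.0] for _ in range(20)],  # one feature per time step
    "group": [0],
}

key_count = len(series)              # number of keys in the record
sequence_length = len(series["t"])   # number of time steps

print("len(series):", key_count)
print("len(series['t']):", sequence_length)
```

With this stand-in, the two values differ whenever the record does not happen to have as many keys as it has time steps.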

To Reproduce

import pandas as pd
import numpy as np

from pytorch_forecasting.data import TimeSeries
from pytorch_forecasting.data._tslib_data_module import TslibDataModule

# Simple toy series: one group, 20 daily observations
df = pd.DataFrame(
    {
        "group": np.repeat([0], 20),
        "time": pd.date_range("2020-01-01", periods=20),
        "target": np.random.randn(20),
        "feature": np.random.randn(20),
    }
)

ts = TimeSeries(
    data=df,
    time="time",
    target="target",
    group=["group"],
    num=["feature"],
)

dm = TslibDataModule(
    time_series_dataset=ts,
    context_length=8,
    prediction_length=4,
    batch_size=2,
)
dm.setup()

idx = dm._train_indices[0]
processed = dm._preprocess_data(idx)
original = ts[idx]

print("processed length:", processed["length"])
print("expected length:", len(original["t"]))

Expected behavior

processed["length"] should equal the actual time-series length (e.g., len(original["t"])), not the number of keys in the series record.
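The expected behavior could be captured as a small check, roughly as follows. This is only a sketch: check_length_metadata is a hypothetical helper, the records are plain-dict stand-ins rather than real output of dm._preprocess_data(idx) and ts[idx], and using the "t" key as the source of the true length is an assumption taken from the reproduction script above.

```python
def check_length_metadata(processed, original, time_key="t"):
    """Return True if processed["length"] equals the number of time steps."""
    return processed["length"] == len(original[time_key])


# Stand-in records; a real check would use dm._preprocess_data(idx) and ts[idx].
original = {"t": list(range(20)), "y": [0.0] * 20, "group": [0]}

buggy = {"length": len(original)}        # counts keys in the record
fixed = {"length": len(original["t"])}   # counts time steps

print("buggy passes:", check_length_metadata(buggy, original))
print("fixed passes:", check_length_metadata(fixed, original))
```

A check of this shape could serve as a regression test once the length field is derived from the time axis rather than from the record itself.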

Additional context

This can cause subtle bugs where sequence-dependent logic reads a wrong length value from preprocessed records.

Versions

pytorch-forecasting: 1.7.0
Python: 3.12.x
PyTorch: 2.x
OS: Linux

Metadata

Labels: bug
Status: Needs triage & validation