[<Library component: Models|Core|etc...>] How to align TimeGPT evaluation with rolling window strategies used in other time series models (e.g., TimeLLM)? #668

@fine68

Description

I would like to compare the forecasting performance of TimeGPT with other time series forecasting methods (e.g., TimeLLM). However, these methods use evaluation strategies that differ from TimeGPT's.

For example, with a test set length of 1000, an input length of 96, and a prediction horizon of 96:

The typical evaluation procedure for such models is a rolling window: first, feed rows 0–95 to predict rows 96–191; then feed rows 1–96 to predict rows 97–192; and so on, sliding one step at a time until the entire test set is covered.

Under this setup, metrics such as MAE are computed over all predictions. In other words, the output shape of such models is:

(test set length − input length − prediction length + 1, prediction length)

and evaluation metrics are then calculated on this output.
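The rolling procedure described above can be sketched as follows. This is a minimal illustration, not TimeGPT's API: `rolling_window_eval`, `forecast_fn`, and the naive last-value forecaster are hypothetical names standing in for any model's predict call (TimeGPT, TimeLLM, etc.).

```python
import numpy as np

def rolling_window_eval(series, input_len, horizon, forecast_fn):
    """Slide a window of length `input_len` one step at a time,
    forecasting the next `horizon` points after each window."""
    n_windows = len(series) - input_len - horizon + 1  # valid window starts
    preds = np.empty((n_windows, horizon))
    actuals = np.empty((n_windows, horizon))
    for i in range(n_windows):
        context = series[i : i + input_len]
        preds[i] = forecast_fn(context, horizon)
        actuals[i] = series[i + input_len : i + input_len + horizon]
    mae = np.mean(np.abs(preds - actuals))  # MAE pooled over all windows
    return preds, actuals, mae

# Placeholder forecaster: repeat the last observed value.
def naive_forecast(context, horizon):
    return np.full(horizon, context[-1])

series = np.arange(1000, dtype=float)  # test set of length 1000
preds, actuals, mae = rolling_window_eval(series, 96, 96, naive_forecast)
print(preds.shape)  # (809, 96), i.e. (1000 - 96 - 96 + 1, 96)
```

Swapping `naive_forecast` for a call into each real model would give every method the same output shape and the same pooled metric computation.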

Therefore, to ensure a fair comparison, TimeGPT’s forecasting process needs to be adjusted so that it generates predictions under a rolling window evaluation scheme, aligning both the output shape and the metric computation with the above methods.

How can this be achieved with TimeGPT?

Use case

No response
