Description
I would like to compare the forecasting performance of TimeGPT with other time series forecasting methods (e.g., TimeLLM). However, these methods adopt a different evaluation strategy than TimeGPT does.
For example, with a test set length of 1000, an input length of 96, and a prediction horizon of 96:
The typical evaluation procedure for other models is as follows: first, feed rows 0–95 to predict 96–191; then feed rows 1–96 to predict 97–192; and so on, until the entire test set is covered.
Under this setup, metrics such as MAE are computed over all predictions. In other words, the output shape of such models is:
(test set length − input length − prediction length + 1, prediction length)
and evaluation metrics are then calculated on this output.
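The scheme described above can be sketched with NumPy, using a naive "repeat the last value" forecaster as a stand-in for any model (all names here are illustrative, not a real API):

```python
import numpy as np

T, L, H = 1000, 96, 96          # test length, input length, prediction horizon
series = np.sin(np.arange(T) / 10.0)  # toy stand-in for the real test set

n_windows = T - L - H + 1       # number of sliding windows (stride 1)
preds = np.empty((n_windows, H))
trues = np.empty((n_windows, H))
for i in range(n_windows):
    window = series[i : i + L]               # rows i .. i+L-1 as input
    preds[i] = window[-1]                    # naive forecast for the next H steps
    trues[i] = series[i + L : i + L + H]     # rows i+L .. i+L+H-1 as ground truth

mae = np.abs(preds - trues).mean()           # MAE over all predictions
print(preds.shape)                           # (809, 96) for these values
```

With T=1000, L=96, H=96 this yields 809 windows, matching the output shape given above.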
Therefore, to ensure a fair comparison, TimeGPT’s forecasting process needs to be adjusted so that it generates predictions under a rolling window evaluation scheme, aligning both the output shape and the metric computation with the above methods.
How can this be achieved with TimeGPT?
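For concreteness, the desired loop might look like the sketch below. The `forecast` function is a stub standing in for a per-window TimeGPT call (e.g., something along the lines of `NixtlaClient.forecast(df, h=H)`); it is replaced here with a naive forecaster so the sketch is self-contained, and all sizes and column names are assumptions:

```python
import numpy as np
import pandas as pd

def forecast(df: pd.DataFrame, h: int) -> np.ndarray:
    # Stub: repeat the last observed value. A real run would call TimeGPT here.
    return np.full(h, df["y"].iloc[-1])

T, L, H = 200, 24, 12  # smaller toy sizes for illustration
ds = pd.date_range("2024-01-01", periods=T, freq="h")
test = pd.DataFrame({"unique_id": "series_1", "ds": ds,
                     "y": np.sin(np.arange(T) / 5.0)})

rows = []
for i in range(T - L - H + 1):
    window = test.iloc[i : i + L]        # input of length L, sliding by 1
    rows.append(forecast(window, h=H))   # one H-step forecast per window

preds = np.vstack(rows)                  # shape: (T - L - H + 1, H)
print(preds.shape)
```

Calling the hosted API once per window this way can be slow and costly, so guidance on a built-in rolling/cross-validation mode would be preferable to a manual loop.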
Use case
No response