| Name | Domain | Granularity | Variates | Clients (max) | Samples | CV (mean±std) | URL |
|---|---|---|---|---|---|---|---|
| BaseStation5G | Communication | 2 minutes | 11 | 3 | 9_004±5_018 | | Github |
| BeijingAirQuality | Environment | 1 hour | 11 | 12 | 31_847±981 | 0.93±0.03 | UCI |
| CitiesILI | Healthcare | 1 week | 1 | 122 | | | Github |
| COVID19Cases | Healthcare | 1 day | 10 | 55 | | | Github |
| CryptoDataDownloadDay | Economic | 1 day | 4 | CDD | |||
| CryptoDataDownloadHour | Economic | 1 hour | 4 | CDD | |||
| CryptoDataDownloadMinute | Economic | 1 minute | 4 | CDD | |||
| ETTh1 | Energy | 1 hour | 7 | 1 | 14_400±0 | 0.74±0.29 | Github |
| ETTh2 | Energy | 1 hour | 7 | 1 | 14_400±0 | 0.74±0.29 | Github |
| ETDatasetHour | Energy | 1 hour | 7 | 2 | 14_400±0 | 0.74±0.29 | Github |
| ETTm1 | Energy | 15 minutes | 7 | 1 | 57_600±0 | | Github |
| ETTm2 | Energy | 15 minutes | 7 | 1 | 57_600±0 | | Github |
| ETDatasetMinute | Energy | 15 minutes | 7 | 2 | 57_600±0 | | Github |
| Electricity | Energy | 15 minutes | 1 | 321 | 26_304±0 | 0.41±0.28 | Github |
| ElectricityLoadDiagrams | Energy | 15 minutes | 1 | 370 | 140_256±0 | | UCI |
| ExchangeRate | Economic | 1 day | 1 | 8 | 7_588±0 | | Github |
| METRLA | Traffic | 5 minutes | 1 | 207 | 34_272±0 | | Github |
| MekongSalinity | Environment | 1 day | 1 | 38 | 1_500±953 | 0.90±0.40 | Springer |
| PeMS03 | Traffic | 5 minutes | 1 | 358 | 26_208±0 | | Github |
| PeMS04 | Traffic | 5 minutes | 1 | 307 | 16_992±0 | | Github |
| PeMS07 | Traffic | 5 minutes | 1 | 883 | 28_224±0 | | Github |
| PeMS08 | Traffic | 5 minutes | 3 | 170 | 17_856±0 | | Github |
| PeMSBAY | Traffic | 5 minutes | 1 | 325 | 52_116±0 | | Github |
| PeMSSF | Traffic | 10 minutes | 1 | 963 | 63_345±0 | | UCI |
| SolarCSGREGFC | Energy | 15 minutes | 5 | 8 | 63_852±16_443 | | Github |
| SolarEnergy | Energy | 1 hour | 1 | 137 | 52_560±0 | 1.46±0.04 | Github |
| StatesILI | Healthcare | 1 week | 1 | 37 | | | Github |
| TetouanPowerConsumption | Energy | 10 minutes | 1 | 3 | 52_416±0 | | UCI |
| Traffic | Traffic | 1 hour | 1 | 862 | 17_544±0 | 0.81±0.22 | Github |
| TinyWeather5K | Environment | 1 hour | 5 | 200 | 87_648±0 | 0.57±0.22 | Github |
| Weather5K | Environment | 1 hour | 5 | 5_672 | | | Github |
| WindCSGREGFC | Energy | 15 minutes | 10 | 6 | 70_146±66 | | Github |

Note: The actual number of clients is determined after the data is split, because clients with insufficient data (those that cannot form at least 10 samples) are discarded. Clients (max) is the maximum possible number of clients.
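
The filtering rule in the note can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes samples are stride-1 sliding windows, so a series of length `T` yields `max(0, T - input_len - output_len + 1)` samples. The function names and the `min_samples` parameter are hypothetical.

```python
def count_windows(series_len: int, input_len: int, output_len: int) -> int:
    """Number of stride-1 sliding windows (assumed sampling scheme)."""
    return max(0, series_len - input_len - output_len + 1)


def keep_clients(series_lens: dict, input_len: int, output_len: int,
                 min_samples: int = 10) -> list:
    """Return the client ids that can form at least `min_samples` windows."""
    return [
        cid for cid, n in series_lens.items()
        if count_windows(n, input_len, output_len) >= min_samples
    ]


# Hypothetical per-client series lengths.
lengths = {"a": 1000, "b": 200, "c": 120}
kept = keep_clients(lengths, input_len=96, output_len=96)
print(kept)  # ['a']  (client "b" forms only 9 windows, client "c" forms none)
```

Under this assumption, longer `--input_len` or `--output_len` values reduce the number of windows per client and can therefore reduce the number of clients that survive the split.
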

To use a dataset in your experiment (default scenario), specify the dataset name with the `--dataset` argument when running your training or analysis scripts.

Example: `python main.py --dataset=ETTh1`

You can also set other related arguments such as `--input_len`, `--output_len`, and `--batch_size` to control the window size, forecast horizon, and batch size for your experiment.

Example: `python main.py --dataset=SolarEnergy --input_len=168 --output_len=24 --batch_size=64`

All clients will use the same configuration as specified above. Refer to the table above for available dataset names and their details.
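
For readers who want to see how these flags fit together, here is a hypothetical `argparse` sketch. It is not the repository's `main.py`; only the four flag names come from the text above, and the default values are placeholders chosen for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser; defaults are assumptions, not the project's values."""
    parser = argparse.ArgumentParser(description="Illustrative experiment options")
    parser.add_argument("--dataset", type=str, required=True,
                        help="dataset name from the table, e.g. ETTh1")
    parser.add_argument("--input_len", type=int, default=96,
                        help="input window size (assumed default)")
    parser.add_argument("--output_len", type=int, default=96,
                        help="forecast horizon (assumed default)")
    parser.add_argument("--batch_size", type=int, default=32,
                        help="batch size (assumed default)")
    return parser


# Parse the same flags as the SolarEnergy example above.
args = build_parser().parse_args(
    ["--dataset=SolarEnergy", "--input_len=168", "--output_len=24", "--batch_size=64"]
)
print(args.dataset, args.input_len, args.output_len, args.batch_size)
```
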

Different clients from the same dataset may have different configurations (e.g., different output lengths or channels).

Examples:
- PeMS08OutVar1: 75% of clients have `output_len=96`, 25% have `output_len=720`. See `data_factory/PeMS08.py/PeMS08OutVar1`. Command: `python main.py --dataset=PeMS08OutVar1`
- PeMS08OutVar2: 50% of clients have `output_len=96`, 50% have `output_len=720`. See `data_factory/PeMS08.py/PeMS08OutVar2`. Command: `python main.py --dataset=PeMS08OutVar2`
- PeMS08OutVar3: 25% of clients have `output_len=96`, 75% have `output_len=720`. See `data_factory/PeMS08.py/PeMS08OutVar3`. Command: `python main.py --dataset=PeMS08OutVar3`
- Customized2: 50% of clients have 1 output channel and 1 input channel, 50% have 7 output channels and 7 input channels. See `data_factory/Customized.py/Customized2`. Command: `python main.py --dataset=Customized2`
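
One way to picture a per-client split such as 75%/25% is a helper that assigns each client an `output_len` from a list of (fraction, value) pairs. This is a hypothetical sketch; the actual assignment logic lives in the `data_factory` classes referenced above and may differ (for instance, in ordering, rounding, or randomization).

```python
def assign_output_lens(num_clients: int, shares: list) -> list:
    """Assign an output_len to each client index from (fraction, value) pairs.

    Clients are assigned in order. Counts are floored, and the last group
    absorbs whatever remains so every client gets a value.
    """
    assignment = []
    for i, (fraction, value) in enumerate(shares):
        if i == len(shares) - 1:
            count = num_clients - len(assignment)
        else:
            count = int(num_clients * fraction)
        assignment.extend([value] * count)
    return assignment


# PeMS08 has at most 170 clients (see the table); the split mirrors PeMS08OutVar1.
lens = assign_output_lens(170, [(0.75, 96), (0.25, 720)])
print(lens.count(96), lens.count(720))  # 127 43
```
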

Merge multiple datasets; each client belongs to exactly one dataset. This is useful for multi-task learning or federated learning across different domains.

Example:
- Customized1: Merges ETDatasetHour (2 clients), TetouanPowerConsumption (3 clients), SolarEnergy (137 clients), and Electricity (321 clients) for a total of 463 clients. See `data_factory/Customized.py/Customized1`. Command: `python main.py --dataset=Customized1`
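
The client arithmetic for a merged dataset can be checked with a short sketch. The client counts are taken from the dataset table; the `build_client_index` helper and the idea of a flat global client index are illustrative assumptions, not the repository's data structures.

```python
# Maximum client counts, copied from the dataset table.
client_counts = {
    "ETDatasetHour": 2,
    "TetouanPowerConsumption": 3,
    "SolarEnergy": 137,
    "Electricity": 321,
}


def build_client_index(counts: dict) -> list:
    """Map each global client id to a (dataset name, local client id) pair."""
    index = []
    for name, n in counts.items():
        index.extend((name, local_id) for local_id in range(n))
    return index


index = build_client_index(client_counts)
print(len(index))  # 463
print(index[5])    # ('SolarEnergy', 0)
```

Because every global id maps to exactly one dataset, each client belongs to a single source dataset, matching the description above. The final count may be lower in practice, since clients with insufficient data are discarded after splitting.
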

Merge multiple datasets, each with potentially different configurations per dataset or client.

Example:
- Customized3: Merges ETDatasetHour (2 clients, `output_len=96`) and TetouanPowerConsumption (3 clients, `output_len=192`). See `data_factory/Customized.py/Customized3`. Command: `python main.py --dataset=Customized3`

Note:
- For all scenarios, you can further control client configuration using arguments like `--input_len`, `--output_len`, and `--batch_size`.
- Refer to the dataset table above for available dataset names and their details.

Processed datasets (`.npz` files) automatically include time mark features (`x_mark`, `y_mark`) alongside the input (`x`) and target (`y`) arrays. These are integer-valued calendar features extracted from the date column, ordered as `[month, day_of_month, day_of_week, hour, minute]` (not all frequencies use all columns).

Models that accept time marks (e.g., Transformer) use them automatically. Models that do not accept marks simply ignore them: the training pipeline passes marks only when the model's `forward` signature accepts `x_mark`/`y_mark` keyword arguments.
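
The signature check described above could look roughly like the following sketch, which uses the standard-library `inspect` module. The helper names `accepts_marks` and `call_model` and the two stand-in classes are hypothetical; the real pipeline may perform the check differently.

```python
import inspect


def accepts_marks(forward_fn) -> bool:
    """True if the callable declares both x_mark and y_mark parameters."""
    params = inspect.signature(forward_fn).parameters
    return {"x_mark", "y_mark"}.issubset(params)


def call_model(model, x, x_mark=None, y_mark=None):
    """Pass marks only when the model's forward can receive them."""
    if accepts_marks(model.forward):
        return model.forward(x, x_mark=x_mark, y_mark=y_mark)
    return model.forward(x)


class WithMarks:  # stand-in for a mark-aware model such as a Transformer
    def forward(self, x, x_mark=None, y_mark=None):
        return ("marks", x_mark, y_mark)


class WithoutMarks:  # stand-in for a model that does not take marks
    def forward(self, x):
        return ("no marks",)


print(call_model(WithMarks(), [1.0], x_mark=[3], y_mark=[4]))     # ('marks', [3], [4])
print(call_model(WithoutMarks(), [1.0], x_mark=[3], y_mark=[4]))  # ('no marks',)
```
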

The number of mark columns depends on the dataset granularity:

| Granularity | Columns | Count |
|-------------|---------|-------|
| s (second) | month, day, weekday, hour, minute, second | 6 |
| t (minute) | month, day, weekday, hour, minute | 5 |
| h (hour) | month, day, weekday, hour | 4 |
| d (day) | month, day, weekday | 3 |
| w (week) | month, day, week_of_year | 3 |
| mo (month) | month | 1 |
| q (quarter) | month | 1 |
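
To make the table concrete, the sketch below derives the mark columns for a single timestamp using only the standard library. The column lists are copied from the table; the `time_marks` helper, the Monday=0 weekday convention, and the use of the ISO week number for `week_of_year` are assumptions and may not match the project's extraction code.

```python
from datetime import datetime

# Column order per granularity code, copied from the table above.
MARK_COLUMNS = {
    "s": ["month", "day", "weekday", "hour", "minute", "second"],
    "t": ["month", "day", "weekday", "hour", "minute"],
    "h": ["month", "day", "weekday", "hour"],
    "d": ["month", "day", "weekday"],
    "w": ["month", "day", "week_of_year"],
    "mo": ["month"],
    "q": ["month"],
}


def time_marks(ts: datetime, granularity: str) -> list:
    """Return the integer calendar features for one timestamp."""
    values = {
        "month": ts.month,
        "day": ts.day,
        "weekday": ts.weekday(),              # assumed convention: Monday=0
        "hour": ts.hour,
        "minute": ts.minute,
        "second": ts.second,
        "week_of_year": ts.isocalendar()[1],  # assumed: ISO week number
    }
    return [values[name] for name in MARK_COLUMNS[granularity]]


ts = datetime(2024, 3, 15, 13, 45, 30)  # a Friday
print(time_marks(ts, "h"))  # [3, 15, 4, 13]
print(time_marks(ts, "w"))  # [3, 15, 11]
```
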