Description
I'm having trouble reproducing the reported DLinear MSE/MAE results on ETTh1 on my machine. My setup is PyTorch 2.9.0+cu128 on an RTX 5090.
Below is the full log of the supervised training runs (seq_len=336, pred_len 96, 192, 336 and 720):
Args in experiment:
Namespace(is_training=1, train_only=False, model_id='ETTh1_336_96', model='DLinear', data='ETTh1', root_path='./dataset/', data_path='ETTh1.csv', features='M', target='OT', freq='h', checkpoints='./checkpoints/', seq_len=336, label_len=48, pred_len=96, individual=False, embed_type=0, enc_in=7, dec_in=7, c_out=7, d_model=512, n_heads=8, e_layers=2, d_layers=1, d_ff=2048, moving_avg=25, factor=1, distil=True, dropout=0.05, embed='timeF', activation='gelu', output_attention=False, do_predict=False, num_workers=10, itr=1, train_epochs=10, batch_size=32, patience=3, learning_rate=0.005, des='Exp', loss='mse', lradj='type1', use_amp=False, use_gpu=True, gpu=0, use_multi_gpu=False, devices='0,1,2,3', test_flop=False)
Use GPU: cuda:0
>>>>>>>start training : ETTh1_336_96_DLinear_ETTh1_ftM_sl336_ll48_pl96_dm512_nh8_el2_dl1_df2048_fc1_ebtimeF_dtTrue_Exp_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 8209
val 2785
test 2785
iters: 100, epoch: 1 | loss: 0.3399413
speed: 0.0067s/iter; left time: 16.6099s
iters: 200, epoch: 1 | loss: 0.3660883
speed: 0.0011s/iter; left time: 2.5808s
Epoch: 1 cost time: 0.4904322624206543
Epoch: 1, Steps: 256 | Train Loss: 0.4029245 Vali Loss: 0.7159958 Test Loss: 0.4223829
Validation loss decreased (inf --> 0.715996). Saving model ...
Updating learning rate to 0.005
iters: 100, epoch: 2 | loss: 0.4710605
speed: 0.0072s/iter; left time: 15.9437s
iters: 200, epoch: 2 | loss: 0.4261848
speed: 0.0011s/iter; left time: 2.3564s
Epoch: 2 cost time: 0.4115111827850342
Epoch: 2, Steps: 256 | Train Loss: 0.3777461 Vali Loss: 0.7026909 Test Loss: 0.4617932
Validation loss decreased (0.715996 --> 0.702691). Saving model ...
Updating learning rate to 0.0025
iters: 100, epoch: 3 | loss: 0.4502370
speed: 0.0079s/iter; left time: 15.3003s
iters: 200, epoch: 3 | loss: 0.3483346
speed: 0.0011s/iter; left time: 1.9859s
Epoch: 3 cost time: 0.429640531539917
Epoch: 3, Steps: 256 | Train Loss: 0.3555608 Vali Loss: 0.7140683 Test Loss: 0.3948340
EarlyStopping counter: 1 out of 3
Updating learning rate to 0.00125
iters: 100, epoch: 4 | loss: 0.3529106
speed: 0.0079s/iter; left time: 13.3660s
iters: 200, epoch: 4 | loss: 0.4078414
speed: 0.0011s/iter; left time: 1.7850s
Epoch: 4 cost time: 0.44702672958374023
Epoch: 4, Steps: 256 | Train Loss: 0.3421529 Vali Loss: 0.6486187 Test Loss: 0.3856722
Validation loss decreased (0.702691 --> 0.648619). Saving model ...
Updating learning rate to 0.000625
iters: 100, epoch: 5 | loss: 0.3692797
speed: 0.0075s/iter; left time: 10.8258s
iters: 200, epoch: 5 | loss: 0.3427099
speed: 0.0011s/iter; left time: 1.5080s
Epoch: 5 cost time: 0.41738319396972656
Epoch: 5, Steps: 256 | Train Loss: 0.3369668 Vali Loss: 0.6703700 Test Loss: 0.3762412
EarlyStopping counter: 1 out of 3
Updating learning rate to 0.0003125
iters: 100, epoch: 6 | loss: 0.3457166
speed: 0.0075s/iter; left time: 8.8783s
iters: 200, epoch: 6 | loss: 0.3241579
speed: 0.0011s/iter; left time: 1.2278s
Epoch: 6 cost time: 0.42697834968566895
Epoch: 6, Steps: 256 | Train Loss: 0.3338273 Vali Loss: 0.6666238 Test Loss: 0.3734278
EarlyStopping counter: 2 out of 3
Updating learning rate to 0.00015625
iters: 100, epoch: 7 | loss: 0.3143586
speed: 0.0077s/iter; left time: 7.1401s
iters: 200, epoch: 7 | loss: 0.3235159
speed: 0.0011s/iter; left time: 0.9038s
Epoch: 7 cost time: 0.43926358222961426
Epoch: 7, Steps: 256 | Train Loss: 0.3320066 Vali Loss: 0.6627674 Test Loss: 0.3719860
EarlyStopping counter: 3 out of 3
Early stopping
>>>>>>>testing : ETTh1_336_96_DLinear_ETTh1_ftM_sl336_ll48_pl96_dm512_nh8_el2_dl1_df2048_fc1_ebtimeF_dtTrue_Exp_0<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
test 2785
mse:0.3841443955898285, mae:0.40471312403678894
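As a sanity check on the data pipeline, the `train`/`val`/`test` sample counts and the `Steps` per epoch in the run above can be recomputed by hand. The sketch below assumes the usual Informer-style ETTh1 split of 12/4/4 months of hourly rows (8640/2880/2880), one sample per sliding window of `seq_len + pred_len` rows, val/test windows that borrow the preceding `seq_len` rows as history, and a dropped incomplete last training batch. These are assumptions inferred from the printed counts, not code taken from the repository.

```python
# Sketch only: assumes the Informer-style ETTh1 split (12/4/4 months of hourly rows)
# and one sample per sliding window of seq_len + pred_len rows.
HOURS_PER_MONTH = 30 * 24
TRAIN_ROWS = 12 * HOURS_PER_MONTH  # 8640
VAL_ROWS = 4 * HOURS_PER_MONTH     # 2880
TEST_ROWS = 4 * HOURS_PER_MONTH    # 2880


def num_windows(rows: int, seq_len: int, pred_len: int, has_history: bool) -> int:
    """Count sliding windows in one split.

    Val/test splits are assumed to borrow the previous seq_len rows as input
    history, so their first forecast starts right at the split boundary.
    """
    total = rows + (seq_len if has_history else 0)
    return total - seq_len - pred_len + 1


def split_summary(seq_len: int, pred_len: int, batch_size: int = 32) -> dict:
    train = num_windows(TRAIN_ROWS, seq_len, pred_len, has_history=False)
    val = num_windows(VAL_ROWS, seq_len, pred_len, has_history=True)
    test = num_windows(TEST_ROWS, seq_len, pred_len, has_history=True)
    # Floor division: the incomplete last training batch is assumed to be dropped.
    steps = train // batch_size
    return {"train": train, "val": val, "test": test, "steps": steps}


for pred_len in (96, 192, 336, 720):
    print(pred_len, split_summary(seq_len=336, pred_len=pred_len))
```

Under these assumptions the counts match the log for all four horizons (for example 8209/2785/2785 samples and 256 steps for `pred_len=96`), so the sample windows at least look consistent with the standard split.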
Args in experiment:
Namespace(is_training=1, train_only=False, model_id='ETTh1_336_192', model='DLinear', data='ETTh1', root_path='./dataset/', data_path='ETTh1.csv', features='M', target='OT', freq='h', checkpoints='./checkpoints/', seq_len=336, label_len=48, pred_len=192, individual=False, embed_type=0, enc_in=7, dec_in=7, c_out=7, d_model=512, n_heads=8, e_layers=2, d_layers=1, d_ff=2048, moving_avg=25, factor=1, distil=True, dropout=0.05, embed='timeF', activation='gelu', output_attention=False, do_predict=False, num_workers=10, itr=1, train_epochs=10, batch_size=32, patience=3, learning_rate=0.005, des='Exp', loss='mse', lradj='type1', use_amp=False, use_gpu=True, gpu=0, use_multi_gpu=False, devices='0,1,2,3', test_flop=False)
Use GPU: cuda:0
>>>>>>>start training : ETTh1_336_192_DLinear_ETTh1_ftM_sl336_ll48_pl192_dm512_nh8_el2_dl1_df2048_fc1_ebtimeF_dtTrue_Exp_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 8113
val 2689
test 2689
iters: 100, epoch: 1 | loss: 0.4030632
speed: 0.0069s/iter; left time: 16.7170s
iters: 200, epoch: 1 | loss: 0.3989787
speed: 0.0011s/iter; left time: 2.5751s
Epoch: 1 cost time: 0.4998776912689209
Epoch: 1, Steps: 253 | Train Loss: 0.4550423 Vali Loss: 0.9690023 Test Loss: 0.4644403
Validation loss decreased (inf --> 0.969002). Saving model ...
Updating learning rate to 0.005
iters: 100, epoch: 2 | loss: 0.3955494
speed: 0.0076s/iter; left time: 16.5265s
iters: 200, epoch: 2 | loss: 0.4618148
speed: 0.0011s/iter; left time: 2.3185s
Epoch: 2 cost time: 0.44080448150634766
Epoch: 2, Steps: 253 | Train Loss: 0.4368211 Vali Loss: 0.9542814 Test Loss: 0.4633306
Validation loss decreased (0.969002 --> 0.954281). Saving model ...
Updating learning rate to 0.0025
iters: 100, epoch: 3 | loss: 0.3942771
speed: 0.0081s/iter; left time: 15.6748s
iters: 200, epoch: 3 | loss: 0.3918642
speed: 0.0012s/iter; left time: 2.1555s
Epoch: 3 cost time: 0.44551682472229004
Epoch: 3, Steps: 253 | Train Loss: 0.4110306 Vali Loss: 0.8765706 Test Loss: 0.4461168
Validation loss decreased (0.954281 --> 0.876571). Saving model ...
Updating learning rate to 0.00125
iters: 100, epoch: 4 | loss: 0.4308404
speed: 0.0079s/iter; left time: 13.1649s
iters: 200, epoch: 4 | loss: 0.4194396
speed: 0.0012s/iter; left time: 1.8090s
Epoch: 4 cost time: 0.43269872665405273
Epoch: 4, Steps: 253 | Train Loss: 0.3971279 Vali Loss: 0.9650614 Test Loss: 0.4192918
EarlyStopping counter: 1 out of 3
Updating learning rate to 0.000625
iters: 100, epoch: 5 | loss: 0.3640448
speed: 0.0083s/iter; left time: 11.7415s
iters: 200, epoch: 5 | loss: 0.3823794
speed: 0.0011s/iter; left time: 1.5164s
Epoch: 5 cost time: 0.4396634101867676
Epoch: 5, Steps: 253 | Train Loss: 0.3900709 Vali Loss: 0.8844976 Test Loss: 0.4103249
EarlyStopping counter: 2 out of 3
Updating learning rate to 0.0003125
iters: 100, epoch: 6 | loss: 0.3405749
speed: 0.0078s/iter; left time: 9.1243s
iters: 200, epoch: 6 | loss: 0.3576510
speed: 0.0011s/iter; left time: 1.1756s
Epoch: 6 cost time: 0.4277379512786865
Epoch: 6, Steps: 253 | Train Loss: 0.3865888 Vali Loss: 0.8965758 Test Loss: 0.4061241
EarlyStopping counter: 3 out of 3
Early stopping
>>>>>>>testing : ETTh1_336_192_DLinear_ETTh1_ftM_sl336_ll48_pl192_dm512_nh8_el2_dl1_df2048_fc1_ebtimeF_dtTrue_Exp_0<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
test 2689
mse:0.4434865117073059, mae:0.449925035238266
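The `Updating learning rate to ...` lines in the runs above show the `lradj='type1'` schedule halving the rate after every epoch, starting from `learning_rate=0.005`. A minimal sketch that reproduces the printed values; the function name `type1_lr` is hypothetical and the formula is inferred from the log, not copied from the repository.

```python
def type1_lr(base_lr: float, epoch: int) -> float:
    """Learning rate set after finishing `epoch` (1-based): halved every epoch.

    Hypothetical helper inferred from the logged values, not the repo's function.
    """
    return base_lr * 0.5 ** (epoch - 1)


for epoch in range(1, 10):
    print(f"after epoch {epoch}: {type1_lr(0.005, epoch)}")
```

For example, after epoch 7 this gives 0.005 / 2**6 = 7.8125e-05, the value printed in the `pred_len=336` run.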
Args in experiment:
Namespace(is_training=1, train_only=False, model_id='ETTh1_336_336', model='DLinear', data='ETTh1', root_path='./dataset/', data_path='ETTh1.csv', features='M', target='OT', freq='h', checkpoints='./checkpoints/', seq_len=336, label_len=48, pred_len=336, individual=False, embed_type=0, enc_in=7, dec_in=7, c_out=7, d_model=512, n_heads=8, e_layers=2, d_layers=1, d_ff=2048, moving_avg=25, factor=1, distil=True, dropout=0.05, embed='timeF', activation='gelu', output_attention=False, do_predict=False, num_workers=10, itr=1, train_epochs=10, batch_size=32, patience=3, learning_rate=0.005, des='Exp', loss='mse', lradj='type1', use_amp=False, use_gpu=True, gpu=0, use_multi_gpu=False, devices='0,1,2,3', test_flop=False)
Use GPU: cuda:0
>>>>>>>start training : ETTh1_336_336_DLinear_ETTh1_ftM_sl336_ll48_pl336_dm512_nh8_el2_dl1_df2048_fc1_ebtimeF_dtTrue_Exp_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 7969
val 2545
test 2545
iters: 100, epoch: 1 | loss: 0.5185504
speed: 0.0070s/iter; left time: 16.7609s
iters: 200, epoch: 1 | loss: 0.4845584
speed: 0.0012s/iter; left time: 2.7602s
Epoch: 1 cost time: 0.5235846042633057
Epoch: 1, Steps: 249 | Train Loss: 0.5078838 Vali Loss: 1.1275409 Test Loss: 0.4939544
Validation loss decreased (inf --> 1.127541). Saving model ...
Updating learning rate to 0.005
iters: 100, epoch: 2 | loss: 0.4619045
speed: 0.0074s/iter; left time: 15.9251s
iters: 200, epoch: 2 | loss: 0.5337794
speed: 0.0012s/iter; left time: 2.3671s
Epoch: 2 cost time: 0.4245939254760742
Epoch: 2, Steps: 249 | Train Loss: 0.4874056 Vali Loss: 1.0683818 Test Loss: 0.5289848
Validation loss decreased (1.127541 --> 1.068382). Saving model ...
Updating learning rate to 0.0025
iters: 100, epoch: 3 | loss: 0.4296724
speed: 0.0076s/iter; left time: 14.4112s
iters: 200, epoch: 3 | loss: 0.4573922
speed: 0.0012s/iter; left time: 2.1116s
Epoch: 3 cost time: 0.43151092529296875
Epoch: 3, Steps: 249 | Train Loss: 0.4611091 Vali Loss: 1.2806047 Test Loss: 0.4659030
EarlyStopping counter: 1 out of 3
Updating learning rate to 0.00125
iters: 100, epoch: 4 | loss: 0.4402729
speed: 0.0079s/iter; left time: 12.9769s
iters: 200, epoch: 4 | loss: 0.5046263
speed: 0.0012s/iter; left time: 1.8594s
Epoch: 4 cost time: 0.43999457359313965
Epoch: 4, Steps: 249 | Train Loss: 0.4484481 Vali Loss: 1.0534259 Test Loss: 0.4497196
Validation loss decreased (1.068382 --> 1.053426). Saving model ...
Updating learning rate to 0.000625
iters: 100, epoch: 5 | loss: 0.4836571
speed: 0.0079s/iter; left time: 11.0167s
iters: 200, epoch: 5 | loss: 0.3926441
speed: 0.0012s/iter; left time: 1.6136s
Epoch: 5 cost time: 0.4381873607635498
Epoch: 5, Steps: 249 | Train Loss: 0.4405276 Vali Loss: 1.0723035 Test Loss: 0.4386732
EarlyStopping counter: 1 out of 3
Updating learning rate to 0.0003125
iters: 100, epoch: 6 | loss: 0.4520393
speed: 0.0081s/iter; left time: 9.2720s
iters: 200, epoch: 6 | loss: 0.4383180
speed: 0.0012s/iter; left time: 1.2263s
Epoch: 6 cost time: 0.44292569160461426
Epoch: 6, Steps: 249 | Train Loss: 0.4367764 Vali Loss: 1.0381850 Test Loss: 0.4469434
Validation loss decreased (1.053426 --> 1.038185). Saving model ...
Updating learning rate to 0.00015625
iters: 100, epoch: 7 | loss: 0.5690084
speed: 0.0075s/iter; left time: 6.7392s
iters: 200, epoch: 7 | loss: 0.4597495
speed: 0.0012s/iter; left time: 0.9326s
Epoch: 7 cost time: 0.4305448532104492
Epoch: 7, Steps: 249 | Train Loss: 0.4349577 Vali Loss: 1.0773293 Test Loss: 0.4334371
EarlyStopping counter: 1 out of 3
Updating learning rate to 7.8125e-05
iters: 100, epoch: 8 | loss: 0.4200609
speed: 0.0078s/iter; left time: 5.0615s
iters: 200, epoch: 8 | loss: 0.4240857
speed: 0.0012s/iter; left time: 0.6316s
Epoch: 8 cost time: 0.4108271598815918
Epoch: 8, Steps: 249 | Train Loss: 0.4337557 Vali Loss: 1.0759537 Test Loss: 0.4318466
EarlyStopping counter: 2 out of 3
Updating learning rate to 3.90625e-05
iters: 100, epoch: 9 | loss: 0.4291583
speed: 0.0079s/iter; left time: 3.1685s
iters: 200, epoch: 9 | loss: 0.4364204
speed: 0.0012s/iter; left time: 0.3636s
Epoch: 9 cost time: 0.4447150230407715
Epoch: 9, Steps: 249 | Train Loss: 0.4331892 Vali Loss: 1.0747591 Test Loss: 0.4318236
EarlyStopping counter: 3 out of 3
Early stopping
>>>>>>>testing : ETTh1_336_336_DLinear_ETTh1_ftM_sl336_ll48_pl336_dm512_nh8_el2_dl1_df2048_fc1_ebtimeF_dtTrue_Exp_0<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
test 2545
mse:0.4469250440597534, mae:0.4483712911605835
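The `EarlyStopping counter` lines, together with `patience=3`, can be replayed from the logged validation losses. The sketch below is a minimal re-implementation consistent with the log (the counter resets whenever the validation loss reaches a new minimum), not the repository's class; `replay_early_stopping` is a hypothetical name. Applied to the `pred_len=336` run above, it stops after epoch 9 and keeps the epoch-6 checkpoint, whose logged test loss (0.4469434) is close to the final `mse:0.4469250440597534`.

```python
def replay_early_stopping(val_losses, patience=3):
    """Return (best_epoch, stop_epoch), both 1-based.

    The counter resets on a strict improvement of the validation loss, and
    training stops once the counter reaches `patience`.
    """
    best_loss = float("inf")
    best_epoch = 0
    counter = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, counter = loss, epoch, 0
        else:
            counter += 1
            if counter >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)


# Vali Loss / Test Loss values copied from the ETTh1_336_336 run above.
vali_336 = [1.1275409, 1.0683818, 1.2806047, 1.0534259, 1.0723035,
            1.0381850, 1.0773293, 1.0759537, 1.0747591]
test_336 = [0.4939544, 0.5289848, 0.4659030, 0.4497196, 0.4386732,
            0.4469434, 0.4334371, 0.4318466, 0.4318236]

best, stop = replay_early_stopping(vali_336)
print(f"best epoch {best}, stopped after epoch {stop}, "
      f"test loss at best epoch {test_336[best - 1]}")
```

The same replay gives best/stop epochs of 4/7, 3/6 and 3/6 for the 96, 192 and 720 runs, matching the log. In every run the logged test loss was lower at later epochs, but the reported metric comes from the checkpoint selected by validation loss.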
Args in experiment:
Namespace(is_training=1, train_only=False, model_id='ETTh1_336_720', model='DLinear', data='ETTh1', root_path='./dataset/', data_path='ETTh1.csv', features='M', target='OT', freq='h', checkpoints='./checkpoints/', seq_len=336, label_len=48, pred_len=720, individual=False, embed_type=0, enc_in=7, dec_in=7, c_out=7, d_model=512, n_heads=8, e_layers=2, d_layers=1, d_ff=2048, moving_avg=25, factor=1, distil=True, dropout=0.05, embed='timeF', activation='gelu', output_attention=False, do_predict=False, num_workers=10, itr=1, train_epochs=10, batch_size=32, patience=3, learning_rate=0.005, des='Exp', loss='mse', lradj='type1', use_amp=False, use_gpu=True, gpu=0, use_multi_gpu=False, devices='0,1,2,3', test_flop=False)
Use GPU: cuda:0
>>>>>>>start training : ETTh1_336_720_DLinear_ETTh1_ftM_sl336_ll48_pl720_dm512_nh8_el2_dl1_df2048_fc1_ebtimeF_dtTrue_Exp_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 7585
val 2161
test 2161
iters: 100, epoch: 1 | loss: 0.6062934
speed: 0.0071s/iter; left time: 16.1237s
iters: 200, epoch: 1 | loss: 0.6095265
speed: 0.0013s/iter; left time: 2.8664s
Epoch: 1 cost time: 0.5387582778930664
Epoch: 1, Steps: 237 | Train Loss: 0.5910378 Vali Loss: 1.2456728 Test Loss: 0.6004225
Validation loss decreased (inf --> 1.245673). Saving model ...
Updating learning rate to 0.005
iters: 100, epoch: 2 | loss: 0.6238616
speed: 0.0079s/iter; left time: 16.1546s
iters: 200, epoch: 2 | loss: 0.5666766
speed: 0.0013s/iter; left time: 2.5445s
Epoch: 2 cost time: 0.4538290500640869
Epoch: 2, Steps: 237 | Train Loss: 0.5705364 Vali Loss: 1.2542794 Test Loss: 0.5699026
EarlyStopping counter: 1 out of 3
Updating learning rate to 0.0025
iters: 100, epoch: 3 | loss: 0.5516502
speed: 0.0079s/iter; left time: 14.1166s
iters: 200, epoch: 3 | loss: 0.5464033
speed: 0.0014s/iter; left time: 2.3400s
Epoch: 3 cost time: 0.4545300006866455
Epoch: 3, Steps: 237 | Train Loss: 0.5410811 Vali Loss: 1.2246164 Test Loss: 0.5051882
Validation loss decreased (1.245673 --> 1.224616). Saving model ...
Updating learning rate to 0.00125
iters: 100, epoch: 4 | loss: 0.5571614
speed: 0.0076s/iter; left time: 11.7834s
iters: 200, epoch: 4 | loss: 0.5702764
speed: 0.0014s/iter; left time: 2.0257s
Epoch: 4 cost time: 0.4412670135498047
Epoch: 4, Steps: 237 | Train Loss: 0.5253573 Vali Loss: 1.2611449 Test Loss: 0.4813100
EarlyStopping counter: 1 out of 3
Updating learning rate to 0.000625
iters: 100, epoch: 5 | loss: 0.5861177
speed: 0.0080s/iter; left time: 10.5206s
iters: 200, epoch: 5 | loss: 0.5146908
speed: 0.0013s/iter; left time: 1.6404s
Epoch: 5 cost time: 0.464583158493042
Epoch: 5, Steps: 237 | Train Loss: 0.5185215 Vali Loss: 1.2298431 Test Loss: 0.4759401
EarlyStopping counter: 2 out of 3
Updating learning rate to 0.0003125
iters: 100, epoch: 6 | loss: 0.4870183
speed: 0.0076s/iter; left time: 8.2522s
iters: 200, epoch: 6 | loss: 0.5146449
speed: 0.0013s/iter; left time: 1.3303s
Epoch: 6 cost time: 0.4479823112487793
Epoch: 6, Steps: 237 | Train Loss: 0.5148075 Vali Loss: 1.2609578 Test Loss: 0.4583052
EarlyStopping counter: 3 out of 3
Early stopping
>>>>>>>testing : ETTh1_336_720_DLinear_ETTh1_ftM_sl336_ll48_pl720_dm512_nh8_el2_dl1_df2048_fc1_ebtimeF_dtTrue_Exp_0<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
test 2161
mse:0.5042446851730347, mae:0.5145779848098755
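For reference, the `mse` and `mae` printed at the end of each run are assumed here to be the usual mean squared and mean absolute errors averaged over every forecast value in the test set (all windows, horizon steps and channels). A plain-Python sketch of the two formulas on toy numbers; the helper names and values are hypothetical, not taken from this log.

```python
def flatten(windows):
    """Flatten nested [window][step][channel] lists into one flat list of values."""
    return [v for window in windows for step in window for v in step]


def mse_mae(pred, true):
    """Mean squared and mean absolute error over all forecast values."""
    p, t = flatten(pred), flatten(true)
    if len(p) != len(t):
        raise ValueError("pred and true must have the same shape")
    n = len(p)
    mse = sum((a - b) ** 2 for a, b in zip(p, t)) / n
    mae = sum(abs(a - b) for a, b in zip(p, t)) / n
    return mse, mae


# Toy example: 1 window, 2 horizon steps, 2 channels (hypothetical values).
pred = [[[0.5, 1.0], [0.0, -1.0]]]
true = [[[1.0, 1.0], [0.5, -2.0]]]
print(mse_mae(pred, true))  # (0.375, 0.5)
```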
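All four MSE and MAE results are consistently worse than the benchmark numbers reported for this setting. Test results from the runs above:
- pred_len 96: MSE 0.384, MAE 0.405
- pred_len 192: MSE 0.443, MAE 0.450
- pred_len 336: MSE 0.447, MAE 0.448
- pred_len 720: MSE 0.504, MAE 0.515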
Any feedback is appreciated!