training_log.txt (9708 lines, 607 KB)
Epoch [1/3], Step [1/3236], Loss: 9.2144, Perplexity: 10040.4040
Epoch [1/3], Step [2/3236], Loss: 9.0749, Perplexity: 8733.1513
Epoch [1/3], Step [3/3236], Loss: 8.9460, Perplexity: 7677.3642
Epoch [1/3], Step [4/3236], Loss: 8.6428, Perplexity: 5669.0102
Epoch [1/3], Step [5/3236], Loss: 8.3443, Perplexity: 4206.3230
Epoch [1/3], Step [6/3236], Loss: 7.9907, Perplexity: 2953.3587
Epoch [1/3], Step [7/3236], Loss: 7.1574, Perplexity: 1283.5352
Epoch [1/3], Step [8/3236], Loss: 6.4809, Perplexity: 652.5706
Epoch [1/3], Step [9/3236], Loss: 5.9538, Perplexity: 385.1976
Epoch [1/3], Step [10/3236], Loss: 5.8868, Perplexity: 360.2555
Epoch [1/3], Step [11/3236], Loss: 5.3848, Perplexity: 218.0726
Epoch [1/3], Step [12/3236], Loss: 4.9136, Perplexity: 136.1244
Epoch [1/3], Step [13/3236], Loss: 4.8422, Perplexity: 126.7505
Epoch [1/3], Step [14/3236], Loss: 4.6083, Perplexity: 100.3184
Epoch [1/3], Step [15/3236], Loss: 4.9612, Perplexity: 142.7584
Epoch [1/3], Step [16/3236], Loss: 4.7533, Perplexity: 115.9708
Epoch [1/3], Step [17/3236], Loss: 4.8877, Perplexity: 132.6486
Epoch [1/3], Step [18/3236], Loss: 4.7136, Perplexity: 111.4498
Epoch [1/3], Step [19/3236], Loss: 4.6320, Perplexity: 102.7194
Epoch [1/3], Step [20/3236], Loss: 4.7115, Perplexity: 111.2153
Epoch [1/3], Step [21/3236], Loss: 4.8525, Perplexity: 128.0661
Epoch [1/3], Step [22/3236], Loss: 4.6818, Perplexity: 107.9681
Epoch [1/3], Step [23/3236], Loss: 4.6959, Perplexity: 109.5009
Epoch [1/3], Step [24/3236], Loss: 4.6888, Perplexity: 108.7198
Epoch [1/3], Step [25/3236], Loss: 4.5513, Perplexity: 94.7526
Epoch [1/3], Step [26/3236], Loss: 4.6209, Perplexity: 101.5885
Epoch [1/3], Step [27/3236], Loss: 4.5973, Perplexity: 99.2129
Epoch [1/3], Step [28/3236], Loss: 4.5305, Perplexity: 92.8091
Epoch [1/3], Step [29/3236], Loss: 4.4226, Perplexity: 83.3116
Epoch [1/3], Step [30/3236], Loss: 4.5943, Perplexity: 98.9212
Epoch [1/3], Step [31/3236], Loss: 4.4718, Perplexity: 87.5140
Epoch [1/3], Step [32/3236], Loss: 4.3476, Perplexity: 77.2901
Epoch [1/3], Step [33/3236], Loss: 4.3738, Perplexity: 79.3484
Epoch [1/3], Step [34/3236], Loss: 4.6394, Perplexity: 103.4810
Epoch [1/3], Step [35/3236], Loss: 4.5329, Perplexity: 93.0312
Epoch [1/3], Step [36/3236], Loss: 4.4231, Perplexity: 83.3579
Epoch [1/3], Step [37/3236], Loss: 4.2894, Perplexity: 72.9226
Epoch [1/3], Step [38/3236], Loss: 4.2611, Perplexity: 70.8914
Epoch [1/3], Step [39/3236], Loss: 4.5776, Perplexity: 97.2846
Epoch [1/3], Step [40/3236], Loss: 4.4685, Perplexity: 87.2234
Epoch [1/3], Step [41/3236], Loss: 4.2584, Perplexity: 70.6962
Epoch [1/3], Step [42/3236], Loss: 4.5630, Perplexity: 95.8734
Epoch [1/3], Step [43/3236], Loss: 4.4732, Perplexity: 87.6365
Epoch [1/3], Step [44/3236], Loss: 4.1703, Perplexity: 64.7353
Epoch [1/3], Step [45/3236], Loss: 4.2311, Perplexity: 68.7951
Epoch [1/3], Step [46/3236], Loss: 4.2895, Perplexity: 72.9295
Epoch [1/3], Step [47/3236], Loss: 4.1794, Perplexity: 65.3272
Epoch [1/3], Step [48/3236], Loss: 4.2113, Perplexity: 67.4461
Epoch [1/3], Step [49/3236], Loss: 4.1240, Perplexity: 61.8041
Epoch [1/3], Step [50/3236], Loss: 4.1682, Perplexity: 64.5976
Epoch [1/3], Step [51/3236], Loss: 4.1186, Perplexity: 61.4725
Epoch [1/3], Step [52/3236], Loss: 3.9952, Perplexity: 54.3373
Epoch [1/3], Step [53/3236], Loss: 4.0712, Perplexity: 58.6296
Epoch [1/3], Step [54/3236], Loss: 4.0148, Perplexity: 55.4104
Epoch [1/3], Step [55/3236], Loss: 4.2004, Perplexity: 66.7104
Epoch [1/3], Step [56/3236], Loss: 4.0676, Perplexity: 58.4169
Epoch [1/3], Step [57/3236], Loss: 4.1441, Perplexity: 63.0618
Epoch [1/3], Step [58/3236], Loss: 4.0990, Perplexity: 60.2781
Epoch [1/3], Step [59/3236], Loss: 4.0529, Perplexity: 57.5668
Epoch [1/3], Step [60/3236], Loss: 3.9760, Perplexity: 53.3018
Epoch [1/3], Step [61/3236], Loss: 3.9575, Perplexity: 52.3269
Epoch [1/3], Step [62/3236], Loss: 3.9167, Perplexity: 50.2335
Epoch [1/3], Step [63/3236], Loss: 4.4786, Perplexity: 88.1125
Epoch [1/3], Step [64/3236], Loss: 3.9492, Perplexity: 51.8948
Epoch [1/3], Step [65/3236], Loss: 4.0766, Perplexity: 58.9428
Epoch [1/3], Step [66/3236], Loss: 3.9073, Perplexity: 49.7627
Epoch [1/3], Step [67/3236], Loss: 4.1543, Perplexity: 63.7065
Epoch [1/3], Step [68/3236], Loss: 4.3675, Perplexity: 78.8489
Epoch [1/3], Step [69/3236], Loss: 3.9577, Perplexity: 52.3360
Epoch [1/3], Step [70/3236], Loss: 4.7161, Perplexity: 111.7284
Epoch [1/3], Step [71/3236], Loss: 4.3637, Perplexity: 78.5489
Epoch [1/3], Step [72/3236], Loss: 3.8329, Perplexity: 46.1974
Epoch [1/3], Step [73/3236], Loss: 4.2216, Perplexity: 68.1402
Epoch [1/3], Step [74/3236], Loss: 3.8858, Perplexity: 48.7037
Epoch [1/3], Step [75/3236], Loss: 3.8730, Perplexity: 48.0882
Epoch [1/3], Step [76/3236], Loss: 4.1802, Perplexity: 65.3768
Epoch [1/3], Step [77/3236], Loss: 4.2231, Perplexity: 68.2429
Epoch [1/3], Step [78/3236], Loss: 3.7887, Perplexity: 44.1979
Epoch [1/3], Step [79/3236], Loss: 3.8778, Perplexity: 48.3187
Epoch [1/3], Step [80/3236], Loss: 3.8343, Perplexity: 46.2624
Epoch [1/3], Step [81/3236], Loss: 4.4780, Perplexity: 88.0578
Epoch [1/3], Step [82/3236], Loss: 4.0070, Perplexity: 54.9839
Epoch [1/3], Step [83/3236], Loss: 4.0234, Perplexity: 55.8881
Epoch [1/3], Step [84/3236], Loss: 3.8816, Perplexity: 48.5000
Epoch [1/3], Step [85/3236], Loss: 3.8712, Perplexity: 48.0010
Epoch [1/3], Step [86/3236], Loss: 3.9214, Perplexity: 50.4706
Epoch [1/3], Step [87/3236], Loss: 4.2314, Perplexity: 68.8117
Epoch [1/3], Step [88/3236], Loss: 3.9454, Perplexity: 51.6962
Epoch [1/3], Step [89/3236], Loss: 3.9543, Perplexity: 52.1584
Epoch [1/3], Step [90/3236], Loss: 4.4325, Perplexity: 84.1394
Epoch [1/3], Step [91/3236], Loss: 3.9119, Perplexity: 49.9930
Epoch [1/3], Step [92/3236], Loss: 3.7570, Perplexity: 42.8197
Epoch [1/3], Step [93/3236], Loss: 4.0386, Perplexity: 56.7445
Epoch [1/3], Step [94/3236], Loss: 4.2756, Perplexity: 71.9222
Epoch [1/3], Step [95/3236], Loss: 3.8527, Perplexity: 47.1191
Epoch [1/3], Step [96/3236], Loss: 3.7207, Perplexity: 41.2919
Epoch [1/3], Step [97/3236], Loss: 3.8031, Perplexity: 44.8421
Epoch [1/3], Step [98/3236], Loss: 3.7529, Perplexity: 42.6450
Epoch [1/3], Step [99/3236], Loss: 3.7289, Perplexity: 41.6319
Epoch [1/3], Step [100/3236], Loss: 3.7821, Perplexity: 43.9063
Epoch [1/3], Step [101/3236], Loss: 3.7128, Perplexity: 40.9678
Epoch [1/3], Step [102/3236], Loss: 4.0271, Perplexity: 56.0999
Epoch [1/3], Step [103/3236], Loss: 3.8403, Perplexity: 46.5391
Epoch [1/3], Step [104/3236], Loss: 3.6543, Perplexity: 38.6387
Epoch [1/3], Step [105/3236], Loss: 3.7586, Perplexity: 42.8886
Epoch [1/3], Step [106/3236], Loss: 3.8103, Perplexity: 45.1635
Epoch [1/3], Step [107/3236], Loss: 3.7083, Perplexity: 40.7857
Epoch [1/3], Step [108/3236], Loss: 3.7719, Perplexity: 43.4636
Epoch [1/3], Step [109/3236], Loss: 3.6059, Perplexity: 36.8160
Epoch [1/3], Step [110/3236], Loss: 3.6882, Perplexity: 39.9745
Epoch [1/3], Step [111/3236], Loss: 3.6378, Perplexity: 38.0075
Epoch [1/3], Step [112/3236], Loss: 3.5850, Perplexity: 36.0525
Epoch [1/3], Step [113/3236], Loss: 3.7130, Perplexity: 40.9772
Epoch [1/3], Step [114/3236], Loss: 3.4859, Perplexity: 32.6527
Epoch [1/3], Step [115/3236], Loss: 3.6309, Perplexity: 37.7461
Epoch [1/3], Step [116/3236], Loss: 3.7037, Perplexity: 40.5954
Epoch [1/3], Step [117/3236], Loss: 4.0766, Perplexity: 58.9425
Epoch [1/3], Step [118/3236], Loss: 3.9585, Perplexity: 52.3784
Epoch [1/3], Step [119/3236], Loss: 3.5665, Perplexity: 35.3938
Epoch [1/3], Step [120/3236], Loss: 3.6280, Perplexity: 37.6375
Epoch [1/3], Step [121/3236], Loss: 3.9982, Perplexity: 54.5021
Epoch [1/3], Step [122/3236], Loss: 4.0943, Perplexity: 59.9975
Epoch [1/3], Step [123/3236], Loss: 3.9583, Perplexity: 52.3681
Epoch [1/3], Step [124/3236], Loss: 4.0448, Perplexity: 57.0986
Epoch [1/3], Step [125/3236], Loss: 3.9834, Perplexity: 53.6979
Epoch [1/3], Step [126/3236], Loss: 3.7287, Perplexity: 41.6253
Epoch [1/3], Step [127/3236], Loss: 3.7670, Perplexity: 43.2497
Epoch [1/3], Step [128/3236], Loss: 3.4844, Perplexity: 32.6045
Epoch [1/3], Step [129/3236], Loss: 3.5223, Perplexity: 33.8609
Epoch [1/3], Step [130/3236], Loss: 4.0307, Perplexity: 56.2979
Epoch [1/3], Step [131/3236], Loss: 3.5808, Perplexity: 35.9036
Epoch [1/3], Step [132/3236], Loss: 3.6266, Perplexity: 37.5857
Epoch [1/3], Step [133/3236], Loss: 3.4861, Perplexity: 32.6599
Epoch [1/3], Step [134/3236], Loss: 3.5296, Perplexity: 34.1094
Epoch [1/3], Step [135/3236], Loss: 3.5157, Perplexity: 33.6387
Epoch [1/3], Step [136/3236], Loss: 3.5604, Perplexity: 35.1774
Epoch [1/3], Step [137/3236], Loss: 3.6003, Perplexity: 36.6097
Epoch [1/3], Step [138/3236], Loss: 3.5147, Perplexity: 33.6043
Epoch [1/3], Step [139/3236], Loss: 3.7963, Perplexity: 44.5353
Epoch [1/3], Step [140/3236], Loss: 3.9141, Perplexity: 50.1055
Epoch [1/3], Step [141/3236], Loss: 3.6076, Perplexity: 36.8783
Epoch [1/3], Step [142/3236], Loss: 3.7528, Perplexity: 42.6418
Epoch [1/3], Step [143/3236], Loss: 3.6830, Perplexity: 39.7646
Epoch [1/3], Step [144/3236], Loss: 3.3878, Perplexity: 29.6009
Epoch [1/3], Step [145/3236], Loss: 3.5352, Perplexity: 34.3012
Epoch [1/3], Step [146/3236], Loss: 3.5460, Perplexity: 34.6736
Epoch [1/3], Step [147/3236], Loss: 3.5945, Perplexity: 36.3967
Epoch [1/3], Step [148/3236], Loss: 3.4995, Perplexity: 33.0973
Epoch [1/3], Step [149/3236], Loss: 3.6518, Perplexity: 38.5432
Epoch [1/3], Step [150/3236], Loss: 3.5954, Perplexity: 36.4311
Epoch [1/3], Step [151/3236], Loss: 3.5727, Perplexity: 35.6116
Epoch [1/3], Step [152/3236], Loss: 3.3919, Perplexity: 29.7215
Epoch [1/3], Step [153/3236], Loss: 4.5780, Perplexity: 97.3179
Epoch [1/3], Step [154/3236], Loss: 3.4942, Perplexity: 32.9247
Epoch [1/3], Step [155/3236], Loss: 3.5427, Perplexity: 34.5608
Epoch [1/3], Step [156/3236], Loss: 3.5358, Perplexity: 34.3230
Epoch [1/3], Step [157/3236], Loss: 3.6044, Perplexity: 36.7588
Epoch [1/3], Step [158/3236], Loss: 3.7987, Perplexity: 44.6444
Epoch [1/3], Step [159/3236], Loss: 3.4512, Perplexity: 31.5386
Epoch [1/3], Step [160/3236], Loss: 3.6684, Perplexity: 39.1885
Epoch [1/3], Step [161/3236], Loss: 3.8124, Perplexity: 45.2581
Epoch [1/3], Step [162/3236], Loss: 3.4764, Perplexity: 32.3440
Epoch [1/3], Step [163/3236], Loss: 3.7062, Perplexity: 40.7006
Epoch [1/3], Step [164/3236], Loss: 3.7461, Perplexity: 42.3576
Epoch [1/3], Step [165/3236], Loss: 3.4150, Perplexity: 30.4165
Epoch [1/3], Step [166/3236], Loss: 3.5290, Perplexity: 34.0890
Epoch [1/3], Step [167/3236], Loss: 3.7644, Perplexity: 43.1373
Epoch [1/3], Step [168/3236], Loss: 3.3530, Perplexity: 28.5883
Epoch [1/3], Step [169/3236], Loss: 3.3789, Perplexity: 29.3381
Epoch [1/3], Step [170/3236], Loss: 3.4060, Perplexity: 30.1430
Epoch [1/3], Step [171/3236], Loss: 3.4327, Perplexity: 30.9613
Epoch [1/3], Step [172/3236], Loss: 3.4619, Perplexity: 31.8778
Epoch [1/3], Step [173/3236], Loss: 3.4771, Perplexity: 32.3668
Epoch [1/3], Step [174/3236], Loss: 3.4269, Perplexity: 30.7823
Epoch [1/3], Step [175/3236], Loss: 3.7034, Perplexity: 40.5843
Epoch [1/3], Step [176/3236], Loss: 3.7038, Perplexity: 40.6003
Epoch [1/3], Step [177/3236], Loss: 3.6711, Perplexity: 39.2946
Epoch [1/3], Step [178/3236], Loss: 3.3979, Perplexity: 29.9016
Epoch [1/3], Step [179/3236], Loss: 3.4701, Perplexity: 32.1388
Epoch [1/3], Step [180/3236], Loss: 3.8114, Perplexity: 45.2148
Epoch [1/3], Step [181/3236], Loss: 3.4860, Perplexity: 32.6537
Epoch [1/3], Step [182/3236], Loss: 3.3752, Perplexity: 29.2288
Epoch [1/3], Step [183/3236], Loss: 3.5353, Perplexity: 34.3061
Epoch [1/3], Step [184/3236], Loss: 3.6445, Perplexity: 38.2641
Epoch [1/3], Step [185/3236], Loss: 3.5016, Perplexity: 33.1695
Epoch [1/3], Step [186/3236], Loss: 3.5583, Perplexity: 35.1035
Epoch [1/3], Step [187/3236], Loss: 3.4021, Perplexity: 30.0261
Epoch [1/3], Step [188/3236], Loss: 3.3604, Perplexity: 28.8019
Epoch [1/3], Step [189/3236], Loss: 3.4864, Perplexity: 32.6684
Epoch [1/3], Step [190/3236], Loss: 3.4515, Perplexity: 31.5480
Epoch [1/3], Step [191/3236], Loss: 3.4871, Perplexity: 32.6910
Epoch [1/3], Step [192/3236], Loss: 3.4730, Perplexity: 32.2341
Epoch [1/3], Step [193/3236], Loss: 3.8259, Perplexity: 45.8728
Epoch [1/3], Step [194/3236], Loss: 3.9893, Perplexity: 54.0186
Epoch [1/3], Step [195/3236], Loss: 3.3780, Perplexity: 29.3119
Epoch [1/3], Step [196/3236], Loss: 3.3126, Perplexity: 27.4566
Epoch [1/3], Step [197/3236], Loss: 3.6121, Perplexity: 37.0446
Epoch [1/3], Step [198/3236], Loss: 3.3776, Perplexity: 29.3005
Epoch [1/3], Step [199/3236], Loss: 3.3654, Perplexity: 28.9460
Epoch [1/3], Step [200/3236], Loss: 3.3055, Perplexity: 27.2621
Epoch [1/3], Step [201/3236], Loss: 3.3539, Perplexity: 28.6133
Epoch [1/3], Step [202/3236], Loss: 3.3634, Perplexity: 28.8870
Epoch [1/3], Step [203/3236], Loss: 3.4495, Perplexity: 31.4831
Epoch [1/3], Step [204/3236], Loss: 3.3656, Perplexity: 28.9519
Epoch [1/3], Step [205/3236], Loss: 3.3617, Perplexity: 28.8385
Epoch [1/3], Step [206/3236], Loss: 3.4152, Perplexity: 30.4238
Epoch [1/3], Step [207/3236], Loss: 3.4379, Perplexity: 31.1206
Epoch [1/3], Step [208/3236], Loss: 3.5003, Perplexity: 33.1254
Epoch [1/3], Step [209/3236], Loss: 3.5962, Perplexity: 36.4602
Epoch [1/3], Step [210/3236], Loss: 3.7629, Perplexity: 43.0748
Epoch [1/3], Step [211/3236], Loss: 3.4288, Perplexity: 30.8397
Epoch [1/3], Step [212/3236], Loss: 3.9314, Perplexity: 50.9775
Epoch [1/3], Step [213/3236], Loss: 3.4733, Perplexity: 32.2426
Epoch [1/3], Step [214/3236], Loss: 3.4780, Perplexity: 32.3949
Epoch [1/3], Step [215/3236], Loss: 3.6452, Perplexity: 38.2916
Epoch [1/3], Step [216/3236], Loss: 3.3015, Perplexity: 27.1522
Epoch [1/3], Step [217/3236], Loss: 3.3612, Perplexity: 28.8238
Epoch [1/3], Step [218/3236], Loss: 3.2982, Perplexity: 27.0651
Epoch [1/3], Step [219/3236], Loss: 3.3955, Perplexity: 29.8308
Epoch [1/3], Step [220/3236], Loss: 3.2594, Perplexity: 26.0341
Epoch [1/3], Step [221/3236], Loss: 3.6317, Perplexity: 37.7753
Epoch [1/3], Step [222/3236], Loss: 3.4385, Perplexity: 31.1398
Epoch [1/3], Step [223/3236], Loss: 3.2867, Perplexity: 26.7535
Epoch [1/3], Step [224/3236], Loss: 3.4265, Perplexity: 30.7685
Epoch [1/3], Step [225/3236], Loss: 3.3623, Perplexity: 28.8546
Epoch [1/3], Step [226/3236], Loss: 3.3042, Perplexity: 27.2263
Epoch [1/3], Step [227/3236], Loss: 3.3541, Perplexity: 28.6205
Epoch [1/3], Step [228/3236], Loss: 3.2819, Perplexity: 26.6252
Epoch [1/3], Step [229/3236], Loss: 3.2790, Perplexity: 26.5502
Epoch [1/3], Step [230/3236], Loss: 3.3368, Perplexity: 28.1297
Epoch [1/3], Step [231/3236], Loss: 3.2783, Perplexity: 26.5302
Epoch [1/3], Step [232/3236], Loss: 3.7296, Perplexity: 41.6635
Epoch [1/3], Step [233/3236], Loss: 3.3562, Perplexity: 28.6808
Epoch [1/3], Step [234/3236], Loss: 3.4326, Perplexity: 30.9577
Epoch [1/3], Step [235/3236], Loss: 4.1085, Perplexity: 60.8563
Epoch [1/3], Step [236/3236], Loss: 3.2433, Perplexity: 25.6169
Epoch [1/3], Step [237/3236], Loss: 3.8383, Perplexity: 46.4479
Epoch [1/3], Step [238/3236], Loss: 3.1673, Perplexity: 23.7438
Epoch [1/3], Step [239/3236], Loss: 3.3794, Perplexity: 29.3543
Epoch [1/3], Step [240/3236], Loss: 3.4553, Perplexity: 31.6672
Epoch [1/3], Step [241/3236], Loss: 3.2684, Perplexity: 26.2689
Epoch [1/3], Step [242/3236], Loss: 3.2784, Perplexity: 26.5335
Epoch [1/3], Step [243/3236], Loss: 3.7481, Perplexity: 42.4398
Epoch [1/3], Step [244/3236], Loss: 3.2530, Perplexity: 25.8689
Epoch [1/3], Step [245/3236], Loss: 3.7366, Perplexity: 41.9532
Epoch [1/3], Step [246/3236], Loss: 4.0228, Perplexity: 55.8575
Epoch [1/3], Step [247/3236], Loss: 3.3300, Perplexity: 27.9383
Epoch [1/3], Step [248/3236], Loss: 3.6214, Perplexity: 37.3914
Epoch [1/3], Step [249/3236], Loss: 3.3933, Perplexity: 29.7642
Epoch [1/3], Step [250/3236], Loss: 3.2025, Perplexity: 24.5928
Epoch [1/3], Step [251/3236], Loss: 3.4519, Perplexity: 31.5603
Epoch [1/3], Step [252/3236], Loss: 3.6389, Perplexity: 38.0513
Epoch [1/3], Step [253/3236], Loss: 3.5237, Perplexity: 33.9105
Epoch [1/3], Step [254/3236], Loss: 3.2903, Perplexity: 26.8508
Epoch [1/3], Step [255/3236], Loss: 3.4075, Perplexity: 30.1909
Epoch [1/3], Step [256/3236], Loss: 3.3462, Perplexity: 28.3943
Epoch [1/3], Step [257/3236], Loss: 3.3055, Perplexity: 27.2620
Epoch [1/3], Step [258/3236], Loss: 3.1811, Perplexity: 24.0741
Epoch [1/3], Step [259/3236], Loss: 3.5827, Perplexity: 35.9708
Epoch [1/3], Step [260/3236], Loss: 3.4622, Perplexity: 31.8864
Epoch [1/3], Step [261/3236], Loss: 3.6387, Perplexity: 38.0432
Epoch [1/3], Step [262/3236], Loss: 3.2654, Perplexity: 26.1916
Epoch [1/3], Step [263/3236], Loss: 3.2555, Perplexity: 25.9322
Epoch [1/3], Step [264/3236], Loss: 3.3209, Perplexity: 27.6860
Epoch [1/3], Step [265/3236], Loss: 3.2002, Perplexity: 24.5381
Epoch [1/3], Step [266/3236], Loss: 3.2065, Perplexity: 24.6932
Epoch [1/3], Step [267/3236], Loss: 3.2661, Perplexity: 26.2097
Epoch [1/3], Step [268/3236], Loss: 3.2883, Perplexity: 26.7978
Epoch [1/3], Step [269/3236], Loss: 3.4933, Perplexity: 32.8954
Epoch [1/3], Step [270/3236], Loss: 3.2447, Perplexity: 25.6531
Epoch [1/3], Step [271/3236], Loss: 3.2347, Perplexity: 25.3991
Epoch [1/3], Step [272/3236], Loss: 3.5962, Perplexity: 36.4589
Epoch [1/3], Step [273/3236], Loss: 3.2527, Perplexity: 25.8613
Epoch [1/3], Step [274/3236], Loss: 3.1604, Perplexity: 23.5800
Epoch [1/3], Step [275/3236], Loss: 3.3759, Perplexity: 29.2508
Epoch [1/3], Step [276/3236], Loss: 3.8532, Perplexity: 47.1427
Epoch [1/3], Step [277/3236], Loss: 3.3592, Perplexity: 28.7676
Epoch [1/3], Step [278/3236], Loss: 3.3394, Perplexity: 28.2015
Epoch [1/3], Step [279/3236], Loss: 3.3509, Perplexity: 28.5293
Epoch [1/3], Step [280/3236], Loss: 3.3812, Perplexity: 29.4069
Epoch [1/3], Step [281/3236], Loss: 3.7181, Perplexity: 41.1869
Epoch [1/3], Step [282/3236], Loss: 3.2755, Perplexity: 26.4564
Epoch [1/3], Step [283/3236], Loss: 3.2063, Perplexity: 24.6879
Epoch [1/3], Step [284/3236], Loss: 3.3339, Perplexity: 28.0470
Epoch [1/3], Step [285/3236], Loss: 3.1410, Perplexity: 23.1273
Epoch [1/3], Step [286/3236], Loss: 3.4094, Perplexity: 30.2469
Epoch [1/3], Step [287/3236], Loss: 3.2510, Perplexity: 25.8160
Epoch [1/3], Step [288/3236], Loss: 3.1617, Perplexity: 23.6115
Epoch [1/3], Step [289/3236], Loss: 3.1905, Perplexity: 24.3010
Epoch [1/3], Step [290/3236], Loss: 3.2124, Perplexity: 24.8389
Epoch [1/3], Step [291/3236], Loss: 3.2942, Perplexity: 26.9562
Epoch [1/3], Step [292/3236], Loss: 3.2095, Perplexity: 24.7670
Epoch [1/3], Step [293/3236], Loss: 3.1909, Perplexity: 24.3103
Epoch [1/3], Step [294/3236], Loss: 3.3297, Perplexity: 27.9287
Epoch [1/3], Step [295/3236], Loss: 3.3606, Perplexity: 28.8060
Epoch [1/3], Step [296/3236], Loss: 3.3329, Perplexity: 28.0195
Epoch [1/3], Step [297/3236], Loss: 3.2488, Perplexity: 25.7584
Epoch [1/3], Step [298/3236], Loss: 3.2568, Perplexity: 25.9659
Epoch [1/3], Step [299/3236], Loss: 3.5675, Perplexity: 35.4275
Epoch [1/3], Step [300/3236], Loss: 3.2720, Perplexity: 26.3638
Epoch [1/3], Step [301/3236], Loss: 3.3026, Perplexity: 27.1844
Epoch [1/3], Step [302/3236], Loss: 3.7706, Perplexity: 43.4082
Epoch [1/3], Step [303/3236], Loss: 3.5368, Perplexity: 34.3577
Epoch [1/3], Step [304/3236], Loss: 3.0975, Perplexity: 22.1423
Epoch [1/3], Step [305/3236], Loss: 3.3013, Perplexity: 27.1483
Epoch [1/3], Step [306/3236], Loss: 3.3624, Perplexity: 28.8596
Epoch [1/3], Step [307/3236], Loss: 3.3444, Perplexity: 28.3432
Epoch [1/3], Step [308/3236], Loss: 3.3946, Perplexity: 29.8040
Epoch [1/3], Step [309/3236], Loss: 3.2577, Perplexity: 25.9893
Epoch [1/3], Step [310/3236], Loss: 3.1986, Perplexity: 24.4986
Epoch [1/3], Step [311/3236], Loss: 3.2229, Perplexity: 25.0997
Epoch [1/3], Step [312/3236], Loss: 3.1503, Perplexity: 23.3435
Epoch [1/3], Step [313/3236], Loss: 3.2053, Perplexity: 24.6640
Epoch [1/3], Step [314/3236], Loss: 3.2274, Perplexity: 25.2149
Epoch [1/3], Step [315/3236], Loss: 3.0884, Perplexity: 21.9413
Epoch [1/3], Step [316/3236], Loss: 3.2269, Perplexity: 25.2008
Epoch [1/3], Step [317/3236], Loss: 3.2265, Perplexity: 25.1925
Epoch [1/3], Step [318/3236], Loss: 3.1545, Perplexity: 23.4413
Epoch [1/3], Step [319/3236], Loss: 3.3220, Perplexity: 27.7150
Epoch [1/3], Step [320/3236], Loss: 3.0842, Perplexity: 21.8496
Epoch [1/3], Step [321/3236], Loss: 3.2750, Perplexity: 26.4444
Epoch [1/3], Step [322/3236], Loss: 3.3054, Perplexity: 27.2601
Epoch [1/3], Step [323/3236], Loss: 3.2631, Perplexity: 26.1296
Epoch [1/3], Step [324/3236], Loss: 3.4817, Perplexity: 32.5155
Epoch [1/3], Step [325/3236], Loss: 3.7754, Perplexity: 43.6159
Epoch [1/3], Step [326/3236], Loss: 3.2005, Perplexity: 24.5437
Epoch [1/3], Step [327/3236], Loss: 3.0688, Perplexity: 21.5160
Epoch [1/3], Step [328/3236], Loss: 3.0632, Perplexity: 21.3952
Epoch [1/3], Step [329/3236], Loss: 3.2061, Perplexity: 24.6830
Epoch [1/3], Step [330/3236], Loss: 3.3942, Perplexity: 29.7901
Epoch [1/3], Step [331/3236], Loss: 3.2851, Perplexity: 26.7111
Epoch [1/3], Step [332/3236], Loss: 3.1180, Perplexity: 22.6018
Epoch [1/3], Step [333/3236], Loss: 3.2393, Perplexity: 25.5168
Epoch [1/3], Step [334/3236], Loss: 3.2490, Perplexity: 25.7635
Epoch [1/3], Step [335/3236], Loss: 3.2992, Perplexity: 27.0904
Epoch [1/3], Step [336/3236], Loss: 3.1927, Perplexity: 24.3537
Epoch [1/3], Step [337/3236], Loss: 3.1478, Perplexity: 23.2842
Epoch [1/3], Step [338/3236], Loss: 3.1133, Perplexity: 22.4961
Epoch [1/3], Step [339/3236], Loss: 3.2692, Perplexity: 26.2899
Epoch [1/3], Step [340/3236], Loss: 3.2195, Perplexity: 25.0162
Epoch [1/3], Step [341/3236], Loss: 3.0406, Perplexity: 20.9188
Epoch [1/3], Step [342/3236], Loss: 3.0644, Perplexity: 21.4224
Epoch [1/3], Step [343/3236], Loss: 3.2306, Perplexity: 25.2953
Epoch [1/3], Step [344/3236], Loss: 3.4700, Perplexity: 32.1381
Epoch [1/3], Step [345/3236], Loss: 3.2565, Perplexity: 25.9587
Epoch [1/3], Step [346/3236], Loss: 3.0419, Perplexity: 20.9442
Epoch [1/3], Step [347/3236], Loss: 3.8194, Perplexity: 45.5765
Epoch [1/3], Step [348/3236], Loss: 3.3171, Perplexity: 27.5789
Epoch [1/3], Step [349/3236], Loss: 3.1972, Perplexity: 24.4642
Epoch [1/3], Step [350/3236], Loss: 3.1107, Perplexity: 22.4368
Epoch [1/3], Step [351/3236], Loss: 3.1959, Perplexity: 24.4310
Epoch [1/3], Step [352/3236], Loss: 3.1661, Perplexity: 23.7157
Epoch [1/3], Step [353/3236], Loss: 3.1940, Perplexity: 24.3858
Epoch [1/3], Step [354/3236], Loss: 3.3296, Perplexity: 27.9274
Epoch [1/3], Step [355/3236], Loss: 3.1131, Perplexity: 22.4914
Epoch [1/3], Step [356/3236], Loss: 3.2792, Perplexity: 26.5534
Epoch [1/3], Step [357/3236], Loss: 3.2199, Perplexity: 25.0260
Epoch [1/3], Step [358/3236], Loss: 3.5012, Perplexity: 33.1568
Epoch [1/3], Step [359/3236], Loss: 3.5545, Perplexity: 34.9687
Epoch [1/3], Step [360/3236], Loss: 2.9960, Perplexity: 20.0044
Epoch [1/3], Step [361/3236], Loss: 3.2940, Perplexity: 26.9496
Epoch [1/3], Step [362/3236], Loss: 3.2252, Perplexity: 25.1585
Epoch [1/3], Step [363/3236], Loss: 3.0090, Perplexity: 20.2665
Epoch [1/3], Step [364/3236], Loss: 3.5014, Perplexity: 33.1631
Epoch [1/3], Step [365/3236], Loss: 3.5445, Perplexity: 34.6231
Epoch [1/3], Step [366/3236], Loss: 3.0637, Perplexity: 21.4074
Epoch [1/3], Step [367/3236], Loss: 3.0399, Perplexity: 20.9032
Epoch [1/3], Step [368/3236], Loss: 3.2298, Perplexity: 25.2748
Epoch [1/3], Step [369/3236], Loss: 3.2175, Perplexity: 24.9651
Epoch [1/3], Step [370/3236], Loss: 3.0573, Perplexity: 21.2705
Epoch [1/3], Step [371/3236], Loss: 3.4509, Perplexity: 31.5281
Epoch [1/3], Step [372/3236], Loss: 3.0461, Perplexity: 21.0323
Epoch [1/3], Step [373/3236], Loss: 3.0489, Perplexity: 21.0921
Epoch [1/3], Step [374/3236], Loss: 3.0925, Perplexity: 22.0325
Epoch [1/3], Step [375/3236], Loss: 3.1795, Perplexity: 24.0347
Epoch [1/3], Step [376/3236], Loss: 3.2716, Perplexity: 26.3532
Epoch [1/3], Step [377/3236], Loss: 3.1727, Perplexity: 23.8708
Epoch [1/3], Step [378/3236], Loss: 3.2942, Perplexity: 26.9563
Epoch [1/3], Step [379/3236], Loss: 3.2416, Perplexity: 25.5741
Epoch [1/3], Step [380/3236], Loss: 3.3133, Perplexity: 27.4752
Epoch [1/3], Step [381/3236], Loss: 3.1513, Perplexity: 23.3654
Epoch [1/3], Step [382/3236], Loss: 3.1399, Perplexity: 23.1015
Epoch [1/3], Step [383/3236], Loss: 3.3286, Perplexity: 27.8981
Epoch [1/3], Step [384/3236], Loss: 3.0738, Perplexity: 21.6240
Epoch [1/3], Step [385/3236], Loss: 2.9233, Perplexity: 18.6027
Epoch [1/3], Step [386/3236], Loss: 3.1599, Perplexity: 23.5680
Epoch [1/3], Step [387/3236], Loss: 3.1380, Perplexity: 23.0568
Epoch [1/3], Step [388/3236], Loss: 3.1383, Perplexity: 23.0636
Epoch [1/3], Step [389/3236], Loss: 3.1473, Perplexity: 23.2732
Epoch [1/3], Step [390/3236], Loss: 3.1004, Perplexity: 22.2059
Epoch [1/3], Step [391/3236], Loss: 3.2227, Perplexity: 25.0967
Epoch [1/3], Step [392/3236], Loss: 3.0990, Perplexity: 22.1759
Epoch [1/3], Step [393/3236], Loss: 3.0930, Perplexity: 22.0432
Epoch [1/3], Step [394/3236], Loss: 3.0648, Perplexity: 21.4311
Epoch [1/3], Step [395/3236], Loss: 3.1064, Perplexity: 22.3399
Epoch [1/3], Step [396/3236], Loss: 3.2653, Perplexity: 26.1888
Epoch [1/3], Step [397/3236], Loss: 3.0936, Perplexity: 22.0559
Epoch [1/3], Step [398/3236], Loss: 3.0084, Perplexity: 20.2557
Epoch [1/3], Step [399/3236], Loss: 3.0483, Perplexity: 21.0801
Epoch [1/3], Step [400/3236], Loss: 3.5283, Perplexity: 34.0672
Epoch [1/3], Step [401/3236], Loss: 3.5141, Perplexity: 33.5842
Epoch [1/3], Step [402/3236], Loss: 3.3125, Perplexity: 27.4550
Epoch [1/3], Step [403/3236], Loss: 3.2715, Perplexity: 26.3517
Epoch [1/3], Step [404/3236], Loss: 3.2181, Perplexity: 24.9797
Epoch [1/3], Step [405/3236], Loss: 3.3479, Perplexity: 28.4441
Epoch [1/3], Step [406/3236], Loss: 3.0582, Perplexity: 21.2888
Epoch [1/3], Step [407/3236], Loss: 3.1509, Perplexity: 23.3560
Epoch [1/3], Step [408/3236], Loss: 3.0225, Perplexity: 20.5435
Epoch [1/3], Step [409/3236], Loss: 3.4237, Perplexity: 30.6815
Epoch [1/3], Step [410/3236], Loss: 3.1524, Perplexity: 23.3924
Epoch [1/3], Step [411/3236], Loss: 3.0359, Perplexity: 20.8197
Epoch [1/3], Step [412/3236], Loss: 2.9341, Perplexity: 18.8045
Epoch [1/3], Step [413/3236], Loss: 3.1499, Perplexity: 23.3340
Epoch [1/3], Step [414/3236], Loss: 3.1896, Perplexity: 24.2792
Epoch [1/3], Step [415/3236], Loss: 3.0429, Perplexity: 20.9654
Epoch [1/3], Step [416/3236], Loss: 2.9545, Perplexity: 19.1925
Epoch [1/3], Step [417/3236], Loss: 2.9806, Perplexity: 19.6993
Epoch [1/3], Step [418/3236], Loss: 3.1679, Perplexity: 23.7568
Epoch [1/3], Step [419/3236], Loss: 3.1616, Perplexity: 23.6081
Epoch [1/3], Step [420/3236], Loss: 3.1903, Perplexity: 24.2955
Epoch [1/3], Step [421/3236], Loss: 3.0826, Perplexity: 21.8155
Epoch [1/3], Step [422/3236], Loss: 2.9018, Perplexity: 18.2074
Epoch [1/3], Step [423/3236], Loss: 3.1891, Perplexity: 24.2668
Epoch [1/3], Step [424/3236], Loss: 3.5919, Perplexity: 36.3015
Epoch [1/3], Step [425/3236], Loss: 3.6288, Perplexity: 37.6691
Epoch [1/3], Step [426/3236], Loss: 3.1247, Perplexity: 22.7532
Epoch [1/3], Step [427/3236], Loss: 3.2382, Perplexity: 25.4869
Epoch [1/3], Step [428/3236], Loss: 3.0565, Perplexity: 21.2525
Epoch [1/3], Step [429/3236], Loss: 3.2534, Perplexity: 25.8781
Epoch [1/3], Step [430/3236], Loss: 3.5610, Perplexity: 35.1973
Epoch [1/3], Step [431/3236], Loss: 3.3371, Perplexity: 28.1380
Epoch [1/3], Step [432/3236], Loss: 3.0220, Perplexity: 20.5328
Epoch [1/3], Step [433/3236], Loss: 3.0637, Perplexity: 21.4075
Epoch [1/3], Step [434/3236], Loss: 3.2455, Perplexity: 25.6738
Epoch [1/3], Step [435/3236], Loss: 3.4492, Perplexity: 31.4738
Epoch [1/3], Step [436/3236], Loss: 3.0327, Perplexity: 20.7537
Epoch [1/3], Step [437/3236], Loss: 2.9443, Perplexity: 18.9980
Epoch [1/3], Step [438/3236], Loss: 3.0028, Perplexity: 20.1428
Epoch [1/3], Step [439/3236], Loss: 3.1480, Perplexity: 23.2903
Epoch [1/3], Step [440/3236], Loss: 3.2850, Perplexity: 26.7101
Epoch [1/3], Step [441/3236], Loss: 3.0424, Perplexity: 20.9547
Epoch [1/3], Step [442/3236], Loss: 2.9046, Perplexity: 18.2575
Epoch [1/3], Step [443/3236], Loss: 3.2638, Perplexity: 26.1497
Epoch [1/3], Step [444/3236], Loss: 3.0436, Perplexity: 20.9810
Epoch [1/3], Step [445/3236], Loss: 3.5750, Perplexity: 35.6942
Epoch [1/3], Step [446/3236], Loss: 3.0195, Perplexity: 20.4806
Epoch [1/3], Step [447/3236], Loss: 3.0972, Perplexity: 22.1356
Epoch [1/3], Step [448/3236], Loss: 3.0075, Perplexity: 20.2360
Epoch [1/3], Step [449/3236], Loss: 2.9850, Perplexity: 19.7870
Epoch [1/3], Step [450/3236], Loss: 3.0683, Perplexity: 21.5048
Epoch [1/3], Step [451/3236], Loss: 3.1055, Perplexity: 22.3195
Epoch [1/3], Step [452/3236], Loss: 2.9573, Perplexity: 19.2450
Epoch [1/3], Step [453/3236], Loss: 3.0820, Perplexity: 21.8013
Epoch [1/3], Step [454/3236], Loss: 3.0837, Perplexity: 21.8388
Epoch [1/3], Step [455/3236], Loss: 3.0906, Perplexity: 21.9906
Epoch [1/3], Step [456/3236], Loss: 3.2695, Perplexity: 26.2978
Epoch [1/3], Step [457/3236], Loss: 2.9721, Perplexity: 19.5329
Epoch [1/3], Step [458/3236], Loss: 3.1334, Perplexity: 22.9511
Epoch [1/3], Step [459/3236], Loss: 3.0951, Perplexity: 22.0889
Epoch [1/3], Step [460/3236], Loss: 2.9292, Perplexity: 18.7122
Epoch [1/3], Step [461/3236], Loss: 3.0238, Perplexity: 20.5693
Epoch [1/3], Step [462/3236], Loss: 3.0904, Perplexity: 21.9862
Epoch [1/3], Step [463/3236], Loss: 3.1185, Perplexity: 22.6123
Epoch [1/3], Step [464/3236], Loss: 2.9257, Perplexity: 18.6469
Epoch [1/3], Step [465/3236], Loss: 3.0184, Perplexity: 20.4585
Epoch [1/3], Step [466/3236], Loss: 3.1507, Perplexity: 23.3516
Epoch [1/3], Step [467/3236], Loss: 3.2652, Perplexity: 26.1846
Epoch [1/3], Step [468/3236], Loss: 3.4571, Perplexity: 31.7255
Epoch [1/3], Step [469/3236], Loss: 3.0169, Perplexity: 20.4284
Epoch [1/3], Step [470/3236], Loss: 3.0848, Perplexity: 21.8638
Epoch [1/3], Step [471/3236], Loss: 3.0309, Perplexity: 20.7149
Epoch [1/3], Step [472/3236], Loss: 5.2633, Perplexity: 193.1092
Epoch [1/3], Step [473/3236], Loss: 2.9640, Perplexity: 19.3757
Epoch [1/3], Step [474/3236], Loss: 3.0847, Perplexity: 21.8602
Epoch [1/3], Step [475/3236], Loss: 3.4944, Perplexity: 32.9321
Epoch [1/3], Step [476/3236], Loss: 3.3933, Perplexity: 29.7642
Epoch [1/3], Step [477/3236], Loss: 3.1452, Perplexity: 23.2246
Epoch [1/3], Step [478/3236], Loss: 3.1064, Perplexity: 22.3410
Epoch [1/3], Step [479/3236], Loss: 2.7978, Perplexity: 16.4086
Epoch [1/3], Step [480/3236], Loss: 2.9977, Perplexity: 20.0398
Epoch [1/3], Step [481/3236], Loss: 2.9204, Perplexity: 18.5480
Epoch [1/3], Step [482/3236], Loss: 2.9692, Perplexity: 19.4754
Epoch [1/3], Step [483/3236], Loss: 3.1493, Perplexity: 23.3194
Epoch [1/3], Step [484/3236], Loss: 2.9350, Perplexity: 18.8220
Epoch [1/3], Step [485/3236], Loss: 3.0890, Perplexity: 21.9545
Epoch [1/3], Step [486/3236], Loss: 3.1097, Perplexity: 22.4136
Epoch [1/3], Step [487/3236], Loss: 2.9481, Perplexity: 19.0695
Epoch [1/3], Step [488/3236], Loss: 3.0412, Perplexity: 20.9295
Epoch [1/3], Step [489/3236], Loss: 3.2370, Perplexity: 25.4562
Epoch [1/3], Step [490/3236], Loss: 3.0012, Perplexity: 20.1106
Epoch [1/3], Step [491/3236], Loss: 2.8001, Perplexity: 16.4468
Epoch [1/3], Step [492/3236], Loss: 3.1674, Perplexity: 23.7454
Epoch [1/3], Step [493/3236], Loss: 3.1342, Perplexity: 22.9705
Epoch [1/3], Step [494/3236], Loss: 3.0198, Perplexity: 20.4879
Epoch [1/3], Step [495/3236], Loss: 3.3938, Perplexity: 29.7788
Epoch [1/3], Step [496/3236], Loss: 3.0902, Perplexity: 21.9815
Epoch [1/3], Step [497/3236], Loss: 3.0970, Perplexity: 22.1307
Epoch [1/3], Step [498/3236], Loss: 3.1175, Perplexity: 22.5908
Epoch [1/3], Step [499/3236], Loss: 2.9653, Perplexity: 19.4006
Epoch [1/3], Step [500/3236], Loss: 2.9068, Perplexity: 18.2981
Epoch [1/3], Step [501/3236], Loss: 2.7255, Perplexity: 15.2643
Epoch [1/3], Step [502/3236], Loss: 3.0644, Perplexity: 21.4206
Epoch [1/3], Step [503/3236], Loss: 2.8146, Perplexity: 16.6866
Epoch [1/3], Step [504/3236], Loss: 3.6910, Perplexity: 40.0841
Epoch [1/3], Step [505/3236], Loss: 2.7972, Perplexity: 16.3986
Epoch [1/3], Step [506/3236], Loss: 2.9630, Perplexity: 19.3568
Epoch [1/3], Step [507/3236], Loss: 3.0306, Perplexity: 20.7099
Epoch [1/3], Step [508/3236], Loss: 2.8563, Perplexity: 17.3979
Epoch [1/3], Step [509/3236], Loss: 2.9462, Perplexity: 19.0342
Epoch [1/3], Step [510/3236], Loss: 3.2511, Perplexity: 25.8180
Epoch [1/3], Step [511/3236], Loss: 3.5550, Perplexity: 34.9881
Epoch [1/3], Step [512/3236], Loss: 2.9212, Perplexity: 18.5632
Epoch [1/3], Step [513/3236], Loss: 3.0451, Perplexity: 21.0114
Epoch [1/3], Step [514/3236], Loss: 3.0674, Perplexity: 21.4859
Epoch [1/3], Step [515/3236], Loss: 2.8920, Perplexity: 18.0297
Epoch [1/3], Step [516/3236], Loss: 3.3736, Perplexity: 29.1834
Epoch [1/3], Step [517/3236], Loss: 2.9757, Perplexity: 19.6027
Epoch [1/3], Step [518/3236], Loss: 3.0052, Perplexity: 20.1904
Epoch [1/3], Step [519/3236], Loss: 3.0346, Perplexity: 20.7934
Epoch [1/3], Step [520/3236], Loss: 2.7850, Perplexity: 16.1995
Epoch [1/3], Step [521/3236], Loss: 3.0429, Perplexity: 20.9659
Epoch [1/3], Step [522/3236], Loss: 3.2398, Perplexity: 25.5282
Epoch [1/3], Step [523/3236], Loss: 3.3760, Perplexity: 29.2522
Epoch [1/3], Step [524/3236], Loss: 2.8588, Perplexity: 17.4409
Epoch [1/3], Step [525/3236], Loss: 2.9176, Perplexity: 18.4967
Epoch [1/3], Step [526/3236], Loss: 2.9498, Perplexity: 19.1015
Epoch [1/3], Step [527/3236], Loss: 3.2404, Perplexity: 25.5446
Epoch [1/3], Step [528/3236], Loss: 2.9384, Perplexity: 18.8865
Epoch [1/3], Step [529/3236], Loss: 2.9242, Perplexity: 18.6189
Epoch [1/3], Step [530/3236], Loss: 3.1082, Perplexity: 22.3804
Epoch [1/3], Step [531/3236], Loss: 2.8544, Perplexity: 17.3647
Epoch [1/3], Step [532/3236], Loss: 3.0807, Perplexity: 21.7735
Epoch [1/3], Step [533/3236], Loss: 3.3465, Perplexity: 28.4032
Epoch [1/3], Step [534/3236], Loss: 3.2208, Perplexity: 25.0471
Epoch [1/3], Step [535/3236], Loss: 2.9612, Perplexity: 19.3209
Epoch [1/3], Step [536/3236], Loss: 4.6069, Perplexity: 100.1719
Epoch [1/3], Step [537/3236], Loss: 2.8167, Perplexity: 16.7218
Epoch [1/3], Step [538/3236], Loss: 3.0918, Perplexity: 22.0159
Epoch [1/3], Step [539/3236], Loss: 3.0901, Perplexity: 21.9800
Epoch [1/3], Step [540/3236], Loss: 3.7633, Perplexity: 43.0920
Epoch [1/3], Step [541/3236], Loss: 3.2540, Perplexity: 25.8925
Epoch [1/3], Step [542/3236], Loss: 2.9676, Perplexity: 19.4450
Epoch [1/3], Step [543/3236], Loss: 2.9447, Perplexity: 19.0048
Epoch [1/3], Step [544/3236], Loss: 2.9831, Perplexity: 19.7492
Epoch [1/3], Step [545/3236], Loss: 2.9359, Perplexity: 18.8390
Epoch [1/3], Step [546/3236], Loss: 3.0048, Perplexity: 20.1825
Epoch [1/3], Step [547/3236], Loss: 2.9073, Perplexity: 18.3073
Epoch [1/3], Step [548/3236], Loss: 2.9634, Perplexity: 19.3643
Epoch [1/3], Step [549/3236], Loss: 2.9686, Perplexity: 19.4638
Epoch [1/3], Step [550/3236], Loss: 2.8649, Perplexity: 17.5478
Epoch [1/3], Step [551/3236], Loss: 2.8999, Perplexity: 18.1724
Epoch [1/3], Step [552/3236], Loss: 3.3281, Perplexity: 27.8841
Epoch [1/3], Step [553/3236], Loss: 3.3057, Perplexity: 27.2673
Epoch [1/3], Step [554/3236], Loss: 3.0504, Perplexity: 21.1232
Epoch [1/3], Step [555/3236], Loss: 2.9760, Perplexity: 19.6086
Epoch [1/3], Step [556/3236], Loss: 2.8518, Perplexity: 17.3181
Epoch [1/3], Step [557/3236], Loss: 2.9708, Perplexity: 19.5083
Epoch [1/3], Step [558/3236], Loss: 2.9148, Perplexity: 18.4449
Epoch [1/3], Step [559/3236], Loss: 2.7815, Perplexity: 16.1426
Epoch [1/3], Step [560/3236], Loss: 3.0081, Perplexity: 20.2494
Epoch [1/3], Step [561/3236], Loss: 2.7795, Perplexity: 16.1109
Epoch [1/3], Step [562/3236], Loss: 3.0274, Perplexity: 20.6433
Epoch [1/3], Step [563/3236], Loss: 2.9143, Perplexity: 18.4365
Epoch [1/3], Step [564/3236], Loss: 3.7067, Perplexity: 40.7191
Epoch [1/3], Step [565/3236], Loss: 3.2199, Perplexity: 25.0248
Epoch [1/3], Step [566/3236], Loss: 3.2883, Perplexity: 26.7960
Epoch [1/3], Step [567/3236], Loss: 2.7973, Perplexity: 16.4005
Epoch [1/3], Step [568/3236], Loss: 3.0601, Perplexity: 21.3293
Epoch [1/3], Step [569/3236], Loss: 3.1114, Perplexity: 22.4515
Epoch [1/3], Step [570/3236], Loss: 3.0343, Perplexity: 20.7866
Epoch [1/3], Step [571/3236], Loss: 2.9171, Perplexity: 18.4876
Epoch [1/3], Step [572/3236], Loss: 2.7645, Perplexity: 15.8712
Epoch [1/3], Step [573/3236], Loss: 2.8643, Perplexity: 17.5365
Epoch [1/3], Step [574/3236], Loss: 2.7253, Perplexity: 15.2607
Epoch [1/3], Step [575/3236], Loss: 2.8575, Perplexity: 17.4180
Epoch [1/3], Step [576/3236], Loss: 2.8861, Perplexity: 17.9232
Epoch [1/3], Step [577/3236], Loss: 3.0026, Perplexity: 20.1387
Epoch [1/3], Step [578/3236], Loss: 2.8539, Perplexity: 17.3552
Epoch [1/3], Step [579/3236], Loss: 2.9729, Perplexity: 19.5487
Epoch [1/3], Step [580/3236], Loss: 2.6711, Perplexity: 14.4564
Epoch [1/3], Step [581/3236], Loss: 3.1899, Perplexity: 24.2852
Epoch [1/3], Step [582/3236], Loss: 3.0024, Perplexity: 20.1339
Epoch [1/3], Step [583/3236], Loss: 3.1498, Perplexity: 23.3323
Epoch [1/3], Step [584/3236], Loss: 2.8311, Perplexity: 16.9644
Epoch [1/3], Step [585/3236], Loss: 2.8898, Perplexity: 17.9901
Epoch [1/3], Step [586/3236], Loss: 2.9872, Perplexity: 19.8308
Epoch [1/3], Step [587/3236], Loss: 3.2068, Perplexity: 24.7005
Epoch [1/3], Step [588/3236], Loss: 2.7998, Perplexity: 16.4417
Epoch [1/3], Step [589/3236], Loss: 3.1026, Perplexity: 22.2547
Epoch [1/3], Step [590/3236], Loss: 2.8629, Perplexity: 17.5121
Epoch [1/3], Step [591/3236], Loss: 3.0979, Perplexity: 22.1514
Epoch [1/3], Step [592/3236], Loss: 3.8040, Perplexity: 44.8815
Epoch [1/3], Step [593/3236], Loss: 2.9853, Perplexity: 19.7926
Epoch [1/3], Step [594/3236], Loss: 3.2365, Perplexity: 25.4446
Epoch [1/3], Step [595/3236], Loss: 2.9314, Perplexity: 18.7543
Epoch [1/3], Step [596/3236], Loss: 3.0862, Perplexity: 21.8947
Epoch [1/3], Step [597/3236], Loss: 2.8480, Perplexity: 17.2538
Epoch [1/3], Step [598/3236], Loss: 3.3411, Perplexity: 28.2499
Epoch [1/3], Step [599/3236], Loss: 2.8569, Perplexity: 17.4069
Epoch [1/3], Step [600/3236], Loss: 2.9805, Perplexity: 19.6967
Epoch [1/3], Step [601/3236], Loss: 3.0239, Perplexity: 20.5711
Epoch [1/3], Step [602/3236], Loss: 2.8369, Perplexity: 17.0630
Epoch [1/3], Step [603/3236], Loss: 2.9864, Perplexity: 19.8147
Epoch [1/3], Step [604/3236], Loss: 2.8645, Perplexity: 17.5403
Epoch [1/3], Step [605/3236], Loss: 2.8083, Perplexity: 16.5816
Epoch [1/3], Step [606/3236], Loss: 2.9531, Perplexity: 19.1649
Epoch [1/3], Step [607/3236], Loss: 3.1178, Perplexity: 22.5956
Epoch [1/3], Step [608/3236], Loss: 2.7295, Perplexity: 15.3246
Epoch [1/3], Step [609/3236], Loss: 3.1472, Perplexity: 23.2699
Epoch [1/3], Step [610/3236], Loss: 2.8178, Perplexity: 16.7392
Epoch [1/3], Step [611/3236], Loss: 2.9055, Perplexity: 18.2743
Epoch [1/3], Step [612/3236], Loss: 2.9826, Perplexity: 19.7387
Epoch [1/3], Step [613/3236], Loss: 3.0396, Perplexity: 20.8972
Epoch [1/3], Step [614/3236], Loss: 2.9494, Perplexity: 19.0936
Epoch [1/3], Step [615/3236], Loss: 3.7268, Perplexity: 41.5480
Epoch [1/3], Step [616/3236], Loss: 2.9027, Perplexity: 18.2226
Epoch [1/3], Step [617/3236], Loss: 2.8802, Perplexity: 17.8175
Epoch [1/3], Step [618/3236], Loss: 3.4233, Perplexity: 30.6695
Epoch [1/3], Step [619/3236], Loss: 2.8475, Perplexity: 17.2454
Epoch [1/3], Step [620/3236], Loss: 2.6906, Perplexity: 14.7400
Epoch [1/3], Step [621/3236], Loss: 3.2861, Perplexity: 26.7377
Epoch [1/3], Step [622/3236], Loss: 2.8457, Perplexity: 17.2143
Epoch [1/3], Step [623/3236], Loss: 2.7992, Perplexity: 16.4312
Epoch [1/3], Step [624/3236], Loss: 2.8235, Perplexity: 16.8351
Epoch [1/3], Step [625/3236], Loss: 2.8532, Perplexity: 17.3440
Epoch [1/3], Step [626/3236], Loss: 2.8282, Perplexity: 16.9145
Epoch [1/3], Step [627/3236], Loss: 2.9987, Perplexity: 20.0595
Epoch [1/3], Step [628/3236], Loss: 2.9253, Perplexity: 18.6400
Epoch [1/3], Step [629/3236], Loss: 3.5771, Perplexity: 35.7689
Epoch [1/3], Step [630/3236], Loss: 3.2521, Perplexity: 25.8454
Epoch [1/3], Step [631/3236], Loss: 2.8689, Perplexity: 17.6174
Epoch [1/3], Step [632/3236], Loss: 2.9520, Perplexity: 19.1444
Epoch [1/3], Step [633/3236], Loss: 2.9740, Perplexity: 19.5704
Epoch [1/3], Step [634/3236], Loss: 2.8418, Perplexity: 17.1473
Epoch [1/3], Step [635/3236], Loss: 2.9208, Perplexity: 18.5560
Epoch [1/3], Step [636/3236], Loss: 3.2208, Perplexity: 25.0470
Epoch [1/3], Step [637/3236], Loss: 2.7849, Perplexity: 16.1985
Epoch [1/3], Step [638/3236], Loss: 3.3519, Perplexity: 28.5570
Epoch [1/3], Step [639/3236], Loss: 3.1482, Perplexity: 23.2943
Epoch [1/3], Step [640/3236], Loss: 2.8662, Perplexity: 17.5695
Epoch [1/3], Step [641/3236], Loss: 2.9261, Perplexity: 18.6539
Epoch [1/3], Step [642/3236], Loss: 2.8957, Perplexity: 18.0953
Epoch [1/3], Step [643/3236], Loss: 2.9563, Perplexity: 19.2270
Epoch [1/3], Step [644/3236], Loss: 2.7708, Perplexity: 15.9709
Epoch [1/3], Step [645/3236], Loss: 3.0453, Perplexity: 21.0156
Epoch [1/3], Step [646/3236], Loss: 2.9589, Perplexity: 19.2771
Epoch [1/3], Step [647/3236], Loss: 2.7729, Perplexity: 16.0057
Epoch [1/3], Step [648/3236], Loss: 2.9330, Perplexity: 18.7847
Epoch [1/3], Step [649/3236], Loss: 2.8884, Perplexity: 17.9649
Epoch [1/3], Step [650/3236], Loss: 2.9641, Perplexity: 19.3766
Epoch [1/3], Step [651/3236], Loss: 3.1238, Perplexity: 22.7332
Epoch [1/3], Step [652/3236], Loss: 3.2514, Perplexity: 25.8269
Epoch [1/3], Step [653/3236], Loss: 2.6544, Perplexity: 14.2169
Epoch [1/3], Step [654/3236], Loss: 3.0955, Perplexity: 22.0981
Epoch [1/3], Step [655/3236], Loss: 2.9511, Perplexity: 19.1260
Epoch [1/3], Step [656/3236], Loss: 2.8451, Perplexity: 17.2041
Epoch [1/3], Step [657/3236], Loss: 2.9110, Perplexity: 18.3748
Epoch [1/3], Step [658/3236], Loss: 2.9301, Perplexity: 18.7286
Epoch [1/3], Step [659/3236], Loss: 2.7695, Perplexity: 15.9503
Epoch [1/3], Step [660/3236], Loss: 3.2777, Perplexity: 26.5135
Epoch [1/3], Step [661/3236], Loss: 2.8086, Perplexity: 16.5867
Epoch [1/3], Step [662/3236], Loss: 2.9191, Perplexity: 18.5253
Epoch [1/3], Step [663/3236], Loss: 2.8534, Perplexity: 17.3474
Epoch [1/3], Step [664/3236], Loss: 2.8789, Perplexity: 17.7949
Epoch [1/3], Step [665/3236], Loss: 2.7140, Perplexity: 15.0894
Epoch [1/3], Step [666/3236], Loss: 2.9819, Perplexity: 19.7248
Epoch [1/3], Step [667/3236], Loss: 2.7795, Perplexity: 16.1113
Epoch [1/3], Step [668/3236], Loss: 2.6259, Perplexity: 13.8174
Epoch [1/3], Step [669/3236], Loss: 3.0330, Perplexity: 20.7594
Epoch [1/3], Step [670/3236], Loss: 2.7185, Perplexity: 15.1578
Epoch [1/3], Step [671/3236], Loss: 2.8311, Perplexity: 16.9635
Epoch [1/3], Step [672/3236], Loss: 2.8553, Perplexity: 17.3795
Epoch [1/3], Step [673/3236], Loss: 2.8885, Perplexity: 17.9660
Epoch [1/3], Step [674/3236], Loss: 2.7284, Perplexity: 15.3090
Epoch [1/3], Step [675/3236], Loss: 3.0170, Perplexity: 20.4297
Epoch [1/3], Step [676/3236], Loss: 3.0122, Perplexity: 20.3315
Epoch [1/3], Step [677/3236], Loss: 2.7899, Perplexity: 16.2795
Epoch [1/3], Step [678/3236], Loss: 2.8934, Perplexity: 18.0538
Epoch [1/3], Step [679/3236], Loss: 2.8523, Perplexity: 17.3276
Epoch [1/3], Step [680/3236], Loss: 2.9458, Perplexity: 19.0264
Epoch [1/3], Step [681/3236], Loss: 2.5056, Perplexity: 12.2508
Epoch [1/3], Step [682/3236], Loss: 2.7508, Perplexity: 15.6551
Epoch [1/3], Step [683/3236], Loss: 2.7526, Perplexity: 15.6840
Epoch [1/3], Step [684/3236], Loss: 2.9436, Perplexity: 18.9848
Epoch [1/3], Step [685/3236], Loss: 2.7773, Perplexity: 16.0748
Epoch [1/3], Step [686/3236], Loss: 2.8104, Perplexity: 16.6172
Epoch [1/3], Step [687/3236], Loss: 4.2740, Perplexity: 71.8118
Epoch [1/3], Step [688/3236], Loss: 2.9025, Perplexity: 18.2191
Epoch [1/3], Step [689/3236], Loss: 2.7256, Perplexity: 15.2662
Epoch [1/3], Step [690/3236], Loss: 2.8472, Perplexity: 17.2392
Epoch [1/3], Step [691/3236], Loss: 2.7897, Perplexity: 16.2760
Epoch [1/3], Step [692/3236], Loss: 2.7452, Perplexity: 15.5675
Epoch [1/3], Step [693/3236], Loss: 2.8721, Perplexity: 17.6738
Epoch [1/3], Step [694/3236], Loss: 2.6755, Perplexity: 14.5196
Epoch [1/3], Step [695/3236], Loss: 2.8580, Perplexity: 17.4269
Epoch [1/3], Step [696/3236], Loss: 2.8815, Perplexity: 17.8406
Epoch [1/3], Step [697/3236], Loss: 2.8703, Perplexity: 17.6431
Epoch [1/3], Step [698/3236], Loss: 2.8773, Perplexity: 17.7661
Epoch [1/3], Step [699/3236], Loss: 2.8681, Perplexity: 17.6030
Epoch [1/3], Step [700/3236], Loss: 2.9255, Perplexity: 18.6430
Epoch [1/3], Step [701/3236], Loss: 3.2955, Perplexity: 26.9901
Epoch [1/3], Step [702/3236], Loss: 2.7127, Perplexity: 15.0706
Epoch [1/3], Step [703/3236], Loss: 2.8381, Perplexity: 17.0824
Epoch [1/3], Step [704/3236], Loss: 2.8893, Perplexity: 17.9802
Epoch [1/3], Step [705/3236], Loss: 2.7342, Perplexity: 15.3980
Epoch [1/3], Step [706/3236], Loss: 2.7052, Perplexity: 14.9571
Epoch [1/3], Step [707/3236], Loss: 2.6212, Perplexity: 13.7521
Epoch [1/3], Step [708/3236], Loss: 2.7656, Perplexity: 15.8878
Epoch [1/3], Step [709/3236], Loss: 2.8799, Perplexity: 17.8127
Epoch [1/3], Step [710/3236], Loss: 2.8217, Perplexity: 16.8055
Epoch [1/3], Step [711/3236], Loss: 2.8696, Perplexity: 17.6300
Epoch [1/3], Step [712/3236], Loss: 2.8605, Perplexity: 17.4701
Epoch [1/3], Step [713/3236], Loss: 2.8791, Perplexity: 17.7988
Epoch [1/3], Step [714/3236], Loss: 2.7400, Perplexity: 15.4874
Epoch [1/3], Step [715/3236], Loss: 2.7569, Perplexity: 15.7502
Epoch [1/3], Step [716/3236], Loss: 2.7601, Perplexity: 15.8019
Epoch [1/3], Step [717/3236], Loss: 2.9357, Perplexity: 18.8341
Epoch [1/3], Step [718/3236], Loss: 2.6941, Perplexity: 14.7916
Epoch [1/3], Step [719/3236], Loss: 2.6313, Perplexity: 13.8917
Epoch [1/3], Step [720/3236], Loss: 3.1566, Perplexity: 23.4911
Epoch [1/3], Step [721/3236], Loss: 2.7539, Perplexity: 15.7035
Epoch [1/3], Step [722/3236], Loss: 2.6679, Perplexity: 14.4100
Epoch [1/3], Step [723/3236], Loss: 2.6946, Perplexity: 14.8003
Epoch [1/3], Step [724/3236], Loss: 2.7002, Perplexity: 14.8832
Epoch [1/3], Step [725/3236], Loss: 2.6754, Perplexity: 14.5175
Epoch [1/3], Step [726/3236], Loss: 2.6601, Perplexity: 14.2980
Epoch [1/3], Step [727/3236], Loss: 2.8297, Perplexity: 16.9409
Epoch [1/3], Step [728/3236], Loss: 2.8018, Perplexity: 16.4742
Epoch [1/3], Step [729/3236], Loss: 2.6054, Perplexity: 13.5365
Epoch [1/3], Step [730/3236], Loss: 3.0633, Perplexity: 21.3979
Epoch [1/3], Step [731/3236], Loss: 2.4897, Perplexity: 12.0577
Epoch [1/3], Step [732/3236], Loss: 2.7570, Perplexity: 15.7524
Epoch [1/3], Step [733/3236], Loss: 2.8453, Perplexity: 17.2065
Epoch [1/3], Step [734/3236], Loss: 2.8187, Perplexity: 16.7550
Epoch [1/3], Step [735/3236], Loss: 2.7339, Perplexity: 15.3929
Epoch [1/3], Step [736/3236], Loss: 3.0574, Perplexity: 21.2719
Epoch [1/3], Step [737/3236], Loss: 3.0174, Perplexity: 20.4383
Epoch [1/3], Step [738/3236], Loss: 2.9507, Perplexity: 19.1187
Epoch [1/3], Step [739/3236], Loss: 4.2819, Perplexity: 72.3808
Epoch [1/3], Step [740/3236], Loss: 3.5217, Perplexity: 33.8419
Epoch [1/3], Step [741/3236], Loss: 3.0070, Perplexity: 20.2275
Epoch [1/3], Step [742/3236], Loss: 2.8689, Perplexity: 17.6183
Epoch [1/3], Step [743/3236], Loss: 2.8645, Perplexity: 17.5405
Epoch [1/3], Step [744/3236], Loss: 2.9765, Perplexity: 19.6196
Epoch [1/3], Step [745/3236], Loss: 2.7668, Perplexity: 15.9083
Epoch [1/3], Step [746/3236], Loss: 2.9730, Perplexity: 19.5500
Epoch [1/3], Step [747/3236], Loss: 2.6856, Perplexity: 14.6675
Epoch [1/3], Step [748/3236], Loss: 2.8472, Perplexity: 17.2402
Epoch [1/3], Step [749/3236], Loss: 3.0627, Perplexity: 21.3856
Epoch [1/3], Step [750/3236], Loss: 3.1098, Perplexity: 22.4157
Epoch [1/3], Step [751/3236], Loss: 2.8159, Perplexity: 16.7075
Epoch [1/3], Step [752/3236], Loss: 2.8566, Perplexity: 17.4018
Epoch [1/3], Step [753/3236], Loss: 2.7713, Perplexity: 15.9799
Epoch [1/3], Step [754/3236], Loss: 2.7661, Perplexity: 15.8967
Epoch [1/3], Step [755/3236], Loss: 2.9083, Perplexity: 18.3259
Epoch [1/3], Step [756/3236], Loss: 3.2030, Perplexity: 24.6065
Epoch [1/3], Step [757/3236], Loss: 2.6767, Perplexity: 14.5373
Epoch [1/3], Step [758/3236], Loss: 2.8022, Perplexity: 16.4809
Epoch [1/3], Step [759/3236], Loss: 2.7739, Perplexity: 16.0212
Epoch [1/3], Step [760/3236], Loss: 3.1166, Perplexity: 22.5684
Epoch [1/3], Step [761/3236], Loss: 2.7570, Perplexity: 15.7519
Epoch [1/3], Step [762/3236], Loss: 2.7042, Perplexity: 14.9426
Epoch [1/3], Step [763/3236], Loss: 3.2652, Perplexity: 26.1866
Epoch [1/3], Step [764/3236], Loss: 2.6702, Perplexity: 14.4425
Epoch [1/3], Step [765/3236], Loss: 2.7039, Perplexity: 14.9378
Epoch [1/3], Step [766/3236], Loss: 2.7852, Perplexity: 16.2030
Epoch [1/3], Step [767/3236], Loss: 2.8066, Perplexity: 16.5533
Epoch [1/3], Step [768/3236], Loss: 2.8101, Perplexity: 16.6117
Epoch [1/3], Step [769/3236], Loss: 2.8240, Perplexity: 16.8442
Epoch [1/3], Step [770/3236], Loss: 2.7937, Perplexity: 16.3420
Epoch [1/3], Step [771/3236], Loss: 2.6097, Perplexity: 13.5947
Epoch [1/3], Step [772/3236], Loss: 2.6667, Perplexity: 14.3927
Epoch [1/3], Step [773/3236], Loss: 2.7309, Perplexity: 15.3472
Epoch [1/3], Step [774/3236], Loss: 2.6855, Perplexity: 14.6660
Epoch [1/3], Step [775/3236], Loss: 2.7763, Perplexity: 16.0593
Epoch [1/3], Step [776/3236], Loss: 3.1113, Perplexity: 22.4505
Epoch [1/3], Step [777/3236], Loss: 3.6130, Perplexity: 37.0785
Epoch [1/3], Step [778/3236], Loss: 2.7519, Perplexity: 15.6730
Epoch [1/3], Step [779/3236], Loss: 2.7191, Perplexity: 15.1665
Epoch [1/3], Step [780/3236], Loss: 2.5586, Perplexity: 12.9175
Epoch [1/3], Step [781/3236], Loss: 2.7646, Perplexity: 15.8733
Epoch [1/3], Step [782/3236], Loss: 2.8893, Perplexity: 17.9809
Epoch [1/3], Step [783/3236], Loss: 2.8087, Perplexity: 16.5884
Epoch [1/3], Step [784/3236], Loss: 2.8548, Perplexity: 17.3716
Epoch [1/3], Step [785/3236], Loss: 3.1822, Perplexity: 24.1003
Epoch [1/3], Step [786/3236], Loss: 3.0361, Perplexity: 20.8247
Epoch [1/3], Step [787/3236], Loss: 2.7007, Perplexity: 14.8907
Epoch [1/3], Step [788/3236], Loss: 2.6106, Perplexity: 13.6069
Epoch [1/3], Step [789/3236], Loss: 2.8803, Perplexity: 17.8201
Epoch [1/3], Step [790/3236], Loss: 2.8411, Perplexity: 17.1346
Epoch [1/3], Step [791/3236], Loss: 2.7380, Perplexity: 15.4554
Epoch [1/3], Step [792/3236], Loss: 2.6571, Perplexity: 14.2549
Epoch [1/3], Step [793/3236], Loss: 3.1130, Perplexity: 22.4889
Epoch [1/3], Step [794/3236], Loss: 2.7332, Perplexity: 15.3824
Epoch [1/3], Step [795/3236], Loss: 2.8567, Perplexity: 17.4040
Epoch [1/3], Step [796/3236], Loss: 2.8570, Perplexity: 17.4099
Epoch [1/3], Step [797/3236], Loss: 2.7382, Perplexity: 15.4589
Epoch [1/3], Step [798/3236], Loss: 2.7923, Perplexity: 16.3183
Epoch [1/3], Step [799/3236], Loss: 2.8278, Perplexity: 16.9078
Epoch [1/3], Step [800/3236], Loss: 2.9768, Perplexity: 19.6257
Epoch [1/3], Step [801/3236], Loss: 2.6327, Perplexity: 13.9120
Epoch [1/3], Step [802/3236], Loss: 2.7984, Perplexity: 16.4185
Epoch [1/3], Step [803/3236], Loss: 2.7899, Perplexity: 16.2787
Epoch [1/3], Step [804/3236], Loss: 2.7056, Perplexity: 14.9628
Epoch [1/3], Step [805/3236], Loss: 2.8660, Perplexity: 17.5665
Epoch [1/3], Step [806/3236], Loss: 2.7425, Perplexity: 15.5261
Epoch [1/3], Step [807/3236], Loss: 2.7263, Perplexity: 15.2767
Epoch [1/3], Step [808/3236], Loss: 2.5942, Perplexity: 13.3860
Epoch [1/3], Step [809/3236], Loss: 2.6450, Perplexity: 14.0831
Epoch [1/3], Step [810/3236], Loss: 2.7005, Perplexity: 14.8878
Epoch [1/3], Step [811/3236], Loss: 2.7281, Perplexity: 15.3034
Epoch [1/3], Step [812/3236], Loss: 2.6028, Perplexity: 13.5011
Epoch [1/3], Step [813/3236], Loss: 3.1523, Perplexity: 23.3895
Epoch [1/3], Step [814/3236], Loss: 2.6699, Perplexity: 14.4391
Epoch [1/3], Step [815/3236], Loss: 2.7326, Perplexity: 15.3724
Epoch [1/3], Step [816/3236], Loss: 2.7261, Perplexity: 15.2729
Epoch [1/3], Step [817/3236], Loss: 2.8811, Perplexity: 17.8343
Epoch [1/3], Step [818/3236], Loss: 2.5448, Perplexity: 12.7408
Epoch [1/3], Step [819/3236], Loss: 2.7883, Perplexity: 16.2533
Epoch [1/3], Step [820/3236], Loss: 2.8496, Perplexity: 17.2811
Epoch [1/3], Step [821/3236], Loss: 2.5885, Perplexity: 13.3098
Epoch [1/3], Step [822/3236], Loss: 2.6999, Perplexity: 14.8777
Epoch [1/3], Step [823/3236], Loss: 2.6338, Perplexity: 13.9262
Epoch [1/3], Step [824/3236], Loss: 2.7958, Perplexity: 16.3760
Epoch [1/3], Step [825/3236], Loss: 2.4800, Perplexity: 11.9418
Epoch [1/3], Step [826/3236], Loss: 2.7355, Perplexity: 15.4176
Epoch [1/3], Step [827/3236], Loss: 2.7403, Perplexity: 15.4922
Epoch [1/3], Step [828/3236], Loss: 2.7543, Perplexity: 15.7106
Epoch [1/3], Step [829/3236], Loss: 2.5472, Perplexity: 12.7711
Epoch [1/3], Step [830/3236], Loss: 2.7370, Perplexity: 15.4400
Epoch [1/3], Step [831/3236], Loss: 2.7704, Perplexity: 15.9648
Epoch [1/3], Step [832/3236], Loss: 2.7988, Perplexity: 16.4242
Epoch [1/3], Step [833/3236], Loss: 2.5951, Perplexity: 13.3977
Epoch [1/3], Step [834/3236], Loss: 2.5702, Perplexity: 13.0680
Epoch [1/3], Step [835/3236], Loss: 2.7594, Perplexity: 15.7910
Epoch [1/3], Step [836/3236], Loss: 3.2446, Perplexity: 25.6509
Epoch [1/3], Step [837/3236], Loss: 2.5654, Perplexity: 13.0056
Epoch [1/3], Step [838/3236], Loss: 2.6333, Perplexity: 13.9196
Epoch [1/3], Step [839/3236], Loss: 3.0134, Perplexity: 20.3572
Epoch [1/3], Step [840/3236], Loss: 3.0115, Perplexity: 20.3181
Epoch [1/3], Step [841/3236], Loss: 2.6036, Perplexity: 13.5126
Epoch [1/3], Step [842/3236], Loss: 2.6211, Perplexity: 13.7512
Epoch [1/3], Step [843/3236], Loss: 2.5479, Perplexity: 12.7798
Epoch [1/3], Step [844/3236], Loss: 2.7044, Perplexity: 14.9451
Epoch [1/3], Step [845/3236], Loss: 2.4226, Perplexity: 11.2754
Epoch [1/3], Step [846/3236], Loss: 4.4386, Perplexity: 84.6591
Epoch [1/3], Step [847/3236], Loss: 2.7897, Perplexity: 16.2758
Epoch [1/3], Step [848/3236], Loss: 2.9012, Perplexity: 18.1953
Epoch [1/3], Step [849/3236], Loss: 2.6255, Perplexity: 13.8118
Epoch [1/3], Step [850/3236], Loss: 2.6435, Perplexity: 14.0617
Epoch [1/3], Step [851/3236], Loss: 2.8123, Perplexity: 16.6476
Epoch [1/3], Step [852/3236], Loss: 2.8187, Perplexity: 16.7545
Epoch [1/3], Step [853/3236], Loss: 2.5672, Perplexity: 13.0288
Epoch [1/3], Step [854/3236], Loss: 2.7606, Perplexity: 15.8099
Epoch [1/3], Step [855/3236], Loss: 2.6567, Perplexity: 14.2487
Epoch [1/3], Step [856/3236], Loss: 2.8668, Perplexity: 17.5813
Epoch [1/3], Step [857/3236], Loss: 2.6042, Perplexity: 13.5203
Epoch [1/3], Step [858/3236], Loss: 2.6083, Perplexity: 13.5755
Epoch [1/3], Step [859/3236], Loss: 2.6890, Perplexity: 14.7163
Epoch [1/3], Step [860/3236], Loss: 2.5537, Perplexity: 12.8552
Epoch [1/3], Step [861/3236], Loss: 2.6600, Perplexity: 14.2957
Epoch [1/3], Step [862/3236], Loss: 2.7342, Perplexity: 15.3980
Epoch [1/3], Step [863/3236], Loss: 2.5475, Perplexity: 12.7753
Epoch [1/3], Step [864/3236], Loss: 2.7062, Perplexity: 14.9721
Epoch [1/3], Step [865/3236], Loss: 2.6607, Perplexity: 14.3069
Epoch [1/3], Step [866/3236], Loss: 3.4410, Perplexity: 31.2196
Epoch [1/3], Step [867/3236], Loss: 2.6780, Perplexity: 14.5555
Epoch [1/3], Step [868/3236], Loss: 2.5574, Perplexity: 12.9016
Epoch [1/3], Step [869/3236], Loss: 2.7585, Perplexity: 15.7756
Epoch [1/3], Step [870/3236], Loss: 2.6365, Perplexity: 13.9643
Epoch [1/3], Step [871/3236], Loss: 2.9170, Perplexity: 18.4866
Epoch [1/3], Step [872/3236], Loss: 2.5897, Perplexity: 13.3262
Epoch [1/3], Step [873/3236], Loss: 2.6691, Perplexity: 14.4272
Epoch [1/3], Step [874/3236], Loss: 2.6782, Perplexity: 14.5590
Epoch [1/3], Step [875/3236], Loss: 2.7367, Perplexity: 15.4361
Epoch [1/3], Step [876/3236], Loss: 2.5841, Perplexity: 13.2511
Epoch [1/3], Step [877/3236], Loss: 2.8020, Perplexity: 16.4781
Epoch [1/3], Step [878/3236], Loss: 2.5748, Perplexity: 13.1282
Epoch [1/3], Step [879/3236], Loss: 3.0376, Perplexity: 20.8560
Epoch [1/3], Step [880/3236], Loss: 2.7894, Perplexity: 16.2717
Epoch [1/3], Step [881/3236], Loss: 2.8749, Perplexity: 17.7228
Epoch [1/3], Step [882/3236], Loss: 2.6337, Perplexity: 13.9254
Epoch [1/3], Step [883/3236], Loss: 2.7945, Perplexity: 16.3552
Epoch [1/3], Step [884/3236], Loss: 2.9068, Perplexity: 18.2985
Epoch [1/3], Step [885/3236], Loss: 2.5699, Perplexity: 13.0650
Epoch [1/3], Step [886/3236], Loss: 2.6376, Perplexity: 13.9803
Epoch [1/3], Step [887/3236], Loss: 3.5717, Perplexity: 35.5768
Epoch [1/3], Step [888/3236], Loss: 3.0540, Perplexity: 21.1990
Epoch [1/3], Step [889/3236], Loss: 2.6454, Perplexity: 14.0896
Epoch [1/3], Step [890/3236], Loss: 2.6302, Perplexity: 13.8771
Epoch [1/3], Step [891/3236], Loss: 2.7034, Perplexity: 14.9307
Epoch [1/3], Step [892/3236], Loss: 2.6570, Perplexity: 14.2533
Epoch [1/3], Step [893/3236], Loss: 3.1449, Perplexity: 23.2172
Epoch [1/3], Step [894/3236], Loss: 2.4874, Perplexity: 12.0294
Epoch [1/3], Step [895/3236], Loss: 2.8382, Perplexity: 17.0853
Epoch [1/3], Step [896/3236], Loss: 2.5751, Perplexity: 13.1323
Epoch [1/3], Step [897/3236], Loss: 2.5854, Perplexity: 13.2689
Epoch [1/3], Step [898/3236], Loss: 2.9720, Perplexity: 19.5302
Epoch [1/3], Step [899/3236], Loss: 2.6462, Perplexity: 14.1007
Epoch [1/3], Step [900/3236], Loss: 2.6423, Perplexity: 14.0448
Epoch [1/3], Step [901/3236], Loss: 2.6002, Perplexity: 13.4667
Epoch [1/3], Step [902/3236], Loss: 2.6369, Perplexity: 13.9696
Epoch [1/3], Step [903/3236], Loss: 2.5802, Perplexity: 13.2004
Epoch [1/3], Step [904/3236], Loss: 2.7396, Perplexity: 15.4807
Epoch [1/3], Step [905/3236], Loss: 2.7582, Perplexity: 15.7709
Epoch [1/3], Step [906/3236], Loss: 2.7005, Perplexity: 14.8879
Epoch [1/3], Step [907/3236], Loss: 2.5408, Perplexity: 12.6899
Epoch [1/3], Step [908/3236], Loss: 2.7110, Perplexity: 15.0444
Epoch [1/3], Step [909/3236], Loss: 2.6951, Perplexity: 14.8077
Epoch [1/3], Step [910/3236], Loss: 3.0850, Perplexity: 21.8667
Epoch [1/3], Step [911/3236], Loss: 3.3306, Perplexity: 27.9560
Epoch [1/3], Step [912/3236], Loss: 2.6275, Perplexity: 13.8387
Epoch [1/3], Step [913/3236], Loss: 2.7169, Perplexity: 15.1327
Epoch [1/3], Step [914/3236], Loss: 3.1892, Perplexity: 24.2683
Epoch [1/3], Step [915/3236], Loss: 2.7742, Perplexity: 16.0262
Epoch [1/3], Step [916/3236], Loss: 2.6416, Perplexity: 14.0358
Epoch [1/3], Step [917/3236], Loss: 2.6172, Perplexity: 13.6976
Epoch [1/3], Step [918/3236], Loss: 2.6564, Perplexity: 14.2451
Epoch [1/3], Step [919/3236], Loss: 2.6846, Perplexity: 14.6519
Epoch [1/3], Step [920/3236], Loss: 2.7085, Perplexity: 15.0074
Epoch [1/3], Step [921/3236], Loss: 2.7146, Perplexity: 15.0982
Epoch [1/3], Step [922/3236], Loss: 2.6527, Perplexity: 14.1926
Epoch [1/3], Step [923/3236], Loss: 2.6789, Perplexity: 14.5697
Epoch [1/3], Step [924/3236], Loss: 2.6200, Perplexity: 13.7360
Epoch [1/3], Step [925/3236], Loss: 2.7137, Perplexity: 15.0851
Epoch [1/3], Step [926/3236], Loss: 2.5821, Perplexity: 13.2247
Epoch [1/3], Step [927/3236], Loss: 2.5686, Perplexity: 13.0480
Epoch [1/3], Step [928/3236], Loss: 2.7628, Perplexity: 15.8449
Epoch [1/3], Step [929/3236], Loss: 2.7015, Perplexity: 14.9026
Epoch [1/3], Step [930/3236], Loss: 2.7209, Perplexity: 15.1935
Epoch [1/3], Step [931/3236], Loss: 2.9222, Perplexity: 18.5822
Epoch [1/3], Step [932/3236], Loss: 2.7667, Perplexity: 15.9068
Epoch [1/3], Step [933/3236], Loss: 2.8673, Perplexity: 17.5900
Epoch [1/3], Step [934/3236], Loss: 2.6206, Perplexity: 13.7445
Epoch [1/3], Step [935/3236], Loss: 2.7975, Perplexity: 16.4044
Epoch [1/3], Step [936/3236], Loss: 2.7109, Perplexity: 15.0430
Epoch [1/3], Step [937/3236], Loss: 2.7559, Perplexity: 15.7351
Epoch [1/3], Step [938/3236], Loss: 2.6381, Perplexity: 13.9865
Epoch [1/3], Step [939/3236], Loss: 2.6749, Perplexity: 14.5109
Epoch [1/3], Step [940/3236], Loss: 3.0016, Perplexity: 20.1177
Epoch [1/3], Step [941/3236], Loss: 3.1411, Perplexity: 23.1283
Epoch [1/3], Step [942/3236], Loss: 2.5172, Perplexity: 12.3933
Epoch [1/3], Step [943/3236], Loss: 2.5704, Perplexity: 13.0707
Epoch [1/3], Step [944/3236], Loss: 2.5180, Perplexity: 12.4037
Epoch [1/3], Step [945/3236], Loss: 2.8677, Perplexity: 17.5970
Epoch [1/3], Step [946/3236], Loss: 2.5228, Perplexity: 12.4635
Epoch [1/3], Step [947/3236], Loss: 2.7071, Perplexity: 14.9863
Epoch [1/3], Step [948/3236], Loss: 2.5997, Perplexity: 13.4597
Epoch [1/3], Step [949/3236], Loss: 3.0764, Perplexity: 21.6796
Epoch [1/3], Step [950/3236], Loss: 3.0139, Perplexity: 20.3667
Epoch [1/3], Step [951/3236], Loss: 2.6863, Perplexity: 14.6773
Epoch [1/3], Step [952/3236], Loss: 2.6712, Perplexity: 14.4576
Epoch [1/3], Step [953/3236], Loss: 2.6871, Perplexity: 14.6890
Epoch [1/3], Step [954/3236], Loss: 2.7412, Perplexity: 15.5062
Epoch [1/3], Step [955/3236], Loss: 2.5582, Perplexity: 12.9128
Epoch [1/3], Step [956/3236], Loss: 2.5494, Perplexity: 12.7990
Epoch [1/3], Step [957/3236], Loss: 2.5734, Perplexity: 13.1102
Epoch [1/3], Step [958/3236], Loss: 2.6574, Perplexity: 14.2591
Epoch [1/3], Step [959/3236], Loss: 2.6157, Perplexity: 13.6767
Epoch [1/3], Step [960/3236], Loss: 2.7756, Perplexity: 16.0475
Epoch [1/3], Step [961/3236], Loss: 3.0325, Perplexity: 20.7487
Epoch [1/3], Step [962/3236], Loss: 2.4517, Perplexity: 11.6085
Epoch [1/3], Step [963/3236], Loss: 2.6520, Perplexity: 14.1821
Epoch [1/3], Step [964/3236], Loss: 3.0523, Perplexity: 21.1640
Epoch [1/3], Step [965/3236], Loss: 2.8789, Perplexity: 17.7952
Epoch [1/3], Step [966/3236], Loss: 2.6787, Perplexity: 14.5655
Epoch [1/3], Step [967/3236], Loss: 2.7162, Perplexity: 15.1231
Epoch [1/3], Step [968/3236], Loss: 2.8270, Perplexity: 16.8940
Epoch [1/3], Step [969/3236], Loss: 2.5409, Perplexity: 12.6906
Epoch [1/3], Step [970/3236], Loss: 2.7925, Perplexity: 16.3223
Epoch [1/3], Step [971/3236], Loss: 2.6397, Perplexity: 14.0094
Epoch [1/3], Step [972/3236], Loss: 2.8647, Perplexity: 17.5434
Epoch [1/3], Step [973/3236], Loss: 2.6555, Perplexity: 14.2326
Epoch [1/3], Step [974/3236], Loss: 2.6164, Perplexity: 13.6867
Epoch [1/3], Step [975/3236], Loss: 2.7267, Perplexity: 15.2829
Epoch [1/3], Step [976/3236], Loss: 3.2821, Perplexity: 26.6313
Epoch [1/3], Step [977/3236], Loss: 2.7520, Perplexity: 15.6734
Epoch [1/3], Step [978/3236], Loss: 2.5930, Perplexity: 13.3698
Epoch [1/3], Step [979/3236], Loss: 2.6171, Perplexity: 13.6954
Epoch [1/3], Step [980/3236], Loss: 2.5203, Perplexity: 12.4323
Epoch [1/3], Step [981/3236], Loss: 2.6879, Perplexity: 14.7014
Epoch [1/3], Step [982/3236], Loss: 2.8319, Perplexity: 16.9780
Epoch [1/3], Step [983/3236], Loss: 2.6192, Perplexity: 13.7248
Epoch [1/3], Step [984/3236], Loss: 2.5663, Perplexity: 13.0176
Epoch [1/3], Step [985/3236], Loss: 2.7589, Perplexity: 15.7819
Epoch [1/3], Step [986/3236], Loss: 2.5311, Perplexity: 12.5676
Epoch [1/3], Step [987/3236], Loss: 2.6380, Perplexity: 13.9852
Epoch [1/3], Step [988/3236], Loss: 2.4879, Perplexity: 12.0360
Epoch [1/3], Step [989/3236], Loss: 3.2513, Perplexity: 25.8249
Epoch [1/3], Step [990/3236], Loss: 2.6970, Perplexity: 14.8354
Epoch [1/3], Step [991/3236], Loss: 2.4645, Perplexity: 11.7578
Epoch [1/3], Step [992/3236], Loss: 2.6406, Perplexity: 14.0217
Epoch [1/3], Step [993/3236], Loss: 2.6746, Perplexity: 14.5058
Epoch [1/3], Step [994/3236], Loss: 2.7146, Perplexity: 15.0992
Epoch [1/3], Step [995/3236], Loss: 2.8171, Perplexity: 16.7286
Epoch [1/3], Step [996/3236], Loss: 2.8819, Perplexity: 17.8489
Epoch [1/3], Step [997/3236], Loss: 3.2096, Perplexity: 24.7704
Epoch [1/3], Step [998/3236], Loss: 2.9181, Perplexity: 18.5064
Epoch [1/3], Step [999/3236], Loss: 2.8773, Perplexity: 17.7664
Epoch [1/3], Step [1000/3236], Loss: 2.6700, Perplexity: 14.4396