This record submission is called Long Context Seq2048 v2.
Configuration:
- Layout: VOCAB_SIZE=1024 NUM_LAYERS=9 MODEL_DIM=512 NUM_HEADS=8 NUM_KV_HEADS=4 MLP_MULT=2
- Tied output/input embeddings: TIE_EMBEDDINGS=1
- Sequence length: TRAIN_SEQ_LEN=2048
- Batching: TRAIN_BATCH_TOKENS=524288
- Learning rates: TIED_EMBED_LR=0.04 MATRIX_LR=0.032 SCALAR_LR=0.032
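The launch command below passes these settings as environment variables. As a rough sketch of how they could be consumed (assuming train_gpt.py reads upper-case environment variables matching the names above; the `Hyperparams` container and `from_env` helper here are hypothetical, not taken from the record script):

```python
import os
from dataclasses import dataclass

@dataclass
class Hyperparams:
    # Defaults mirror the record configuration listed above.
    vocab_size: int = 1024
    num_layers: int = 9
    model_dim: int = 512
    num_heads: int = 8
    num_kv_heads: int = 4
    mlp_mult: int = 2
    tie_embeddings: bool = True
    train_seq_len: int = 2048
    train_batch_tokens: int = 524288
    tied_embed_lr: float = 0.04
    matrix_lr: float = 0.032
    scalar_lr: float = 0.032

def from_env() -> Hyperparams:
    """Override any default with an upper-case environment variable of the same name."""
    hp = Hyperparams()
    for field in hp.__dataclass_fields__:
        raw = os.environ.get(field.upper())
        if raw is not None:
            kind = type(getattr(hp, field))
            # Booleans arrive as "0"/"1", so go through int first.
            setattr(hp, field, kind(int(raw)) if kind is bool else kind(raw))
    return hp
```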
Command:
NCCL_IB_DISABLE=1 \
RUN_ID=seq2048_sxm28_full_20260319a \
DATA_PATH=./data/datasets/fineweb10B_sp1024 \
TOKENIZER_PATH=./data/tokenizers/fineweb_1024_bpe.model \
VOCAB_SIZE=1024 \
MAX_WALLCLOCK_SECONDS=600 \
torchrun --standalone --nproc_per_node=8 \
records/track_10min_16mb/2026-03-18_LongContextSeq2048/train_gpt.py
Verification environment:
- 8x H100 80GB HBM3
- all-to-all NV18 topology
- torch 2.8.0+cu128
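Before launching, a small pre-flight check of the rerun box against this environment can save a wasted timed run. This is a sketch, not part of the submission; the version and GPU expectations are the ones listed above:

```python
import torch

def check_environment(expected_torch: str = "2.8.0", expected_gpus: int = 8) -> None:
    # The record was verified on torch 2.8.0+cu128 with 8x H100 80GB.
    assert torch.__version__.startswith(expected_torch), torch.__version__
    assert torch.cuda.is_available()
    assert torch.cuda.device_count() == expected_gpus, torch.cuda.device_count()
    for i in range(expected_gpus):
        name = torch.cuda.get_device_name(i)
        print(f"cuda:{i} -> {name}")
        assert "H100" in name, name

if __name__ == "__main__":
    check_environment()
```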
Key metrics (from train.log in this folder, rerun on the target SXM-class box):
- Timed training stopped at 11564/20000 steps due to the wallclock cap.
- Pre-quant eval at stop: val_loss: 2.0269, val_bpb: 1.2005
- Post-quant roundtrip eval: val_loss: 2.0359, val_bpb: 1.2058
- Exact printed metric: final_int8_zlib_roundtrip_exact val_bpb: 1.20576485
- Train time: 600038 ms (step_avg: 51.89 ms)
- Peak memory: 10247 MiB allocated, 10488 MiB reserved
- Serialized model int8+zlib: 15819554 bytes
- Code size for this standalone record script: 47716 bytes
- Total submission size int8+zlib: 15867270 bytes
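Two quick consistency checks on these numbers. The size accounting is exact; the bits-per-byte relation is an assumption (val_bpb = (val_loss / ln 2) / bytes_per_token, with the validation set's bytes-per-token ratio not stated in this README):

```python
import math

# Submission size accounting from the metrics above.
model_bytes = 15_819_554   # serialized model, int8 + zlib
code_bytes = 47_716        # standalone record script
total_bytes = 15_867_270   # reported total submission size
assert model_bytes + code_bytes == total_bytes

# Implied bytes-per-token ratio under the assumed bpb definition.
for loss, bpb in [(2.0269, 1.2005), (2.0359, 1.2058)]:
    bytes_per_token = (loss / math.log(2)) / bpb
    print(f"loss={loss}  bpb={bpb}  implied bytes/token={bytes_per_token:.3f}")
# Both pairs imply roughly 2.44 bytes per token, so the pre- and post-quant
# evals are internally consistent with each other.
```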
Additional full-run reproducibility logs included in this folder:
- train.log: canonical SXM rerun, SEED=1337, val_bpb=1.20576485
- train_seed1338.log: SXM rerun, SEED=1338, val_bpb=1.20617460
- train_seed1339.log: SXM rerun, SEED=1339, val_bpb=1.20715923
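To extract the headline number from each log, a short parsing sketch, assuming every log prints the same `final_int8_zlib_roundtrip_exact val_bpb:` line quoted in the key metrics above:

```python
import re
from pathlib import Path

PATTERN = re.compile(r"final_int8_zlib_roundtrip_exact\s+val_bpb:\s*([0-9.]+)")

def final_bpb(log_path: str) -> float:
    """Return the last exact roundtrip val_bpb printed in a training log."""
    matches = PATTERN.findall(Path(log_path).read_text())
    if not matches:
        raise ValueError(f"no roundtrip val_bpb line found in {log_path}")
    return float(matches[-1])

for log in ["train.log", "train_seed1338.log", "train_seed1339.log"]:
    print(log, final_bpb(log))
```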
Record-track significance note:
- The public repo state for this submission has Naive Baseline at 1.2243657.
- The challenge therefore requires beating 1.2193657 (0.005 bpb below the baseline) to claim a new record.
- All three included SXM full runs clear that threshold: SEED=1337: 1.20576485, SEED=1338: 1.20617460, SEED=1339: 1.20715923
- Sample mean across the three runs: 1.20636623
- Sample standard deviation: 0.00071667
- One-sided one-sample t-test against 1.2193657: t=31.42 with df=2, which gives p=0.00051
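These statistics can be reproduced from the three per-seed values using only the standard library; for df=2 the Student-t tail has an exact closed form, so no SciPy is needed:

```python
import math

runs = [1.20576485, 1.20617460, 1.20715923]  # SEED=1337, 1338, 1339
threshold = 1.2193657

n = len(runs)
mean = sum(runs) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in runs) / (n - 1))  # sample std dev
t = (threshold - mean) / (sd / math.sqrt(n))                  # one-sample t statistic

# For df = 2 the Student-t upper tail is P(T > t) = 1/2 - t / (2 * sqrt(t**2 + 2)).
p_one_sided = 0.5 - t / (2 * math.sqrt(t ** 2 + 2))

print(f"mean={mean:.8f} sd={sd:.8f} t={t:.2f} p={p_one_sided:.5f}")
# -> mean=1.20636623 sd=0.00071667 t=31.42 p=0.00051
```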
Why this folder is standalone:
- train_gpt.py compiles from inside this record folder and was used for the canonical rerun whose output is saved as train.log.
- No extra Python source files are required for the training path.
- The only inputs expected at runtime are the cached dataset and tokenizer paths described in the main repo README.
Included files:
- train_gpt.py (standalone winning recipe with defaults baked in)
- README.md (this file)
- submission.json (leaderboard metadata)
- train.log (canonical full log from the standalone record script)
- train_seed1338.log, train_seed1339.log (extra full reruns for reproducibility)
- logs/seq2048_sxm28_* (raw per-run tee output and trainer text logs from the SXM verification box)
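A trivial presence check over this list (a sketch; run it from inside the record folder, with the glob for the raw logs taken from the listing above):

```python
from pathlib import Path

here = Path(".")  # run from inside the record folder

expected = [
    "train_gpt.py",
    "README.md",
    "submission.json",
    "train.log",
    "train_seed1338.log",
    "train_seed1339.log",
]
missing = [name for name in expected if not (here / name).is_file()]
assert not missing, f"missing files: {missing}"
assert list(here.glob("logs/seq2048_sxm28_*")), "no raw per-run logs found under logs/"
print("all listed submission files present")
```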