This record submission is called Long Context Seq2048 v2.

Configuration:

  • Layout: VOCAB_SIZE=1024 NUM_LAYERS=9 MODEL_DIM=512 NUM_HEADS=8 NUM_KV_HEADS=4 MLP_MULT=2
  • Tied output/input embeddings: TIE_EMBEDDINGS=1
  • Sequence length: TRAIN_SEQ_LEN=2048
  • Batching: TRAIN_BATCH_TOKENS=524288
  • Learning rates: TIED_EMBED_LR=0.04 MATRIX_LR=0.032 SCALAR_LR=0.032
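
For orientation, here is a minimal sketch of how these environment variables plausibly map onto a model config (names and defaults are hypothetical; the authoritative parsing lives in train_gpt.py). Note that NUM_KV_HEADS=4 against NUM_HEADS=8 implies grouped-query attention with two query heads per KV head, and MLP_MULT=2 gives a 1024-wide MLP hidden layer.

import os
from dataclasses import dataclass

def env_int(name: str, default: int) -> int:
    # read an integer knob from the environment, falling back to the record default
    return int(os.environ.get(name, default))

@dataclass
class GPTConfig:
    # hypothetical mirror of the layout knobs above; defaults match this record
    vocab_size: int = env_int("VOCAB_SIZE", 1024)
    num_layers: int = env_int("NUM_LAYERS", 9)
    model_dim: int = env_int("MODEL_DIM", 512)
    num_heads: int = env_int("NUM_HEADS", 8)
    num_kv_heads: int = env_int("NUM_KV_HEADS", 4)  # < num_heads => grouped-query attention
    mlp_mult: int = env_int("MLP_MULT", 2)          # MLP hidden dim = 2 * 512 = 1024
    tie_embeddings: bool = env_int("TIE_EMBEDDINGS", 1) == 1
    train_seq_len: int = env_int("TRAIN_SEQ_LEN", 2048)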

Command:

NCCL_IB_DISABLE=1 \
RUN_ID=seq2048_sxm28_full_20260319a \
DATA_PATH=./data/datasets/fineweb10B_sp1024 \
TOKENIZER_PATH=./data/tokenizers/fineweb_1024_bpe.model \
VOCAB_SIZE=1024 \
MAX_WALLCLOCK_SECONDS=600 \
torchrun --standalone --nproc_per_node=8 \
  records/track_10min_16mb/2026-03-18_LongContextSeq2048/train_gpt.py

Verification environment:

  • 8x H100 80GB HBM3
  • all-to-all NV18 topology
  • torch 2.8.0+cu128
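
Before rerunning, a quick sanity check that a box matches this environment (a convenience sketch, not part of the record script):

import subprocess
import torch

assert torch.__version__.startswith("2.8.0"), torch.__version__
assert torch.cuda.device_count() == 8, torch.cuda.device_count()
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))  # expect NVIDIA H100 80GB HBM3
# on an all-to-all SXM box, every GPU pair should report NV18 here
subprocess.run(["nvidia-smi", "topo", "-m"], check=True)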

Key metrics (from train.log in this folder, rerun on the target SXM-class box):

  • Timed training stopped at 11564/20000 steps due to the wallclock cap.
  • Pre-quant eval at stop: val_loss:2.0269, val_bpb:1.2005
  • Post-quant roundtrip eval: val_loss:2.0359, val_bpb:1.2058
  • Exact printed metric: final_int8_zlib_roundtrip_exact val_bpb:1.20576485
  • Train time: 600038ms (step_avg:51.89ms)
  • Peak memory: 10247 MiB allocated, 10488 MiB reserved
  • Serialized model int8+zlib: 15819554 bytes (see the sketch after this list)
  • Code size for this standalone record script: 47716 bytes
  • Total submission size int8+zlib: 15867270 bytes
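
For context on the byte counts above, a minimal sketch of an int8+zlib size measurement, assuming per-tensor symmetric scales; the exact quantizer behind final_int8_zlib_roundtrip_exact lives in train_gpt.py and may differ:

import zlib
import torch

def int8_zlib_bytes(state_dict) -> int:
    # compressed size of an int8 quantization of every tensor (scale metadata omitted for brevity)
    total = 0
    for tensor in state_dict.values():
        t = tensor.detach().float().cpu()
        scale = t.abs().max().clamp(min=1e-8) / 127.0       # per-tensor symmetric scale
        q = (t / scale).round().clamp(-127, 127).to(torch.int8)
        total += len(zlib.compress(q.numpy().tobytes()))
        # a roundtrip eval would reload q.float() * scale and re-run validation
    return total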

Additional full-run reproducibility logs included in this folder:

  • train.log: canonical SXM rerun, SEED=1337, val_bpb=1.20576485
  • train_seed1338.log: SXM rerun, SEED=1338, val_bpb=1.20617460
  • train_seed1339.log: SXM rerun, SEED=1339, val_bpb=1.20715923
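
Assuming the val_bpb:<value> line format shown in the metrics above, the final number in each log can be pulled out mechanically (a hypothetical helper, not shipped in this folder):

import pathlib
import re

for log in ["train.log", "train_seed1338.log", "train_seed1339.log"]:
    matches = re.findall(r"val_bpb:([0-9.]+)", pathlib.Path(log).read_text())
    print(log, matches[-1] if matches else "no val_bpb found")  # last printed value is the final eval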

Record-track significance note:

  • The public repo state for this submission lists the Naive Baseline at 1.2243657.
  • The challenge therefore requires beating 1.2193657 (0.005 below the baseline) to claim a new record.
  • All three included SXM full runs clear that threshold:
    • SEED=1337: 1.20576485
    • SEED=1338: 1.20617460
    • SEED=1339: 1.20715923
  • Sample mean across the three runs: 1.20636623
  • Sample standard deviation: 0.00071667
  • One-sided one-sample t-test against 1.2193657: t=31.42 with df=2, which gives p=0.00051
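
These statistics can be checked directly from the three values above (scipy's one-sided alternative tests whether the true mean val_bpb sits below the threshold; the reported t=31.42 is the magnitude of the signed statistic):

import numpy as np
from scipy import stats

runs = np.array([1.20576485, 1.20617460, 1.20715923])  # SEED=1337, 1338, 1339
threshold = 1.2193657                                  # baseline 1.2243657 minus 0.005

print(runs.mean(), runs.std(ddof=1))                   # 1.20636623, 0.00071667
t, p = stats.ttest_1samp(runs, threshold, alternative="less")
print(t, p)                                            # t ≈ -31.42, p ≈ 0.00051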

Why this folder is standalone:

  • train_gpt.py compiles and runs from inside this record folder and was used for the canonical rerun whose output is saved as train.log.
  • No extra Python source files are required for the training path.
  • The only inputs expected at runtime are the cached dataset and tokenizer paths described in the main repo README.

Included files:

  • train_gpt.py (standalone winning recipe with defaults baked in)
  • README.md (this file)
  • submission.json (leaderboard metadata)
  • train.log (canonical full log from the standalone record script)
  • train_seed1338.log, train_seed1339.log (extra full reruns for reproducibility)
  • logs/seq2048_sxm28_* (raw per-run tee output and trainer text logs from the SXM verification box)