This record captures the Simple Baseline.
Trainer changes in this snapshot:
- current repository train_gpt.py snapshot copied into the record folder
- published fineweb10B_sp1024 dataset and tokenizer loaded from the new Hugging Face export
- 10-minute wallclock cap on 8xH100
- periodic validation every 200 steps on the full fineweb_val_* split (see the loop sketch after this list)
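For orientation, here is a minimal sketch of how the wallclock cap and periodic validation could be enforced in the training loop. The helper functions and control flow are illustrative assumptions, not the actual train_gpt.py code; only the env var names and the step/second budgets come from the launch command and log below.

```python
import os
import time

MAX_WALLCLOCK_SECONDS = int(os.environ.get("MAX_WALLCLOCK_SECONDS", "600"))  # 10-minute cap
VAL_LOSS_EVERY = int(os.environ.get("VAL_LOSS_EVERY", "200"))
NUM_STEPS = 20000  # planned step budget; this run stopped at step 13780 due to the cap

def train_one_step(step: int) -> None:
    """Placeholder for one optimizer step over a 524288-token global batch."""

def run_validation(step: int) -> None:
    """Placeholder for a full pass over the fineweb_val_* split."""

start = time.monotonic()
for step in range(1, NUM_STEPS + 1):
    train_one_step(step)
    if step % VAL_LOSS_EVERY == 0:
        run_validation(step)
    if time.monotonic() - start >= MAX_WALLCLOCK_SECONDS:
        break  # timed training ends here; final eval and serialization follow
```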
Configuration:
- Layout: VOCAB_SIZE=1024 NUM_LAYERS=9 MODEL_DIM=512 NUM_HEADS=8 NUM_KV_HEADS=4 MLP_MULT=2
- Tied output/input embeddings: TIE_EMBEDDINGS=1
- Tied embedding LR: TIED_EMBED_LR=0.05
- Batching: TRAIN_BATCH_TOKENS=524288 TRAIN_SEQ_LEN=1024 (see the config sketch after this list)
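A minimal sketch of how these settings could be gathered into a single config object; the class name, helper, and field handling are assumptions for illustration, not the structures actually used in train_gpt.py.

```python
import os
from dataclasses import dataclass

def _env_int(name: str, default: int) -> int:
    return int(os.environ.get(name, str(default)))

@dataclass
class GPTConfig:
    vocab_size: int = _env_int("VOCAB_SIZE", 1024)
    num_layers: int = _env_int("NUM_LAYERS", 9)
    model_dim: int = _env_int("MODEL_DIM", 512)
    num_heads: int = _env_int("NUM_HEADS", 8)
    num_kv_heads: int = _env_int("NUM_KV_HEADS", 4)        # fewer KV heads than query heads
    mlp_mult: int = _env_int("MLP_MULT", 2)                # MLP hidden dim = mlp_mult * model_dim
    tie_embeddings: bool = os.environ.get("TIE_EMBEDDINGS", "1") == "1"
    tied_embed_lr: float = float(os.environ.get("TIED_EMBED_LR", "0.05"))
    train_batch_tokens: int = _env_int("TRAIN_BATCH_TOKENS", 524288)
    train_seq_len: int = _env_int("TRAIN_SEQ_LEN", 1024)
```

With TIE_EMBEDDINGS=1 the output projection and the token embedding share one weight matrix, which is why a separate learning rate (TIED_EMBED_LR) is specified for that shared parameter.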
Command (track-relevant params):
NCCL_IB_DISABLE=1 \
RUN_ID=hf_verify_sp1024_8gpu \
DATA_PATH=/root/code/parameter-golf/data/datasets/fineweb10B_sp1024 \
TOKENIZER_PATH=/root/code/parameter-golf/data/tokenizers/fineweb_1024_bpe.model \
VOCAB_SIZE=1024 \
MAX_WALLCLOCK_SECONDS=600 \
TRAIN_LOG_EVERY=50 \
VAL_LOSS_EVERY=200 \
torchrun --standalone --nproc_per_node=8 /root/code/parameter-golf/train_gpt.py

Key metrics (from train.log):
- Timed training stopped at 13780/20000 steps due to the wallclock cap.
- Pre-quant eval at stop: val_loss: 2.0606, val_bpb: 1.2172
- Post-quant roundtrip eval: val_loss: 2.0727, val_bpb: 1.2244
- Exact printed metric: final_int8_zlib_roundtrip_exact val_bpb: 1.22436570
- Train time: 600038 ms (step_avg: 43.54 ms)
- Peak memory: 10184 MiB allocated, 10200 MiB reserved
- Serialized model int8+zlib: 15815847 bytes (see the roundtrip sketch after this list)
- Code size: 47642 bytes
- Total submission size int8+zlib: 15863489 bytes
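The "roundtrip" metrics and the int8+zlib byte count indicate the checkpoint is quantized to int8, compressed with zlib, then restored and re-evaluated. A rough sketch of such a size measurement, assuming per-tensor symmetric quantization; the exact scheme in train_gpt.py may differ.

```python
import io
import zlib
import torch

def int8_zlib_size(state_dict: dict) -> int:
    """Return the zlib-compressed size of an int8-quantized state dict.
    Per-tensor symmetric quantization is an assumed scheme for illustration only."""
    packed = {}
    for name, tensor in state_dict.items():
        if tensor.is_floating_point():
            scale = tensor.abs().max().clamp(min=1e-8) / 127.0
            q = torch.clamp((tensor / scale).round(), -127, 127).to(torch.int8)
            packed[name] = (q, scale)  # dequantize later via q.float() * scale
        else:
            packed[name] = tensor
    buffer = io.BytesIO()
    torch.save(packed, buffer)
    return len(zlib.compress(buffer.getvalue()))
```

The reported total submission size is the serialized model plus the code snapshot: 15815847 + 47642 = 15863489 bytes.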
Training volume:
- Global batch: 524288 tokens/step
- Total train tokens seen: 7224688640
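The total is consistent with the completed step count: 13780 steps × 524288 tokens/step = 7224688640 tokens.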
Included files:
- train_gpt.py (code snapshot used for the run)
- train.log (exact remote training log)
- submission.json (leaderboard metadata)