Commit 083396e

Add hardik_top5_run submission package
1 parent 7427de2 commit 083396e

5 files changed

Lines changed: 2166 additions & 0 deletions

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
# Hardik Top5 Run

Draft submission package for the OpenAI Parameter Golf `track_10min_16mb` track.

This folder contains a self-contained copy of the current best local training script, prepared in the expected `records/...` format for pull request submission. The copied `train_gpt.py` has been patched so its default dataset and tokenizer paths resolve from the repository root even when executed from inside this records folder.
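
The path patch described above could look roughly like the sketch below. It is an illustration, not the code in the packaged `train_gpt.py`: the helper names `repo_root_from` and `default_paths` are hypothetical, and the sketch assumes the script sits three folders below the repository root (`records/track_10min_16mb/hardik_top5_run/`).

```python
import os
from pathlib import Path


def repo_root_from(script_path, depth=3):
    """Return the directory `depth` folders above the folder holding the script.

    Assumption: the records folder is three levels below the repo root.
    """
    return Path(script_path).resolve().parents[depth]


def default_paths(script_path, env=os.environ):
    """Resolve dataset and tokenizer paths, letting env vars override defaults."""
    root = repo_root_from(script_path)
    data = env.get("DATA_PATH", root / "data" / "datasets" / "fineweb10B_sp8192")
    tok = env.get(
        "TOKENIZER_PATH", root / "data" / "tokenizers" / "fineweb_8192_bpe.model"
    )
    return {"data": Path(data), "tokenizer": Path(tok)}


if __name__ == "__main__":
    # Resolves relative to this file, not the current working directory.
    print(default_paths(__file__))
```

Anchoring on the script location rather than the working directory is what lets the same defaults work from the repository root and from inside the records folder.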

## Status

- Submission structure: ready
- Script packaging: ready
- Relative-path cleanup: ready
- Reproducibility notes: ready
- Final leaderboard claim: pending a fresh logged run for this exact script

## Architecture Summary

The current model is a Parameter Golf "podium build" based on the LeakyReLU^2 + TTT + Parallel Muon family, with the following default stack:

- Vocabulary: SentencePiece 8192-token model
- Backbone: 11 transformer layers, 512 hidden dim, 8 attention heads, 4 KV heads
- MLP: 3.0x expansion with LeakyReLU(0.5)^2 activation
- Residual layout: parallel residual attention + MLP path
- Recurrence: block recurrence enabled by default on layers `4,5` with `RECURRENCE_LOOPS=3`
- Attention extras: QK gain, partial RoPE (`ROPE_DIMS=16`), XSA on the last 4 layers
- Token enrichments: Bigram hash embedding and shared value embeddings
- Optimizers: AdamW for token/scalar groups + custom Parallel Muon for matrix banks
- Averaging: EMA by default, optional SWA/LAWA
- Compression path: mixed int6 / int8 quantization with lzma export
- Eval extras: sliding-window validation and optional legal score-first TTT
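
As a small illustration of the MLP bullet above, the sketch below reads "LeakyReLU(0.5)^2" as a leaky ReLU with negative slope 0.5 followed by an elementwise square. That reading is an assumption, the function names are hypothetical, and the packaged script applies its activation to tensors rather than Python floats.

```python
HIDDEN_DIM = 512        # from the backbone bullet
MLP_EXPANSION = 3.0     # from the MLP bullet


def leaky_relu(x, negative_slope=0.5):
    """Leaky ReLU on a scalar: identity for x >= 0, scaled otherwise."""
    return x if x >= 0.0 else negative_slope * x


def leaky_relu_squared(x, negative_slope=0.5):
    """Assumed reading of LeakyReLU(0.5)^2: square the leaky ReLU output."""
    y = leaky_relu(x, negative_slope)
    return y * y


# Width of the MLP's inner projection under a 3.0x expansion.
mlp_width = int(HIDDEN_DIM * MLP_EXPANSION)

if __name__ == "__main__":
    print(mlp_width)
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(x, leaky_relu_squared(x))
```

Under this reading the output is never negative, and negative inputs are damped by a factor of 0.25 relative to positive inputs of the same magnitude.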

## Default Model Size

- Parameter count (default config): `31,581,276`
- Code bytes for this packaged `train_gpt.py`: `97,310`

Note: the contest artifact limit is code bytes plus compressed model bytes. This folder does not yet include a verified compressed artifact size for the exact packaged script because a fresh training/eval run has not been logged for this copy yet.
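
The size accounting in the note can be checked with a few lines of arithmetic. The sketch below is illustrative: `ASSUMED_LIMIT_BYTES` reads the track name's "16mb" as 16,000,000 bytes, which is an assumption to confirm against the contest rules, and the function names are hypothetical.

```python
CODE_BYTES = 97_310               # from this README
PARAM_COUNT = 31_581_276          # from this README
ASSUMED_LIMIT_BYTES = 16_000_000  # assumption: "16mb" read as decimal megabytes


def total_artifact_bytes(code_bytes, compressed_model_bytes):
    """Artifact size as described in the note: code plus compressed model."""
    return code_bytes + compressed_model_bytes


def model_byte_budget(limit_bytes, code_bytes):
    """Bytes left for the compressed model once the code is accounted for."""
    return limit_bytes - code_bytes


def fits(limit_bytes, code_bytes, compressed_model_bytes):
    """True when the whole artifact is within the limit."""
    return total_artifact_bytes(code_bytes, compressed_model_bytes) <= limit_bytes


if __name__ == "__main__":
    budget = model_byte_budget(ASSUMED_LIMIT_BYTES, CODE_BYTES)
    print("model byte budget:", budget)
    print("average bits per parameter:", budget * 8 / PARAM_COUNT)
```

Under the assumed limit this leaves 15,902,690 bytes for the compressed model, or roughly 4.03 bits per parameter on average, which is why the verified compressed size matters before any claim is made.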

## Innovations Used

1. SP8192 tokenizer defaults
2. Depth recurrence through repeated middle blocks
3. Parallel residual transformer blocks
4. Learned QK gain scaling
5. Parallel Muon / MuonEq-R optimizer path
6. Hessian SDClip for GPTQ-style clipping
7. GPTQ-style embedding quantization
8. Optional legal score-first TTT with SGD or Adam
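
To make item 2 concrete, here is one possible reading of depth recurrence with the defaults listed in the architecture summary (layers `4,5`, `RECURRENCE_LOOPS=3`). The sketch assumes 0-indexed layers, that the two layers are run together as one block, and that the loop count is the total number of passes. The packaged script may differ on any of these points, and `layer_schedule` is a hypothetical helper.

```python
def layer_schedule(num_layers, recur_layers, loops):
    """Expand layer indices so the contiguous recurrent block runs `loops` times.

    Assumptions: indices are 0-based, `recur_layers` is contiguous, and
    `loops` counts total passes through the block (not extra passes).
    """
    first, last = min(recur_layers), max(recur_layers)
    block = list(range(first, last + 1))
    before = list(range(first))
    after = list(range(last + 1, num_layers))
    return before + block * loops + after


if __name__ == "__main__":
    schedule = layer_schedule(num_layers=11, recur_layers=[4, 5], loops=3)
    print(schedule)
    print("effective depth:", len(schedule))
```

The appeal of this layout is that the repeated passes reuse the same weights, so effective depth grows without adding parameters to the artifact.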

## Hardware Used

- Packaging/validation of this submission folder: local Windows machine
- Target contest hardware: 8x H100 80GB SXM
- Final authoritative leaderboard run hardware for this exact script: `TBD`

## Training Time

- Default script wallclock cap: `600` seconds (`MAX_WALLCLOCK_SECONDS=600`)
- Fresh measured 8xH100 runtime for this packaged copy: `TBD`

## Achieved Score

- Fresh logged `val_bpb` for this packaged copy: `TBD`
- Fresh logged `val_loss` for this packaged copy: `TBD`
- Verified total submission bytes for this packaged copy: `TBD`

Do not claim a leaderboard score from this folder until `train.log` and `submission.json` are updated from a real run of the included script.
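
For readers comparing the two metrics, the usual relationship between a token-level loss and bits per byte can be sketched as follows. This assumes `val_loss` is the mean cross-entropy in nats per token and that the token and byte counts cover the same validation text; the packaged script's own bookkeeping is authoritative, and `bits_per_byte` is a hypothetical helper.

```python
import math


def bits_per_byte(val_loss_nats, num_tokens, num_bytes):
    """Convert mean nats-per-token loss into bits per byte of raw text.

    nats -> bits divides by ln(2); per-token -> per-byte multiplies by
    the tokens-per-byte ratio of the evaluated text.
    """
    bits_per_token = val_loss_nats / math.log(2)
    return bits_per_token * num_tokens / num_bytes


if __name__ == "__main__":
    # Sanity check: a loss of ln(2) nats with one token per byte is 1 bit/byte.
    print(bits_per_byte(math.log(2), num_tokens=10, num_bytes=10))
```

Because the tokens-per-byte ratio depends on the tokenizer, `val_loss` values are not comparable across vocabularies while `val_bpb` values are.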

## Reproducibility

The script is designed to be configurable through environment variables and avoids absolute machine-specific paths.

- Seed default: `1337`
- Dataset default: resolved from repo root as `data/datasets/fineweb10B_sp8192`
- Tokenizer default: resolved from repo root as `data/tokenizers/fineweb_8192_bpe.model`
- Optional acceleration imports (`flash_attn_interface`, `zstandard`) have safe fallbacks
- No network calls are made during training or evaluation
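
The two mechanisms in this list, environment-variable defaults and optional imports with fallbacks, follow a common pattern that can be sketched as below. The helpers `env_int` and `optional_import` are hypothetical names for illustration, not functions from the packaged script, and how the script actually uses each optional module is not shown here.

```python
import importlib
import os


def env_int(name, default, env=os.environ):
    """Read an integer setting from the environment, falling back to a default."""
    return int(env.get(name, default))


def optional_import(module_name):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


if __name__ == "__main__":
    seed = env_int("SEED", 1337)
    wallclock_cap = env_int("MAX_WALLCLOCK_SECONDS", 600)
    zstd = optional_import("zstandard")
    flash = optional_import("flash_attn_interface")
    print("seed:", seed, "wallclock cap:", wallclock_cap)
    print("zstandard available:", zstd is not None)
    print("flash_attn_interface available:", flash is not None)
```

Callers then branch on `None` to pick the fallback path, so a missing accelerator changes speed rather than causing an import failure.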

## Run From This Folder

From repository root:

```bash
cd records/track_10min_16mb/hardik_top5_run
torchrun --standalone --nproc_per_node=8 train_gpt.py
```

Example with explicit paths:

```bash
cd records/track_10min_16mb/hardik_top5_run
DATA_PATH=../../../data/datasets/fineweb10B_sp8192 \
TOKENIZER_PATH=../../../data/tokenizers/fineweb_8192_bpe.model \
RUN_ID=hardik_top5_run_seed1337 \
SEED=1337 \
torchrun --standalone --nproc_per_node=8 train_gpt.py
```

## What To Update Before PR

1. Run the packaged script on the intended hardware.
2. Replace the placeholder `train.log` with the real training log.
3. Update `submission.json` with real `val_loss`, `val_bpb`, and `bytes_total`.
4. If submitting as a new record, include enough independent seeds to satisfy the repo significance rule.
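
Steps 2 and 3 can be partly automated. The sketch below assumes the real log contains lines shaped like the placeholder's suggested snippet, for example `final_int6_roundtrip_exact val_loss:<number> val_bpb:<number>`. That line format, the helper names, and passing `bytes_total` by hand are all assumptions, since the real log format has not been confirmed for this copy.

```python
import json
import re
from pathlib import Path

_NUM = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
# Assumed line shape: "<tag> val_loss:<number> val_bpb:<number>"
METRIC_LINE = re.compile(
    rf"^(?P<tag>\w+)\s+val_loss:(?P<val_loss>{_NUM})\s+val_bpb:(?P<val_bpb>{_NUM})"
)


def parse_metrics(log_text):
    """Map each tag found in the log to its val_loss / val_bpb pair."""
    found = {}
    for line in log_text.splitlines():
        match = METRIC_LINE.match(line.strip())
        if match:
            found[match["tag"]] = {
                "val_loss": float(match["val_loss"]),
                "val_bpb": float(match["val_bpb"]),
            }
    return found


def update_submission(json_path, log_path, tag, bytes_total):
    """Copy the metrics for `tag` from the log into submission.json."""
    metrics = parse_metrics(Path(log_path).read_text(encoding="utf-8"))
    if tag not in metrics:
        # Commented placeholder lines never match, so this also catches
        # an attempt to fill the JSON from the placeholder log.
        raise ValueError(f"no '{tag}' metrics line found in {log_path}")
    json_path = Path(json_path)
    data = json.loads(json_path.read_text(encoding="utf-8"))
    data.update(metrics[tag], bytes_total=int(bytes_total))
    json_path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return data


if __name__ == "__main__":
    print(parse_metrics(Path("train.log").read_text(encoding="utf-8")))
```

Which tag to pass (round-trip, sliding-window, or `legal_ttt`) depends on the evaluation being claimed, so the sketch leaves that choice to the caller.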

## Notes

- This folder intentionally mirrors the structure of existing successful records in `records/track_10min_16mb/`.
- The root-level `train_gpt.py` remains the active development file; this folder is the frozen submission copy for PR review.
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
numpy
torch
sentencepiece
tqdm
huggingface-hub
datasets
tiktoken
setuptools
typing-extensions==4.15.0

# Optional accelerators; the packaged script has fallbacks if these are absent.
zstandard
flash-attn
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
{
  "author": "Hardik Bhalekar",
  "github_id": "hardik-bhalekar",
  "name": "Hardik Top5 Run",
  "blurb": "Draft packaging of the current local SP8192 + recurrence + parallel residual + Muon / TTT stack. Update val_loss, val_bpb, bytes_total, and wording after running the packaged script and attaching real logs.",
  "date": "2026-04-26T00:00:00Z",
  "val_loss": null,
  "val_bpb": null,
  "bytes_total": null,
  "bytes_code": 97310
}
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
# PLACEHOLDER TRAIN LOG
# Replace this file with the real log produced by:
# cd records/track_10min_16mb/hardik_top5_run
# torchrun --standalone --nproc_per_node=8 train_gpt.py
#
# Expected items to preserve in the final log:
# - hardware banner / nvidia-smi output
# - model_params line
# - training step logs
# - final quantized artifact byte count
# - final val_loss / val_bpb lines
# - any sliding-window or legal_ttt evaluation lines you intend to claim
#
# Suggested final snippet to verify:
# final_int6_roundtrip val_loss:...
# final_int6_roundtrip_exact val_loss:... val_bpb:...
# final_int6_sliding_window val_loss:... val_bpb:...
# legal_ttt val_loss:... val_bpb:... # only if claimed
#
# Fresh run metadata to fill:
# RUN_ID=
# SEED=
# GPU=
# WALLCLOCK_SECONDS=
# BYTES_TOTAL=
# VAL_LOSS=
# VAL_BPB=
