Commit 1fe3911 (parent 2e16f99): `docs: add scylla sub105 design`

# Scylla Sub-1.05 Design

## Objective

Reach a defensible sub-`1.05` `val_bpb` submission by reproducing the measured
Scylla frontier before adding any new ideas. The best local March record is
`1.12278022`; the SP8192 frontier in PR #1797 reaches `1.06157`; the only
measured sub-`1.05` lane found in the current repo/PR landscape is PR #1813, at
a three-seed mean of `0.94166052`.

## Decision

Use the PR #1813 Scylla lane as the primary path. Keep the current SP1024/SP8192
scripts intact and create an isolated Scylla reproduction lane with explicit
provenance, asset checks, artifact checks, and launcher scripts. Do not mutate
`train_gpt_kl.py` until the Scylla lane has reproduced the reference behavior.

## Reference Configuration

The target record is:

- Record path: `records/track_10min_16mb/2026-04-25_Scylla_QK525_DepthRecurrence_Experiment`
- Score: `0.94166052` `val_bpb` (3-seed mean, std `0.00066536`)
- Seeds: `1337`, `42`, `2025`
- Largest artifact across seeds: `15,868,157` bytes, leaving only `131,843`
  bytes under the decimal `16,000,000`-byte cap.
- Architecture: 11 physical layers, model dim 512, 8 query heads, 4 KV heads,
  Scylla vocab size `998`, tied embeddings, training sequence length `2048`,
  XSA on all layers.
- Core knobs: `QK_GAIN_INIT=5.25`, `NUM_LOOPS=2`, `LOOP_START=3`,
  `LOOP_END=5`, `ENABLE_LOOPING_AT=0.35`, `BIGRAM_VOCAB_SIZE=2816`,
  `BIGRAM_DIM=40`, `USE_GPTQ=1`, `GPTQ_RESERVE_MS=9000`, `TTT_ENABLED=0`.
- Compression: full GPTQ int6 quantization, quant payload serialized with
  `torch.save`, then compressed with `lzma.compress` at preset 6.
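
The final packing step can be sketched as below. This sketch assumes the GPTQ int6 state has already been serialized to bytes. The reference script does that with `torch.save`, but this version accepts plain bytes so it runs without PyTorch. `pack_payload` and `unpack_payload` are hypothetical helper names, not functions from the PR #1813 script.

```python
import lzma


def pack_payload(raw: bytes, preset: int = 6) -> bytes:
    """Compress a serialized quant payload with LZMA (xz container) at the given preset."""
    return lzma.compress(raw, preset=preset)


def unpack_payload(blob: bytes) -> bytes:
    """Inverse of pack_payload; lzma.decompress auto-detects the container format."""
    return lzma.decompress(blob)


if __name__ == "__main__":
    # Stand-in for torch.save output; real quantized weights compress far less well.
    raw = bytes(range(64)) * 4096
    blob6 = pack_payload(raw, preset=6)
    blob9 = pack_payload(raw, preset=9)
    assert unpack_payload(blob6) == raw  # LZMA itself is lossless
    print(f"raw={len(raw):,} preset6={len(blob6):,} preset9={len(blob9):,} bytes")
```

Only `len(blob)` counts toward the cap. The later preset `6/9` ablation can reuse the same two calls on a real payload. Whether preset 9 actually saves bytes on GPTQ output is something to measure, not assume.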

## Why This Lane

Compression-only work cannot close the roughly `0.073` BPB gap from `1.1228` to
sub-`1.05`. The useful compression findings are mostly negative: byte shuffling
makes real artifacts larger, FP16 last-layer escape hatches exceed the cap, and
INT4 hurts quality too much. PR #1813 crosses the target by changing the
tokenizer/data regime and the architecture schedule (depth recurrence) while
still fitting under the size cap.

## Architecture

Create a separate lane with four responsibilities:

1. **Provenance capture**: copy the PR #1813 `train_gpt.py`, logs, and
   `submission.json` into `frontier_sources/scylla_pr1813/` for local review.
2. **Asset validation**: add a script that verifies the Scylla tokenizer and
   dataset assets exist before any paid GPU launch.
3. **Run launch**: add a shell launcher that runs the exact reference config
   for one canonical seed or all three.
4. **Artifact validation**: add a checker that sums code bytes and compressed
   model bytes, compares the total against the `16,000,000`-byte cap, and fails
   if the remaining headroom is below a configurable safety margin.
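
Here is a minimal sketch of the artifact checker. It assumes the artifact consists of exactly two files, the training script and the compressed model, and that their on-disk sizes add up to the submission size. `headroom` and `check_artifact` are hypothetical names. How PR #1813 actually counts code bytes should be confirmed against its logs.

```python
from pathlib import Path

CAP_BYTES = 16_000_000  # decimal cap; 16 MiB (16,777,216) is NOT the limit


def headroom(code_bytes: int, model_bytes: int, cap: int = CAP_BYTES) -> int:
    """Bytes left under the cap; a negative value means the artifact is over the limit."""
    return cap - (code_bytes + model_bytes)


def check_artifact(code_path: Path, model_path: Path, min_margin: int) -> int:
    """Return the measured headroom, exiting non-zero if it is below min_margin."""
    left = headroom(code_path.stat().st_size, model_path.stat().st_size)
    if left < min_margin:
        # SystemExit with a message prints it to stderr and exits with status 1.
        raise SystemExit(f"artifact too large: {left:,} bytes left, need >= {min_margin:,}")
    return left
```

Take the PR #1813 worst-case total of `15,868,157` bytes as an example: `16,000,000 - 15,868,157 = 131,843` bytes of headroom remain. Exiting non-zero lets a shell launcher test the result with a plain status check.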

## Data Flow

The launcher passes explicit env vars into the copied Scylla `train_gpt.py`.
Training reads only training shards, both during optimization and during GPTQ
calibration. Validation stays disabled during training via `VAL_LOSS_EVERY=0`;
scoring runs only after the wallclock stop. The artifact checker runs after
training and validates the combined size of the compressed model and the
training script.
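
The doc calls for a shell launcher. This Python sketch only illustrates the env-rendering and dry-run logic such a launcher needs. The knob values come from the Reference Configuration section. The `SEED` variable name and the `torchrun --standalone --nproc_per_node=8` command line are assumptions; check both against the copied `train_gpt.py` and the PR #1813 logs.

```python
import os
import subprocess

# Reference knobs from the design doc (PR #1813 config), passed explicitly.
REFERENCE_ENV = {
    "QK_GAIN_INIT": "5.25",
    "NUM_LOOPS": "2",
    "LOOP_START": "3",
    "LOOP_END": "5",
    "ENABLE_LOOPING_AT": "0.35",
    "BIGRAM_VOCAB_SIZE": "2816",
    "BIGRAM_DIM": "40",
    "USE_GPTQ": "1",
    "GPTQ_RESERVE_MS": "9000",
    "TTT_ENABLED": "0",
    "VAL_LOSS_EVERY": "0",  # no validation passes during training
}
CANONICAL_SEEDS = (1337, 42, 2025)


def render_env(seed: int) -> dict[str, str]:
    """Exact env overlay for one seed; SEED as the variable name is an assumption."""
    if seed not in CANONICAL_SEEDS:
        raise ValueError(f"non-canonical seed {seed}")
    return {**REFERENCE_ENV, "SEED": str(seed)}


def launch(seed: int, script: str, dry_run: bool = True) -> list[str]:
    """Print (dry run) or execute the run; the torchrun flags are assumptions."""
    cmd = ["torchrun", "--standalone", "--nproc_per_node=8", script]
    overlay = render_env(seed)
    if dry_run:
        for key in sorted(overlay):
            print(f"{key}={overlay[key]}")
        print(" ".join(cmd))
    else:
        # Explicit knobs override anything inherited from the shell environment.
        subprocess.run(cmd, env={**os.environ, **overlay}, check=True)
    return cmd
```

`launch(1337, "frontier_sources/scylla_pr1813/train_gpt.py")` prints the env overlay and the command without starting a run. Pass `dry_run=False` only on the GPU host. Merging over `os.environ` keeps `PATH` and CUDA variables intact while the explicit knobs take precedence.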

## Compliance Guardrails

- Keep `TTT_ENABLED=0` for the first reproduction.
- Reject cache/PPM/SLOT/ETLB-style additions in the Scylla preflight.
- Allow Scylla depth recurrence combined with GPTQ only for the exact proven
  loop schedule (the reference `NUM_LOOPS`, `LOOP_START`, `LOOP_END`, and
  `ENABLE_LOOPING_AT` values).
- Use decimal bytes (`16,000,000`), not MiB.
- Treat the `131,843`-byte PR #1813 margin as fragile; any code or serializer
  growth must be offset by measured artifact savings.
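
These guardrails could be encoded as a preflight check, sketched here under two assumptions. First, the forbidden-feature test is a hypothetical substring match on env var names, because the real knob names for cache/PPM/SLOT/ETLB features are not specified here. Second, `PROVEN_SCHEDULE` simply mirrors the reference loop knobs.

```python
# Reference loop schedule from PR #1813; other values are rejected while GPTQ is on.
PROVEN_SCHEDULE = {
    "NUM_LOOPS": "2",
    "LOOP_START": "3",
    "LOOP_END": "5",
    "ENABLE_LOOPING_AT": "0.35",
}
# Hypothetical heuristic: flag any env var whose name mentions a banned feature.
FORBIDDEN_MARKERS = ("CACHE", "PPM", "SLOT", "ETLB")


def preflight(env: dict[str, str]) -> list[str]:
    """Return guardrail violations for a rendered env overlay; empty means pass."""
    problems = []
    # TTT must be explicitly disabled, not merely left at the script default.
    if env.get("TTT_ENABLED") != "0":
        problems.append("TTT_ENABLED must be explicitly 0 for the first reproduction")
    if env.get("USE_GPTQ") == "1":
        for key, want in PROVEN_SCHEDULE.items():
            if env.get(key) != want:
                problems.append(f"{key}={env.get(key)!r} differs from proven value {want}")
    for key in env:
        if any(marker in key.upper() for marker in FORBIDDEN_MARKERS):
            problems.append(f"forbidden knob {key}")
    return problems
```

This check refuses any non-reference loop schedule while `USE_GPTQ=1`. The later `4-5` loop ablation would therefore need a deliberate, logged override rather than a silent bypass.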

## Testing Strategy

Local tests should run without a GPU and without the Scylla dataset:

- Python compile checks for new scripts.
- Asset-check tests using temporary fake files.
- Artifact-check tests using temporary fake model/code files.
- Launcher dry-run or env-render check that confirms exact PR #1813 defaults.
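
The following asset-check test follows the local test plan. It uses temporary fake files, so it needs neither a GPU nor the real Scylla data. The file names are hypothetical placeholders; the real tokenizer and shard paths should come from the copied PR #1813 sources.

```python
import tempfile
from pathlib import Path


def missing_assets(paths: list[Path]) -> list[Path]:
    """Return required asset paths that are absent or zero bytes long."""
    return [p for p in paths if not p.is_file() or p.stat().st_size == 0]


def require_assets(paths: list[Path]) -> None:
    """Abort before any paid GPU launch if an asset is missing."""
    missing = missing_assets(paths)
    if missing:
        raise SystemExit("missing Scylla assets:\n" + "\n".join(map(str, missing)))


def test_missing_assets_with_fake_files() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        tokenizer = root / "scylla_tokenizer.model"  # hypothetical name
        shard = root / "scylla_train_000000.bin"  # hypothetical name
        tokenizer.write_bytes(b"fake tokenizer")
        assert missing_assets([tokenizer, shard]) == [shard]
        shard.write_bytes(b"\x00" * 16)
        assert missing_assets([tokenizer, shard]) == []
        require_assets([tokenizer, shard])  # passes silently


if __name__ == "__main__":
    test_missing_assets_with_fake_files()
    print("asset check tests passed")
```

The test function uses only bare `assert`s, so pytest can also collect and run it unchanged.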

GPU validation is staged:

1. Run one seed with the exact reference config and no ablations.
2. Compare steps, artifact bytes, and final BPB to the PR #1813 logs.
3. Run all three seeds only after the one-seed reproduction is within expected
   tolerance.
4. Only then test narrow ablations: `BIGRAM_DIM=36/40/44`,
   `QK_GAIN_INIT=5.0/5.25`, loop range `3-5` vs `4-5`, and LZMA preset `6/9`.
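
The comparison in step 2 can be sketched as a small gate. The reference mean, std, and worst-case artifact come from the Reference Configuration. The reference step count must be read from the PR #1813 logs. The `k=3` standard-deviation band and the 2% step slack are assumed tolerances, not project rules.

```python
from dataclasses import dataclass

REF_MEAN_BPB = 0.94166052        # PR #1813 three-seed mean val_bpb
REF_STD_BPB = 0.00066536         # PR #1813 three-seed std
REF_WORST_ARTIFACT = 15_868_157  # largest PR #1813 artifact, decimal bytes


@dataclass
class RunSummary:
    steps: int
    artifact_bytes: int
    val_bpb: float


def reproduction_issues(run: RunSummary, ref_steps: int,
                        k: float = 3.0, step_slack: float = 0.02) -> list[str]:
    """List reasons a one-seed run does not match PR #1813 (empty list means pass)."""
    issues = []
    if abs(run.val_bpb - REF_MEAN_BPB) > k * REF_STD_BPB:
        issues.append(f"val_bpb {run.val_bpb:.8f} outside mean +/- {k} std")
    if abs(run.steps - ref_steps) > step_slack * ref_steps:
        issues.append(f"steps {run.steps} vs reference {ref_steps} (> {step_slack:.0%} off)")
    if run.artifact_bytes > REF_WORST_ARTIFACT:
        issues.append(f"artifact {run.artifact_bytes:,} bytes exceeds PR #1813 worst case")
    return issues
```

With only three reference seeds the std estimate is noisy. Treat the band as a sanity check on the reproduction, not a statistical test.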

## Rejected Approaches

- **Compression-only local cleanup**: useful for hygiene, but it cannot gain
  enough BPB.
- **SP8192 as primary**: lower compliance risk, but the measured frontier is
  around `1.06157`, still above the target.
- **PPM/cache lane**: potentially strong but compliance-risky; keep it as a
  separate research lane, not the primary submission path.