Skip to content

Commit ad56a21

Browse files
committed
Non-record: Ternary MLP Quantization — void fraction applied to PG
Ternary {-1,0,+1} GPTQ for MLP layers. 10.9MB artifact (5MB under cap). val_bpb 1.3262 (post-hoc, QAT in development). The 30% void fraction = maximum entropy of ternary code (1.581 bits). Cross-domain evidence: PNN (76.5% ternary, p=2.18e-11), neuroscience (Luppi 2026, 25-40% competitive edges), Void Memory (30% target). Competition wish list item: "Ternary quantization — implementation"
1 parent 75700cb commit ad56a21

4 files changed

Lines changed: 303 additions & 0 deletions

File tree

Lines changed: 87 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,87 @@
1+
# Non-record: Ternary MLP Quantization — Void Fraction Applied to Parameter Golf
2+
3+
**val_bpb = 1.3262** (single seed, post-TTT) | **10.9 MB** (5 MB under 16 MB cap) | 8xH100 SXM
4+
5+
## What This Is
6+
7+
A **ternary quantization** implementation for the Parameter Golf competition, as requested in the competition's [wish list](https://github.com/openai/parameter-golf#requests-for-prs). This is a proof-of-concept demonstrating that transformer MLP layers can be quantized to three states {-1, 0, +1} using Hessian-aware GPTQ error compensation.
8+
9+
This is not a record attempt. It is a creative submission exploring a fundamentally different quantization paradigm.
10+
11+
## The Void Fraction Thesis
12+
13+
Our research on Photonic Neural Networks (PNN) found that **~30% of trained weights converge to near-zero** across seeds, substrates, and architectures. We call this the **void fraction** — a topological invariant where the optimizer learns what NOT to become.
14+
15+
The key insight: **30% is the maximum-entropy distribution of a ternary code.**
16+
17+
```
18+
H = -0.30 × log2(0.30) - 0.35 × log2(0.35) - 0.35 × log2(0.35) = 1.581 bits/weight
19+
log2(3) = 1.585 bits/weight
20+
```
21+
22+
The void fraction is not waste — it is the optimizer discovering that the natural quantization of neural network weights is ternary. The zero state is an active computational resource, not absence.
23+
24+
## Cross-Domain Evidence
25+
26+
The ~30% void fraction appears across independent systems:
27+
- **PNN (Photonic Neural Networks):** 76.5% ternary vs 15.3% binary accuracy, void stabilizes at 28% (5-seed, p=2.18e-11)
28+
- **Neuroscience:** 25-40% of brain connectivity edges are competitive/negative across human, macaque, and mouse (Luppi et al., Nature Neuroscience 2026)
29+
- **Void Memory (AI memory system):** 30% void fraction target for optimal recall precision
30+
- **This submission:** Post-hoc ternary quantization of MLP layers produces ~30% zeros
31+
32+
## Implementation
33+
34+
**Mixed-precision ternary GPTQ:**
35+
- MLP layers (67% of params): ternary {-1, 0, +1} with per-row scale factors
36+
- Attention layers: int6 (geometrically load-bearing, needs precision)
37+
- Embeddings: int8 (looked up, not multiplied)
38+
39+
The GPTQ error compensation loop is identical to standard GPTQ — only the rounding function changes:
40+
```python
41+
# Standard int6: q = clamp(round(w / scale), -63, 63)
42+
# Ternary: q = where(w/s > 0.5, 1, where(w/s < -0.5, -1, 0))
43+
```
44+
45+
Hessian-aware column-by-column error redistribution preserves output quality despite the 21x reduction in quantization levels (63 → 3).
46+
47+
**Packing:** 4 trits per byte (2 bits each: 00=zero, 01=+1, 10=-1).
48+
49+
## Results
50+
51+
| Metric | Int6 (Run 10) | Ternary (This) | Delta |
52+
|--------|---------------|-----------------|-------|
53+
| val_bpb (TTT) | 1.0805 | 1.3262 | +0.2457 |
54+
| Artifact size | 16.0 MB | 10.9 MB | -5.1 MB |
55+
| MLP bits/weight | 6 | 1.585 | 3.8x fewer |
56+
57+
The quality gap (0.25 BPB) is expected for post-hoc quantization at this compression ratio. **Quantization-aware training (QAT)** with straight-through estimator would allow the optimizer to learn ternary-compatible weight distributions during training, significantly closing the gap.
58+
59+
## Why This Matters
60+
61+
1. **5 MB under budget** — ternary MLP frees massive headroom for larger models, more layers, or higher-precision attention
62+
2. **The void fraction is real** — 30% zeros emerge naturally, matching the theoretical maximum-entropy prediction
63+
3. **GPTQ works with ternary** — the Hessian error compensation framework extends cleanly to extreme quantization
64+
4. **Path to competitive ternary:** QAT (in development) should close the quality gap while maintaining the size advantage
65+
66+
## Base Architecture
67+
68+
Same as the current SOTA: 11L × 512d, SP8192, depth recurrence (layers 3-5), parallel residuals, QK-Gain 5.5, legal score-first TTT. Only the quantization scheme differs.
69+
70+
## Reproduction
71+
72+
```bash
73+
pip install brotli sentencepiece
74+
pip install flash_attn_3 --no-deps --find-links https://windreamer.github.io/flash-attention3-wheels/cu128_torch291/
75+
MATCHED_FINEWEB_REPO_ID=kevclark/parameter-golf python3 data/cached_challenge_fineweb.py --variant sp8192
76+
77+
SEED=42 QK_GAIN_INIT=5.5 TTT_ENABLED=1 TTT_LR=0.005 TTT_EPOCHS=3 TERNARY_MLP=1 \
78+
torchrun --standalone --nproc_per_node=8 train_gpt.py
79+
```
80+
81+
## PNN Research Citation
82+
83+
Saunders, G. (2026). Photonic Neural Networks with Ternary Weights. 5-seed GPU study: ternary 76.5% ± 1.6% vs binary 15.3% ± 2.1%, p=2.18e-11. Void fraction converges to 28-30% across all seeds and encodings.
84+
85+
## Author
86+
87+
Gavin Saunders (@G3sparky)
Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
{
2+
"author": "G3sparky (Gavin Saunders)",
3+
"name": "Non-record: Ternary MLP Quantization (Void Fraction) + SP8192 + Depth Recurrence",
4+
"date": "2026-04-19",
5+
"track": "10min_16mb",
6+
"category": "non-record creative submission",
7+
"val_bpb": 1.3262,
8+
"seeds": {
9+
"42": {"val_bpb": 1.3262, "artifact_bytes": 10891761}
10+
},
11+
"hardware": "8xH100 80GB SXM (Vast.ai)",
12+
"pytorch_version": "2.9.1+cu128",
13+
"cuda_version": "12.8",
14+
"technique_summary": "Ternary {-1,0,+1} GPTQ quantization for MLP layers, int6 for attention, int8 for embeddings. Based on the void fraction invariant: 30% of trained weights converge to near-zero across seeds, matching the maximum-entropy distribution of a ternary code (1.585 bits/weight). Post-hoc quantization — QAT version in development."
15+
}

0 commit comments

Comments
 (0)