|
1 | | -# AGENTS.md — AI Agent Technical Context |
| 1 | +# AGENTS.md |
2 | 2 |
|
3 | | -## Project Overview |
| 3 | +## Project Identity |
4 | 4 |
|
5 | | -**attnres** is the first Rust implementation of Attention Residuals (MoonshotAI/Kimi paper) using the [burn](https://github.com/tracel-ai/burn) deep learning framework. It provides a drop-in replacement for standard residual connections in Transformers. |
| 5 | +`attnres` is a Rust library that implements Attention Residuals for burn-based |
| 6 | +Transformer experiments, plus examples, benchmarks, and a web demo. |
6 | 7 |
|
7 | | -## Tech Stack |
| 8 | +## Current State |
8 | 9 |
|
9 | | -| Component | Technology | Version | |
10 | | -|-------------|-----------------|----------| |
11 | | -| Language | Rust | 2021 edition (1.80+) | |
12 | | -| ML Framework| burn | 0.20 | |
13 | | -| Test Backend| NdArray | (CPU, deterministic) | |
14 | | -| Testing | cargo test + proptest + criterion | — | |
15 | | -| Linting | clippy + rustfmt | — | |
16 | | -| CI | GitHub Actions | test, clippy, fmt, build-examples | |
| 10 | +- Status: alpha as of March 16, 2026. |
| 11 | +- Suitable for: research, examples, local experimentation, integration work on |
| 12 | + trusted inputs. |
| 13 | +- Not yet suitable for: production inference services, GPU deployment
| 14 | +  (backends unvalidated), PyTorch checkpoint interchange, or a stable 1.0 API.
| 15 | +- Important gap: there is no dedicated `spec.md` in this checkout. Use |
| 16 | + [ARCHITECTURE.md](ARCHITECTURE.md), README, module docs, and tests as the |
| 17 | + current source of truth. |
17 | 18 |
|
18 | | -## Project Structure |
| 19 | +## Verified Commands |
19 | 20 |
|
20 | | -``` |
21 | | -src/ |
22 | | -├── lib.rs # Public API re-exports + module declarations |
23 | | -├── config.rs # AttnResConfig — validated builder pattern (JSON save/load) |
24 | | -├── attn_res_op.rs # Core AttnRes operation (depth-wise softmax attention) |
25 | | -├── block_state.rs # BlockState — cumulative block representation tracking |
26 | | -├── layer.rs # AttnResLayer — transformer layer with dual AttnRes |
27 | | -├── model.rs # AttnResTransformer — full model with standard + two-phase forward |
28 | | -├── rms_norm.rs # RMSNorm implementation |
29 | | -├── serialization.rs # Model weight save/load (NamedMpk, binary, compact formats) |
30 | | -├── two_phase.rs # Two-phase inference primitives (phase1_batched, online_softmax_merge) |
31 | | -├── attention.rs # Multi-head self-attention |
32 | | -├── feed_forward.rs # Two-layer MLP with GELU activation |
33 | | -└── utils.rs # Causal mask generation helpers |
34 | | -
|
35 | | -tests/ |
36 | | -├── unit_tests.rs # Core algorithm correctness tests |
37 | | -├── differential_tests.rs # PyTorch reference comparison tests |
38 | | -├── property_tests.rs # proptest property-based tests |
39 | | -└── integration_tests.rs # Full model training loop tests |
40 | | -
|
41 | | -examples/ |
42 | | -├── train_tiny.rs # Train a small model on synthetic data |
43 | | -├── compare_residuals.rs # Compare AttnRes vs standard residuals |
44 | | -└── visualize_weights.rs # Visualize depth attention patterns |
45 | | -
|
46 | | -benches/ |
47 | | -└── attn_res_benchmark.rs # Criterion benchmarks |
48 | | -
|
49 | | -fixtures/ # Reference outputs from PyTorch |
50 | | -├── attn_res_forward.json |
51 | | -└── block_state_tracking.json |
52 | | -
|
53 | | -web-demo/ # Interactive web demo (WASM + Vite) |
54 | | -├── crate/ # Rust WASM crate (pure-Rust AttnRes reimplementation) |
55 | | -│ ├── Cargo.toml |
56 | | -│ └── src/lib.rs # wasm-bindgen exports: AttnResEngine |
57 | | -├── src/ # TypeScript frontend |
58 | | -│ ├── main.ts # App entry point |
59 | | -│ ├── style.css # Academic-grade styling |
60 | | -│ ├── viz.ts # Canvas 2D heatmaps, charts |
61 | | -│ └── diagrams.ts # Static architectural diagrams |
62 | | -├── index.html # Single-page app |
63 | | -├── package.json # Vite + TypeScript |
64 | | -└── vite.config.ts # Build config |
65 | | -``` |
66 | | - |
67 | | -## Commands |
| 21 | +These commands were run successfully during the latest quality pass: |
68 | 22 |
|
69 | 23 | ```bash |
70 | | -cargo build # Build the project |
71 | | -cargo test --all-features # Run all 87 tests |
72 | | -cargo test test_name # Run specific test |
73 | | -cargo clippy -- -D warnings # Lint (warnings = errors) |
74 | | -cargo fmt # Format code |
75 | | -cargo fmt -- --check # Check formatting without modifying |
76 | | -cargo bench # Run Criterion benchmarks |
77 | | -cargo run --example train_tiny # Train example |
78 | | -cargo run --example compare_residuals # Comparison example |
79 | | -cargo run --example visualize_weights # Visualization example |
80 | | - |
81 | | -# Web demo |
82 | | -cd web-demo && npm run build:wasm # Build WASM crate |
83 | | -cd web-demo && npm run dev # Start Vite dev server |
84 | | -cd web-demo && npm run build # Production build (WASM + Vite) |
| 24 | +cargo fmt -- --check |
| 25 | +cargo clippy -- -D warnings |
| 26 | +cargo test --all-features |
| 27 | +cargo build --examples |
| 28 | +cd web-demo && npm run build |
85 | 29 | ``` |
86 | 30 |
|
87 | | -## Architecture Essentials |
88 | | - |
89 | | -### Core Algorithm (AttnRes) |
90 | | - |
91 | | -Standard residual: `x_{l+1} = x_l + f_l(x_l)` (fixed unit weights) |
92 | | - |
93 | | -AttnRes: `x_{l+1} = Σ α_i · v_i` where α = softmax(w_l · RMSNorm(V)) over depth dimension |
| 31 | +Additional useful commands: |
94 | 32 |
|
95 | | -Key invariants: |
96 | | -1. **Zero-init pseudo-queries** → starts as uniform averaging (standard residual behavior) |
97 | | -2. **Two AttnRes per transformer layer** — one before self-attention, one before MLP |
98 | | -3. **Softmax over depth** (block/layer dimension), NOT over sequence tokens |
99 | | -4. **RMSNorm on keys** to prevent magnitude domination |
100 | | -5. **Block boundaries** at every `block_size/2` sublayers |
101 | | - |
102 | | -### Data Flow |
103 | | - |
104 | | -``` |
105 | | -Input IDs → Embedding → [AttnResLayer × N] → RMSNorm → LM Head → Logits |
106 | | - ↓ |
107 | | - AttnResOp(pre-attn) → RMSNorm → MultiHeadAttention |
108 | | - AttnResOp(pre-mlp) → RMSNorm → FeedForward |
| 33 | +```bash |
| 34 | +cargo bench |
| 35 | +cargo doc --open |
109 | 36 | ``` |
110 | 37 |
|
111 | | -### Configuration |
112 | | - |
113 | | -`AttnResConfig::new(d_model, num_layers, num_blocks)` where: |
114 | | -- `d_model`: Hidden dimension |
115 | | -- `num_layers`: Number of **sublayers** (transformer layers × 2) |
116 | | -- `num_blocks`: Number of blocks for Block AttnRes (set = num_layers for Full AttnRes) |
117 | | - |
118 | | -## Boundaries |
119 | | - |
120 | | -### Read-Only (never modify) |
121 | | -- `spec.md`, `paper.md`, `research_report.md`, `implementation_plan.md`, `LICENSE` |
122 | | - |
123 | | -### Gated (requires approval) |
124 | | -- `Cargo.toml` (dependency changes) |
125 | | -- `.github/workflows/` (CI changes) |
126 | | -- `cargo publish` |
127 | | - |
128 | | -## Source of Truth |
129 | | - |
130 | | -`spec.md` is the authoritative specification. All algorithm implementations must match the pseudocode and equations defined there. |
131 | | - |
132 | | -## Web Demo |
133 | | - |
134 | | -The `web-demo/` directory contains a fully interactive browser-based demo. The WASM crate (`web-demo/crate/`) is a pure-Rust reimplementation of the core AttnRes algorithm (no burn dependency for WASM portability), faithfully mirroring `src/attn_res_op.rs`. It exposes: |
135 | | - |
136 | | -- `AttnResEngine` — model creation, forward pass, training simulation |
137 | | -- `compute_attn_res()` — interactive core operation with custom pseudo-queries |
138 | | -- `train_step()` — simulated training showing depth attention pattern emergence |
139 | | - |
140 | | -Frontend: Vite + TypeScript with Canvas 2D visualizations (heatmaps, bar charts, loss curves). Academic design with full algorithm explanation. |
141 | | - |
142 | | -## Known Gaps |
143 | | - |
144 | | -- No PyTorch checkpoint loading (safetensors format) |
145 | | -- GPU backends (wgpu, CUDA, Metal) untested |
146 | | -- No distributed training support |
147 | | -- Pre-trained weight import/export utilities |
| 38 | +## Architecture Map |
| 39 | + |
| 40 | +- `src/config.rs`: `AttnResConfig`, `ConfigError`, validation helpers. |
| 41 | +- `src/attn_res_op.rs`: core depth-attention residual operator. |
| 42 | +- `src/block_state.rs`: completed blocks + current partial block. |
| 43 | +- `src/layer.rs`: one Transformer layer with two AttnRes operations. |
| 44 | +- `src/model.rs`: full model, hidden-state forward, two-phase forward. |
| 45 | +- `src/two_phase.rs`: batched inter-block pass + online softmax merge. |
| 46 | +- `src/attention.rs`: multi-head self-attention. |
| 47 | +- `src/feed_forward.rs`: two-layer GELU MLP. |
| 48 | +- `src/rms_norm.rs`: RMSNorm for 3D and 4D tensors. |
| 49 | +- `src/serialization.rs`: burn-record save/load helpers. |
| 50 | +- `tests/`: unit, integration, property, and differential coverage. |
| 51 | +- `examples/demo_tui.rs`: terminal demo with live routing visualization. |
| 52 | +- `web-demo/`: WASM crate plus Vite frontend. |
| 53 | + |
| 54 | +## Non-Negotiable Invariants |
| 55 | + |
| 56 | +- Pseudo-query vectors start at zero. |
| 57 | +- Depth softmax is over block/layer sources, not tokens. |
| 58 | +- Each Transformer layer uses two AttnRes operations. |
| 59 | +- Block boundaries are defined in sublayer space. |
| 60 | +- `BlockState.blocks[0]` is the embedding block. |
| 61 | +- Internal invariant failures should panic loudly rather than silently
| 62 | +  produce wrong outputs.
| 63 | + |
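The first two invariants above can be sketched independently of the crate. This is an illustrative toy only: `depth_softmax` and the shapes below are not the library's API, and the RMSNorm-on-keys step is omitted for brevity. It shows why zero-initialized pseudo-queries start as uniform averaging over depth sources, and that the softmax normalizes across block/layer sources rather than tokens:

```rust
// Numerically stable softmax over the depth dimension: one score per
// depth source (block/layer output), NOT one per sequence token.
fn depth_softmax(scores: &[f32]) -> Vec<f32> {
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    // One token, d_model = 3, three depth sources (e.g. embedding block
    // plus two completed blocks). Values are arbitrary.
    let sources = [[1.0_f32, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]];
    let pseudo_query = [0.0_f32; 3]; // zero-initialized pseudo-query

    // Score each depth source against the pseudo-query (dot product);
    // the real operator applies RMSNorm to the keys first.
    let scores: Vec<f32> = sources
        .iter()
        .map(|v| v.iter().zip(&pseudo_query).map(|(a, b)| a * b).sum())
        .collect();

    // Zero query -> all scores are 0 -> uniform weights (1/3 each), i.e.
    // plain averaging of depth sources at initialization.
    let weights = depth_softmax(&scores);
    println!("{weights:?}");
}
```
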
| 64 | +## Conventions |
| 65 | + |
| 66 | +- Prefer `try_validate`, `try_init_model`, `try_init_layer`, and `try_init_op` |
| 67 | + for untrusted config input. Panic-based constructors remain for trusted, |
| 68 | + hard-coded configs. |
| 69 | +- Add short, accurate tensor-shape comments where the code would
| 70 | +  otherwise be hard to parse.
| 71 | +- Add tests for every algorithm or boundary-condition change. |
| 72 | +- Keep README and roadmap claims tied to commands or tests that actually ran. |
| 73 | + |
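The fallible-versus-panicking constructor convention can be illustrated with a toy config. This is a sketch only: the real `AttnResConfig` and `ConfigError` live in `src/config.rs`, and their fields, variants, and validation rules may differ.

```rust
#[derive(Debug)]
struct Config {
    d_model: usize,
    num_layers: usize,
    num_blocks: usize,
}

#[derive(Debug, PartialEq)]
enum ConfigError {
    ZeroDim,
    BlocksExceedLayers,
}

impl Config {
    // Fallible path for untrusted input: return an error, never panic.
    fn try_validate(self) -> Result<Config, ConfigError> {
        if self.d_model == 0 || self.num_layers == 0 || self.num_blocks == 0 {
            return Err(ConfigError::ZeroDim);
        }
        if self.num_blocks > self.num_layers {
            return Err(ConfigError::BlocksExceedLayers);
        }
        Ok(self)
    }

    // Panicking path, acceptable only for trusted, hard-coded configs.
    fn validate(self) -> Config {
        self.try_validate().expect("invalid hard-coded config")
    }
}

fn main() {
    let bad = Config { d_model: 64, num_layers: 4, num_blocks: 8 };
    assert_eq!(bad.try_validate().unwrap_err(), ConfigError::BlocksExceedLayers);

    let good = Config { d_model: 64, num_layers: 8, num_blocks: 8 }.validate();
    println!("{good:?}");
}
```
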
| 74 | +## Constraints |
| 75 | + |
| 76 | +- Do not modify `LICENSE`. |
| 77 | +- Do not change dependency versions in `Cargo.toml` without approval. |
| 78 | +- Do not change `.github/workflows/` without approval. |
| 79 | +- Do not claim backend support, benchmark numbers, or checkpoint compatibility |
| 80 | + unless the repository validates them. |
| 81 | + |
| 82 | +## Gotchas |
| 83 | + |
| 84 | +- `num_layers` counts sublayers (two per Transformer layer), not layers.
| 85 | +- Full AttnRes means `num_blocks == num_layers`, so block boundaries can occur |
| 86 | + between attention and MLP inside one Transformer layer. |
| 87 | +- The web demo is a separate pure-Rust reimplementation for WASM portability;
| 88 | +  it does not automatically stay in sync with `src/`, so verify parity.
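The sublayer/block arithmetic behind the first two gotchas can be sketched as follows. `block_of_sublayer` is a hypothetical helper, not the crate's API; it assumes blocks partition the sublayers evenly and ignores the separate embedding block (`BlockState.blocks[0]`).

```rust
// Hypothetical helper: map a sublayer index to its block index, assuming
// num_blocks divides num_layers (sublayer count) evenly.
fn block_of_sublayer(sublayer: usize, num_layers: usize, num_blocks: usize) -> usize {
    assert!(num_layers % num_blocks == 0, "blocks must partition sublayers evenly");
    assert!(sublayer < num_layers, "sublayer index out of range");
    sublayer / (num_layers / num_blocks)
}

fn main() {
    // 4 Transformer layers -> 8 sublayers (one attention + one MLP each).
    let num_layers = 8;

    // Full AttnRes: num_blocks == num_layers, so every sublayer is its own
    // block, and a boundary falls between attention (sublayer 2k) and MLP
    // (sublayer 2k + 1) of the same Transformer layer.
    assert_eq!(block_of_sublayer(2, num_layers, num_layers), 2);
    assert_eq!(block_of_sublayer(3, num_layers, num_layers), 3);

    // Coarser blocking: 4 blocks of 2 sublayers keep each layer's attention
    // and MLP inside the same block.
    assert_eq!(block_of_sublayer(2, num_layers, 4), 1);
    assert_eq!(block_of_sublayer(3, num_layers, 4), 1);
}
```
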