|
1 | | -# AGENTS.md — AI Agent Technical Context |
| 1 | +# AGENTS.md |
2 | 2 |
|
3 | | -## Project Overview |
| 3 | +## Project Identity |
4 | 4 |
|
5 | | -**attnres** is the first Rust implementation of Attention Residuals (MoonshotAI/Kimi paper) using the [burn](https://github.com/tracel-ai/burn) deep learning framework. It provides a drop-in replacement for standard residual connections in Transformers. |
| 5 | +`attnres` is a Rust library that implements Attention Residuals for burn-based |
| 6 | +Transformer experiments, plus examples, benchmarks, and a web demo. |
6 | 7 |
|
7 | | -## Tech Stack |
| 8 | +## Current State |
8 | 9 |
|
9 | | -| Component | Technology | Version | |
10 | | -|-------------|-----------------|----------| |
11 | | -| Language | Rust | 2021 edition (1.80+) | |
12 | | -| ML Framework| burn | 0.20 | |
13 | | -| Test Backend| NdArray | (CPU, deterministic) | |
14 | | -| Testing | cargo test + proptest + criterion | — | |
15 | | -| Linting | clippy + rustfmt | — | |
16 | | -| CI | GitHub Actions | test, clippy, fmt, build-examples | |
| 10 | +- Status: alpha as of March 16, 2026. |
| 11 | +- Suitable for: research, examples, local experimentation, integration work on |
| 12 | + trusted inputs. |
| 13 | +- Not yet suitable for: production inference services, GPU deployment
| 14 | +  (backends unvalidated), PyTorch checkpoint interchange, or a stable 1.0 API.
| 15 | +- Important gap: there is no dedicated `spec.md` in this checkout. Use |
| 16 | + [ARCHITECTURE.md](ARCHITECTURE.md), README, module docs, and tests as the |
| 17 | + current source of truth. |
17 | 18 |
|
18 | | -## Project Structure |
| 19 | +## Verified Commands |
19 | 20 |
|
20 | | -``` |
21 | | -src/ |
22 | | -├── lib.rs # Public API re-exports + module declarations |
23 | | -├── config.rs # AttnResConfig — validated builder pattern (JSON save/load) |
24 | | -├── attn_res_op.rs # Core AttnRes operation (depth-wise softmax attention) |
25 | | -├── block_state.rs # BlockState — cumulative block representation tracking |
26 | | -├── layer.rs # AttnResLayer — transformer layer with dual AttnRes |
27 | | -├── model.rs # AttnResTransformer — full model with standard + two-phase forward |
28 | | -├── rms_norm.rs # RMSNorm implementation |
29 | | -├── serialization.rs # Model weight save/load (NamedMpk, binary, compact formats) |
30 | | -├── two_phase.rs # Two-phase inference primitives (phase1_batched, online_softmax_merge) |
31 | | -├── attention.rs # Multi-head self-attention |
32 | | -├── feed_forward.rs # Two-layer MLP with GELU activation |
33 | | -└── utils.rs # Causal mask generation helpers |
34 | | -
|
35 | | -tests/ |
36 | | -├── unit_tests.rs # Core algorithm correctness tests |
37 | | -├── differential_tests.rs # PyTorch reference comparison tests |
38 | | -├── property_tests.rs # proptest property-based tests |
39 | | -└── integration_tests.rs # Full model training loop tests |
40 | | -
|
41 | | -examples/ |
42 | | -├── train_tiny.rs # Train a small model on synthetic data |
43 | | -├── compare_residuals.rs # Compare AttnRes vs standard residuals |
44 | | -└── visualize_weights.rs # Visualize depth attention patterns |
45 | | -
|
46 | | -benches/ |
47 | | -└── attn_res_benchmark.rs # Criterion benchmarks |
48 | | -
|
49 | | -fixtures/ # Reference outputs from PyTorch |
50 | | -├── attn_res_forward.json |
51 | | -└── block_state_tracking.json |
52 | | -
|
53 | | -web-demo/ # Interactive web demo (WASM + Vite) |
54 | | -├── crate/ # Rust WASM crate (pure-Rust AttnRes reimplementation) |
55 | | -│ ├── Cargo.toml |
56 | | -│ └── src/lib.rs # wasm-bindgen exports: AttnResEngine |
57 | | -├── src/ # TypeScript frontend |
58 | | -│ ├── main.ts # App entry point |
59 | | -│ ├── style.css # Academic-grade styling |
60 | | -│ ├── viz.ts # Canvas 2D heatmaps, charts |
61 | | -│ └── diagrams.ts # Static architectural diagrams |
62 | | -├── index.html # Single-page app |
63 | | -├── package.json # Vite + TypeScript |
64 | | -└── vite.config.ts # Build config |
65 | | -``` |
66 | | - |
67 | | -## Commands |
| 21 | +These commands were run successfully during the latest quality pass: |
68 | 22 |
|
69 | 23 | ```bash |
70 | | -cargo build # Build the project |
71 | | -cargo test --all-features # Run all 87 tests |
72 | | -cargo test test_name # Run specific test |
73 | | -cargo clippy -- -D warnings # Lint (warnings = errors) |
74 | | -cargo fmt # Format code |
75 | | -cargo fmt -- --check # Check formatting without modifying |
76 | | -cargo bench # Run Criterion benchmarks |
77 | | -cargo run --example train_tiny # Train example |
78 | | -cargo run --example compare_residuals # Comparison example |
79 | | -cargo run --example visualize_weights # Visualization example |
80 | | - |
81 | | -# Web demo |
82 | | -cd web-demo && npm run build:wasm # Build WASM crate |
83 | | -cd web-demo && npm run dev # Start Vite dev server |
84 | | -cd web-demo && npm run build # Production build (WASM + Vite) |
| 24 | +cargo fmt -- --check |
| 25 | +cargo clippy -- -D warnings |
| 26 | +cargo test --all-features |
| 27 | +cargo build --examples |
| 28 | +cd web-demo && npm run build |
85 | 29 | ``` |
86 | 30 |
|
87 | | -## Architecture Essentials |
88 | | - |
89 | | -### Core Algorithm (AttnRes) |
90 | | - |
91 | | -Standard residual: `x_{l+1} = x_l + f_l(x_l)` (fixed unit weights) |
92 | | - |
93 | | -AttnRes: `x_{l+1} = Σ α_i · v_i` where α = softmax(w_l · RMSNorm(V)) over depth dimension |
| 31 | +Additional useful commands: |
94 | 32 |
|
95 | | -Key invariants: |
96 | | -1. **Zero-init pseudo-queries** → starts as uniform averaging (standard residual behavior) |
97 | | -2. **Two AttnRes per transformer layer** — one before self-attention, one before MLP |
98 | | -3. **Softmax over depth** (block/layer dimension), NOT over sequence tokens |
99 | | -4. **RMSNorm on keys** to prevent magnitude domination |
100 | | -5. **Block boundaries** at every `block_size/2` sublayers |
101 | | - |
102 | | -### Data Flow |
103 | | - |
104 | | -``` |
105 | | -Input IDs → Embedding → [AttnResLayer × N] → RMSNorm → LM Head → Logits |
106 | | - ↓ |
107 | | - AttnResOp(pre-attn) → RMSNorm → MultiHeadAttention |
108 | | - AttnResOp(pre-mlp) → RMSNorm → FeedForward |
| 33 | +```bash |
| 34 | +cargo bench |
| 35 | +cargo doc --open |
109 | 36 | ``` |
110 | 37 |
|
111 | | -### Configuration |
112 | | - |
113 | | -`AttnResConfig::new(d_model, num_layers, num_blocks)` where: |
114 | | -- `d_model`: Hidden dimension |
115 | | -- `num_layers`: Number of **sublayers** (transformer layers × 2) |
116 | | -- `num_blocks`: Number of blocks for Block AttnRes (set = num_layers for Full AttnRes) |
117 | | - |
118 | | -## Boundaries |
119 | | - |
120 | | -### Read-Only (never modify) |
121 | | -- `spec.md`, `paper.md`, `research_report.md`, `implementation_plan.md`, `LICENSE` |
122 | | - |
123 | | -### Gated (requires approval) |
124 | | -- `Cargo.toml` (dependency changes) |
125 | | -- `.github/workflows/` (CI changes) |
126 | | -- `cargo publish` |
127 | | - |
128 | | -## Source of Truth |
129 | | - |
130 | | -`spec.md` is the authoritative specification. All algorithm implementations must match the pseudocode and equations defined there. |
131 | | - |
132 | | -## Web Demo |
133 | | - |
134 | | -The `web-demo/` directory contains a fully interactive browser-based demo. The WASM crate (`web-demo/crate/`) is a pure-Rust reimplementation of the core AttnRes algorithm (no burn dependency for WASM portability), faithfully mirroring `src/attn_res_op.rs`. It exposes: |
135 | | - |
136 | | -- `AttnResEngine` — model creation, forward pass, training simulation |
137 | | -- `compute_attn_res()` — interactive core operation with custom pseudo-queries |
138 | | -- `train_step()` — simulated training showing depth attention pattern emergence |
139 | | - |
140 | | -Frontend: Vite + TypeScript with Canvas 2D visualizations (heatmaps, bar charts, loss curves). Academic design with full algorithm explanation. |
141 | | - |
142 | | -## Known Gaps |
143 | | - |
144 | | -- No PyTorch checkpoint loading (safetensors format) |
145 | | -- GPU backends (wgpu, CUDA, Metal) untested |
146 | | -- No distributed training support |
147 | | -- Pre-trained weight import/export utilities |
| 38 | +## Architecture Map |
| 39 | + |
| 40 | +- `src/config.rs`: `AttnResConfig`, `ConfigError`, validation helpers. |
| 41 | +- `src/attn_res_op.rs`: core depth-attention residual operator. |
| 42 | +- `src/block_state.rs`: completed blocks + current partial block. |
| 43 | +- `src/layer.rs`: one Transformer layer with two AttnRes operations. |
| 44 | +- `src/model.rs`: full model, hidden-state forward, two-phase forward. |
| 45 | +- `src/two_phase.rs`: batched inter-block pass + online softmax merge. |
| 46 | +- `src/attention.rs`: multi-head self-attention. |
| 47 | +- `src/feed_forward.rs`: two-layer GELU MLP. |
| 48 | +- `src/rms_norm.rs`: RMSNorm for 3D and 4D tensors. |
| 49 | +- `src/serialization.rs`: burn-record save/load helpers. |
| 50 | +- `tests/`: unit, integration, property, and differential coverage. |
| 51 | +- `examples/demo_tui.rs`: terminal demo with live routing visualization. |
| 52 | +- `web-demo/`: WASM crate plus Vite frontend. |
| 53 | + |
| 54 | +## Non-Negotiable Invariants |
| 55 | + |
| 56 | +- Pseudo-query vectors start at zero. |
| 57 | +- Depth softmax is over block/layer sources, not tokens. |
| 58 | +- Each Transformer layer uses two AttnRes operations. |
| 59 | +- Block boundaries are defined in sublayer space. |
| 60 | +- `BlockState.blocks[0]` is the embedding block. |
| 61 | +- Internal invariant failures should panic loudly rather than silently
| 62 | +  produce wrong outputs.
| 63 | + |
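The first two invariants above can be sketched independently of the crate. This is an illustrative toy only: `depth_softmax` and the shapes below are not the library's API, and the RMSNorm-on-keys step is omitted for brevity. It shows why zero-initialized pseudo-queries start as uniform averaging over depth sources, and that the softmax normalizes across block/layer sources rather than tokens:

```rust
// Numerically stable softmax over the depth dimension: one score per
// depth source (block/layer output), NOT one per sequence token.
fn depth_softmax(scores: &[f32]) -> Vec<f32> {
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    // One token, d_model = 3, three depth sources (e.g. embedding block
    // plus two completed blocks). Values are arbitrary.
    let sources = [[1.0_f32, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]];
    let pseudo_query = [0.0_f32; 3]; // zero-initialized pseudo-query

    // Score each depth source against the pseudo-query (dot product);
    // the real operator applies RMSNorm to the keys first.
    let scores: Vec<f32> = sources
        .iter()
        .map(|v| v.iter().zip(&pseudo_query).map(|(a, b)| a * b).sum())
        .collect();

    // Zero query -> all scores are 0 -> uniform weights (1/3 each), i.e.
    // plain averaging of depth sources at initialization.
    let weights = depth_softmax(&scores);
    println!("{weights:?}");
}
```
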
| 64 | +## Conventions |
| 65 | + |
| 66 | +- Prefer `try_validate`, `try_init_model`, `try_init_layer`, and `try_init_op` |
| 67 | + for untrusted config input. Panic-based constructors remain for trusted, |
| 68 | + hard-coded configs. |
| 69 | +- Add short, accurate tensor-shape comments where the code would
| 70 | +  otherwise be hard to parse.
| 71 | +- Add tests for every algorithm or boundary-condition change. |
| 72 | +- Keep README and roadmap claims tied to commands or tests that actually ran. |
| 73 | + |
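The fallible-versus-panicking constructor convention can be illustrated with a toy config. This is a sketch only: the real `AttnResConfig` and `ConfigError` live in `src/config.rs`, and their fields, variants, and validation rules may differ.

```rust
#[derive(Debug)]
struct Config {
    d_model: usize,
    num_layers: usize,
    num_blocks: usize,
}

#[derive(Debug, PartialEq)]
enum ConfigError {
    ZeroDim,
    BlocksExceedLayers,
}

impl Config {
    // Fallible path for untrusted input: return an error, never panic.
    fn try_validate(self) -> Result<Config, ConfigError> {
        if self.d_model == 0 || self.num_layers == 0 || self.num_blocks == 0 {
            return Err(ConfigError::ZeroDim);
        }
        if self.num_blocks > self.num_layers {
            return Err(ConfigError::BlocksExceedLayers);
        }
        Ok(self)
    }

    // Panicking path, acceptable only for trusted, hard-coded configs.
    fn validate(self) -> Config {
        self.try_validate().expect("invalid hard-coded config")
    }
}

fn main() {
    let bad = Config { d_model: 64, num_layers: 4, num_blocks: 8 };
    assert_eq!(bad.try_validate().unwrap_err(), ConfigError::BlocksExceedLayers);

    let good = Config { d_model: 64, num_layers: 8, num_blocks: 8 }.validate();
    println!("{good:?}");
}
```
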
| 74 | +## Constraints |
| 75 | + |
| 76 | +- Do not modify `LICENSE`. |
| 77 | +- Do not change dependency versions in `Cargo.toml` without approval. |
| 78 | +- Do not change `.github/workflows/` without approval. |
| 79 | +- Do not claim backend support, benchmark numbers, or checkpoint compatibility |
| 80 | + unless the repository validates them. |
| 81 | + |
| 82 | +## Gotchas |
| 83 | + |
| 84 | +- `num_layers` counts sublayers (two per Transformer layer), not layers.
| 85 | +- Full AttnRes means `num_blocks == num_layers`, so block boundaries can occur |
| 86 | + between attention and MLP inside one Transformer layer. |
| 87 | +- The web demo is a separate pure-Rust reimplementation for WASM portability;
| 88 | +  it does not automatically stay in sync with `src/`, so verify parity.
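The sublayer/block arithmetic behind the first two gotchas can be sketched as follows. `block_of_sublayer` is a hypothetical helper, not the crate's API; it assumes blocks partition the sublayers evenly and ignores the separate embedding block (`BlockState.blocks[0]`).

```rust
// Hypothetical helper: map a sublayer index to its block index, assuming
// num_blocks divides num_layers (sublayer count) evenly.
fn block_of_sublayer(sublayer: usize, num_layers: usize, num_blocks: usize) -> usize {
    assert!(num_layers % num_blocks == 0, "blocks must partition sublayers evenly");
    assert!(sublayer < num_layers, "sublayer index out of range");
    sublayer / (num_layers / num_blocks)
}

fn main() {
    // 4 Transformer layers -> 8 sublayers (one attention + one MLP each).
    let num_layers = 8;

    // Full AttnRes: num_blocks == num_layers, so every sublayer is its own
    // block, and a boundary falls between attention (sublayer 2k) and MLP
    // (sublayer 2k + 1) of the same Transformer layer.
    assert_eq!(block_of_sublayer(2, num_layers, num_layers), 2);
    assert_eq!(block_of_sublayer(3, num_layers, num_layers), 3);

    // Coarser blocking: 4 blocks of 2 sublayers keep each layer's attention
    // and MLP inside the same block.
    assert_eq!(block_of_sublayer(2, num_layers, 4), 1);
    assert_eq!(block_of_sublayer(3, num_layers, 4), 1);
}
```
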