Precision architecture for AI agents, ML systems, and the data and analytics layers they depend on.
ByteStack Labs marketplace for Claude Code. Reliability skills that audit AI systems that pass evaluation and then fail in production.
production-autopsy reproduces the failure, quantifies the drop by slice, and isolates the root cause by ablation; calibration-guard and trajectory-eval land next. Each skill emits a verification script and a diagnostic report, committed unedited. Every figure re-derives from runnable code, and the script exits non-zero if a single one fails to reproduce.
Public fixtures diagnosed by the agent-reliability plugin, with the tool's output committed unedited as the receipt.
The hero fixture, invoice-extraction, scores 100% exact-match on evaluation and 86.25% on format-shifted production input. The 13.75-point drop concentrates in four input-format slices that collapse to zero; two ablations isolate positional field assignment as the cause. verify.py re-derives every figure from the raw data and exits non-zero if a single one fails to reproduce. Standard library, no model, no GPU.
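The re-derive-and-fail pattern described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the contents of verify.py: the record layout, the slice names (`layout_a`, `layout_b`), the `CLAIMED` table, and the helper names are all hypothetical, and the toy numbers are not the fixture's data.

```python
from collections import defaultdict

# Hypothetical toy records: (slice name, expected value, predicted value).
# These are invented for illustration and are NOT the invoice-extraction data.
RECORDS = [
    ("layout_a", "INV-001", "INV-001"),
    ("layout_a", "INV-002", "INV-002"),
    ("layout_a", "INV-003", "INV-003"),
    ("layout_a", "INV-004", "INV-004"),
    ("layout_a", "INV-005", "INV-005"),
    ("layout_a", "INV-006", "INV-006"),
    ("layout_b", "INV-007", "2024-01-05"),  # wrong field picked up
    ("layout_b", "INV-008", "2024-01-06"),  # wrong field picked up
]

# Figures a report would state (hypothetical), in percent exact-match.
CLAIMED = {"overall": 75.0, "layout_a": 100.0, "layout_b": 0.0}


def exact_match_by_slice(records):
    """Return percent exact-match per slice, plus an 'overall' entry."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for slice_name, expected, predicted in records:
        for key in (slice_name, "overall"):
            totals[key] += 1
            if expected == predicted:
                hits[key] += 1
    return {key: 100.0 * hits[key] / totals[key] for key in totals}


def verify(records, claimed, tol=1e-9):
    """Return the names of claimed figures that do not reproduce."""
    derived = exact_match_by_slice(records)
    return [
        name
        for name, value in claimed.items()
        if name not in derived or abs(derived[name] - value) > tol
    ]


def main():
    """Return a process exit code: 0 if every figure reproduces, else 1."""
    failures = verify(RECORDS, CLAIMED)
    for name in failures:
        print(f"figure did not reproduce: {name}")
    return 1 if failures else 0
```

Run as a script, the last line would be `raise SystemExit(main())`, so that a single figure failing to reproduce makes the process exit non-zero and fails any CI step that calls it.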
The standard linear Kalman filter, derived from first principles and built to diagnose itself.
A complete derivation from the Bayesian foundation through the recursive algorithm, a NumPy-only reference implementation where every line cites the equation it implements, and diagnostic instrumentation that reveals whether a running filter is actually optimal: NIS, NEES, innovation whiteness, and divergence detection, each derived from the properties the mathematics guarantees rather than bolted on afterward. The core depends on NumPy alone. Every claim traces to the derivation; every test verifies a property the math proves.
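As a rough illustration of one of these diagnostics, the sketch below computes the normalized innovation squared (NIS), the innovation weighted by the inverse of its covariance S = H P Hᵀ + R, for a standard linear Kalman filter. For a consistent filter the NIS follows a chi-square distribution whose degrees of freedom equal the measurement dimension, so its time average should sit near that dimension. This is not the repository's implementation: the constant-velocity model, the noise levels, and the function names `nis` and `run_filter` are assumptions made for the example.

```python
import numpy as np


def nis(innovation, S):
    """Normalized innovation squared: nu^T S^{-1} nu."""
    return float(innovation @ np.linalg.solve(S, innovation))


def run_filter(n_steps=2000, seed=0):
    """Simulate a matched linear-Gaussian system and return per-step NIS.

    The model below (constant velocity, position-only measurement) is a
    hypothetical choice for illustration.
    """
    rng = np.random.default_rng(seed)
    F = np.array([[1.0, 1.0], [0.0, 1.0]])  # state transition
    H = np.array([[1.0, 0.0]])              # measure position only
    Q = 0.01 * np.eye(2)                    # process noise covariance
    R = np.array([[0.5]])                   # measurement noise covariance

    x_hat = np.zeros(2)                     # initial estimate
    P = np.eye(2)                           # initial covariance
    # Draw the true initial state from the prior so the filter starts consistent.
    x_true = rng.multivariate_normal(x_hat, P)

    values = []
    for _ in range(n_steps):
        # Simulate the true system and a noisy measurement.
        x_true = F @ x_true + rng.multivariate_normal(np.zeros(2), Q)
        z = H @ x_true + rng.multivariate_normal(np.zeros(1), R)

        # Predict.
        x_pred = F @ x_hat
        P_pred = F @ P @ F.T + Q

        # Innovation and its covariance.
        nu = z - H @ x_pred
        S = H @ P_pred @ H.T + R
        values.append(nis(nu, S))

        # Update.
        K = P_pred @ H.T @ np.linalg.inv(S)
        x_hat = x_pred + K @ nu
        P = (np.eye(2) - K @ H) @ P_pred
    return np.array(values)


nis_values = run_filter()
mean_nis = float(nis_values.mean())  # should be near 1, the measurement dimension
```

When the filter's Q or R differs from the noise actually driving the system, the average NIS tends to drift away from the measurement dimension, which is what makes it usable as a running consistency check.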
This account represents ByteStack Labs. All repositories, publications, and active work live under the ByteStack-Labs organization. Founded by Jesse Moses, Founder & Chief Architect.
Precision is the authority.






