This repository explores a simple thesis:
code is the source of truth
For exact tasks, the runtime should own legality, state transitions, and backtracking. The model should learn the narrow decision surface on top of that runtime by evaluating ambiguous PSVM states, not replace the runtime with a one-shot guess.
The project is built around problem-shaped virtual machines (PSVMs).
Instead of asking a model to learn:
task -> final answer
or:
task -> generic C/WASM machine semantics
we use:
task -> custom ops -> exact PSVM state -> local model estimates branch value
In practice that means:
- Write or keep an exact reference runtime.
- Define the smallest sound op surface for the task.
- Export canonical traces and state/decision records from the runtime.
- Encode structured state snapshots.
- Train a local structured model to estimate branch value over PSVM states or rank legal arguments.
- Keep the exact runtime in the loop for verification and rollback.
The model handles ambiguity by scoring branches. The code handles truth.
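The loop described above can be sketched on a toy task. This is a minimal illustration, not code from the repository: it uses 6-queens as a stand-in problem, and the names `legal_ops`, `apply_op`, `guided_search`, `center_bias`, and `reverse_order` are all hypothetical. The point it tries to show is that the scorer only orders legal branches, while legality, transitions, and rollback stay in exact code.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, ...]  # state[r] = column of the queen already placed in row r
N = 6  # board size for this toy task (an assumption of the sketch)


def legal_ops(state: State) -> List[int]:
    """Runtime-owned legality: columns where the next row's queen is not attacked."""
    row = len(state)
    return [
        col
        for col in range(N)
        if all(col != c and abs(col - c) != row - r for r, c in enumerate(state))
    ]


def apply_op(state: State, col: int) -> State:
    """Runtime-owned transition: place a queen in the next row."""
    return state + (col,)


def is_goal(state: State) -> bool:
    """Runtime-owned halt condition: every row has a queen."""
    return len(state) == N


def guided_search(
    state: State, score: Callable[[State, int], float]
) -> Optional[List[int]]:
    """Exact depth-first search; `score` only decides the order of legal branches."""
    if is_goal(state):
        return []
    ranked = sorted(legal_ops(state), key=lambda op: score(state, op), reverse=True)
    for op in ranked:
        rest = guided_search(apply_op(state, op), score)
        if rest is not None:
            return [op] + rest
    return None  # every branch failed, so the caller rolls back


def center_bias(state: State, col: int) -> float:
    """Stand-in for a trained value model: prefer central columns."""
    return -abs(col - (N - 1) / 2)


def reverse_order(state: State, col: int) -> float:
    """A deliberately uninformed scorer, to show correctness does not depend on it."""
    return float(col)
```

With either scorer, `guided_search((), scorer)` returns a list of legal placements that reaches the goal. A better scorer changes how much backtracking happens, not whether the answer is valid.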
If you use the repository as a software artifact, cite:
- CITATION.cff
- paper.md
- paper.pdf
- versioned GitHub release: v0.1.1
If you want the broader research-paper framing rather than the software-paper framing, use:
- docs/paper-idea-problem-shaped-vms.pdf
- docs/arxiv-submission-package.md
- docs/joss-resubmission-plan.md
Generic compiled traces are too noisy for narrow exact tasks. They include machine detail the task does not care about:
- stack plumbing
- memory bookkeeping
- compiler-induced structure
- large instruction surfaces
A PSVM keeps only the transitions that carry semantic weight for the task. That gives:
- smaller action spaces
- shorter traces
- cleaner supervision
- more interpretable execution
- cheaper browser-local inference
The repo currently centers on two browser-local game tasks and two browser-local document tasks:
- sudoku.html - exact 9x9 Sudoku solve with:
  - exact browser-side runtime
  - deterministic backtracking
  - local guided branch ranking with Auto, Transformer, Transformer (Regret), Transformer (Hard), or GNN selection
  - visible trace and model stats
- weiqi/index.html - 5x5 Weiqi capture PSVM with exact local rules.
- invoice/README.md - OCR receipt total extraction with:
  - exact money-candidate extraction from structured pdftotext -tsv rows
  - layout-aware cues such as right-edge alignment and cue-before-amount position
  - deterministic teacher ranking over legal total branches
  - a local transformer that scores TOTAL vs NOT_TOTAL candidates
  - explicit rejection of account-statement style documents with running balances
  - a browser demo at receipt.html
- tally/README.md - Tally-style voucher extraction with:
  - voucher-family classification and schema selection
  - schema-aligned field candidate extraction from OCR/layout
  - constraint-guided resolver over top-ranked scalar field candidates
  - shared invoice fields plus industry extensions for pharma, medical, trading, and stockist flows
  - deterministic-first PSVM emission of Tally-shaped records, with an optional tiny local transformer for field selection
  - a browser demo at tally.html
The main Sudoku page is the current source of truth for the end-to-end architecture.
Sudoku is the clearest example of the stack:
structured state -> local value policy (transformer or GNN) -> ranked PLACE candidates -> exact runtime -> new state
What remains exact:
- candidate generation
- legality checks
- contradiction detection
- backtracking
- halt conditions
What the model does:
- rank branch choices where ambiguity exists
That means the current guided solver is model-guided exact search, not a pure free-running model-only solver.
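A compact sketch of that split, written in Python for brevity rather than taken from logic/sudoku.mjs: `candidates`, `solve`, and the `rank` callback are hypothetical names, and the `box` parameter (3 for 9x9, 2 for a 4x4 toy board) is an assumption added so the example is easy to check by hand. It also assumes the givens are mutually consistent.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]        # 0 marks an empty cell
Place = Tuple[int, int, int]  # (row, col, digit)
Ranker = Callable[[Grid, List[Place]], List[Place]]


def candidates(grid: Grid, r: int, c: int, box: int) -> List[int]:
    """Exact candidate generation: digits unused in the row, column, and box."""
    n = box * box
    used = set(grid[r]) | {grid[i][c] for i in range(n)}
    br, bc = r - r % box, c - c % box
    used |= {grid[i][j] for i in range(br, br + box) for j in range(bc, bc + box)}
    return [d for d in range(1, n + 1) if d not in used]


def solve(grid: Grid, rank: Ranker, box: int = 3) -> Optional[Grid]:
    """Model-guided exact search: `rank` orders PLACE candidates, nothing more."""
    n = box * box
    empties = [(r, c) for r in range(n) for c in range(n) if grid[r][c] == 0]
    if not empties:
        return [row[:] for row in grid]  # halt condition: the board is full
    # Branch on the most constrained empty cell.
    r, c = min(empties, key=lambda rc: len(candidates(grid, rc[0], rc[1], box)))
    places = [(r, c, d) for d in candidates(grid, r, c, box)]
    for place in rank(grid, places):  # no candidates means a contradiction
        if place not in places:
            continue  # legality check: ignore anything the runtime did not offer
        pr, pc, d = place
        grid[pr][pc] = d
        solved = solve(grid, rank, box)
        grid[pr][pc] = 0  # roll back before trying the next branch
        if solved is not None:
            return solved
    return None


def identity_rank(grid: Grid, places: List[Place]) -> List[Place]:
    """Stand-in for the transformer or GNN value model: keep the runtime's order."""
    return places
```

Swapping `identity_rank` for a learned ranker changes only the order in which branches are explored. A ranker that returns an illegal placement is simply ignored by the runtime.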
This repository explicitly treats code and runtime behavior as authoritative.
- The solver defines what a legal step is.
- The verifier defines whether a branch is valid.
- The canonical trace comes from the runtime.
- The model is trained against exact state/decision records derived from that trace.
So the meta-pattern is:
state -> model estimates branch value -> exact runtime -> new state
not:
state -> model -> magic answer
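One way to picture the state/decision records is the sketch below. The trace event shape (`state`, `legal`, `chosen`) and the function names are assumptions made for illustration, not the repository's export format. Skipping forced steps is also an assumption, based on the idea that the model only needs supervision where more than one legal branch exists.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def decision_records(trace: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Turn runtime trace events into one labeled record per legal branch."""
    for step, event in enumerate(trace):
        if len(event["legal"]) < 2:
            continue  # forced move: no ambiguity for a model to learn
        for op in event["legal"]:
            yield {
                "step": step,
                "state": event["state"],
                "op": op,
                "label": int(op == event["chosen"]),  # 1 for the branch the runtime took
            }


def to_jsonl(records: Iterable[Dict[str, Any]]) -> str:
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


# Hypothetical two-step trace: the first step is forced, the second is ambiguous.
EXAMPLE_TRACE: List[Dict[str, Any]] = [
    {"state": "s0", "legal": ["PLACE r0c1=3"], "chosen": "PLACE r0c1=3"},
    {
        "state": "s1",
        "legal": ["PLACE r2c0=2", "PLACE r2c0=3"],
        "chosen": "PLACE r2c0=2",
    },
]
```

Because the labels come from the runtime's own trace, the model never sees a target that the exact code did not produce.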
Compiling arbitrary code into weights is an interesting broader direction, but it is too broad for this project.
For narrow exact tasks like Sudoku, Weiqi tactics, or small rule-checking tools, the efficient path is:
task -> custom ops -> PSVM -> weights
not:
task -> arbitrary C -> full machine semantics -> weights
The latter keeps too much irrelevant machine detail alive in the training target.
What is working now:
- exact browser-side Sudoku solve
- guided local model path on Sudoku
- live guided board animation
- exact backtracking and verifier-backed execution
- structured ONNX models running locally in the browser
- packed tensor-shard training path for structured Sudoku models
- PSVM-style OCR receipt total extraction and candidate ranking under invoice/
- synthetic OCR receipt dataset export and local total-selector training
- Tally-style voucher-family classification and schema-aligned field extraction under tally/
- a browser demo for Tally-shaped OCR extraction at tally.html
What is not claimed yet:
- pure free-running model-only 9x9 Sudoku solving
- model outperforming the deterministic reference policy
- a general-purpose compiled-code-to-weights system
The invoice lane now follows the same repo thesis as Sudoku:
OCR text -> legal money candidates -> model ranks branches -> exact runtime emits total
That means the model is not asked to invent the receipt total end-to-end. It only scores legal candidates extracted by the runtime.
In short:
- AI/ML view: a constrained candidate-ranking problem over extracted money spans
- layman view: collect all amount-looking numbers, then pick the one that most looks like the final total
- main limitation: this is invoice/receipt-shaped, not a general parser for arbitrary tables or account statements
See invoice/README.md for the detailed runtime, training, and browser flow.
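As a rough illustration of the candidate-ranking shape, here is a small Python sketch. It is not the code in invoice/total_psvm.mjs: the regular expression, the `Candidate` fields, and `heuristic_score` (a hand-written stand-in for the local transformer) are all assumptions. What it tries to show is that the emitted total is always one of the extracted candidates, never a generated number.

```python
import re
from typing import Callable, List, NamedTuple, Optional

# Amounts with two decimals, optionally with thousands separators (assumed format).
MONEY = re.compile(r"\d+(?:,\d{3})*\.\d{2}\b")


class Candidate(NamedTuple):
    line: int   # index of the OCR line the amount came from
    cents: int  # amount in minor units, kept as an integer to stay exact
    cue: str    # lower-cased text that precedes the amount on the same line


def extract_candidates(lines: List[str]) -> List[Candidate]:
    """Runtime step: collect every legal money span."""
    found = []
    for i, text in enumerate(lines):
        for m in MONEY.finditer(text):
            cents = int(m.group().replace(",", "").replace(".", ""))
            found.append(Candidate(i, cents, text[: m.start()].lower()))
    return found


def heuristic_score(cand: Candidate, n_lines: int) -> float:
    """Stand-in for the model: later lines and a 'total' cue score higher."""
    score = cand.line / max(n_lines, 1)
    if "total" in cand.cue and "subtotal" not in cand.cue:
        score += 2.0
    return score


def pick_total(
    lines: List[str],
    score: Callable[[Candidate, int], float] = heuristic_score,
) -> Optional[int]:
    """Rank the legal candidates and emit the best one, or None if there are none."""
    cands = extract_candidates(lines)
    if not cands:
        return None
    return max(cands, key=lambda c: score(c, len(lines))).cents


RECEIPT = [
    "Coffee 3.50",
    "Bagel 2.25",
    "Subtotal 5.75",
    "Tax 0.46",
    "TOTAL 6.21",
    "Cash 10.00",
    "Change 3.79",
]
```

On `RECEIPT`, `pick_total` selects the amount on the TOTAL line. Replacing `heuristic_score` with a trained scorer would change the ranking but not the set of values that can be emitted.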
The Tally lane follows a broader document-extraction PSVM:
OCR/layout -> voucher family -> schema -> legal field candidates -> local ranker -> resolver -> exact runtime emits Tally-shaped record
That means the system is not trying to hallucinate a full accounting document from raw OCR text. It first narrows the document family, then only fills fields that the selected voucher schema allows. See tally/README.md for the schema, browser demo, and current limitations.
In short:
- AI/ML view: constrained information extraction over voucher families and field candidates, with a small deterministic resolver for global consistency
- layman view: detect the document type, look for the likely invoice fields, and fill a Tally-shaped record
- main limitation: the local model is still small and synthetic-data-trained, the demo expects pasted OCR/TSV rather than direct PDF conversion, and arbitrary table-heavy layouts still need more parser/constraint coverage
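The schema-gating and resolver steps can be sketched as follows. Everything here is hypothetical: the `SCHEMAS` table, the family names, and the specific constraint `subtotal + tax == total` are assumptions chosen to make the idea concrete, and the real schema lives in tally/schema.mjs. The sketch shows how a resolver can override per-field top-1 picks when they are not globally consistent.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Scored = Tuple[int, float]  # (amount in minor units, ranker score)

# Hypothetical voucher families and the amount fields each one allows.
SCHEMAS: Dict[str, Tuple[str, ...]] = {
    "sales_invoice": ("subtotal", "tax", "total"),
    "payment_receipt": ("total",),
}


def resolve_amounts(top: Dict[str, List[Scored]]) -> Optional[Dict[str, int]]:
    """Pick the best-scoring combination that satisfies subtotal + tax == total."""
    best, best_score = None, float("-inf")
    for (sub, s1), (tax, s2), (tot, s3) in product(
        top["subtotal"], top["tax"], top["total"]
    ):
        if sub + tax != tot:
            continue  # globally inconsistent, so the runtime rejects it
        if s1 + s2 + s3 > best_score:
            best = {"subtotal": sub, "tax": tax, "total": tot}
            best_score = s1 + s2 + s3
    return best


def emit_record(
    family: str, cands: Dict[str, List[Scored]], k: int = 3
) -> Optional[Dict[str, int]]:
    """Keep only schema fields, then resolve over the top-k candidates per field."""
    schema = SCHEMAS[family]
    gated = {
        f: sorted(cands.get(f, []), key=lambda vs: vs[1], reverse=True)[:k]
        for f in schema
    }
    if {"subtotal", "tax", "total"} <= set(schema):
        return resolve_amounts(gated)
    return {f: vals[0][0] for f, vals in gated.items() if vals}


# The top-1 pick per field (10000, 1800, 10000) does not add up.
EXAMPLE = {
    "subtotal": [(10000, 0.9), (11800, 0.4)],
    "tax": [(1800, 0.8)],
    "total": [(10000, 0.7), (11800, 0.6)],
}
```

For the `sales_invoice` family the resolver picks the lower-ranked total because it is the only combination that adds up. For `payment_receipt` the tax and subtotal candidates are dropped because the schema does not allow them.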
There is now an adversarial harness for that exact gap:
node scripts/evaluate_tally_harness.mjs
- reports candidate recall, top-1 accuracy, instability, and line-item recall by failure class
- useful classes today: candidate_missing, implicit_field, layout_drift, ocr_corruption, numeric_ambiguity, ranking_ambiguity, structural_inconsistency
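To make two of those metrics concrete, here is a small sketch of how candidate recall and top-1 accuracy could be grouped by failure class. It does not reproduce scripts/evaluate_tally_harness.mjs: the case record shape (`failure_class`, `gold`, `ranked`) and the function name `summarize` are assumptions, and instability and line-item recall are not covered.

```python
from collections import defaultdict
from typing import Any, Dict, List


def summarize(cases: List[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Per failure class: was the gold value extracted at all, and was it ranked first?"""
    buckets: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"n": 0, "recalled": 0, "top1": 0}
    )
    for case in cases:
        b = buckets[case["failure_class"]]
        ranked = case["ranked"]
        b["n"] += 1
        b["recalled"] += int(case["gold"] in ranked)
        b["top1"] += int(bool(ranked) and ranked[0] == case["gold"])
    return {
        cls: {
            "candidate_recall": b["recalled"] / b["n"],
            "top1_accuracy": b["top1"] / b["n"],
        }
        for cls, b in buckets.items()
    }


# Hypothetical cases for illustration only; these are not measured results.
EXAMPLE_CASES = [
    {"failure_class": "candidate_missing", "gold": "118.00", "ranked": ["100.00"]},
    {
        "failure_class": "ranking_ambiguity",
        "gold": "118.00",
        "ranked": ["100.00", "118.00"],
    },
    {"failure_class": "ranking_ambiguity", "gold": "50.00", "ranked": ["50.00"]},
]
```

Separating the two numbers tells you where to work: low candidate recall points at the extractor, while high recall with low top-1 accuracy points at the ranker.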
Serve the repo root with any static file server:
.venv/bin/python -m pip install -r requirements.txt
python3 -m http.server 8000
Then open:
- http://localhost:8000/sudoku.html
- http://localhost:8000/weiqi/
- http://localhost:8000/receipt.html
- http://localhost:8000/tally.html
For a quick local verification before exploring the demos:
node --test tally/schema.test.mjs tally/psvm.test.mjs tally/model.test.mjs
node --test invoice/receipt.test.mjs invoice/total_psvm.test.mjs
node --check tally/app.mjs tally/worker.mjs
That path checks the Tally and invoice PSVM lanes without requiring model retraining or a browser session.
The structured Sudoku training path lives in the soduku/ directory (a legacy name).
One-command training wrapper:
PYTHON=../transformer-in-notion-executor/.venv/bin/python \
sh scripts/train_sudoku_extreme.sh \
--top-puzzles-by-rating 25 \
--limit-puzzles 0 \
--min-rating 80 \
--op-epochs 1 \
--value-epochs 1
To train the GNN value path instead of the transformer value path, add:
--value-arch gnn
This pipeline does:
- stream the CSV dataset
- export structured manifests
- pack them into tensor shards
- train the op/value models
- export browser-local ONNX artifacts
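The first three stages can be pictured with the sketch below. It is only an illustration under stated assumptions: the CSV column names (`puzzle`, `rating`), the 81-character puzzle encoding, and the plain-list "shards" are hypothetical and do not describe the actual manifest or tensor-shard formats used by scripts/train_sudoku_extreme.sh.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator, List, TextIO


def stream_puzzles(fh: TextIO, min_rating: float) -> Iterator[str]:
    """Stream puzzle strings row by row, keeping only sufficiently hard ones."""
    for row in csv.DictReader(fh):
        if float(row["rating"]) >= min_rating:
            yield row["puzzle"]


def encode(puzzle: str) -> List[int]:
    """Encode an 81-character board as 81 integers, with 0 for an empty cell."""
    if len(puzzle) != 81:
        raise ValueError("expected an 81-character puzzle string")
    return [0 if ch in ".0" else int(ch) for ch in puzzle]


def shards(encoded: Iterable[List[int]], shard_size: int) -> Iterator[List[List[int]]]:
    """Group encoded boards into fixed-size chunks without loading everything at once."""
    it = iter(encoded)
    while True:
        chunk = list(islice(it, shard_size))
        if not chunk:
            return
        yield chunk


def load_shards(csv_text: str, min_rating: float, shard_size: int) -> List[List[List[int]]]:
    """Convenience wrapper that runs the three stages over in-memory CSV text."""
    puzzles = stream_puzzles(io.StringIO(csv_text), min_rating)
    return list(shards((encode(p) for p in puzzles), shard_size))
```

Streaming plus fixed-size chunks keeps memory use bounded regardless of dataset size. A real pipeline would write each chunk to disk in a tensor format instead of returning Python lists.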
- sudoku.html - final Sudoku page
- app.mjs - UI wiring and live board/model updates
- logic/sudoku.mjs - exact Sudoku runtime, trace generation, guided solve path
- logic/executor.mjs - prompt/program/tool-call artifact builder
- soduku/model-worker.mjs - guided model worker with explicit transformer, regret-transformer, and GNN selection
- soduku/model.mjs - structured op/value model loading
- soduku/value-model.mjs - structured value-model loading and Auto / Transformer / GNN routing
- soduku/structured-onnx.mjs - ONNX Runtime setup for structured state tensors
- invoice/psvm.mjs - exact invoice arithmetic PSVM
- tally/README.md - Tally voucher extraction overview, browser flow, and limitations
- tally/schema.mjs - voucher families, core shared fields, and industry extensions
- tally/psvm.mjs - voucher-family classifier and schema-aligned field extractor
- invoice/total_psvm.mjs - exact OCR receipt total candidate extractor and teacher ranker
- tally.html - browser Tally extraction demo
- tally/app.mjs - browser UI for voucher-family and field-candidate inspection
- invoice/export_total_dataset.mjs - synthetic OCR receipt dataset generator
- invoice/train_total_selector.py - local transformer trainer for TOTAL vs NOT_TOTAL
- scripts/predict_receipt_total.py - local inference over extracted receipt candidates
- docs/paper-idea-problem-shaped-vms.md - implementation and systems paper draft for the PSVM thesis
- docs/psvm-yellow-paper.md - companion yellow-paper-style spec for PSVM runtime and trace semantics
- soduku/structured_transformer_common.py - shared structured transformer/GNN training/export utilities
- soduku/meta.md - meta pattern and runtime philosophy
- weiqi/psvm5x5.mjs - exact Weiqi PSVM
The project’s position is:
- code is the source of truth
- exact runtimes own correctness
- custom ops beat generic machine detail for narrow tasks
- models should learn ambiguity, not replace exact semantics
That is the whole bet.