Transformer Runtime Lab

This repository explores a simple thesis:

code is the source of truth

For exact tasks, the runtime should own legality, state transitions, and backtracking. The model should learn the narrow decision surface on top of that runtime by evaluating ambiguous PSVM states, not replace the runtime with a one-shot guess.

Core idea

The project is built around problem-shaped virtual machines (PSVMs).

Instead of asking a model to learn:

task -> final answer

or:

task -> generic C/WASM machine semantics

we use:

task -> custom ops -> exact PSVM state -> local model estimates branch value

In practice that means:

Write or keep an exact reference runtime.
Define the smallest sound op surface for the task.
Export canonical traces and state/decision records from the runtime.
Encode structured state snapshots.
Train a local structured model to estimate branch value over PSVM states or rank legal arguments.
Keep the exact runtime in the loop for verification and rollback.

The model handles ambiguity by scoring branches. The code handles truth.
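The loop above can be sketched in a few lines. This is a minimal illustration that uses 4-queens as a stand-in task, not one of the repo's PSVMs; the names legal_ops, guided_search, and score are hypothetical. The point it shows is that the scorer only changes the order in which legal branches are tried, while the runtime alone decides legality, halting, and rollback.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, ...]  # toy PSVM state: queen column for each filled row
Op = int                 # toy PSVM op: PLACE a queen in this column of the next row
N = 4

def legal_ops(state: State) -> List[Op]:
    """Exact runtime: enumerate only the legal columns for the next row."""
    row = len(state)
    return [c for c in range(N)
            if all(c != pc and abs(c - pc) != row - pr
                   for pr, pc in enumerate(state))]

def guided_search(state: State,
                  score: Callable[[State, Op], float],
                  trace: List[tuple]) -> Optional[State]:
    """Try legal ops in score order; roll back any branch that dead-ends."""
    if len(state) == N:  # halt condition is owned by the runtime
        return state
    ranked = sorted(legal_ops(state), key=lambda op: score(state, op), reverse=True)
    for op in ranked:
        trace.append(("APPLY", len(state), op))
        solved = guided_search(state + (op,), score, trace)
        if solved is not None:
            return solved
        trace.append(("ROLLBACK", len(state), op))
    return None

# Two different (hypothetical) scorers: both end in a legal solution.
trace_low: List[tuple] = []
trace_high: List[tuple] = []
prefer_low = guided_search((), lambda s, c: -c, trace_low)
prefer_high = guided_search((), lambda s, c: c, trace_high)
```

A better scorer would shorten the trace by causing fewer ROLLBACK events, but no scorer can make the search return an illegal state.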

Citation

If you use the repository as a software artifact, cite:

CITATION.cff
paper.md
paper.pdf
versioned GitHub release: v0.1.1

If you want the broader research-paper framing rather than the software-paper framing, use:

Why this approach

Generic compiled traces are too noisy for narrow exact tasks. They include machine detail the task does not care about:

stack plumbing
memory bookkeeping
compiler-induced structure
large instruction surfaces

A PSVM keeps only the transitions that carry semantic weight for the task. That gives:

smaller action spaces
shorter traces
cleaner supervision
more interpretable execution
cheaper browser-local inference

Current focus

The repo currently centers on two browser-local game tasks and two browser-local document tasks:

sudoku.html - exact 9x9 Sudoku solve with:
- exact browser-side runtime
- deterministic backtracking
- local guided branch ranking with Auto, Transformer, Transformer (Regret), Transformer (Hard), or GNN selection
- visible trace and model stats
weiqi/index.html - 5x5 Weiqi capture PSVM with exact local rules.
invoice/README.md - OCR receipt total extraction with:
- exact money-candidate extraction from structured pdftotext -tsv rows
- layout-aware cues such as right-edge alignment and cue-before-amount position
- deterministic teacher ranking over legal total branches
- a local transformer that scores TOTAL vs NOT_TOTAL candidates
- explicit rejection of account-statement style documents with running balances
- a browser demo at receipt.html
tally/README.md - Tally-style voucher extraction with:
- voucher-family classification and schema selection
- schema-aligned field candidate extraction from OCR/layout
- constraint-guided resolver over top-ranked scalar field candidates
- shared invoice fields plus industry extensions for pharma, medical, trading, and stockist flows
- deterministic-first PSVM emission of Tally-shaped records, with an optional tiny local transformer for field selection
- a browser demo at tally.html

The main Sudoku page is the current source of truth for the end-to-end architecture.
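For the Weiqi capture task listed above, "exact local rules" can be pictured as a placement op that removes opposing groups left without liberties. The sketch below is an assumption about what such a rule looks like in general, not a port of weiqi/psvm5x5.mjs: the board encoding, the function names, and the choice to treat suicide as illegal are all hypothetical, and ko is not handled.

```python
from typing import List, Optional, Set, Tuple

SIZE = 5
Board = List[List[str]]  # "." empty, "B" black, "W" white
Point = Tuple[int, int]

def neighbors(p: Point) -> List[Point]:
    r, c = p
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return [(r + dr, c + dc) for dr, dc in steps
            if 0 <= r + dr < SIZE and 0 <= c + dc < SIZE]

def group_and_liberties(board: Board, start: Point) -> Tuple[Set[Point], Set[Point]]:
    """Flood-fill the connected group at start and collect its empty neighbors."""
    color = board[start[0]][start[1]]
    group, liberties, stack = {start}, set(), [start]
    while stack:
        for q in neighbors(stack.pop()):
            value = board[q[0]][q[1]]
            if value == ".":
                liberties.add(q)
            elif value == color and q not in group:
                group.add(q)
                stack.append(q)
    return group, liberties

def place(board: Board, p: Point, color: str) -> Optional[Tuple[Board, int]]:
    """Return (new board, stones captured), or None if the move is illegal."""
    if board[p[0]][p[1]] != ".":
        return None
    nxt = [row[:] for row in board]
    nxt[p[0]][p[1]] = color
    other = "W" if color == "B" else "B"
    captured = 0
    for q in neighbors(p):
        if nxt[q[0]][q[1]] == other:
            group, liberties = group_and_liberties(nxt, q)
            if not liberties:  # opposing group has no liberties: capture it
                for gr, gc in group:
                    nxt[gr][gc] = "."
                captured += len(group)
    _, own_liberties = group_and_liberties(nxt, p)
    if not own_liberties:  # assumed rule: suicide is illegal
        return None
    return nxt, captured

start = [list(row) for row in ["WB...", ".....", ".....", ".....", "....."]]
after_capture = place(start, (1, 0), "B")
```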

Sudoku architecture

Sudoku is the clearest example of the stack:

structured state -> local value policy (transformer or GNN) -> ranked PLACE candidates -> exact runtime -> new state

What remains exact:

candidate generation
legality checks
contradiction detection
backtracking
halt conditions

What the model does:

rank branch choices where ambiguity exists

That means the current guided solver is model-guided exact search, not a pure free-running model-only solver.
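A compact sketch of model-guided exact search for Sudoku follows. It is not the code in logic/sudoku.mjs; the function names, the flat 81-cell grid encoding, and the most-constrained-cell heuristic are assumptions made for illustration. The rank callable stands in for the transformer or GNN value policy: it may reorder PLACE candidates, and the runtime discards anything it returns that is not in the legal set.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[int]        # 81 cells, row-major, 0 means empty
Move = Tuple[int, int]  # (cell index, digit)

def candidates(grid: Grid, cell: int) -> List[int]:
    """Exact candidate generation: digits not used in the row, column, or box."""
    r, c = divmod(cell, 9)
    used = set()
    for i in range(9):
        used.add(grid[r * 9 + i])
        used.add(grid[i * 9 + c])
    br, bc = 3 * (r // 3), 3 * (c // 3)
    for dr in range(3):
        for dc in range(3):
            used.add(grid[(br + dr) * 9 + bc + dc])
    return [d for d in range(1, 10) if d not in used]

def solve(grid: Grid, rank: Callable[[Grid, List[Move]], List[Move]]) -> Optional[Grid]:
    """Fill the grid in place; assumes the given clues are consistent."""
    empties = [i for i, v in enumerate(grid) if v == 0]
    if not empties:
        return grid  # halt: every placement so far was checked for legality
    cell = min(empties, key=lambda i: len(candidates(grid, i)))
    moves = [(cell, d) for d in candidates(grid, cell)]
    legal = set(moves)
    # An empty move list is a contradiction: the loop is skipped and we return None.
    for move in rank(grid, moves):
        if move not in legal:  # the runtime ignores illegal suggestions
            continue
        grid[move[0]] = move[1]
        if solve(grid, rank) is not None:
            return grid
        grid[move[0]] = 0  # backtrack
    return None

PUZZLE = ("530070000" "600195000" "098000060"
          "800060003" "400803001" "700020006"
          "060000280" "000419005" "000080079")
givens = [int(ch) for ch in PUZZLE]

# Placeholder ranker: reverses the candidate order and adds one illegal move.
def noisy_rank(grid: Grid, moves: List[Move]) -> List[Move]:
    return [(0, 0)] + list(reversed(moves))

solved = solve(list(givens), noisy_rank)
```

Swapping noisy_rank for any other ordering changes how much backtracking happens, not whether the result is a legal solution.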

Code as source of truth

This repository explicitly treats code and runtime behavior as authoritative.

The solver defines what a legal step is.
The verifier defines whether a branch is valid.
The canonical trace comes from the runtime.
The model is trained against exact state/decision records derived from that trace.

So the meta-pattern is:

state -> model estimates branch value -> exact runtime -> new state

not:

state -> model -> magic answer
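One way to picture "state/decision records derived from that trace" is to label each branch decision by whether the runtime later rolled it back. The event names and record fields below are hypothetical, chosen only for illustration; the repo's canonical trace format may differ.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int, int]  # (kind, cell, digit); kind is "PLACE" or "BACKTRACK"

def decision_records(trace: List[Event]) -> List[Dict[str, int]]:
    """Label each PLACE as 1 if it survived to the end of the trace, else 0."""
    records: List[Dict[str, int]] = []
    open_placements: List[int] = []  # indices of placements not yet rolled back
    for kind, cell, digit in trace:
        if kind == "PLACE":
            open_placements.append(len(records))
            records.append({"cell": cell, "digit": digit,
                            "depth": len(open_placements), "label": 1})
        elif kind == "BACKTRACK":
            # The runtime undid the most recent open placement.
            records[open_placements.pop()]["label"] = 0
    return records

example_trace: List[Event] = [
    ("PLACE", 0, 5),
    ("PLACE", 1, 3),
    ("BACKTRACK", 1, 3),
    ("PLACE", 1, 4),
]
records = decision_records(example_trace)
```

Because the labels come from what the runtime actually did, a model trained on such records never sees a target that the exact code would reject.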

Why not compile C directly into weights

That direction is interesting, but it is too broad for this project.

For narrow exact tasks like Sudoku, Weiqi tactics, or small rule-checking tools, the efficient path is:

task -> custom ops -> PSVM -> weights

not:

task -> arbitrary C -> full machine semantics -> weights

The latter keeps too much irrelevant machine detail alive in the training target.

Project status

What is working now:

exact browser-side Sudoku solve
guided local model path on Sudoku
live guided board animation
exact backtracking and verifier-backed execution
structured ONNX models running locally in the browser
packed tensor-shard training path for structured Sudoku models
PSVM-style OCR receipt total extraction and candidate ranking under invoice/
synthetic OCR receipt dataset export and local total-selector training
Tally-style voucher-family classification and schema-aligned field extraction under tally/
a browser demo for Tally-shaped OCR extraction at tally.html

What is not claimed yet:

pure free-running model-only 9x9 Sudoku solving
model outperforming the deterministic reference policy
a general-purpose compiled-code-to-weights system

Invoice / OCR Receipts

The invoice lane now follows the same repo thesis as Sudoku:

OCR text -> legal money candidates -> model ranks branches -> exact runtime emits total

That means the model is not asked to invent the receipt total end-to-end. It only scores legal candidates extracted by the runtime.
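The shape of that pipeline can be sketched as follows. This is not the logic in invoice/total_psvm.mjs: the regular expression, the Candidate fields, and the hand-written teacher_score are assumptions made for illustration, and the real extractor works from structured pdftotext -tsv rows rather than plain lines. The scorer can be replaced by a learned model without changing what counts as a legal candidate.

```python
import re
from typing import Callable, List, NamedTuple, Optional

# Amounts with exactly two decimals, with or without thousands separators.
AMOUNT = re.compile(r"(?<![\d.,])(?:\d{1,3}(?:,\d{3})+|\d+)\.\d{2}(?!\d)")

class Candidate(NamedTuple):
    line_no: int
    cents: int        # exact integer cents, so no float rounding
    text_before: str  # lowercased text to the left of the amount

def extract_candidates(lines: List[str]) -> List[Candidate]:
    """Exact step: every amount-shaped span becomes a legal candidate."""
    out = []
    for i, line in enumerate(lines):
        for m in AMOUNT.finditer(line):
            cents = int(m.group().replace(",", "").replace(".", ""))
            out.append(Candidate(i, cents, line[:m.start()].lower()))
    return out

def teacher_score(c: Candidate) -> float:
    """Hypothetical deterministic ranker: cue word before the amount, bottom bias."""
    words = re.findall(r"[a-z]+", c.text_before)
    score = 0.0
    if "total" in words:
        score += 2.0
    if "subtotal" in words or "tax" in words:
        score -= 1.0
    return score + 0.01 * c.line_no

def pick_total(lines: List[str],
               score: Callable[[Candidate], float] = teacher_score) -> Optional[int]:
    """Return the chosen total in cents, or None when nothing legal was found."""
    cands = extract_candidates(lines)
    return max(cands, key=score).cents if cands else None

receipt = [
    "Coffee        3.50",
    "Bagel         2.25",
    "Subtotal      5.75",
    "Tax           0.46",
    "TOTAL         6.21",
    "Cash         10.00",
    "Change        3.79",
]
total_cents = pick_total(receipt)
```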

In short:

AI/ML view: a constrained candidate-ranking problem over extracted money spans
layman view: collect all amount-looking numbers, then pick the one that most looks like the final total
main limitation: this is invoice/receipt-shaped, not a general parser for arbitrary tables or account statements

See invoice/README.md for the detailed runtime, training, and browser flow.

Tally / Voucher Extraction

The Tally lane follows a broader document-extraction PSVM:

OCR/layout -> voucher family -> schema -> legal field candidates -> local ranker -> resolver -> exact runtime emits Tally-shaped record

That means the system is not trying to hallucinate a full accounting document from raw OCR text. It first narrows the document family, then only fills fields that the selected voucher schema allows. See tally/README.md for the schema, browser demo, and current limitations.
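A rough sketch of the family, schema, and resolver stages is shown below. The family names, cue phrases, field names, and the arithmetic constraint (taxable value plus tax equals total) are hypothetical and are not taken from tally/schema.mjs or tally/psvm.mjs. It illustrates how a small deterministic resolver can overrule a top-ranked candidate that breaks a global consistency check.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Scored = Tuple[int, float]  # (amount in cents, ranker score)

# Hypothetical schema table: each family lists the fields it is allowed to fill.
SCHEMAS = {
    "sales_invoice": ["taxable_value", "tax_amount", "total"],
    "receipt_voucher": ["total"],
}
FAMILY_CUES = {
    "sales_invoice": ("tax invoice", "invoice no"),
    "receipt_voucher": ("receipt", "received from"),
}

def classify_family(text: str) -> Optional[str]:
    """Pick the family with the most cue hits; None when no cue matches."""
    lowered = text.lower()
    hits = {fam: sum(cue in lowered for cue in cues)
            for fam, cues in FAMILY_CUES.items()}
    family = max(hits, key=hits.get)
    return family if hits[family] > 0 else None

def resolve_amounts(cands: Dict[str, List[Scored]], top_k: int = 3) -> Optional[Dict[str, int]]:
    """Search top-k candidates per field for the best-scoring consistent triple."""
    fields = ("taxable_value", "tax_amount", "total")
    pools = [sorted(cands.get(f, []), key=lambda vs: vs[1], reverse=True)[:top_k]
             for f in fields]
    best, best_score = None, float("-inf")
    for (taxable, s1), (tax, s2), (total, s3) in product(*pools):
        if taxable + tax != total:  # global consistency constraint
            continue
        if s1 + s2 + s3 > best_score:
            best, best_score = dict(zip(fields, (taxable, tax, total))), s1 + s2 + s3
    return best

def emit_record(text: str, cands: Dict[str, List[Scored]]) -> Optional[Dict[str, object]]:
    family = classify_family(text)
    amounts = resolve_amounts(cands)
    if family is None or amounts is None:
        return None
    # Only fields allowed by the selected schema are emitted.
    return {"family": family, **{f: amounts[f] for f in SCHEMAS[family]}}

ranked = {
    "taxable_value": [(10000, 0.9), (11800, 0.2)],
    "tax_amount": [(1800, 0.8), (10000, 0.1)],
    "total": [(10000, 0.6), (11800, 0.5)],  # the ranker's top pick is wrong here
}
record = emit_record("TAX INVOICE\nInvoice No: 42", ranked)
```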

In short:

AI/ML view: constrained information extraction over voucher families and field candidates, with a small deterministic resolver for global consistency
layman view: detect the document type, look for the likely invoice fields, and fill a Tally-shaped record
main limitation: the local model is still small and synthetic-data-trained, the demo expects pasted OCR/TSV rather than direct PDF conversion, and arbitrary table-heavy layouts still need more parser/constraint coverage

There is now an adversarial harness for that exact gap:

node scripts/evaluate_tally_harness.mjs
reports candidate recall, top-1 accuracy, instability, and line-item recall by failure class
useful classes today: candidate_missing, implicit_field, layout_drift, ocr_corruption, numeric_ambiguity, ranking_ambiguity, structural_inconsistency
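Two of the reported metrics, candidate recall and top-1 accuracy, can be computed per failure class roughly as follows. The case format (failure_class, gold, candidates) is an assumption for illustration and may not match what scripts/evaluate_tally_harness.mjs consumes; instability and line-item recall are not covered here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

def summarize(cases: Iterable[dict]) -> Dict[str, Dict[str, float]]:
    """Group cases by failure class and report recall and top-1 accuracy."""
    stats = defaultdict(lambda: {"n": 0, "recalled": 0, "top1": 0})
    for case in cases:
        ranked: List[str] = case["candidates"]  # best-first candidate values
        s = stats[case["failure_class"]]
        s["n"] += 1
        # Recall: the gold value was extracted at all, at any rank.
        s["recalled"] += case["gold"] in ranked
        # Top-1: the gold value was also ranked first.
        s["top1"] += bool(ranked) and ranked[0] == case["gold"]
    return {cls: {"candidate_recall": s["recalled"] / s["n"],
                  "top1_accuracy": s["top1"] / s["n"]}
            for cls, s in stats.items()}

report = summarize([
    {"failure_class": "ocr_corruption", "gold": "118.00", "candidates": ["118.00", "100.00"]},
    {"failure_class": "ocr_corruption", "gold": "118.00", "candidates": ["100.00", "118.00"]},
    {"failure_class": "candidate_missing", "gold": "5.00", "candidates": []},
])
```

Separating the two numbers shows where a failure sits: low recall points at the extractor, while high recall with low top-1 accuracy points at the ranker.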

Local development

Install the Python dependencies (this assumes a virtual environment already exists at .venv), then serve the repo root with any static file server:

.venv/bin/python -m pip install -r requirements.txt
python3 -m http.server 8000

Then open:

http://localhost:8000/sudoku.html
http://localhost:8000/weiqi/
http://localhost:8000/receipt.html
http://localhost:8000/tally.html

Minimal smoke test

For a quick local verification before exploring the demos:

node --test tally/schema.test.mjs tally/psvm.test.mjs tally/model.test.mjs
node --test invoice/receipt.test.mjs invoice/total_psvm.test.mjs
node --check tally/app.mjs tally/worker.mjs

That path checks the Tally and invoice PSVM lanes without requiring model retraining or a browser session.

Sudoku training

The structured Sudoku training path lives under the legacy soduku/ directory name.

One-command training wrapper:

PYTHON=../transformer-in-notion-executor/.venv/bin/python \
sh scripts/train_sudoku_extreme.sh \
  --top-puzzles-by-rating 25 \
  --limit-puzzles 0 \
  --min-rating 80 \
  --op-epochs 1 \
  --value-epochs 1

To train the GNN value path instead of the transformer value path, add:

  --value-arch gnn

This pipeline does:

stream the CSV dataset
export structured manifests
pack them into tensor shards
train the op/value models
export browser-local ONNX artifacts
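The first step, streaming the CSV, might look like the sketch below. It is one plausible reading of a "top N puzzles above a minimum rating" selection, not the behavior of scripts/train_sudoku_extreme.sh; the column names puzzle and rating and the function name top_rated are assumptions. It keeps only the best N rows in memory while reading the file row by row.

```python
import csv
import heapq
import io
from typing import Dict, Iterable, List

def top_rated(rows: Iterable[Dict[str, str]], n: int, min_rating: float) -> List[Dict[str, str]]:
    """Return the n highest-rated rows at or above min_rating, best first."""
    eligible = (row for row in rows if float(row["rating"]) >= min_rating)
    # heapq.nlargest consumes the generator lazily and holds at most n rows.
    return heapq.nlargest(n, eligible, key=lambda row: float(row["rating"]))

# Tiny in-memory stand-in for the dataset file (hypothetical columns).
SAMPLE = "puzzle,rating\nA,50\nB,90\nC,85\nD,99\n"
picked = top_rated(csv.DictReader(io.StringIO(SAMPLE)), n=2, min_rating=80)
```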

Important files

sudoku.html - final Sudoku page
app.mjs - UI wiring and live board/model updates
logic/sudoku.mjs - exact Sudoku runtime, trace generation, guided solve path
logic/executor.mjs - prompt/program/tool-call artifact builder
soduku/model-worker.mjs - guided model worker with explicit transformer, regret-transformer, and GNN selection
soduku/model.mjs - structured op/value model loading
soduku/value-model.mjs - structured value-model loading and Auto / Transformer / GNN routing
soduku/structured-onnx.mjs - ONNX Runtime setup for structured state tensors
invoice/psvm.mjs - exact invoice arithmetic PSVM
tally/README.md - Tally voucher extraction overview, browser flow, and limitations
tally/schema.mjs - voucher families, core shared fields, and industry extensions
tally/psvm.mjs - voucher-family classifier and schema-aligned field extractor
invoice/total_psvm.mjs - exact OCR receipt total candidate extractor and teacher ranker
tally.html - browser Tally extraction demo
tally/app.mjs - browser UI for voucher-family and field-candidate inspection
invoice/export_total_dataset.mjs - synthetic OCR receipt dataset generator
invoice/train_total_selector.py - local transformer trainer for TOTAL vs NOT_TOTAL
scripts/predict_receipt_total.py - local inference over extracted receipt candidates
docs/paper-idea-problem-shaped-vms.md - implementation and systems paper draft for the PSVM thesis
docs/psvm-yellow-paper.md - companion yellow-paper-style spec for PSVM runtime and trace semantics
soduku/structured_transformer_common.py - shared structured transformer/GNN training/export utilities
soduku/meta.md - meta pattern and runtime philosophy
weiqi/psvm5x5.mjs - exact Weiqi PSVM

Design summary

The project’s position is:

code is the source of truth
exact runtimes own correctness
custom ops beat generic machine detail for narrow tasks
models should learn ambiguity, not replace exact semantics

That is the whole bet.
