Add decoder-only autoregressive transformer example (#57) #448
Closed
lukstafi wants to merge 35 commits into ahrefs:master from
Conversation
lukstafi force-pushed the branch from 72c5cba to 6eb00e5 (Compare)
Train a small decoder-only transformer on sequences from an 8-state binary-input finite state machine. The model learns the FSM transition function, demonstrated by >90% valid-transition accuracy on held-out sequences.

Key implementation details:
- Single attention block without layer_norm (recentered init workaround)
- Separate forward-only inference routine sharing trained weights
- Valid-transition accuracy metric for hidden-input FSM evaluation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
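The actual example is OCaml (OCANNL); the sketch below shows the same data-generation and evaluation idea in Python, with an assumed random transition table (the PR does not show the real one). The key point is the metric: since the FSM's input bits are hidden from the evaluator, a predicted successor state counts as correct if it is reachable from the current state under *any* input bit.

```python
import random

# Hypothetical 8-state FSM with binary inputs: transition[state][bit] -> next
# state. The real example's transition table is not shown in the PR.
random.seed(0)
NUM_STATES, NUM_INPUTS = 8, 2
transition = [[random.randrange(NUM_STATES) for _ in range(NUM_INPUTS)]
              for _ in range(NUM_STATES)]

def generate_sequence(length):
    """Emit the visited-state sequence for a random binary input stream."""
    state, states = 0, [0]
    for _ in range(length - 1):
        state = transition[state][random.randrange(NUM_INPUTS)]
        states.append(state)
    return states

def valid_transition_accuracy(pred_states):
    """Fraction of consecutive state pairs legal under SOME input bit.
    The input bits are hidden, so any reachable successor counts."""
    ok = sum(1 for a, b in zip(pred_states, pred_states[1:])
             if b in transition[a])
    return ok / max(1, len(pred_states) - 1)

# Ground-truth sequences score 1.0 by construction; a trained model is
# judged by how close its sampled sequences come to that.
print(valid_transition_accuracy(generate_sequence(32)))  # 1.0
```

A model that merely matches the state distribution without learning the transition function would score well below 1.0 on this metric, which is why it is a sharper test than per-token accuracy.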
…key crash

Add proposal for task-bb30d0be covering two independent test failures: update test_ndarray_binary_io.expected and fix Map.of_alist_exn crash in derive_projections for 1x1 output convolution/pooling.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add per-test output directory isolation via build_files_prefix config option to prevent flaky test failures from parallel test execution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…assignment (ahrefs#420) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…refs#427) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…hrefs#412) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers aligned memory allocation, compiler flag validation, auto-vectorization-friendly code generation (restrict, pragma hints, aligned locals), and platform detection macros. Scoped as foundation for tiling task's explicit SIMD micro-kernels. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…#253) Gap analysis identifying which llm.c techniques apply to OCANNL, which are already covered by existing tasks (tiling ahrefs#412, AVX ahrefs#164, megakernels ahrefs#318), and which unique lessons remain (warp shuffles, fused reductions, GELU, AdamW, vectorized memory access). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…hrefs#277) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nd (ahrefs#170) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Proposes Equality_with_index fetch op and simplify_llc pattern detection to replace one-hot * matrix multiply with direct row lookup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
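The proposed rewrite rests on a simple identity: multiplying a one-hot row vector by a matrix selects one row of that matrix, so the multiply can be replaced by a direct lookup. A minimal numpy sketch (illustrative only — not the OCANNL `simplify_llc` code):

```python
import numpy as np

# one_hot(i) @ M equals M[i]; the optimization replaces the former with
# the latter, turning O(rows * cols) multiply-adds into an O(cols) row read.
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 4))  # e.g. an embedding table
i = 5

one_hot = np.zeros(8)
one_hot[i] = 1.0

via_matmul = one_hot @ M  # dense multiply against a mostly-zero vector
via_lookup = M[i]         # direct row lookup

assert np.array_equal(via_matmul, via_lookup)
```

Detecting this pattern at the IR level (the proposed Equality_with_index fetch op) is what lets the compiler apply the rewrite without the user changing their model code.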
ahrefs#151) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…hrefs#57)

Add decoder_only_block and decoder_only to nn_blocks.ml as reusable building blocks for autoregressive language models (masked self-attention + FFN, no cross-attention).

Add test/training/transformer_names.ml: a complete training + generation example using character-level encoding on the Names dataset with causal masking, SGD training, and autoregressive token-by-token generation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
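The core of a decoder-only block is self-attention under a causal mask: position t may attend only to positions ≤ t, which is what makes token-by-token generation consistent with training. A minimal single-head numpy sketch of the masking (illustrative — not the OCANNL nn_blocks API; weight names Wq/Wk/Wv are assumptions, and residuals/FFN are omitted):

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """x: (T, d) token embeddings; returns (T, d) attended values."""
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    # Causal mask: keep the lower triangle, send future positions to -inf
    # so they get zero softmax weight.
    scores = np.where(np.tril(np.ones((T, T), dtype=bool)), scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
d = 4
x = rng.standard_normal((6, d))
W = [rng.standard_normal((d, d)) for _ in range(3)]
out = causal_self_attention(x, *W)
print(out.shape)  # (6, 4)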
Exercises the new Nn_blocks.decoder_only helper with a 2-layer stack, causal mask, and forward pass, validating output shape. This ensures the new public API added in the previous commit has CI coverage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
lukstafi force-pushed the branch from 6eb00e5 to ee621a9 (Compare)
lukstafi (Collaborator, Author) commented:
@codex review Focus on bugs, correctness issues, and edge cases. Do not check adherence to a spec or plan.
lukstafi (Collaborator, Author) commented:
Closing: this PR was filed against the wrong repo due to a gh-resolved bug (see lukstafi/ludics#302). Ported to lukstafi#5.
Summary

- Add decoder_only_block and decoder_only to lib/nn_blocks.ml — reusable building blocks for decoder-only transformers (masked self-attention + FFN, no cross-attention)
- Add test/training/transformer_names.ml — a complete training + autoregressive generation example on the Names dataset using character-level encoding and causal masking
- Add test/operations/decoder_only_test.ml — regression test exercising the new decoder_only API with a 2-layer stack forward pass

Test plan

- dune build @check passes
- dune runtest test/training/transformer_names.ml passes (loss decreases, generates name-like output)
- dune runtest test/operations/decoder_only_test.ml passes (output shape validated)

🤖 Generated with Claude Code