Add decoder-only autoregressive transformer example (#57) #448
Closed
lukstafi wants to merge 35 commits into ahrefs:master from
Conversation
lukstafi force-pushed the branch from 72c5cba to 6eb00e5 (Compare)
Train a small decoder-only transformer on sequences from an 8-state binary-input finite state machine. The model learns the FSM transition function, demonstrated by >90% valid-transition accuracy on held-out sequences.

Key implementation details:
- Single attention block without layer_norm (recentered init workaround)
- Separate forward-only inference routine sharing trained weights
- Valid-transition accuracy metric for hidden-input FSM evaluation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
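The actual example is OCaml (OCANNL); the sketch below shows the same data-generation and evaluation idea in Python, with an assumed random transition table (the PR does not show the real one). The key point is the metric: since the FSM's input bits are hidden from the evaluator, a predicted successor state counts as correct if it is reachable from the current state under *any* input bit.

```python
import random

# Hypothetical 8-state FSM with binary inputs: transition[state][bit] -> next
# state. The real example's transition table is not shown in the PR.
random.seed(0)
NUM_STATES, NUM_INPUTS = 8, 2
transition = [[random.randrange(NUM_STATES) for _ in range(NUM_INPUTS)]
              for _ in range(NUM_STATES)]

def generate_sequence(length):
    """Emit the visited-state sequence for a random binary input stream."""
    state, states = 0, [0]
    for _ in range(length - 1):
        state = transition[state][random.randrange(NUM_INPUTS)]
        states.append(state)
    return states

def valid_transition_accuracy(pred_states):
    """Fraction of consecutive state pairs legal under SOME input bit.
    The input bits are hidden, so any reachable successor counts."""
    ok = sum(1 for a, b in zip(pred_states, pred_states[1:])
             if b in transition[a])
    return ok / max(1, len(pred_states) - 1)

# Ground-truth sequences score 1.0 by construction; a trained model is
# judged by how close its sampled sequences come to that.
print(valid_transition_accuracy(generate_sequence(32)))  # 1.0
```

A model that merely matches the state distribution without learning the transition function would score well below 1.0 on this metric, which is why it is a sharper test than per-token accuracy.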
…key crash

Add proposal for task-bb30d0be covering two independent test failures: update test_ndarray_binary_io.expected and fix Map.of_alist_exn crash in derive_projections for 1x1 output convolution/pooling.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add per-test output directory isolation via build_files_prefix config option to prevent flaky test failures from parallel test execution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…assignment (ahrefs#420) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…refs#427) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…hrefs#412) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers aligned memory allocation, compiler flag validation, auto-vectorization-friendly code generation (restrict, pragma hints, aligned locals), and platform detection macros. Scoped as foundation for tiling task's explicit SIMD micro-kernels. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…#253) Gap analysis identifying which llm.c techniques apply to OCANNL, which are already covered by existing tasks (tiling ahrefs#412, AVX ahrefs#164, megakernels ahrefs#318), and which unique lessons remain (warp shuffles, fused reductions, GELU, AdamW, vectorized memory access). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…hrefs#277) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nd (ahrefs#170) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Proposes Equality_with_index fetch op and simplify_llc pattern detection to replace one-hot * matrix multiply with direct row lookup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
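The proposed rewrite rests on a simple identity: multiplying a one-hot row vector by a matrix selects one row of that matrix, so the multiply can be replaced by a direct lookup. A minimal numpy sketch (illustrative only — not the OCANNL `simplify_llc` code):

```python
import numpy as np

# one_hot(i) @ M equals M[i]; the optimization replaces the former with
# the latter, turning O(rows * cols) multiply-adds into an O(cols) row read.
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 4))  # e.g. an embedding table
i = 5

one_hot = np.zeros(8)
one_hot[i] = 1.0

via_matmul = one_hot @ M  # dense multiply against a mostly-zero vector
via_lookup = M[i]         # direct row lookup

assert np.array_equal(via_matmul, via_lookup)
```

Detecting this pattern at the IR level (the proposed Equality_with_index fetch op) is what lets the compiler apply the rewrite without the user changing their model code.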
ahrefs#151) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…hrefs#57)

Add decoder_only_block and decoder_only to nn_blocks.ml as reusable building blocks for autoregressive language models (masked self-attention + FFN, no cross-attention).

Add test/training/transformer_names.ml: a complete training + generation example using character-level encoding on the Names dataset with causal masking, SGD training, and autoregressive token-by-token generation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
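The core of a decoder-only block is self-attention under a causal mask: position t may attend only to positions ≤ t, which is what makes token-by-token generation consistent with training. A minimal single-head numpy sketch of the masking (illustrative — not the OCANNL nn_blocks API; weight names Wq/Wk/Wv are assumptions, and residuals/FFN are omitted):

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """x: (T, d) token embeddings; returns (T, d) attended values."""
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    # Causal mask: keep the lower triangle, send future positions to -inf
    # so they get zero softmax weight.
    scores = np.where(np.tril(np.ones((T, T), dtype=bool)), scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
d = 4
x = rng.standard_normal((6, d))
W = [rng.standard_normal((d, d)) for _ in range(3)]
out = causal_self_attention(x, *W)
print(out.shape)  # (6, 4)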
Exercises the new Nn_blocks.decoder_only helper with a 2-layer stack, causal mask, and forward pass, validating output shape. This ensures the new public API added in the previous commit has CI coverage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
lukstafi force-pushed the branch from 6eb00e5 to ee621a9 (Compare)
lukstafi (Collaborator, Author) commented:
@codex review Focus on bugs, correctness issues, and edge cases. Do not check adherence to a spec or plan.
lukstafi (Collaborator, Author) commented:
Closing: this PR was filed against the wrong repo due to a gh-resolved bug (see lukstafi/ludics#302). Ported to lukstafi#5.
Summary

- Add decoder_only_block and decoder_only to lib/nn_blocks.ml — reusable building blocks for decoder-only transformers (masked self-attention + FFN, no cross-attention)
- Add test/training/transformer_names.ml — a complete training + autoregressive generation example on the Names dataset using character-level encoding and causal masking
- Add test/operations/decoder_only_test.ml — regression test exercising the new decoder_only API with a 2-layer stack forward pass

Test plan

- dune build @check passes
- dune runtest test/training/transformer_names.ml passes (loss decreases, generates name-like output)
- dune runtest test/operations/decoder_only_test.ml passes (output shape validated)

🤖 Generated with Claude Code