Target: ICFP 2026 week (August 24, 2026)
This roadmap outlines the development plan for OCANNL from the current state to version 1.0, incorporating academic paper milestones for workshops collocated with ICFP 2026 (OCaml Workshop, FProPer). Dates are end-of-period targets.
Theme: Parser fixes and shape error detection
-
Menhir einsum parser stabilization
- Fix any remaining issues with the recently migrated Menhir-based parser
- Ensure proper error messages for malformed einsum specifications
-
Missing hidden dimensions detection ✓ (implemented)
- Verify the existing implementation works as expected
- Fix/validate the failing test case
Theme: Convolution padding
-
Padding inference for convolutions (#354, #386)
- Integrate padding inference into shape inference pipeline
- Refactor `use_padding` from global setting to `Conv_input` constructor field
- Synthetic toy CNN example: counting
-
Example: Sokoban RL (stretch goal)
- Policy gradient example with CNN architecture
-
HIP backend (#411) — parallel / background task, can slip to v0.6.4
- Standalone bindings package for HIP (AMD hardware)
- Backend implementation
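For the padding inference item in this theme, the arithmetic involved can be sketched as follows. This is a minimal sketch of the widely used "valid" and "same" conventions for a 1-D convolution, not OCANNL's implementation; all function names are hypothetical, and the convention OCANNL settles on may differ.

```ocaml
(* Hypothetical helpers; standard "valid"/"same" conventions are assumed. *)

(* Number of input positions covered by one kernel application. *)
let span ~kernel ~dilation = dilation * (kernel - 1) + 1

(* Output size without padding: the kernel must fit entirely in the input. *)
let out_valid ~input ~kernel ~stride ~dilation =
  let s = span ~kernel ~dilation in
  if input < s then 0 else ((input - s) / stride) + 1

(* Output size with "same"-style padding: ceil (input / stride). *)
let out_same ~input ~stride = (input + stride - 1) / stride

(* Total padding needed so that the padded input yields [out_same] outputs. *)
let total_padding ~input ~kernel ~stride ~dilation =
  let out = out_same ~input ~stride in
  max 0 (((out - 1) * stride) + span ~kernel ~dilation - input)

(* Split the total into (left, right), putting any odd remainder on the right. *)
let split_padding total = (total / 2, total - (total / 2))
```

With `input = 7`, `kernel = 3`, `stride = 1`, `dilation = 1`, the unpadded output has 5 positions, while the padded output keeps all 7 and needs one padding element on each side.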
Theme: Shape concatenation and position embeddings
-
Axis concatenation in einsum (#49)
- Implement `^` syntax for tensor stacking/concatenation
- Handle shifting as special case: `1^i=>i` for left shift
- Handle padding as special case: `i=>1^i` for left padding
-
Dimension units vs axis labels (#298)
- Clarify design decisions
- Document rationale
-
RoPE and position embeddings (#398)
- Rotary Position Embeddings implementation
- Other non-learned position embedding variants
-
Transformer toy example (#57)
- Fully working decoder-only autoregressive transformer
- Names dataset language model
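The intended reading of the concatenation specs in this theme can be modeled on plain 1-D arrays. This is only a sketch of one plausible interpretation: the function names are hypothetical, the code is not OCANNL code, and the zero fill value for padding is an assumption.

```ocaml
(* Plain-array model of axis concatenation; an illustration, not OCANNL code. *)

(* Models a spec like i^j: the result axis has size |a| + |b|. *)
let concat (a : float array) (b : float array) : float array = Array.append a b

(* Models 1^i=>i: view the input axis as a size-1 prefix followed by i,
   and keep only the i part, i.e. drop the first element (left shift). *)
let shift_left (x : float array) : float array =
  if Array.length x = 0 then invalid_arg "shift_left: empty axis";
  Array.sub x 1 (Array.length x - 1)

(* Models i=>1^i: the output axis is a size-1 prefix followed by the input.
   Filling the prefix with 0.0 is an assumption made for this sketch. *)
let pad_left (x : float array) : float array = concat [| 0.0 |] x

let () =
  let x = [| 1.0; 2.0; 3.0 |] in
  assert (concat x [| 4.0 |] = [| 1.0; 2.0; 3.0; 4.0 |]);
  assert (shift_left x = [| 2.0; 3.0 |]);
  assert (pad_left x = [| 0.0; 1.0; 2.0; 3.0 |])
```

Under this reading, shifting and padding are the same concatenation pattern seen from the input side and the output side respectively, which is why both can be handled as special cases of one mechanism.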
Theme: Frontend finalization (paper-ready for workshop submissions)
This is the "paper-ready" release with a mature frontend API.
-
Context handling finalization
- Migrate from "hosted tensor" idea to always requiring context
- Clean, consistent API for tensor access and device handling
-
Remove hosted tensor mode (#333)
- Get rid of `array` field of `Tnode.t`
- Remove or minimize `Ndarray` module
-
Deprecated streams cleanup
- Remove legacy streams functionality
-
Tensor persistence (#373)
- Tensor saving, loading, and restoring
-
Documentation
- Context API slides (README item 3)
Paper should use v0.7.0 examples demonstrating the mature frontend. Target venues: OCaml Workshop and FProPer (if it happens), both collocated with ICFP 2026. Workshop submission deadlines are typically May–June.
Theme: Real-world examples and backend polish
-
Tokenizer bindings
- Bindings to tokenizer from llama.cpp or equivalent
-
Transformer inference demo (#377)
- Inference for a small open-weights model (GPT-2, LLaMA, or Gemma)
-
CNN training examples
- MNIST training setup
- CIFAR-10 training setup
-
Backend improvements
- Resolve multicore_cc non-determinism (#341)
- MSVC support for Windows C backend (#313)
- Add `-march=native` optimization (#311)
Theme: Compiler optimizations
-
Optimizations
- Loop invariant hoisting (#350)
- Common subexpression elimination (#351)
- Local scope initialization tracking (#340)
-
Memory management
- Universal Pool Allocator (#344)
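As a rough illustration of what the common subexpression elimination item involves, here is a minimal sketch over a toy expression type. The `expr` type and the `cse` function are hypothetical and much simpler than OCANNL's low-level representation; the sketch only shows the general technique of naming each distinct subexpression once.

```ocaml
(* Toy expression language; hypothetical, for illustration only. *)
type expr =
  | Var of string
  | Const of float
  | Add of expr * expr
  | Mul of expr * expr

(* A temporary name together with the expression it stands for. *)
type binding = string * expr

(* Rewrites [e] bottom-up so that every distinct compound subexpression is
   bound to exactly one temporary. Returns the bindings in definition order
   and the expression (a variable or constant) that denotes the result. *)
let cse (e : expr) : binding list * expr =
  let table : (expr, string) Hashtbl.t = Hashtbl.create 16 in
  let bindings = ref [] in
  let counter = ref 0 in
  let rec go e =
    match e with
    | Var _ | Const _ -> e
    | Add (a, b) ->
        let a = go a in
        let b = go b in
        name (Add (a, b))
    | Mul (a, b) ->
        let a = go a in
        let b = go b in
        name (Mul (a, b))
  and name e =
    (* Structural equality on already-rewritten operands detects repeats. *)
    match Hashtbl.find_opt table e with
    | Some v -> Var v
    | None ->
        let v = Printf.sprintf "t%d" !counter in
        incr counter;
        Hashtbl.add table e v;
        bindings := (v, e) :: !bindings;
        Var v
  in
  let result = go e in
  (List.rev !bindings, result)
```

For `(x * y) + (x * y)`, the product is bound once as `t0` and the sum becomes `t1 = t0 + t0`. A real pass would also have to account for side effects and for the scope in which a temporary remains valid, which this sketch ignores.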
Theme: GPU-style performance
This is a substantial milestone requiring ~2 months.
-
Tiling optimizations (inspired by #412)
- Fast matrix multiplication patterns from Böhm's CPU article
- CUDA matmul optimizations from Böhm's CUDA article
- Lessons from llm.c
-
Megakernel exploration (#318)
- Investigate megakernel approach alignment with OCANNL design
- May require splitting routines into multiple kernels
-
Metal optimizations (#320)
- Use private mode appropriately
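The core idea behind the tiling items in this theme is loop blocking: iterating over small blocks so that the working set stays in fast memory. The following is a minimal CPU-side sketch in plain OCaml, assuming square row-major matrices stored as flat arrays; the function names are hypothetical, and it is not how OCANNL generates code.

```ocaml
(* Reference implementation: n x n row-major matrices as flat float arrays. *)
let matmul_naive n a b =
  let c = Array.make (n * n) 0.0 in
  for i = 0 to n - 1 do
    for k = 0 to n - 1 do
      let aik = a.((i * n) + k) in
      for j = 0 to n - 1 do
        c.((i * n) + j) <- c.((i * n) + j) +. (aik *. b.((k * n) + j))
      done
    done
  done;
  c

(* Blocked variant: the three outer loops walk over tile-sized blocks,
   the three inner loops stay within one block of each matrix. *)
let matmul_tiled ~tile n a b =
  if tile <= 0 then invalid_arg "matmul_tiled: tile must be positive";
  let c = Array.make (n * n) 0.0 in
  let blocks = (n + tile - 1) / tile in
  for bi = 0 to blocks - 1 do
    for bk = 0 to blocks - 1 do
      for bj = 0 to blocks - 1 do
        (* [min n ...] clips the last block when tile does not divide n. *)
        for i = bi * tile to min n ((bi + 1) * tile) - 1 do
          for k = bk * tile to min n ((bk + 1) * tile) - 1 do
            let aik = a.((i * n) + k) in
            for j = bj * tile to min n ((bj + 1) * tile) - 1 do
              c.((i * n) + j) <- c.((i * n) + j) +. (aik *. b.((k * n) + j))
            done
          done
        done
      done
    done
  done;
  c
```

Both functions accumulate each output element over `k` in the same order, so they produce identical results. Only the memory access pattern changes, and that is the part a tiling optimization tunes per backend.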
Theme: Program search and optimization
This is a research-heavy milestone requiring ~2.5 months.
-
Static scheduling via program search
- Alternative to tinygrad's dynamic scheduling
- Halide-inspired search strategies
-
Cost functions
- Per-backend execution-based metrics
- Aggregate cost functions across backends
-
Code graph rewriting
- Broader range of rewriting rules
- Tiling and layout mechanism augmentation
Theme: Documentation, completeness, ergonomics
-
Documentation completeness
- Lowering and inlining documentation (#296)
- einops.rocks comparison documentation (#413)
-
Feature completeness
- Address select "explore" issues to demonstrate capability
- Sharding and slicing with minimal copying (#293)
-
Ergonomics
- Concise syntax for merge buffer transfers
- Execution dependency tracking (mirroring compilation)
- Improve configuration handling (#409)
-
Safety
- Merge buffer static verification (#288)
- Memory mode sharing audit (#291)
-
Shape inference enhancements
- Axis labels (as opposed to dimension units)
- Shape schemes for tensor functions (#404)
-
Advanced examples
- BERT/ModernBERT implementation (#297)
- LLM101n replication (#275)
- DisTrO distributed training (#278)
-
Performance explorations
- Lean Attention / Flash Attention (#263)
- Quantization for optimizers (#271)
- XLA backend (#300)
| Version | Target | Duration | Key Deliverables |
|---|---|---|---|
| v0.6.2 | End Nov 2025 | now | Menhir parser fixes, hidden dimension errors |
| v0.6.3 | Mid-Dec 2025 | 2.5 weeks | Padding inference |
| v0.6.4 | End Dec 2025 | 2 weeks | Shape concatenation |
| v0.6.5 | Mid-Jan 2026 | 2 weeks | RoPE, transformer toy example |
| v0.7.0 | End Feb 2026 | 6 weeks | Frontend finalization (workshop paper-ready) |
| v0.7.1 | Mid-Mar 2026 | 2 weeks | Real-world examples, backend polish |
| v0.7.2 | Mid-Apr 2026 | 4 weeks | Compiler optimizations, pool allocator |
| v0.8 | Mid-Jun 2026 | 2 months | GPU tiling, megakernels |
| v0.9 | Aug 24, 2026 | 2.5 months | Program search (ICFP week) |
| v1.0 | End Oct 2026 | 2 months | Documentation, completeness, safety |
Target deadline: May–June 2026 (exact dates TBD per workshop CFP)
"Generalized Einsum with Row Variables: Shape Inference for Deep Learning in OCaml"
- Generalized einsum notation with convolutions, strided iteration, and concatenation
- Row variables for flexible axis handling ("principle of least commitment")
- Constraint-based shape inference with provenance tracking for error messages
- Dimension units design rationale (vs axis labels)
- Integration with OCaml's type system via syntax extensions
- einops (#413)
- torchdim/DumPy (#316)
- Named tensors in PyTorch/JAX
- Dependent types for tensor shapes
- Nov 2025: Release v0.6.2
- Dec 2025: Finish v0.6.3; outline paper, literature review
- Jan–Feb 2026: Finish v0.6.4–0.6.5 (concatenation, RoPE, transformer)
- Feb–Mar 2026: Finish v0.7.0 (frontend finalization); first draft with v0.7.0 examples
- Apr–May 2026: Paper revision and polish
- May–June 2026: Workshop submission deadline (OCaml Workshop / FProPer)
The paper needs working examples with OCANNL's mature frontend:
- Clean context-based API (no hosted tensors)
- Shape concatenation syntax (`^`)
- Complete transformer example with RoPE
- Consistent, documented API surface