An OCaml dialect with a Zig/BPF backend. We do not invent a new language. Source files use the standard
`.ml` extension. We replace the backend, not the frontend.
.ml source
│
▼
[ ocamlc -bin-annot ] ◀── upstream OCaml, used as a library, never forked
│ .cmt (Typedtree)
▼
[ zxc-frontend (small OCaml glue) ]
│ .cir.sexp (versioned wire format)
▼
[ omlz (Zig) : ANF → Core IR → ArenaStrategy → Lowered IR → Zig codegen ]
│ .zig
▼
[ zig build-lib -target bpfel-freestanding -femit-llvm-bc ]
│ .bc (LLVM bitcode)
▼
[ sbpf-linker --cpu v2 --export entrypoint ] ◀── v3 opt-in (ADR-013)
│
▼
Solana BPF .so
- Frontend: upstream OCaml `compiler-libs` (no fork, no re-implementation). See ADR-009 / ADR-010.
- Compiler host language for everything below the frontend: Zig 0.16.
- Source language: OCaml (subset, growing).
- Primary target: Solana BPF (`bpfel-freestanding`).
- Memory model (P3): arena, fully inferred, hidden from the user; BPF entry programs use a 32 KiB arena.
- Core IR shape: ANF (A-Normal Form), typed, layout-tagged.
- CLI binary name: `omlz` (OCaml on Zig).
- Build driver: a single `build.zig` orchestrates both the OCaml frontend bridge and the Zig pipeline (ADR-011).
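To make the "Core IR shape: ANF" item concrete, here is a minimal sketch of the same computation in direct style and in A-Normal Form, both written as plain OCaml. The names `direct`, `anf`, and `t1`..`t3` are illustrative only; the real Core IR is a typed, layout-tagged data structure inside `omlz`, not OCaml source.

```ocaml
(* Direct style: the operands of ( * ) are nested subexpressions. *)
let direct x y = (x + 1) * (y - 2)

(* ANF: every intermediate result is bound by a let, so each
   operation only ever sees atomic operands (variables or constants).
   This is an illustration of the shape, not omlz's actual IR. *)
let anf x y =
  let t1 = x + 1 in
  let t2 = y - 2 in
  let t3 = t1 * t2 in
  t3
```

Both functions compute the same value for any inputs; ANF only makes the evaluation order and the intermediate values explicit, which is what later passes rely on.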
OCaml has an elegant frontend (HM types, ADTs, pattern matching, modules) and a battle-tested type system. What it lacks is a backend story for resource-constrained, deterministic environments such as Solana BPF, where the OCaml runtime (GC, boxed floats, exceptions) cannot run.
ZxCaml keeps the OCaml language and reuses its mental model, but routes the program through a new pipeline that produces flat, GC-free, BPF-ready code via Zig.
We deliberately do not fork an OCaml compiler distribution (upstream
OCaml or OxCaml). Instead, we use upstream compiler-libs as a library
for parsing and type-checking, and we own everything from Typedtree
onwards. The reasoning is captured in
docs/alternatives-considered.md
and locked in ADR-009 / ADR-010.
Because every ZxCaml program is by construction valid OCaml
(ADR-001), and because omlz already requires a working OCaml
toolchain on the developer's machine (ADR-010), the same .ml
file can be compiled and run two ways:
one .ml file
├── ocaml / dune → native x86_64 / arm64 binary (local testing, fuzzing, REPL)
└── omlz → Solana BPF .so (deployment)
This means ZxCaml does not need a dedicated x86 backend to give
you native execution. Use the ocaml toolchain you already have
installed for omlz, or install OxCaml, and run the same file
with dune exec. The two paths compute the same result; this is
guaranteed by the determinism invariant (ADR-008).
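As a rough illustration of the "one `.ml` file, two paths" idea, the sketch below uses only constructs from the supported subset listed in this README (user-defined ADTs, pattern matching, `let rec`, lists). The file name `shapes.ml` and all identifiers are hypothetical, and whether `omlz` accepts this exact file is an assumption. The entrypoint convention a Solana BPF program needs is not shown here.

```ocaml
(* shapes.ml: hypothetical example file, not part of the repository. *)

type shape =
  | Square of int
  | Rect of int * int

(* Area of a single shape, by pattern matching on the constructor. *)
let area s =
  match s with
  | Square side -> side * side
  | Rect (w, h) -> w * h

(* Sum of the areas of a list of shapes, by structural recursion. *)
let rec total shapes =
  match shapes with
  | [] -> 0
  | s :: rest -> area s + total rest

(* 3 * 3 + 2 * 5 = 19 *)
let result = total [ Square 3; Rect (2, 5) ]
```

Because the file is ordinary OCaml, `ocaml shapes.ml` type-checks and evaluates it with the upstream toolchain. Under the assumption above, `omlz check shapes.ml` or `omlz run shapes.ml` would send the same source through the ZxCaml pipeline.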
For the longer discussion of how OxCaml relates to this project,
and why we still don't fork it, see
docs/oxcaml-relationship.md.
For full install details and troubleshooting, see Installing.
From the repository root, build omlz and the canonical Solana BPF example:

`./init.sh && zig build && zig-out/bin/omlz build examples/solana_hello.ml --target=bpf -o sh.so`

The command sequence uses the same init.sh setup script as CI.
P8 Compiler Optimizations is implemented. P1-P7 deliver the walking skeleton, subset expansion, Solana runtime integration, Mollusk test infrastructure, external declarations, Anchor IDL, functional persistent stdlib, region inference, and OCaml subset expansion (desugars, patterns, strings, expanded stdlib). P8 adds source-level compiler optimizations: constant folding, dead code elimination, self-recursive tail call optimization, and function inlining.
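The combined effect of constant folding, dead code elimination, and inlining can be pictured with a source-level before/after pair. This is a hedged sketch: `double`, `before`, and `after` are made-up names, the passes operate on Core IR rather than on OCaml source, and whether `double` falls under the inlining size threshold is an assumption.

```ocaml
(* A small single-expression function: a plausible inlining candidate. *)
let double x = x * 2

(* What a programmer might write. *)
let before () =
  let _unused = 10 + 20 in   (* pure and never read: removable by DCE *)
  let scale = 4 * 8 in       (* constant arithmetic: foldable to 32 *)
  if 1 < 2 then double scale (* condition folds to true *)
  else 0                     (* unreachable branch *)

(* What the optimized program is equivalent to, assuming the passes
   fold 4 * 8, drop the dead binding and branch, inline double,
   and then fold 32 * 2. *)
let after () = 64
```

The point of the pair is that each pass enables the next: inlining `double` exposes `32 * 2` to the constant folder, which a call to an opaque function would have hidden.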
omlz works end-to-end: parse/type-check OCaml with upstream
compiler-libs → emit sexp 1.0 → lower through Core IR with constant folding, DCE, inlining, escape
analysis → interpret, build native Zig, build Solana BPF .so artifacts,
or emit Anchor-compatible IDL.
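For the escape analysis step mentioned above, the distinction it draws can be sketched with two small functions. The classification in the comments is the textbook one and is an assumption about how `omlz` treats these particular programs; `dist2` and `make_pair` are illustrative names.

```ocaml
(* The tuple p is created and fully consumed inside dist2.
   It never leaves the function, so it does not escape and is a
   candidate for stack allocation instead of arena allocation. *)
let dist2 x y =
  let p = (x, y) in
  let (a, b) = p in
  a * a + b * b

(* The tuple is the return value, so it outlives the call.
   It escapes and has to live in the arena. *)
let make_pair x y = (x, y)
```

Both functions allocate a pair in source terms; only the lifetime differs, and the lifetime is what the analysis infers.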
- CLI commands: `omlz check <file>`, `omlz check --no-alloc <file>`, `omlz run <file>`, `omlz build --target=native <file> -o <out>`, `omlz build --target=bpf <file> -o <out>`, `omlz idl <file>`
- Wire format: version 1.0 (P1 `0.4`; P2 added user ADTs in `0.5`, nested/guarded patterns in `0.6`, and tuples/records in `0.7`; P3 added account/syscall references in `0.8` and CPI types/references in `0.9`; P4 added instruction_data; P5 added external declarations; P6 added escape analysis annotations)
- OCaml subset: let bindings, nested let, let rec, curried functions, function application, arithmetic/comparison operators, if/then/else, user-defined ADTs, nested constructor patterns, guarded match arms, literal constant patterns, or-patterns, alias patterns, tuples, records, field access, functional record update, lists (`[]`/`::`), sequence expressions (`;`), function cases (`function |`), string operations (`^`, length, get, sub), char operations (code, chr), and pattern matching over all of those forms
- Stdlib: bundled `List` (`length`, `map`, `filter`, `fold_left`, `rev`, `append`, `hd`, `tl`, `nth`, `exists`, `for_all`, `find`, `sort`, `combine`, `split`), `Option` (`is_none`, `is_some`, `value`, `get`, `fold`), `Result` (`is_ok`, `is_error`, `ok`, `error`, `map`, `bind`), `Fun` (`id`, `const`, `flip`), `Map` (`empty`, `singleton`, `add`, `find`, `remove`, `mem`, `size`, `to_list`), `Set` (`empty`, `singleton`, `add`, `mem`, `remove`, `size`, `to_list`, `union`, `inter`), `String` (`length`, `get`, `sub`), `Char` (`code`, `chr`), `Crypto` (`sha256`, `keccak256`), and `Pubkey` (`zero`, `token_program`, `of_hex`) modules
- Memory model: arena-only with region inference for automatic stack allocation of non-escaping locals; BPF entry arena is 32 KiB
- Backends: tree-walk interpreter, Zig native codegen, BPF codegen via `sbpf-linker --cpu v2`
- Solana accounts: built-in `account` record values expose key, lamports, data, owner, and signer/writable/executable flags parsed from the BPF input buffer as zero-copy views; the runtime parser also tracks rent epoch
- Solana syscalls: bindings for logging, `sol_log_64`, pubkey logging, SHA-256/Keccak, Clock/Rent sysvars, and remaining compute units use `external` declarations to bind directly to Zig runtime symbols
- External declarations: `external name : type = "zig_symbol"` syntax enables direct FFI to Zig runtime functions with type safety enforced by the frontend
- CPI and PDA helpers: built-in `instruction`/`account_meta` records, `invoke`, `invoke_signed`, PDA helpers, and return-data syscalls mirror the Solana C ABI
- SPL-Token: helper support and an acceptance example encode legacy Tokenkeg Transfer instructions with source/destination/authority metas
- no_alloc: `omlz check --no-alloc` runs a conservative Core IR allocation proof and reports the allocation-causing node on failure
- IDL: `omlz idl <file>` emits Anchor 0.30+ compatible JSON with SHA-256 discriminators, instruction accounts/args, account types, events, errors, and constants
- BPF closures: hardened first-class closures. Closures capturing ADT values, multi-environment captures, and nested closures are lowered without unsupported BPF code-pointer relocations and are covered by Solana closure acceptance tests
- Solana acceptance: deploy + invoke against `solana-test-validator` works for the canonical hello harness, closure harness, account/syscall harness, simple CPI harness, and SPL-Token transfer harness
- Region inference: automatic escape analysis marks non-escaping local values for stack allocation, reducing arena pressure and improving BPF compute efficiency
- Constant folding: compile-time evaluation of arithmetic, comparison, string concatenation, boolean conditions, and known-constructor matches in Core IR
- Dead code elimination: removes unused let bindings (preserving side-effectful and potentially trapping operations) and unreachable if branches
- Tail call optimization: self-recursive tail calls are detected during ANF lowering and emitted as `while (true)` loops in generated Zig, enabling deep recursion (n > 10000) without stack overflow
- Function inlining: small single-expression functions (≤3 Core IR nodes) are inlined at call sites with alpha-renaming, enabling further constant folding; supports all types including String, ADT, Tuple, and Record
- Determinism: interpreter ≡ Zig native across the P1 + P2 + P3 + P4 + P5 + P6 + P7 examples corpus
- CI: GitHub Actions workflow with `macos-latest` + `ubuntu-latest` matrix runs `./init.sh`, `zig build`, `zig build test`, `cargo test` (Mollusk SVM), P3 `no_alloc` and IDL smoke checks, Mollusk tests, and an examples `omlz check` corpus loop
- Mollusk SVM tests: 10 integration tests in `tests/` using Mollusk SVM v0.12.1 (hello, demo, simple_cpi, counter, vault, external_demo, crypto_demo)
- Diagnostics: human-friendly `path:line:col: severity: message` rendering
- Examples: 42 programs in `examples/`, including ADT, nested/guarded pattern, tuple, record, stdlib, closure, BPF smoke, account/syscall, CPI, SPL-Token, counter, vault, external demo, crypto demo, multi-instruction, region allocation, string demo, and tail recursion (TCO) programs
- Golden/UI tests: Core IR/sexp snapshot and UI tests run through `zig build test`
- Install: `./init.sh && zig build` (see INSTALLING.md)
Read in order:
| # | Doc | What it pins down |
|---|---|---|
| — | Installing | Fresh setup, prerequisites, quickstart, and troubleshooting |
| 00 | Overview | Vision, scope, three cold showers (anti-traps) |
| 01 | Architecture | Pipeline, layered IR, extension points |
| 02 | Grammar | OCaml subset accepted through P2 |
| 03 | Core IR | ANF IR data model, the central contract |
| 04 | Memory model | Arena-only current model, region descriptor for the future |
| 05 | Backends | Zig codegen, tree-walk interpreter, backend trait |
| 06 | BPF target | Toolchain chain to Solana .so (zig + sbpf-linker) |
| 07 | Repo layout | Directory contract, who owns what |
| 08 | Roadmap | Phases P1–P7, with P1/P2 release notes |
| 09 | Decisions (ADRs) | Locked decisions, with reasons |
| 10 | Frontend bridge | OCaml compiler-libs → sexp → Zig |
| 11 | Solana P3 guide | Account layout, syscalls, CPI, SPL-Token, no_alloc, IDL, and CI coverage |
| — | Alternatives considered | Why not self-write, why not fork OxCaml |
| — | OxCaml relationship | What OxCaml is, four ways to "use" it, which to pick |
| — | zignocchio relationship | The Zig→Solana SDK we read for ideas, what we learned, what we did not import (ADR-014) |
Borrow OCaml's frontend. Throw away its runtime. Land on BPF via Zig.
Borrow ≠ fork. We call `compiler-libs` as a library; we never patch it.