DaviRain-Su/zxcaml

ZxCaml

Languages / 语言: English · 简体中文

An OCaml dialect with a Zig/BPF backend. We do not invent a new language. Source files use the standard .ml extension. We replace the backend, not the frontend.


TL;DR

.ml source
   │
   ▼
[ ocamlc -bin-annot ]    ◀── upstream OCaml, used as a library, never forked
   │  .cmt (Typedtree)
   ▼
[ zxc-frontend (small OCaml glue) ]
   │  .cir.sexp  (versioned wire format)
   ▼
[ omlz (Zig)  : ANF → Core IR → ArenaStrategy → Lowered IR → Zig codegen ]
   │  .zig
   ▼
[ zig build-lib -target bpfel-freestanding -femit-llvm-bc ]
   │  .bc (LLVM bitcode)
   ▼
[ sbpf-linker --cpu v2 --export entrypoint ]    ◀── v3 opt-in (ADR-013)
   │
   ▼
Solana BPF .so
  • Frontend: upstream OCaml compiler-libs (no fork, no re-implementation). See ADR-009 / ADR-010.
  • Compiler host language for everything below the frontend: Zig 0.16.
  • Source language: OCaml (subset, growing).
  • Primary target: Solana BPF (bpfel-freestanding).
  • Memory model (P3): arena, fully inferred, hidden from the user; BPF entry programs use a 32 KiB arena.
  • Core IR shape: ANF (A-Normal Form), typed, layout-tagged.
  • CLI binary name: omlz (OCaml on Zig).
  • Build driver: a single build.zig orchestrates both the OCaml frontend bridge and the Zig pipeline (ADR-011).
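To make the ANF bullet concrete, here is a hand-written illustration in plain OCaml. It shows the same computation in direct style and in A-normal form, where every intermediate result is let-bound and calls take only variables or constants as arguments. The helper names are hypothetical, and this shows only the shape ANF imposes, not the Core IR that omlz actually emits.

```ocaml
(* Hypothetical helpers, used only for illustration. *)
let double x = x * 2
let succ3 x = x + 3

(* Direct style: nested calls appear as arguments. *)
let direct x y = double (succ3 x) + succ3 (double y)

(* A-normal form: each intermediate value is let-bound, so every
   call receives only variables or constants as arguments. *)
let anf x y =
  let a = succ3 x in
  let b = double a in
  let c = double y in
  let d = succ3 c in
  b + d
```

Both functions compute the same value for any inputs; the ANF version simply names every step, which is what makes later passes such as folding and inlining easier to express.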

Why this exists

OCaml has an elegant frontend (HM types, ADTs, pattern matching, modules) and a battle-tested type system. What it lacks is a backend story for resource-constrained, deterministic environments such as Solana BPF, where the OCaml runtime (GC, boxed floats, exceptions) cannot run.

ZxCaml keeps the OCaml language and reuses its mental model, but routes the program through a new pipeline that produces flat, GC-free, BPF-ready code via Zig.

We deliberately do not fork an OCaml compiler distribution (upstream OCaml or OxCaml). Instead, we use upstream compiler-libs as a library for parsing and type-checking, and we own everything from Typedtree onwards. The reasoning is captured in docs/alternatives-considered.md and locked in ADR-009 / ADR-010.


Native execution comes for free

Because every ZxCaml program is by construction valid OCaml (ADR-001), and because omlz already requires a working OCaml toolchain on the developer's machine (ADR-010), the same .ml file can be compiled and run two ways:

one .ml file
  ├── ocaml / dune  →  native x86_64 / arm64 binary   (local testing, fuzzing, REPL)
  └── omlz          →  Solana BPF .so                 (deployment)

This means ZxCaml does not need a dedicated x86 backend to give you native execution. Use the OCaml toolchain you already installed for omlz (or install OxCaml) and run the same file with dune exec. The two paths compute the same result; this is guaranteed by the determinism invariant (ADR-008).
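As a sketch of what such a dual-use file might look like, the program below uses only constructs named in the feature list (a user-defined ADT, constructor patterns, tuples, lists, and let rec). The type and function names are invented for illustration, and whether omlz accepts this exact file is an assumption; under stock OCaml it is ordinary code.

```ocaml
(* A user-defined ADT, pattern matching, and let rec: all constructs
   listed in the supported subset. *)
type shape =
  | Circle of int          (* radius *)
  | Rect of int * int      (* width, height *)

(* Integer arithmetic only; the feature list does not mention floats. *)
let area s =
  match s with
  | Circle r -> 3 * r * r  (* coarse integer approximation of pi *)
  | Rect (w, h) -> w * h

let rec total_area shapes =
  match shapes with
  | [] -> 0
  | s :: rest -> area s + total_area rest

let result = total_area [Circle 2; Rect (3, 4)]
```

Saved under a hypothetical name such as shapes.ml, the file can be run with the stock ocaml toplevel for local testing, and the same path can be passed to omlz check or omlz build.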

For the longer discussion of how OxCaml relates to this project — and why we still don't fork it — see docs/oxcaml-relationship.md.


Quickstart

For full install details and troubleshooting, see Installing. From the repository root, build omlz and the canonical Solana BPF example:

./init.sh && zig build && zig-out/bin/omlz build examples/solana_hello.ml --target=bpf -o sh.so

The command sequence uses the same init.sh setup script as CI.

Status

P8 (compiler optimizations) is implemented. P1–P7 delivered the walking skeleton, subset expansion, Solana runtime integration, Mollusk test infrastructure, external declarations, Anchor IDL, functional persistent stdlib, region inference, and OCaml subset expansion (desugars, patterns, strings, expanded stdlib). P8 adds source-level compiler optimizations: constant folding, dead code elimination, self-recursive tail call optimization, and function inlining.
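For readers unfamiliar with self-recursive tail call optimization, the sketch below shows the kind of function it applies to, next to a hand-written loop with the same behavior. The names are hypothetical. The loop is written in OCaml purely to show the shape of the transformation; it is not the Zig that omlz generates, and refs and while are not claimed to be part of the ZxCaml subset.

```ocaml
(* Self-recursive tail call: the recursive call is the last action,
   so no work remains after it returns. *)
let rec sum_to acc n =
  if n = 0 then acc else sum_to (acc + n) (n - 1)

(* Hand-written loop with the same behavior, shown only to
   illustrate the shape of the transformation. It uses refs and
   while, which are not claimed to be in the ZxCaml subset. *)
let sum_to_loop acc0 n0 =
  let acc = ref acc0 and n = ref n0 in
  while !n <> 0 do
    acc := !acc + !n;
    n := !n - 1
  done;
  !acc
```

Because the parameters are simply overwritten on each iteration, the loop form needs no additional stack frames however large n is.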

omlz works end-to-end: it parses and type-checks OCaml with upstream compiler-libs, emits sexp 1.0, and lowers the program through Core IR with constant folding, DCE, inlining, and escape analysis. From there it can interpret the program, build native code via Zig, build Solana BPF .so artifacts, or emit Anchor-compatible IDL.

Current features

  • CLI commands: omlz check <file>, omlz check --no-alloc <file>, omlz run <file>, omlz build --target=native <file> -o <out>, omlz build --target=bpf <file> -o <out>, omlz idl <file>
  • Wire format: version 1.0 (P1 0.4; P2 added user ADTs in 0.5, nested/guarded patterns in 0.6, and tuples/records in 0.7; P3 added account/syscall references in 0.8 and CPI types/references in 0.9; P4 added instruction_data; P5 added external declarations; P6 added escape analysis annotations)
  • OCaml subset: let bindings, nested let, let rec, curried functions, function application, arithmetic/comparison operators, if/then/else, user-defined ADTs, nested constructor patterns, guarded match arms, literal constant patterns, or-patterns, alias patterns, tuples, records, field access, functional record update, lists ([] / ::), sequence expressions (;), function cases (function |), string operations (^, length, get, sub), char operations (code, chr), and pattern matching over all of those forms
  • Stdlib: bundled List (length, map, filter, fold_left, rev, append, hd, tl, nth, exists, for_all, find, sort, combine, split), Option (is_none, is_some, value, get, fold), Result (is_ok, is_error, ok, error, map, bind), Fun (id, const, flip), Map (empty, singleton, add, find, remove, mem, size, to_list), Set (empty, singleton, add, mem, remove, size, to_list, union, inter), String (length, get, sub), Char (code, chr), Crypto (sha256, keccak256), and Pubkey (zero, token_program, of_hex) modules
  • Memory model: arena-only with region inference for automatic stack allocation of non-escaping locals; BPF entry arena is 32 KiB
  • Backends: tree-walk interpreter, Zig native codegen, BPF codegen via sbpf-linker --cpu v2
  • Solana accounts: built-in account record values expose key, lamports, data, owner, and signer/writable/executable flags parsed from the BPF input buffer as zero-copy views; the runtime parser also tracks rent epoch
  • Solana syscalls: bindings for logging, sol_log_64, pubkey logging, SHA-256/Keccak, Clock/Rent sysvars, and remaining compute units use external declarations to bind directly to Zig runtime symbols
  • External declarations: external name : type = "zig_symbol" syntax enables direct FFI to Zig runtime functions with type safety enforced by the frontend
  • CPI and PDA helpers: built-in instruction / account_meta records, invoke, invoke_signed, PDA helpers, and return-data syscalls mirror the Solana C ABI
  • SPL-Token: helper support and an acceptance example encode legacy Tokenkeg Transfer instructions with source/destination/authority metas
  • no_alloc: omlz check --no-alloc runs a conservative Core IR allocation proof and reports the allocation-causing node on failure
  • IDL: omlz idl <file> emits Anchor 0.30+ compatible JSON with SHA-256 discriminators, instruction accounts/args, account types, events, errors, and constants
  • BPF closures: hardened first-class closures — closures capturing ADT values, multi-environment captures, and nested closures are lowered without unsupported BPF code-pointer relocations and are covered by Solana closure acceptance tests
  • Solana acceptance: deploy + invoke against solana-test-validator works for the canonical hello harness, closure harness, account/syscall harness, simple CPI harness, and SPL-Token transfer harness
  • Region inference: automatic escape analysis marks non-escaping local values for stack allocation, reducing arena pressure and improving BPF compute efficiency
  • Constant folding: compile-time evaluation of arithmetic, comparison, string concatenation, boolean conditions, and known-constructor matches in Core IR
  • Dead code elimination: removes unused let bindings (preserving side-effectful and potentially trapping operations) and unreachable if branches
  • Tail call optimization: self-recursive tail calls are detected during ANF lowering and emitted as while (true) loops in generated Zig, enabling deep recursion (n > 10000) without stack overflow
  • Function inlining: small single-expression functions (≤3 Core IR nodes) are inlined at call sites with alpha-renaming, enabling further constant folding; supports all types including String, ADT, Tuple, and Record
  • Determinism: interpreter ≡ Zig native across the P1 + P2 + P3 + P4 + P5 + P6 + P7 examples corpus
  • CI: GitHub Actions workflow with macos-latest + ubuntu-latest matrix runs ./init.sh, zig build, zig build test, cargo test (Mollusk SVM), P3 no_alloc and IDL smoke checks, Mollusk tests, and an examples omlz check corpus loop
  • Mollusk SVM tests: 10 integration tests in tests/ using Mollusk SVM v0.12.1 (hello, demo, simple_cpi, counter, vault, external_demo, crypto_demo)
  • Diagnostics: human-friendly path:line:col: severity: message rendering
  • Examples: 42 programs in examples/, including ADT, nested/guarded pattern, tuple, record, stdlib, closure, BPF smoke, account/syscall, CPI, SPL-Token, counter, vault, external demo, crypto demo, multi-instruction, region allocation, string demo, and tail recursion (TCO) programs
  • Golden/UI tests: Core IR/sexp snapshot and UI tests run through zig build test
  • Install: ./init.sh && zig build (see INSTALLING.md)
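The constant folding and unreachable-branch items above can be illustrated on a toy expression type. This is a minimal sketch of the general technique, assuming a much smaller IR than ZxCaml's Core IR; the expr type and fold function are hypothetical and are not taken from the omlz sources.

```ocaml
(* Toy expression type; not ZxCaml's Core IR. *)
type expr =
  | Int of int
  | Bool of bool
  | Var of string
  | Add of expr * expr
  | Lt of expr * expr
  | If of expr * expr * expr

(* Fold bottom-up: simplify children first, then the node itself. *)
let rec fold e =
  match e with
  | Int _ | Bool _ | Var _ -> e
  | Add (a, b) ->
    (match fold a, fold b with
     | Int x, Int y -> Int (x + y)
     | a', b' -> Add (a', b'))
  | Lt (a, b) ->
    (match fold a, fold b with
     | Int x, Int y -> Bool (x < y)
     | a', b' -> Lt (a', b'))
  | If (c, t, f) ->
    (match fold c with
     | Bool true -> fold t      (* else branch is unreachable *)
     | Bool false -> fold f     (* then branch is unreachable *)
     | c' -> If (c', fold t, fold f))

let example =
  fold (If (Lt (Int 1, Add (Int 2, Int 3)), Add (Var "x", Int 0), Int 9))
```

Here the condition folds to a known boolean, so only the then branch survives. The sketch deliberately leaves Add (Var "x", Int 0) alone, since it performs no algebraic simplification beyond evaluating fully constant operands.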

Documents

Read in order:

#    Doc                       What it pins down
     Installing                Fresh setup, prerequisites, quickstart, and troubleshooting
00   Overview                  Vision, scope, three cold showers (anti-traps)
01   Architecture              Pipeline, layered IR, extension points
02   Grammar                   OCaml subset accepted through P2
03   Core IR                   ANF IR data model, the central contract
04   Memory model              Arena-only current model, region descriptor for the future
05   Backends                  Zig codegen, tree-walk interpreter, backend trait
06   BPF target                Toolchain to Solana .so (zig + sbpf-linker)
07   Repo layout               Directory contract, who owns what
08   Roadmap                   Phases P1–P7, with P1/P2 release notes
09   Decisions (ADRs)          Locked decisions, with reasons
10   Frontend bridge           OCaml compiler-libs → sexp → Zig
11   Solana P3 guide           Account layout, syscalls, CPI, SPL-Token, no_alloc, IDL, and CI coverage
     Alternatives considered   Why not self-write, why not fork OxCaml
     OxCaml relationship       What OxCaml is, four ways to "use" it, which to pick
     zignocchio relationship   The Zig→Solana SDK we read for ideas, what we learned, what we did not import (ADR-014)

One-line summary

Borrow OCaml's frontend. Throw away its runtime. Land on BPF via Zig.

Borrow ≠ fork. We call compiler-libs as a library; we never patch it.
