OCaml microbenchmark suite ported from sandmark, organised into simple/, with_deps/, with_packages/, and multicore/, plus macrobenchmarks/ for real-world tools. Each benchmark ships its own *.build.sh script honouring a small, well-documented build contract.
Designed to work two ways:
- Standalone: activate any opam switch with dune on PATH and run a benchmark's *.build.sh to compile a binary you can execute directly. See §Build Script Contract.
- Orchestrated: used as the benchmark backend for running-ng, which manages opam switches per OCaml runtime and drives sweeps. See §Integration With running-ng.
Benchmarks are organised into five top-level groups:
benches/
  simple/           # stdlib / unix only; single build script, no generated data
  with_deps/        # require dune multi-library builds or generated input data
  with_packages/    # require external opam packages (zarith, lwt, decompress, yojson, …)
  multicore/        # effects and parallel benchmarks; most require OCaml ≥ 5
  macrobenchmarks/  # real-world tools installed via opam; benchmark the installed binary
Each benchmark lives in its own subfolder:
benches/
  simple/
    <benchmark-name>/
      <benchmark-name>.ml         # source (or multiple .ml files)
      <benchmark-name>.build.sh   # builds the binary via ocamlopt or dune
  with_deps/
    <benchmark-name>/
      <source files ...>
      <benchmark-name>.build.sh        # builds the benchmark binary
      <benchmark-name>.build.deps.sh   # generates runtime-independent input data
  macrobenchmarks/
    <tool-name>/
      <tool-name>.build.sh   # installs via opam and copies the binary
      <input files ...>      # workload inputs (.why, .v, .mly, .cub, …)
For benchmarks in with_deps/ that require pre-generated input data (e.g. a
graph edge list), a companion <benchmark>.build.deps.sh script handles data
generation. The main build.sh calls it automatically before building the
binary.
Key properties:
- build.deps.sh receives the same env vars as build.sh (in particular RUNNING_OCAML_BENCH_DIR). The opam switch is activated, so compiler tools are on PATH.
- Generated data files are placed in the benchmark directory and are runtime-version-independent: the script skips generation if the file already exists, so data is produced once and reused across all compiler versions in a sweep.
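A minimal sketch of the skip-if-exists step described above. The helper name generate_once, the temporary-file-then-rename step, and the convention that the generator writes its data to stdout are assumptions for illustration; the actual build.deps.sh scripts may be structured differently.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a generator only when its data file is missing,
# so generated input is produced once and reused by every runtime in a sweep.
# Usage: generate_once <data-file> <command> [args...]
# Assumption: the command writes the data to stdout.
generate_once() {
  local data="$1"
  shift
  if [ -f "${data}" ]; then
    echo "skip: ${data} already exists"
    return 0
  fi
  # Write to a temporary name first, then rename, so a failed or interrupted
  # run never leaves a partial file that later runs would treat as complete.
  local tmp="${data}.tmp.$$"
  if ! "$@" > "${tmp}"; then
    rm -f "${tmp}"
    echo "error: generator failed for ${data}" >&2
    return 1
  fi
  mv "${tmp}" "${data}"
}
```

The rename matters because the existence check is the only thing that decides whether generation runs again: a truncated file left behind by a killed run would otherwise be reused silently by every later runtime.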
For OCamlBenchmarkSuite, when path points to a directory, running-ng uses build mode.
Conventions used by default:
- build script: <benchmark-name>.build.sh
- output binary: <benchmark-name>-<runtime-name>
So benchmark almabench with runtime ocaml-local produces:
simple/almabench/almabench-ocaml-local
You can override this in running-ng config with build_script and binary, but the convention above means those fields are usually unnecessary.
running-ng creates an opam switch for each runtime (via opam-compiler) and
activates that switch's environment before invoking each build script. Build
scripts can therefore assume the compiler (ocamlopt), dune, and any
packages installed in the switch are available on PATH.
Every build script honors the same env-var contract — the same set used by ~/macro-benches/ — so a script behaves identically whether running-ng invokes it or you run it by hand:
| Variable | Meaning | Fallback when unset |
|---|---|---|
| RUNNING_OCAML_BENCH_DIR | Directory containing this benchmark's sources | The script's own directory ($(cd "$(dirname "$0")" && pwd)) |
| RUNNING_OCAML_OUTPUT | Path where the built binary must be written | ${BENCH_DIR}/<name>-${RUNTIME_NAME} |
| RUNNING_OCAML_RUNTIME_NAME | Runtime identifier (e.g. ocaml-5.4.1) | runtime |
| RUNNING_OCAML_SWITCH | Opam switch name (when applicable) | unset |
When running-ng drives the build, it sets all four. When you run the script standalone, the fallbacks resolve to something sensible — usually placing the binary next to the source file — so bash ./<name>.build.sh just works.
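The fallback chain in the table can be checked without compiling anything. The sketch below uses a hypothetical resolve_out helper that applies the same parameter expansions as the templates, taking the script directory as an argument in place of the $(dirname "$0") lookup.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper mirroring the build-script fallbacks:
#   RUNNING_OCAML_BENCH_DIR    -> else the script's own directory (passed in here)
#   RUNNING_OCAML_OUTPUT       -> else <bench-dir>/<name>-<runtime>
#   RUNNING_OCAML_RUNTIME_NAME -> else the literal "runtime"
# Usage: resolve_out <name> <script-dir>
resolve_out() {
  local name="$1" script_dir="$2"
  local bench_dir="${RUNNING_OCAML_BENCH_DIR:-${script_dir}}"
  echo "${RUNNING_OCAML_OUTPUT:-${bench_dir}/${name}-${RUNNING_OCAML_RUNTIME_NAME:-runtime}}"
}
```

With RUNNING_OCAML_RUNTIME_NAME=ocaml-local and a script directory ending in simple/almabench, this yields simple/almabench/almabench-ocaml-local, the same name running-ng's default convention produces; with nothing set, the binary lands next to the source as almabench-runtime.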
Every benchmark follows the same structure. Pick the template that matches
your benchmark and replace <name> with the benchmark name.
benches/<group>/<name>/
  <name>.ml         # source file(s)
  dune              # dune build file
  dune-project      # (generate-opam-files false)
  <name>.build.sh   # build script

(executable
 (name <name>)
 (modules <name>)
 (libraries unix) ; add libraries as needed
 (modes native)
 (ocamlopt_flags (:standard -O3)))

All build scripts follow the same pattern: resolve BENCH_DIR and OUT from the env-var contract (§Build Script Contract), do the build, copy the binary to OUT. Pick the right template:
Template A: no external packages (simple/, multicore/effects, with_deps/):
#!/usr/bin/env bash
set -euo pipefail
BENCH_DIR="${RUNNING_OCAML_BENCH_DIR:-$(cd "$(dirname "$0")" && pwd)}"
OUT="${RUNNING_OCAML_OUTPUT:-${BENCH_DIR}/<name>-${RUNNING_OCAML_RUNTIME_NAME:-runtime}}"
dune build --root "${BENCH_DIR}" --profile release <name>.exe
cp "${BENCH_DIR}/_build/default/<name>.exe" "${OUT}"
chmod +x "${OUT}"

Template B: with opam packages (with_packages/, multicore/numerical):
#!/usr/bin/env bash
set -euo pipefail
BENCH_DIR="${RUNNING_OCAML_BENCH_DIR:-$(cd "$(dirname "$0")" && pwd)}"
OUT="${RUNNING_OCAML_OUTPUT:-${BENCH_DIR}/<name>-${RUNNING_OCAML_RUNTIME_NAME:-runtime}}"
opam install <packages> -y
dune build --root "${BENCH_DIR}" --profile release <name>.exe
cp "${BENCH_DIR}/_build/default/<name>.exe" "${OUT}"
chmod +x "${OUT}"

For real-world OCaml applications (alt-ergo, coq, cpdf, menhir, …),
use the vendored monorepo at ~/macro-benches/,
which builds every tool from a single dune workspace and follows the
same env-var contract.
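Templates A and B go through dune, while many simple/ entries below are built with ocamlopt directly. A hedged sketch of that compile step follows; build_ocamlopt and the OCAMLOPT override are hypothetical helpers (OCAMLOPT is not part of the env-var contract) that make the command easy to dry-run.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper for ocamlopt-built benchmarks.
# Usage: build_ocamlopt <out> [flags...] <src.ml>...
# Sources must be listed in dependency order: a module has to be compiled
# before any module that refers to it.
# OCAMLOPT is an assumed override hook; OCAMLOPT=echo prints the command
# line instead of compiling.
build_ocamlopt() {
  local out="$1"
  shift
  "${OCAMLOPT:-ocamlopt}" -o "${out}" "$@"
}
```

For example, fannkuchredux would pass -noassert -unsafe before its source, and levinson_durbin would list levinson_durbin_dataset.ml before levinson_durbin.ml. ocamlopt writes the intermediate .cmi, .cmx and .o files next to each source, which is what the make clean target removes.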
Add the benchmark to the appropriate suite in the shared micro base
(running-ng/src/running/config/base/ocaml/micro_base.yml):
suites:
  <suite-name>:
    programs:
      <name>:
        path: "${RUNNING_BENCH_DIR}/<group>/<name>"
        args: "<arguments>"
benchmarks:
  <suite-name>:
    - <name>

To remove build artifacts and generated data:
cd ~/benches && make clean
The Makefile provides three targets:
- clean: runs both clean-dune and clean-with-deps, then removes compiled objects (.o, .a, .so, .cmi, .cmx, .cmxa, .cmo, .cma, .cmt, .cmti, .annot, .opt) and tagged benchmark binaries (*-ocaml-*, *-oxcaml-*).
- clean-dune: removes _build and _build-running directories (dune build caches).
- clean-with-deps: removes generated input data (e.g. graph500seq/edges.data).
When to clean: After changing compiler flags (e.g. adding --enable-multidomain to an OxCaml runtime), stale cached binaries may mask the change. Run make clean to force a rebuild on the next benchmark run.
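For a checkout without the Makefile, the clean-dune behaviour described above can be approximated in plain shell. This is an assumption based only on the target's description, not a copy of the actual rule; clean_dune is a hypothetical name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Approximation of clean-dune: delete every _build and _build-running
# directory under <root> (default: current directory). -prune stops find
# from descending into the directories it is about to remove.
clean_dune() {
  local root="${1:-.}"
  find "${root}" -type d \( -name _build -o -name _build-running \) -prune \
    -exec rm -rf {} +
}
```

Sources, scripts and generated data outside those directories are left alone, which matches the split between clean-dune and clean-with-deps.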
- All benchmarks below are sourced from sandmark unless noted otherwise.
- Build approach is either ocamlopt (single .ml compiled directly) or dune (multi-file, uses a dune file in the benchmark dir).
- Counts are based on the build scripts present in this repo (~/benches).
- For macrobenchmarks, each build script installs one tool but runs it as multiple programs with different inputs; the count reflects individual programs.
- Sequential and multicore benchmarks are registered in running-ng's ocaml_gc_sweep_example.yml (and gc_sweep_all_versions.yml).
- Macrobenchmarks are registered in running-ng's macrobenchmarks.yml.
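Since the counts come from the build scripts present, they can be approximated with find. This sketch uses a hypothetical count_build_scripts helper that counts *.build.sh files per top-level group; it reports multicore/ as one group and counts scripts rather than programs, so multicore/ and macrobenchmarks/ will not line up with the table row by row.

```bash
#!/usr/bin/env bash
set -euo pipefail
shopt -s nullglob

# Print "<group> <count>" for each top-level directory under <root>,
# counting *.build.sh files. The pattern does not match *.build.deps.sh,
# so data generators are excluded automatically.
count_build_scripts() {
  local root="$1" group n
  for group in "${root}"/*/; do
    group="${group%/}"
    # Arithmetic expansion strips the padding some wc implementations add.
    n=$(( $(find "${group}" -type f -name '*.build.sh' | wc -l) ))
    printf '%s %d\n' "$(basename "${group}")" "${n}"
  done
}
```

Running count_build_scripts ~/benches gives a quick sanity check after adding or removing a benchmark.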
| Directory | Programs | Requires |
|---|---|---|
| simple/ | 39 | stdlib / unix |
| with_deps/ | 10 | dune multi-lib or generated data |
| with_packages/ | 20 | external opam packages |
| macrobenchmarks/ | 14 | opam-installed tools (alt-ergo, coq, cpdf, cubicle, frama-c, menhir) |
| multicore/multicore-effects | 17 | OCaml ≥ 5, effects |
| multicore/multicore-structures | 7 | OCaml ≥ 5, stdlib Atomic |
| multicore/multicore-numerical | 23 | OCaml ≥ 5, domainslib |
| multicore/multicore-grammatrix | 2 | OCaml ≥ 5, domainslib |
| multicore/multicore-minilight | 1 | OCaml ≥ 5, domainslib |
| multicore/alloc_multicore | 1 | OCaml ≥ 5, stdlib Domain |
| multicore/pingpong_multicore | 1 | OCaml ≥ 5, domainslib |
| multicore/graph500par | 1 | OCaml ≥ 5, domainslib |
| multicore/oxcaml-prefetch | 1 | OxCaml compiler fork |
| multicore/multicore-gcroots | 3 | OCaml ≥ 5, C stubs (CAML_INTERNALS) |
| Total | 140 | |
- Source: sandmark benchmarks/markbench/
- Build: dune + unix
- Args: (none); defaults to 10 Gc.full_major cycles, pass an integer to override
- Description: Microbenchmark for the major GC mark phase. Allocates a large live set and calls Gc.full_major repeatedly, measuring seconds per GC cycle. Sensitive to the OCAMLRUNPARAM settings o (space overhead) and s (minor heap size).
- Source: sandmark benchmarks/multicore-minilight/sequential/
- Build: dune (stdlib only, multi-file), in simple/minilight/
- Args: <scene-file> (absolute path to roomfront.ml.txt); use /home/udesou/benches/simple/minilight/roomfront.ml.txt
- Description: Sequential MiniLight 1.5.2 global illumination renderer. Traces rays through a Cornell box scene using an octree spatial index; exercises float arithmetic, object-oriented style (classes), and moderate allocation. The sandmark dune file listed domainslib, but the sequential sources do not use it.
- Note: The parallel version is multicore/multicore-minilight/minilight_multicore.
- Source: sandmark benchmarks/almabench/ (originally OCamlPro's ocamlbench-repo)
- Build: ocamlopt (stdlib only)
- Args: (none)
- Description: Floating-point benchmark computing planetary ephemerides (daily positions of the planets). Exercises the minor heap heavily with small float arrays.
- Source: sandmark benchmarks/bdd/ (originally OCamlPro's ocamlbench-repo)
- Build: ocamlopt (stdlib only)
- Args: (none)
- Description: Binary Decision Diagram operations (AND, OR, NOT, quantification) on propositional formulae. Pointer-heavy graph structure; exercises major GC and sharing.
- Source: sandmark benchmarks/hamming/
- Build: ocamlopt (stdlib only)
- Args: <N> (number of Hamming numbers to iterate over); config uses 500000
- Description: Generates the infinite lazy Hamming sequence (numbers of the form 2^i × 3^j × 5^k) using lazy streams and lazy merging. Exercises lazy allocation and minor GC.
- Source: sandmark benchmarks/soli/
- Build: ocamlopt (stdlib only)
- Args: <nruns> (number of solver runs); config uses 50
- Description: Peg solitaire solver using backtracking search. Exercises the call stack and moderate allocation; useful for testing the interaction between recursion depth and minor heap pressure.
- Source: sandmark benchmarks/kb/ (originally OCamlPro's ocamlbench-repo)
- Build: ocamlopt (stdlib only)
- Args: (none) — runs 100 iterations of Knuth-Bendix completion internally
- Description: Knuth-Bendix completion procedure (with exceptions). Algebraic term rewriting; heavily allocates and collects term structures. A classic OCaml GC benchmark.
- Source: sandmark benchmarks/kb/kb_no_exc.ml (shares directory with kb)
- Build: ocamlopt (stdlib only); build script is kb_no_exc.build.sh in benches/kb/
- Args: (none); runs 100 iterations of Knuth-Bendix completion internally
- Description: Same algorithm as kb but with the exception-based search replaced by an explicit option type. Useful for comparing exception overhead against allocation/GC cost.
- Source: sandmark benchmarks/lexifi-g2pp/ (originally OCamlPro's ocamlbench-repo)
- Build: dune (stdlib only, multi-file; entry point: main.exe)
- Args: (none)
- Description: Calibrates a G2++ two-factor interest rate model (LexiFi's financial library benchmark). Involves iterative numerical optimisation over a large structured dataset. Exercises both arithmetic and moderate allocation in a realistic workload.
- Source: sandmark benchmarks/zdd/
- Build: ocamlopt (stdlib only)
- Args: <words-file> (absolute path to words.txt); use /home/udesou/benches/simple/zdd/words.txt
- Description: Zero-suppressed Binary Decision Diagram (ZDD) operations over an English word dictionary. Builds a ZDD from all words, then counts matches for a pattern query. Exercises pointer-heavy DAG structures similar to bdd.
- Note: The run cwd is a temp dir, so the word file must be passed as an absolute path.
- Source: sandmark benchmarks/benchmarksgame/fannkuchredux.ml
- Build: ocamlopt (stdlib only), compiled with -noassert -unsafe as in sandmark
- Args: <N> (permutation length); config uses 11
- Description: Fannkuch-redux (Pfannkuchen): for every permutation of 1..N, repeatedly reverses the first k elements (k being the current first element) until 1 is at the front, recording the maximum number of such flips and a checksum (the sum of flip counts with alternating signs across permutations). Pure integer computation with almost no allocation; useful as a control benchmark where GC has negligible impact.
Six benchmarks sharing benches/numerical-analysis/. Each has its own build script; two require two source files compiled in order.
- Source: sandmark benchmarks/numerical-analysis/crout_decomposition.ml (originally OCamlPro's ocamlbench-repo)
- Build: ocamlopt (stdlib only)
- Args: (none)
- Description: Crout matrix decomposition (LU factorisation variant) on a fixed matrix. Dense linear algebra; exercises float array allocation.
- Source: sandmark benchmarks/numerical-analysis/qr_decomposition.ml (originally OCamlPro's ocamlbench-repo)
- Build: ocamlopt (stdlib only)
- Args: (none)
- Description: QR decomposition via Gram-Schmidt on a fixed matrix. Dense linear algebra; similar allocation profile to crout_decomposition.
- Source: sandmark benchmarks/numerical-analysis/durand_kerner_aberth.ml (originally OCamlPro's ocamlbench-repo)
- Build: ocamlopt (stdlib only)
- Args: (none) — optional percentage of coefficient array (default 100); runs 10 iterations
- Description: Finds all roots of a polynomial simultaneously using the Durand–Kerner / Weierstrass method. Complex-number arithmetic on float arrays.
- Source: sandmark benchmarks/numerical-analysis/fft.ml (originally OCamlPro's ocamlbench-repo)
- Build: ocamlopt + unix.cmxa (uses Unix.times for timing output)
- Args: (none); optional array size (default 1048576)
- Description: Cooley–Tukey FFT followed by inverse FFT on a complex float array. In-place computation; exercises large float array allocation and cache effects.
- Source: sandmark benchmarks/numerical-analysis/levinson_durbin.ml + levinson_durbin_dataset.ml
- Build: ocamlopt (stdlib only), two-file: dataset compiled first
- Args: (none)
- Description: Levinson–Durbin recursion for autoregressive modelling of Japanese vowel sound data. Exercises float array allocation with a real-world-sized numerical dataset.
- Source: sandmark benchmarks/numerical-analysis/naive_multilayer.ml + naive_multilayer_dataset.ml
- Build: ocamlopt (stdlib only), two-file: dataset compiled first
- Args: (none)
- Description: Naive multilayer neural network (forward pass + backpropagation) on the UCI Ionosphere dataset. Dense matrix operations; exercises both float array allocation and functional list structure.
- Source: sandmark benchmarks/sequence/sequence_cps.ml (originally OCamlPro's ocamlbench-repo)
- Build: ocamlopt (stdlib only)
- Args: <N> (sequence length); config uses 10000
- Description: Builds a lazy CPS-style sequence of integers 0…N, then maps, filters, and folds it to compute a sum. Exercises higher-order function application and minor heap allocation in a functional pipeline; no external libraries required.
Sandmark's benchmarks/stdlib/ suite: 10 single-file benchmarks covering core stdlib
data structures. Each takes <bench_type> [args] and dispatches to sub-benchmarks.
- Source: sandmark benchmarks/stdlib/array_bench.ml
- Build: ocamlopt (stdlib only)
- Args: <bench_type>, e.g. make, init, map
- Description: Array allocation, initialisation, map, sort, and iteration microbenchmarks.
- Source: sandmark benchmarks/stdlib/bytes_bench.ml
- Build: ocamlopt (stdlib only)
- Args: <bench_type>
- Description: Bytes buffer operations: blit, fill, sub, compare.
- Source: sandmark benchmarks/stdlib/string_bench.ml
- Build: ocamlopt (stdlib only)
- Args: <bench_type>
- Description: String operations: concat, contains, split, compare.
- Source: sandmark benchmarks/stdlib/map_bench.ml
- Build: ocamlopt (stdlib only)
- Args: <bench_type>
- Description: Functional map (AVL tree) insert, lookup, fold, merge.
- Source: sandmark benchmarks/stdlib/set_bench.ml
- Build: ocamlopt (stdlib only)
- Args: <bench_type>
- Description: Functional set insert, union, inter, diff.
- Source: sandmark benchmarks/stdlib/stack_bench.ml
- Build: ocamlopt (stdlib only)
- Args: <bench_type>
- Description: Stack push/pop operations.
- Source: sandmark benchmarks/stdlib/hashtbl_bench.ml
- Build: ocamlopt (stdlib only)
- Args: <bench_type>
- Description: Hashtable add, find, replace, fold.
- Source: sandmark benchmarks/stdlib/pervasives_bench.ml
- Build: ocamlopt (stdlib only)
- Args: <bench_type>
- Description: Stdlib arithmetic and comparison functions.
- Source: sandmark benchmarks/stdlib/str_bench.ml
- Build: ocamlopt + str.cmxa (-I +str str.cmxa)
- Args: <bench_type>
- Description: Regular expression operations from the Str library.
- Source: sandmark benchmarks/stdlib/big_array_bench.ml
- Build: ocamlopt; links bigarray.cmxa only on OCaml 4.x (bundled into stdlib on 5.x)
- Args: <bench_type>
- Description: Bigarray allocation and element access patterns.
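The version-dependent link step for big_array_bench can be expressed as a small helper that inspects the compiler version string (for example the output of ocamlopt -version). bigarray_link_args is a hypothetical name; the actual build script may test the version differently.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the extra link argument big_array_bench needs
# for a compiler version string such as "4.14.1" or "5.2.0+flambda".
# OCaml 4.x: bigarray.cmxa. OCaml 5.x: nothing (Bigarray is in the stdlib).
bigarray_link_args() {
  local version="$1"
  local major="${version%%.*}"   # text before the first dot, e.g. "4"
  if [ "${major}" -lt 5 ]; then
    echo "bigarray.cmxa"
  fi
}
```

A build script could then run ocamlopt -o "${OUT}" $(bigarray_link_args "$(ocamlopt -version)") big_array_bench.ml, leaving the outer expansion unquoted so that an empty result adds no argument on 5.x.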
Sandmark's benchmarks/simple-tests/ suite: small stdlib-only benchmarks covering
allocation, lazy evaluation, stacks, finalizers, and weak/ephemeron tables.
- Source: sandmark benchmarks/simple-tests/alloc.ml
- Build: ocamlopt (stdlib only)
- Args: none
- Description: Minor heap allocation rate benchmark: allocates tuples and small lists at high frequency.
- Source: sandmark benchmarks/simple-tests/lists.ml
- Build: ocamlopt (stdlib only)
- Args: none
- Description: List operations: append, rev, map, filter, fold.
- Source: sandmark benchmarks/simple-tests/stress.ml
- Build: ocamlopt (stdlib only)
- Args: none
- Description: Allocation stress test; exercises minor and major GC.
- Source: sandmark benchmarks/simple-tests/lazylist.ml
- Build: ocamlopt (stdlib only)
- Args: none
- Description: Lazy list operations via Lazy.t suspension.
- Source: sandmark benchmarks/simple-tests/lazy_primes.ml
- Build: ocamlopt (stdlib only)
- Args: none
- Description: Lazy sieve of Eratosthenes using Lazy.t-deferred streams.
- Source: sandmark benchmarks/simple-tests/morestacks.ml
- Build: ocamlopt (stdlib only)
- Args: none
- Description: Stack operations on functional and imperative stacks.
- Source: sandmark benchmarks/simple-tests/stacks.ml
- Build: ocamlopt (stdlib only)
- Args: none
- Description: Stdlib Stack module push/pop under various patterns.
- Source: sandmark benchmarks/simple-tests/finalise.ml
- Build: ocamlopt (stdlib only)
- Args: none
- Description: GC finalizer registration and invocation throughput (Gc.finalise).
- Source: sandmark benchmarks/simple-tests/weakretain.ml
- Build: ocamlopt (stdlib only)
- Args: none
- Description: Weak pointer retention: allocates objects and checks how many survive GC through a Weak.t array.
- Source: sandmark benchmarks/simple-tests/weak_htbl.ml
- Build: ocamlopt (stdlib only); OCaml 5.x adaptation (see SANDMARK_ADAPTATIONS.md)
- Args: <N> (table size)
- Description: Correctness and performance test for ephemeron-based hash tables (Ephemeron.K{1,2,n}.Make) versus regular Hashtbl and Map-backed tables. Note: iter was removed from Ephemeron modules in OCaml 5.x; the correctness assertions for weak tables now pass vacuously.
- Source: sandmark benchmarks/graph500seq/
- Build: dune (multi-library), in with_deps/graph500seq/
- Args: <edges-file> (absolute path to edges.data); generated by build.deps.sh at /home/udesou/benches/with_deps/graph500seq/edges.data
- Description: Graph500 Kernel 1 (graph construction) on a Kronecker random graph. Builds a sparse adjacency representation from a large array of edges. Memory-intensive pointer chasing; exercises the major GC heavily.
- graphTypes: In sandmark, GraphTypes is provided by a parent dune-project scope. Here it is defined explicitly in graphTypes.ml (type vertex = int, type weight = float, type edge = vertex * vertex * weight).
- Data generation: graph500seq.build.deps.sh builds gen.exe with dune and runs it (-scale 21 -edgefactor 16) to produce edges.data (~64M edges). This is done once and reused across all runtime versions since the data is OCaml-version-independent.
- Timeout: 300 s (longer than simple benchmarks due to data loading + graph construction).
- Source: sandmark benchmarks/benchmarksgame/knucleotide.ml
- Build: dune (stdlib only), in with_deps/benchmarksgame/
- Args: <input-file> (absolute path to input25000000.txt); generated by benchmarksgame.build.deps.sh
- Description: Counts k-nucleotide frequencies (k=1,2) and specific subsequence occurrences in a 25M-nucleotide FASTA sequence. Uses a custom Hashtbl.Make with Bytes keys.
- Adaptation: Accepts the input file path as argv[1] (falls back to "input25000000.txt" for standalone use).
- Source: sandmark benchmarks/benchmarksgame/knucleotide3.ml
- Build: dune (stdlib only), in with_deps/benchmarksgame/
- Args: <input-file> (absolute path to input25000000.txt); generated by benchmarksgame.build.deps.sh
- Description: Same k-nucleotide counting as knucleotide but with a packed-integer hash key optimisation (encodes bases as 2-bit values, avoiding Bytes allocation for keys ≤ 31 bases on 64-bit).
- Adaptation: Accepts the input file path as argv[1] (falls back to "input25000000.txt" for standalone use).
- Source: sandmark benchmarks/benchmarksgame/revcomp2.ml
- Build: dune (stdlib only), in with_deps/benchmarksgame/
- Args: <input-file> (absolute path to input25000000.txt); generated by benchmarksgame.build.deps.sh
- Description: Reverse-complement all DNA sequences in a FASTA file and print the result. Pure Bytes/string I/O; exercises buffer allocation and output.
- Adaptation: Accepts the input file path as argv[1] (falls back to "input25000000.txt" for standalone use).
- Source: sandmark benchmarks/benchmarksgame/regexredux2.ml
- Build: dune (str library), in with_deps/benchmarksgame/
- Args: <input-file> (absolute path to input5000000.txt); generated by benchmarksgame.build.deps.sh
- Description: Counts regex pattern matches in a 5M-nucleotide FASTA sequence, then applies a series of substitutions. Uses Str (OCaml's built-in regex library).
- Adaptation: Accepts the input file path as argv[1] (falls back to "input5000000.txt" for standalone use).
- Source: sandmark benchmarks/mpl/bench/primes/
- Build: dune (multi-library), in with_deps/mpl/; auto-installs domainslib
- Args: -N <int> -procs <int>; -N is the sieve's upper bound (default 100M), -procs the domain count
- Description: Parallel sieve of Eratosthenes using the mpl Forkjoin library (wraps Domainslib.Task). OCaml ≥ 5 required.
- Source: sandmark benchmarks/mpl/bench/msort_ints/
- Build: dune (multi-library), in with_deps/mpl/; auto-installs domainslib
- Args: -N <int> -procs <int>; array size (default 10M) and domain count
- Description: Parallel merge sort on a random integer array using the mpl Seq/Merge/Quicksort libraries. OCaml ≥ 5 required.
- Source: sandmark benchmarks/mpl/bench/msort_strings/
- Build: dune (multi-library), in with_deps/mpl/; auto-installs domainslib
- Args: -f <words64.txt> -procs <int>; -f takes the absolute path to inputs/words64.txt (37 MB, bundled)
- Description: Parallel merge sort on strings read from a word file. OCaml ≥ 5 required.
- Source: sandmark benchmarks/mpl/bench/tokens/
- Build: dune (multi-library), in with_deps/mpl/; auto-installs domainslib
- Args: -f <words.txt> --no-output -procs <int>; -f takes the absolute path to inputs/words.txt (590 KB, bundled)
- Description: Parallel token frequency count using a concurrent hashset. OCaml ≥ 5 required.
- Source: sandmark benchmarks/mpl/bench/raytracer/
- Build: dune (multi-library), in with_deps/mpl/; auto-installs domainslib
- Args: -n <width> -procs <int>; image width in pixels (default 2000) and domain count
- Description: Parallel ray tracer (rgbbox scene by default). Creates its own Domainslib.Task pool independently of the Forkjoin pool. OCaml ≥ 5 required.
Benchmarks in with_packages/ require external opam packages. No manual
package installation is needed — each build script runs opam install <pkg> -y
to install its dependencies into the active opam switch.
running-ng creates a dedicated opam switch for each runtime via
opam-compiler (e.g., running-ng-ocaml-v5.4). The switch environment is
activated before invoking the build script, so opam install targets the
correct switch automatically. Packages are cached in the switch and reused
across benchmark builds for the same runtime.
To force a specific switch, add build_env to the suite in the YAML config:
sandmark-with-packages:
  type: OCamlBenchmarkSuite
  build_env:
    OPAM_SWITCH: "my-custom-switch"
  programs:
    fasta3: ...

OPAM_SWITCH takes precedence over all auto-detection.
Seven programs sharing benches/with_packages/benchmarksgame/. Each has its own build script; all use dune and link against zarith, str, and unix.
- binarytrees5 (args: 21): Allocates and traverses binary trees of depth 21 using Zarith big integers for node values. GC-intensive short-lived allocation.
- fasta3 (args: 25000000): Generates a DNA sequence of 25M characters using cumulative probability tables. Exercises sequential array access.
- fasta6 (args: 25000000): Alternative fasta generator; same input size, different internal algorithm.
- mandelbrot6 (args: 16000): Renders a 16000×16000 Mandelbrot set image in PBM format. Pure floating-point; no GC pressure.
- nbody (args: 50000000): N-body planetary simulation (5 bodies, 50M steps). Pure floating-point; tests float unboxing.
- pidigits5 (args: 10000): Computes 10000 digits of π with the streaming spigot algorithm, using Zarith arbitrary-precision integers.
- spectralnorm2 (args: 5500): Approximates the spectral norm of an infinite matrix. Dense floating-point; exercises float arrays.
Four programs sharing benches/with_packages/zarith/. Each has its own build script; all use dune.
- zarith_fact (args: 40 1000000): Computes the factorial of 40, repeated 1M times. Exercises Zarith multiplication. Needs zarith.
- zarith_fib (args: Z 40): Fibonacci of 40 using Zarith big integers. Exercises Zarith addition. Needs zarith, num.
- zarith_pi (args: 10000): Computes 10000 digits of π with a streaming spigot algorithm. Exercises Zarith division/comparison. Needs zarith.
- zarith_tak (args: Z 2500): Tak function with n=2500 using Zarith integers. Exercises recursive calls with big-integer arithmetic. Needs zarith, num.
- Source: sandmark benchmarks/chameneos/
- Build: dune + lwt.unix
- Args: <meetings> (number of colour-changing meetings); config uses 600000
- Description: Simulates chameneos creatures meeting in a waiting room and swapping colours, implemented with Lwt lightweight threads. Exercises Lwt cooperative scheduling and mvar synchronisation.
Both in benches/with_packages/thread-lwt/; shared dune file.
- Source: sandmark benchmarks/thread-lwt/
- Build: dune + lwt, lwt.unix
- Args: <N> (number of ring-pass iterations); config uses 20000
- thread_ring_lwt_mvar: Token passed around a ring of 503 Lwt threads via Lwt_mvar. Exercises mvar hand-off latency.
- thread_ring_lwt_stream: Same ring, but using Lwt_stream channels. Slightly higher allocation than the mvar variant.
- Source: sandmark benchmarks/decompress/test_decompress.ml
- Build: dune + bigstringaf, checkseum.ocaml, decompress.zl
- Args: (none); defaults to 64 compress/decompress iterations on 32 KB of data
- Description: Microbenchmark for the decompress pure-OCaml zlib implementation. Compresses then decompresses a block of data in a loop. Exercises allocation of Bigarray-backed buffers and the functional zipper-style stream API.
- Source: sandmark benchmarks/yojson/ydump.ml
- Build: dune + yojson, camlp-streams
- Args: -c <json-file> (compact-print a JSON file); config uses the bundled sample.json (absolute path required since the run cwd is a temp dir)
- Description: Parses a JSON document with the Yojson library and prints it back (compactly, given -c). Exercises OCaml's Buffer-based output, recursive tree traversal, and moderate allocation from parsing.
- Source: sandmark benchmarks/multicore-effects/ms_sched.ml + test_sched.ml (adapted)
- Build: ocamlfind + saturn_lockfree
- Args: <num_domains> <tasks_to_spawn> <list_length>
- Description: Microbenchmark for a concurrent round-robin effects-based scheduler (ms_sched.ml). Spawns <tasks_to_spawn> tasks per run, each allocating a list of length <list_length>. The scheduler uses a Saturn Michael–Scott queue as its run queue and Domain.spawn to run workers across <num_domains> domains. Exercises effect handler dispatch, continuation enqueuing, and domain coordination.
- Note: In with_packages/ (not multicore/) because it depends on saturn_lockfree. See SANDMARK_ADAPTATIONS.md for the porting changes from the sandmark original.
- Source: sandmark benchmarks/valet/ (4 files: valet_core.ml, valet_react.ml, test_lib.ml, test_lwt.ml)
- Build: dune + uuidm, ocplib-endian, react, lwt
- Args: <n> (number of users/readers/doors); each of the n persons swipes n times, giving O(n²) events
- Description: Reactive access-control simulation. n people each hold a UUID-backed QR code; n QR readers feed into a controller (via react event streams) that maps codes to users, which doors then act on. All persons run concurrently via Lwt.join with Lwt.pause () yields between each swipe. Exercises Lwt cooperative scheduling, react event propagation, and UUID/map allocation.
- OxCaml: incompatible (lwt.unix locality error, same as chameneos_redux_lwt)
- Source: sandmark benchmarks/sauvola/contrast.ml
- Build: dune + camlimages (camlimages.all_formats sub-library)
- Args: <input.ppm> <output_prefix>; config uses the bundled example2_small.ppm (absolute path), output goes to /tmp/sauvola_out__*.ppm
- Description: Applies 8 image binarisation algorithms (adaptive contrast spreading, Niblack global/local, Sauvola global/local) to a PPM image. Each algorithm creates a new rgb24 image and iterates over all pixels, exercising OO-style image allocation and GC-heavy pixel-by-pixel access patterns.
- Source: sandmark benchmarks/owl/owl_gc.ml
- Build: dune + owl-base (pure OCaml; no CBLAS/LAPACK required)
- Args: (none)
- Description: Computes a Gromov-Wasserstein distance matrix over 100 random 100×100 distance matrices using Owl's dense matrix operations (Bigarray-backed). Exercises large Bigarray allocation and GC interaction with non-moving arrays. Uses owl-base instead of owl to avoid CBLAS/LAPACK build requirements.
Benchmarks that require OCaml 5.x and the Effect module. Source is in multicore/; the flat layout mirrors simple/ (each benchmark in its own subfolder, with an optional build.deps.sh for generated data).
Use OCamlMulticoreBenchmarkSuite (instead of OCamlBenchmarkSuite) in running-ng configs. This suite type enforces OCaml >= 5 at build/run time and raises a clear error if you attempt to sweep with an older compiler.
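When a multicore benchmark is built standalone, outside running-ng, its build script can fail just as clearly with a guard of its own. require_ocaml5 is a hypothetical helper that takes a version string such as the output of ocamlopt -version; it is not taken from running-ng's own check.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical standalone guard: return non-zero with a clear message unless
# the compiler version string (e.g. from "ocamlopt -version") is 5.0 or newer.
require_ocaml5() {
  local version="$1"
  local major="${version%%.*}"   # text before the first dot
  if ! [[ "${major}" =~ ^[0-9]+$ ]] || [ "${major}" -lt 5 ]; then
    echo "error: this benchmark needs OCaml >= 5, found '${version}'" >&2
    return 1
  fi
}
```

A build script would call require_ocaml5 "$(ocamlopt -version)" before dune build; under set -e the non-zero return stops the script before any compilation starts.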
Single-file effect benchmarks compiled with ocamlopt. Adapted from sandmark benchmarks/multicore-effects/ for the OCaml 5.2+ Effect module API (sandmark's originals use the pre-5.2 effect keyword syntax, which is not accepted by OCaml 5.2+).
- Source: sandmark benchmarks/multicore-effects/algorithmic_differentiation.ml (adapted)
- Build: ocamlopt (stdlib only)
- Args: <iterations> (default 100)
- Description: Reverse-mode automatic differentiation using deep effect handlers (Add and Mult effects). Exercises deep effect handler dispatch, continuation resumption, and float array allocation.
- Source: sandmark benchmarks/multicore-effects/rec_{eff,seq}_fib.ml (adapted)
- Build: ocamlopt (stdlib only)
- Args: <iters> <n> (default 4 40; expected output per iter: 102334155)
- Description: Recursive Fibonacci. rec_eff_fib installs a try_with effect handler at each recursive call site (the handler is never triggered; effect E is never performed), which tests the overhead of handler installation against the pure-recursive rec_seq_fib baseline.
- Source: sandmark benchmarks/multicore-effects/rec_{eff,seq}_tak.ml (adapted)
- Build: ocamlopt (stdlib only)
- Args: <iters> <x> <y> <z> (default 1 40 20 11; expected output per iter: 12)
- Description: Takeuchi function. Same handler-overhead comparison pattern as rec_{eff,seq}_fib; three handler installations per recursive call.
- Source: sandmark benchmarks/multicore-effects/rec_{eff,seq}_ack.ml (adapted)
- Build: ocamlopt (stdlib only)
- Args: <iters> <m> <n> (default 2 3 11; expected output per iter: 16381)
- Description: Ackermann function. Same pattern; tests effect handler overhead on a deeply recursive, stack-intensive computation.
- Source: sandmark `benchmarks/multicore-effects/effect_throughput_val.ml` (adapted)
- Build: `ocamlopt` (stdlib only)
- Args: `<n_iter>` (default `1_000_000`)
- Description: Measures the throughput of an effect handler block in which `perform` is never called and a value is returned directly. The `E : unit Effect.t` handler is installed but never triggered, so the cost is purely handler frame setup and teardown (stack allocation, context switch in and out, deallocation).
- Source: sandmark `benchmarks/multicore-effects/effect_throughput_perform.ml` (adapted)
- Build: `ocamlopt` (stdlib only)
- Args: `<n_iter>` (default `1_000_000`)
- Description: Measures the throughput of a full perform–resume cycle. `E : int -> int Effect.t` is performed once per iteration and the continuation is immediately resumed with the same value. The cost includes the `perform` (stack switch to the handler), the `continue k x` call (stack switch back), and frame deallocation.
- Source: sandmark `benchmarks/multicore-effects/effect_throughput_perform_drop.ml` (adapted)
- Build: `ocamlopt` (stdlib only)
- Args: `<n_iter>` (default `1_000_000`)
- Description: Like `effect_throughput_perform`, but the continuation is abandoned rather than resumed. Measures the `perform` overhead plus the cost of garbage-collecting the dropped continuation.
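The three throughput variants differ only in what the handler does. A compact sketch of all three loops, assuming for brevity a single `E : int -> int Effect.t` effect (the `val` benchmark itself declares `E : unit Effect.t`); `resume_h`, `drop_h`, `val_loop`, and `perform_loop` are hypothetical names:

```ocaml
open Effect
open Effect.Deep

type _ Effect.t += E : int -> int Effect.t

(* perform variant: resume immediately with the performed value. *)
let resume_h : int effect_handler =
  { effc =
      (fun (type c) (eff : c Effect.t) ->
        match eff with
        | E v -> Some (fun (k : (c, int) continuation) -> continue k v)
        | _ -> None) }

(* perform_drop variant: answer directly; [k] is never resumed. *)
let drop_h : int effect_handler =
  { effc =
      (fun (type c) (eff : c Effect.t) ->
        match eff with
        | E v -> Some (fun (_ : (c, int) continuation) -> v)
        | _ -> None) }

(* val variant: handler installed per iteration, [perform] never called. *)
let val_loop n_iter =
  let acc = ref 0 in
  for i = 1 to n_iter do
    acc := !acc + try_with (fun x -> x) i resume_h
  done;
  !acc

(* One [perform] per iteration under handler [h]. *)
let perform_loop h n_iter =
  let acc = ref 0 in
  for i = 1 to n_iter do
    acc := !acc + try_with (fun x -> perform (E x)) i h
  done;
  !acc
```

All three return the same sum, so any timing difference between `val_loop n`, `perform_loop resume_h n`, and `perform_loop drop_h n` reflects the handler mechanics rather than the arithmetic.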
- Source: sandmark `benchmarks/multicore-effects/rec_eff_evenodd.ml` / `rec_seq_evenodd.ml` (adapted / verbatim)
- Build: `ocamlopt` (stdlib only)
- Args: `<iters> <n>` (defaults `2 500_000_000`)
- Description: Even-odd mutual recursion benchmark. `rec_eff_evenodd` installs a dummy effect handler at each `odd` call; `rec_seq_evenodd` is the plain baseline. Measures effect handler call overhead in a tight mutual recursion loop.
- Source: sandmark `benchmarks/multicore-effects/rec_eff_motzkin.ml` / `rec_seq_motzkin.ml` (adapted / verbatim)
- Build: `ocamlopt` (stdlib only)
- Args: `<iters> <n>` (defaults `4 21`)
- Description: Computes the n-th Motzkin number (the number of ways to draw non-intersecting chords between n points on a circle). `rec_eff_motzkin` wraps each recursive call in a dummy `try_with`; `rec_seq_motzkin` is the baseline. `n = 21` yields `142547559`.
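The expected output can be cross-checked independently with the standard Motzkin recurrence M(n) = M(n-1) + Σ_{k=0}^{n-2} M(k)·M(n-2-k). The sketch below (the name `motzkin` is illustrative) evaluates it bottom-up in O(n²) instead of using the benchmark's deliberately exponential recursion:

```ocaml
(* Bottom-up Motzkin numbers, with m.(0) = m.(1) = 1 and
   m.(i) = m.(i-1) + sum_{k=0}^{i-2} m.(k) * m.(i-2-k). *)
let motzkin n =
  let m = Array.make (max 2 (n + 1)) 1 in
  for i = 2 to n do
    let s = ref m.(i - 1) in
    for k = 0 to i - 2 do
      s := !s + (m.(k) * m.(i - 2 - k))
    done;
    m.(i) <- !s
  done;
  m.(n)
```

`motzkin 21` returns `142547559`, matching the documented output for `n = 21`.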
- Source: sandmark `benchmarks/multicore-effects/rec_eff_sudan.ml` / `rec_seq_sudan.ml` (adapted / verbatim)
- Build: `ocamlopt` (stdlib only)
- Args: `<iters> <n> <x> <y>` (defaults `10_000_000 2 2 2`)
- Description: Computes the Sudan function F_n(x, y), which is recursive but not primitive recursive. `rec_eff_sudan` wraps the inner recursive call in a dummy `try_with`; `rec_seq_sudan` is the baseline. The defaults yield `15569256417`.
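For reference, the Sudan function's definition translates directly into OCaml. This plain (handler-free) sketch uses an illustrative name and assumes a 64-bit platform, since the default result exceeds 32-bit `int` range:

```ocaml
(* Sudan function:
     F_0(x, y)     = x + y
     F_n(x, 0)     = x
     F_n(x, y + 1) = F_{n-1}(F_n(x, y), F_n(x, y) + y + 1) *)
let rec sudan n x y =
  if n = 0 then x + y
  else if y = 0 then x
  else
    (* [f] is F_n(x, y - 1); the "+ y" is the (y - 1) + 1 of the rule above. *)
    let f = sudan n x (y - 1) in
    sudan (n - 1) f (f + y)
```

With the defaults, `sudan 2 2 1 = 27`, so `sudan 2 2 2` reduces to F_1(27, 29) = 29·2^29 - 31 = 15569256417.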
- Source: sandmark `benchmarks/multicore-effects/eratosthenes.ml` (adapted)
- Build: `ocamlopt` (stdlib only)
- Args: `<n>`, generate primes up to `n` (default `101`)
- Description: Message-passing Sieve of Eratosthenes implemented entirely with effects. Uses four effects (`Spawn`, `Yield`, `Send`, `Recv`) and two layered handlers: `run`, a round-robin scheduler handling `Spawn`/`Yield`, and `mailbox`, a per-pid message queue handling `Send`/`Recv`. The outer `mailbox` handler catches the `Send`/`Recv` effects that bubble up through `run`'s handler. Exercises effect handler chaining, continuation queuing, and a `Map`-backed mailbox.
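A reduced sketch of the `run` layer only, assuming a FIFO ready queue. The `mailbox` layer and the sieve itself are omitted, and all names are illustrative rather than taken from `eratosthenes.ml`. A task that performs `Yield` is parked at the back of the queue; `Spawn` parks the parent and starts the child immediately; when a task finishes, the next ready task runs:

```ocaml
open Effect
open Effect.Deep

type _ Effect.t +=
  | Spawn : (unit -> unit) -> unit Effect.t
  | Yield : unit Effect.t

let run (main : unit -> unit) : unit =
  let ready : (unit -> unit) Queue.t = Queue.create () in
  let next () = if not (Queue.is_empty ready) then (Queue.pop ready) () in
  let rec spawn (f : unit -> unit) : unit =
    match_with f ()
      { retc = (fun () -> next ()); (* task finished: run the next one *)
        exnc = raise;
        effc =
          (fun (type c) (eff : c Effect.t) ->
            match eff with
            | Yield ->
                Some
                  (fun (k : (c, unit) continuation) ->
                    Queue.push (fun () -> continue k ()) ready;
                    next ())
            | Spawn g ->
                Some
                  (fun (k : (c, unit) continuation) ->
                    Queue.push (fun () -> continue k ()) ready;
                    spawn g)
            | _ -> None) }
  in
  spawn main
```

Effects this handler does not match (such as `Send`/`Recv` in the real benchmark) fall through `| _ -> None` to an enclosing handler, which is how the two layers compose.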
Lock-free concurrent data structures implemented with the OCaml stdlib `Atomic` module. No external packages are required: the sandmark originals referenced `kcas`, but all the atomic operations used (`Atomic.t`, `Atomic.get`, `Atomic.set`, `Atomic.compare_and_set`) have been in the stdlib since OCaml 4.12, and the parallel tests use OCaml 5 `Domain`. Each test program is compiled together with its data-structure module using `ocamlfind -package unix`.
Data structure modules (in the benchmark directory, compiled alongside each test):

- `ms_queue.ml`: Michael–Scott lock-free MPMC queue using `Atomic.t` and CAS loops.
- `treiber_stack.ml`: Treiber lock-free LIFO stack using `Atomic.t`.
- `spsc_queue.ml`: wait-free bounded SPSC queue with cache-line padding.
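A minimal Treiber stack over the stdlib `Atomic` module, assuming an immutable list as the stack representation (a common OCaml idiom; the bundled `treiber_stack.ml` may be structured differently, and the names here are illustrative). Each operation retries its `compare_and_set` until no other domain has changed the head in between:

```ocaml
(* The stack is an atomic reference to an immutable list. *)
type 'a t = 'a list Atomic.t

let create () : 'a t = Atomic.make []

(* CAS loop: retry if another domain swapped the head first. *)
let rec push (s : 'a t) x =
  let old = Atomic.get s in
  if not (Atomic.compare_and_set s old (x :: old)) then push s x

let rec pop (s : 'a t) =
  match Atomic.get s with
  | [] -> None
  | x :: rest as old ->
      if Atomic.compare_and_set s old rest then Some x else pop s
```

Because list cells are immutable and garbage-collected, the physical-equality check in `compare_and_set` does not suffer from the ABA problem that manual-memory implementations must guard against.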
- Source: sandmark `benchmarks/multicore-structures/test_queue_sequential.ml`
- Build: `ocamlfind` + `unix` (stdlib `Atomic`, no domainslib)
- Args: `<items>`, the number of items to enqueue/dequeue
- Description: Sequentially enqueues then dequeues `<items>` integers through the MS queue, checks that no items are lost, and reports throughput (items/ms).
- Source: sandmark `benchmarks/multicore-structures/test_queue_parallel.ml`
- Build: `ocamlfind` + `unix`
- Args: `<items>`
- Description: One domain enqueues `<items>` integers while a second domain concurrently dequeues them. Exercises the MS queue's CAS-based enqueue/dequeue paths under concurrent access.
- Source: sandmark `benchmarks/multicore-structures/test_stack_sequential.ml`
- Build: `ocamlfind` + `unix`
- Args: `<items>`
- Description: Sequential push/pop stress test on the Treiber stack.
- Source: sandmark `benchmarks/multicore-structures/test_stack_parallel.ml`
- Build: `ocamlfind` + `unix`
- Args: `<items>`
- Description: Concurrent push (one domain) and pop (another domain) on the Treiber stack.
- Source: sandmark `benchmarks/multicore-structures/test_spsc_queue_sequential.ml`
- Build: `ocamlfind` + `unix`
- Args: `<items>`, items per run; the run is repeated 1000 times
- Description: Sequential enqueue/dequeue cycle on the SPSC queue. Reports the cost per item (ns/item).
- Source: sandmark `benchmarks/multicore-structures/test_spsc_queue_parallel.ml`
- Build: `ocamlfind` + `unix`
- Args: `<items>`
- Description: One domain enqueues while another dequeues via the SPSC queue. Exercises the wait-free fast path.
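The single-producer/single-consumer discipline is what makes a wait-free queue possible: each index has exactly one writer. A minimal bounded ring-buffer sketch, assuming monotonically increasing `head`/`tail` counters and no cache-line padding (unlike the bundled `spsc_queue.ml`); all names are illustrative:

```ocaml
type 'a t = {
  buf : 'a option array;
  head : int Atomic.t; (* next slot to read; written only by the consumer *)
  tail : int Atomic.t; (* next slot to write; written only by the producer *)
}

let create size =
  { buf = Array.make size None; head = Atomic.make 0; tail = Atomic.make 0 }

(* Producer side: never loops, so each call completes in bounded steps. *)
let try_push q x =
  let tail = Atomic.get q.tail in
  if tail - Atomic.get q.head = Array.length q.buf then false (* full *)
  else begin
    q.buf.(tail mod Array.length q.buf) <- Some x;
    Atomic.set q.tail (tail + 1); (* publish the filled slot *)
    true
  end

(* Consumer side. *)
let try_pop q =
  let head = Atomic.get q.head in
  if head = Atomic.get q.tail then None (* empty *)
  else begin
    let i = head mod Array.length q.buf in
    let x = q.buf.(i) in
    q.buf.(i) <- None;
    Atomic.set q.head (head + 1); (* hand the slot back to the producer *)
    x
  end
```

Under the OCaml 5 memory model, the atomic write of `tail` after filling a slot (and the consumer's atomic read before using it) orders the plain array accesses, so no locks or CAS loops are needed.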
- Source: sandmark `benchmarks/multicore-structures/test_spsc_queue_pingpong_parallel.ml`
- Build: `ocamlfind` + `unix`
- Args: `<num_threads> <num_messages>`
- Description: Creates a ring of `<num_threads>` domains, each connected to the next by an SPSC queue. `Ping` messages circulate until a `Bye` message terminates each thread. Measures inter-domain message-passing latency through a chain of SPSC queues.
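Parallel versions of classic numerical benchmarks using domainslib. Where a sequential baseline exists, it is listed in the same entry as its multicore variant. Multicore variants are compiled with `ocamlfind -package domainslib`; sequential baselines use the stdlib only. The first argument of every multicore variant is the domain count (`<num_domains>`, or `<workers>` for fannkuch-redux).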
- Source: sandmark `benchmarks/multicore-numerical/mandelbrot6_multicore.ml`
- Build: `ocamlfind` + domainslib
- Args: `<num_domains> <width>` (default `1 200`)
- Description: Parallel Mandelbrot set renderer based on benchmarksgame Mandelbrot #6. Uses `Task.parallel_for` over rows, so each domain computes a horizontal strip. Outputs binary PBM to stdout.
- Source: sandmark `benchmarks/multicore-numerical/{nbody_multicore,nbody}.ml`
- Build: `ocamlfind` + domainslib (multicore); `ocamlopt`, stdlib only (sequential)
- Args: `<num_domains> <n> <num_bodies>` (default `1 500 1024`); sequential: `<n> <num_bodies>` (default `500 1024`)
- Description: N-body gravitational simulation. The parallel version uses `Task.parallel_for` for the velocity-update inner loop and `Task.parallel_for_reduce` for the energy computation.
- Source: sandmark `benchmarks/multicore-numerical/{floyd_warshall_multicore,floyd_warshall}.ml`
- Build: `ocamlfind` + domainslib; stdlib (sequential)
- Args: `<num_domains> <n>` (default `1 4`); sequential: `<n>` (default `4`)
- Description: All-pairs shortest paths (Floyd–Warshall). The outer `k` loop stays sequential because each iteration depends on the results of the previous one; the inner `i` loop is parallelised with `Task.parallel_for`. Uses an algebraic `edge` type (`Value of int | Infinity`).
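A sequential sketch of the algorithm over the `edge` type described above. The helper names `add` and `shorter` are illustrative, and the real benchmark's code may differ. The comment marks the `i` loop that the multicore variant distributes:

```ocaml
type edge = Value of int | Infinity

(* Path length through an intermediate vertex; Infinity absorbs. *)
let add a b =
  match (a, b) with Value x, Value y -> Value (x + y) | _ -> Infinity

(* [shorter a b] is true when a is strictly shorter than b. *)
let shorter a b =
  match (a, b) with
  | Value x, Value y -> x < y
  | Value _, Infinity -> true
  | Infinity, _ -> false

(* In-place update: round k only reads results produced by round k - 1,
   so the k loop must stay sequential. *)
let floyd_warshall (d : edge array array) =
  let n = Array.length d in
  for k = 0 to n - 1 do
    (* The multicore version runs this i loop with Task.parallel_for. *)
    for i = 0 to n - 1 do
      for j = 0 to n - 1 do
        let via = add d.(i).(k) d.(k).(j) in
        if shorter via d.(i).(j) then d.(i).(j) <- via
      done
    done
  done
```

The `i` iterations within one round are independent because, without negative cycles, row `k` and column `k` do not change during round `k`.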
- Source: sandmark `benchmarks/multicore-numerical/{game_of_life_multicore,game_of_life}.ml`
- Build: `ocamlfind` + domainslib; stdlib (sequential)
- Args: `<num_domains> <n_times> <board_size>` (default `1 2 1024`); sequential: `<n_times> <board_size>` (default `2 1024`)
- Description: Conway's Game of Life on a `board_size × board_size` grid, iterated for `n_times` steps. Row updates are parallelised with `Task.parallel_for`.
- Source: sandmark `benchmarks/multicore-numerical/binarytrees5_multicore.ml`
- Build: `ocamlfind` + domainslib
- Args: `<num_domains> <max_depth>` (default `1 10`)
- Description: Binary tree construction and checksum benchmark (benchmarksgame binary-trees #5). Uses `Task.async`/`Task.await` to parallelise tree checks across depths and domains. Exercises GC allocation and domain-local work stealing.
- Source: sandmark `benchmarks/multicore-numerical/spectralnorm2_multicore.ml`
- Build: `ocamlfind` + domainslib
- Args: `<num_domains> <n>` (default `1 2000`)
- Description: Spectral norm of the infinite matrix A with `A[i,j] = 1/((i+j)*(i+j+1)/2+i+1)`, computed by power iteration with `Task.parallel_for` for the matrix-vector products. Based on benchmarksgame spectral-norm #2.
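A sequential sketch of the power iteration, following the usual benchmarksgame structure (ten rounds of multiplying by AᵀA, then a Rayleigh-quotient estimate). Function names are illustrative. Each row of a matrix-vector product is independent, which is the work a parallel version can hand to `Task.parallel_for`:

```ocaml
(* A(i, j) = 1 / ((i+j)(i+j+1)/2 + i + 1), with 0-based indices. *)
let a i j = 1.0 /. float_of_int ((((i + j) * (i + j + 1)) / 2) + i + 1)

(* av <- A v *)
let mul_av v av =
  let n = Array.length v in
  for i = 0 to n - 1 do
    let s = ref 0.0 in
    for j = 0 to n - 1 do s := !s +. (a i j *. v.(j)) done;
    av.(i) <- !s
  done

(* atv <- A^T v *)
let mul_atv v atv =
  let n = Array.length v in
  for i = 0 to n - 1 do
    let s = ref 0.0 in
    for j = 0 to n - 1 do s := !s +. (a j i *. v.(j)) done;
    atv.(i) <- !s
  done

(* out <- A^T A v *)
let mul_atav v out =
  let tmp = Array.make (Array.length v) 0.0 in
  mul_av v tmp;
  mul_atv tmp out

let spectral_norm n =
  let u = Array.make n 1.0 and v = Array.make n 0.0 in
  for _ = 1 to 10 do
    mul_atav u v;
    mul_atav v u
  done;
  let vbv = ref 0.0 and vv = ref 0.0 in
  for i = 0 to n - 1 do
    vbv := !vbv +. (u.(i) *. v.(i));
    vv := !vv +. (v.(i) *. v.(i))
  done;
  sqrt (!vbv /. !vv)
```

`spectral_norm 100` should agree with the benchmarksgame reference output `1.274219991` to nine decimal places.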
- Source: sandmark `benchmarks/multicore-numerical/fannkuchredux_multicore.ml`
- Build: `ocamlfind` + domainslib
- Args: `<workers> <n>` (default `10 7`)
- Description: Fannkuch-redux (permutation flip counting). Divides the factorial permutation space into `workers` chunks and uses `Task.parallel_for` to count flip operations in parallel.
- Source: sandmark `benchmarks/multicore-numerical/{quicksort_multicore,quicksort}.ml`
- Build: `ocamlfind` + domainslib; stdlib (sequential)
- Args: `<num_domains> <n>` (default `1 2000`); sequential: `<n>` (default `2000`)
- Description: Parallel quicksort using `Task.async`/`Task.await` to spawn recursive subproblems. Spawning is depth-bounded: the remaining depth budget is halved at each partition.
- Source: sandmark `benchmarks/multicore-numerical/{mergesort_multicore,mergesort}.ml`
- Build: `ocamlfind` + domainslib; stdlib (sequential)
- Args: `<num_domains> <n>` (default `1 1024`); sequential: `<n>` (default `1024`)
- Description: Parallel merge sort using `Task.async`/`Task.await`. Falls back to bubble sort below a threshold of 32 elements and uses an in-place double-buffer merge strategy.
- Source: sandmark `benchmarks/multicore-numerical/{matrix_multiplication_multicore,matrix_multiplication}.ml`
- Build: `ocamlfind` + domainslib; stdlib (sequential)
- Args: `<num_domains> <size>` (default `1 1024`); sequential: `<size>` (default `1024`)
- Description: Dense integer matrix multiplication, row-parallel via `Task.parallel_for` over the output rows.
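Row-parallelism works here because each output row is written by exactly one task. The sketch below illustrates that decomposition with the stdlib alone: `parallel_for` is a hypothetical stand-in that splits the range into one contiguous chunk per domain (domainslib's `Task.parallel_for` schedules work differently), and `matmul` is an illustrative name:

```ocaml
(* Hypothetical stand-in for Task.parallel_for using raw domains:
   one contiguous chunk of [start, finish] per domain. *)
let parallel_for ~num_domains ~start ~finish ~body =
  let n = finish - start + 1 in
  let chunk = (n + num_domains - 1) / num_domains in
  let worker d () =
    let lo = start + (d * chunk) in
    let hi = min finish (lo + chunk - 1) in
    for i = lo to hi do body i done
  in
  let spawned =
    List.init (num_domains - 1) (fun d -> Domain.spawn (worker (d + 1)))
  in
  worker 0 (); (* the calling domain handles chunk 0 *)
  List.iter Domain.join spawned

(* c = a * b; each domain fills a disjoint set of rows of c. *)
let matmul ~num_domains a b =
  let n = Array.length a and p = Array.length b and m = Array.length b.(0) in
  let c = Array.make_matrix n m 0 in
  parallel_for ~num_domains ~start:0 ~finish:(n - 1) ~body:(fun i ->
      for j = 0 to m - 1 do
        let s = ref 0 in
        for k = 0 to p - 1 do s := !s + (a.(i).(k) * b.(k).(j)) done;
        c.(i).(j) <- !s
      done);
  c
```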
- Source: sandmark `benchmarks/multicore-numerical/matrix_multiplication_tiling_multicore.ml`
- Build: `ocamlfind` + domainslib
- Args: `<num_domains> <size>` (default `1 1024`)
- Description: Tiled matrix multiplication (tile size 64) using explicit `Domainslib.Chan`-based task distribution rather than `parallel_for`. Channel-based dispatch is used because the loop has decreasing work per iteration, which makes static `parallel_for` chunking suboptimal.
- Source: sandmark `benchmarks/multicore-numerical/{LU_decomposition_multicore,LU_decomposition}.ml`
- Build: `ocamlfind` + domainslib; stdlib (sequential)
- Args: `<num_domains> <mat_size>` (default `1 1200`); sequential: `<mat_size>` (default `1200`)
- Description: In-place LU decomposition of a random float matrix, storing L and U in packed form. Uses `Task.parallel_for` for row elimination and `Domain.DLS` for domain-local random state.
- Source: sandmark `benchmarks/multicore-numerical/{nqueens_multicore,nqueens}.ml`
- Build: `ocamlfind` + domainslib; stdlib (sequential)
- Args: `<num_domains> <board_size>` (default `2 13`); sequential: `<board_size>` (default `13`)
- Description: N-queens solver. The parallel version spawns a `Task.async` for each valid queen placement in each row and aggregates the results with `Task.await`.
- Source: sandmark `benchmarks/multicore-numerical/{evolutionary_algorithm_multicore,evolutionary_algorithm}.ml`
- Build: `ocamlfind` + domainslib; stdlib (sequential)
- Args: `<num_domains> <n> <lambda>` (default `4 1000 1000`); sequential: `<n> <lambda>` (default `1000 1000`)
- Description: Minimal genetic algorithm optimising the OneMax fitness function. The parallel version uses `Task.parallel_for` to evaluate and mutate the population in each generation, and `Domain.DLS` for domain-local random state.
Gram matrix benchmark from the Yamanishi laboratory. Compiled with `ocamlfind`; requires a `data/` subdirectory with CSV input files (bundled). The benchmark reads feature vectors from a CSV file (space-separated floats) and computes the symmetric Gram matrix via dot products. The default input is `data/tox21_nrar_ligands_std_rand_01.csv` (7026 samples).
A shared helper module, `utls.ml`, is compiled alongside the main benchmark in each build.
- Source: sandmark `benchmarks/multicore-grammatrix/grammatrix.ml` + `utls/utls.ml`
- Build: `ocamlfind` + `unix` (sequential)
- Args: `<ncores> <input_file>` (default `1 data/tox21_nrar_ligands_std_rand_01.csv`)
- Description: Sequential Gram matrix computation. Reads the feature vectors, computes the full N×N symmetric matrix with O(N²) dot products, then prints a corner summary. The `ncores` argument is accepted but ignored; it exists only for interface parity with the multicore version.
- Source: sandmark `benchmarks/multicore-grammatrix/grammatrix_multicore.ml` + `utls/utls.ml`
- Build: `ocamlfind` + domainslib + `unix`
- Args: `<num_domains> <chunk_size> <input_file>` (default `4 16 data/tox21_nrar_ligands_std_rand_01.csv`)
- Description: Parallel Gram matrix computation using explicit `Domainslib.Chan`-based task distribution. Work chunks of `<chunk_size>` rows are sent through a bounded channel; each domain fetches and processes chunks until it receives a `Quit` message. Channel-based dispatch is preferred over `parallel_for` here because earlier rows carry more work (triangular iteration), so queuing chunks in decreasing-work order improves load balance. Note: the benchmark must be run from the `multicore-grammatrix/` directory so that the relative `data/` path resolves correctly.
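The load-balancing idea can be sketched with the stdlib alone. In this illustration a shared `Atomic` counter stands in for the bounded `Domainslib.Chan` (so there is no `Quit` message: a worker stops once the counter passes the last row), and `dot`/`gram` are hypothetical names. Row `i` computes only the `n - i` entries with `j >= i` and mirrors them, so early chunks are the expensive ones, and handing chunks out in row order yields a decreasing-work schedule:

```ocaml
let dot u v =
  let s = ref 0.0 in
  Array.iteri (fun i x -> s := !s +. (x *. v.(i))) u;
  !s

let gram ~num_domains ~chunk_size (xs : float array array) =
  let n = Array.length xs in
  let g = Array.make_matrix n n 0.0 in
  let next = Atomic.make 0 in (* first row of the next unclaimed chunk *)
  let rec worker () =
    let lo = Atomic.fetch_and_add next chunk_size in
    if lo < n then begin
      for i = lo to min (n - 1) (lo + chunk_size - 1) do
        (* Upper triangle only; each cell has exactly one writer. *)
        for j = i to n - 1 do
          let d = dot xs.(i) xs.(j) in
          g.(i).(j) <- d;
          g.(j).(i) <- d
        done
      done;
      worker ()
    end
  in
  let spawned = List.init (num_domains - 1) (fun _ -> Domain.spawn worker) in
  worker ();
  List.iter Domain.join spawned;
  g
```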
Multicore GC stress test using OxCaml-specific APIs. It requires an OxCaml compiler (Jane Street's OCaml fork) and will not compile with stock OCaml.
- Source: custom benchmark (not from sandmark)
- Build: `ocamlopt` (stdlib only)
- Args: (none)
- Description: Spawns 8 domains using `Domain.Safe.spawn` (OxCaml API), each building a large binary tree of depth 28 with 10-byte string leaves. After all domains have built their trees, the main domain runs 10 `Gc.full_major` cycles. Exercises concurrent major GC marking across multiple domains with a large shared live set. Uses `Sys.poll_actions` for cooperative domain coordination and `Atomic` for synchronisation.
- OxCaml APIs used: `Domain.Safe.spawn`, `Sys.poll_actions`
- Suite type: `OCamlOxcamlBenchmarkSuite`; fails with an error if the runtime is not `type: OxCaml`.
Parallel global illumination renderer (MiniLight 1.5.2): a Monte Carlo path tracer with an octree spatial index. All nine source modules are compiled together in dependency order using `ocamlfind -package domainslib`. Only the parallel entry point (`minilight_multicore`) is provided; the sequential variant is omitted because its `camera.ml` has a different API signature.
Compilation order: `vector3f` → `triangle` → `surfacePoint` → `spatialIndex` → `scene` → `image` → `rayTracer` → `camera` → `minilight_multicore`
- Source: sandmark `benchmarks/multicore-minilight/parallel/` (all modules)
- Build: `ocamlfind` + domainslib (9-module compilation)
- Args: `<scene_file>`, the path to a MiniLight scene description (e.g. the bundled `roomfront.ml.txt`)
- Description: Parallel path tracer. Each frame's pixel rows are distributed across domains using `Task.parallel_for` inside `Camera.frame`, and `Domain.DLS` provides a per-domain `Random.State` to avoid contention. Renders progressively, printing progress to stderr and saving PPM output to `<scene_file>.ppm`. Note: the renderer runs until interrupted; for benchmarking, wrap it with a timeout or limit the iterations in the scene file.
Parallel Graph500 Kronecker graph generator and BFS kernel. Two executables are built from shared library modules; `gen` must be run first to produce an edge-list data file, which `kernel1_run_multicore` then reads.

Compilation order for both executables: `graphTypes` → `sparseGraph` → `generate` → [`gen` | `kernel1Par` → `kernel1_run_multicore`]
- Source: sandmark `benchmarks/graph500par/gen.ml` (+ `generate.ml`, `sparseGraph.ml`, `graphTypes.ml`)
- Build: `ocamlfind` + domainslib + `unix`
- Args: `[-scale SCALE] [-edgefactor EDGE_FACTOR] [-ndomains NUM_DOMAINS] OUTPUT_FILE` (defaults `scale=12`, `edgefactor=16`, `ndomains=1`)
- Description: Kronecker graph generator implementing the Graph500 specification. Generates `2^scale` vertices and `edgefactor * 2^scale` edges using a probabilistic bit-setting algorithm with random permutations. Edge generation uses `Task.parallel_for`. Writes the edge list to `OUTPUT_FILE` via `Marshal`.
- Source: sandmark `benchmarks/graph500par/kernel1_run_multicore.ml` (+ `kernel1Par.ml`, `generate.ml`, `sparseGraph.ml`, `graphTypes.ml`)
- Build: `ocamlfind` + domainslib + `unix`
- Args: `[-ndomains NUM_DOMAINS] EDGE_LIST_FILE`
- Description: Graph500 Kernel 1, the parallel construction of a sparse adjacency-list representation. Reads the pre-generated edge list from `EDGE_LIST_FILE`, removes self-loops, finds the maximum vertex label using `Task.parallel_for_reduce`, and builds the sparse graph using `Task.parallel_for` with lock-free `Atomic.t`-based adjacency lists. Reports I/O and construction time.
- Source: sandmark `benchmarks/simple-tests/alloc_multicore.ml`
- Build: `ocamlopt` (stdlib only; uses `Domain.spawn`/`Domain.join`)
- Args: `<num_domains> <iterations>`; the config uses `2 200_000`
- Description: Parallel minor-heap allocation benchmark. Each domain allocates small mutable records `{ an_int; a_string; a_float }` in a tight loop. Measures allocation throughput under parallel GC pressure.
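The shape of the workload can be sketched as follows. Only the record fields come from the description above; the loop body, `alloc_loop`, and `run` are illustrative assumptions. Each iteration allocates a fresh record that becomes garbage on the next iteration, so every domain continually fills its own minor heap:

```ocaml
(* Field names from the benchmark description; everything else is illustrative. *)
type r = { mutable an_int : int; mutable a_string : string; mutable a_float : float }

(* Allocate one short-lived record per iteration; keep only the last one. *)
let alloc_loop iterations =
  let last = ref { an_int = 0; a_string = ""; a_float = 0.0 } in
  for i = 1 to iterations do
    last := { an_int = i; a_string = "x"; a_float = float_of_int i }
  done;
  !last.an_int

(* One allocation loop per domain, all running concurrently. *)
let run ~num_domains ~iterations =
  List.init num_domains (fun _ -> Domain.spawn (fun () -> alloc_loop iterations))
  |> List.map Domain.join
```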
- Source: sandmark `benchmarks/simple-tests/pingpong_multicore.ml`
- Build: `ocamlfind` + domainslib (auto-installed)
- Args: `<num_domains> <chan_size> <total_messages>`; the config uses `3 1 1000000`
- Description: Multi-domain channel ping-pong benchmark using `Domainslib.Chan`. A producer sends messages through a pipeline of worker domains, each of which increments a counter before forwarding. Measures channel throughput and domain synchronisation overhead.
These benchmarks were not added because their dependencies are complex or unusual:

- `valet`: requires `lwt`, `react`, and `uuidm`; unusual event-loop structure.
- `simple-tests` (partial): `ocamlcapi` requires C stubs (skipped). `alloc_multicore` and `pingpong_multicore` are now ported (see `multicore/alloc_multicore/` and `multicore/pingpong_multicore/`).
- `irmin`: requires `irmin`, `irmin-pack`, `index`, and related packages.
- `owl`: requires `owl-base`.
- `mpl`: requires several packages (`mtime`, `progress`, etc.).
These benchmarks require domainslib, multiple domains, or OCaml 5 effect handlers, and are not meaningful on OCaml 4.x:

- `multicore-effects`: ported: `algorithmic_differentiation`, `rec_eff_fib`, `rec_seq_fib`, `rec_eff_tak`, `rec_seq_tak`, `rec_eff_ack`, `rec_seq_ack`, `effect_throughput_val`, `effect_throughput_perform`, `effect_throughput_perform_drop`, `eratosthenes`, `rec_eff_evenodd`, `rec_seq_evenodd`, `rec_eff_motzkin`, `rec_seq_motzkin`, `rec_eff_sudan`, `rec_seq_sudan`, `ms_sched`/`test_sched` (in `with_packages/test_sched/`). Not ported: `queens` and `effect_throughput_clone`, which require multi-shot continuations (`Obj.clone_continuation`), removed in OCaml 5.2 with no stdlib replacement.
- `multicore-grammatrix`: added to `multicore/multicore-grammatrix/`.
- `multicore-minilight`: added to `multicore/multicore-minilight/`.
- `multicore-numerical`: added to `multicore/multicore-numerical/`.
- `multicore-structures`: added to `multicore/multicore-structures/`; uses the stdlib `Atomic` module (no `kcas` required).
- `graph500par`: added to `multicore/graph500par/`.
These benchmarks require compiling C foreign stubs alongside OCaml code, which the simple `ocamlopt` build scripts used here do not yet support. They may be revisited once a mixed-language build strategy is in place.

- `multicore-gcroots`: tests concurrent GC root registration across domains. The sandmark version wraps internal OCaml GC C APIs (`caml_register_generational_global_root`, etc.) via a C stub library (`globrootsprim`). A pure-OCaml rewrite using `Gc.minor ()`/`Gc.full_major ()` across domains could approximate the intent, but would not be the same benchmark.
- `minilight`: the sandmark dune file only declares a data file alias (`roomfront.ml.txt`); no executable stanza is present, suggesting the benchmark needs a different integration approach.
Note: `alt-ergo`, `coq`, `cpdf`, `cubicle`, `frama-c`, and `menhir` have been ported and are in `macrobenchmarks/` (see below).