ocaml-bench/benches

benches

OCaml microbenchmark suite ported from sandmark, organised into simple/, with_deps/, with_packages/, and multicore/. Each benchmark ships its own *.build.sh script honouring a small, well-documented build contract.

Designed to work two ways:

  • Standalone — activate any opam switch with dune on PATH and run a benchmark's *.build.sh to compile a binary you can execute directly. See §Build Script Contract.
  • Orchestrated — used as the benchmark backend for running-ng, which manages opam switches per OCaml runtime and drives sweeps. See §Integration With running-ng.

Directory Layout

Benchmarks are organised into five top-level groups:

benches/
  simple/           # stdlib / unix only; single build script, no generated data
  with_deps/        # require dune multi-library builds or generated input data
  with_packages/    # require external opam packages (zarith, lwt, decompress, yojson, …)
  multicore/        # require OCaml ≥ 5 (effects, domains, domainslib)
  macrobenchmarks/  # real-world tools installed via opam; benchmark the installed binary

Each benchmark lives in its own subfolder:

benches/
  simple/
    <benchmark-name>/
      <benchmark-name>.ml          # source (or multiple .ml files)
      <benchmark-name>.build.sh    # builds the binary via ocamlopt or dune

  with_deps/
    <benchmark-name>/
      <source files ...>
      <benchmark-name>.build.sh       # builds the benchmark binary
      <benchmark-name>.build.deps.sh  # generates runtime-independent input data

  macrobenchmarks/
    <tool-name>/
      <tool-name>.build.sh         # installs via opam and copies the binary
      <input files ...>            # workload inputs (.why, .v, .mly, .cub, …)

build.deps.sh convention

For benchmarks in with_deps/ that require pre-generated input data (e.g. a graph edge list), a companion <benchmark>.build.deps.sh script handles data generation. The main build.sh calls it automatically before building the binary.

Key properties:

  • build.deps.sh receives the same env vars as build.sh (in particular RUNNING_OCAML_BENCH_DIR). The opam switch is activated, so compiler tools are on PATH.
  • Generated data files are placed in the benchmark directory and are runtime-version-independent: the script skips generation if the file already exists, so data is produced once and reused across all compiler versions in a sweep.
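
The skip-if-present behaviour can be sketched as below. This is a minimal illustration, not one of the repo's scripts: generate_data and ensure_data are hypothetical names, and the placeholder generator stands in for whatever program a real build.deps.sh builds and runs.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the real generator. An actual build.deps.sh
# would build and run a generator program here instead.
generate_data() {
  printf 'placeholder data\n'
}

# Produce the data file only if it is missing or empty, so the same file is
# reused by every later build regardless of the compiler version.
ensure_data() {
  local data_file="$1"
  if [ -s "${data_file}" ]; then
    echo "reusing ${data_file}"
    return 0
  fi
  # Write to a temporary name first and rename afterwards, so an interrupted
  # run never leaves a partial file that a later run would treat as valid.
  local tmp="${data_file}.tmp.$$"
  generate_data > "${tmp}"
  mv "${tmp}" "${data_file}"
  echo "generated ${data_file}"
}
```

A script following this pattern would end with a call such as ensure_data "${RUNNING_OCAML_BENCH_DIR}/edges.data". The write-then-rename step is a design choice of this sketch, not something the source confirms.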

Integration With running-ng

For OCamlBenchmarkSuite, when path points to a directory, running-ng uses build mode.

Conventions used by default:

  • build script: <benchmark-name>.build.sh
  • output binary: <benchmark-name>-<runtime-name>

So benchmark almabench with runtime ocaml-local produces:

  • simple/almabench/almabench-ocaml-local

You can override this in running-ng config with build_script and binary, but the convention above means those fields are usually unnecessary.

Build Script Contract

running-ng creates an opam switch for each runtime (via opam-compiler) and activates that switch's environment before invoking each build script. Build scripts can therefore assume the compiler (ocamlopt), dune, and any packages installed in the switch are available on PATH.

Every build script honours the same env-var contract (the same set used by ~/macro-benches/), so a script behaves identically whether running-ng invokes it or you run it by hand:

Variable                    Meaning                                        Fallback when unset
RUNNING_OCAML_BENCH_DIR     Directory containing this benchmark's sources  The script's own directory ($(cd "$(dirname "$0")" && pwd))
RUNNING_OCAML_OUTPUT        Path where the built binary must be written    ${BENCH_DIR}/<name>-${RUNTIME_NAME}
RUNNING_OCAML_RUNTIME_NAME  Runtime identifier (e.g. ocaml-5.4.1)          runtime
RUNNING_OCAML_SWITCH        Opam switch name (when applicable)             unset

When running-ng drives the build, it sets all four. When you run the script standalone, the fallbacks resolve to something sensible — usually placing the binary next to the source file — so bash ./<name>.build.sh just works.
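
To see how the fallbacks combine, the output-path resolution can be isolated in a small helper. resolve_output is a hypothetical name used only for this illustration; it applies the same parameter expansions as the templates in this README, taking the benchmark directory and name as arguments.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the path where a build script would place the binary for benchmark
# <name> located in <bench_dir>, following the env-var contract:
#   1. RUNNING_OCAML_OUTPUT wins if it is set.
#   2. Otherwise the binary goes next to the sources and is tagged with
#      RUNNING_OCAML_RUNTIME_NAME, or the literal "runtime" if that is unset.
resolve_output() {
  local bench_dir="$1" name="$2"
  local runtime="${RUNNING_OCAML_RUNTIME_NAME:-runtime}"
  printf '%s\n' "${RUNNING_OCAML_OUTPUT:-${bench_dir}/${name}-${runtime}}"
}
```

With RUNNING_OCAML_RUNTIME_NAME=ocaml-local, resolve_output simple/almabench almabench prints simple/almabench/almabench-ocaml-local, which matches the naming convention described under Integration With running-ng. With both variables unset it prints simple/almabench/almabench-runtime.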

Adding a New Benchmark

Every benchmark follows the same structure. Pick the template that matches your benchmark and replace <name> with the benchmark name.

1. Create the directory and source files

benches/<group>/<name>/
  <name>.ml          # source file(s)
  dune               # dune build file
  dune-project       # (generate_opam_files false)
  <name>.build.sh    # build script

2. Write the dune file

(executable
 (name <name>)
 (modules <name>)
 (libraries unix)            ; add libraries as needed
 (modes native)
 (ocamlopt_flags (:standard -O3)))

3. Write the build script

All build scripts follow the same pattern: resolve BENCH_DIR and OUT from the env-var contract (§Build Script Contract), do the build, copy the binary to OUT. Pick the right template:

Template A (no external packages; used by simple/, multicore/multicore-effects, with_deps/):

#!/usr/bin/env bash
set -euo pipefail
BENCH_DIR="${RUNNING_OCAML_BENCH_DIR:-$(cd "$(dirname "$0")" && pwd)}"
OUT="${RUNNING_OCAML_OUTPUT:-${BENCH_DIR}/<name>-${RUNNING_OCAML_RUNTIME_NAME:-runtime}}"
dune build --root "${BENCH_DIR}" --profile release <name>.exe
cp "${BENCH_DIR}/_build/default/<name>.exe" "${OUT}"
chmod +x "${OUT}"

Template B (with opam packages; used by with_packages/, multicore/multicore-numerical):

#!/usr/bin/env bash
set -euo pipefail
BENCH_DIR="${RUNNING_OCAML_BENCH_DIR:-$(cd "$(dirname "$0")" && pwd)}"
OUT="${RUNNING_OCAML_OUTPUT:-${BENCH_DIR}/<name>-${RUNNING_OCAML_RUNTIME_NAME:-runtime}}"

opam install <packages> -y

dune build --root "${BENCH_DIR}" --profile release <name>.exe
cp "${BENCH_DIR}/_build/default/<name>.exe" "${OUT}"
chmod +x "${OUT}"

For real-world OCaml applications (alt-ergo, coq, cpdf, menhir, …), use the vendored monorepo at ~/macro-benches/, which builds every tool from a single dune workspace and follows the same env-var contract.

4. Register in the running-ng config

Add the benchmark to the appropriate suite in the shared micro base (running-ng/src/running/config/base/ocaml/micro_base.yml):

suites:
  <suite-name>:
    programs:
      <name>:
        path: "${RUNNING_BENCH_DIR}/<group>/<name>"
        args: "<arguments>"

benchmarks:
  <suite-name>:
    - <name>

Cleaning Build Artifacts

cd ~/benches && make clean

The Makefile provides three targets:

  • clean — runs both clean-dune and clean-with-deps, then removes compiled objects (.o, .a, .so, .cmi, .cmx, .cmxa, .cmo, .cma, .cmt, .cmti, .annot, .opt) and tagged benchmark binaries (*-ocaml-*, *-oxcaml-*).
  • clean-dune — removes _build and _build-running directories (dune build caches).
  • clean-with-deps — removes generated input data (e.g. graph500seq/edges.data).

When to clean: After changing compiler flags (e.g. adding --enable-multidomain to an OxCaml runtime), stale cached binaries may mask the change. Run make clean to force a rebuild on the next benchmark run.
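
The Makefile's recipes are not reproduced in this README. As a rough equivalent of what the targets are described as removing, the following sketch uses find; clean_tree is a hypothetical helper, and it covers only a subset of the object-file extensions listed above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Remove build products under <root>. This is an illustrative approximation
# of the clean targets, not the Makefile's actual recipe.
clean_tree() {
  local root="$1"
  # dune build caches: delete the directories without descending into them
  find "${root}" -type d \( -name _build -o -name _build-running \) \
    -prune -exec rm -rf {} +
  # compiled objects (subset of the extensions the clean target handles)
  find "${root}" -type f \
    \( -name '*.o' -o -name '*.cmi' -o -name '*.cmx' -o -name '*.cmo' \) \
    -exec rm -f {} +
  # benchmark binaries tagged with a runtime name
  find "${root}" -type f \( -name '*-ocaml-*' -o -name '*-oxcaml-*' \) \
    -exec rm -f {} +
}
```

Source files are left untouched because none of the patterns match .ml files; generated input data would need its own rule, as the clean-with-deps target provides.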


Benchmarks

All benchmarks below are sourced from sandmark unless noted otherwise. Build approach is either ocamlopt (single .ml compiled directly) or dune (multi-file, uses a dune file in the benchmark dir).

Benchmark Count Summary

Counts are based on the build scripts present in this repo (~/benches). For macrobenchmarks, each build script installs one tool but runs it as multiple programs with different inputs — the count reflects individual programs. Sequential and multicore benchmarks are registered in running-ng's ocaml_gc_sweep_example.yml (and gc_sweep_all_versions.yml). Macrobenchmarks are registered in running-ng's macrobenchmarks.yml.

Directory                       Programs  Requires
simple/                         39        stdlib / unix
with_deps/                      10        dune multi-lib or generated data
with_packages/                  20        external opam packages
macrobenchmarks/                14        opam-installed tools (alt-ergo, coq, cpdf, cubicle, frama-c, menhir)
multicore/multicore-effects     17        OCaml ≥ 5, effects
multicore/multicore-structures  7         OCaml ≥ 5, stdlib Atomic
multicore/multicore-numerical   23        OCaml ≥ 5, domainslib
multicore/multicore-grammatrix  2         OCaml ≥ 5, domainslib
multicore/multicore-minilight   1         OCaml ≥ 5, domainslib
multicore/alloc_multicore       1         OCaml ≥ 5, stdlib Domain
multicore/pingpong_multicore    1         OCaml ≥ 5, domainslib
multicore/graph500par           1         OCaml ≥ 5, domainslib
multicore/oxcaml-prefetch       1         OxCaml compiler fork
multicore/multicore-gcroots     3         OCaml ≥ 5, C stubs (CAML_INTERNALS)
Total                           140

markbench

  • Source: sandmark benchmarks/markbench/
  • Build: dune + unix
  • Args: (none) — defaults to 10 Gc.full_major cycles; pass an integer to override
  • Description: Microbenchmark for the major GC mark phase. Allocates a large live set and calls Gc.full_major repeatedly, measuring seconds per GC cycle. Sensitive to the OCAMLRUNPARAM settings o (space_overhead) and s (minor heap size).

minilight

  • Source: sandmark benchmarks/multicore-minilight/sequential/
  • Build: dune (stdlib only, multi-file), in simple/minilight/
  • Args: <scene-file> — absolute path to roomfront.ml.txt; use /home/udesou/benches/simple/minilight/roomfront.ml.txt
  • Description: Sequential MiniLight 1.5.2 global illumination renderer. Traces rays through a Cornell box scene using an octree spatial index; exercises float arithmetic, object-oriented style (classes), and moderate allocation. The sandmark dune listed domainslib but the sequential sources do not use it.
  • Note: The parallel version is multicore/multicore-minilight/minilight_multicore.

almabench

  • Source: sandmark benchmarks/almabench/ (originally OCamlPro's ocamlbench-repo)
  • Build: ocamlopt (stdlib only)
  • Args: (none)
  • Description: Floating-point benchmark that computes planetary ephemerides (positions of the planets over a range of dates). Exercises the minor heap heavily with small float arrays.

bdd

  • Source: sandmark benchmarks/bdd/ (originally OCamlPro's ocamlbench-repo)
  • Build: ocamlopt (stdlib only)
  • Args: (none)
  • Description: Binary Decision Diagram operations (AND, OR, NOT, quantification) on propositional formulae. Pointer-heavy graph structure; exercises major GC and sharing.

hamming

  • Source: sandmark benchmarks/hamming/
  • Build: ocamlopt (stdlib only)
  • Args: <N> — number of Hamming numbers to iterate over; config uses 500000
  • Description: Generates the infinite lazy Hamming sequence (numbers of the form 2^i × 3^j × 5^k) using lazy streams and lazy merging. Exercises lazy allocation and minor GC.

soli

  • Source: sandmark benchmarks/soli/
  • Build: ocamlopt (stdlib only)
  • Args: <nruns> — number of solver runs; config uses 50
  • Description: Peg solitaire solver using backtracking search. Exercises call stack and moderate allocation; useful for testing the interaction between recursion depth and minor heap pressure.

kb

  • Source: sandmark benchmarks/kb/ (originally OCamlPro's ocamlbench-repo)
  • Build: ocamlopt (stdlib only)
  • Args: (none) — runs 100 iterations of Knuth-Bendix completion internally
  • Description: Knuth-Bendix completion procedure (with exceptions). Algebraic term rewriting; heavily allocates and collects term structures. A classic OCaml GC benchmark.

kb_no_exc

  • Source: sandmark benchmarks/kb/kb_no_exc.ml (shares directory with kb)
  • Build: ocamlopt (stdlib only) — build script is kb_no_exc.build.sh in benches/kb/
  • Args: (none) — runs 100 iterations of Knuth-Bendix completion internally
  • Description: Same algorithm as kb but with the exception-based search replaced by an explicit option type. Useful for comparing exception overhead against allocation/GC cost.

lexifi-g2pp

  • Source: sandmark benchmarks/lexifi-g2pp/ (originally OCamlPro's ocamlbench-repo)
  • Build: dune (stdlib only, multi-file; entry point: main.exe)
  • Args: (none)
  • Description: Calibrates a G2++ two-factor interest rate model (LexiFi's financial library benchmark). Involves iterative numerical optimisation over a large structured dataset. Exercises both arithmetic and moderate allocation in a realistic workload.

zdd

  • Source: sandmark benchmarks/zdd/
  • Build: ocamlopt (stdlib only)
  • Args: <words-file> — absolute path to words.txt; use /home/udesou/benches/simple/zdd/words.txt
  • Description: Zero-suppressed Binary Decision Diagram (ZDD) operations over an English word dictionary. Builds a ZDD from all words, then counts matches for a pattern query. Exercises pointer-heavy DAG structures similar to bdd.
  • Note: The run cwd is a temp dir, so the word file must be passed as an absolute path.

fannkuchredux

  • Source: sandmark benchmarks/benchmarksgame/fannkuchredux.ml
  • Build: ocamlopt (stdlib only), compiled with -noassert -unsafe as in sandmark
  • Args: <N> — permutation length; config uses 11
  • Description: Counts the maximum number of flips needed to sort a permutation, and sums the sign of each intermediate permutation (Pfannkuchen benchmark). Pure computation with no allocation; useful as a control benchmark where GC has negligible impact.

numerical-analysis

Six benchmarks sharing benches/numerical-analysis/. Each has its own build script; two require two source files compiled in order.

crout_decomposition

  • Source: sandmark benchmarks/numerical-analysis/crout_decomposition.ml (originally OCamlPro's ocamlbench-repo)
  • Build: ocamlopt (stdlib only)
  • Args: (none)
  • Description: Crout matrix decomposition (LU factorisation variant) on a fixed matrix. Dense linear algebra; exercises float array allocation.

qr_decomposition

  • Source: sandmark benchmarks/numerical-analysis/qr_decomposition.ml (originally OCamlPro's ocamlbench-repo)
  • Build: ocamlopt (stdlib only)
  • Args: (none)
  • Description: QR decomposition via Gram-Schmidt on a fixed matrix. Dense linear algebra; similar allocation profile to crout_decomposition.

durand_kerner_aberth

  • Source: sandmark benchmarks/numerical-analysis/durand_kerner_aberth.ml (originally OCamlPro's ocamlbench-repo)
  • Build: ocamlopt (stdlib only)
  • Args: (none) — optional percentage of coefficient array (default 100); runs 10 iterations
  • Description: Finds all roots of a polynomial simultaneously using the Durand–Kerner / Weierstrass method. Complex-number arithmetic on float arrays.

fft

  • Source: sandmark benchmarks/numerical-analysis/fft.ml (originally OCamlPro's ocamlbench-repo)
  • Build: ocamlopt + unix.cmxa (uses Unix.times for timing output)
  • Args: (none) — optional array size (default 1048576)
  • Description: Cooley–Tukey FFT followed by inverse FFT on a complex float array. In-place computation; exercises large float array allocation and cache effects.

levinson_durbin

  • Source: sandmark benchmarks/numerical-analysis/levinson_durbin.ml + levinson_durbin_dataset.ml
  • Build: ocamlopt (stdlib only), two-file: dataset compiled first
  • Args: (none)
  • Description: Levinson–Durbin recursion for autoregressive modelling of Japanese vowel sound data. Exercises float array allocation with a real-world-sized numerical dataset.

naive_multilayer

  • Source: sandmark benchmarks/numerical-analysis/naive_multilayer.ml + naive_multilayer_dataset.ml
  • Build: ocamlopt (stdlib only), two-file: dataset compiled first
  • Args: (none)
  • Description: Naive multilayer neural network (forward pass + backpropagation) on the UCI Ionosphere dataset. Dense matrix operations; exercises both float array allocation and functional list structure.

sequence_cps

  • Source: sandmark benchmarks/sequence/sequence_cps.ml (originally OCamlPro's ocamlbench-repo)
  • Build: ocamlopt (stdlib only)
  • Args: <N> — sequence length; config uses 10000
  • Description: Builds a lazy CPS-style sequence of integers 0…N, then maps, filters, and folds it to compute a sum. Exercises higher-order function application and minor heap allocation in a functional pipeline; no external libraries required.

simple/stdlib benchmarks

Sandmark's benchmarks/stdlib/ suite: 10 single-file benchmarks covering core stdlib data structures. Each takes <bench_type> [args] and dispatches to sub-benchmarks.

array_bench

  • Source: sandmark benchmarks/stdlib/array_bench.ml
  • Build: ocamlopt (stdlib only)
  • Args: <bench_type>, for example make, init, or map
  • Description: Array allocation, initialisation, map, sort, and iteration microbenchmarks.

bytes_bench

  • Source: sandmark benchmarks/stdlib/bytes_bench.ml
  • Build: ocamlopt (stdlib only)
  • Args: <bench_type>
  • Description: Bytes buffer operations: blit, fill, sub, compare.

string_bench

  • Source: sandmark benchmarks/stdlib/string_bench.ml
  • Build: ocamlopt (stdlib only)
  • Args: <bench_type>
  • Description: String operations: concat, contains, split, compare.

map_bench

  • Source: sandmark benchmarks/stdlib/map_bench.ml
  • Build: ocamlopt (stdlib only)
  • Args: <bench_type>
  • Description: Functional map (AVL tree) insert, lookup, fold, merge.

set_bench

  • Source: sandmark benchmarks/stdlib/set_bench.ml
  • Build: ocamlopt (stdlib only)
  • Args: <bench_type>
  • Description: Functional set insert, union, inter, diff.

stack_bench

  • Source: sandmark benchmarks/stdlib/stack_bench.ml
  • Build: ocamlopt (stdlib only)
  • Args: <bench_type>
  • Description: Stack push/pop operations.

hashtbl_bench

  • Source: sandmark benchmarks/stdlib/hashtbl_bench.ml
  • Build: ocamlopt (stdlib only)
  • Args: <bench_type>
  • Description: Hashtable add, find, replace, fold.

pervasives_bench

  • Source: sandmark benchmarks/stdlib/pervasives_bench.ml
  • Build: ocamlopt (stdlib only)
  • Args: <bench_type>
  • Description: Stdlib arithmetic and comparison functions.

str_bench

  • Source: sandmark benchmarks/stdlib/str_bench.ml
  • Build: ocamlopt + str.cmxa (-I +str str.cmxa)
  • Args: <bench_type>
  • Description: Regular expression operations from the Str library.

big_array_bench

  • Source: sandmark benchmarks/stdlib/big_array_bench.ml
  • Build: ocamlopt; links bigarray.cmxa only on OCaml 4.x (bundled into stdlib on 5.x)
  • Args: <bench_type>
  • Description: Bigarray allocation and element access patterns.
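
The version-dependent linking can be sketched as a helper that picks the extra library from the compiler version string. bigarray_link_flags is a hypothetical name, and the sketch assumes the version begins with a numeric major component, as the output of ocamlopt -version does; whether a given 4.x installation also needs an include path is not covered here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the extra library to link for Bigarray, given a compiler version
# string such as "4.14.2" or "5.4.1". Prints nothing for OCaml 5 and later,
# where no separate bigarray library is linked.
bigarray_link_flags() {
  local version="$1"
  local major="${version%%.*}"   # text before the first dot
  if [ "${major}" -lt 5 ]; then
    printf 'bigarray.cmxa\n'
  fi
}
```

A build script could then run ocamlopt $(bigarray_link_flags "$(ocamlopt -version)") big_array_bench.ml -o "${OUT}", leaving the substitution unquoted so that an empty result adds no argument.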

simple/simple-tests benchmarks

Sandmark's benchmarks/simple-tests/ suite: small stdlib-only benchmarks covering allocation, lazy evaluation, stacks, finalizers, and weak/ephemeron tables.

alloc

  • Source: sandmark benchmarks/simple-tests/alloc.ml
  • Build: ocamlopt (stdlib only)
  • Args: (none)
  • Description: Minor heap allocation rate benchmark: allocates tuples and small lists at high frequency.

lists

  • Source: sandmark benchmarks/simple-tests/lists.ml
  • Build: ocamlopt (stdlib only)
  • Args: (none)
  • Description: List operations: append, rev, map, filter, fold.

stress

  • Source: sandmark benchmarks/simple-tests/stress.ml
  • Build: ocamlopt (stdlib only)
  • Args: (none)
  • Description: Allocation stress test; exercises minor and major GC.

lazylist

  • Source: sandmark benchmarks/simple-tests/lazylist.ml
  • Build: ocamlopt (stdlib only)
  • Args: (none)
  • Description: Lazy list operations via Lazy.t suspension.

lazy_primes

  • Source: sandmark benchmarks/simple-tests/lazy_primes.ml
  • Build: ocamlopt (stdlib only)
  • Args: (none)
  • Description: Lazy sieve of Eratosthenes using Lazy.t-deferred streams.

morestacks

  • Source: sandmark benchmarks/simple-tests/morestacks.ml
  • Build: ocamlopt (stdlib only)
  • Args: (none)
  • Description: Stack operations on functional and imperative stacks.

stacks

  • Source: sandmark benchmarks/simple-tests/stacks.ml
  • Build: ocamlopt (stdlib only)
  • Args: (none)
  • Description: Stdlib Stack module push/pop under various patterns.

finalise

  • Source: sandmark benchmarks/simple-tests/finalise.ml
  • Build: ocamlopt (stdlib only)
  • Args: (none)
  • Description: GC finalizer registration and invocation throughput (Gc.finalise).

weakretain

  • Source: sandmark benchmarks/simple-tests/weakretain.ml
  • Build: ocamlopt (stdlib only)
  • Args: (none)
  • Description: Weak pointer retention: allocates objects and checks how many survive GC through a Weak.t array.

weak_htbl

  • Source: sandmark benchmarks/simple-tests/weak_htbl.ml
  • Build: ocamlopt (stdlib only); OCaml 5.x adaptation (see SANDMARK_ADAPTATIONS.md)
  • Args: <N> — table size
  • Description: Correctness and performance test for ephemeron-based hash tables (Ephemeron.K{1,2,n}.Make) versus regular Hashtbl and Map-backed tables. Note: iter was removed from Ephemeron modules in OCaml 5.x; the correctness assertions for weak tables now pass vacuously.

with_deps benchmarks

graph500seq

  • Source: sandmark benchmarks/graph500seq/
  • Build: dune (multi-library), in with_deps/graph500seq/
  • Args: <edges-file> — absolute path to edges.data; generated by build.deps.sh at /home/udesou/benches/with_deps/graph500seq/edges.data
  • Description: Graph500 Kernel 1 (BFS-reachable subgraph construction) on a Kronecker random graph. Builds a sparse adjacency representation from a large array of edges. Memory-intensive pointer chasing; exercises the major GC heavily.
  • graphTypes: In sandmark, GraphTypes is provided by a parent dune-project scope. Here it is defined explicitly in graphTypes.ml (type vertex = int, type weight = float, type edge = vertex * vertex * weight).
  • Data generation: graph500seq.build.deps.sh builds gen.exe with dune and runs it (-scale 21 -edgefactor 16) to produce edges.data (~64M edges). This is done once and reused across all runtime versions since the data is OCaml-version-independent.
  • Timeout: 300 s (longer than simple benchmarks due to data loading + graph construction).

knucleotide

  • Source: sandmark benchmarks/benchmarksgame/knucleotide.ml
  • Build: dune (stdlib only), in with_deps/benchmarksgame/
  • Args: <input-file> — absolute path to input25000000.txt; generated by benchmarksgame.build.deps.sh
  • Description: Counts k-nucleotide frequencies (k=1,2) and specific subsequence occurrences in a 25M-nucleotide FASTA sequence. Uses a custom Hashtbl.Make with Bytes keys.
  • Adaptation: Accepts input file path as argv[1] (falls back to "input25000000.txt" for standalone use).

knucleotide3

  • Source: sandmark benchmarks/benchmarksgame/knucleotide3.ml
  • Build: dune (stdlib only), in with_deps/benchmarksgame/
  • Args: <input-file> — absolute path to input25000000.txt; generated by benchmarksgame.build.deps.sh
  • Description: Same k-nucleotide counting as knucleotide but with a packed-integer hash key optimisation (encodes bases as 2-bit values, avoiding Bytes allocation for keys ≤ 31 bases on 64-bit).
  • Adaptation: Accepts input file path as argv[1] (falls back to "input25000000.txt" for standalone use).
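
The packed-integer key idea can be illustrated with a small worked example. It is written in bash only to stay consistent with the build scripts in this README; the benchmark itself is OCaml. The mapping A=0, C=1, G=2, T=3 and the name pack_kmer are assumptions of this sketch, not taken from knucleotide3.ml.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Pack a DNA k-mer into one integer, two bits per base.
# Assumed encoding (illustrative): A=0, C=1, G=2, T=3.
# At most 31 bases are accepted, i.e. 62 bits, which fits the limit quoted
# above for 64-bit platforms.
pack_kmer() {
  local s="$1" key=0 i c code
  if [ "${#s}" -gt 31 ]; then
    echo "k-mer too long: ${#s} bases" >&2
    return 1
  fi
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "${c}" in
      A|a) code=0 ;;
      C|c) code=1 ;;
      G|g) code=2 ;;
      T|t) code=3 ;;
      *) echo "invalid base: ${c}" >&2; return 1 ;;
    esac
    # Shift the bases seen so far left by two bits and append the new one.
    key=$(( (key << 2) | code ))
  done
  echo "${key}"
}
```

For example, pack_kmer ACGT prints 27: the key grows as 0, 1, 6, 27 while the four bases are appended. Because the key is a plain integer, hashing and comparing it needs no per-key buffer, which is the allocation saving the description refers to.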

revcomp2

  • Source: sandmark benchmarks/benchmarksgame/revcomp2.ml
  • Build: dune (stdlib only), in with_deps/benchmarksgame/
  • Args: <input-file> — absolute path to input25000000.txt; generated by benchmarksgame.build.deps.sh
  • Description: Reverse-complement all DNA sequences in a FASTA file and print the result. Pure Bytes/string I/O; exercises buffer allocation and output.
  • Adaptation: Accepts input file path as argv[1] (falls back to "input25000000.txt" for standalone use).

regexredux2

  • Source: sandmark benchmarks/benchmarksgame/regexredux2.ml
  • Build: dune (str library), in with_deps/benchmarksgame/
  • Args: <input-file> — absolute path to input5000000.txt; generated by benchmarksgame.build.deps.sh
  • Description: Counts regex pattern matches in a 5M-nucleotide FASTA sequence, then applies a series of substitutions. Uses Str (OCaml's built-in regex library).
  • Adaptation: Accepts input file path as argv[1] (falls back to "input5000000.txt" for standalone use).

primes

  • Source: sandmark benchmarks/mpl/bench/primes/
  • Build: dune (multi-library), in with_deps/mpl/; auto-installs domainslib
  • Args: -N <int> -procs <int> — parallel prime sieve; -N is the upper bound (default 100M), -procs is domain count
  • Description: Parallel sieve of Eratosthenes using the mpl Forkjoin library (wraps Domainslib.Task). OCaml ≥ 5 required.

msort_ints

  • Source: sandmark benchmarks/mpl/bench/msort_ints/
  • Build: dune (multi-library), in with_deps/mpl/; auto-installs domainslib
  • Args: -N <int> -procs <int> — array size (default 10M) and domain count
  • Description: Parallel merge sort on a random integer array using the mpl Seq/Merge/Quicksort libraries. OCaml ≥ 5 required.

msort_strings

  • Source: sandmark benchmarks/mpl/bench/msort_strings/
  • Build: dune (multi-library), in with_deps/mpl/; auto-installs domainslib
  • Args: -f <words64.txt> -procs <int> — absolute path to inputs/words64.txt (37 MB, bundled)
  • Description: Parallel merge sort on strings read from a word file. OCaml ≥ 5 required.

tokens

  • Source: sandmark benchmarks/mpl/bench/tokens/
  • Build: dune (multi-library), in with_deps/mpl/; auto-installs domainslib
  • Args: -f <words.txt> --no-output -procs <int> — absolute path to inputs/words.txt (590 KB, bundled)
  • Description: Parallel token frequency count using a concurrent hashset. OCaml ≥ 5 required.

raytracer

  • Source: sandmark benchmarks/mpl/bench/raytracer/
  • Build: dune (multi-library), in with_deps/mpl/; auto-installs domainslib
  • Args: -n <width> -procs <int> — image width in pixels (default 2000) and domain count
  • Description: Parallel ray tracer (rgbbox scene by default). Creates its own Domainslib.Task pool independently of the Forkjoin pool. OCaml ≥ 5 required.

with_packages benchmarks

Benchmarks in with_packages/ require external opam packages. No manual package installation is needed — each build script runs opam install <pkg> -y to install its dependencies into the active opam switch.

How package installation works

running-ng creates a dedicated opam switch for each runtime via opam-compiler (e.g., running-ng-ocaml-v5.4). The switch environment is activated before invoking the build script, so opam install targets the correct switch automatically. Packages are cached in the switch and reused across benchmark builds for the same runtime.

Overriding the opam switch

To force a specific switch, add build_env to the suite in the YAML config:

sandmark-with-packages:
  type: OCamlBenchmarkSuite
  build_env:
    OPAM_SWITCH: "my-custom-switch"
  programs:
    fasta3: ...

OPAM_SWITCH takes precedence over all auto-detection.

benchmarksgame

Seven programs sharing benches/with_packages/benchmarksgame/. Each has its own build script; all use dune and link against zarith, str, and unix.

  • binarytrees5 — Args: 21. Allocates and traverses binary trees of depth 21 using Zarith big integers for node values. GC-intensive short-lived allocation.
  • fasta3 — Args: 25000000. Generates a DNA sequence of 25M characters using cumulative probability tables. Exercises sequential array access.
  • fasta6 — Args: 25000000. Alternative fasta generator; same input size, different internal algorithm.
  • mandelbrot6 — Args: 16000. Renders a 16000×16000 Mandelbrot set image in PBM format. Pure floating-point; no GC pressure.
  • nbody — Args: 50000000. N-body planetary simulation (5 bodies, 50M steps). Pure floating-point; tests float unboxing.
  • pidigits5 — Args: 10000. Computes 10000 digits of π using the Stern-Brocot tree algorithm via Zarith arbitrary-precision integers.
  • spectralnorm2 — Args: 5500. Approximates the spectral norm of an infinite matrix. Dense floating-point; exercises float arrays.

zarith

Four programs sharing benches/with_packages/zarith/. Each has its own build script; all use dune.

  • zarith_fact — Args: 40 1000000. Computes factorial of 40, repeated 1M times. Exercises Zarith multiplication. Needs zarith.
  • zarith_fib — Args: Z 40. Fibonacci of 40 using Zarith big integers. Exercises Zarith addition. Needs zarith, num.
  • zarith_pi — Args: 10000. Computes 10000 π digits via the Stern-Brocot streaming algorithm. Exercises Zarith division/comparison. Needs zarith.
  • zarith_tak — Args: Z 2500. Tak function with n=2500 using Zarith integers. Exercises recursive calls with big-integer arithmetic. Needs zarith, num.

chameneos_redux_lwt

  • Source: sandmark benchmarks/chameneos/
  • Build: dune + lwt.unix
  • Args: <meetings> — number of colour-changing meetings; config uses 600000
  • Description: Simulates chameneos creatures meeting in a waiting room and swapping colours, implemented with Lwt lightweight threads. Exercises Lwt cooperative scheduling and mvar synchronisation.

thread_ring_lwt_mvar / thread_ring_lwt_stream

Both in benches/with_packages/thread-lwt/; shared dune file.

  • Source: sandmark benchmarks/thread-lwt/
  • Build: dune + lwt, lwt.unix
  • Args: <N> — number of ring-pass iterations; config uses 20000
  • thread_ring_lwt_mvar — Token passed around a ring of 503 Lwt threads via Lwt_mvar. Exercises mvar hand-off latency.
  • thread_ring_lwt_stream — Same ring, but using Lwt_stream channels. Slightly higher allocation than the mvar variant.

test_decompress

  • Source: sandmark benchmarks/decompress/test_decompress.ml
  • Build: dune + bigstringaf, checkseum.ocaml, decompress.zl
  • Args: (none) — defaults to 64 compress/decompress iterations on 32 KB of data
  • Description: Microbenchmark for the decompress pure-OCaml zlib implementation. Compresses then decompresses a block of data in a loop. Exercises allocation of Bigarray-backed buffers and the functional zipper-style stream API.

ydump

  • Source: sandmark benchmarks/yojson/ydump.ml
  • Build: dune + yojson, camlp-streams
  • Args: -c <json-file> — compact-print a JSON file; config uses the bundled sample.json (absolute path required since run cwd is a temp dir)
  • Description: Parses a JSON document with the Yojson library and prints it back out (compactly when -c is given, as in the config). Exercises OCaml's Buffer-based output, recursive tree traversal, and moderate allocation from parsing.

test_sched

  • Source: sandmark benchmarks/multicore-effects/ms_sched.ml + test_sched.ml (adapted)
  • Build: ocamlfind + saturn_lockfree
  • Args: <num_domains> <tasks_to_spawn> <list_length>
  • Description: Microbenchmark for a concurrent round-robin effects-based scheduler (ms_sched.ml). Spawns <tasks_to_spawn> tasks per run, each allocating a list of length <list_length>. The scheduler uses a Saturn Michael–Scott queue as its run queue and Domain.spawn to run workers across <num_domains> domains. Exercises effect handler dispatch, continuation enqueuing, and domain coordination.
  • Note: In with_packages/ (not multicore/) because it depends on saturn_lockfree. See SANDMARK_ADAPTATIONS.md for the porting changes from the sandmark original.

test_lwt (valet)

  • Source: sandmark benchmarks/valet/ (4 files: valet_core.ml, valet_react.ml, test_lib.ml, test_lwt.ml)
  • Build: dune + uuidm, ocplib-endian, react, lwt
  • Args: <n> — number of users/readers/doors; each of n persons swipes n times → O(n²) events
  • Description: Reactive access-control simulation. n people each hold a UUID-backed QR code; n QR readers feed into a controller (via react event streams) that maps codes to users, which doors then act on. All persons run concurrently via Lwt.join with Lwt.pause () yields between each swipe. Exercises Lwt cooperative scheduling, react event propagation, and UUID/map allocation.
  • OxCaml: incompatible (lwt.unix locality error, same as chameneos_redux_lwt)

contrast

  • Source: sandmark benchmarks/sauvola/contrast.ml
  • Build: dune + camlimages (camlimages.all_formats sub-library)
  • Args: <input.ppm> <output_prefix> — config uses the bundled example2_small.ppm (absolute path); output goes to /tmp/sauvola_out__*.ppm
  • Description: Applies 8 image binarisation algorithms (adaptive contrast spreading, Niblack global/local, Sauvola global/local) to a PPM image. Each algorithm creates a new rgb24 image and iterates over all pixels, exercising OO-style image allocation and GC-heavy pixel-by-pixel access patterns.

owl_gc

  • Source: sandmark benchmarks/owl/owl_gc.ml
  • Build: dune + owl-base (pure OCaml; no CBLAS/LAPACK required)
  • Args: (none)
  • Description: Computes a Gromov-Wasserstein distance matrix over 100 random 100×100 distance matrices using Owl's dense matrix operations (Bigarray-backed). Exercises large Bigarray allocation and GC interaction with non-moving arrays. Uses owl-base instead of owl to avoid CBLAS/LAPACK build requirements.

multicore benchmarks

Benchmarks that require OCaml 5.x features (domains, the Effect module, or both). Source is in multicore/; the flat layout mirrors simple/ (each benchmark in its own subfolder, with an optional build.deps.sh for generated data).

Use OCamlMulticoreBenchmarkSuite (instead of OCamlBenchmarkSuite) in running-ng configs. This suite type enforces OCaml >= 5 at build/run time and raises a clear error if you attempt to sweep with an older compiler.

multicore/multicore-effects

Single-file effect benchmarks compiled with ocamlopt. Adapted from sandmark benchmarks/multicore-effects/ for the Effect module API of released OCaml 5.x (sandmark's originals use the older effect keyword declaration syntax from the Multicore OCaml development branch, which stock OCaml 5.x compilers do not accept).

algorithmic_differentiation

  • Source: sandmark benchmarks/multicore-effects/algorithmic_differentiation.ml (adapted)
  • Build: ocamlopt (stdlib only)
  • Args: <iterations> — default 100
  • Description: Reverse-mode automatic differentiation using deep effect handlers (Add and Mult effects). Exercises deep effect handler dispatch, continuation resumption, and float array allocation.

rec_eff_fib / rec_seq_fib

  • Source: sandmark benchmarks/multicore-effects/rec_eff_{fib,seq_fib}.ml (adapted)
  • Build: ocamlopt (stdlib only)
  • Args: <iters> <n> — default 4 40 (expected output per iter: 102334155)
  • Description: Recursive Fibonacci. rec_eff_fib installs a try_with effect handler at each recursive call site (handler is never triggered; effect E is never performed) — tests the overhead of handler installation compared to the pure-recursive rec_seq_fib baseline.
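The handler-overhead comparison can be sketched as follows. This is a minimal illustration, not the benchmark source: the names with_handler, eff_fib, and seq_fib are hypothetical, and the sketch assumes the handler is installed around each of the two recursive calls.

```ocaml
open Effect.Deep

(* Hypothetical effect, declared only so the handler has a case to match.
   It is never performed. *)
type _ Effect.t += E : unit Effect.t

(* Baseline: plain recursion, no handlers. *)
let rec seq_fib n = if n < 2 then n else seq_fib (n - 1) + seq_fib (n - 2)

(* Run [f x] under a freshly installed handler for E. *)
let with_handler f x =
  try_with f x
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | E -> Some (fun (k : (a, _) continuation) -> continue k ())
        | _ -> None) }

(* Same recursion, but every recursive call pays for a handler installation. *)
let rec eff_fib n =
  if n < 2 then n
  else with_handler eff_fib (n - 1) + with_handler eff_fib (n - 2)
```

Both functions return the same value, so any difference in run time is attributable to installing and removing the handler.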

rec_eff_tak / rec_seq_tak

  • Source: sandmark benchmarks/multicore-effects/rec_eff_{tak,seq_tak}.ml (adapted)
  • Build: ocamlopt (stdlib only)
  • Args: <iters> <x> <y> <z> — default 1 40 20 11 (expected output per iter: 12)
  • Description: Takeuchi function. Same handler-overhead comparison pattern as rec_{eff,seq}_fib; three handler installations per recursive call.

rec_eff_ack / rec_seq_ack

  • Source: sandmark benchmarks/multicore-effects/rec_eff_{ack,seq_ack}.ml (adapted)
  • Build: ocamlopt (stdlib only)
  • Args: <iters> <m> <n> — default 2 3 11 (expected output per iter: 16381)
  • Description: Ackermann function. Same pattern; tests effect handler overhead on a deeply recursive, stack-intensive computation.

effect_throughput_val

  • Source: sandmark benchmarks/multicore-effects/effect_throughput_val.ml (adapted)
  • Build: ocamlopt (stdlib only)
  • Args: <n_iter> — default 1_000_000
  • Description: Measures the throughput of an effect handler block where perform is never called and a value is returned directly. The E : unit Effect.t handler is installed but never triggered; cost is purely the handler frame setup and teardown (stack allocation, context switch in/out, deallocation).

effect_throughput_perform

  • Source: sandmark benchmarks/multicore-effects/effect_throughput_perform.ml (adapted)
  • Build: ocamlopt (stdlib only)
  • Args: <n_iter> — default 1_000_000
  • Description: Measures the throughput of a full perform–resume cycle. E : int -> int Effect.t is performed once per iteration and the continuation is immediately resumed with the same value. Cost includes the perform (stack switch to handler), the continue k x call (stack switch back), and frame deallocation.

effect_throughput_perform_drop

  • Source: sandmark benchmarks/multicore-effects/effect_throughput_perform_drop.ml (adapted)
  • Build: ocamlopt (stdlib only)
  • Args: <n_iter> — default 1_000_000
  • Description: Like effect_throughput_perform but the continuation is abandoned (not resumed). Measures the perform overhead plus the cost of GC-collecting a dropped continuation.
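The three effect_throughput_* variants differ only in what happens inside the handled block and in the handler. A minimal sketch, assuming a single int-carrying effect for all three cases (the _val benchmark's E carries unit instead) and hypothetical function names:

```ocaml
open Effect
open Effect.Deep

type _ Effect.t += E : int -> int Effect.t

(* Handler that resumes the continuation with the performed value. *)
let resume_handler : int effect_handler =
  { effc = (fun (type a) (eff : a Effect.t) ->
      match eff with
      | E x -> Some (fun (k : (a, int) continuation) -> continue k x)
      | _ -> None) }

(* Handler that abandons the continuation and returns the value directly. *)
let drop_handler : int effect_handler =
  { effc = (fun (type a) (eff : a Effect.t) ->
      match eff with
      | E x -> Some (fun (_ : (a, int) continuation) -> x)
      | _ -> None) }

let sum_over n f =
  let acc = ref 0 in
  for i = 1 to n do acc := !acc + f i done;
  !acc

(* Handler installed, effect never performed. *)
let value_only n = sum_over n (fun i -> try_with (fun () -> i) () resume_handler)

(* Full perform/resume round trip per iteration. *)
let perform_resume n =
  sum_over n (fun i -> try_with (fun () -> perform (E i)) () resume_handler)

(* Perform, then drop the continuation: the "+ 1" after perform never runs. *)
let perform_drop n =
  sum_over n (fun i -> try_with (fun () -> perform (E i) + 1) () drop_handler)
```

In perform_drop the code after the perform is never reached, which is what makes the dropped continuation garbage for the GC to reclaim.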

rec_eff_evenodd / rec_seq_evenodd

  • Source: sandmark benchmarks/multicore-effects/rec_eff_evenodd.ml / rec_seq_evenodd.ml (adapted / verbatim)
  • Build: ocamlopt (stdlib only)
  • Args: <iters> <n>; defaults 2 500_000_000
  • Description: Even-odd mutual recursion benchmark. rec_eff_evenodd installs a dummy effect handler at each odd call; rec_seq_evenodd is the plain baseline. Measures effect handler call overhead on a tight mutual recursion loop.

rec_eff_motzkin / rec_seq_motzkin

  • Source: sandmark benchmarks/multicore-effects/rec_eff_motzkin.ml / rec_seq_motzkin.ml (adapted / verbatim)
  • Build: ocamlopt (stdlib only)
  • Args: <iters> <n>; defaults 4 21
  • Description: Computes the n'th Motzkin number (number of ways to draw non-intersecting chords between n circle points). rec_eff_motzkin wraps each recursive call in a dummy try_with; rec_seq_motzkin is the baseline. n=21 yields 142547559.
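For reference, Motzkin numbers satisfy M(0) = M(1) = 1 and M(n) = M(n-1) + Σ M(i)·M(n-2-i) for i from 0 to n-2. The sketch below is a naive sequential version built on that recurrence; whether the benchmark uses exactly this formulation is an assumption.

```ocaml
(* Naive, unmemoised recursion over the Motzkin recurrence. *)
let rec motzkin n =
  if n <= 1 then 1
  else begin
    let total = ref (motzkin (n - 1)) in
    for i = 0 to n - 2 do
      total := !total + motzkin i * motzkin (n - 2 - i)
    done;
    !total
  end
```

The call tree grows exponentially with n, which is why a small argument such as 21 is enough to produce a measurable run.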

rec_eff_sudan / rec_seq_sudan

  • Source: sandmark benchmarks/multicore-effects/rec_eff_sudan.ml / rec_seq_sudan.ml (adapted / verbatim)
  • Build: ocamlopt (stdlib only)
  • Args: <iters> <n> <x> <y>; defaults 10_000_000 2 2 2
  • Description: Computes the Sudan function (recursive but not primitive recursive). rec_eff_sudan wraps the inner recursive call in a dummy try_with; rec_seq_sudan is the baseline. Defaults yield 15569256417.
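The Sudan function is defined by F_0(x, y) = x + y, F_{n+1}(x, 0) = x, and F_{n+1}(x, y+1) = F_n(F_{n+1}(x, y), F_{n+1}(x, y) + y + 1). A minimal sequential sketch of that definition, with the argument order taken from the Args line above (the function name is hypothetical):

```ocaml
(* sudan n x y computes F_n(x, y); requires 63-bit native ints for the
   default arguments. *)
let rec sudan n x y =
  if n = 0 then x + y
  else if y = 0 then x
  else
    let inner = sudan n x (y - 1) in
    sudan (n - 1) inner (inner + y)
```

With n = 2, x = 2, y = 2 the recursion is only a few dozen calls deep, so a single evaluation is cheap; this is consistent with the large default iteration count.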

eratosthenes

  • Source: sandmark benchmarks/multicore-effects/eratosthenes.ml (adapted)
  • Build: ocamlopt (stdlib only)
  • Args: <n> — generate primes up to n; default 101
  • Description: Message-passing Sieve of Eratosthenes implemented entirely with effects. Uses four effects (Spawn, Yield, Send, Recv) and two layered handlers: run (round-robin scheduler handling Spawn/Yield) and mailbox (per-pid message queue handling Send/Recv). The outer mailbox handler catches Send/Recv that bubble through run's handler. Exercises effect handler chaining, continuation queuing, and a Map-backed mailbox.
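The round-robin layer of such a design can be sketched on its own. The code below is an illustration of a Spawn/Yield scheduler with a deep handler, not the benchmark's run function; the mailbox layer (Send/Recv) is omitted and all names are hypothetical.

```ocaml
open Effect
open Effect.Deep

type _ Effect.t +=
  | Spawn : (unit -> unit) -> unit Effect.t
  | Yield : unit Effect.t

let run main =
  (* Suspended tasks waiting for their next turn. *)
  let ready : (unit -> unit) Queue.t = Queue.create () in
  let next () =
    match Queue.take_opt ready with
    | Some resume -> resume ()
    | None -> ()
  in
  let rec spawn task =
    match_with task ()
      { retc = (fun () -> next ());
        exnc = raise;
        effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Yield ->
            Some (fun (k : (a, unit) continuation) ->
              (* Park the current task and run the next one. *)
              Queue.push (fun () -> continue k ()) ready;
              next ())
          | Spawn child ->
            Some (fun (k : (a, unit) continuation) ->
              (* Park the parent and start the child immediately. *)
              Queue.push (fun () -> continue k ()) ready;
              spawn child)
          | _ -> None) }
  in
  spawn main

(* Two tasks that each log, yield once, and log again. *)
let demo () =
  let log = Buffer.create 16 in
  let task name () =
    Buffer.add_string log (name ^ "1 ");
    perform Yield;
    Buffer.add_string log (name ^ "2 ")
  in
  run (fun () -> perform (Spawn (task "a")); perform (Spawn (task "b")));
  Buffer.contents log
```

Effects not matched here (the `| _ -> None` case) propagate to an enclosing handler, which is how an outer mailbox handler can see Send and Recv.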

multicore/multicore-structures

Lock-free concurrent data structures implemented with OCaml 5 stdlib Atomic. No external packages required — the sandmark originals referenced kcas, but all atomic operations (Atomic.t, Atomic.get, Atomic.set, Atomic.compare_and_set) are available in the stdlib since OCaml 5.0. Each test program is compiled together with its data-structure module using ocamlfind -package unix.

Data structure modules (in the benchmark directory, compiled alongside each test):

  • ms_queue.ml — Michael–Scott lock-free MPMC queue using Atomic.t and CAS loops.
  • treiber_stack.ml — Treiber lock-free LIFO stack using Atomic.t.
  • spsc_queue.ml — Wait-free bounded SPSC queue with cache-line padding.
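As an illustration of the CAS-loop style these modules rely on, here is a minimal Treiber stack using only stdlib Atomic. It is a sketch under the assumption that the stack is represented as an atomic immutable list; the actual treiber_stack.ml may differ in representation and API.

```ocaml
(* The whole stack is one atomic reference to an immutable list. *)
type 'a t = 'a list Atomic.t

let create () : 'a t = Atomic.make []

(* Retry until the head we read is still the head when we swap it. *)
let rec push (s : 'a t) x =
  let old = Atomic.get s in
  if not (Atomic.compare_and_set s old (x :: old)) then push s x

let rec pop (s : 'a t) =
  match Atomic.get s with
  | [] -> None
  | x :: rest as old ->
    if Atomic.compare_and_set s old rest then Some x else pop s
```

Atomic.compare_and_set compares by physical equality, so the value passed as the expected head must be the exact list previously read with Atomic.get.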

test_queue_sequential

  • Source: sandmark benchmarks/multicore-structures/test_queue_sequential.ml
  • Build: ocamlfind + unix (stdlib Atomic, no domainslib)
  • Args: <items> — number of items to enqueue/dequeue
  • Description: Sequentially enqueues then dequeues <items> integers through the MS queue. Checks that no items are lost and reports throughput (items/ms).

test_queue_parallel

  • Source: sandmark benchmarks/multicore-structures/test_queue_parallel.ml
  • Build: ocamlfind + unix
  • Args: <items>
  • Description: One domain enqueues <items> integers while a second domain concurrently dequeues. Exercises the MS queue's CAS-based enqueue/dequeue paths under concurrent access.

test_stack_sequential

  • Source: sandmark benchmarks/multicore-structures/test_stack_sequential.ml
  • Build: ocamlfind + unix
  • Args: <items>
  • Description: Sequential push/pop stress test on the Treiber stack.

test_stack_parallel

  • Source: sandmark benchmarks/multicore-structures/test_stack_parallel.ml
  • Build: ocamlfind + unix
  • Args: <items>
  • Description: Concurrent push (one domain) / pop (another domain) on the Treiber stack.

test_spsc_queue_sequential

  • Source: sandmark benchmarks/multicore-structures/test_spsc_queue_sequential.ml
  • Build: ocamlfind + unix
  • Args: <items> — items per run; repeats 1000 times
  • Description: Sequential enqueue/dequeue cycle on the SPSC queue. Reports ns/item throughput.

test_spsc_queue_parallel

  • Source: sandmark benchmarks/multicore-structures/test_spsc_queue_parallel.ml
  • Build: ocamlfind + unix
  • Args: <items>
  • Description: One domain enqueues while another dequeues via the SPSC queue. Exercises the wait-free fast path.

test_spsc_queue_pingpong_parallel

  • Source: sandmark benchmarks/multicore-structures/test_spsc_queue_pingpong_parallel.ml
  • Build: ocamlfind + unix
  • Args: <num_threads> <num_messages>
  • Description: Creates a ring of <num_threads> domains, each connected to the next by an SPSC queue. Ping messages circulate until a Bye terminates each thread. Measures inter-domain message-passing latency through a chain of SPSC queues.

multicore/multicore-numerical

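Parallel versions of classic numerical benchmarks using domainslib. Where a sequential baseline is provided, it is listed alongside its multicore variant. Multicore variants are compiled with ocamlfind -package domainslib; sequential baselines use the stdlib only. The first argument of each multicore variant is <num_domains> (<workers> for fannkuchredux_multicore); sequential baselines omit it.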

mandelbrot6_multicore

  • Source: sandmark benchmarks/multicore-numerical/mandelbrot6_multicore.ml
  • Build: ocamlfind + domainslib
  • Args: <num_domains> <width> — default 1 200
  • Description: Parallel Mandelbrot set renderer. Uses Task.parallel_for over rows; each domain computes a horizontal strip. Outputs PBM binary format to stdout. Based on benchmarksgame Mandelbrot #6.

nbody_multicore / nbody

  • Source: sandmark benchmarks/multicore-numerical/{nbody_multicore,nbody}.ml
  • Build: ocamlfind + domainslib (multicore); ocamlopt stdlib (sequential)
  • Args: <num_domains> <n> <num_bodies> — default 1 500 1024; sequential: <n> <num_bodies> — default 500 1024
  • Description: N-body gravitational simulation. Parallel version uses Task.parallel_for for the velocity-update inner loop and Task.parallel_for_reduce for energy computation.

floyd_warshall_multicore / floyd_warshall

  • Source: sandmark benchmarks/multicore-numerical/{floyd_warshall_multicore,floyd_warshall}.ml
  • Build: ocamlfind + domainslib; stdlib
  • Args: <num_domains> <n> — default 1 4; sequential: <n> — default 4
  • Description: All-pairs shortest path (Floyd–Warshall). The outer k loop is sequential (dependency), inner i loop parallelised with Task.parallel_for. Uses an algebraic edge type (Value of int | Infinity).
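A sequential sketch of the algorithm with the algebraic edge type named above. The helper names add and less are hypothetical, and the in-place update is an assumption about the implementation.

```ocaml
type edge = Value of int | Infinity

(* Path length through an intermediate vertex; Infinity absorbs. *)
let add a b =
  match a, b with
  | Value x, Value y -> Value (x + y)
  | _ -> Infinity

let less a b =
  match a, b with
  | Value x, Value y -> x < y
  | Value _, Infinity -> true
  | Infinity, _ -> false

(* Updates the distance matrix [d] in place. *)
let floyd_warshall d =
  let n = Array.length d in
  for k = 0 to n - 1 do
    (* Iterations of k depend on each other; rows i are independent,
       which is the loop a parallel version can distribute. *)
    for i = 0 to n - 1 do
      for j = 0 to n - 1 do
        let via = add d.(i).(k) d.(k).(j) in
        if less via d.(i).(j) then d.(i).(j) <- via
      done
    done
  done
```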

game_of_life_multicore / game_of_life

  • Source: sandmark benchmarks/multicore-numerical/{game_of_life_multicore,game_of_life}.ml
  • Build: ocamlfind + domainslib; stdlib
  • Args: <num_domains> <n_times> <board_size> — default 1 2 1024; sequential: <n_times> <board_size> — default 2 1024
  • Description: Conway's Game of Life on a board_size × board_size grid, iterated n_times steps. Row updates parallelised with Task.parallel_for.

binarytrees5_multicore

  • Source: sandmark benchmarks/multicore-numerical/binarytrees5_multicore.ml
  • Build: ocamlfind + domainslib
  • Args: <num_domains> <max_depth> — default 1 10
  • Description: Binary tree construction and checksum benchmark (benchmarksgame binary-trees #5). Uses Task.async/Task.await to parallelise tree checks across depths and domains. Exercises GC allocation and domain-local work stealing.

spectralnorm2_multicore

  • Source: sandmark benchmarks/multicore-numerical/spectralnorm2_multicore.ml
  • Build: ocamlfind + domainslib
  • Args: <num_domains> <n> — default 1 2000
  • Description: Spectral norm of the infinite matrix A where A[i,j] = 1/((i+j)*(i+j+1)/2+i+1). Power iteration using Task.parallel_for for matrix-vector products. Based on benchmarksgame spectral-norm #2.
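A sequential sketch of the power iteration, using the matrix formula above with 0-based indices. The iteration count of 10 follows the usual benchmarksgame formulation and is an assumption about this port; the function names are hypothetical.

```ocaml
(* Entry (i, j) of the infinite matrix A, 0-based. *)
let a i j = 1.0 /. float_of_int ((i + j) * (i + j + 1) / 2 + i + 1)

(* v' = A v *)
let mul_av n v =
  Array.init n (fun i ->
    let s = ref 0.0 in
    for j = 0 to n - 1 do s := !s +. a i j *. v.(j) done;
    !s)

(* v' = A^T v *)
let mul_atv n v =
  Array.init n (fun i ->
    let s = ref 0.0 in
    for j = 0 to n - 1 do s := !s +. a j i *. v.(j) done;
    !s)

let spectral_norm n =
  let u = ref (Array.make n 1.0) in
  let v = ref (Array.make n 0.0) in
  for _i = 1 to 10 do
    v := mul_atv n (mul_av n !u);
    u := mul_atv n (mul_av n !v)
  done;
  let vbv = ref 0.0 and vv = ref 0.0 in
  for i = 0 to n - 1 do
    vbv := !vbv +. (!u).(i) *. (!v).(i);
    vv := !vv +. (!v).(i) *. (!v).(i)
  done;
  sqrt (!vbv /. !vv)
```

Each element of a matrix-vector product is independent of the others, so the Array.init bodies are the natural place for a parallel loop.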

fannkuchredux_multicore

  • Source: sandmark benchmarks/multicore-numerical/fannkuchredux_multicore.ml
  • Build: ocamlfind + domainslib
  • Args: <workers> <n> — default 10 7
  • Description: Fannkuch-redux (permutation counting). Divides the factorial permutation space into workers chunks and uses Task.parallel_for to count flip operations in parallel.

quicksort_multicore / quicksort

  • Source: sandmark benchmarks/multicore-numerical/{quicksort_multicore,quicksort}.ml
  • Build: ocamlfind + domainslib; stdlib
  • Args: <num_domains> <n> — default 1 2000; sequential: <n> — default 2000
  • Description: Parallel quicksort using Task.async/Task.await to spawn recursive subproblems. Depth-bounded spawning (halves remaining depth budget at each partition).

mergesort_multicore / mergesort

  • Source: sandmark benchmarks/multicore-numerical/{mergesort_multicore,mergesort}.ml
  • Build: ocamlfind + domainslib; stdlib
  • Args: <num_domains> <n> — default 1 1024; sequential: <n> — default 1024
  • Description: Parallel merge sort using Task.async/Task.await. Falls back to bubble sort below threshold (32 elements). Uses an in-place double-buffer merge strategy.

matrix_multiplication_multicore / matrix_multiplication

  • Source: sandmark benchmarks/multicore-numerical/{matrix_multiplication_multicore,matrix_multiplication}.ml
  • Build: ocamlfind + domainslib; stdlib
  • Args: <num_domains> <size> — default 1 1024; sequential: <size> — default 1024
  • Description: Dense integer matrix multiplication. Row-parallel using Task.parallel_for over the output rows.

matrix_multiplication_tiling_multicore

  • Source: sandmark benchmarks/multicore-numerical/matrix_multiplication_tiling_multicore.ml
  • Build: ocamlfind + domainslib
  • Args: <num_domains> <size> — default 1 1024
  • Description: Tiled matrix multiplication using explicit Domainslib.Chan-based task distribution rather than parallel_for. Tile size is 64. The channel-based dispatch is chosen because the loop has decreasing work per iteration, which makes static parallel_for chunking suboptimal.

LU_decomposition_multicore / LU_decomposition

  • Source: sandmark benchmarks/multicore-numerical/{LU_decomposition_multicore,LU_decomposition}.ml
  • Build: ocamlfind + domainslib; stdlib
  • Args: <num_domains> <mat_size> — default 1 1200; sequential: <mat_size> — default 1200
  • Description: In-place LU decomposition of a random float matrix. Uses Task.parallel_for for row elimination and Domain.DLS for domain-local random state. Stores L and U in packed form.

nqueens_multicore / nqueens

  • Source: sandmark benchmarks/multicore-numerical/{nqueens_multicore,nqueens}.ml
  • Build: ocamlfind + domainslib; stdlib
  • Args: <num_domains> <board_size> — default 2 13; sequential: <board_size> — default 13
  • Description: N-queens solver. Parallel version spawns a Task.async for each valid queen placement at each row, aggregating results with Task.await.

evolutionary_algorithm_multicore / evolutionary_algorithm

  • Source: sandmark benchmarks/multicore-numerical/{evolutionary_algorithm_multicore,evolutionary_algorithm}.ml
  • Build: ocamlfind + domainslib; stdlib
  • Args: <num_domains> <n> <lambda> — default 4 1000 1000; sequential: <n> <lambda> — default 1000 1000
  • Description: Minimal genetic algorithm optimising the Onemax fitness function. Parallel version uses Task.parallel_for to evaluate and mutate the population in each generation. Uses Domain.DLS for domain-local random state.
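The domain-local random state pattern can be sketched with stdlib Domain.DLS. This is an illustration only: the seeding scheme and the names rng_key, random_bit, random_genome, and onemax are hypothetical.

```ocaml
(* One Random.State per domain, created lazily on first use.
   Seeding from the domain id is an assumption made for this sketch. *)
let rng_key : Random.State.t Domain.DLS.key =
  Domain.DLS.new_key (fun () ->
    Random.State.make [| (Domain.self () :> int) |])

let random_bit () = Random.State.bool (Domain.DLS.get rng_key)

let random_genome n = Array.init n (fun _ -> random_bit ())

(* Onemax fitness: the number of bits set to true. *)
let onemax genome =
  Array.fold_left (fun acc bit -> if bit then acc + 1 else acc) 0 genome
```

Because each domain draws from its own state, worker domains never contend on a shared generator.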

multicore/multicore-grammatrix

Gram matrix benchmark from the Yamanishi laboratory. Compiled with ocamlfind; requires a data/ subdirectory with CSV input files (bundled). The benchmark reads feature vectors from a CSV (space-separated floats) and computes the symmetric Gram matrix via dot products. Default input is data/tox21_nrar_ligands_std_rand_01.csv (7026 samples).

A shared helper module utls.ml is compiled alongside the main benchmark in each build.

grammatrix

  • Source: sandmark benchmarks/multicore-grammatrix/grammatrix.ml + utls/utls.ml
  • Build: ocamlfind + unix (sequential)
  • Args: <ncores> <input_file> — default 1 data/tox21_nrar_ligands_std_rand_01.csv
  • Description: Sequential Gram matrix computation. Reads feature vectors, computes the full N×N symmetric matrix in O(N²) dot products, then prints a corner summary. The ncores argument is accepted but ignored (present for interface parity with the multicore version).

grammatrix_multicore

  • Source: sandmark benchmarks/multicore-grammatrix/grammatrix_multicore.ml + utls/utls.ml
  • Build: ocamlfind + domainslib + unix
  • Args: <num_domains> <chunk_size> <input_file> — default 4 16 data/tox21_nrar_ligands_std_rand_01.csv
  • Description: Parallel Gram matrix computation using explicit Domainslib.Chan-based task distribution. Work chunks of <chunk_size> rows are sent through a bounded channel; each domain fetches and processes chunks until a Quit message is received. Channel-based dispatch is preferred over parallel_for here because earlier rows have more work (triangular iteration), so pre-computing and queuing chunks in decreasing-work order improves load balance. Note: the benchmark must be run from the multicore-grammatrix/ directory so that the data/ relative path resolves correctly.
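The triangular iteration mentioned above is easiest to see in a sequential sketch. The code below assumes samples are float arrays of equal length; the names dot and gram are hypothetical and CSV parsing is omitted.

```ocaml
let dot a b =
  let s = ref 0.0 in
  Array.iteri (fun i x -> s := !s +. x *. b.(i)) a;
  !s

(* Fill the upper triangle and mirror it: row i computes n - i dot
   products, so earlier rows carry more work than later ones. *)
let gram samples =
  let n = Array.length samples in
  let g = Array.make_matrix n n 0.0 in
  for i = 0 to n - 1 do
    for j = i to n - 1 do
      let d = dot samples.(i) samples.(j) in
      g.(i).(j) <- d;
      g.(j).(i) <- d
    done
  done;
  g
```

The uneven row cost is what makes equal-sized static chunks a poor fit and motivates handing out chunks dynamically.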

multicore/oxcaml-prefetch (OxCaml only)

Multicore GC stress test using OxCaml-specific APIs. Requires an OxCaml compiler (Jane Street's OCaml fork) — will not compile with stock OCaml.

oxcaml_prefetch

  • Source: custom benchmark (not from sandmark)
  • Build: ocamlopt (stdlib only)
  • Args: (none)
  • Description: Spawns 8 domains using Domain.Safe.spawn (OxCaml API), each building a large binary tree of depth 28 with 10-byte string leaves. After all domains have built their trees, the main domain runs 10 Gc.full_major cycles. Exercises concurrent major GC marking across multiple domains with a large shared live set. Uses Sys.poll_actions for cooperative domain coordination and Atomic for synchronisation.
  • OxCaml APIs used: Domain.Safe.spawn, Sys.poll_actions
  • Suite type: OCamlOxcamlBenchmarkSuite — fails with an error if the runtime is not type: OxCaml.

multicore/multicore-minilight

Parallel global illumination renderer (MiniLight 1.5.2). A Monte Carlo path tracer with an octree spatial index. All nine source modules are compiled together in dependency order using ocamlfind -package domainslib. Only the parallel entry point (minilight_multicore) is provided; the sequential variant is omitted because its camera.ml has a different API signature.

Compilation order: vector3f → triangle → surfacePoint → spatialIndex → scene → image → rayTracer → camera → minilight_multicore

minilight_multicore

  • Source: sandmark benchmarks/multicore-minilight/parallel/ (all modules)
  • Build: ocamlfind + domainslib (9-module compilation)
  • Args: <scene_file> — path to a MiniLight scene description (e.g. roomfront.ml.txt, bundled)
  • Description: Parallel path tracer. Each frame's pixel rows are distributed across domains using Task.parallel_for inside Camera.frame. Uses Domain.DLS for per-domain Random.State to avoid contention. Renders progressively, printing progress to stderr and saving PPM output to <scene_file>.ppm. Note: the renderer runs until interrupted; for benchmarking, wrap with a timeout or limit iterations in the scene file.

multicore/graph500par

Parallel Graph500 Kronecker graph generator and BFS kernel. Two executables are built from shared library modules; gen must be run first to produce an edge-list data file that kernel1_run_multicore then reads.

Compilation order for both executables: graphTypes → sparseGraph → generate → [gen | kernel1Par → kernel1_run_multicore]

gen

  • Source: sandmark benchmarks/graph500par/gen.ml (+ generate.ml, sparseGraph.ml, graphTypes.ml)
  • Build: ocamlfind + domainslib + unix
  • Args: [-scale SCALE] [-edgefactor EDGE_FACTOR] [-ndomains NUM_DOMAINS] OUTPUT_FILE — defaults scale=12 edgefactor=16 ndomains=1
  • Description: Kronecker graph generator implementing the Graph500 specification. Generates 2^scale vertices and edgefactor * 2^scale edges using a probabilistic bit-setting algorithm with random permutations. Edge generation uses Task.parallel_for. Writes the edge list to OUTPUT_FILE via Marshal.
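A small sketch of the size arithmetic and of a Marshal-based handoff between a generator and a reader. The edge representation as an (int * int) array and the function names are assumptions for illustration; the actual on-disk type is defined by the benchmark's own modules.

```ocaml
(* Vertex and edge counts for a given scale and edge factor. *)
let graph_size ~scale ~edgefactor =
  let vertices = 1 lsl scale in
  (vertices, edgefactor * vertices)

(* Hypothetical edge-list format: an array of (source, target) pairs. *)
let write_edges file (edges : (int * int) array) =
  let oc = open_out_bin file in
  Marshal.to_channel oc edges [];
  close_out oc

let read_edges file : (int * int) array =
  let ic = open_in_bin file in
  let edges = Marshal.from_channel ic in
  close_in ic;
  edges
```

With the defaults scale=12 and edgefactor=16 this gives 4096 vertices and 65536 edges. Marshal is not type-safe, so the reader must use the same type the writer used.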

kernel1_run_multicore

  • Source: sandmark benchmarks/graph500par/kernel1_run_multicore.ml (+ kernel1Par.ml, generate.ml, sparseGraph.ml, graphTypes.ml)
  • Build: ocamlfind + domainslib + unix
  • Args: [-ndomains NUM_DOMAINS] EDGE_LIST_FILE
  • Description: Graph500 Kernel 1 — parallel construction of a sparse adjacency-list representation. Reads the pre-generated edge list from EDGE_LIST_FILE, removes self-loops, finds the maximum vertex label using Task.parallel_for_reduce, and builds the sparse graph using Task.parallel_for with lock-free Atomic.t-based adjacency lists. Reports I/O and construction time.

multicore/alloc_multicore

alloc_multicore

  • Source: sandmark benchmarks/simple-tests/alloc_multicore.ml
  • Build: ocamlopt (stdlib only — uses Domain.spawn/Domain.join)
  • Args: <num_domains> <iterations>; config uses 2 200_000
  • Description: Parallel minor-heap allocation benchmark. Each domain allocates small mutable records { an_int; a_string; a_float } in a tight loop. Measures allocation throughput under parallel GC pressure.
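A minimal sketch of this kind of workload using only Domain.spawn and Domain.join. The record field names follow the description above; the function names, field values, and whether the main domain counts toward <num_domains> are assumptions.

```ocaml
type record = {
  mutable an_int : int;
  mutable a_string : string;
  mutable a_float : float;
}

(* Allocate one small record per iteration; Sys.opaque_identity keeps the
   compiler from optimising the allocation away. *)
let allocate iterations =
  let last = ref { an_int = 0; a_string = ""; a_float = 0.0 } in
  for i = 1 to iterations do
    last :=
      Sys.opaque_identity
        { an_int = i; a_string = "abc"; a_float = float_of_int i }
  done;
  (!last).an_int

(* Run the allocation loop on [num_domains] freshly spawned domains. *)
let run ~num_domains ~iterations =
  List.init num_domains (fun _ ->
    Domain.spawn (fun () -> allocate iterations))
  |> List.map Domain.join
```

Each domain allocates into its own minor heap, so the contention being measured comes from the collector rather than from the allocation path.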

multicore/pingpong_multicore

pingpong_multicore

  • Source: sandmark benchmarks/simple-tests/pingpong_multicore.ml
  • Build: ocamlfind + domainslib (auto-installed)
  • Args: <num_domains> <chan_size> <total_messages>; config uses 3 1 1000000
  • Description: Multi-domain channel ping-pong benchmark using Domainslib.Chan. A producer sends messages through a pipeline of worker domains, each incrementing a counter before forwarding. Measures channel throughput and domain synchronisation overhead.

TODO — Benchmarks Not Yet Added

Need external opam packages (not yet integrated)

These benchmarks were not added because their dependencies are complex or unusual.

  • valet — Now ported as test_lwt (see test_lwt (valet) above). Requires lwt, react, and uuidm; unusual event-loop structure.
  • simple-tests (partial) — ocamlcapi requires C stubs (skipped). alloc_multicore and pingpong_multicore are now ported (see multicore/alloc_multicore/ and multicore/pingpong_multicore/).
  • irmin — Requires irmin, irmin-pack, index, and related packages.
  • owl — owl_gc is now ported (see owl_gc above). Requires owl-base.
  • mpl — Requires several packages (mtime, progress, etc.).

Need multicore / OCaml 5 effects

These benchmarks require domainslib, multiple domains, or OCaml 5 effect handlers, and are not meaningful on OCaml 4.x.

  • multicore-effects — fully ported: algorithmic_differentiation, rec_eff_fib, rec_seq_fib, rec_eff_tak, rec_seq_tak, rec_eff_ack, rec_seq_ack, effect_throughput_val, effect_throughput_perform, effect_throughput_perform_drop, eratosthenes, rec_eff_evenodd, rec_seq_evenodd, rec_eff_motzkin, rec_seq_motzkin, rec_eff_sudan, rec_seq_sudan, ms_sched/test_sched (in with_packages/test_sched/). Not ported: queens and effect_throughput_clone require multi-shot continuations (Obj.clone_continuation), which released OCaml 5.x does not provide (continuations are one-shot) and for which there is no stdlib replacement.
  • multicore-grammatrix — Added to multicore/multicore-grammatrix/.
  • multicore-minilight — Added to multicore/multicore-minilight/.
  • multicore-numerical — Added to multicore/multicore-numerical/.
  • multicore-structures — Added to multicore/multicore-structures/; uses OCaml 5 stdlib Atomic (no kcas required).
  • graph500par — Added to multicore/graph500par/.

Need C stubs or mixed OCaml/C build

These benchmarks require compiling C foreign stubs alongside OCaml code, which is not yet supported by the simple ocamlopt build scripts used here. They may be revisited once a mixed-language build strategy is in place.

  • multicore-gcroots — Tests concurrent GC root registration across domains. The sandmark version wraps internal OCaml GC C APIs (caml_register_generational_global_root, etc.) via a C stub library (globrootsprim). A pure-OCaml rewrite using Gc.minor()/Gc.full_major() across domains could approximate the intent, but would not be the same benchmark.

Need external tool binaries or large external data

  • minilight — The sandmark dune file only declares a data file alias (roomfront.ml.txt); no executable stanza is present, suggesting the benchmark needs a different integration approach.

Note: alt-ergo, coq, cpdf, cubicle, frama-c, and menhir have been ported and are in macrobenchmarks/ (see below).
