Name	Name	Last commit message	Last commit date
parent directory ..
Makefile	Makefile
README.md	README.md
bench_lcms.c	bench_lcms.c
fetch-lcms2.ps1	fetch-lcms2.ps1
fetch-lcms2.sh	fetch-lcms2.sh

`bench/lcms_c/` — native lcms2 benchmark

Companion to bench/lcms-comparison/ that measures native lcms2 (C, -O3 -march=native) on the exact same 4 workflows, exact same pixel count, exact same seeded PRNG input, and exact same timing loop (warmup + median-of-5 batches of 100 iters). The MPx/s numbers drop straight into the comparison tables in ../../docs/Performance.md.

This is the missing row in the existing comparison story: the JS bench reports jsColorEngine vs lcms-wasm (LittleCMS compiled to wasm32 via Emscripten). This tool reports against the real, native, autotools-built lcms2 as it ships in libjpeg, Pillow, Photoshop's bundled CMS, and every Linux desktop that uses ICC profiles — the actual comparison people reach for when they ask "is this really faster than lcms?".

Methodology match. Every knob that matters is identical between this tool and bench/lcms-comparison/bench.js: same 65536 pixels, same 300-iter warmup, same 5 × 100 timed batches, same seeded PRNG, same INTENT_RELATIVE_COLORIMETRIC, same 8-bit everywhere. Two lcms2 flag variants are measured (default and cmsFLAGS_HIGHRESPRECALC) exactly as the JS bench does.

What it measures

#	Workflow	Profiles
1	RGB → Lab	sRGB (virtual) → LabD50 (virtual)
2	RGB → CMYK	sRGB (virtual) → GRACoL2006_Coated1v2.icc
3	CMYK → RGB	GRACoL2006_Coated1v2.icc → sRGB (virtual)
4	CMYK → CMYK	GRACoL2006_Coated1v2.icc → GRACoL2006_Coated1v2.icc

Same workflow set as the browser bench and the JS comparison — so you can drop jsColorEngine numbers, lcms-wasm numbers, and native-lcms numbers into a single row-per-workflow table.

One-time setup — fetch lcms2 source

The repo ships the bench glue (bench_lcms.c, Makefile, README.md, fetch scripts) but not the ~11 MB upstream lcms2 source tree. Download it once with the included script:

cd bench/lcms_c
./fetch-lcms2.sh                  # default: lcms2-2.18
./fetch-lcms2.sh 2.17             # or pin a specific version

Windows PowerShell equivalent:

cd bench\lcms_c
.\fetch-lcms2.ps1

This drops lcms2-2.18/ next to the Makefile. Re-run when you want to upgrade the lcms2 baseline.

Build — WSL2 (recommended on Windows)

# One-time: install a C toolchain
sudo apt update
sudo apt install -y build-essential

# From the repo root, go into this folder (and fetch lcms2 once,
# if you haven't already):
cd bench/lcms_c
./fetch-lcms2.sh
make

# Run the bench:
./bench_lcms

That's it. The Makefile compiles all ~27 lcms2 source files directly into the benchmark binary — no autotools, no system liblcms2-dev, no configure step. A clean build takes ~10-15s on a modern laptop.

Build — Linux / macOS

Same as WSL2. On macOS, Xcode Command Line Tools (xcode-select --install) gives you clang + make; the Makefile picks them up via CC=cc or CC=clang.

Build — native Windows (MinGW-w64 / MSYS2)

Yes, there's gcc for Windows — MinGW-w64 via MSYS2. Slightly more setup than WSL2 but produces a native .exe with no Linux layer:

Install MSYS2.
Open the MSYS2 MinGW64 shell and install the toolchain:
```
pacman -S mingw-w64-x86_64-gcc mingw-w64-x86_64-make
```

cd to bench/lcms_c/ and:

./fetch-lcms2.sh     # or PowerShell: .\fetch-lcms2.ps1
mingw32-make
./bench_lcms.exe

The compiler + the flags are equivalent to the WSL2 build; numbers should agree within noise. WSL2 is simpler if you just want the numbers; MinGW-w64 is the right choice if you want to bundle the bench into a Windows CI pipeline.

Example output

==============================================================
 jsColorEngine companion — native lcms2 MPx/s
==============================================================
 pixels per iter  : 65536
 batches x iters  : 5 x 100
 warmup           : 300 iters
 profile          : ../../__tests__/GRACoL2006_Coated1v2.icc
 lcms2 version    : 2180
 compiler         : gcc 11.4.0
 arch             : x86_64

--------------------------------------------------------------
 RGB  -> Lab    (sRGB    -> LabD50)
--------------------------------------------------------------
  flags = 0                    :    XX.X MPx/s   (Y.YY ms/iter)
  HIGHRESPRECALC               :    XX.X MPx/s   (Y.YY ms/iter)   (default vs highres max diff: N LSB)

... (3 more workflows) ...

==============================================================
 SUMMARY — Mpx/s (higher is better)
==============================================================
  workflow                          lcms-def  lcms-hi
  --------------------------------  --------  --------
  RGB  -> Lab    (sRGB    -> LabD50)    XX.X M    XX.X M
  RGB  -> CMYK   (sRGB    -> GRACoL)    XX.X M    XX.X M
  CMYK -> RGB    (GRACoL  -> sRGB)      XX.X M    XX.X M
  CMYK -> CMYK   (GRACoL  -> GRACoL)    XX.X M    XX.X M

Markdown:
| Workflow | lcms2 native default | lcms2 native HIGHRESPRECALC |
...

Real numbers depend on CPU — run it yourself and drop them into docs/Performance.md §4 "How does this compare to LittleCMS in C?" to replace the current wasm × 1.5–2.5 estimate with measured values.

Targets

Make target	What it does
`make` (default)	Build `./bench_lcms` with `-O3 -DNDEBUG -march=native`
`make run`	Build + run with default profile path
`make steelman`	Rebuild with `-ffast-math -funroll-loops -flto` on top of the release flags (see below)
`make debug`	Rebuild with `-O0 -g` for `gdb` / `valgrind`
`make clean`	Delete binary + all `.o` files

`make steelman` — native lcms2 ceiling

The default build already matches lcms2's own release autotools build (-O3 -march=native -fno-strict-aliasing). The steelman target is for answering the honest question "how fast can native lcms actually go?" — it turns on the three aggressive-but-legit compiler flags on top of release:

Flag	What it does	Cost
`-ffast-math`	Relax strict IEEE 754 — allow FMA contraction, reassociation, assume no NaN/Inf, reciprocal-instead-of-div. Enables a lot more auto-vectorization.	Results can shift by ~1 LSB vs strict-IEEE build. ΔE still well under 1 against the reference LUT.
`-funroll-loops`	Aggressive unrolling of the tetrahedral / matrix inner loops.	Larger binary (~+5-10%).
`-flto`	Link-time optimization — lets the compiler inline across lcms2 translation units (e.g. inline `cmsGetTransform`'s hot leaf fn into the per-pixel dispatcher).	Longer link time (~5-10s extra), nothing at runtime.

cd bench/lcms_c
make steelman           # fully rebuilds
./bench_lcms            # or: taskset -c 0 ./bench_lcms

To go back to the reference release build:

make clean && make

-ffast-math is a free GCC/Clang flag (no license). PGO (-fprofile-generate / -fprofile-use) is the next step up and would give another ~5-15%, but requires a two-pass build + sample workload. Not wired in yet — worth doing if someone wants to push the ceiling further.

On SIMD: -O3 -march=native on any modern x86_64 CPU already enables SSE4.2 / AVX / AVX2 / FMA for the compiler's auto-vectorizer. lcms2 also has explicit SSE2 intrinsics in cmsOptimization.c (unless you build with -DCMS_DONT_USE_SSE2, which we don't). No hand-written AVX paths — those come entirely from the compiler.

Options at runtime

# Override the profile path:
./bench_lcms /path/to/some_other.icc

# Pin to a single core for cleaner numbers on hybrid CPUs
# (P-core on Intel 12th-gen+, not strictly required but stabilises
# the median):
taskset -c 0 ./bench_lcms

What's deliberately not configurable (yet)

16-bit path. We pin TYPE_*_8 on both sides (and in the JS bench) for apples-to-apples with the engine's hot path. A 16-bit variant would be a second build target, useful but out of scope for v1.2 close-out. Add an 8-vs-16 switch when the 16-bit kernels land in jsColorEngine v1.3.
Fast-float plugin. lcms2's fast_float plugin lives in a separate repo (Little-CMS/Fast-Float-Plugin) and uses SSE2 + 8-bit specialisations internally. If we ever want to claim parity against "the fastest lcms setup", that's the thing to link. For now, plain lcms2 is the honest baseline — it's what ships in the distro packages everyone actually installs.
Rendering intents other than relative colorimetric. Matches the rest of the bench suite. Add a CLI flag if anyone needs it.

Relationship to the rest of the bench suite

bench/lcms-comparison/ — jsColorEngine vs lcms-wasm head-to-head (Node, JS only, no native C toolchain needed).
bench/lcms_c/ (this folder) — native lcms2 baseline. Requires a C compiler; produces the wasm × ?× number that Performance.md §4 currently estimates.
samples/bench/ — every jsColorEngine lutMode + lcms-wasm side-by-side in the browser. Same four workflows.
bench/mpx_summary.js — jsColorEngine alone, node, used as the authoritative source for the README speed tables.

Licence

Only the build glue (Makefile, bench_lcms.c, this README.md) is part of jsColorEngine. The lcms2-2.18/ subtree is a vendored copy of upstream LittleCMS 2.18 (MIT), included under its own LICENSE — we redistribute but do not modify.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

`bench/lcms_c/` — native lcms2 benchmark

What it measures

One-time setup — fetch lcms2 source

Build — WSL2 (recommended on Windows)

Build — Linux / macOS

Build — native Windows (MinGW-w64 / MSYS2)

Example output

Targets

`make steelman` — native lcms2 ceiling

Options at runtime

What's deliberately not configurable (yet)

Relationship to the rest of the bench suite

Licence

FilesExpand file tree

lcms_c

Directory actions

More options

Directory actions

More options

Latest commit

History

lcms_c

Folders and files

parent directory

README.md

bench/lcms_c/ — native lcms2 benchmark

What it measures

One-time setup — fetch lcms2 source

Build — WSL2 (recommended on Windows)

Build — Linux / macOS

Build — native Windows (MinGW-w64 / MSYS2)

Example output

Targets

make steelman — native lcms2 ceiling

Options at runtime

What's deliberately not configurable (yet)

Relationship to the rest of the bench suite

Licence

`bench/lcms_c/` — native lcms2 benchmark

`make steelman` — native lcms2 ceiling