Companion to bench/lcms-comparison/ that
measures native lcms2 (C, -O3 -march=native) on the exact
same 4 workflows, exact same pixel count, exact same seeded PRNG
input, and exact same timing loop (warmup + median-of-5 batches of
100 iters). The MPx/s numbers drop straight into the comparison
tables in ../../docs/Performance.md.
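
The timing loop described above (warmup, then the median of 5 batches of 100 iterations, converted to MPx/s) can be sketched in plain C roughly as follows. This is an illustrative sketch, not the code in bench_lcms.c: the names work_fn, count_call, bench_ms_per_iter and mpx_per_sec are hypothetical, and the clock is assumed to be C11 timespec_get (a wall clock; a monotonic clock would be a reasonable substitute).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#define WARMUP_ITERS    300  /* untimed iterations before measuring */
#define BATCHES         5    /* timed batches; the median is reported */
#define ITERS_PER_BATCH 100  /* iterations inside each timed batch */

/* Hypothetical callback type: one call transforms one full pixel buffer. */
typedef void (*work_fn)(void *ctx);

/* Stand-in workload for illustration: only counts how often it is called. */
void count_call(void *ctx) { ++*(unsigned long *)ctx; }

/* Wall-clock time in seconds (C11 timespec_get). */
double now_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* qsort comparator for doubles, ascending. */
int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of an odd-length array; sorts the array in place. */
double median_odd(double *v, size_t n) {
    assert(n % 2 == 1);
    qsort(v, n, sizeof v[0], cmp_double);
    return v[n / 2];
}

/* Warm up, time BATCHES batches, return the median ms per iteration. */
double bench_ms_per_iter(work_fn fn, void *ctx) {
    double batch_ms[BATCHES];
    for (int i = 0; i < WARMUP_ITERS; i++) fn(ctx);
    for (int b = 0; b < BATCHES; b++) {
        double t0 = now_seconds();
        for (int i = 0; i < ITERS_PER_BATCH; i++) fn(ctx);
        batch_ms[b] = (now_seconds() - t0) * 1e3 / ITERS_PER_BATCH;
    }
    return median_odd(batch_ms, BATCHES);
}

/* Convert milliseconds per iteration into megapixels per second. */
double mpx_per_sec(size_t pixels_per_iter, double ms_per_iter) {
    return (double)pixels_per_iter / (ms_per_iter * 1e-3) / 1e6;
}
```

Taking the median of the batch times rather than the mean keeps a single slow batch (a scheduler hiccup, a frequency change) from skewing the reported figure. With a 65536-pixel buffer, 1.0 ms per iteration works out to 65.536 MPx/s.
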
This is the missing row in the existing comparison story: the JS
bench reports jsColorEngine vs lcms-wasm (LittleCMS compiled to
wasm32 via Emscripten). This tool reports against the real,
native, autotools-built lcms2 as it ships in libjpeg, Pillow,
Photoshop's bundled CMS, and every Linux desktop that uses ICC
profiles — the actual comparison people reach for when they ask
"is this really faster than lcms?".
Methodology match. Every knob that matters is identical between this tool and
bench/lcms-comparison/bench.js: same 65536 pixels, same 300-iter warmup, same 5 × 100 timed batches, same seeded PRNG, same INTENT_RELATIVE_COLORIMETRIC, same 8-bit everywhere. Two lcms2 flag variants are measured (default and cmsFLAGS_HIGHRESPRECALC) exactly as the JS bench does.
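
To illustrate what "same seeded PRNG input" buys, here is a minimal sketch of a deterministic 8-bit pixel fill. The generator actually used by the bench is not stated here, so xorshift32 is an assumption chosen for brevity, and the names xorshift32 and fill_pixels are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One step of Marsaglia's xorshift32 (assumed generator, for illustration). */
uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill n_pixels * channels bytes with reproducible pseudo-random samples. */
void fill_pixels(uint8_t *buf, size_t n_pixels, size_t channels, uint32_t seed) {
    assert(buf != NULL);
    uint32_t state = seed ? seed : 1u;  /* xorshift must not start at 0 */
    for (size_t i = 0; i < n_pixels * channels; i++) {
        buf[i] = (uint8_t)(xorshift32(&state) >> 24);  /* keep the top byte */
    }
}
```

Because the buffer depends only on the seed, two runs (or two implementations of the same generator) see byte-identical input, so any difference in throughput comes from the transform rather than from the data.
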
| # | Workflow | Profiles |
|---|---|---|
| 1 | RGB → Lab | sRGB (virtual) → LabD50 (virtual) |
| 2 | RGB → CMYK | sRGB (virtual) → GRACoL2006_Coated1v2.icc |
| 3 | CMYK → RGB | GRACoL2006_Coated1v2.icc → sRGB (virtual) |
| 4 | CMYK → CMYK | GRACoL2006_Coated1v2.icc → GRACoL2006_Coated1v2.icc |
Same workflow set as the browser bench and the JS comparison — so
you can drop jsColorEngine numbers, lcms-wasm numbers, and
native-lcms numbers into a single row-per-workflow table.
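
As a sketch of the "single row-per-workflow table" idea, the helper below formats one markdown row from three throughput figures. The function name format_md_row and the column order (jsColorEngine, lcms-wasm, native lcms2) are assumptions for illustration, not the bench's actual output format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write "| workflow | js | wasm | native |" into out.
 * Returns the length snprintf reports (excluding the terminator). */
int format_md_row(char *out, size_t cap, const char *workflow,
                  double js_mpx, double wasm_mpx, double native_mpx) {
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "| %s | %.1f | %.1f | %.1f |",
                    workflow, js_mpx, wasm_mpx, native_mpx);
}
```

Each of the four workflows would produce one such row, with the three numbers taken from the respective bench runs on the same machine.
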
The repo ships the bench glue (bench_lcms.c, Makefile,
README.md, fetch scripts) but not the ~11 MB upstream lcms2
source tree. Download it once with the included script:

    cd bench/lcms_c
    ./fetch-lcms2.sh          # default: lcms2-2.18
    ./fetch-lcms2.sh 2.17     # or pin a specific version

Windows PowerShell equivalent:

    cd bench\lcms_c
    .\fetch-lcms2.ps1

This drops lcms2-2.18/ next to the Makefile. Re-run when you
want to upgrade the lcms2 baseline.

    # One-time: install a C toolchain
    sudo apt update
    sudo apt install -y build-essential

    # From the repo root, go into this folder (and fetch lcms2 once,
    # if you haven't already):
    cd bench/lcms_c
    ./fetch-lcms2.sh
    make

    # Run the bench:
    ./bench_lcms

That's it. The Makefile compiles all ~27 lcms2 source files
directly into the benchmark binary: no autotools, no system
liblcms2-dev, no configure step. A clean build takes ~10-15s
on a modern laptop.
Native Linux and macOS: same steps as WSL2. On macOS, Xcode Command Line Tools (xcode-select --install) gives you clang + make; the Makefile picks them up
via CC=cc or CC=clang.
Yes, there's gcc for Windows — MinGW-w64 via MSYS2. Slightly
more setup than WSL2 but produces a native .exe with no Linux
layer:
- Install MSYS2.
- Open the MSYS2 MinGW64 shell and install the toolchain:

      pacman -S mingw-w64-x86_64-gcc mingw-w64-x86_64-make

- cd to bench/lcms_c/ and:

      ./fetch-lcms2.sh   # or PowerShell: .\fetch-lcms2.ps1
      mingw32-make
      ./bench_lcms.exe

The compiler + the flags are equivalent to the WSL2 build; numbers should agree within noise. WSL2 is simpler if you just want the numbers; MinGW-w64 is the right choice if you want to bundle the bench into a Windows CI pipeline.

    ==============================================================
    jsColorEngine companion — native lcms2 MPx/s
    ==============================================================
    pixels per iter : 65536
    batches x iters : 5 x 100
    warmup          : 300 iters
    profile         : ../../__tests__/GRACoL2006_Coated1v2.icc
    lcms2 version   : 2180
    compiler        : gcc 11.4.0
    arch            : x86_64
    --------------------------------------------------------------
    RGB -> Lab (sRGB -> LabD50)
    --------------------------------------------------------------
    flags = 0       : XX.X MPx/s (Y.YY ms/iter)
    HIGHRESPRECALC  : XX.X MPx/s (Y.YY ms/iter) (default vs highres max diff: N LSB)
    ... (3 more workflows) ...
    ==============================================================
    SUMMARY — Mpx/s (higher is better)
    ==============================================================
    workflow                         lcms-def lcms-hi
    -------------------------------- -------- --------
    RGB -> Lab (sRGB -> LabD50)      XX.X M   XX.X M
    RGB -> CMYK (sRGB -> GRACoL)     XX.X M   XX.X M
    CMYK -> RGB (GRACoL -> sRGB)     XX.X M   XX.X M
    CMYK -> CMYK (GRACoL -> GRACoL)  XX.X M   XX.X M
    Markdown:
    | Workflow | lcms2 native default | lcms2 native HIGHRESPRECALC |
    ...

Real numbers depend on CPU — run it yourself and drop them into
docs/Performance.md §4 "How does this compare to LittleCMS in C?"
to replace the current wasm × 1.5–2.5 estimate with measured
values.
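
The "default vs highres max diff: N LSB" figure in the sample output is the largest per-byte difference between the two 8-bit result buffers. A minimal sketch of such a comparison, assuming both buffers hold the same number of bytes (the name max_diff_lsb is hypothetical, not taken from bench_lcms.c):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest absolute per-byte difference between two 8-bit buffers.
 * For 8-bit channels, one unit of difference is one LSB. */
unsigned max_diff_lsb(const uint8_t *a, const uint8_t *b, size_t n_bytes) {
    assert(a != NULL && b != NULL);
    unsigned worst = 0;
    for (size_t i = 0; i < n_bytes; i++) {
        unsigned d = a[i] > b[i] ? (unsigned)(a[i] - b[i])
                                 : (unsigned)(b[i] - a[i]);
        if (d > worst) worst = d;
    }
    return worst;
}
```

A result of 0 means the two flag variants produced byte-identical output for that workflow; small non-zero values indicate rounding-level disagreement between the two precalculated LUTs.
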
| Make target | What it does |
|---|---|
| make (default) | Build ./bench_lcms with -O3 -DNDEBUG -march=native |
| make run | Build + run with default profile path |
| make steelman | Rebuild with -ffast-math -funroll-loops -flto on top of the release flags (see below) |
| make debug | Rebuild with -O0 -g for gdb / valgrind |
| make clean | Delete binary + all .o files |
The default build already matches lcms2's own release autotools
build (-O3 -march=native -fno-strict-aliasing). The steelman
target is for answering the honest question "how fast can native
lcms actually go?" — it turns on the three aggressive-but-legit
compiler flags on top of release:
| Flag | What it does | Cost |
|---|---|---|
| -ffast-math | Relax strict IEEE 754: allow FMA contraction, reassociation, assume no NaN/Inf, reciprocal-instead-of-div. Enables a lot more auto-vectorization. | Results can shift by ~1 LSB vs strict-IEEE build. ΔE still well under 1 against the reference LUT. |
| -funroll-loops | Aggressive unrolling of the tetrahedral / matrix inner loops. | Larger binary (~+5-10%). |
| -flto | Link-time optimization: lets the compiler inline across lcms2 translation units (e.g. inline cmsGetTransform's hot leaf fn into the per-pixel dispatcher). | Longer link time (~5-10s extra), nothing at runtime. |
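
Why reassociation under -ffast-math can move a result by a rounding step: floating-point addition is not associative, so a compiler that is free to regroup a sum may legitimately produce a different value. The sketch below is a generic illustration, not lcms2 code; the function names are hypothetical, and volatile is used only to pin each grouping so the difference is visible regardless of optimization flags.

```c
#include <assert.h>

/* (a + b) + c, with the intermediate forced to single precision. */
float sum_left(float a, float b, float c) {
    volatile float t = a + b;
    return t + c;
}

/* a + (b + c): same inputs, different grouping. */
float sum_right(float a, float b, float c) {
    volatile float t = b + c;
    return a + t;
}

/* Near 1e8 the spacing between adjacent floats is 8, so adding 1.0f
 * to -1e8f rounds straight back to -1e8f and the 1.0f is lost. */
void check_grouping_example(void) {
    assert(sum_left(1e8f, -1e8f, 1.0f) == 1.0f);
    assert(sum_right(1e8f, -1e8f, 1.0f) == 0.0f);
}
```

The example uses deliberately extreme magnitudes. In an interpolation kernel the operands are much closer in size, which is consistent with the shift staying at roughly 1 LSB after quantization to 8 bits.
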

    cd bench/lcms_c
    make steelman     # fully rebuilds
    ./bench_lcms      # or: taskset -c 0 ./bench_lcms

To go back to the reference release build:

    make clean && make

-ffast-math is a free GCC/Clang flag (no license). PGO
(-fprofile-generate / -fprofile-use) is the next step up and
would give another ~5-15%, but requires a two-pass build + sample
workload. Not wired in yet; worth doing if someone wants to push
the ceiling further.
On SIMD:
-O3 -march=native on any modern x86_64 CPU already enables SSE4.2 / AVX / AVX2 / FMA for the compiler's auto-vectorizer. lcms2 also has explicit SSE2 intrinsics in cmsOptimization.c (unless you build with -DCMS_DONT_USE_SSE2, which we don't). No hand-written AVX paths; those come entirely from the compiler.
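
One way to confirm which instruction sets -march=native actually enabled on a given machine is to check the compiler's predefined macros. The sketch below assumes GCC or Clang, which define __SSE2__, __SSE4_2__, __AVX__, __AVX2__ and __FMA__ when the corresponding extensions are enabled. The function names are hypothetical and this banner is not part of bench_lcms.c.

```c
#include <assert.h>
#include <stdio.h>

/* Highest x86 SIMD level the compiler was allowed to target. */
const char *simd_level(void) {
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none detected";
#endif
}

/* 1 if fused multiply-add instructions were enabled at compile time. */
int fma_enabled(void) {
#if defined(__FMA__)
    return 1;
#else
    return 0;
#endif
}

/* Print a one-line banner in the style of the bench header. */
void print_simd_banner(void) {
    const char *level = simd_level();
    assert(level[0] != '\0');
    printf("simd : %s%s\n", level, fma_enabled() ? " + FMA" : "");
}
```

These macros describe what the compiler may emit, not what the auto-vectorizer chose for any particular loop; inspecting the generated assembly is the only way to confirm the latter.
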

    # Override the profile path:
    ./bench_lcms /path/to/some_other.icc

    # Pin to a single core for cleaner numbers on hybrid CPUs
    # (P-core on Intel 12th-gen+, not strictly required but stabilises
    # the median):
    taskset -c 0 ./bench_lcms

- 16-bit path. We pin TYPE_*_8 on both sides (and in the JS bench) for apples-to-apples with the engine's hot path. A 16-bit variant would be a second build target, useful but out of scope for v1.2 close-out. Add an 8-vs-16 switch when the 16-bit kernels land in jsColorEngine v1.3.
- Fast-float plugin. lcms2's fast_float plugin lives in a separate repo (Little-CMS/Fast-Float-Plugin) and uses SSE2 + 8-bit specialisations internally. If we ever want to claim parity against "the fastest lcms setup", that's the thing to link. For now, plain lcms2 is the honest baseline: it's what ships in the distro packages everyone actually installs.
- Rendering intents other than relative colorimetric. Matches the rest of the bench suite. Add a CLI flag if anyone needs it.

- bench/lcms-comparison/: jsColorEngine vs lcms-wasm head-to-head (Node, JS only, no native C toolchain needed).
- bench/lcms_c/ (this folder): native lcms2 baseline. Requires a C compiler; produces the wasm × ?× number that Performance.md §4 currently estimates.
- samples/bench/: every jsColorEngine lutMode + lcms-wasm side-by-side in the browser. Same four workflows.
- bench/mpx_summary.js: jsColorEngine alone, node, used as the authoritative source for the README speed tables.

Only the build glue (Makefile, bench_lcms.c, this README.md)
is part of jsColorEngine. The lcms2-2.18/ subtree is a vendored
copy of upstream LittleCMS 2.18 (MIT), included under its own
LICENSE — we redistribute but do not modify.