Cmake thin wrapper #31

Open: IdansPort wants to merge 7 commits into main from cmake-thin-wrapper

Conversation

IdansPort commented Feb 19, 2026

Performance Comparison: cmake-thin-wrapper vs main

Summary

This branch (cmake-thin-wrapper) replaces the node-gyp build system with CMake and introduces several optimizations:

  • CMake build with aggressive compiler optimizations (-O3, LTO)
  • Mainstream jq 1.8.1 (no custom fork required)
  • Thread-local LRU cache for compiled jq filters
  • Thin C++ NAPI wrapper
  • Worker pool for true async execution with timeout support

Benchmark Results

Micro-benchmark (10K iterations)

Sync Execution (exec)

| Operation | main | cmake-thin-wrapper | Improvement |
| --- | --- | --- | --- |
| Small JSON (`.foo`) | 364,701 ops/sec (3 μs) | 835,146 ops/sec (1 μs) | 2.3x faster |
| Small JSON (`.`) | 348,666 ops/sec (3 μs) | 746,652 ops/sec (1 μs) | 2.1x faster |
| Medium JSON (100 users) | 3,301 ops/sec (303 μs) | 12,588 ops/sec (79 μs) | 3.8x faster |
| Large JSON (1000 items) | 84 ops/sec (11,867 μs) | 319 ops/sec (3,136 μs) | 3.8x faster |

Async Execution (execAsync)

| Operation | main | worker_threads | N-API async | Best |
| --- | --- | --- | --- | --- |
| Small JSON (`.foo`) | 89,137 ops/sec | 60,893 ops/sec (16 μs) | 102,703 ops/sec (10 μs) | N-API +69% |
| Medium JSON (100 users) | 3,061 ops/sec | 6,962 ops/sec (136 μs) | 11,119 ops/sec (90 μs) | N-API +52% |
| Large JSON (1000 items) | 80 ops/sec | 239 ops/sec (3,775 μs) | 335 ops/sec (2,982 μs) | N-API +26% |
| Async overhead | n/a | 14 μs/op | 9 μs/op | N-API -36% |

Realistic Workload (108K complex queries on 27KB JSON)

| Branch | Mode | Time | Ops/sec | vs main sync |
| --- | --- | --- | --- | --- |
| main | sync | 49.65s | 2,175 | baseline |
| main | async (batched) | 14.34s | 7,531 | 3.5x faster |
| cmake-thin-wrapper | sync | 14.54s | 7,428 | 3.4x faster |
| cmake-thin-wrapper | async (workers) | 2.93s | 36,860 | 17x faster |
| cmake-thin-wrapper-async | async (N-API) | 4.03s | 26,799 | 12x faster |

Note: Worker threads excel on high-core machines (15+ threads). N-API async has lower per-op
overhead and performs better on typical 2-4 core machines.

Key insight: cmake-thin-wrapper async (workers) is ~4.9x faster than main async (2.93s vs 14.34s), per the table above.

Key Optimizations

CMake Build System

  • Replaced node-gyp with cmake-js
  • Enabled -O3, -funroll-loops, -ftree-vectorize
  • Link Time Optimization (LTO) for cross-module inlining
  • Native CPU targeting (-mcpu=native)

Mainstream jq 1.8.1

  • Uses official jq release (no custom fork)
  • Benefits from upstream performance improvements
  • Easier maintenance and updates

Thread-Local LRU Cache

  • Each libuv worker thread has its own cache (thread-safe)
  • Caches compiled jq programs (default: 100 entries per thread)
  • Eliminates recompilation overhead for repeated filters
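The cache described above follows the standard LRU policy, keyed by filter string. Here is a minimal JavaScript model of that policy for illustration only; the real cache lives in the C++ layer, with one instance per libuv thread:

```javascript
// Illustrative LRU cache model. A Map preserves insertion order, so the
// first key is always the least recently used entry.
class LruCache {
  constructor(capacity = 100) { // 100 matches the default described above
    this.capacity = capacity;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key);       // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    if (this.map.has(key)) {
      this.map.delete(key);
    } else if (this.map.size >= this.capacity) {
      // Evict the least recently used entry (first key in the Map)
      this.map.delete(this.map.keys().next().value);
    }
    this.map.set(key, value);
  }
}
```

In the native layer, making the cache thread-local rather than shared is what removes the need for a mutex: each libuv worker only ever touches its own map.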

Thin C++ Wrapper

  • Minimal N-API binding layer
  • Direct jq library calls without abstraction overhead

N-API Async Work

  • execAsync() uses N-API's native async work API
  • Runs on libuv's thread pool (4 threads by default)
  • ~36% lower per-operation overhead vs worker_threads (9μs vs 14μs)
  • ~70% faster throughput on small/medium workloads
  • Built-in timeout support (default: 30 seconds)
  • Non-blocking — doesn't freeze the event loop
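The timeout behavior described above (a call that rejects if the filter runs too long) can be modeled in plain JavaScript with `Promise.race`. This is a sketch of the semantics only, not the native implementation, and `withTimeout` is our name for illustration, not part of the library API:

```javascript
// Sketch: reject a promise if it does not settle within `ms` milliseconds.
// The native binding enforces its timeout inside the worker instead.
function withTimeout(promise, ms = 30000) { // 30s matches the stated default
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`jq execution timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```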

Usage

Sync (fastest for single operations)

const jq = require('@port-labs/jq-node-bindings');
const result = jq.exec(data, '.foo.bar');

Async with Batching (fastest for high throughput)

const jq = require('@port-labs/jq-node-bindings');

const BATCH_SIZE = 1000;
for (let i = 0; i < queries.length; i += BATCH_SIZE) {
    const batch = queries.slice(i, i + BATCH_SIZE);
    await Promise.all(batch.map(q =>
        jq.execAsync(data, q, { throwOnError: true })
    ));
}
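The loop above generalizes into a small reusable helper; `inBatches` is a hypothetical name for this sketch, not part of the library API:

```javascript
// Run an async function over `items` in fixed-size batches, awaiting each
// batch before starting the next, and collect all results in order.
async function inBatches(items, batchSize, fn) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    results.push(...await Promise.all(batch.map(fn)));
  }
  return results;
}
```

Batching caps the number of in-flight promises so the queue feeding the thread pool stays bounded, instead of materializing one promise per query up front.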

Test Environment

  • Platform: darwin (macOS)
  • Architecture: arm64 (Apple Silicon)
  • Node.js: v22.17.0
  • jq: 1.8.1 (mainstream release)
  • CPU: 12 cores

IdansPort and others added 7 commits February 19, 2026 11:43
- Replace node-gyp/autoconf with CMake + cmake-js
- Rewrite binding.cc: 727 → 230 lines (remove cache, mutexes, async executor)
- execAsync is now a simple Promise wrapper (async is a JS concept, not parallelism)
- Add benchmark.js for performance testing

Deleted: binding.gyp, deps/jq.gyp, configure, util/configure.js

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cache:
- Simple LRU cache for compiled jq filters (no mutex needed)
- Default cache size of 100 filters
- ~700k ops/sec for small JSON (was 1.2k without cache)

Production optimizations:
- -O3 maximum optimization level
- -ffast-math aggressive floating-point optimizations
- -funroll-loops, -ftree-vectorize (SIMD)
- -fno-exceptions, -fno-rtti (smaller binary)
- -march=native / -mcpu=native (CPU-specific)
- LTO (Link Time Optimization)
- Symbol stripping for smaller binary

Performance vs original:
- Small JSON: +74% (401k → 699k ops/sec)
- Medium JSON: +250% / 3.5x (3.4k → 12k ops/sec)
- Large JSON: +238% / 3.4x (92 → 311 ops/sec)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>