Perf optimizations by cmdcolin · Pull Request #164 · GMOD/cram-js

cmdcolin · 2026-03-21T19:13:12Z

Modest performance improvements for CRAM long and short reads. Lazily parsing tags was also investigated, but it didn't produce any speed improvement (yet...)

This PR is largely generated by Claude Code.

I also investigated whether multi-threading with SharedArrayBuffer could help, but Claude said probably not (yet...)

Key optimizations

- **Batch ITF8 pre-decoding via WASM**: pre-decodes variable-length integers in bulk from external codec blocks, replacing per-call overhead with a single pass. Biggest win for long reads, where `ExternalCodec.decode` dominated CPU time.
- **Bound decoder closure caching**: `getCodecForDataSeries` results are now cached and reused in the hot decode loop instead of being re-resolved per record.
- **Eliminated intermediate object allocations**: `decodeRecord()` now constructs `CramRecord` directly instead of returning a temporary 17-property plain object that was immediately destructured and GC'd. Also eliminated the `mateToUse` intermediate object. Removes ~81k transient objects per 54k-record slice.
- **Node strip-types compatibility**: migrated from `tsx` to `node --experimental-strip-types` and removed CommonJS shims.

Benchmark results (p50, 40 iterations)

   | File | Records | master | optimized | Speedup |
   |------|---------|--------|-----------|---------|
   | Short reads (2.5MB) | 54k | 281ms | 215ms | **1.30x** |
   | Long reads (1.5MB) | 1k | 91ms | 60ms | **1.52x** |
   | 400x short reads (14MB) | 800k | 7,521ms | 6,713ms | **1.12x** |
   | 400x long reads (70MB) | 2k | 4,287ms | 2,874ms | **1.49x** |
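For readers who want to reproduce numbers of this shape, the following is a minimal sketch of a p50-over-N-iterations measurement. It is not the benchmark script from this PR; `p50`, `benchP50`, and `speedup` are hypothetical helper names, and the Speedup column is assumed to be the master time divided by the optimized time.

```typescript
// Hypothetical helper: median (p50) of a list of timings in milliseconds.
function p50(samples: number[]): number {
  if (samples.length === 0) {
    throw new Error('no samples')
  }
  const sorted = [...samples].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 1
    ? sorted[mid]!
    : (sorted[mid - 1]! + sorted[mid]!) / 2
}

// Hypothetical harness: runs `fn` `iterations` times and returns the median
// wall-clock time. The default of 40 mirrors the iteration count in the table.
async function benchP50(
  fn: () => Promise<void> | void,
  iterations = 40,
): Promise<number> {
  const times: number[] = []
  for (let i = 0; i < iterations; i++) {
    const start = performance.now()
    await fn()
    times.push(performance.now() - start)
  }
  return p50(times)
}

// Assumed definition of the Speedup column: master time / optimized time.
function speedup(masterMs: number, optimizedMs: number): number {
  return masterMs / optimizedMs
}
```

Using the median rather than the mean keeps a single slow iteration (for example a GC pause) from skewing the reported time.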

GC / memory impact

- Heap per record: 902 → 854 bytes (~5% reduction in retained heap)
- Eliminated ~81k transient intermediate objects per slice decode
- Long reads benefit most (~1.5x) because `ExternalCodec.decode` was a larger fraction of CPU time

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Two optimizations targeting ExternalCodec.decode, which was 10-28% of CPU:

1. Batch ITF8 pre-decode: Before the record decode loop, decode all ITF8 values from external int blocks into Int32Arrays in a tight loop. During record decoding, reading a pre-decoded int is just values[index++] instead of branchy ITF8 parsing with per-call cursor/block lookups.
2. Bound decode closures: For each data series, create a closure at slice setup time that captures the resolved content buffer and cursor directly. This eliminates per-call codec cache lookup, blocksByContentId Record lookup, cursors.externalBlocks.getCursor() Map lookup, and dataType branching.

Also adds batch_itf8_decode to the htscodecs WASM module (C implementation) for potential future use, though the pure JS batch approach proved faster due to avoiding WASM memory copy overhead.

Benchmarks (p50, 40 iterations):
- Short reads (54k records): ~1.4x faster
- Long reads (37 records): ~1.4-1.7x faster

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
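The batch pre-decode and bound-closure ideas described in this commit message can be sketched roughly as follows. This is an illustration written from the ITF8 layout in the CRAM specification, not the code from this PR; `batchDecodeItf8` and `bindIntReader` are hypothetical names, and the sketch assumes the whole block holds only ITF8 integers.

```typescript
// Decode every ITF8 value in `buf` in one tight loop (hypothetical helper).
// The number of leading 1 bits in the first byte gives the count of extra bytes.
function batchDecodeItf8(buf: Uint8Array): Int32Array {
  // Each value occupies at least one byte, so buf.length bounds the count.
  const out = new Int32Array(buf.length)
  let n = 0
  let i = 0
  while (i < buf.length) {
    const b0 = buf[i]!
    if (b0 < 0x80) {
      out[n++] = b0
      i += 1
    } else if (b0 < 0xc0) {
      if (i + 2 > buf.length) break // truncated trailing value
      out[n++] = ((b0 & 0x3f) << 8) | buf[i + 1]!
      i += 2
    } else if (b0 < 0xe0) {
      if (i + 3 > buf.length) break
      out[n++] = ((b0 & 0x1f) << 16) | (buf[i + 1]! << 8) | buf[i + 2]!
      i += 3
    } else if (b0 < 0xf0) {
      if (i + 4 > buf.length) break
      out[n++] =
        ((b0 & 0x0f) << 24) |
        (buf[i + 1]! << 16) |
        (buf[i + 2]! << 8) |
        buf[i + 3]!
      i += 4
    } else {
      if (i + 5 > buf.length) break
      // The fifth byte contributes only its low 4 bits.
      out[n++] =
        ((b0 & 0x0f) << 28) |
        (buf[i + 1]! << 20) |
        (buf[i + 2]! << 12) |
        (buf[i + 3]! << 4) |
        (buf[i + 4]! & 0x0f)
      i += 5
    }
  }
  return out.subarray(0, n)
}

// Bound reader (hypothetical helper): captures the pre-decoded values and a
// private index, so each read in the hot loop is just values[index++].
function bindIntReader(values: Int32Array): () => number {
  let index = 0
  return () => {
    const v = values[index++]
    if (v === undefined) {
      throw new Error('read past end of pre-decoded block')
    }
    return v
  }
}
```

A caller would decode each external int block once at slice setup, for example `const readInt = bindIntReader(batchDecodeItf8(blockBytes))`, and then call `readInt()` per field inside the record loop.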

Have decodeRecord() construct CramRecord directly instead of returning a temporary plain object that gets immediately destructured and GC'd. Also eliminates the mateToUse intermediate object by building the mate record in its final shape. Removes ~81k transient objects per 54k-record slice decode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
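The allocation pattern being removed here can be illustrated with a small sketch. `SketchRecord` and `SketchDecoders` are hypothetical stand-ins with three fields; the real CramRecord has many more properties and its constructor signature is not shown in this PR text.

```typescript
// Hypothetical stand-in for a decoded record; not the library's CramRecord.
class SketchRecord {
  flags: number
  alignmentStart: number
  readName: string
  constructor(flags: number, alignmentStart: number, readName: string) {
    this.flags = flags
    this.alignmentStart = alignmentStart
    this.readName = readName
  }
}

// Hypothetical decoder inputs: closures yielding the next value per series.
interface SketchDecoders {
  flags: () => number
  alignmentStart: () => number
  readName: () => string
}

// Before: a temporary plain object is built, destructured, then discarded,
// which costs one extra short-lived allocation per record.
function decodeViaTemporary(d: SketchDecoders): SketchRecord {
  const tmp = {
    flags: d.flags(),
    alignmentStart: d.alignmentStart(),
    readName: d.readName(),
  }
  const { flags, alignmentStart, readName } = tmp
  return new SketchRecord(flags, alignmentStart, readName)
}

// After: decoded values go straight into the final object.
function decodeDirect(d: SketchDecoders): SketchRecord {
  const flags = d.flags()
  const alignmentStart = d.alignmentStart()
  const readName = d.readName()
  return new SketchRecord(flags, alignmentStart, readName)
}
```

Both functions return equivalent records; the second simply skips the intermediate object, which is where the per-record allocation saving would come from.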

Conflicts:
- eslint.config.mjs
- package.json
- scripts/analyze-profile.ts
- scripts/bench-large.ts
- scripts/profile-compare.ts
- scripts/profile-cpu-branch.ts
- scripts/profile-cpu.ts
- src/craiIndex.ts
- src/cramFile/codecs/byteArrayLength.ts
- src/cramFile/codecs/external.ts
- src/cramFile/container/index.ts
- src/cramFile/file.ts
- src/cramFile/record.ts
- src/cramFile/sectionParsers.ts
- src/cramFile/slice/decodeRecord.ts
- src/cramFile/slice/index.ts
- yarn.lock

Replaces the string-keyed decodeDataSeries indirection with a fixed-shape object literal holding all 28 data-series decoders. Hot call sites in decodeRecord and decodeReadFeatures become direct property accesses (bd.FC(), bd.BF()) so V8 inline-caches them. Read-feature schemas now hold pre-resolved decoder references rather than string keys, and the inner FC/FP loop fetches its decoders into locals.

HuffmanIntCodec.buildCaches now no-ops on empty codeBooks instead of throwing RangeError on Math.max(...[]); this is required so the bd literal can call getCodecForDataSeries for every series eagerly without try/catch.

~22% faster on long-read decoding (decodeReadFeatures was 16% of CPU); modest gain on short reads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
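A rough sketch of the two ideas in this commit message follows. It uses only three of the 28 series and hypothetical names (`makeStringKeyed`, `makeBoundDecoders`, `maxCodeLength`), so it shows the shape of the change rather than the PR's actual code. The guard is written on the assumption that the failure came from using the result of `Math.max()` over an empty list, which is `-Infinity`, as a size.

```typescript
type IntDecoder = () => number
type SeriesName = 'BF' | 'FC' | 'FP'

// Before (sketch): every call goes through a string-keyed lookup.
function makeStringKeyed(table: Map<string, IntDecoder>) {
  return (name: string): number => {
    const decoder = table.get(name)
    if (!decoder) {
      throw new Error(`no decoder for ${name}`)
    }
    return decoder()
  }
}

// After (sketch): decoders are resolved once, eagerly, into an object literal
// whose keys never change, so call sites are plain property accesses.
function makeBoundDecoders(resolve: (name: SeriesName) => IntDecoder) {
  return {
    BF: resolve('BF'),
    FC: resolve('FC'),
    FP: resolve('FP'),
  }
}

// Guard for an empty code book: Math.max() with no arguments is -Infinity,
// which cannot be used as a table size, so return 0 and build nothing.
function maxCodeLength(bitLengths: readonly number[]): number {
  if (bitLengths.length === 0) {
    return 0
  }
  return Math.max(...bitLengths)
}
```

With this shape, a hot loop can hoist `const { FC, FP } = bd` into locals and call `FC()` and `FP()` directly, which matches the description of the inner FC/FP loop.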

- Simplified exports and removed redundant types declarations
- Standardized build scripts to use pnpm consistently
- Added main field for backwards compatibility
- Removed redundant module field
- Standardized tsconfig with strict TypeScript and es2022 target
- Fixed type errors and infrastructure issues where applicable

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

- huffman:
  - fix crash when inner loop reaches last code (bounds check was after array access)
  - remove dead commented-out method
  - nest early-return in buildCaches into if block
  - use `?? -1` instead of `!` for bitCodeToValue lookup
  - remove spurious inner braces in _decode
- decodeRecord:
  - fold lengthOnRef computation into decodeReadFeatures return value, eliminating the second pass over read features
  - fix push(...spread) in getAllMatedRecords
  - hoist duplicate `content` variable in bind()
  - extract decodeQualityScores/decodeReadBases helpers
  - use Uint8Array+decodeLatin1 in decodeReadBases fallback
  - remove dead RFFn alias
  - fix stale comment
- index.ts:
  - inline ByteArrayStopCodec decode in bind() fast path
  - deduplicate tag decoder subarray body via readTagLen closure
  - fix indentation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
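Two of the fixes listed in this commit follow general patterns that a short sketch can show. The functions below are hypothetical and are not taken from the repository: `skipCodesOfLength` puts the bounds test ahead of the array access, and `appendAll` appends without spreading, since `push(...source)` passes every element as a separate argument and can throw a RangeError in some engines when `source` is very large.

```typescript
// Hypothetical code-book entry for the sketch.
interface Code {
  bitLength: number
  value: number
}

// Returns the index of the first entry at or after `start` whose bitLength
// differs from `len`. The bounds test runs first, so codes[i] is never read
// when i has reached the end of the array.
function skipCodesOfLength(
  codes: readonly Code[],
  start: number,
  len: number,
): number {
  let i = start
  while (i < codes.length && codes[i]!.bitLength === len) {
    i++
  }
  return i
}

// Appends every element of `source` to `target` without using spread, so the
// argument-count limit of a single call never comes into play.
function appendAll<T>(target: T[], source: readonly T[]): void {
  for (const item of source) {
    target.push(item)
  }
}
```

If the loop condition were written with the access first, the last iteration would read one element past the end; ordering the two tests this way is assumed to be the kind of fix the first huffman item describes.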

cmdcolin and others added 5 commits March 21, 2026 13:28

Performance optimizations and strip-types compatibility

bc85638

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add large file benchmark script

b55556a

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cmdcolin force-pushed the perf-optimizations branch from 3a8257f to a61813c Compare April 27, 2026 16:54

cmdcolin and others added 5 commits April 27, 2026 13:26

Add allowJs: true to tsconfig for WASM imports

12ddec7

Required to import JavaScript files generated by WASM build Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Replace eslint-plugin-import with eslint-plugin-import-x

5edde42

Modern fork with better performance and fewer dependencies. Updates eslint config to use import-x rules. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Enable noUncheckedIndexedAccess: true in tsconfig

35bc443

Better type safety for array/object access. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

cmdcolin force-pushed the perf-optimizations branch 3 times, most recently from 0897349 to 39bf0ad Compare April 27, 2026 19:40

cmdcolin force-pushed the perf-optimizations branch from 39bf0ad to ebd003f Compare April 27, 2026 19:47

cmdcolin merged commit ecb7cb6 into main Apr 27, 2026
1 check passed

cmdcolin deleted the perf-optimizations branch April 27, 2026 19:48

cmdcolin merged 11 commits into main from perf-optimizations
