# ADR 001: CRAM Parsing Optimization

**Status:** Decided — no action taken
**Date:** 2026-04-26

## Context

Profiled two representative workloads to identify parsing bottlenecks:

- **SRR396637** (Illumina short reads): 54,695 records, 181ms p50, 301K records/sec
- **HG002** (ONT long reads): 37 records, 53ms p50, 701 records/sec

```
Short reads               Long reads
19.7% GC                  18.3% decodeRecord
15.9% decodeRecord        15.9% wasm-function[61]
10.3% wasm-function[61]   15.6% GC
 7.9% _fetchRecords       11.8% decodeReadFeatures
 4.0% decodeLatin1         8.0% addReferenceSequence
```

GC dominates short reads (allocation pressure from ~55K records). `decodeReadFeatures` dominates long reads (hundreds–thousands of `{code, pos, refPos, data}` objects per read).
| 23 | + |
| 24 | +## Options Considered |
| 25 | + |
| 26 | +**Lazy tag parsing** |
| 27 | + |
| 28 | +Store raw codec bytes on the record; defer `parseTagData` calls until `record.tags` is first accessed. The `decodeTags: false` infrastructure already exists as a partial hint. |
| 29 | + |
| 30 | +- API concern: `tags` is a plain writable public field. Converting to a getter is technically a breaking change (silent failure on assignment), though no known consumer mutates it. |
| 31 | +- Actual access patterns in `jbrowse-components/plugins/alignments/src/CramAdapter`: `feature.get('tags')` is called for every record on every render to check the SA tag (`extractFeatureArrays.ts:69`) and the MM tag (`processFeatureAlignments.ts:194`). This forces a full tags parse for every record regardless of laziness — making whole-object laziness a no-op. |
| 32 | +- Per-tag laziness (Proxy) could help but the spread `{ ...this.record.tags, RG }` in `CramSlightlyLazyFeature.ts:119` would materialize all values anyway, and Proxy overhead is non-trivial. |

**Flat ReadFeature typed arrays**

Replace `ReadFeature[]` (an array of objects) with parallel typed arrays (`codes: Uint8Array`, `positions: Int32Array`, etc.) to eliminate per-feature object allocation in long reads.

- `readFeatures?: ReadFeature[]` is a public field on the exported `CramRecord`, so this is a breaking API change.
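
The parallel-array layout can be sketched briefly. The `codes` and `positions` fields echo the examples above; `refPositions`, `allocFlatFeatures`, and `setFeature` are illustrative names, not a proposed API.

```typescript
// Struct-of-arrays sketch for read features, replacing an array of
// {code, pos, refPos, data} objects with three shared buffers.
interface FlatReadFeatures {
  codes: Uint8Array        // one ASCII op code per feature (e.g. 'X', 'I')
  positions: Int32Array    // position within the read
  refPositions: Int32Array // position on the reference
}

function allocFlatFeatures(n: number): FlatReadFeatures {
  return {
    codes: new Uint8Array(n),
    positions: new Int32Array(n),
    refPositions: new Int32Array(n),
  }
}

// Writing feature i touches three preallocated buffers, so n features
// cost 3 allocations total rather than n object allocations.
function setFeature(
  f: FlatReadFeatures,
  i: number,
  code: string,
  pos: number,
  refPos: number,
) {
  f.codes[i] = code.charCodeAt(0)
  f.positions[i] = pos
  f.refPositions[i] = refPos
}
```

Variable-length `data` payloads would still need a side table (e.g. offsets into one shared byte buffer), which is part of what makes this a breaking change rather than a drop-in swap.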

**WASM feature decode**

Move the `decodeReadFeatures` loop into WASM, outputting typed arrays directly. No API surface change — WASM is an internal implementation detail. This would address both the 11.8% `decodeReadFeatures` cost and a large fraction of the 15.6% GC in long reads.

- High implementation effort: requires writing the decode loop in C, defining a typed-array ABI, and wiring it into the existing codec infrastructure.
- No concrete performance complaint is driving it.

**Bulk read name decode**

Decode all read names in a block with a single `TextDecoder` call rather than N individual calls. No API change, low effort, ~3% potential savings for short reads.
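
A sketch of the bulk approach, assuming (purely for illustration) that a block's read names arrive as one NUL-separated byte run; the real separator and codec layout in cram-js may differ.

```typescript
// Decode every read name in the block with one TextDecoder call, then
// split, instead of constructing one decode call per record. The NUL
// separator is an assumption for this sketch.
function decodeNamesBulk(block: Uint8Array): string[] {
  const all = new TextDecoder('latin1').decode(block)
  const names = all.split('\0')
  // A trailing separator yields an empty final entry; drop it.
  if (names[names.length - 1] === '') {
    names.pop()
  }
  return names
}
```

The win comes from amortizing per-call `TextDecoder` overhead across the whole block, which lines up with the 4.0% `decodeLatin1` entry in the short-read profile.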

## Decision

**No optimizations pursued.**

Reasons:

- The short-read GC problem (19.7%) has no clean solution — it comes from allocating ~55K `CramRecord` objects per fetch, which is inherent to the workload. Object pooling would require callers to cooperate (signal when records can be released), a much larger API change than any of the options above.
- Lazy tag parsing is moot because tags are accessed for every record in the render path (the SA and MM checks).
- WASM feature decode is high effort with no concrete pain driving it.
- The primary consumer (JBrowse) is refactoring from an HTML5 canvas system that re-decoded on every frame to a WebGL/WebGPU pipeline. The new architecture amortizes parse cost across renders, substantially reducing the pressure that motivated this investigation.
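
The caller-cooperation cost behind the pooling point can be seen in even a minimal sketch; `RecordPool`, `acquire`, and `release` are hypothetical names, not a proposed cram-js API.

```typescript
// Hypothetical pool illustrating why callers must cooperate: a record
// becomes invalid after release(), so every consumer has to signal when
// it is done with a record -- a contract cram-js does not have today.
class RecordPool<T extends object> {
  private free: T[] = []

  constructor(
    private create: () => T,
    private reset: (r: T) => void,
  ) {}

  acquire(): T {
    return this.free.pop() ?? this.create()
  }

  // Caller promises it holds no references to `r` after this call.
  release(r: T) {
    this.reset(r)
    this.free.push(r)
  }
}
```

Without the `release` signal the pool can never reclaim anything, which is why pooling is a larger API change than any of the parse-side options.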

## Revisit If

- A user reports a specific slow-load region (e.g. "loading this 50K-read window takes 3 seconds").
- Long-read ONT rendering becomes a more prominent use case.
- The WebGL/WebGPU refactor lands and a new profile reveals a different bottleneck.