Commit b9a5dfd

committed: Updates
1 parent f4b1cd4 commit b9a5dfd

17 files changed: 914 additions & 766 deletions

Lines changed: 66 additions & 0 deletions (new file)

# ADR 001: CRAM Parsing Optimization

**Status:** Decided — no action taken
**Date:** 2026-04-26

## Context

Profiled two representative workloads to identify parsing bottlenecks:

- **SRR396637** (Illumina short reads): 54,695 records, 181ms p50, 301K records/sec
- **HG002** (ONT long reads): 37 records, 53ms p50, 701 records/sec

```
Short reads                  Long reads
19.7%  GC                    18.3%  decodeRecord
15.9%  decodeRecord          15.9%  wasm-function[61]
10.3%  wasm-function[61]     15.6%  GC
 7.9%  _fetchRecords         11.8%  decodeReadFeatures
 4.0%  decodeLatin1           8.0%  addReferenceSequence
```

GC dominates short reads (allocation pressure from ~55K records). `decodeReadFeatures` dominates long reads (hundreds–thousands of `{code, pos, refPos, data}` objects per read).

## Options Considered

**Lazy tag parsing**

Store raw codec bytes on the record; defer `parseTagData` calls until `record.tags` is first accessed. The `decodeTags: false` infrastructure already exists as a partial hint. A sketch of the getter shape follows the list below.

- API concern: `tags` is a plain writable public field. Converting it to a getter is technically a breaking change (silent failure on assignment), though no known consumer mutates it.
- Actual access patterns in `jbrowse-components/plugins/alignments/src/CramAdapter`: `feature.get('tags')` is called for every record on every render to check the SA tag (`extractFeatureArrays.ts:69`) and the MM tag (`processFeatureAlignments.ts:194`). This forces a full tags parse for every record regardless of laziness, making whole-object laziness a no-op.
- Per-tag laziness (Proxy) could help, but the spread `{ ...this.record.tags, RG }` in `CramSlightlyLazyFeature.ts:119` would materialize all values anyway, and Proxy overhead is non-trivial.
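
A minimal sketch of the getter conversion, assuming a hypothetical `rawTagBytes` field and a standalone `parseTagData(raw)` helper (the real parser lives in the codec layer and takes more context than this):

```ts
// Hypothetical shape only; rawTagBytes and parseTagData are illustrative
// names, not the library's actual internals.
declare function parseTagData(raw: Uint8Array): Record<string, unknown>

class LazyTagsRecord {
  private rawTagBytes?: Uint8Array
  private parsedTags?: Record<string, unknown>

  constructor(raw: Uint8Array) {
    this.rawTagBytes = raw
  }

  // The breaking-change concern from the list above: `tags` stops being
  // a plain writable field, so assignment to it no longer sticks.
  get tags(): Record<string, unknown> {
    if (this.parsedTags === undefined) {
      // Parse on first access, then release the raw bytes.
      this.parsedTags = parseTagData(this.rawTagBytes ?? new Uint8Array())
      this.rawTagBytes = undefined
    }
    return this.parsedTags
  }
}
```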

**Flat ReadFeature typed arrays**

Replace `ReadFeature[]` (an array of objects) with parallel typed arrays (`codes: Uint8Array`, `positions: Int32Array`, etc.) to eliminate per-feature object allocation in long reads. A sketch of the layout follows the list below.

- `readFeatures?: ReadFeature[]` is a public field on the exported `CramRecord`. This is a breaking API change.
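
A struct-of-arrays sketch of the idea; none of these type or function names exist in the library today:

```ts
// Illustrative replacement for ReadFeature[]; all names are invented.
interface FlatReadFeatures {
  codes: Uint8Array        // one ASCII feature-code byte per feature
  positions: Int32Array    // read-relative position per feature
  refPositions: Int32Array // reference-relative position per feature
  data: Uint8Array         // variable-length payloads, packed end to end
  dataOffsets: Uint32Array // length n+1; payload i spans [off[i], off[i+1])
}

// On-demand accessor: builds one feature object only when asked for,
// instead of allocating hundreds or thousands up front per long read.
function featureAt(f: FlatReadFeatures, i: number) {
  return {
    code: String.fromCharCode(f.codes[i]),
    pos: f.positions[i],
    refPos: f.refPositions[i],
    data: f.data.subarray(f.dataOffsets[i], f.dataOffsets[i + 1]),
  }
}
```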

**WASM feature decode**

Move the `decodeReadFeatures` loop into WASM, outputting typed arrays directly. No API surface change, since WASM is an internal implementation detail. This would address both the 11.8% `decodeReadFeatures` cost and a large fraction of the 15.6% GC in long reads. The JS side of such an ABI is sketched after the list below.

- High implementation effort: requires writing the decode loop in C, defining a typed-array ABI, and wiring it into the existing codec infrastructure.
- No concrete performance complaint driving it.
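
For a sense of the ABI work involved, a hypothetical JS-side sketch follows; the module, its exports (`alloc`, `decode_read_features`), and the buffer layout are all assumptions, and the corresponding C decode loop would be the larger part of the effort:

```ts
// Every name below is hypothetical; no such WASM module exists today.
// Assumes alloc returns 4-byte-aligned pointers into linear memory.
declare const wasmBytes: BufferSource // the compiled decode module
declare const encoded: Uint8Array     // raw read-feature bytes for a slice
declare const maxFeatures: number     // upper bound known before decoding

async function decodeFeaturesViaWasm() {
  const { instance } = await WebAssembly.instantiate(wasmBytes)
  const { memory, alloc, decode_read_features } =
    instance.exports as unknown as {
      memory: WebAssembly.Memory
      alloc: (bytes: number) => number
      decode_read_features: (
        src: number,
        srcLen: number,
        codesOut: number,
        positionsOut: number,
      ) => number // returns the decoded feature count
    }

  // Copy the encoded block into WASM linear memory.
  const src = alloc(encoded.length)
  new Uint8Array(memory.buffer, src, encoded.length).set(encoded)

  // Pre-allocate flat outputs; the C loop fills them directly, so the
  // JS side never allocates a per-feature object.
  const codesPtr = alloc(maxFeatures)
  const positionsPtr = alloc(maxFeatures * 4)
  const n = decode_read_features(src, encoded.length, codesPtr, positionsPtr)

  // Views over WASM memory, not copies.
  return {
    codes: new Uint8Array(memory.buffer, codesPtr, n),
    positions: new Int32Array(memory.buffer, positionsPtr, n),
  }
}
```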

**Bulk read name decode**

Decode all read names in a block with a single `TextDecoder` call rather than N individual calls. No API change, low effort, ~3% potential savings for short reads. A sketch follows below.
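
A sketch of the bulk decode, assuming the block's concatenated name bytes are already in hand; `nameBytes` and the `\0` separator are placeholders for whatever the codec actually uses as its stop byte:

```ts
// nameBytes stands in for a block's concatenated read-name bytes.
declare const nameBytes: Uint8Array

const decoder = new TextDecoder('latin1')

// One decode call over the whole block, then one split, instead of one
// TextDecoder.decode call per record. If the block ends with a stop
// byte, the trailing empty string needs trimming.
const names = decoder.decode(nameBytes).split('\0')
```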

## Decision

**No optimizations pursued.**

Reasons:
- The short-read GC problem (19.7%) has no clean solution: it comes from allocating ~55K `CramRecord` objects per fetch, which is inherent to the workload. Object pooling would require callers to cooperate (signal when records can be released), a much larger API change than any of the above; see the sketch after this list.
- Lazy tag parsing is moot because tags are accessed for every record in the render path (SA and MM checks).
- WASM feature decode is high effort with no concrete pain driving it.
- The primary consumer (JBrowse) is refactoring from an HTML5 canvas system that re-decoded on every frame to a WebGL/WebGPU pipeline. The new architecture amortizes parse cost across renders, substantially reducing the pressure that motivated this investigation.
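
To make the cooperation point concrete, a pooled API would need an explicit release step. In the sketch below only `getRecordsForRange` is real; `renderTrack` and `releaseRecords` are hypothetical:

```ts
import { IndexedCramFile } from '@gmod/cram'

// Hypothetical pooling workflow; releaseRecords does not exist.
declare const indexedFile: IndexedCramFile
declare function renderTrack(records: unknown[]): void
declare function releaseRecords(records: unknown[]): void

const records = await indexedFile.getRecordsForRange(0, 100_000, 200_000)
renderTrack(records)
// The new cooperative step: without this signal, a pool could never
// safely recycle CramRecord objects that callers might still hold.
releaseRecords(records)
```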

## Revisit If

- A user reports a specific slow-load region (e.g. "loading this 50K-read window takes 3 seconds").
- Long-read ONT rendering becomes a more prominent use case.
- The WebGL/WebGPU refactor lands and a new profile reveals a different bottleneck.


package.json

Lines changed: 9 additions & 8 deletions

```diff
@@ -35,6 +35,7 @@
     "profile:cpu": "node --expose-gc --experimental-strip-types scripts/profile-cpu.ts",
     "profile:analyze": "node --experimental-strip-types scripts/analyze-profile.ts",
     "profile:compare": "node --expose-gc --experimental-strip-types scripts/profile-compare.ts",
+    "profile:large": "node --expose-gc --experimental-strip-types scripts/profile-large.ts",
     "lint": "eslint --report-unused-disable-directives --max-warnings 0",
     "format": "prettier --write .",
     "docs": "documentation readme --shallow src/indexedCramFile.ts --section=IndexedCramFile; documentation readme --shallow src/cramFile/file.ts --section=CramFile; documentation readme --shallow src/craiIndex.ts --section=CraiIndex; documentation readme --shallow src/cramFile/file.ts --section=CramFile; documentation readme --shallow src/cramFile/record.ts --section=CramRecord",
@@ -63,21 +64,21 @@
   },
   "devDependencies": {
     "@eslint/js": "^10.0.1",
-    "@gmod/indexedfasta": "^5.0.3",
+    "@gmod/indexedfasta": "^5.0.4",
     "@types/md5": "^2.3.6",
-    "@vitest/coverage-v8": "^4.1.2",
+    "@vitest/coverage-v8": "^4.1.5",
     "buffer": "^6.0.3",
     "documentation": "^14.0.3",
-    "eslint": "^9.39.4",
+    "eslint": "^10.2.1",
     "eslint-plugin-import": "^2.32.0",
     "eslint-plugin-unicorn": "^64.0.0",
     "mock-fs": "^5.5.0",
-    "prettier": "^3.8.1",
+    "prettier": "^3.8.3",
     "rimraf": "^6.1.3",
-    "typescript": "^6.0.2",
-    "typescript-eslint": "^8.57.2",
-    "vitest": "^4.1.2",
-    "webpack": "^5.105.4",
+    "typescript": "^6.0.3",
+    "typescript-eslint": "^8.59.0",
+    "vitest": "^4.1.5",
+    "webpack": "^5.106.2",
     "webpack-cli": "^7.0.2"
   },
   "publishConfig": {
```