# ADR 001: CRAM Parsing Optimization

**Status:** Decided — no action taken
**Date:** 2026-04-26

## Context

Profiled two representative workloads to identify parsing bottlenecks:

- **SRR396637** (Illumina short reads): 54,695 records, 181ms p50, 301K records/sec
- **HG002** (ONT long reads): 37 records, 53ms p50, 701 records/sec

```
Short reads               Long reads
19.7% GC                  18.3% decodeRecord
15.9% decodeRecord        15.9% wasm-function[61]
10.3% wasm-function[61]   15.6% GC
 7.9% _fetchRecords       11.8% decodeReadFeatures
 4.0% decodeLatin1         8.0% addReferenceSequence
```

GC dominates short reads (allocation pressure from ~55K records). `decodeReadFeatures` dominates long reads (hundreds–thousands of `{code, pos, refPos, data}` objects per read).
| 23 | + |
| 24 | +## Options Considered |
| 25 | + |
| 26 | +**Lazy tag parsing** |
| 27 | + |
| 28 | +Store raw codec bytes on the record; defer `parseTagData` calls until `record.tags` is first accessed. The `decodeTags: false` infrastructure already exists as a partial hint. |
| 29 | + |
| 30 | +- API concern: `tags` is a plain writable public field. Converting to a getter is technically a breaking change (silent failure on assignment), though no known consumer mutates it. |
| 31 | +- Actual access patterns in `jbrowse-components/plugins/alignments/src/CramAdapter`: `feature.get('tags')` is called for every record on every render to check the SA tag (`extractFeatureArrays.ts:69`) and the MM tag (`processFeatureAlignments.ts:194`). This forces a full tags parse for every record regardless of laziness — making whole-object laziness a no-op. |
| 32 | +- Per-tag laziness (Proxy) could help but the spread `{ ...this.record.tags, RG }` in `CramSlightlyLazyFeature.ts:119` would materialize all values anyway, and Proxy overhead is non-trivial. |

**Flat ReadFeature typed arrays**

Replace `ReadFeature[]` (an array of objects) with parallel typed arrays (`codes: Uint8Array`, `positions: Int32Array`, etc.) to eliminate per-feature object allocation in long reads.

- `readFeatures?: ReadFeature[]` is a public field on the exported `CramRecord`, so this is a breaking API change.
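
The parallel-array layout can be sketched briefly. The `codes` and `positions` fields echo the examples above; `refPositions`, `allocFlatFeatures`, and `setFeature` are illustrative names, not a proposed API.

```typescript
// Struct-of-arrays sketch for read features, replacing an array of
// {code, pos, refPos, data} objects with three shared buffers.
interface FlatReadFeatures {
  codes: Uint8Array        // one ASCII op code per feature (e.g. 'X', 'I')
  positions: Int32Array    // position within the read
  refPositions: Int32Array // position on the reference
}

function allocFlatFeatures(n: number): FlatReadFeatures {
  return {
    codes: new Uint8Array(n),
    positions: new Int32Array(n),
    refPositions: new Int32Array(n),
  }
}

// Writing feature i touches three preallocated buffers, so n features
// cost 3 allocations total rather than n object allocations.
function setFeature(
  f: FlatReadFeatures,
  i: number,
  code: string,
  pos: number,
  refPos: number,
) {
  f.codes[i] = code.charCodeAt(0)
  f.positions[i] = pos
  f.refPositions[i] = refPos
}
```

Variable-length `data` payloads would still need a side table (e.g. offsets into one shared byte buffer), which is part of what makes this a breaking change rather than a drop-in swap.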

**WASM feature decode**

Move the `decodeReadFeatures` loop into WASM, outputting typed arrays directly. No API surface change — WASM is an internal implementation detail. This would address both the 11.8% `decodeReadFeatures` cost and a large fraction of the 15.6% GC in long reads.

- High implementation effort: requires writing the decode loop in C, defining a typed-array ABI, and wiring it into the existing codec infrastructure.
- No concrete performance complaint is driving it.

**Bulk read name decode**

Decode all read names in a block with a single `TextDecoder` call rather than N individual calls. No API change, low effort, ~3% potential savings for short reads.
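
A sketch of the bulk approach, assuming (purely for illustration) that a block's read names arrive as one NUL-separated byte run; the real separator and codec layout in cram-js may differ.

```typescript
// Decode every read name in the block with one TextDecoder call, then
// split, instead of constructing one decode call per record. The NUL
// separator is an assumption for this sketch.
function decodeNamesBulk(block: Uint8Array): string[] {
  const all = new TextDecoder('latin1').decode(block)
  const names = all.split('\0')
  // A trailing separator yields an empty final entry; drop it.
  if (names[names.length - 1] === '') {
    names.pop()
  }
  return names
}
```

The win comes from amortizing per-call `TextDecoder` overhead across the whole block, which lines up with the 4.0% `decodeLatin1` entry in the short-read profile.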

## Decision

**No optimizations pursued.**

Reasons:

- The short-read GC problem (19.7%) has no clean solution — it comes from allocating ~55K `CramRecord` objects per fetch, which is inherent to the workload. Object pooling would require callers to cooperate (signal when records can be released), a much larger API change than any of the options above.
- Lazy tag parsing is moot because tags are accessed for every record in the render path (the SA and MM checks).
- WASM feature decode is high effort with no concrete pain driving it.
- The primary consumer (JBrowse) is refactoring from an HTML5 canvas system that re-decoded on every frame to a WebGL/WebGPU pipeline. The new architecture amortizes parse cost across renders, substantially reducing the pressure that motivated this investigation.
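
The caller-cooperation cost behind the pooling point can be seen in even a minimal sketch; `RecordPool`, `acquire`, and `release` are hypothetical names, not a proposed cram-js API.

```typescript
// Hypothetical pool illustrating why callers must cooperate: a record
// becomes invalid after release(), so every consumer has to signal when
// it is done with a record -- a contract cram-js does not have today.
class RecordPool<T extends object> {
  private free: T[] = []

  constructor(
    private create: () => T,
    private reset: (r: T) => void,
  ) {}

  acquire(): T {
    return this.free.pop() ?? this.create()
  }

  // Caller promises it holds no references to `r` after this call.
  release(r: T) {
    this.reset(r)
    this.free.push(r)
  }
}
```

Without the `release` signal the pool can never reclaim anything, which is why pooling is a larger API change than any of the parse-side options.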

## Revisit If

- A user reports a specific slow-load region (e.g. "loading this 50K-read window takes 3 seconds").
- Long-read ONT rendering becomes a more prominent use case.
- The WebGL/WebGPU refactor lands and a new profile reveals a different bottleneck.