Memory footprint analysis #108

# ApiDOM Memory Footprint Analysis

## Test Subject

- **File**: GitHub REST API OpenAPI description (`api.github.com.2022-11-28.deref.json`)
- **Format**: OpenAPI 3.1.0, fully dereferenced
- **Size**: 73.4 MB (JSON), 54.2 MB (YAML equivalent)
- **Node.js**: v24.10.0, V8

## Pipeline Overview

### JSON path (strict mode)
```
JSON string → JSON.parse (POJO) → baseRefract (Generic ApiDOM) → refractOpenApi3_1 (Semantic ApiDOM)
```

### YAML path (tree-sitter)
```
YAML string → tree-sitter CST → CSTTransformer → YAML AST → YAMLASTTransformer → Generic ApiDOM → refractOpenApi3_1 (Semantic ApiDOM)
```

CSTTransformer and YAMLASTTransformer run serially — each stage is consumed before the next produces output. CST and YAML AST do not accumulate in memory simultaneously.

## Measurement Results

### JSON Pipeline (73.4 MB file)

| Stage | Incremental Cost | Cumulative Heap | Multiplier vs String |
|---|---|---|---|
| Raw string | +147 MB | 147 MB | 2.0x |
| JSON.parse (POJO) | +52 MB | 199 MB | 2.7x |
| baseRefract (generic ApiDOM) | +408 MB | 607 MB | 8.3x |
| refractOpenApi3_1 (semantic ApiDOM) | +2,064 MB | 2,671 MB | 36.4x |

Note: the semantic refraction delta (+2,064 MB) includes an internal `baseRefract` call (~408 MB for a second generic tree) plus the semantic tree construction.

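**Realistic single call to `refractOpenApi3_1`** (without the separate, explicit `baseRefract` call): ~2,278 MB total, **~31x** the string. This is the ~2,686 MB total from the Memory Breakdown section minus the ~408 MB generic tree produced by the explicit `baseRefract` call.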
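
The per-stage figures above are heap deltas. As a minimal sketch of how such deltas could be collected in Node.js (not necessarily how these numbers were produced), the helper below samples `process.memoryUsage().heapUsed` around each stage. The name `measureStage` is hypothetical, the stages shown are stand-ins for the real pipeline calls, and stable numbers assume the process is started with `--expose-gc`.

```js
// Hypothetical helper: records V8 heap usage before and after a pipeline stage.
// Run with `node --expose-gc` for stable numbers; without it, gc() is skipped.
const MB = 1024 * 1024;

function heapUsedMB() {
  if (typeof globalThis.gc === 'function') globalThis.gc(); // full GC if exposed
  return process.memoryUsage().heapUsed / MB;
}

function measureStage(label, fn, log) {
  const before = heapUsedMB();
  const result = fn(); // the caller keeps the result alive so it stays counted
  const after = heapUsedMB();
  log.push({ label, incrementalMB: after - before, cumulativeMB: after });
  return result;
}

// Stand-in stages; real stages would be JSON.parse, baseRefract, and so on.
const log = [];
const text = measureStage('raw string', () => JSON.stringify({ paths: {} }), log);
const pojo = measureStage('JSON.parse', () => JSON.parse(text), log);
console.table(log);
```

Holding on to each stage's return value matters: if the result is dropped, the next forced GC reclaims it and the cumulative column under-reports.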

### YAML Pipeline (0.73 MB file, 30-path subset)

| Stage | No Source Maps | With Source Maps |
|---|---|---|
| CST (tree-sitter) | ~0 MB | ~0 MB |
| Generic ApiDOM | +6.5 MB | +6.2 MB |
| Semantic ApiDOM | +21.0 MB | +20.4 MB |
| **Total** | **27.2 MB (37x)** | **26.6 MB (36x)** |

**tree-sitter-yaml limitation**: max 32,768 lines per file (see [tree-sitter-yaml#35](https://github.com/tree-sitter-grammars/tree-sitter-yaml/pull/35)).

### Key Observations

- **CST cost is negligible**: tree-sitter allocates in native/WASM memory, not on the V8 heap.
- **Source maps add almost nothing**: the 6 number fields (`startLine` through `endOffset`) per element are cheap.
- **Semantic refraction dominates**: ~20-21 MB for the 30-path subset, regardless of JSON or YAML origin.
- **Multiplier vs string differs by format**: for the same data, YAML shows ~37x and JSON shows ~59x. The heap cost is the same in both cases, but the YAML string is larger (more whitespace) for identical content, so the same heap is divided by a larger string size.

## Element Counts (73.4 MB file)

| Metric | Generic Tree | Semantic Tree |
|---|---|---|
| Total elements | 3,003,457 | 3,003,457 |
| MemberElements | 888,408 | 888,408 |
| Meta materialized | **0** | **1,491,815** |
| Attributes materialized | 0 | 0 |

The element counts are identical. The difference is entirely in **meta materialization**.
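
A count like "Meta materialized" can be gathered by walking the tree and checking whether each element's lazy `_meta` slot has been filled. The sketch below assumes a simplified element shape (`{ _meta, content }`, with key/value pairs as `{ key, value }`); it is an illustration, not ApiDOM's own traversal.

```js
// Hypothetical traversal over a simplified element shape.
// Uses an explicit stack so very deep trees do not overflow the call stack.
function countMaterialized(root) {
  const counts = { total: 0, metaMaterialized: 0 };
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node === null || typeof node !== 'object') continue; // primitive content
    if (Array.isArray(node)) {
      for (const item of node) stack.push(item);
    } else if ('key' in node && 'value' in node) {
      stack.push(node.key, node.value); // KeyValuePair is not itself an Element
    } else {
      counts.total += 1;
      if (node._meta !== undefined) counts.metaMaterialized += 1;
      stack.push(node.content); // the meta subtree itself is deliberately not walked
    }
  }
  return counts;
}

const tree = {
  content: [
    { _meta: {}, content: { key: { content: 'openapi' }, value: { content: '3.1.0' } } },
  ],
};
console.log(countMaterialized(tree)); // { total: 4, metaMaterialized: 1 }
```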

## Root Cause: Meta Materialization

### How meta materialization works

Each Element has a lazily-initialized `_meta` property. When code accesses `.classes`, `.meta.set()`, or similar, it triggers creation of a full ObjectElement tree:

```
_meta = ObjectElement                  (~80 bytes)
  └── MemberElement                    (~80 bytes)
       └── KeyValuePair               (~32 bytes)
            ├── StringElement (key)    (~80 bytes)
            └── ArrayElement (value)   (~80 bytes)
               └── StringElement       (~80 bytes)
```

**~6 heap objects (5 Elements plus 1 KeyValuePair, ~430 bytes) per materialization.**
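
The lazy behaviour can be reproduced with a few stand-in classes. The sketch below is not ApiDOM's implementation: the classes are simplified and hypothetical, and they only count constructor calls to show that the first `.classes.push(...)` on an element allocates the six objects in the diagram above.

```js
// Simplified stand-ins for ApiDOM classes (hypothetical, for illustration only).
let allocations = 0;

class KeyValuePair {
  constructor(key, value) {
    allocations += 1;
    this.key = key;
    this.value = value;
  }
}

class Element {
  constructor(content) {
    allocations += 1;
    this.content = content;
    this._meta = undefined; // nothing is allocated until first access
  }

  get meta() {
    if (this._meta === undefined) this._meta = new ObjectElement([]);
    return this._meta;
  }

  get classes() {
    let member = this.meta.content.find((m) => m.content.key.content === 'classes');
    if (member === undefined) {
      member = new MemberElement(
        new KeyValuePair(new StringElement('classes'), new ArrayElement([])),
      );
      this.meta.content.push(member);
    }
    return member.content.value; // the ArrayElement holding class names
  }
}

class StringElement extends Element {}
class MemberElement extends Element {}
class ObjectElement extends Element {}
class ArrayElement extends Element {
  push(value) {
    this.content.push(new StringElement(value));
  }
}

const member = new MemberElement(null);
const beforeAccess = allocations;
member.classes.push('fixed-field');
const materialized = allocations - beforeAccess;
console.log(materialized); // 6
```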

### Top meta materialization sources

| Source | Elements Affected | Savings if Removed |
|---|---|---|
| `FixedFieldsVisitor`: `newMemberElement.classes.push('fixed-field')` | ~591K | -568 MB |
| `PatternedFieldsVisitor`: `newMemberElement.classes.push('patterned-field')` | (included above) | (included above) |
| `JSONSchemaVisitor.handleDialectIdentifier`: `this.element.meta.set('inheritedDialectIdentifier', ...)` | ~251K | -272 MB |
| `JSONSchemaVisitor.handleSchemaIdentifier`: `this.element.meta.set('ancestorsSchemaIdentifiers', ...)` | (included above) | (included above) |
| `copyMetaAndAttributes` propagation | N/A | -149 MB |
| Other visitors (`specification-extension`, `content`, `parameters`, `reference-element`, `path-template`, etc.) | ~649K remaining | ~200 MB |

### Meta materialization by element type (semantic tree)

| Element Type | Count |
|---|---|
| schema | 251,530 |
| member | 233,163 |
| string | 224,913 |
| array | 159,347 |
| object | 22,135 |
| response | 3,171 |
| mediaType | 2,992 |
| other | 3,317 |
| **Total** | **900,537** (after disabling fixed/patterned field classes) |

## Memory Breakdown (73.4 MB file, original unpatched)

| Component | MB | % of Total |
|---|---|---|
| Raw string | 147 | 5% |
| POJO (JSON.parse) | 52 | 2% |
| Generic tree (from explicit baseRefract call) | 408 | 15% |
| Generic tree (internal to refractOpenApi3_1) | ~408 | 15% |
| Semantic tree (deep copy of elements) | ~408 | 15% |
| Meta materialization (~9M extra Element objects) | ~989 | 37% |
| Visitor instances + traversal overhead | ~274 | 10% |
| **Total** | **~2,686** | **100%** |

**Meta materialization accounts for 37% of total memory.**

## Why `cloneDeep` Is Not the Problem

Making `cloneDeep` a no-op (identity function) saved only ~1 MB. This is because:

- With `cloneDeep` active: originals are created, clones are created, originals are GC'd after refraction. Final state: clones.
- With `cloneDeep` as no-op: originals are shared into the semantic tree. No clones created, but originals can't be GC'd (still referenced). Final state: originals.

Same number of live objects in both cases. `cloneDeep` determines **which** objects survive, not **how many**.
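
A toy model makes the argument concrete. The functions below are hypothetical stand-ins (plain objects and `structuredClone`, not ApiDOM's `cloneDeep`); they count the objects reachable from the refracted root, which is what remains live once the generic root is dropped.

```js
// Hypothetical miniature of the two strategies; not ApiDOM's actual cloneDeep.
function makeGenericTree(n) {
  return { children: Array.from({ length: n }, (_, i) => ({ value: i })) };
}

const cloneDeep = (node) => structuredClone(node);
const identity = (node) => node;

// Builds a "semantic" root whose children come from copy(genericChild).
function refract(generic, copy) {
  return { kind: 'semantic', children: generic.children.map((child) => copy(child)) };
}

// Counts objects reachable from a root, i.e. what survives GC once the
// generic root itself is no longer referenced.
function countReachable(root, seen = new Set()) {
  if (root === null || typeof root !== 'object' || seen.has(root)) return seen.size;
  seen.add(root);
  for (const value of Object.values(root)) countReachable(value, seen);
  return seen.size;
}

const withClone = countReachable(refract(makeGenericTree(1000), cloneDeep));
const withNoop = countReachable(refract(makeGenericTree(1000), identity));
console.log(withClone, withNoop, withClone === withNoop);
```

In the cloning case the 1,000 original children become garbage; in the no-op case they are the survivors. Either way the live set has the same size.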

## Why Generic ApiDOM Is ~8x the POJO

Every JSON object property `"key": value` creates 4 heap objects:

```
MemberElement       (~80 bytes)
  KeyValuePair      (~32 bytes)
    StringElement   (~80 bytes)  ← key
    StringElement   (~80 bytes)  ← value
```

vs. a POJO property: one hidden class slot (~8 bytes) + string pointer.

This 8x overhead is reasonable for a fully-typed element tree where every node is individually addressable and can carry metadata and source positions.
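
The object count can be checked with a small counter that mirrors the structure above. This is a hypothetical sketch that only tallies how many wrapper objects a generic refraction would create for a given POJO; it does not build real ApiDOM elements.

```js
// Hypothetical counter: how many heap objects a generic element tree needs
// for a given plain value, following the diagram above.
function countGenericObjects(value) {
  if (Array.isArray(value)) {
    // 1 ArrayElement plus one subtree per item
    return 1 + value.reduce((sum, item) => sum + countGenericObjects(item), 0);
  }
  if (value !== null && typeof value === 'object') {
    // 1 ObjectElement, then per property:
    // MemberElement + KeyValuePair + key StringElement (3) + the value subtree
    return 1 + Object.values(value).reduce(
      (sum, item) => sum + 3 + countGenericObjects(item),
      0,
    );
  }
  return 1; // a primitive becomes a single String/Number/Boolean/Null element
}

const pojo = { openapi: '3.1.0', info: { title: 'x', version: '1' } };
console.log(countGenericObjects(pojo)); // 17
```

The same document as a POJO is two objects (`pojo` and `pojo.info`) plus their strings.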

## Optimization Opportunities

### 1. Cheap classes storage (highest impact, no mutation required)

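**Savings: ~989 MB (37% of total) if applied to all meta usage; ~568 MB of that comes from the `fixed-field`/`patterned-field` classes alone**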

Replace the full ObjectElement-based meta materialization for classes with a lightweight storage mechanism (bitfield, Set, or simple array) directly on the Element instance:

```js
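// instead of (creates ~6 objects via meta materialization):
newMemberElement.classes.push('fixed-field');

// use a lightweight property, e.g. a bitfield
// (FIXED_FIELD_BIT is an illustrative constant such as 1 << 0;
// |= sets the bit without clearing other classes):
newMemberElement._classes |= FIXED_FIELD_BIT;
// or a plain array of class names:
newMemberElement._classList = ['fixed-field'];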
```

The same principle applies to all meta usage — `meta` currently uses the full Element tree to store what could be a simple key-value lookup.
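
A slightly fuller sketch of the bitfield variant follows. Everything in it is an assumption made for illustration: the class `LightElement`, the `CLASS_BITS` table, and the method names are hypothetical, and a real implementation would also need a fallback for class names that have no assigned bit.

```js
// Hypothetical bit assignments, one bit per known class name.
const CLASS_BITS = Object.freeze({
  'fixed-field': 1 << 0,
  'patterned-field': 1 << 1,
  'specification-extension': 1 << 2,
});

class LightElement {
  constructor() {
    this._classes = 0; // a small integer instead of an ObjectElement tree
  }

  addClass(name) {
    const bit = CLASS_BITS[name];
    if (bit === undefined) throw new RangeError(`unknown class: ${name}`);
    this._classes |= bit; // set the bit, keep the others
  }

  hasClass(name) {
    return (this._classes & (CLASS_BITS[name] ?? 0)) !== 0;
  }

  // Expands the bitfield to names only when a caller asks for them.
  get classList() {
    return Object.keys(CLASS_BITS).filter((name) => this.hasClass(name));
  }
}

const el = new LightElement();
el.addClass('fixed-field');
el.addClass('specification-extension');
console.log(el.classList); // [ 'fixed-field', 'specification-extension' ]
```

Adding a class here writes one integer field and allocates nothing; the name array is only built on demand.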

### 2. Lightweight schema metadata (no mutation required)

**Savings: ~272 MB**

`inheritedDialectIdentifier` and `ancestorsSchemaIdentifiers` are stored via `meta.set()` on every Schema element, creating ~12 Element objects per schema. These could use a plain Map or direct properties while keeping the same self-contained design (schemas work in isolation without walking the parent chain).
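
One possible shape for this, sketched under assumptions: the class `SchemaLike` is hypothetical, the two field names reuse the meta keys named above, and the inheritance rule shown (take the parent's values unless the schema declares its own `$schema` or `$id`) is a simplification chosen for the example rather than a description of what `JSONSchemaVisitor` does.

```js
// Hypothetical stand-in for a Schema element that stores the two values as
// direct fields instead of materialized meta entries.
class SchemaLike {
  constructor(parent = null, { $schema, $id } = {}) {
    // Resolved once at construction, so the schema stays self-contained and
    // no parent-chain walk is needed later.
    this.inheritedDialectIdentifier = $schema ?? parent?.inheritedDialectIdentifier;
    const inherited = parent?.ancestorsSchemaIdentifiers ?? [];
    // Share the parent's array when this schema adds no $id of its own,
    // so the common case allocates nothing extra.
    this.ancestorsSchemaIdentifiers = $id === undefined ? inherited : [...inherited, $id];
  }
}

const root = new SchemaLike(null, {
  $schema: 'https://json-schema.org/draft/2020-12/schema',
  $id: 'https://example.com/root',
});
const child = new SchemaLike(root);
console.log(child.inheritedDialectIdentifier);
console.log(child.ancestorsSchemaIdentifiers === root.ancestorsSchemaIdentifiers);
```

Sharing the array is only safe if it is treated as immutable after construction.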

### 3. Progressive erasure with mutable opt-in (requires controlled mutation)

**Savings: ~408 MB peak memory**

As the visitor processes each generic node and creates the semantic equivalent, null out the generic node so GC can reclaim it:

```
Before refraction:   [generic: 100%] [semantic: 0%]    = 1x
Mid refraction:      [generic: 50%]  [semantic: 50%]   = 1x
After refraction:    [generic: 0%]   [semantic: 100%]  = 1x
```

This eliminates the 2x peak from holding both trees simultaneously. Combined with skipping `cloneDeep` (move semantics instead of copy), the generic tree's elements are transferred to the semantic tree rather than duplicated.

**Design**: `refract*()` functions stay immutable by default. Parser adapters opt into mutable/consuming mode since they own the generic tree and know nobody else holds a reference:

```js
// public API: immutable (safe for direct callers)
refractOpenApi3_1(genericTree);

// parser adapter internals: mutable (memory efficient)
refractOpenApi3_1(result, { consume: true });
```

This extends the same serial consume-and-discard pattern already used by `CSTTransformer` and `YAMLASTTransformer` one step further to the generic → semantic boundary.
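
The erasure step itself is small. The sketch below uses plain objects and a hypothetical `refractSketch` function to show the idea: each generic child is released as soon as its semantic counterpart exists. It builds new nodes rather than moving them, so it illustrates only the erasure, not the move semantics.

```js
// Hypothetical illustration of a consuming traversal; `refractSketch` and its
// `consume` option are not an existing ApiDOM API.
function refractSketch(generic, { consume = false } = {}) {
  const semantic = { element: `semantic:${generic.element}`, children: [] };
  const { children } = generic;
  for (let i = 0; i < children.length; i += 1) {
    semantic.children.push(refractSketch(children[i], { consume }));
    if (consume) children[i] = null; // release the generic child once converted
  }
  if (consume) generic.children = null; // the generic node is now an empty shell
  return semantic;
}

const generic = {
  element: 'object',
  children: [
    { element: 'member', children: [] },
    { element: 'member', children: [] },
  ],
};
const semantic = refractSketch(generic, { consume: true });
console.log(generic.children); // null
console.log(semantic.children.length); // 2
```

With `consume` left at its default of `false`, the input tree is untouched, which matches the immutable-by-default design described above.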

### Combined impact estimate (73.4 MB file)

| Configuration | Total Heap | Multiplier |
|---|---|---|
| Original (no optimizations) | ~2,278 MB | ~31x |
| + Cheap classes storage | ~1,289 MB | ~17.6x |
| + Lightweight schema metadata | ~1,017 MB | ~13.9x |
| + Progressive erasure (consume mode) | ~609 MB | ~8.3x |
| Theoretical floor (semantic tree only) | ~440 MB | ~6x |
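
The first four rows of the table are cumulative subtractions from the ~2,278 MB baseline, each divided by the 73.4 MB file size. The snippet below reproduces that arithmetic; all figures are taken from this analysis and only the calculation is new.

```js
// Savings per optimization (MB), applied cumulatively to the baseline.
const FILE_MB = 73.4;
const steps = [
  ['Original (no optimizations)', 0],
  ['+ Cheap classes storage', 989],
  ['+ Lightweight schema metadata', 272],
  ['+ Progressive erasure (consume mode)', 408],
];

let heapMB = 2278;
const rows = steps.map(([label, savedMB]) => {
  heapMB -= savedMB;
  // Round the multiplier to one decimal place, as in the table.
  return { label, heapMB, multiplier: Math.round((heapMB / FILE_MB) * 10) / 10 };
});
console.table(rows);
```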

