
# Reduce indexer memory consumption by ~75% and startup time by ~20% #4016

Closed. shauns wants to merge 1 commit into `Shopify:main` from `shauns:reduce-memory-and-startup-costs`.

@shauns commented Mar 18, 2026


A comprehensive set of memory and startup optimizations targeting the Ruby
indexer, measured against the Shopify core monorepo (~64K Ruby files,
~889K index entries). RSS after indexing dropped from ~2.2 GB to ~550 MB.

## Biggest wins

### Replace character-level PrefixTree with sorted-array binary search
The old trie created one node per character of every fully qualified name,
resulting in 4.1M Node objects, each holding its own Hash of children and
together consuming nearly 1 GB. The new implementation uses a sorted array
with binary search for prefix matching, shrinking the structure from ~955 MB
to ~20 MB.
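
A minimal sketch of the sorted-array approach, assuming a hypothetical `SortedPrefixIndex` class (not the indexer's real `PrefixTree` API). Keys live in a plain sorted array, `Array#bsearch_index` in find-minimum mode locates the first key that is `>=` the prefix, and every match follows in one contiguous run, because strings sharing a prefix sort next to each other.

```ruby
# Hypothetical sketch: prefix search over a sorted array instead of a trie.
class SortedPrefixIndex
  def initialize
    @values = {} # full name => value
    @keys = nil  # sorted key list, rebuilt lazily after inserts
  end

  def insert(key, value)
    @values[key] = value
    @keys = nil # invalidate; sorting happens once, on the next search
  end

  # Returns the values of all keys starting with +prefix+, in key order.
  def search(prefix)
    keys = (@keys ||= @values.keys.sort)

    # Find-minimum mode: first index whose key is >= prefix.
    start = keys.bsearch_index { |k| k >= prefix }
    return [] unless start

    results = []
    i = start
    # Keys sharing the prefix form one contiguous run in sorted order.
    while i < keys.length && keys[i].start_with?(prefix)
      results << @values[keys[i]]
      i += 1
    end
    results
  end
end

index = SortedPrefixIndex.new
index.insert("Foo", 1)
index.insert("Foo::Bar", 2)
index.insert("Fizz", 3)
index.insert("Baz", 4)
index.search("Foo") # => [1, 2]
```

Sorting once on the first search after a batch of inserts mirrors the "Deferred prefix tree build" item in this PR; a version used for incremental edits after startup would likely want a cheaper insertion path than a full re-sort.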

### Pack Location data as integers, create objects lazily
Location data (start/end line and column) is packed into a single 62-bit
fixnum stored directly on each entry, and a Location object is only created
when the location is actually accessed. This eliminates 1.5M Location
objects that were allocated during indexing but rarely read.
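
One way to pack four line/column values into one integer, as a sketch: the 18-bit line / 13-bit column split and the `PackedLocation`, `Location`, and `Entry` names are assumptions for illustration, not the PR's actual layout. The four fields total 62 bits, which stays within Ruby's fixnum range on 64-bit platforms, so the packed value is stored as an immediate with no heap allocation.

```ruby
# Hypothetical value object, built only when a caller asks for it.
Location = Struct.new(:start_line, :start_column, :end_line, :end_column)

# Hypothetical layout (high to low bits):
# start_line (18) | start_column (13) | end_line (18) | end_column (13) = 62 bits
module PackedLocation
  LINE_BITS = 18
  COLUMN_BITS = 13
  LINE_MASK = (1 << LINE_BITS) - 1
  COLUMN_MASK = (1 << COLUMN_BITS) - 1

  def self.pack(start_line, start_column, end_line, end_column)
    lines_ok = [start_line, end_line].all? { |l| l.between?(0, LINE_MASK) }
    columns_ok = [start_column, end_column].all? { |c| c.between?(0, COLUMN_MASK) }
    raise ArgumentError, "location does not fit the packed layout" unless lines_ok && columns_ok

    packed = start_line
    packed = (packed << COLUMN_BITS) | start_column
    packed = (packed << LINE_BITS) | end_line
    (packed << COLUMN_BITS) | end_column
  end

  # Peels fields off from the low bits upward, reversing pack.
  def self.unpack(packed)
    end_column = packed & COLUMN_MASK
    packed >>= COLUMN_BITS
    end_line = packed & LINE_MASK
    packed >>= LINE_BITS
    start_column = packed & COLUMN_MASK
    start_line = packed >> COLUMN_BITS
    Location.new(start_line, start_column, end_line, end_column)
  end
end

class Entry
  def initialize(name, packed_location)
    @name = name
    @packed_location = packed_location # an Integer, not an object reference
  end

  # Creates a Location on demand; nothing is allocated during indexing.
  def location
    PackedLocation.unpack(@packed_location)
  end
end
```

This `Entry#location` builds a fresh `Location` on every call instead of memoizing it, since caching would add an instance variable back to every entry whose location is read. Values outside the assumed bit widths raise here; a real implementation would need a fallback for them.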

### Remove @configuration from Entry instances
Every entry stored a reference to the shared Configuration. Moving this to
a module-level accessor eliminated one instance variable per entry. That
pushed Method and Class entries below Ruby's shape capacity threshold,
dropping them from 160-byte to 80-byte object slots and halving the object
size of the two largest entry types (388K methods, 76K classes).
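
A sketch of the module-level accessor, with hypothetical names (`IndexState`, `Configuration`, `Entry`): entries reach the shared configuration through a method instead of holding their own reference, so the per-entry instance variable disappears.

```ruby
# Hypothetical stand-in for the indexer's configuration object.
Configuration = Struct.new(:encoding)

module IndexState
  class << self
    # One shared reference for the whole index instead of one per entry.
    attr_accessor :configuration
  end
end

class Entry
  attr_reader :name

  def initialize(name)
    @name = name
    # No @configuration ivar: the entry's shape is one slot smaller.
  end

  def configuration
    IndexState.configuration
  end
end

IndexState.configuration = Configuration.new(:utf8)
Entry.new("Foo").instance_variables # => [:@name]
```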

### Minimize Entry object shapes
By leaving @comments unset during indexing (where it is always nil) and
deferring @visibility (which is :public for 83% of entries), the
common-case Entry shape has fewer instance variables, letting Ruby allocate
smaller objects.
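
The same idea applied to optional fields, again with a hypothetical `Entry`: instance variables that usually hold a default are never assigned, and readers fall back to that default. Only entries with non-default visibility, or with comments attached later, grow the extra instance variables; everything else shares the smaller shape.

```ruby
class Entry
  attr_reader :name
  # Reading an unset ivar returns nil, so @comments needs no initializer.
  attr_accessor :comments

  def initialize(name, visibility: :public)
    @name = name
    # Store @visibility only when it differs from the common default.
    @visibility = visibility unless visibility == :public
  end

  def visibility
    @visibility || :public
  end

  def visibility=(value)
    @visibility = value
  end
end

Entry.new("Foo").instance_variables                       # => [:@name]
Entry.new("bar", visibility: :private).instance_variables # => [:@name, :@visibility]
```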

## Other optimizations

- **String deduplication**: Use String#-@ to intern entry names, nesting
  components, module operation names, parent class names, and URI strings
- **Shared Signature singletons**: 221K parameterless methods share a
  single frozen Signature instance instead of each creating their own
- **Parameter interning**: Cache Parameter objects by (type, name) so
  methods with the same parameter names share objects
- **Deferred prefix tree build**: Skip incremental tree insertion during
  initial indexing; build in one pass at the end
- **Use Prism.parse_file**: Avoid creating a Ruby String for each file's
  contents during initial indexing
- **External store for entries PrefixTree**: The entries prefix tree
  references the @entries hash directly instead of duplicating it
- **Aggressive post-indexing GC**: 3 rounds of GC.start + GC.compact
  after indexing to minimize heap fragmentation
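
A sketch of the interning techniques from the list above, using hypothetical `Parameter`, `ParameterCache`, `Signature`, and `intern_name` definitions rather than the indexer's real classes. A nested hash hands back one frozen `Parameter` per `(type, name)` pair, every parameterless method receives the same frozen `Signature::EMPTY`, and `String#-@` returns a frozen, deduplicated copy of a string.

```ruby
# Hypothetical parameter value; names are Symbols in this sketch.
Parameter = Struct.new(:type, :name)

module ParameterCache
  @cache = {}

  # Returns one shared, frozen Parameter per distinct (type, name) pair.
  # The nested hash avoids allocating a [type, name] key array per lookup.
  def self.fetch(type, name)
    by_name = (@cache[type] ||= {})
    by_name[name] ||= Parameter.new(type, name).freeze
  end
end

class Signature
  attr_reader :parameters

  # Takes ownership of the array and freezes it.
  def initialize(parameters)
    @parameters = parameters.freeze
  end

  # Single shared instance for every parameterless method.
  EMPTY = new([]).freeze

  def self.build(parameters)
    parameters.empty? ? EMPTY : new(parameters)
  end
end

# String#-@ returns a frozen, deduplicated copy, so equal names share one object.
def intern_name(name)
  -name
end

sig = Signature.build([ParameterCache.fetch(:req, :name)])
Signature.build([]).equal?(Signature::EMPTY) # => true
```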

## Results (Shopify core monorepo)

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| RSS after indexing | 2,217 MB | ~548 MB | **-75%** |
| Startup time | 49.3s | ~39s | **-21%** |
| Index entries | 889K | 889K | unchanged |
| Files indexed | 64K | 64K | unchanged |

All 18,333 existing tests continue to pass.
@shauns closed this Mar 18, 2026
@shauns deleted the reduce-memory-and-startup-costs branch March 18, 2026 17:17