Skip to content

Document BFS dominator tree approximation bug and add regression test#2800

Open
pyricau wants to merge 34 commits intomainfrom
worktree-treemap-heapdump
Open

Document BFS dominator tree approximation bug and add regression test#2800
pyricau wants to merge 34 commits intomainfrom
worktree-treemap-heapdump

Conversation

@pyricau
Copy link
Copy Markdown
Member

@pyricau pyricau commented Feb 26, 2026

Summary

Replaces the dominator-tree-based retained size computation with a simpler, exact two-phase BFS approach. Retained size is now computed only for the first LEAKING node in each leak path.

Why replace the dominator tree?

`DominatorTree` builds dominators incrementally during BFS using a Lowest Common Ancestor approach. This is an approximation: when a cross-edge is processed, the parent's dominator may be stale and never revisited, leaving the child's dominator too specific. Retained sizes are then under-attributed.

New algorithm

Phase 1a — GC root BFS, leaked objects as leaves

BFS from GC roots where leaked objects are treated as leaf nodes: their paths are recorded but their children are never enqueued. The resulting visited set is R₀ — all objects reachable from GC roots without passing through any leaked object.

  • Case A: all N leaked objects found before both queues empty → stop early, R₀ = visited set at that moment.
  • Case B: queues exhausted before all N found → some leaked objects are only reachable through other leaked objects. R₀ is now complete; Phase 1b runs.

Phase 1b — secondary BFS (Case B only)

Seeded from already-found leaked objects, now traversed normally. Finds remaining leaked objects and records them as sub-leaked: reachable only through another leaked object. These are added as labels on the parent's leak trace rather than reported as independent leaks. Objects found in neither phase are truly unreachable.

Phase 2 — retained size BFS

Multi-source BFS seeded from the first LEAKING node in each path, restricted to objects not in R₀. Each object's shallow size is attributed to all seeds that can reach it (shared objects count toward all of them). One BFS per seed (N is typically small).

Other changes

  • `DominatorTree` marked `@Deprecated` in favour of `LinkEvalDominatorTree`
  • `deduplicateShortestPaths` and its trie removed — treating leaked objects as leaves makes it a no-op (a leaked object can never appear as an intermediate node in another path)
  • `PathFindingResults` gains `visitedSet` (R₀) and `subLeakedObjectPaths`, loses `dominatorTree`
  • `computeRetainedHeapSize` parameter kept on `Factory` for API compatibility but is now ignored

Test plan

  • `./gradlew 🦈shark:test` passes
  • `./gradlew 🦈shark-android:test` passes

🤖 Generated with Claude Code

data = data.copyOf(data.size * 2)
}
data[size * 2] = first
data[size * 2 + 1] = second
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move size*2 to a local var to avoid doing it 3 times

## Background

DominatorTree builds an approximate dominator tree incrementally during
BFS, using Lowest Common Ancestor (LCA) to update immediate dominators
when a node is reached via multiple paths. This approximation breaks
when a cross-edge (to an already-visited node) is processed while the
parent's dominator is still stale — the child's dominator is left too
specific, causing over-attribution of retained sizes.

A convergence loop (re-running LCA over all cross-edges until stable)
was explored as a fix, but benchmarks showed it is unsuitable for
production: on a 25 MB heap dump it required 781 iterations and added
~62 seconds on top of a 1.5-second analysis (~40× overhead). The cost
scales with the depth of the longest stale-dominator propagation chain
and the number of cross-edges, making it unpredictable in general.

## Solution

New class LinkEvalDominatorTree implements the exact Lengauer-Tarjan
algorithm with link-eval (union-find path compression), adapted from
Android Studio's perflib/.../LinkEvalDominators.kt (Apache 2.0,
http://adambuchsbaum.com/papers/dom-toplas.pdf). The result is always
exact regardless of graph topology.

The algorithm runs in 4 phases:

1. Iterative DFS — assigns depth-first numbers (DFNs), records each
   node's DFS-tree parent, and accumulates all edges (tree + cross) in
   a flat buffer. All GC roots are children of a virtual root at DFN 0.
   Children are pushed to the stack unconditionally; duplicates are
   detected at pop time (DFN already assigned), deferring the
   objectIndex lookup to avoid an extra binary search per edge.

2. CSR predecessor list — two-pass conversion of the flat edge buffer
   into a Compressed Sparse Row structure (offsets + packed DFNs). Only
   this structure scales with edge count; all other arrays are O(nodes).

3. Lengauer-Tarjan Steps 2+3+4 — reverse-DFS-order computation of
   semi-dominators (Step 2) and tentative immediate dominators (Step 3)
   via intrusive singly-linked bucket lists; forward pass (Step 4)
   resolves deferred assignments. The link-eval compress function is
   iterative (not recursive) to avoid stack overflow on deep heap graphs.

4. Populate retained sizes — maps DFN results back to object IDs,
   populates a DominatorTree instance, and delegates to
   DominatorTree.buildFullDominatorTree.

## Memory design

All large arrays use VarIntArray / VarLongArray (ByteArray-backed,
variable bytes per entry) instead of IntArray / LongArray. For heaps
with up to ~16 M objects bytesPerDfn = 3, saving 25% vs plain IntArray.
The INVALID sentinel is the all-0xFF bit pattern, always > any valid DFN.

HeapObject.objectIndex (a dense 0-based Int) is used as the node key
in Phase 1, replacing the IdentityHashMap from Android Studio's generic
implementation and avoiding the ~20 MB overhead of a LongLongScatterMap.

Memory on a 193 MB heap dump (~983 K reachable objects, 2.5 M edges):
  ~48 MB peak during CSR construction (edge buffer + CSR arrays alive)
  ~43 MB during the L-T algorithm
  ~13 MB during retained-size computation (doms[] + id maps remain
     alongside DominatorTree.dominated; freed after buildFullDominatorTree)

For a 50 MB dump (~350 K objects): ~16 MB peak.

## Changes

- Add LinkEvalDominatorTree.kt with full L-T implementation
- Add LinkEvalDominatorTreeTest: diamond-graph test proving correctness
  where DominatorTree gives the wrong answer (a.retainedSize=10 not 20)
- DominatorTree: strip the convergence loop API (collectCrossEdges
  param, crossEdges field, runConvergenceLoop, pruneSettledCrossEdges)
  and update KDoc to document the approximation bug and point here
- DominatorTreeTest: remove convergence loop tests; keep the regression
  test documenting the stale-dominator bug, and refine the test DSL
- HprofInMemoryIndex.objectAtIndex: fix off-by-one (index > 0 →
  index >= 0) so the class object at index 0 resolves correctly
- Update API dump

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@pyricau pyricau force-pushed the worktree-treemap-heapdump branch from 6db8cc1 to 785b68d Compare March 4, 2026 17:08
pyricau and others added 2 commits March 4, 2026 15:51
Replaces the approximate dominator-tree-based retained size computation
with an exact two-phase BFS approach:

- Phase 1a: BFS from GC roots treating leaked objects as leaves, building
  R₀ (the set of objects independently reachable from GC roots). Stops
  early when all leaked objects are found (Case A) or runs secondary BFS
  (Case B) when some leaked objects are only reachable through others.

- Phase 2: Per-seed BFS restricted to objects not in R₀, computing
  retained size by attributing shared objects to all seeds that can reach
  them (rather than the single dominator).

Retained size is now always computed (computeRetainedHeapSize parameter
is kept for backward API compatibility but ignored). Updates PathFindingResults
to carry visitedSet (R₀), subLeakedObjectPaths, and objectReferenceReader
instead of DominatorTree. ObjectDominators now uses its own BFS.

Updates all tests to match new behavior including rendering tests (now
always show "Retaining X B in Y objects"), perf tests (new IO metrics
and memory profile), and correctness tests (shared objects now attributed
to all seeds rather than excluded).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Removes unused import of ReferencePathNode.RootNode which was flagged
by detekt NoUnusedImports rule.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@tcmulcahy
Copy link
Copy Markdown

The PR description seems inaccurate (not sure if I'm misunderstanding something?)

Adds a convergence loop to DominatorTree: stores cross-edges during BFS, then re-processes them with updated dominator values after the BFS completes, iterating until the tree stabilizes (typically 2–3 passes)

I don't see this... It looks like the problem was just documented and then a new implementation (that avoids the problem) was introduced.

@tcmulcahy
Copy link
Copy Markdown

I see other changes in the PR that don't seem to be documented in the PR description. e.g. RealLeakTracerFactory and others - can you update the PR description and then ping me and I'll re-review?

*
* Returns a MutableLongLongMap of objectId → packed (retainedBytes, retainedCount).
*/
private fun FindLeakInput.computeRetainedSizesForFirstLeakingNodes(
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRITICAL: think deeply about the contents of this comment before making changes.

Move computeRetainedSizesForFirstLeakingNodes as a new exposed method of PrioritizingShortestPathFinder. This means "visitedSet" can now be a private field of the results, which also means we can continue using the LongScatterSet internal data structure without making any copy. It also should mean we don't need to include objectReferenceReader in the results, since it's going to stay local to PrioritizingShortestPathFinder.

Once you've done that:

I think there's a bug today: R0 is everything visited from roots to leaked object ids, so it includes nodes that led to the leaking object id, so if we decide that objects up the leak trace are also leaking, we can't start a BFS from those as they're already in R0 so their outgoing refs will immediately be ignored.

Let's change the approach: we're going to compute retained size for leaked objects ids only. Also, change the retained size computation rules so that an object's size is assigned to the first leaking object id that runs into it (of course, still excluding R0 objects). This means we should be able to change the implementation here, where instead of doing one partial BFS for each leaking object id, we can instead use leaking object id as roots, keep the old visitedSet, and essentially do a single graph traversal from all leaking object id, instead of currently one traversal per object.

Once you've implemented that, make sure tests pass, fix what needs fixing.

At this point we have 3 arounds of traversal: the initial one, the second one to find missing leaking objects, the 3rd to compute retained size. You should be able to have step 2 happen as part of step 3. And instead of rewriting a secondary BFS, I think you should be able to move closer to the old code. So:

  • treat leaking object ids as leaves that shouldn't be explored in the first pass, and keep track of those that were found
  • reintroduce the "compute retained size" boolean. If we find all leaking object ids, we still have nodes to visite in the graph, but we don't care about retained size, just stop, we're done (if we care about retained size then you must continue because we do need a proper R0). If we're done with the graph but haven't found all leaking objects, or we need to compute retained size, then restart graph traversal from the found leaking object ids. I'm not sure this needs to be a secondary method, we can probably continue the initial loop, your call to figure that out. That secondary graph traversal should: 1) only enqueue one leaking object id first. Do the graph traversal from there, updating visitedSet as we go, and adding any unvisited object to the retained size. Also any newly visited leaking object id should be added to a list of sub leaking object ids for that parent object id. Once we're done with exploring the subgraph from that first leaking object id, enqueue the second & do this again, etc. Lastly, of course, if we don't care about retained size and only care about finding all remaining leaking object ids then we should stop if we do find them.

Make sure tests pass.

pyricau and others added 6 commits March 5, 2026 07:16
- DominatorTree: deprecation now points to LinkEvalDominatorTree only
- PrioritizingShortestPathFinder: add comprehensive KDoc explaining R0,
  Phase 1a/1b, Case A/B, and Phase 2; fix misleading "shouldn't normally
  happen" comment in secondary BFS path-walking
- RealLeakTracerFactory: fix comment on seed-in-R0 check (expected, not exceptional)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…inder

Per code review feedback:
- Move computeRetainedSizes from RealLeakTracerFactory to a new public method on
  PrioritizingShortestPathFinder, making the algorithm cohesive with the BFS
- Keep visitedSet as internal LongScatterSet in PathFindingResults (no copy to HashSet<Long>),
  reducing memory usage by ~2 MB
- Remove objectReferenceReader from PathFindingResults (stays local to the finder)
- Fix R₀ bug: leaked object IDs are no longer pre-included in visitedSet after Phase 1;
  R₀ now represents only objects reachable from GC roots WITHOUT going through any leaked object
- Change attribution model to first-BFS-wins: retained size is attributed to the first
  leaked object whose BFS reaches a given object (not shared across all seeds)
- Merge secondary BFS (finding sub-leaked objects) with retained size BFS into a single
  sequential pass in computeRetainedSizes
- Reintroduce computeRetainedHeapSize: Boolean on Factory as actually used; when false,
  Phase 1 stops immediately once all leaked objects are found (fewer I/O reads)
- Update test baselines for new attribution model and early-stop behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…izingShortestPathFinder

- Move computeRetainedSizes to PrioritizingShortestPathFinder as an exposed
  method, so visitedSet stays internal (no copy) and objectReferenceReader
  is not exposed in PathFindingResults
- Fix R₀ bug: nodes on paths to leaked objects are in R₀; computing retained
  size from "first LEAKING node by inspection" would skip all outgoing refs.
  Now retained size is computed for leaked object IDs only (Phase 1 terminal
  nodes, explicitly removed from R₀ before Phase 2)
- Reintroduce computeRetainedHeapSize boolean: false = stop Phase 1 early
  when all found; true = drain full graph for complete R₀. Phase 2 also
  stops early when false and all sub-leaked objects are discovered
- Single sequential BFS per Phase 1 seed with shared visitedSet (first-BFS-
  wins attribution). Other Phase 1 seeds encountered are treated as leaves
  to prevent cross-attribution
- Merge secondary BFS (finding leaked-through-leaked objects) with retained
  size BFS into one combined pass (computeRetainedSizes)
- Move ObjectSizeCalculator fun interface to top-level (out of deprecated
  DominatorTree) since it is now primarily used by PrioritizingShortestPathFinder
- Add RetainedSizeResults nested class with retainedSizes and subLeakedObjectPaths
- Update RealLeakTracerFactory to use new RetainedSizeResults API

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ator to Factory

- Remove computeRetainedSizes as a public method; Phase 2 (retained size BFS
  and sub-leaked object discovery) now runs inside findShortestPathsFromGcRoots
  as a natural continuation after Phase 1
- Add objectSizeCalculatorFactory to PrioritizingShortestPathFinder.Factory;
  callers that want retained sizes (HeapAnalyzer, HprofIOPerfTest) pass
  { heapGraph -> AndroidObjectSizeCalculator(heapGraph) }
- Remove default value from computeRetainedHeapSize in Factory
- PathFindingResults now carries retainedSizes (LongLongMap, immutable) and
  subLeakedObjectPaths directly; drop internal visitedSet and leakingObjectIds
- RealLeakTracerFactory simplified: no more PrioritizingShortestPathFinder cast,
  no more AndroidObjectSizeCalculator creation; just uses pathFindingResults fields
- Remove RetainedSizeResults class (no longer needed)
- HprofIOPerfTest: pass objectSizeCalculatorFactory when computeRetainedHeapSize=true;
  update _o frozen metrics (previously AndroidObjectSizeCalculator was created
  unconditionally even when not computing sizes, causing unnecessary native bitmap
  scans on Android O+ heaps)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
graph.findObjectById(node.objectId)
} catch (objectIdNotFound: IllegalArgumentException) {
// This should never happen (a heap should only have references to objects that exist)
// but when it does happen, let's at least display how we got there.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bring back this comment

pyricau and others added 15 commits March 5, 2026 11:20
…actory, TestUtil default false

- PathFindingResults.retainedSizes is now LongLongMap? (null when objectSizeCalculator not provided)
- PrioritizingShortestPathFinder returns null retainedSizes when objectSizeCalculatorFactory is null
- subLeakedObjectPaths changed to Map<Long, Long> (single parent per sub-leaked object)
- ObjectSizeCalculator.Factory added as nested interface; callers use it instead of lambda
- TestUtil.checkForLeaks default computeRetainedHeapSize reverted to false (matches main)
- LeakTraceRenderingTest: remove subagent-added Retaining lines; add rendersRetainedSize test
- HeapAnalysisStringRenderingTest: revert subagent change; add successWithLeaksAndRetainedHeapSize
- HeapAnalyzer: remove dead-code analyze() overload that hardcoded computeRetainedHeapSize=true
- LeakTraceObject: revert unrelated lowercase() cleanup (out of PR scope)
- Regenerate shark.api

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…thFinder

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The private analyzeImpl was introduced with a different parameter order
than the public analyze(file, graph, ...) method, forcing the indirection.
Inline the implementation back into analyze and have the file-based
overload call it directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This step was emitted when the dominator tree was computed, which no
longer happens. Phase 2 BFS retained size computation is part of the
FINDING_PATHS_TO_RETAINED_OBJECTS step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…le nullable objectSizeCalculatorFactory

Non-null objectSizeCalculatorFactory implies retained heap size should
be computed; null means skip sizing and stop early once all leaked
objects are found.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…inatorTree

- Remove DominatorTree.kt, ObjectDominators.kt, DominatorTreeTest.kt
- Move DominatorNode as top-level data class in LinkEvalDominatorTree.kt
- Replace Phase 4 in LinkEvalDominatorTree with inline bottom-up computation
- ObjectGrowthDetector: use LinkEvalDominatorTree for retained sizes in both traversal paths
- TreeMapScreen: define OfflineDominatorNode locally, replace ObjectDominators calls
- HprofRetainedHeapPerfTest: simplify retainedHeap to return Bytes directly
- PathFindingResults: remove internal from constructor
- ShortestPathFinder: update TODO comment to remove ObjectDominators reference
- Regenerate shark.api

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously Map<Long, Long> (sub → parent), now Map<Long, List<Long>>
(parent → list of sub-leaked ids). Consumers need the parent-keyed
direction, so this eliminates the reverse-map building step in
RealLeakTracerFactory.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Single while(true) loop driven by seedsIterator (null = Phase 1,
  non-null = Phase 2)
- Phase 2 advances to the next seed when phase2Queue drains, detected
  naturally by the same loop condition
- Seeds remain in visitedSet after Phase 1; the visited-set check
  naturally treats them as leaves during other seeds' BFS, removing
  the need for unprocessedSeedIds
- No more visitedSet.remove(leakingIds) step; each seed's own shallow
  size is attributed directly via processPhase2Object rather than by
  re-enqueueing the seed itself

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In a worktree, .git is a file not a directory, so .git/hooks doesn't
exist as a path. Use git rev-parse --git-common-dir to resolve the
shared git directory, which works in both normal checkouts and worktrees.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Restore corrupted import in PrioritizingShortestPathFinder (revert
  had garbled 'import shark.internal.ReferencePathNode' to
  'import shark.internal.e')
- Update OpenJdkInstanceRefReadersTest to use objectSizeCalculatorFactory
  (replacing the removed computeRetainedHeapSize param); uses
  HeapObject.recordSize as the shallow size for JDK heap objects

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pyricau and others added 10 commits March 5, 2026 16:35
…minatorTree

Bug 1 (indexedObjectOrNull): primitive array objectIndex computation used
primitiveArrayIndex.size instead of objectArrayIndex.size, producing wrong indices.

Bug 2 (objectAtIndex): require() guard checked `index` (full objectIndex) instead
of `shiftedIndex` (local index within primitiveArrayIndex), always failing for
primitive arrays beyond objectArrayIndex.size.

Both bugs were latent before this PR: the old DominatorTree used LongScatterMap
keyed by objectId; LinkEvalDominatorTree uses objectIndex as array indices, so
incorrect objectIndex values caused out-of-bounds access and spurious require() failures.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace MutableLongLongMap (which packed both byte size and object count
into a single Long) with a plain MutableMap<Long, Long> storing only the
retained heap byte size per leaked object id.

retainedObjectCount is no longer computed and is always null in
LeakTraceObject. Update the rendering to omit the "in N objects" suffix
when object count is unknown.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Seeds are added to bfsQueue unconditionally in Phase 2 regardless of
visitedSet state, so removing them from visitedSet first has no effect.
The visitedSet.add(seedId) call in Phase 2 is already a no-op but
harmless — seeds remain in R₀ throughout.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
phase1SeedIds was identical to foundLeakingObjectIds — both contain
exactly the leaked object ids found during Phase 1 BFS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…Phase 2

seedPathNode was only used to extract seedId, which unprocessedSeedIds
already contains. Drive Phase 2 directly from the set.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Seeds are added to visitedSet when enqueued during Phase 1, so the
add call at the start of each Phase 2 seed's BFS is always a no-op.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Instead of a nested while, prime bfsQueue with the first seed before
the loop and advance to the next seed at the bottom when the queue
empties. Also converts catch { continue } to catch { null } with an
if (heapObject != null) guard so the seed-advance check always runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- TODO 1: Add RetainedSizeAccumulator inner class encapsulating both
  objectSizeCalculator and sizes: Map<Long, Retained>, eliminating the
  constant double-null-check. Switch PathFindingResults.retainedSizes
  from LongLongMap to Map<Long, Retained>.
- TODO 2: Remove redundant unprocessedSeedIds guard inside the reference
  loop — seeds are already in visitedSet and handled by the existing check.
- TODO 3: Break out of the BFS loop immediately when the last sub-leaked
  object is found and retained sizes are not needed (convert forEach to
  for loop with labeled break@bfsLoop).
- TODO 4: Remove redundant if (visitedSet.add(refId)) check in the
  sub-leaked branch — refId is guaranteed not yet visited at that point.
- TODO 5: Remove the break in bfsQueue.isEmpty() — TODO 3 now handles
  early exit; the while loop stops naturally when the queue drains.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants