Document BFS dominator tree approximation bug and add regression test#2800
Document BFS dominator tree approximation bug and add regression test#2800
Conversation
| data = data.copyOf(data.size * 2) | ||
| } | ||
| data[size * 2] = first | ||
| data[size * 2 + 1] = second |
There was a problem hiding this comment.
Move size*2 to a local var to avoid doing it 3 times
## Background DominatorTree builds an approximate dominator tree incrementally during BFS, using Lowest Common Ancestor (LCA) to update immediate dominators when a node is reached via multiple paths. This approximation breaks when a cross-edge (to an already-visited node) is processed while the parent's dominator is still stale — the child's dominator is left too specific, causing over-attribution of retained sizes. A convergence loop (re-running LCA over all cross-edges until stable) was explored as a fix, but benchmarks showed it is unsuitable for production: on a 25 MB heap dump it required 781 iterations and added ~62 seconds on top of a 1.5-second analysis (~40× overhead). The cost scales with the depth of the longest stale-dominator propagation chain and the number of cross-edges, making it unpredictable in general. ## Solution New class LinkEvalDominatorTree implements the exact Lengauer-Tarjan algorithm with link-eval (union-find path compression), adapted from Android Studio's perflib/.../LinkEvalDominators.kt (Apache 2.0, http://adambuchsbaum.com/papers/dom-toplas.pdf). The result is always exact regardless of graph topology. The algorithm runs in 4 phases: 1. Iterative DFS — assigns depth-first numbers (DFNs), records each node's DFS-tree parent, and accumulates all edges (tree + cross) in a flat buffer. All GC roots are children of a virtual root at DFN 0. Children are pushed to the stack unconditionally; duplicates are detected at pop time (DFN already assigned), deferring the objectIndex lookup to avoid an extra binary search per edge. 2. CSR predecessor list — two-pass conversion of the flat edge buffer into a Compressed Sparse Row structure (offsets + packed DFNs). Only this structure scales with edge count; all other arrays are O(nodes). 3. Lengauer-Tarjan Steps 2+3+4 — reverse-DFS-order computation of semi-dominators (Step 2) and tentative immediate dominators (Step 3) via intrusive singly-linked bucket lists; forward pass (Step 4) resolves deferred assignments. The link-eval compress function is iterative (not recursive) to avoid stack overflow on deep heap graphs. 4. Populate retained sizes — maps DFN results back to object IDs, populates a DominatorTree instance, and delegates to DominatorTree.buildFullDominatorTree. ## Memory design All large arrays use VarIntArray / VarLongArray (ByteArray-backed, variable bytes per entry) instead of IntArray / LongArray. For heaps with up to ~16 M objects bytesPerDfn = 3, saving 25% vs plain IntArray. The INVALID sentinel is the all-0xFF bit pattern, always > any valid DFN. HeapObject.objectIndex (a dense 0-based Int) is used as the node key in Phase 1, replacing the IdentityHashMap from Android Studio's generic implementation and avoiding the ~20 MB overhead of a LongLongScatterMap. Memory on a 193 MB heap dump (~983 K reachable objects, 2.5 M edges): ~48 MB peak during CSR construction (edge buffer + CSR arrays alive) ~43 MB during the L-T algorithm ~13 MB during retained-size computation (doms[] + id maps remain alongside DominatorTree.dominated; freed after buildFullDominatorTree) For a 50 MB dump (~350 K objects): ~16 MB peak. ## Changes - Add LinkEvalDominatorTree.kt with full L-T implementation - Add LinkEvalDominatorTreeTest: diamond-graph test proving correctness where DominatorTree gives the wrong answer (a.retainedSize=10 not 20) - DominatorTree: strip the convergence loop API (collectCrossEdges param, crossEdges field, runConvergenceLoop, pruneSettledCrossEdges) and update KDoc to document the approximation bug and point here - DominatorTreeTest: remove convergence loop tests; keep the regression test documenting the stale-dominator bug, and refine the test DSL - HprofInMemoryIndex.objectAtIndex: fix off-by-one (index > 0 → index >= 0) so the class object at index 0 resolves correctly - Update API dump Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
6db8cc1 to
785b68d
Compare
Replaces the approximate dominator-tree-based retained size computation with an exact two-phase BFS approach: - Phase 1a: BFS from GC roots treating leaked objects as leaves, building R₀ (the set of objects independently reachable from GC roots). Stops early when all leaked objects are found (Case A) or runs secondary BFS (Case B) when some leaked objects are only reachable through others. - Phase 2: Per-seed BFS restricted to objects not in R₀, computing retained size by attributing shared objects to all seeds that can reach them (rather than the single dominator). Retained size is now always computed (computeRetainedHeapSize parameter is kept for backward API compatibility but ignored). Updates PathFindingResults to carry visitedSet (R₀), subLeakedObjectPaths, and objectReferenceReader instead of DominatorTree. ObjectDominators now uses its own BFS. Updates all tests to match new behavior including rendering tests (now always show "Retaining X B in Y objects"), perf tests (new IO metrics and memory profile), and correctness tests (shared objects now attributed to all seeds rather than excluded). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Removes unused import of ReferencePathNode.RootNode which was flagged by detekt NoUnusedImports rule. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
The PR description seems inaccurate (not sure if I'm misunderstanding something?)
I don't see this... It looks like the problem was just documented and then a new implementation (that avoids the problem) was introduced. |
|
I see other changes in the PR that don't seem to be documented in the PR description. e.g. RealLeakTracerFactory and others - can you update the PR description and then ping me and I'll re-review? |
| * | ||
| * Returns a MutableLongLongMap of objectId → packed (retainedBytes, retainedCount). | ||
| */ | ||
| private fun FindLeakInput.computeRetainedSizesForFirstLeakingNodes( |
There was a problem hiding this comment.
CRITICAL: think deeply about the contents of this comment before making changes.
Move computeRetainedSizesForFirstLeakingNodes as a new exposed method of PrioritizingShortestPathFinder. This means "visitedSet" can now be a private field of the results, which also means we can continue using the LongScatterSet internal data structure without making any copy. It also should mean we don't need to include objectReferenceReader in the results, since it's going to stay local to PrioritizingShortestPathFinder.
Once you've done that:
I think there's a bug today: R0 is everything visited from roots to leaked object ids, so it includes nodes that led to the leaking object id, so if we decide that objects up the leak trace are also leaking, we can't start a BFS from those as they're already in R0 so their outgoing refs will immediately be ignored.
Let's change the approach: we're going to compute retained size for leaked objects ids only. Also, change the retained size computation rules so that an object's size is assigned to the first leaking object id that runs into it (of course, still excluding R0 objects). This means we should be able to change the implementation here, where instead of doing one partial BFS for each leaking object id, we can instead use leaking object id as roots, keep the old visitedSet, and essentially do a single graph traversal from all leaking object id, instead of currently one traversal per object.
Once you've implemented that, make sure tests pass, fix what needs fixing.
At this point we have 3 arounds of traversal: the initial one, the second one to find missing leaking objects, the 3rd to compute retained size. You should be able to have step 2 happen as part of step 3. And instead of rewriting a secondary BFS, I think you should be able to move closer to the old code. So:
- treat leaking object ids as leaves that shouldn't be explored in the first pass, and keep track of those that were found
- reintroduce the "compute retained size" boolean. If we find all leaking object ids, we still have nodes to visite in the graph, but we don't care about retained size, just stop, we're done (if we care about retained size then you must continue because we do need a proper R0). If we're done with the graph but haven't found all leaking objects, or we need to compute retained size, then restart graph traversal from the found leaking object ids. I'm not sure this needs to be a secondary method, we can probably continue the initial loop, your call to figure that out. That secondary graph traversal should: 1) only enqueue one leaking object id first. Do the graph traversal from there, updating visitedSet as we go, and adding any unvisited object to the retained size. Also any newly visited leaking object id should be added to a list of sub leaking object ids for that parent object id. Once we're done with exploring the subgraph from that first leaking object id, enqueue the second & do this again, etc. Lastly, of course, if we don't care about retained size and only care about finding all remaining leaking object ids then we should stop if we do find them.
Make sure tests pass.
- DominatorTree: deprecation now points to LinkEvalDominatorTree only - PrioritizingShortestPathFinder: add comprehensive KDoc explaining R0, Phase 1a/1b, Case A/B, and Phase 2; fix misleading "shouldn't normally happen" comment in secondary BFS path-walking - RealLeakTracerFactory: fix comment on seed-in-R0 check (expected, not exceptional) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…inder Per code review feedback: - Move computeRetainedSizes from RealLeakTracerFactory to a new public method on PrioritizingShortestPathFinder, making the algorithm cohesive with the BFS - Keep visitedSet as internal LongScatterSet in PathFindingResults (no copy to HashSet<Long>), reducing memory usage by ~2 MB - Remove objectReferenceReader from PathFindingResults (stays local to the finder) - Fix R₀ bug: leaked object IDs are no longer pre-included in visitedSet after Phase 1; R₀ now represents only objects reachable from GC roots WITHOUT going through any leaked object - Change attribution model to first-BFS-wins: retained size is attributed to the first leaked object whose BFS reaches a given object (not shared across all seeds) - Merge secondary BFS (finding sub-leaked objects) with retained size BFS into a single sequential pass in computeRetainedSizes - Reintroduce computeRetainedHeapSize: Boolean on Factory as actually used; when false, Phase 1 stops immediately once all leaked objects are found (fewer I/O reads) - Update test baselines for new attribution model and early-stop behavior Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…izingShortestPathFinder - Move computeRetainedSizes to PrioritizingShortestPathFinder as an exposed method, so visitedSet stays internal (no copy) and objectReferenceReader is not exposed in PathFindingResults - Fix R₀ bug: nodes on paths to leaked objects are in R₀; computing retained size from "first LEAKING node by inspection" would skip all outgoing refs. Now retained size is computed for leaked object IDs only (Phase 1 terminal nodes, explicitly removed from R₀ before Phase 2) - Reintroduce computeRetainedHeapSize boolean: false = stop Phase 1 early when all found; true = drain full graph for complete R₀. Phase 2 also stops early when false and all sub-leaked objects are discovered - Single sequential BFS per Phase 1 seed with shared visitedSet (first-BFS- wins attribution). Other Phase 1 seeds encountered are treated as leaves to prevent cross-attribution - Merge secondary BFS (finding leaked-through-leaked objects) with retained size BFS into one combined pass (computeRetainedSizes) - Move ObjectSizeCalculator fun interface to top-level (out of deprecated DominatorTree) since it is now primarily used by PrioritizingShortestPathFinder - Add RetainedSizeResults nested class with retainedSizes and subLeakedObjectPaths - Update RealLeakTracerFactory to use new RetainedSizeResults API Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ator to Factory
- Remove computeRetainedSizes as a public method; Phase 2 (retained size BFS
and sub-leaked object discovery) now runs inside findShortestPathsFromGcRoots
as a natural continuation after Phase 1
- Add objectSizeCalculatorFactory to PrioritizingShortestPathFinder.Factory;
callers that want retained sizes (HeapAnalyzer, HprofIOPerfTest) pass
{ heapGraph -> AndroidObjectSizeCalculator(heapGraph) }
- Remove default value from computeRetainedHeapSize in Factory
- PathFindingResults now carries retainedSizes (LongLongMap, immutable) and
subLeakedObjectPaths directly; drop internal visitedSet and leakingObjectIds
- RealLeakTracerFactory simplified: no more PrioritizingShortestPathFinder cast,
no more AndroidObjectSizeCalculator creation; just uses pathFindingResults fields
- Remove RetainedSizeResults class (no longer needed)
- HprofIOPerfTest: pass objectSizeCalculatorFactory when computeRetainedHeapSize=true;
update _o frozen metrics (previously AndroidObjectSizeCalculator was created
unconditionally even when not computing sizes, causing unnecessary native bitmap
scans on Android O+ heaps)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| graph.findObjectById(node.objectId) | ||
| } catch (objectIdNotFound: IllegalArgumentException) { | ||
| // This should never happen (a heap should only have references to objects that exist) | ||
| // but when it does happen, let's at least display how we got there. |
…actory, TestUtil default false - PathFindingResults.retainedSizes is now LongLongMap? (null when objectSizeCalculator not provided) - PrioritizingShortestPathFinder returns null retainedSizes when objectSizeCalculatorFactory is null - subLeakedObjectPaths changed to Map<Long, Long> (single parent per sub-leaked object) - ObjectSizeCalculator.Factory added as nested interface; callers use it instead of lambda - TestUtil.checkForLeaks default computeRetainedHeapSize reverted to false (matches main) - LeakTraceRenderingTest: remove subagent-added Retaining lines; add rendersRetainedSize test - HeapAnalysisStringRenderingTest: revert subagent change; add successWithLeaksAndRetainedHeapSize - HeapAnalyzer: remove dead-code analyze() overload that hardcoded computeRetainedHeapSize=true - LeakTraceObject: revert unrelated lowercase() cleanup (out of PR scope) - Regenerate shark.api Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…thFinder Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The private analyzeImpl was introduced with a different parameter order than the public analyze(file, graph, ...) method, forcing the indirection. Inline the implementation back into analyze and have the file-based overload call it directly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This step was emitted when the dominator tree was computed, which no longer happens. Phase 2 BFS retained size computation is part of the FINDING_PATHS_TO_RETAINED_OBJECTS step. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…le nullable objectSizeCalculatorFactory Non-null objectSizeCalculatorFactory implies retained heap size should be computed; null means skip sizing and stop early once all leaked objects are found. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…inatorTree - Remove DominatorTree.kt, ObjectDominators.kt, DominatorTreeTest.kt - Move DominatorNode as top-level data class in LinkEvalDominatorTree.kt - Replace Phase 4 in LinkEvalDominatorTree with inline bottom-up computation - ObjectGrowthDetector: use LinkEvalDominatorTree for retained sizes in both traversal paths - TreeMapScreen: define OfflineDominatorNode locally, replace ObjectDominators calls - HprofRetainedHeapPerfTest: simplify retainedHeap to return Bytes directly - PathFindingResults: remove internal from constructor - ShortestPathFinder: update TODO comment to remove ObjectDominators reference - Regenerate shark.api Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously Map<Long, Long> (sub → parent), now Map<Long, List<Long>> (parent → list of sub-leaked ids). Consumers need the parent-keyed direction, so this eliminates the reverse-map building step in RealLeakTracerFactory. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Single while(true) loop driven by seedsIterator (null = Phase 1, non-null = Phase 2) - Phase 2 advances to the next seed when phase2Queue drains, detected naturally by the same loop condition - Seeds remain in visitedSet after Phase 1; the visited-set check naturally treats them as leaves during other seeds' BFS, removing the need for unprocessedSeedIds - No more visitedSet.remove(leakingIds) step; each seed's own shallow size is attributed directly via processPhase2Object rather than by re-enqueueing the seed itself Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This reverts commit b83c52f.
This reverts commit 9e630e3.
In a worktree, .git is a file not a directory, so .git/hooks doesn't exist as a path. Use git rev-parse --git-common-dir to resolve the shared git directory, which works in both normal checkouts and worktrees. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Restore corrupted import in PrioritizingShortestPathFinder (revert had garbled 'import shark.internal.ReferencePathNode' to 'import shark.internal.e') - Update OpenJdkInstanceRefReadersTest to use objectSizeCalculatorFactory (replacing the removed computeRetainedHeapSize param); uses HeapObject.recordSize as the shallow size for JDK heap objects Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…minatorTree Bug 1 (indexedObjectOrNull): primitive array objectIndex computation used primitiveArrayIndex.size instead of objectArrayIndex.size, producing wrong indices. Bug 2 (objectAtIndex): require() guard checked `index` (full objectIndex) instead of `shiftedIndex` (local index within primitiveArrayIndex), always failing for primitive arrays beyond objectArrayIndex.size. Both bugs were latent before this PR: the old DominatorTree used LongScatterMap keyed by objectId; LinkEvalDominatorTree uses objectIndex as array indices, so incorrect objectIndex values caused out-of-bounds access and spurious require() failures. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace MutableLongLongMap (which packed both byte size and object count into a single Long) with a plain MutableMap<Long, Long> storing only the retained heap byte size per leaked object id. retainedObjectCount is no longer computed and is always null in LeakTraceObject. Update the rendering to omit the "in N objects" suffix when object count is unknown. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nly)" This reverts commit 09c224a.
Seeds are added to bfsQueue unconditionally in Phase 2 regardless of visitedSet state, so removing them from visitedSet first has no effect. The visitedSet.add(seedId) call in Phase 2 is already a no-op but harmless — seeds remain in R₀ throughout. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
phase1SeedIds was identical to foundLeakingObjectIds — both contain exactly the leaked object ids found during Phase 1 BFS. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…Phase 2 seedPathNode was only used to extract seedId, which unprocessedSeedIds already contains. Drive Phase 2 directly from the set. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Seeds are added to visitedSet when enqueued during Phase 1, so the add call at the start of each Phase 2 seed's BFS is always a no-op. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Instead of a nested while, prime bfsQueue with the first seed before
the loop and advance to the next seed at the bottom when the queue
empties. Also converts catch { continue } to catch { null } with an
if (heapObject != null) guard so the seed-advance check always runs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- TODO 1: Add RetainedSizeAccumulator inner class encapsulating both objectSizeCalculator and sizes: Map<Long, Retained>, eliminating the constant double-null-check. Switch PathFindingResults.retainedSizes from LongLongMap to Map<Long, Retained>. - TODO 2: Remove redundant unprocessedSeedIds guard inside the reference loop — seeds are already in visitedSet and handled by the existing check. - TODO 3: Break out of the BFS loop immediately when the last sub-leaked object is found and retained sizes are not needed (convert forEach to for loop with labeled break@bfsLoop). - TODO 4: Remove redundant if (visitedSet.add(refId)) check in the sub-leaked branch — refId is guaranteed not yet visited at that point. - TODO 5: Remove the break in bfsQueue.isEmpty() — TODO 3 now handles early exit; the while loop stops naturally when the queue drains. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Replaces the dominator-tree-based retained size computation with a simpler, exact two-phase BFS approach. Retained size is now computed only for the first LEAKING node in each leak path.
Why replace the dominator tree?
`DominatorTree` builds dominators incrementally during BFS using a Lowest Common Ancestor approach. This is an approximation: when a cross-edge is processed, the parent's dominator may be stale and never revisited, leaving the child's dominator too specific. Retained sizes are then under-attributed.
New algorithm
Phase 1a — GC root BFS, leaked objects as leaves
BFS from GC roots where leaked objects are treated as leaf nodes: their paths are recorded but their children are never enqueued. The resulting visited set is R₀ — all objects reachable from GC roots without passing through any leaked object.
Phase 1b — secondary BFS (Case B only)
Seeded from already-found leaked objects, now traversed normally. Finds remaining leaked objects and records them as sub-leaked: reachable only through another leaked object. These are added as labels on the parent's leak trace rather than reported as independent leaks. Objects found in neither phase are truly unreachable.
Phase 2 — retained size BFS
Multi-source BFS seeded from the first LEAKING node in each path, restricted to objects not in R₀. Each object's shallow size is attributed to all seeds that can reach it (shared objects count toward all of them). One BFS per seed (N is typically small).
Other changes
Test plan
🤖 Generated with Claude Code