[processor/spanpruning] Optimize executeAggregations by reusing trace tree (#47771)
#### Description
Eliminate the parentReplacements and spansToRemove maps from
executeAggregations by leveraging the existing traceTree structure.
Parent replacement lookups now walk the tree's parent pointers via a new
replacementSpanID field on spanNode, and span removal uses the tree's
markedForRemoval flags in a single pass per ScopeSpans.
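A minimal sketch of the idea, not the processor's actual code: `spanNode` here is a simplified stand-in that keeps only the fields named above, and `resolveParentID` and `pruneScope` are hypothetical helper names chosen for illustration. It assumes a removed node carries the ID of the span that replaces it, so a surviving child can find its new parent by walking up the tree instead of consulting a separate map.

```go
package main

import "fmt"

// spanNode is a simplified, hypothetical version of the tree node.
// Only the fields mentioned in the description are modelled.
type spanNode struct {
	spanID            string
	parent            *spanNode
	markedForRemoval  bool
	replacementSpanID string // set on a removed node: the span that replaces it
}

// resolveParentID walks the parent pointers and returns the ID the node
// should use as its parent after pruning: the nearest surviving ancestor,
// or the replacement recorded on a removed ancestor.
func resolveParentID(n *spanNode) string {
	for p := n.parent; p != nil; p = p.parent {
		if !p.markedForRemoval {
			return p.spanID
		}
		if p.replacementSpanID != "" {
			return p.replacementSpanID
		}
	}
	return "" // no surviving ancestor: the node becomes a root
}

// pruneScope drops marked nodes in one pass over a scope's spans,
// reusing the backing array rather than building a removal set.
func pruneScope(nodes []*spanNode) []*spanNode {
	kept := nodes[:0]
	for _, n := range nodes {
		if !n.markedForRemoval {
			kept = append(kept, n)
		}
	}
	return kept
}

func main() {
	root := &spanNode{spanID: "root"}
	removed := &spanNode{spanID: "a", parent: root, markedForRemoval: true, replacementSpanID: "summary"}
	child := &spanNode{spanID: "b", parent: removed}

	fmt.Println(resolveParentID(child))                             // summary
	fmt.Println(len(pruneScope([]*spanNode{root, removed, child}))) // 2
}
```

Both lookups read state that already lives on the tree, which is why no per-call maps are needed in this sketch.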
There are two benefits to this change:
1. Fix OOMs when a trace is fragmented across many `ScopeSpans`.
Currently, for each `ScopeSpans` a map sized to
`len(spans-in-aggregation)` is pre-allocated and added to
`spansToRemove`, so the requested capacity grows with the number of
`ScopeSpans` multiplied by the aggregation size. Removing
`spansToRemove` eliminates these allocations. Avoiding the
pre-allocation alone would also fix the OOM.
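2. Using the existing tree instead of the two maps is also roughly
10-20% faster on most benchmarks (2-7% on the small and large traces;
`SparseAggregation` is 1.72% slower), and reduces the amount of data
allocated to process each trace.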
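To make the first point concrete, here is a rough back-of-the-envelope sketch of how the pre-allocation scales. The function name `requestedMapEntries` and the input sizes are hypothetical, and it assumes a single aggregation group whose size is used as the capacity hint for every `ScopeSpans`.

```go
package main

import "fmt"

// requestedMapEntries estimates the total map capacity requested when one
// map of size spansInAggregation is pre-allocated per ScopeSpans.
// This is illustrative arithmetic, not code from the processor.
func requestedMapEntries(scopeSpansCount, spansInAggregation int) int {
	return scopeSpansCount * spansInAggregation
}

func main() {
	// A trace whose 10,000 aggregated spans all share one ScopeSpans.
	fmt.Println(requestedMapEntries(1, 10_000)) // 10000

	// The same spans fragmented so that each sits in its own ScopeSpans.
	fmt.Println(requestedMapEntries(10_000, 10_000)) // 100000000
}
```

The span count is identical in both cases; only the fragmentation changes, which is why the problem shows up on fragmented traces.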
```
                                  │    sec/op    │    sec/op      vs base               │
ProcessTrace_SmallTrace-8            8.219µ ± 7%    8.028µ ± 1%   -2.32% (p=0.000 n=10)
ProcessTrace_MediumTrace-8           72.36µ ± 7%    64.70µ ± 2%  -10.59% (p=0.000 n=10)
ProcessTrace_LargeTrace-8            739.1µ ± 1%    690.5µ ± 4%   -6.58% (p=0.000 n=10)
ProcessTrace_SparseAggregation-8     480.2µ ± 3%    488.4µ ± 1%   +1.72% (p=0.019 n=10)
DeepTrace_Depth1-8                   441.0µ ± 5%    389.1µ ± 4%  -11.75% (p=0.000 n=10)
DeepTrace_Depth5-8                   488.0µ ± 0%    428.6µ ± 4%  -12.18% (p=0.000 n=10)
DeepTrace_Depth10-8                  487.8µ ± 0%    417.2µ ± 8%  -14.48% (p=0.000 n=10)
ExecuteAggregations-8                16.97µ ± 2%    13.58µ ± 2%  -19.99% (p=0.000 n=10)

                                  │     B/op     │     B/op       vs base               │
ProcessTrace_SmallTrace-8           13.92Ki ± 0%   13.80Ki ± 0%   -0.90% (p=0.000 n=10)
ProcessTrace_MediumTrace-8          119.9Ki ± 0%   113.5Ki ± 0%   -5.32% (p=0.000 n=10)
ProcessTrace_LargeTrace-8           1.184Mi ± 0%   1.075Mi ± 0%   -9.19% (p=0.000 n=10)
ProcessTrace_SparseAggregation-8    860.7Ki ± 0%   860.1Ki ± 0%   -0.07% (p=0.000 n=10)
DeepTrace_Depth1-8                  730.6Ki ± 0%   675.1Ki ± 0%   -7.59% (p=0.000 n=10)
DeepTrace_Depth5-8                  814.8Ki ± 0%   701.9Ki ± 0%  -13.86% (p=0.000 n=10)
DeepTrace_Depth10-8                 814.8Ki ± 0%   701.9Ki ± 0%  -13.86% (p=0.000 n=10)
ExecuteAggregations-8               20.24Ki ± 0%   18.97Ki ± 0%   -6.26% (p=0.000 n=10)
¹ all samples are equal

                                  │  allocs/op   │  allocs/op     vs base               │
ProcessTrace_SmallTrace-8             204.0 ± 0%     202.0 ± 0%   -0.98% (p=0.000 n=10)
ProcessTrace_MediumTrace-8           1.508k ± 0%    1.493k ± 0%   -0.99% (p=0.000 n=10)
ProcessTrace_LargeTrace-8            13.80k ± 0%    13.77k ± 0%   -0.24% (p=0.000 n=10)
ProcessTrace_SparseAggregation-8     10.63k ± 0%    10.62k ± 0%   -0.07% (p=0.000 n=10)
DeepTrace_Depth1-8                   8.230k ± 0%    8.204k ± 0%   -0.32% (p=0.000 n=10)
DeepTrace_Depth5-8                   8.895k ± 0%    8.855k ± 0%   -0.45% (p=0.000 n=10)
DeepTrace_Depth10-8                  8.895k ± 0%    8.855k ± 0%   -0.45% (p=0.000 n=10)
ExecuteAggregations-8                 247.0 ± 0%     237.0 ± 0%   -4.05% (p=0.000 n=10)
```
#### Testing
All the existing tests pass, and we have verified the memory improvement
on a fragmented trace that was collected locally. This PR no longer OOMs
when we try to process that trace.