Fix EH clause sort to handle try-in-handler nesting#8428
Fix EH clause sort to handle try-in-handler nesting#8428
Conversation
BenchmarksBenchmark execution time: 2026-04-10 16:42:25 Comparing candidate commit 6b6b9a4 in PR branch Found 29 performance improvements and 40 performance regressions! Performance is the same for 211 metrics, 8 unstable metrics.
|
|
Closing this as I just thought it would be easier to share this as a PR form than not, if this is valuable can always pick it back up but we don't know if this actually solves anything. |
Execution-Time Benchmarks Report ⏱️Execution-time results for samples comparing This PR (8428) and master. ✅ No regressions detected - check the details below Full Metrics ComparisonFakeDbCommand
HttpMessageHandler
Comparison explanationExecution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph). Duration chartsFakeDbCommand (.NET Framework 4.8)gantt
title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8428) - mean (72ms) : 69, 74
master - mean (77ms) : 67, 87
section Bailout
This PR (8428) - mean (76ms) : 74, 77
master - mean (81ms) : 72, 91
section CallTarget+Inlining+NGEN
This PR (8428) - mean (1,113ms) : 983, 1244
master - mean (1,161ms) : 1045, 1278
FakeDbCommand (.NET Core 3.1)gantt
title Execution time (ms) FakeDbCommand (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8428) - mean (117ms) : 113, 120
master - mean (132ms) : 126, 139
section Bailout
This PR (8428) - mean (116ms) : 113, 119
master - mean (133ms) : 126, 139
section CallTarget+Inlining+NGEN
This PR (8428) - mean (792ms) : 765, 818
master - mean (807ms) : 769, 844
FakeDbCommand (.NET 6)gantt
title Execution time (ms) FakeDbCommand (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8428) - mean (99ms) : 97, 102
master - mean (104ms) : 98, 109
section Bailout
This PR (8428) - mean (101ms) : 98, 103
master - mean (104ms) : 100, 108
section CallTarget+Inlining+NGEN
This PR (8428) - mean (932ms) : 902, 963
master - mean (944ms) : 903, 985
FakeDbCommand (.NET 8)gantt
title Execution time (ms) FakeDbCommand (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8428) - mean (98ms) : 95, 101
master - mean (101ms) : 97, 106
section Bailout
This PR (8428) - mean (99ms) : 97, 101
master - mean (102ms) : 98, 106
section CallTarget+Inlining+NGEN
This PR (8428) - mean (813ms) : 780, 846
master - mean (825ms) : 788, 862
HttpMessageHandler (.NET Framework 4.8)gantt
title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8428) - mean (194ms) : 190, 197
master - mean (194ms) : 190, 199
section Bailout
This PR (8428) - mean (197ms) : 194, 199
master - mean (198ms) : 195, 201
section CallTarget+Inlining+NGEN
This PR (8428) - mean (1,152ms) : 1111, 1193
master - mean (1,159ms) : 1114, 1204
HttpMessageHandler (.NET Core 3.1)gantt
title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8428) - mean (278ms) : 273, 283
master - mean (279ms) : 273, 285
section Bailout
This PR (8428) - mean (278ms) : 276, 281
master - mean (281ms) : 275, 286
section CallTarget+Inlining+NGEN
This PR (8428) - mean (932ms) : 911, 953
master - mean (942ms) : 918, 966
HttpMessageHandler (.NET 6)gantt
title Execution time (ms) HttpMessageHandler (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8428) - mean (271ms) : 267, 275
master - mean (274ms) : 267, 282
section Bailout
This PR (8428) - mean (270ms) : 267, 274
master - mean (276ms) : 272, 280
section CallTarget+Inlining+NGEN
This PR (8428) - mean (1,140ms) : 1102, 1178
master - mean (1,146ms) : 1118, 1175
HttpMessageHandler (.NET 8)gantt
title Execution time (ms) HttpMessageHandler (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8428) - mean (269ms) : 265, 273
master - mean (274ms) : 268, 281
section Bailout
This PR (8428) - mean (269ms) : 267, 271
master - mean (271ms) : 267, 275
section CallTarget+Inlining+NGEN
This PR (8428) - mean (1,025ms) : 982, 1067
master - mean (1,024ms) : 984, 1064
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
## Summary of changes This reproduces the crash [described in this README](https://github.com/DataDog/dd-trace-dotnet/blob/82cc4cf0e29478fd2fdc41afdf764a2abf662199/repro/APMS-19196/README.md) on linux x64 arch. Aim is to merge to #8428 First tested it's crashing on master <img width="2830" height="851" alt="image" src="https://github.com/user-attachments/assets/96e882fa-308e-44b6-8116-c93c7d5b7bb2" /> ## Reason for change ## Implementation details ## Test coverage ## Other details <!-- Fixes #{issue} --> <!--⚠️ Note: Where possible, please obtain 2 approvals prior to merging. Unless CODEOWNERS specifies otherwise, for external teams it is typically best to have one review from a team member, and one review from apm-dotnet. Trivial changes do not require 2 reviews. MergeQueue is NOT enabled in this repository. If you have write access to the repo, the PR has 1-2 approvals (see above), and all of the required checks have passed, you can use the Squash and Merge button to merge the PR. If you don't have write access, or you need help, reach out in the #apm-dotnet channel in Slack. -->
Snapshots difference summaryThe following differences have been observed in committed snapshots. It is meant to help the reviewer. 3 occurrences of : + },
+ {
+ TraceId: Id_1,
+ SpanId: Id_62,
+ Name: trace.annotation,
+ Resource: ExtremeExceptionHandling.DeepNestedExceptionHandlingSync,
+ Service: Samples.TraceAnnotations,
+ ParentId: Id_2,
+ Tags: {
+ component: trace,
+ env: integration_tests,
+ language: dotnet,
+ version: 1.0.0
+ }
3 occurrences of : + },
+ {
+ TraceId: Id_1,
+ SpanId: Id_62,
+ Name: trace.annotation,
+ Resource: ExtremeExceptionHandling.DeepNestedExceptionHandlingSync,
+ Service: Samples.TraceAnnotations.VersionMismatch.AfterFeature,
+ ParentId: Id_2,
+ Tags: {
+ component: trace,
+ env: integration_tests,
+ language: dotnet,
+ version: 1.0.0
+ }
3 occurrences of : + },
+ {
+ TraceId: Id_1,
+ SpanId: Id_62,
+ Name: trace.annotation,
+ Resource: ExtremeExceptionHandling.DeepNestedExceptionHandlingSync,
+ Service: Samples.TraceAnnotations.VersionMismatch.BeforeFeature,
+ ParentId: Id_2,
+ Tags: {
+ component: trace,
+ env: integration_tests,
+ language: dotnet,
+ version: 1.0.0
+ }
|
Add original index as final tiebreaker so std::sort does not reorder catch/filter handlers that share the same try region. Without this, multiple catch blocks for the same try can be silently swapped, changing which handler the runtime evaluates first.
Extract IsProperlyContained helper, use explicit types, add braces on all control flow, and use named variables in the sort comparator to match surrounding native code style.
Summary of changes
Fix incorrect EH clause ordering in
ILRewriter::Export()that causedInvalidProgramExceptionwhen instrumenting methods with try blocks nested inside handler regions.Reason for change
When we instrument a method, the IL rewriter may insert try/catch blocks inside the handler of an existing try/catch (e.g., debugger inserting
EndMethod(SetException)inside an async state machine's outer catch, or CallTarget instrumentation wrapping handler logic). ECMA-335 II.19 requires such nested clauses to appear before the clause whose region encloses them in the EH table.The existing sort comparator only checked try-in-try containment:
return a.m_pTryBegin->m_offset > b.m_pTryBegin->m_offset && a.m_pTryEnd->m_offset < b.m_pTryEnd->m_offset;Because the inserted clause's try region starts after the outer clause's try region ends, this check returns false. The clause stays in its appended position -- after the outer clause -- violating ECMA-335. The CLR validates EH table ordering on JIT and rejects the method with
InvalidProgramException.The comparator also violated strict weak ordering required by
std::sort(transitivity of incomparability fails for sibling clauses adjacent to a nesting relationship), making its behavior undefined.Implementation details
Replaced the inline
std::sortwithILRewriter::SortEHClauses, a new static method that:Computes a nesting depth for each clause: the number of other clauses whose try or handler region strictly contains this clause's try region. This correctly detects both try-in-try and try-in-handler nesting per ECMA-335 II.19.
Sorts by (depth descending, try offset ascending), which is a total order and therefore satisfies strict weak ordering. Deeper-nested clauses sort first, satisfying the ECMA-335 requirement.
The method is extracted as a
staticto make it directly unit-testable without requiringICorProfilerInfo. Internal allocations usestd::unique_ptrto prevent memory leaks if an allocation fails.Test coverage
Added
ILRewriterEHSortTestinDatadog.Tracer.Native.Testswith 8 Google Test cases covering:EndMethod(SetException)clause is ordered before the outer clause whose handler contains itThe
OldComparatorFailsMiddlewareScenariotest applies the original comparator inline against the middleware topology and asserts it produces the wrong ordering, confirming the bug is real and the new tests would catch a regression.Added integration test in
tracer/test/test-applications/integrations/Samples.TraceAnnotations/ExtremeExceptionHandling.csthat reproduces the crash. Without the fix, the test causes a CLR fatal exception.Other details
Symptom:
System.InvalidProgramException: Common Language Runtime detected an invalid programthrown on any instrumented method whose EH table contains try-in-handler nesting. Originally observed on async middlewareInvokeAsyncmethods with Exception Replay enabled, but affects any instrumentation path that goes throughILRewriter::Export().Linked to APMS-19196