Skip to content

Commit 7b09d7b

Browse files
NagyDonattstellar
authored andcommitted
[analyzer] Workaround for slowdown spikes (unintended scope increase) (#136720)
Recently some users reported that they observed large increases of runtime (up to +600% on some translation units) when they upgraded to a more recent (slightly patched, internal) clang version. Bisection revealed that the bulk of this increase was probably caused by my earlier commit bb27d5e ("Don't assume third iteration in loops"). As I evaluated that earlier commit on several open source project, it turns out that on average it's runtime-neutral (or slightly helpful: it reduced the total analysis time by 1.5%) but it can cause runtime spikes on some code: in particular it more than doubled the time to analyze `tmux` (one of the smaller test projects). Further profiling and investigation proved that these spikes were caused by an _increase of analysis scope_ because there was an heuristic that placed functions on a "don't inline this" blacklist if they reached the `-analyzer-max-loop` limit (anywhere, on any one execution path) -- which became significantly rarer when my commit ensured the analyzer no longer "just assumes" four iterations. (With more inlining significantly more entry points use up their allocated budgets, which leads to the increased runtime.) I feel that this heuristic for the "don't inline" blacklist is unjustified and arbitrary, because reaching the "retry without inlining" limit on one path does not imply that inlining the function won't be valuable on other paths -- so I hope that we can eventually replace it with more "natural" limits of the analysis scope. However, the runtime increases are annoying for the users whose project is affected, so I created this quick workaround commit that approximates the "don't inline" blacklist effects of ambiguous loops (where the analyzer doesn't understand the loop condition) without fully reverting the "Don't assume third iteration" commit (to avoid reintroducing the false positives that were eliminated by it). Investigating this issue was a team effort: I'm grateful to Endre Fülöp (gamesh411) who did the bisection and shared his time measurement setup, and Gábor Tóthvári (tigbr) who helped me in profiling. (cherry picked from commit 9600a12)
1 parent a708fb7 commit 7b09d7b

File tree

6 files changed

+286
-22
lines changed

6 files changed

+286
-22
lines changed

clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def

+13
Original file line numberDiff line numberDiff line change
@@ -385,6 +385,19 @@ ANALYZER_OPTION(
385385
"flex\" won't be analyzed.",
386386
true)
387387

388+
ANALYZER_OPTION(
389+
bool, InlineFunctionsWithAmbiguousLoops, "inline-functions-with-ambiguous-loops",
390+
"If disabled (the default), the analyzer puts functions on a \"do not "
391+
"inline this\" list if it finds an execution path within that function "
392+
"that may potentially perform 'analyzer-max-loop' (= 4 by default) "
393+
"iterations in a loop. (Note that functions that _definitely_ reach the "
394+
"loop limit on some execution path are currently marked as \"do not "
395+
"inline\" even if this option is enabled.) Enabling this option "
396+
"eliminates this (somewhat arbitrary) restriction from the analysis "
397+
"scope, which increases the analysis runtime (on average by ~10%, but "
398+
"a few translation units may see much larger slowdowns).",
399+
false)
400+
388401
//===----------------------------------------------------------------------===//
389402
// Unsigned analyzer options.
390403
//===----------------------------------------------------------------------===//

clang/include/clang/StaticAnalyzer/Core/PathSensitive/FunctionSummary.h

-4
Original file line numberDiff line numberDiff line change
@@ -81,10 +81,6 @@ class FunctionSummariesTy {
8181
I->second.MayInline = 0;
8282
}
8383

84-
void markReachedMaxBlockCount(const Decl *D) {
85-
markShouldNotInline(D);
86-
}
87-
8884
std::optional<bool> mayInline(const Decl *D) {
8985
MapTy::const_iterator I = Map.find(D);
9086
if (I != Map.end() && I->second.InlineChecked)

clang/lib/StaticAnalyzer/Core/ExprEngine.cpp

+49-11
Original file line numberDiff line numberDiff line change
@@ -2510,6 +2510,20 @@ bool ExprEngine::replayWithoutInlining(ExplodedNode *N,
25102510
return true;
25112511
}
25122512

2513+
/// Return the innermost location context which is inlined at `Node`, unless
2514+
/// it's the top-level (entry point) location context.
2515+
static const LocationContext *getInlinedLocationContext(ExplodedNode *Node,
2516+
ExplodedGraph &G) {
2517+
const LocationContext *CalleeLC = Node->getLocation().getLocationContext();
2518+
const LocationContext *RootLC =
2519+
(*G.roots_begin())->getLocation().getLocationContext();
2520+
2521+
if (CalleeLC->getStackFrame() == RootLC->getStackFrame())
2522+
return nullptr;
2523+
2524+
return CalleeLC;
2525+
}
2526+
25132527
/// Block entrance. (Update counters).
25142528
void ExprEngine::processCFGBlockEntrance(const BlockEdge &L,
25152529
NodeBuilderWithSinks &nodeBuilder,
@@ -2557,21 +2571,24 @@ void ExprEngine::processCFGBlockEntrance(const BlockEdge &L,
25572571
const ExplodedNode *Sink =
25582572
nodeBuilder.generateSink(Pred->getState(), Pred, &tag);
25592573

2560-
// Check if we stopped at the top level function or not.
2561-
// Root node should have the location context of the top most function.
2562-
const LocationContext *CalleeLC = Pred->getLocation().getLocationContext();
2563-
const LocationContext *CalleeSF = CalleeLC->getStackFrame();
2564-
const LocationContext *RootLC =
2565-
(*G.roots_begin())->getLocation().getLocationContext();
2566-
if (RootLC->getStackFrame() != CalleeSF) {
2567-
Engine.FunctionSummaries->markReachedMaxBlockCount(CalleeSF->getDecl());
2574+
if (const LocationContext *LC = getInlinedLocationContext(Pred, G)) {
2575+
// FIXME: This will unconditionally prevent inlining this function (even
2576+
// from other entry points), which is not a reasonable heuristic: even if
2577+
// we reached max block count on this particular execution path, there
2578+
// may be other execution paths (especially with other parametrizations)
2579+
// where the analyzer can reach the end of the function (so there is no
2580+
// natural reason to avoid inlining it). However, disabling this would
2581+
// significantly increase the analysis time (because more entry points
2582+
// would exhaust their allocated budget), so it must be compensated by a
2583+
// different (more reasonable) reduction of analysis scope.
2584+
Engine.FunctionSummaries->markShouldNotInline(
2585+
LC->getStackFrame()->getDecl());
25682586

25692587
// Re-run the call evaluation without inlining it, by storing the
25702588
// no-inlining policy in the state and enqueuing the new work item on
25712589
// the list. Replay should almost never fail. Use the stats to catch it
25722590
// if it does.
2573-
if ((!AMgr.options.NoRetryExhausted &&
2574-
replayWithoutInlining(Pred, CalleeLC)))
2591+
if ((!AMgr.options.NoRetryExhausted && replayWithoutInlining(Pred, LC)))
25752592
return;
25762593
NumMaxBlockCountReachedInInlined++;
25772594
} else
@@ -2835,8 +2852,29 @@ void ExprEngine::processBranch(
28352852
// conflicts with the widen-loop analysis option (which is off by
28362853
// default). If we intend to support and stabilize the loop widening,
28372854
// we must ensure that it 'plays nicely' with this logic.
2838-
if (!SkipTrueBranch || AMgr.options.ShouldWidenLoops)
2855+
if (!SkipTrueBranch || AMgr.options.ShouldWidenLoops) {
28392856
Builder.generateNode(StTrue, true, PredN);
2857+
} else if (!AMgr.options.InlineFunctionsWithAmbiguousLoops) {
2858+
// FIXME: There is an ancient and arbitrary heuristic in
2859+
// `ExprEngine::processCFGBlockEntrance` which prevents all further
2860+
// inlining of a function if it finds an execution path within that
2861+
// function which reaches the `MaxBlockVisitOnPath` limit (a/k/a
2862+
// `analyzer-max-loop`, by default four iterations in a loop). Adding
2863+
// this "don't assume third iteration" logic significantly increased
2864+
// the analysis runtime on some inputs because less functions were
2865+
// arbitrarily excluded from being inlined, so more entry points used
2866+
// up their full allocated budget. As a hacky compensation for this,
2867+
// here we apply the "should not inline" mark in cases when the loop
2868+
// could potentially reach the `MaxBlockVisitOnPath` limit without the
2869+
// "don't assume third iteration" logic. This slightly overcompensates
2870+
// (activates if the third iteration can be entered, and will not
2871+
// recognize cases where the fourth iteration would't be completed), but
2872+
// should be good enough for practical purposes.
2873+
if (const LocationContext *LC = getInlinedLocationContext(Pred, G)) {
2874+
Engine.FunctionSummaries->markShouldNotInline(
2875+
LC->getStackFrame()->getDecl());
2876+
}
2877+
}
28402878
}
28412879

28422880
if (StFalse)

clang/test/Analysis/analyzer-config.c

+1
Original file line numberDiff line numberDiff line change
@@ -88,6 +88,7 @@
8888
// CHECK-NEXT: graph-trim-interval = 1000
8989
// CHECK-NEXT: ignore-bison-generated-files = true
9090
// CHECK-NEXT: ignore-flex-generated-files = true
91+
// CHECK-NEXT: inline-functions-with-ambiguous-loops = false
9192
// CHECK-NEXT: inline-lambdas = true
9293
// CHECK-NEXT: ipa = dynamic-bifurcate
9394
// CHECK-NEXT: ipa-always-inline-size = 3
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,200 @@
1+
// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -verify=expected,default %s
2+
// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config inline-functions-with-ambiguous-loops=true -verify=expected,enabled %s
3+
4+
// This file tests some heuristics in the engine that put functions on a
5+
// "do not inline" list if their analyisis reaches the `analyzer-max-loop`
6+
// limit (by default 4 iterations) in a loop. This was almost surely intended
7+
// as memoization optimization for the "retry without inlining" fallback (if we
8+
// had to retry once, next time don't even try inlining), but aggressively
9+
// oversteps the "natural" scope: reaching 4 iterations on _one particular_
10+
// execution path does not imply that each path would need "retry without
11+
// inlining" especially if a different call receives different arguments.
12+
//
13+
// This heuristic significantly affects the scope/depth of the analysis (and
14+
// therefore the execution time) because without this limitation on the
15+
// inlining significantly more entry points would be able to exhaust their
16+
// `max-nodes` quota. (Trivial thin wrappers around big complex functions are
17+
// common in many projects.)
18+
//
19+
// Unfortunately, this arbitrary heuristic strongly relies on the current loop
20+
// handling model and its many limitations, so improvements in loop handling
21+
// can cause surprising slowdowns by reducing the "do not inline" blacklist.
22+
// In the tests "FIXME-BUT-NEEDED" comments mark "problematic" (aka buggy)
23+
// analyzer behavior which cannot be fixed without also improving the
24+
// heuristics for (not) inlining large functions.
25+
26+
int getNum(void); // Get an unknown symbolic number.
27+
28+
void clang_analyzer_dump(int arg);
29+
30+
//-----------------------------------------------------------------------------
31+
// Simple case: inlined function never reaches `analyzer-max-loop`, so it is
32+
// always inlined.
33+
34+
int inner_simple(int callIdx) {
35+
clang_analyzer_dump(callIdx); // expected-warning {{1 S32}}
36+
// expected-warning@-1 {{2 S32}}
37+
return 42;
38+
}
39+
40+
int outer_simple(void) {
41+
int x = inner_simple(1);
42+
int y = inner_simple(2);
43+
return 53 / (x - y); // expected-warning {{Division by zero}}
44+
}
45+
46+
//-----------------------------------------------------------------------------
47+
// Inlined function always reaches `analyzer-max-loop`, which stops the
48+
// analysis on that path and puts the function on the "do not inline" list.
49+
50+
int inner_fixed_loop_1(int callIdx) {
51+
int i;
52+
clang_analyzer_dump(callIdx); // expected-warning {{1 S32}}
53+
for (i = 0; i < 10; i++); // FIXME-BUT-NEEDED: This stops the analysis.
54+
clang_analyzer_dump(callIdx); // no-warning
55+
return 42;
56+
}
57+
58+
int outer_fixed_loop_1(void) {
59+
int x = inner_fixed_loop_1(1);
60+
int y = inner_fixed_loop_1(2);
61+
62+
// FIXME-BUT-NEEDED: The analysis doesn't reach this zero division.
63+
return 53 / (x - y); // no-warning
64+
}
65+
66+
//-----------------------------------------------------------------------------
67+
// Inlined function always reaches `analyzer-max-loop`; inlining is prevented
68+
// even for different entry points.
69+
// NOTE: the analyzer happens to analyze the entry points in a reversed order,
70+
// so `outer_2_fixed_loop_2` is analyzed first and it will be the one which is
71+
// able to inline the inner function.
72+
73+
int inner_fixed_loop_2(int callIdx) {
74+
// Identical copy of inner_fixed_loop_1.
75+
int i;
76+
clang_analyzer_dump(callIdx); // expected-warning {{2 S32}}
77+
for (i = 0; i < 10; i++); // FIXME-BUT-NEEDED: This stops the analysis.
78+
clang_analyzer_dump(callIdx); // no-warning
79+
return 42;
80+
}
81+
82+
int outer_1_fixed_loop_2(void) {
83+
return inner_fixed_loop_2(1);
84+
}
85+
86+
int outer_2_fixed_loop_2(void) {
87+
return inner_fixed_loop_2(2);
88+
}
89+
90+
//-----------------------------------------------------------------------------
91+
// Inlined function reaches `analyzer-max-loop` only in its second call. The
92+
// function is inlined twice but the second call doesn't finish and ends up
93+
// being conservatively evaluated.
94+
95+
int inner_parametrized_loop_1(int count) {
96+
int i;
97+
clang_analyzer_dump(count); // expected-warning {{2 S32}}
98+
// expected-warning@-1 {{10 S32}}
99+
for (i = 0; i < count; i++);
100+
// FIXME-BUT-NEEDED: This loop stops the analysis when count >=4.
101+
clang_analyzer_dump(count); // expected-warning {{2 S32}}
102+
return 42;
103+
}
104+
105+
int outer_parametrized_loop_1(void) {
106+
int x = inner_parametrized_loop_1(2);
107+
int y = inner_parametrized_loop_1(10);
108+
109+
// FIXME-BUT-NEEDED: The analysis doesn't reach this zero division.
110+
return 53 / (x - y); // no-warning
111+
}
112+
113+
//-----------------------------------------------------------------------------
114+
// Inlined function reaches `analyzer-max-loop` on its first call, so the
115+
// second call isn't inlined (although it could be fully evaluated).
116+
117+
int inner_parametrized_loop_2(int count) {
118+
// Identical copy of inner_parametrized_loop_1.
119+
int i;
120+
clang_analyzer_dump(count); // expected-warning {{10 S32}}
121+
for (i = 0; i < count; i++);
122+
// FIXME-BUT-NEEDED: This loop stops the analysis when count >=4.
123+
clang_analyzer_dump(count); // no-warning
124+
return 42;
125+
}
126+
127+
int outer_parametrized_loop_2(void) {
128+
int y = inner_parametrized_loop_2(10);
129+
int x = inner_parametrized_loop_2(2);
130+
131+
// FIXME-BUT-NEEDED: The analysis doesn't reach this zero division.
132+
return 53 / (x - y); // no-warning
133+
}
134+
135+
//-----------------------------------------------------------------------------
136+
// Inlined function may or may not reach `analyzer-max-loop` depending on an
137+
// ambiguous check before the loop. This is very similar to the "fixed loop"
138+
// cases: the function is placed on the "don't inline" list when any execution
139+
// path reaches `analyzer-max-loop` (even if other execution paths reach the
140+
// end of the function).
141+
// NOTE: This is tested with two separate entry points to ensure that one
142+
// inlined call is fully evaluated before we try to inline the other call.
143+
// NOTE: the analyzer happens to analyze the entry points in a reversed order,
144+
// so `outer_2_conditional_loop` is analyzed first and it will be the one which
145+
// is able to inline the inner function.
146+
147+
int inner_conditional_loop(int callIdx) {
148+
int i;
149+
clang_analyzer_dump(callIdx); // expected-warning {{2 S32}}
150+
if (getNum() == 777) {
151+
for (i = 0; i < 10; i++);
152+
}
153+
clang_analyzer_dump(callIdx); // expected-warning {{2 S32}}
154+
return 42;
155+
}
156+
157+
int outer_1_conditional_loop(void) {
158+
return inner_conditional_loop(1);
159+
}
160+
161+
int outer_2_conditional_loop(void) {
162+
return inner_conditional_loop(2);
163+
}
164+
165+
//-----------------------------------------------------------------------------
166+
// Inlined function executes an ambiguous loop that may or may not reach
167+
// `analyzer-max-loop`. Historically, before the "don't assume third iteration"
168+
// commit (bb27d5e5c6b194a1440b8ac4e5ace68d0ee2a849) this worked like the
169+
// `conditional_loop` cases: the analyzer was able to find a path reaching
170+
// `analyzer-max-loop` so inlining was disabled. After that commit the analyzer
171+
// does not _assume_ a third (or later) iteration (i.e. does not enter those
172+
// iterations if the loop condition is an unknown value), so e.g. this test
173+
// function does not reach `analyzer-max-loop` iterations and the inlining is
174+
// not disabled.
175+
// Unfortunately this change significantly increased the workload and
176+
// runtime of the analyzer (more entry points used up their budget), so the
177+
// option `inline-functions-with-ambiguous-loops` was introduced and disabled
178+
// by default to suppress the inlining in situations where the "don't assume
179+
// third iteration" logic activates.
180+
// NOTE: This is tested with two separate entry points to ensure that one
181+
// inlined call is fully evaluated before we try to inline the other call.
182+
// NOTE: the analyzer happens to analyze the entry points in a reversed order,
183+
// so `outer_2_ambiguous_loop` is analyzed first and it will be the one which
184+
// is able to inline the inner function.
185+
186+
int inner_ambiguous_loop(int callIdx) {
187+
int i;
188+
clang_analyzer_dump(callIdx); // default-warning {{2 S32}}
189+
// enabled-warning@-1 {{1 S32}}
190+
// enabled-warning@-2 {{2 S32}}
191+
for (i = 0; i < getNum(); i++);
192+
return i;
193+
}
194+
195+
int outer_1_ambiguous_loop(void) {
196+
return inner_ambiguous_loop(1);
197+
}
198+
int outer_2_ambiguous_loop(void) {
199+
return inner_ambiguous_loop(2);
200+
}

clang/test/Analysis/loop-unrolling.cpp

+23-7
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
1-
// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true -verify -std=c++14 -analyzer-config exploration_strategy=unexplored_first_queue %s
2-
// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true,exploration_strategy=dfs -verify -std=c++14 -DDFS=1 %s
1+
// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true -verify=expected,default -std=c++14 -analyzer-config exploration_strategy=unexplored_first_queue %s
2+
// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true,exploration_strategy=dfs -verify=expected,dfs -std=c++14 %s
33

44
void clang_analyzer_numTimesReached();
55
void clang_analyzer_warnIfReached();
@@ -337,6 +337,7 @@ int nested_both_unrolled() {
337337
}
338338

339339
int simple_known_bound_loop() {
340+
// Iteration count visible: can be unrolled and fully executed.
340341
for (int i = 2; i < 12; i++) {
341342
// This function is inlined in nested_inlined_unroll1()
342343
clang_analyzer_numTimesReached(); // expected-warning {{90}}
@@ -345,27 +346,42 @@ int simple_known_bound_loop() {
345346
}
346347

347348
int simple_unknown_bound_loop() {
349+
// Iteration count unknown: unrolling won't happen and the execution will be
350+
// split two times:
351+
// (1) split between skipped loop (immediate exit) and entering the loop
352+
// (2) split between exit after 1 iteration and entering the second iteration
353+
// After these there is no third state split because the "don't assume third
354+
// iteration" logic in `ExprEngine::processBranch` prevents it; but the
355+
// `legacy-inlining-prevention` logic will put this function onto the list of
356+
// functions that may not be inlined in the future.
357+
// The exploration strategy apparently influences the number of times this
358+
// function can be inlined before it's placed on the "don't inline" list.
348359
for (int i = 2; i < getNum(); i++) {
349-
clang_analyzer_numTimesReached(); // expected-warning {{8}}
360+
clang_analyzer_numTimesReached(); // default-warning {{4}} dfs-warning {{8}}
350361
}
351362
return 0;
352363
}
353364

354365
int nested_inlined_unroll1() {
366+
// Here the analyzer can unroll and fully execute both the outer loop and the
367+
// inner loop within simple_known_bound_loop().
355368
int k;
356369
for (int i = 0; i < 9; i++) {
357370
clang_analyzer_numTimesReached(); // expected-warning {{9}}
358-
k = simple_known_bound_loop(); // no reevaluation without inlining
371+
k = simple_known_bound_loop();
359372
}
360373
int a = 22 / k; // expected-warning {{Division by zero}}
361374
return 0;
362375
}
363376

364377
int nested_inlined_no_unroll1() {
378+
// Here no unrolling happens and we only run `analyzer-max-loop` (= 4)
379+
// iterations of the loop within this function, but some state splits happen
380+
// in `simple_unknown_bound_loop()` calls.
365381
int k;
366-
for (int i = 0; i < 9; i++) {
367-
clang_analyzer_numTimesReached(); // expected-warning {{10}}
368-
k = simple_unknown_bound_loop(); // reevaluation without inlining, splits the state as well
382+
for (int i = 0; i < 40; i++) {
383+
clang_analyzer_numTimesReached(); // default-warning {{9}} dfs-warning {{12}}
384+
k = simple_unknown_bound_loop();
369385
}
370386
int a = 22 / k; // no-warning
371387
return 0;

0 commit comments

Comments
 (0)