perf(intel): index-backed searchBlock in IndirectCallAnalyzer#47
Conversation
resolveRegisterCalls() resolves each "call <register>" by walking the
CFG backward through up to block_depth (=3) levels of incoming refs.
At every level, searchBlock was doing a linear scan over every block in
the function and, for each block, a list comprehension over every
instruction:
for block in analysis_state.getBlocks():
if address in [i[0] for i in block]:
return block
So one call to searchBlock is O(B*I) — and the recursive descent into
processBlock calls it once per incoming ref at every depth. Functions
with many register calls (the file already mentions a Go sample with
130k of them) hit this hot.
This commit:
* Seeds an {instruction_addr: containing_block} dict once at the start
of resolveRegisterCalls(), so every searchBlock lookup is O(1).
* Preserves "first matching block wins" by using `if addr not in index`
during construction — important because FunctionAnalysisState.getBlocks
can place the same instruction in multiple overlapping blocks via
the sorted potential_starts walk.
* Clears the index in a finally so a reused analyzer instance never
serves a stale index after the function completes.
* Keeps a slim linear-scan fallback in searchBlock for direct callers
(e.g. existing unit tests that drive processBlock without going
through resolveRegisterCalls).
Microbench (80 blocks × 15 instructions, 1200 lookups):
legacy linear scan: 17.04 ms
indexed O(1) lookup: 0.18 ms
-> 92x faster, bit-identical block-object references returned.
End-to-end on asprox is unchanged (it has few register calls); the win
scales with the number of indirect calls in the binary.
Validation:
- pytest tests/test* -> 111 passed, 79 subtests passed
- ruff check + format --check clean
- asprox sha256 / num_instructions / function count unchanged
|
Important Review skippedAuto reviews are disabled on this repository. To trigger a review, include ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request optimizes block lookups in IndirectCallAnalyzer from analysis_state object instead of storing it on the analyzer instance (self). This change would ensure thread safety and re-entrancy, while also simplifying resolveRegisterCalls by eliminating the need for manual index management and try...finally blocks.
Address Gemini review on PR #47: stash the {instruction_addr: block} index on analysis_state instead of self. analysis_state has the right lifetime (one per function analysis) so the cache is naturally re-entrancy-safe and can't outlive what it indexes, the analyzer keeps no transient state, and the explicit seed + try/finally in resolveRegisterCalls goes away. searchBlock now lazy-builds on first call, so the legacy fallback branch is also gone — every caller (including direct unit-test callers) gets the O(1) path automatically. contextlib.suppress(AttributeError) guards the cache write so that test doubles or hypothetical __slots__-locked states still work; the freshly built dict is returned in that case. Re-ran the focused micro-bench (80 blocks x 15 instructions, 1200 lookups): ~107x faster than the legacy scan, 0/1200 parity mismatches. End-to-end asprox sha256/num_instructions/function count unchanged. Validation: - pytest tests/test* -> 111 passed, 79 subtests passed - ruff check + format --check clean
Address Gemini review on PR #47: stash the {instruction_addr: block} index on analysis_state instead of self. analysis_state has the right lifetime (one per function analysis) so the cache is naturally re-entrancy-safe and can't outlive what it indexes, the analyzer keeps no transient state, and the explicit seed + try/finally in resolveRegisterCalls goes away. searchBlock now lazy-builds on first call, so the legacy fallback branch is also gone — every caller (including direct unit-test callers) gets the O(1) path automatically. contextlib.suppress(AttributeError) guards the cache write so that test doubles or hypothetical __slots__-locked states still work; the freshly built dict is returned in that case. Re-ran the focused micro-bench (80 blocks x 15 instructions, 1200 lookups): ~107x faster than the legacy scan, 0/1200 parity mismatches. End-to-end asprox sha256/num_instructions/function count unchanged. Validation: - pytest tests/test* -> 111 passed, 79 subtests passed - ruff check + format --check clean
Summary
Picks up the highest-impact deferred item from PR #46: collapse the O(B·I) linear scan in
IndirectCallAnalyzer.searchBlockto an O(1) dict lookup. Single optimization class.resolveRegisterCallswalks the CFG backward through up toblock_depth=3levels of incoming refs to resolvecall <register>targets. At every level,searchBlockwas doing:One call is O(B·I). The recursive descent in
processBlockmakes that call once per incoming ref at every depth. Functions with many register calls (the file already mentions "found one Go sample with 130k register calls") hit this hard.Change
{instruction_addr: containing_block}index onanalysis_statethe first timesearchBlockruns. Subsequent lookups during the same function analysis are O(1).if addr not in indexduring construction — important becauseFunctionAnalysisState.getBlocks()can place the same instruction in multiple overlapping blocks via the sortedpotential_startswalk.contextlib.suppress(AttributeError)guards the cache write so test doubles or__slots__-locked states still work; the freshly built dict is returned in that case.Measurements
Microbench (synthetic 80 blocks × 15 instructions = 1200 lookups):
Parity check on the same fixture: 1200 lookups, 0 mismatches — both paths return the same block object reference.
End-to-end on asprox is unchanged (asprox is malware with mostly direct calls, so it doesn't stress
resolveRegisterCalls). The win scales linearly with the number of indirect calls in the binary, which is high in Go and other compiler-heavy targets.Behavior compatibility
IndirectCallAnalyzerunchanged (searchBlock,processBlock,resolveRegisterCalls,getDwordall keep the same signatures).sha256,num_instructions, function count, and integration assertions unchanged.Test plan
python -m pytest tests/test*— 111 passed, 79 subtests passed in 12.63 spython -m ruff check .— All checks passedpython -m ruff format --check .— 95 files already formattedResidual risk
analysis_state, so its lifetime is tied to the state object. If blocks were ever mutated after the firstsearchBlockcall (currently they aren't —resolveRegisterCallsonly runs afterfinalizeAnalysis), the index would go stale. The fix would be cache invalidation on the mutation site, not here.contextlib.suppress(AttributeError)is a deliberate fallback for objects that reject attribute assignment — for those the index is rebuilt on every call, which is no worse than the legacy O(B·I) scan.Still deferred (out of scope for this branch)
SmdaFunction.getNormalizedBlockRefscaching (needsarchitecture_metadatamutation tracking).BinaryInfo.getImportedFunctionsdiscards aPeSymbolProvider.parseSymbolsresult before re-parsing for imports.*FileLoader.getArchitecture/getCodeAreaseachlief.parse(binary)—BinaryInfocaches but the static accessors don't share it._logCandidateStats+ latent== 2vs== 0bug inFunctionCandidateManager.mcrit-install3×5 CI matrix likely over-spec for a smoke test.Review follow-ups
analysis_stateto avoid re-entrancy/thread-safety risks and simplifyresolveRegisterCalls. Applied in1e0fc9c— speedup reran at ~107× (up from ~92×) with the same 0/1200 parity, fallback branch andtry/finallyremoved.https://claude.ai/code/session_01C8CcS2k1g59ByLKYdEcaxR