perf(intel): index-backed searchBlock in IndirectCallAnalyzer by r0ny123 · Pull Request #47 · r0ny123/smda

r0ny123 · 2026-05-26T15:21:07Z

Summary

Picks up the highest-impact deferred item from PR #46: collapse the O(B·I) linear scan (B blocks, I instructions per block) in IndirectCallAnalyzer.searchBlock into an O(1) dict lookup. The PR is limited to this single optimization.

resolveRegisterCalls walks the CFG backward through up to block_depth=3 levels of incoming refs to resolve call <register> targets. At every level, searchBlock was doing:

for block in analysis_state.getBlocks():
    if address in [i[0] for i in block]:
        return block

One call is O(B·I). The recursive descent in processBlock makes that call once per incoming ref at every depth. Functions with many register calls (the file already mentions "found one Go sample with 130k register calls") hit this hard.
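To make the multiplication concrete, here is a hypothetical, simplified cost model. It is not smda's code. `CountingState`, `make_blocks`, `legacy_search_block`, and `walk_back` are illustrative names. Blocks are assumed to be lists of `(address, mnemonic)` tuples, as the `i[0]` access above implies, and `walk_back` only stands in for the processBlock recursion. The model counts how many address-list entries the legacy loop materializes.

```python
from typing import Dict, List, Optional, Tuple

Instruction = Tuple[int, str]  # (address, mnemonic); only i[0] is used
Block = List[Instruction]


class CountingState:
    """Hypothetical stand-in for FunctionAnalysisState."""

    def __init__(self, blocks: List[Block]) -> None:
        self._blocks = blocks
        self.entries_built = 0  # address-list entries created by the legacy loop

    def getBlocks(self) -> List[Block]:
        return self._blocks


def make_blocks(num_blocks: int, per_block: int) -> List[Block]:
    # Block k covers addresses k*100 .. k*100 + per_block - 1 (per_block < 100).
    return [[(k * 100 + j, "nop") for j in range(per_block)] for k in range(num_blocks)]


def legacy_search_block(state: CountingState, address: int) -> Optional[Block]:
    # Same shape as the legacy loop: a fresh address list per block until a hit.
    for block in state.getBlocks():
        addresses = [i[0] for i in block]
        state.entries_built += len(addresses)
        if address in addresses:
            return block
    return None


def walk_back(state: CountingState, refs: Dict[int, List[int]], address: int, depth: int) -> None:
    # Find the block holding `address`; then, for up to `depth` further levels,
    # search again for the source instruction of every incoming ref.
    block = legacy_search_block(state, address)
    if block is None or depth == 0:
        return
    for ref in refs.get(block[0][0], []):
        walk_back(state, refs, ref, depth - 1)


# 80 blocks x 15 instructions; each block is entered from the previous block's last instruction.
blocks = make_blocks(80, 15)
refs = {k * 100: [(k - 1) * 100 + 14] for k in range(1, 80)}
state = CountingState(blocks)
walk_back(state, refs, 79 * 100 + 3, depth=3)
print(state.entries_built)  # 4710 = 1200 + 1185 + 1170 + 1155
```

Even with a single predecessor per block, one register call costs four near-full scans here. With several incoming refs per block the count grows with the fan-in at every level.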

Change

Lazy-cache an {instruction_addr: containing_block} index on analysis_state the first time searchBlock runs. Subsequent lookups during the same function analysis are O(1).
Cache lives on the state object (not on the analyzer), so the index has the correct lifetime (one per function analysis), the analyzer stays re-entrancy-safe, and direct callers (e.g. unit tests) get the O(1) path automatically — no separate fallback branch needed.
Preserve "first matching block wins" by using if addr not in index during construction — important because FunctionAnalysisState.getBlocks() can place the same instruction in multiple overlapping blocks via the sorted potential_starts walk.
contextlib.suppress(AttributeError) guards the cache write so test doubles or __slots__-locked states still work; the freshly built dict is returned in that case.

Measurements

Microbench (synthetic fixture: 80 blocks × 15 instructions = 1200 instructions, one lookup each):

Path	Best of 5
legacy linear scan	28.60 ms
indexed O(1) lookup	0.27 ms
speedup	~107×

Parity check on the same fixture: 1200 lookups, 0 mismatches — both paths return the same block object reference.
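This sketch shows how a comparable microbench and parity check could be written with `timeit`, assuming the same synthetic 80 × 15 layout. It is not the PR's benchmark script, and the function names are hypothetical. Absolute timings vary by machine, so the sketch claims no numbers.

```python
import timeit
from typing import Dict, List, Optional, Tuple

Block = List[Tuple[int, str]]


class State:
    """Hypothetical stand-in for FunctionAnalysisState."""

    def __init__(self, blocks: List[Block]) -> None:
        self._blocks = blocks

    def getBlocks(self) -> List[Block]:
        return self._blocks


def legacy_search(state: State, address: int) -> Optional[Block]:
    for block in state.getBlocks():
        if address in [i[0] for i in block]:
            return block
    return None


def indexed_search(state: State, address: int) -> Optional[Block]:
    index: Optional[Dict[int, Block]] = getattr(state, "_index", None)
    if index is None:
        index = {}
        for block in state.getBlocks():
            for ins in block:
                index.setdefault(ins[0], block)  # first block wins
        state._index = index
    return index.get(address)


blocks = [[(k * 100 + j, "nop") for j in range(15)] for k in range(80)]
state = State(blocks)
addresses = [ins[0] for block in blocks for ins in block]  # 1200 lookups

# Parity: both paths must return the very same list object.
mismatches = sum(legacy_search(state, a) is not indexed_search(state, a) for a in addresses)


def best_ms(fn) -> float:
    # Best of 5 single runs over all 1200 lookups, in milliseconds.
    return min(timeit.repeat(lambda: [fn(state, a) for a in addresses], number=1, repeat=5)) * 1000


legacy_ms = best_ms(legacy_search)
indexed_ms = best_ms(indexed_search)
print(f"{len(addresses)} lookups, {mismatches} mismatches")  # 1200 lookups, 0 mismatches
print(f"legacy {legacy_ms:.2f} ms, indexed {indexed_ms:.2f} ms, ~{legacy_ms / indexed_ms:.0f}x")
```

The parity pass runs first, so the index already exists when the indexed path is timed. This corresponds to the "subsequent lookups" case described under Change.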

End-to-end on asprox is unchanged (asprox is malware with mostly direct calls, so it doesn't stress resolveRegisterCalls). The win scales linearly with the number of indirect calls in the binary, which is high in Go and other compiler-heavy targets.

Behavior compatibility

Public API of IndirectCallAnalyzer unchanged (searchBlock, processBlock, resolveRegisterCalls, getDword all keep the same signatures).
Same block-list reference identity returned for any given address.
Report serialization untouched.
asprox sha256, num_instructions, function count, and integration assertions unchanged.

Test plan

python -m pytest tests/test* — 111 passed, 79 subtests passed in 12.63 s
python -m ruff check . — All checks passed
python -m ruff format --check . — 95 files already formatted
Microbench parity check — 1200/1200 identical block references
End-to-end asprox disassembly invariants verified

Residual risk

The cache is on analysis_state, so its lifetime is tied to the state object. If blocks were ever mutated after the first searchBlock call (currently they aren't — resolveRegisterCalls only runs after finalizeAnalysis), the index would go stale. The fix would be cache invalidation on the mutation site, not here.
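contextlib.suppress(AttributeError) is a deliberate fallback for objects that reject attribute assignment. For those objects the index is rebuilt on every call, which is asymptotically no worse than the legacy O(B·I) scan: each rebuild is a full pass over all instructions, whereas the legacy scan could stop at the first hit.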

Still deferred (out of scope for this branch)

SmdaFunction.getNormalizedBlockRefs caching (needs architecture_metadata mutation tracking).
BinaryInfo.getImportedFunctions discards a PeSymbolProvider.parseSymbols result before re-parsing for imports.
Static *FileLoader.getArchitecture / getCodeAreas each lief.parse(binary) — BinaryInfo caches but the static accessors don't share it.
Dead _logCandidateStats + latent == 2 vs == 0 bug in FunctionCandidateManager.
mcrit-install 3×5 CI matrix likely over-spec for a smoke test.

Review follow-ups

Gemini (PR #47, 2026-05-26): suggested moving the index cache off the analyzer and onto analysis_state to avoid re-entrancy/thread-safety risks and to simplify resolveRegisterCalls. Applied in 1e0fc9c, which also removed the fallback branch and the try/finally. The rerun microbench showed ~107× (up from ~92×) with the same 0/1200 parity mismatches.

https://claude.ai/code/session_01C8CcS2k1g59ByLKYdEcaxR

resolveRegisterCalls() resolves each "call <register>" by walking the CFG backward through up to block_depth (=3) levels of incoming refs. At every level, searchBlock was doing a linear scan over every block in the function and, for each block, a list comprehension over every instruction:

for block in analysis_state.getBlocks():
    if address in [i[0] for i in block]:
        return block

So one call to searchBlock is O(B*I), and the recursive descent into processBlock calls it once per incoming ref at every depth. Functions with many register calls (the file already mentions a Go sample with 130k of them) hit this hard.

This commit:

* Seeds an {instruction_addr: containing_block} dict once at the start of resolveRegisterCalls(), so every searchBlock lookup is O(1).
* Preserves "first matching block wins" by using `if addr not in index` during construction. This matters because FunctionAnalysisState.getBlocks can place the same instruction in multiple overlapping blocks via the sorted potential_starts walk.
* Clears the index in a finally so a reused analyzer instance never serves a stale index after the function completes.
* Keeps a slim linear-scan fallback in searchBlock for direct callers (e.g. existing unit tests that drive processBlock without going through resolveRegisterCalls).

Microbench (80 blocks × 15 instructions, 1200 lookups):
legacy linear scan: 17.04 ms
indexed O(1) lookup: 0.18 ms
-> 92x faster, bit-identical block-object references returned.

End-to-end on asprox is unchanged (it has few register calls); the win scales with the number of indirect calls in the binary.

Validation:
- pytest tests/test* -> 111 passed, 79 subtests passed
- ruff check + format --check clean
- asprox sha256 / num_instructions / function count unchanged
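For contrast, here is a hypothetical sketch of the first revision's shape as this commit message describes it. The index lives on the analyzer, resolveRegisterCalls seeds it and a finally clears it, and searchBlock falls back to the linear scan when no index is set. Class, attribute, and parameter names (`FirstRevisionAnalyzer`, `_block_index`, `call_addresses`) are illustrative, and a single lookup stands in for the per-call backward walk.

```python
from typing import Dict, List, Optional, Tuple

Block = List[Tuple[int, str]]


class FirstRevisionAnalyzer:
    """Hypothetical sketch of the design later replaced in 1e0fc9c."""

    def __init__(self) -> None:
        self._block_index: Optional[Dict[int, Block]] = None  # shared per instance

    def searchBlock(self, analysis_state, address: int) -> Optional[Block]:
        if self._block_index is not None:
            return self._block_index.get(address)
        # Slim linear-scan fallback for direct callers.
        for block in analysis_state.getBlocks():
            if address in [i[0] for i in block]:
                return block
        return None

    def resolveRegisterCalls(self, analysis_state, call_addresses: List[int]) -> List[Optional[Block]]:
        index: Dict[int, Block] = {}
        for block in analysis_state.getBlocks():
            for ins in block:
                if ins[0] not in index:  # first block wins
                    index[ins[0]] = block
        self._block_index = index
        try:
            # Stand-in for the real backward walk over incoming refs.
            return [self.searchBlock(analysis_state, a) for a in call_addresses]
        finally:
            self._block_index = None  # never serve a stale index later


class State:
    def __init__(self, blocks: List[Block]) -> None:
        self._blocks = blocks

    def getBlocks(self) -> List[Block]:
        return self._blocks
```

Because `_block_index` is per-instance state, two threads driving one analyzer for different functions could read each other's index. Every entry point also has to remember the seed/clear pair. Moving the cache onto analysis_state, as the review suggested, removes both concerns.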

coderabbitai · 2026-05-26T15:21:15Z

Important

Review skipped

Auto reviews are disabled on this repository. To trigger a review, include @coderabbit in the PR description. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8579cfdf-3d90-4e14-ada0-341ba80135e5


gemini-code-assist

Code Review

This pull request optimizes block lookups in IndirectCallAnalyzer from $O(N^2)$ to $O(1)$ by introducing an instruction-to-block index during register call resolution. The reviewer recommends caching this index lazily on the analysis_state object instead of storing it on the analyzer instance (self). This change would ensure thread safety and re-entrancy, while also simplifying resolveRegisterCalls by eliminating the need for manual index management and try...finally blocks.

Address Gemini review on PR #47: stash the {instruction_addr: block} index on analysis_state instead of self. analysis_state has the right lifetime (one per function analysis), so the cache is naturally re-entrancy-safe and can't outlive what it indexes; the analyzer keeps no transient state, and the explicit seed + try/finally in resolveRegisterCalls goes away.

searchBlock now lazy-builds the index on first call, so the legacy fallback branch is also gone: every caller (including direct unit-test callers) gets the O(1) path automatically. contextlib.suppress(AttributeError) guards the cache write so that test doubles or hypothetical __slots__-locked states still work; the freshly built dict is returned in that case.

Re-ran the focused microbench (80 blocks x 15 instructions, 1200 lookups): ~107x faster than the legacy scan, 0/1200 parity mismatches. End-to-end asprox sha256/num_instructions/function count unchanged.

Validation:
- pytest tests/test* -> 111 passed, 79 subtests passed
- ruff check + format --check clean

gemini-code-assist Bot reviewed May 26, 2026

Two comment threads on src/smda/intel/IndirectCallAnalyzer.py (both marked Outdated)

r0ny123 marked this pull request as ready for review May 26, 2026 18:39

r0ny123 merged commit b2a1d20 into master May 26, 2026
46 checks passed

r0ny123 deleted the claude/perf-sweep-20260526-indirect-call-index branch May 28, 2026 20:50

r0ny123 merged 2 commits into master from claude/perf-sweep-20260526-indirect-call-index
