Skip to content

⚡ [performance] Compile regular expressions in IndirectCallAnalyzer#20

Open
r0ny123 wants to merge 5 commits intomasterfrom
perf-optimize-indirect-call-regex-3375033366522080559
Open

⚡ [performance] Compile regular expressions in IndirectCallAnalyzer#20
r0ny123 wants to merge 5 commits intomasterfrom
perf-optimize-indirect-call-regex-3375033366522080559

Conversation

@r0ny123
Copy link
Copy Markdown
Owner

@r0ny123 r0ny123 commented Feb 13, 2026

This PR optimizes the IndirectCallAnalyzer by pre-compiling regular expressions used in the processBlock method.

💡 What: Moved 5 regular expression patterns from inline re.match calls to class-level compiled re.Pattern objects.
🎯 Why: processBlock is a recursive method that iterates over instructions in basic blocks. Compiling regexes on every iteration is inefficient.
📊 Measured Improvement: Micro-benchmarks showed a 63.6% improvement in the time taken for regex matching (from 2.23s down to 0.81s for 100k iterations of typical instruction strings).

A new test file tests/testIndirectCallAnalyzer.py was added to verify the regex matching and the basic dataflow logic of the analyzer.


PR created automatically by Jules for task 3375033366522080559 started by @r0ny123


Open with Devin

Pre-compiling regular expressions as class attributes in IndirectCallAnalyzer
significantly improves performance by avoiding repeated compilation during
instruction parsing in the processBlock method.

Benchmark results show a ~63% improvement in regex matching time for typical
instruction patterns.

Verified with new unit tests in tests/testIndirectCallAnalyzer.py.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
@google-labs-jules
Copy link
Copy Markdown

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello @r0ny123, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the performance of the IndirectCallAnalyzer by optimizing its regular expression usage. By compiling regex patterns once at the class level instead of repeatedly within a recursive method, the analysis process becomes much more efficient. The changes are validated with new unit tests, ensuring the integrity of the analyzer's core functionality.

Highlights

  • Performance Optimization: Optimized the IndirectCallAnalyzer by pre-compiling five regular expression patterns, moving them from inline re.match calls to class-level re.Pattern objects. This change specifically targets the processBlock method, which is recursive and frequently iterates over instructions.
  • Measured Improvement: Micro-benchmarks demonstrated a significant 63.6% improvement in regex matching time, reducing it from 2.23 seconds to 0.81 seconds for 100,000 iterations of typical instruction strings.
  • New Test Coverage: Introduced a new test file, tests/testIndirectCallAnalyzer.py, to verify the correctness of the regex matching and the basic dataflow logic within the IndirectCallAnalyzer.
Changelog
  • smda/intel/IndirectCallAnalyzer.py
    • Added five class-level re.Pattern objects for common regular expression patterns.
    • Replaced inline re.match calls with references to the newly compiled re.Pattern objects within the processBlock method.
  • tests/testIndirectCallAnalyzer.py
    • Added a new test file for IndirectCallAnalyzer.
    • Included test_regex_matching to verify the functionality of the pre-compiled regular expressions.
    • Implemented test_processBlock_logic to ensure the basic dataflow analysis logic works as expected.
Activity
  • The pull request was automatically created by Jules for task 3375033366522080559, initiated by @r0ny123.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively improves performance by pre-compiling regular expressions, which is a great optimization for the recursive processBlock method. The addition of unit tests in tests/testIndirectCallAnalyzer.py is also a valuable contribution, ensuring the correctness of the regex patterns and the analyzer's logic. I have one suggestion to improve the maintainability of the processBlock method by refactoring a small section to reduce code duplication.

…nd fix linting

- Pre-compiled regular expressions in IndirectCallAnalyzer for better performance.
- Added unit tests in tests/testIndirectCallAnalyzer.py.
- Fixed CI linting issues (unused import and formatting) in the new test file.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

Open in Devin Review

@r0ny123
Copy link
Copy Markdown
Owner Author

r0ny123 commented Feb 20, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request is a great performance optimization. Pre-compiling the regular expressions in IndirectCallAnalyzer is a solid improvement, especially since processBlock is a recursive, frequently called method. The addition of testIndirectCallAnalyzer.py is also a valuable contribution to ensure correctness.

My review includes a couple of suggestions:

  1. I've identified a potential bug in the regular expressions that could cause issues with 64-bit binaries. The patterns for matching registers and addresses are too restrictive. I've suggested a fix to make them more robust.
  2. I've also suggested expanding the new test cases to cover these 64-bit scenarios, which will help prevent future regressions.

Comment thread smda/intel/IndirectCallAnalyzer.py
Comment on lines +10 to +41
def test_regex_matching(self):
analyzer = IndirectCallAnalyzer(MagicMock())

# Test mov <reg>, <reg>
match = analyzer.RE_MOV_REG_REG.match("eax, ebx")
self.assertIsNotNone(match)
self.assertEqual(match.group("reg1"), "eax")
self.assertEqual(match.group("reg2"), "ebx")

# Test mov <reg>, <const>
match = analyzer.RE_MOV_REG_CONST.match("ecx, 0x12345678")
self.assertIsNotNone(match)
self.assertEqual(match.group("reg"), "ecx")
self.assertEqual(match.group("val"), "0x12345678")

# Test mov <reg>, dword ptr [<addr>]
match = analyzer.RE_REG_DWORD_PTR_ADDR.match("edx, dword ptr [0x8048000]")
self.assertIsNotNone(match)
self.assertEqual(match.group("reg"), "edx")
self.assertEqual(match.group("addr"), "0x8048000")

# Test mov <reg>, qword ptr [rip + <addr>]
match = analyzer.RE_REG_QWORD_PTR_RIP_ADDR.match("rax, qword ptr [rip + 0x1234]")
self.assertIsNotNone(match)
self.assertEqual(match.group("reg"), "rax")
self.assertEqual(match.group("addr"), "0x1234")

# Test lea <reg>, [<addr>]
match = analyzer.RE_REG_ADDR.match("rsi, [0x400000]")
self.assertIsNotNone(match)
self.assertEqual(match.group("reg"), "rsi")
self.assertEqual(match.group("addr"), "0x400000")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It's great that you've added tests for the regexes! To make them more comprehensive, I suggest adding cases for 64-bit architecture. This would include registers with names that are not 3 characters long (e.g., r8, r10) and full 64-bit hexadecimal addresses. This will help ensure the regexes are robust across different architectures.

    def test_regex_matching(self):
        analyzer = IndirectCallAnalyzer(MagicMock())

        # Test mov <reg>, <reg>
        match = analyzer.RE_MOV_REG_REG.match("eax, ebx")
        self.assertIsNotNone(match)
        self.assertEqual(match.group("reg1"), "eax")
        self.assertEqual(match.group("reg2"), "ebx")
        match = analyzer.RE_MOV_REG_REG.match("r8, r9")
        self.assertIsNotNone(match)
        self.assertEqual(match.group("reg1"), "r8")
        self.assertEqual(match.group("reg2"), "r9")

        # Test mov <reg>, <const>
        match = analyzer.RE_MOV_REG_CONST.match("ecx, 0x12345678")
        self.assertIsNotNone(match)
        self.assertEqual(match.group("reg"), "ecx")
        self.assertEqual(match.group("val"), "0x12345678")
        match = analyzer.RE_MOV_REG_CONST.match("r10, 0x1122334455667788")
        self.assertIsNotNone(match)
        self.assertEqual(match.group("reg"), "r10")
        self.assertEqual(match.group("val"), "0x1122334455667788")

        # Test mov <reg>, dword ptr [<addr>]
        match = analyzer.RE_REG_DWORD_PTR_ADDR.match("edx, dword ptr [0x8048000]")
        self.assertIsNotNone(match)
        self.assertEqual(match.group("reg"), "edx")
        self.assertEqual(match.group("addr"), "0x8048000")

        # Test mov <reg>, qword ptr [rip + <addr>]
        match = analyzer.RE_REG_QWORD_PTR_RIP_ADDR.match("rax, qword ptr [rip + 0x1234]")
        self.assertIsNotNone(match)
        self.assertEqual(match.group("reg"), "rax")
        self.assertEqual(match.group("addr"), "0x1234")

        # Test lea <reg>, [<addr>]
        match = analyzer.RE_REG_ADDR.match("rsi, [0x400000]")
        self.assertIsNotNone(match)
        self.assertEqual(match.group("reg"), "rsi")
        self.assertEqual(match.group("addr"), "0x400000")

r0ny123 and others added 2 commits February 24, 2026 13:32
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
devin-ai-integration[bot]

This comment was marked as resolved.

…nd fix CI

- Pre-compiled regular expressions in IndirectCallAnalyzer for better performance.
- Added unit tests in tests/testIndirectCallAnalyzer.py.
- Fixed processBlock to explicitly return False if no target is resolved.
- Fixed formatting in test file to satisfy CI ruff version.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
@r0ny123
Copy link
Copy Markdown
Owner Author

r0ny123 commented Feb 27, 2026

@codex review the whole committed code.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Delightful!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant