⚡ Optimize printable string detection by r0ny123 · Pull Request #30 · r0ny123/smda

r0ny123 · 2026-05-06T20:51:04Z

💡 What

Precomputes printable ASCII character membership once at module import time and uses an integer-indexed lookup table inside detect_ascii_len() and detect_unicode_len().

🎯 Why

The previous hot-loop condition evaluated chr(char) in string.printable for every byte checked, which cost a chr() call plus a linear scan of the 100-character string.printable on each iteration. The new lookup keeps the same printable-byte semantics while turning the inner-loop check into a constant-time table index with no per-iteration chr() call.

📊 Measured Improvement

Measured with a focused in-process benchmark using dummy SMDA reports, 4,096 printable ~ characters, warmup, GC disabled during samples, and best-of-seven timing on this Windows checkout. The machine was noisy, so the best sample is the most stable comparison point.

| Path | Baseline best | Optimized best | Change |
| --- | --- | --- | --- |
| `detect_ascii_len` | 4.668365s total / 1867.35us per call | 4.232579s total / 1693.03us per call | 9.3% faster |
| `detect_unicode_len` | 3.421829s total / 2737.46us per call | 2.690248s total / 2152.20us per call | 21.4% faster |
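The benchmark script itself is not part of this conversation. As a rough sketch of the methodology described above (warmup, GC disabled during samples, best-of-seven), something like the following could be used. The helper `best_of`, its parameters, and the stand-in workload are all hypothetical; the real benchmark called the SMDA detection functions on dummy reports.

```python
import gc
import time


def best_of(func, *, repeats=7, calls=50, warmup=10):
    """Return the fastest total time over `repeats` samples of `calls` invocations."""
    for _ in range(warmup):
        func()  # warm caches before any timed sample
    gc_was_enabled = gc.isenabled()
    gc.disable()  # keep collector pauses out of the timed samples
    try:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            for _ in range(calls):
                func()
            samples.append(time.perf_counter() - start)
    finally:
        if gc_was_enabled:
            gc.enable()
    return min(samples)


# Stand-in workload (an assumption): count bytes below 127 in 4,096 "~" characters.
payload = b"~" * 4096
calls = 50
best = best_of(lambda: sum(1 for b in payload if b < 127), calls=calls)
print(f"{best:.6f}s total / {best / calls * 1e6:.2f}us per call")
```

Taking the minimum rather than the mean discards samples inflated by background activity, which matches the reasoning given for a noisy machine.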

✅ Verification

python -m ruff format --check .
python -m ruff check .
python -m pytest -q (42 passed, 7 subtests passed)

gemini-code-assist

Code Review

This pull request optimizes string extraction by replacing repeated character checks with a pre-computed lookup table. Feedback suggests expanding the table to 256 elements to eliminate redundant range checks in the detect_ascii_len and detect_unicode_len loops, further improving performance.

r0ny123 · 2026-05-07T11:22:46Z

@claude

claude · 2026-05-07T11:23:09Z

Claude encountered an error.

I'll analyze this and get back to you.

Addresses the gemini-code-assist review on PR #30. With a 256-entry table, every byte value >= 127 (DEL and all non-ASCII bytes) maps to False, so the explicit char < 127 guard in the detect_ascii_len and detect_unicode_len hot loops becomes redundant. Removing it saves one comparison per loop iteration on the string-detection hot path. https://claude.ai/code/session_01PHLmsRuiwBQJ3n7gvR7Aa5
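A minimal sketch of the 256-entry variant follows. `PRINTABLE_256` and `printable_run_length` are hypothetical names, and the function is only a simplified stand-in for a loop like the one in detect_ascii_len, whose real signature is not shown here. It assumes the input is a `bytes` object, so every element is an integer in the range 0..255 and always a valid index.

```python
import string

# Hypothetical table name: one entry per possible byte value, so no range guard is needed.
PRINTABLE_256 = tuple(chr(i) in string.printable for i in range(256))


def printable_run_length(data: bytes, offset: int = 0) -> int:
    """Count consecutive printable bytes starting at `offset` (illustrative stand-in)."""
    length = 0
    for byte in data[offset:]:
        if not PRINTABLE_256[byte]:  # single table index, no `byte < 127` comparison
            break
        length += 1
    return length


print(printable_run_length(b"hello\x00world"))
print(printable_run_length(b"abc\x7fdef"))
```

The trade-off is a slightly larger table (256 entries instead of 127) in exchange for one fewer comparison per byte in the loop.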

Optimize printable string detection

2c8be1e

gemini-code-assist Bot reviewed May 6, 2026

Comment thread smda/utility/StringExtractor.py Outdated

r0ny123 marked this pull request as ready for review May 7, 2026 11:47

r0ny123 merged commit f33265e into master May 7, 2026
7 checks passed

r0ny123 deleted the codex/optimize-string-printable-check branch May 7, 2026 12:00

r0ny123 merged 2 commits into master from codex/optimize-string-printable-check


2 participants
