Conversation
Stop unbounded opcode materialization in _extract_globals_advanced by enforcing max_opcodes and timeout during genops iteration. Add regression tests to verify extraction exits when opcode or timeout budgets are exceeded.
No actionable comments were generated in the recent review. 🎉
Walkthrough
Added time- and opcode-bounded behavior to the pickle scanner's advanced global extraction.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelaudit/scanners/pickle_scanner.py (1)
Lines 3837-3862: ⚠️ Potential issue | 🟠 Major: Timeout budget is not enforced during post-processing
At line 3837, `_build_symbolic_reference_maps(ops)` and the subsequent full `for` loop still run even after the timeout has already been exceeded, which can add significant O(n) work and blunt the timeout safeguard.

Proposed fix:
```diff
-        stack_global_refs, _callable_refs = _build_symbolic_reference_maps(ops)
+        if (
+            effective_scan_start_time is not None
+            and (time.time() - effective_scan_start_time) > self.timeout
+        ):
+            logger.warning(
+                f"Advanced global extraction post-processing stopped after exceeding timeout ({self.timeout}s)"
+            )
+            extraction_truncated = True
+            stack_global_refs = {}
+        else:
+            stack_global_refs, _callable_refs = _build_symbolic_reference_maps(ops)
         last_byte = data.read(1)
         if last_byte:
             data.seek(-1, 1)
         for n, (opcode, arg, _pos) in enumerate(ops):
+            if (
+                effective_scan_start_time is not None
+                and (time.time() - effective_scan_start_time) > self.timeout
+            ):
+                extraction_truncated = True
+                break
             op_name = opcode.name
```

As per coding guidelines: "Preserve or strengthen security detections; test both benign and malicious samples when adding scanner/feature changes."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelaudit/scanners/pickle_scanner.py` around lines 3837 - 3862, The post-processing currently always calls _build_symbolic_reference_maps(ops) and iterates the entire for n,(opcode,arg,_pos) in enumerate(ops) loop even if the scanner timeout has already been exceeded; modify the code to first check the scanner timeout right after reading ops (and before calling _build_symbolic_reference_maps) and again inside the for-loop (e.g., at loop start) to break/return early if the timeout is exceeded, ensuring you avoid calling expensive work when timed out; update references around _build_symbolic_reference_maps, ops, the for-loop that computes globals_found, and the STACK_GLOBAL handling (including logger.debug and adding ("unknown","unknown")) so behavior is preserved when not timed out and aborted cleanly when it is.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/scanners/test_pickle_scanner.py`:
- Around line 1213-1228: The test currently yields only NOP ops so it can't
prove max_opcodes truncation; modify the mocked genops used by
test_extract_globals_advanced_respects_max_opcodes to yield GLOBAL-like opcodes
(or ops whose handling in PickleScanner._extract_globals_advanced would add
names to the globals set) and include a side-effect consumption counter in the
mock (or a flag set when the generator is exhausted) so you can assert that only
up to 3 ops were processed: verify globals_found contains only the limited
subset expected from the first 3 GLOBAL-like yields and/or assert the generator
consumption counter is < total yielded to prove parsing stopped after
max_opcodes; keep the monkeypatch target
"modelaudit.scanners.pickle_scanner.pickletools.genops" and the test call to
PickleScanner._extract_globals_advanced.
- Around line 1230-1247: The test
test_extract_globals_advanced_respects_scan_timeout uses NOP-only _fake_genops
which never yields globals so the timeout behavior isn't exercised; modify
_fake_genops used in PickleScanner (scanner variable and its method
_extract_globals_advanced) to emit at least one opcode that would produce a
global (or a countered side-effect) while still iterating many times so the
timeout check is actually reached, and assert behavior based on that emission
(e.g., that the scanner stops/returns an empty set when time advances); also
move the BytesIO import out of the function to the module-level imports for
consistency with other tests.
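A minimal sketch of the test shape the first comment asks for. All names here (`extract_with_budget`, `_FakeOpcode`) are illustrative stand-ins for the project's actual scanner method and opcode objects, not its real API; the point is the consumption counter that proves iteration stopped at the budget:

```python
import pickletools
from io import BytesIO
from unittest import mock


def extract_with_budget(data, max_opcodes):
    """Stand-in for the scanner's bounded extraction: stop after max_opcodes."""
    found = set()
    for count, (opcode, arg, _pos) in enumerate(pickletools.genops(data), start=1):
        if count > max_opcodes:
            break
        if opcode.name == "GLOBAL" and isinstance(arg, str):
            found.add(tuple(arg.split(" ", 1)))
    return found


class _FakeOpcode:
    def __init__(self, name):
        self.name = name


def test_budget_stops_consumption():
    consumed = {"count": 0}

    def _fake_genops(_data):
        # Yield GLOBAL-like ops so truncation is observable in the result,
        # and count how many items the consumer actually pulls.
        for i in range(10):
            consumed["count"] += 1
            yield (_FakeOpcode("GLOBAL"), f"mod{i} attr{i}", i)

    with mock.patch("pickletools.genops", _fake_genops):
        found = extract_with_budget(BytesIO(b""), max_opcodes=3)

    # Only the first 3 GLOBAL-like yields were processed...
    assert found == {("mod0", "attr0"), ("mod1", "attr1"), ("mod2", "attr2")}
    # ...and the generator was not exhausted (4th pull triggers the break).
    assert consumed["count"] == 4
```

The counter asserts the property the review wants: parsing stopped after the budget rather than merely discarding later results.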
---
Outside diff comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 3837-3862: The post-processing currently always calls
_build_symbolic_reference_maps(ops) and iterates the entire for
n,(opcode,arg,_pos) in enumerate(ops) loop even if the scanner timeout has
already been exceeded; modify the code to first check the scanner timeout right
after reading ops (and before calling _build_symbolic_reference_maps) and again
inside the for-loop (e.g., at loop start) to break/return early if the timeout
is exceeded, ensuring you avoid calling expensive work when timed out; update
references around _build_symbolic_reference_maps, ops, the for-loop that
computes globals_found, and the STACK_GLOBAL handling (including logger.debug
and adding ("unknown","unknown")) so behavior is preserved when not timed out
and aborted cleanly when it is.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 04413eca-941a-4027-97ae-e96838cb8964
📒 Files selected for processing (2)
modelaudit/scanners/pickle_scanner.py
tests/scanners/test_pickle_scanner.py
Motivation
`_extract_globals_advanced()` materialized every opcode via `list(pickletools.genops(data))` with no opcode or timeout bounds, allowing crafted pickles to exhaust CPU/RAM before guarded scanning.

Description
- Bound `PickleScanner._extract_globals_advanced()` by incrementally iterating `pickletools.genops()` and checking `self.max_opcodes` and the scan timeout during iteration, stopping extraction gracefully when budgets are exceeded.
- Add regression tests in `tests/scanners/test_pickle_scanner.py` that assert advanced extraction exits when the opcode budget is exceeded and when the scan timeout budget is already exhausted.
- Files changed: `modelaudit/scanners/pickle_scanner.py`, `tests/scanners/test_pickle_scanner.py`.

Testing
- `uv run ruff format modelaudit/ tests/` and `uv run ruff check --fix modelaudit/ tests/` (passed).
- `uv run mypy modelaudit/` (passed).
- `uv run pytest -n auto -m "not slow and not integration" --maxfail=1` (one pre-existing unrelated test failed: `tests/utils/helpers/test_secure_hasher.py::TestErrorHandling::test_hash_permission_denied`; all other tests ran, and this failure is not caused by this change).
- `uv run pytest tests/scanners/test_pickle_scanner.py` (47 passed) and `uv run pytest tests/scanners/test_pickle_scanner.py -k "extract_globals_advanced_respects"` (2 passed), confirming the regression tests and the bounded extraction behavior.

Codex Task
Summary by CodeRabbit
New Features
Tests