fix: fail closed on malformed STACK_GLOBAL operands (#704)
**Walkthrough**

Adds detection and reporting for malformed STACK_GLOBAL operands: a new classifier produces bounded previews for non-string, missing-memo, and insufficient-context operands.
**Sequence Diagram**

```mermaid
sequenceDiagram
    participant Parser as Pickle Parser
    participant SG as STACK_GLOBAL Processor
    participant Classifier as Operand Classifier
    participant Tracker as Malformed Tracker
    participant Analysis as Threat Analysis
    Parser->>SG: Encounter STACK_GLOBAL opcode (module, name)
    SG->>Classifier: Provide operands and memo context
    Classifier->>Classifier: Determine kind (string, missing_memo, non-string, insufficient_context) and bounded preview
    Classifier->>Tracker: Record malformed_stack_globals entry (index → {kind, preview, reason})
    SG->>Tracker: Add valid stack_global_refs / callable_refs
    Tracker->>Analysis: Supply valid refs + malformed_stack_globals
    Analysis->>Analysis: Adjust findings and severities based on classifications
    Analysis->>Parser: Emit security findings
```
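The classifier-to-tracker flow in the diagram can be sketched as follows. This is an illustrative toy, not the scanner's actual code: the names `classify`, `record`, and `malformed_stack_globals` are assumptions based on the walkthrough's `index → {kind, preview, reason}` description.

```python
from typing import Any

_MISSING = object()  # hypothetical stand-in for an unresolved memo slot

def classify(value: Any) -> tuple[str, str]:
    """Return (kind, bounded preview) for one STACK_GLOBAL operand."""
    if isinstance(value, str):
        return "string", value
    if value is _MISSING:
        return "missing_memo", "unknown"
    preview = repr(value)
    if len(preview) > 64:  # keep previews bounded regardless of operand size
        preview = preview[:64] + "...<truncated>"
    return "non_string", f"{type(value).__name__}:{preview}"

# index -> {kind, preview, reason}, as in the walkthrough
malformed_stack_globals: dict[int, dict[str, str]] = {}

def record(index: int, module: Any, name: Any) -> None:
    """Record a malformed entry when either operand is not a plain string."""
    for operand in (module, name):
        kind, preview = classify(operand)
        if kind != "string":
            malformed_stack_globals[index] = {
                "kind": kind,
                "preview": preview,
                "reason": f"{kind} operand for STACK_GLOBAL",
            }
            return
```

Only malformed opcodes get an entry; well-formed (string, string) pairs pass through untouched, which is what lets the later threat-analysis stage treat the two populations differently.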
**Estimated Code Review Effort:** 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
CHANGELOG.md (1)
82-148: ⚠️ Potential issue | 🟡 Minor

**Consolidate duplicate `### Fixed` sections.**

The `[Unreleased]` section contains two `### Fixed` subsections (lines 82 and 123). Per Keep a Changelog format, each section type should appear only once per release version. Please consolidate these into a single `### Fixed` section. As per coding guidelines, CHANGELOG entries should follow Keep a Changelog format.

📝 Suggested consolidation approach

Merge all entries from both `### Fixed` sections into one section. The consolidated section should appear after `### Added` and before `### Security` to follow conventional ordering (Added, Changed, Deprecated, Removed, Fixed, Security).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG.md` around lines 82 - 148, The changelog has two duplicate "### Fixed" subsections under the [Unreleased] release; consolidate them by merging all bullet entries from both "### Fixed" blocks into a single "### Fixed" section, remove the duplicate header, and place that consolidated "### Fixed" section after the "### Added" section and before the "### Security" section so the release follows the Keep a Changelog ordering; ensure no bullets are lost or duplicated and preserve their original wording and bullet order.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 1948-1955: _classify_stack_global_operand currently builds
"non_string" previews using the full repr(value), which can blow up memory for
large operands; change it to produce a bounded preview: compute a short preview
string (e.g., limit repr(value) to a safe max length and append "..." when
truncated) and for binary types (bytes/bytearray/memoryview) produce a concise
size/hex-prefix like "bytes(len=NN, hex=0xDEAD...)" instead of full content;
apply the same truncation strategy to the other analogous classifier in the file
to avoid large allocations across the scanner.
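The truncation strategy described above can be sketched as a small helper. This is a hedged illustration, not the scanner's actual implementation: `PREVIEW_MAX`, the 4-byte hex prefix, and the helper name are all assumed values chosen to match the `bytes(len=NN, hex=0xDEAD...)` format the comment suggests.

```python
import binascii
from typing import Any

PREVIEW_MAX = 256  # assumed cap; the real limit is an implementation choice

def bounded_preview(value: Any) -> str:
    """Produce a short, allocation-safe preview of an arbitrary operand."""
    if isinstance(value, (bytes, bytearray, memoryview)):
        # Never hex-encode the whole buffer; a 4-byte prefix identifies it.
        raw = bytes(value[:4])
        hex_prefix = binascii.hexlify(raw).decode("ascii").upper()
        return f"{type(value).__name__}(len={len(value)}, hex=0x{hex_prefix}...)"
    preview = repr(value)
    if len(preview) > PREVIEW_MAX:
        preview = preview[:PREVIEW_MAX] + "..."
    return preview
```

The point of the binary-type branch is that `repr()` of a large `bytes` object allocates a string several times the payload size, so the preview must be built from the length plus a fixed-size prefix rather than from the full content.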
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f93a213a-55a5-4ee0-8962-12bfce964b24
📒 Files selected for processing (3)
- CHANGELOG.md
- modelaudit/scanners/pickle_scanner.py
- tests/scanners/test_pickle_scanner.py
♻️ Duplicate comments (1)
modelaudit/scanners/pickle_scanner.py (1)
1949-1956: ⚠️ Potential issue | 🟠 Major

**Bound operand previews in malformed STACK_GLOBAL classification.**

Line 1956 still builds `non_string` payloads from the full `repr(value)`. Large operands can bloat memory and finding payload size.

💡 Proposed fix
```diff
 def _classify_stack_global_operand(value: Any) -> tuple[str, str]:
     if isinstance(value, str):
         return "string", value
     if value is missing_memo:
         return "missing_memo", "unknown"
     if value is unknown:
         return "unknown", "unknown"
-    return "non_string", f"{type(value).__name__}:{value!r}"
+    if isinstance(value, (bytes, bytearray, memoryview)):
+        return "non_string", f"{type(value).__name__}:<{len(value)} bytes>"
+    preview = repr(value)
+    if len(preview) > 256:
+        preview = preview[:256] + "...<truncated>"
+    return "non_string", f"{type(value).__name__}:{preview}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelaudit/scanners/pickle_scanner.py` around lines 1949 - 1956, _classify_stack_global_operand currently returns a full repr for non-string values which can create huge payloads; change the non_string branch to truncate the preview (introduce a constant like STACK_GLOBAL_PREVIEW_MAX) and return a bounded preview plus the original value size/length instead of the full repr (e.g. compute preview = repr(value)[:STACK_GLOBAL_PREVIEW_MAX], compute size = len(repr(value)) or an appropriate length metric, then return "non_string", f"{type(value).__name__}:{preview} (len={size})"). Ensure the new logic is used in _classify_stack_global_operand so large operands no longer blow up payloads.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a42eb735-7984-4f8f-bca9-ff143a67bd89
📒 Files selected for processing (3)
- CHANGELOG.md
- modelaudit/scanners/pickle_scanner.py
- tests/scanners/test_pickle_scanner.py
Force-pushed 90e4cc8 to a92e66f
Rebased onto current.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelaudit/scanners/pickle_scanner.py (1)
1997-2001: 🧹 Nitpick | 🔵 Trivial

**Consolidate duplicated `STOP` handling blocks in symbolic simulation.**

After Line 1997's `STOP` branch `continue`s, the later `STOP` branches are unreachable. Keeping three reset paths in one loop increases drift risk if only one branch is edited later.

♻️ Suggested cleanup
```diff
@@
-            if name == "STOP":
-                # Reset stack and memo at pickle stream boundaries so that
-                # references from a previous stream cannot leak into the next
-                # one (multi-stream / appended-pickle scenarios).
-                stack.clear()
-                memo.clear()
-                next_memo_index = 0
-                continue
@@
-            if name == "STOP":
-                # Reset memo and stack at pickle stream boundaries so that
-                # references from one stream don't leak into the next in
-                # multi-pickle files (e.g. PyTorch .pt containers).
-                memo.clear()
-                stack.clear()
-                next_memo_index = 0
```

Also applies to: 2147-2154, 2187-2194
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelaudit/scanners/pickle_scanner.py` around lines 1997 - 2001, Multiple identical 'if name == "STOP":' branches reset stack, memo, and next_memo_index and then continue, causing duplicate/unreachable code; consolidate these into a single canonical STOP handler inside the main symbolic-simulation loop. Locate all occurrences of the STOP check (the conditional testing name == "STOP" that clears stack, memo and resets next_memo_index) and replace duplicate blocks with one shared handler (keep the clear() calls on stack and memo and set next_memo_index = 0, then continue), removing the later redundant branches so edits won't drift out of sync; ensure any local control flow dependent on the original positions still behaves correctly after centralizing.
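The consolidation the comment asks for can be sketched as follows. This is a toy model, not the scanner's code: `SymbolicState`, `reset_stream_state`, and the opcode handling are assumed names that mirror the review's description of the `stack`/`memo`/`next_memo_index` state.

```python
class SymbolicState:
    """Toy stand-in for the scanner's symbolic simulation state."""

    def __init__(self) -> None:
        self.stack: list[object] = []
        self.memo: dict[int, object] = {}
        self.next_memo_index = 0

    def reset_stream_state(self) -> None:
        # Pickle stream boundary (multi-stream / appended-pickle files):
        # nothing from the previous stream may leak into the next one.
        self.stack.clear()
        self.memo.clear()
        self.next_memo_index = 0

def simulate(state: SymbolicState, opcodes: list[str]) -> None:
    """Every STOP funnels through one reset helper instead of three copies."""
    for name in opcodes:
        if name == "STOP":
            state.reset_stream_state()
            continue
        # Other opcode handling would go here; the sketch just pushes
        # the opcode name onto the symbolic stack.
        state.stack.append(name)
```

Centralizing the reset in one method means a future change (say, also clearing a new per-stream counter) lands in every `STOP` path at once, which is exactly the drift risk the nitpick is about.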
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: dd505563-fa0b-4873-8713-e47b6eb9d739
📒 Files selected for processing (3)
- CHANGELOG.md
- modelaudit/scanners/pickle_scanner.py
- tests/scanners/test_pickle_scanner.py
Summary
- Fail closed on malformed `STACK_GLOBAL` operands when the symbolic stack resolves non-string values or missing memo references

Validation
Summary by CodeRabbit
New Features
Bug Fixes
Tests