fix: expand dangerous pickle primitive coverage #705
Conversation
Note: Reviews paused — it looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Walkthrough

Registers exact dotted dangerous imports (e.g., `numpy.load`) and adds an `S206` fallback rule code for dangerous `GLOBAL` findings without a module-specific mapping.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Test / Caller
    participant Scanner as PickleScanner
    participant Registry as ExplanationsRegistry
    participant Rules as OpcodeRuleResolver
    participant Reporter as FindingsReporter
    Client->>Scanner: submit bytes/stream
    Scanner->>Scanner: parse -> extract advanced globals (module, name, opcode_name)
    Scanner->>Registry: get_import_explanation("module.name")
    Registry-->>Scanner: exact dotted explanation (if present)
    alt no public rule code for global
        Scanner->>Rules: get_pickle_opcode_rule_code(opcode_name)
        Rules-->>Scanner: rule code / fallback (e.g., "S206")
    end
    Scanner->>Reporter: emit finding (module.name, opcode_name, explanation, rule_code)
    Reporter-->>Client: finding/result
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelaudit/scanners/pickle_scanner.py (1)
4256-4270: ⚠️ Potential issue | 🟠 Major — Fix fallback rule classification in advanced global findings.
`_extract_globals_advanced()` aggregates `GLOBAL`, `INST`, and `STACK_GLOBAL`, but this check hardcodes the opcode as `STACK_GLOBAL` and uses the `S205` fallback. That can misclassify unresolved dangerous `GLOBAL` findings.
Proposed minimal correction:
```diff
-        if not rule_code:
-            rule_code = "S205"  # STACK_GLOBAL/GLOBAL fallback
+        if not rule_code:
+            rule_code = "S206"  # generic unresolved GLOBAL-like import fallback
@@
-            "opcode": "STACK_GLOBAL",
+            "opcode": "GLOBAL_OR_STACK_GLOBAL",
```

Also applies to: 4275-4275
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelaudit/scanners/pickle_scanner.py` around lines 4256 - 4270, In _extract_globals_advanced(), stop hardcoding "STACK_GLOBAL" and "S205": derive the opcode for each finding (e.g., "GLOBAL", "INST", or "STACK_GLOBAL") and use that opcode both in the details map passed to result.add_check and to select a proper fallback rule_code when get_import_rule_code(mod, func) returns falsy (map opcode -> fallback code instead of always using "S205"); update the call sites around get_import_rule_code(mod, func) and result.add_check so the rule_code and the "opcode" field reflect the actual opcode variable rather than the fixed string.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 4465-4479: The _reduce_details dict created in the branch where
_is_pytorch_file is true only sets "cve_id" for CVE-2025-32434; update the
creation of _reduce_details (the dict built alongside variables pos, opcode,
associated_global) to include the full CVE metadata fields required by the
scanner schema: add "cvss" (numeric or string score), "cwe" (identifier),
"description" (brief text describing the vulnerability), and "remediation"
(recommended fix/mitigation), while preserving existing keys like "position",
"opcode", "associated_global", and "ml_context_confidence" (pulled from
ml_context.get("overall_confidence", 0)); ensure these fields are populated with
the appropriate static values for CVE-2025-32434.
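A minimal sketch of the requested `_reduce_details` shape follows. Every value is a placeholder except the keys and the CVE id; the real `cvss`/`cwe` values must be taken from the advisory, and `build_reduce_details` is a hypothetical stand-in for the inline dict the scanner builds.

```python
# Placeholder metadata for CVE-2025-32434; values here are illustrative.
CVE_2025_32434_METADATA = {
    "cve_id": "CVE-2025-32434",
    "cvss": "<score from advisory>",
    "cwe": "<CWE id from advisory>",
    "description": "torch.load weights_only=True deserialization bypass",
    "remediation": "Upgrade PyTorch; never torch.load untrusted checkpoints",
}


def build_reduce_details(pos: int, opcode: str, associated_global: str,
                         ml_context: dict) -> dict:
    """Assemble finding details, preserving the existing keys."""
    return {
        "position": pos,
        "opcode": opcode,
        "associated_global": associated_global,
        "ml_context_confidence": ml_context.get("overall_confidence", 0),
        **CVE_2025_32434_METADATA,
    }
```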
---
Outside diff comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 4256-4270: In _extract_globals_advanced(), stop hardcoding
"STACK_GLOBAL" and "S205": derive the opcode for each finding (e.g., "GLOBAL",
"INST", or "STACK_GLOBAL") and use that opcode both in the details map passed to
result.add_check and to select a proper fallback rule_code when
get_import_rule_code(mod, func) returns falsy (map opcode -> fallback code
instead of always using "S205"); update the call sites around
get_import_rule_code(mod, func) and result.add_check so the rule_code and the
"opcode" field reflect the actual opcode variable rather than the fixed string.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a151a9cc-63e3-4c15-b534-4d307f4896de
📒 Files selected for processing (5)
`CHANGELOG.md`, `modelaudit/config/explanations.py`, `modelaudit/scanners/pickle_scanner.py`, `tests/scanners/test_pickle_scanner.py`, `tests/test_why_explanations.py`
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelaudit/scanners/pickle_scanner.py (1)
3803-3851: ⚠️ Potential issue | 🟠 Major — Use the resyncing parser in `_extract_globals_advanced()`.
This exact-match path still relies on raw `pickletools.genops()`, so once a later stream hits a separator byte or partial-stream parse error, the `if globals_found: return globals_found` path stops extraction with a truncated `advanced_globals` set. That weakens the new exact dangerous-global coverage for appended pickles and also affects the downgrade/error gates that consult `advanced_globals`. Please reuse `_genops_with_fallback(..., multi_stream=multiple_pickles)` here or mirror its resync behavior. Based on learnings: Preserve or strengthen security detections; test both benign and malicious samples when adding scanner/feature changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelaudit/scanners/pickle_scanner.py` around lines 3803 - 3851, _in _extract_globals_advanced the parser uses pickletools.genops(data) directly which fails to resync on stream separators and truncates extraction; replace the genops call with the resyncing helper _genops_with_fallback(data, multi_stream=multiple_pickles) (or call the existing _genops_with_fallback(...) signature used elsewhere) and iterate the returned ops the same way, preserving the subsequent stack_global_refs/_build_symbolic_reference_maps logic and the multiple_pickles behavior so appended pickles are parsed and security detections remain robust.
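What a resyncing wrapper around `pickletools.genops` presumably does can be sketched as follows. The helper name and resync policy are assumptions modeled on the review's description of `_genops_with_fallback`, not the project's actual code.

```python
import io
import pickletools


def genops_with_resync(data: bytes, max_streams: int = 16):
    """Yield (opcode, arg, pos) across concatenated pickle streams.

    On a parse error, skip at least one byte past the failure point and
    retry, so appended pickles after a separator byte are still parsed.
    """
    buf = io.BytesIO(data)
    streams = 0
    while buf.tell() < len(data) and streams < max_streams:
        start = buf.tell()
        try:
            for op, arg, pos in pickletools.genops(buf):
                yield op, arg, pos
            streams += 1
        except Exception:
            # Resync: advance at least one byte, then try the next stream.
            buf.seek(max(buf.tell(), start + 1))


# Two pickles separated by a junk byte: a single naive genops() pass would
# stop after the first stream and miss the dangerous suffix.
benign = b"ccollections\nOrderedDict\n."
malicious = b"cos\nsystem\n."
refs = [arg for op, arg, _ in genops_with_resync(benign + b"\xff" + malicious)
        if op.name == "GLOBAL"]
```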
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/scanners/test_pickle_scanner.py`:
- Around line 1014-1054: Add a regression case that verifies a leading comment
token does not bypass detection: for one entry from PICKLESCAN_GAP_REFS, call
the existing _craft_global_only_pickle and _craft_global_reduce_pickle but
prepend a SHORT_BINUNICODE comment token (i.e. "#..." followed by POP) to the
serialized payload (or add an optional parameter to those helpers to inject that
prefix), then run _scan_bytes on that payload and assert the same failing checks
as in test_picklescan_gap_globals_are_critical_on_import_only and
test_picklescan_gap_globals_are_critical_with_reduce (look for the same check
names, statuses, severities and messages) so embedding a single comment token
does not suppress detection.
- Around line 1064-1085: Tighten the tests to assert on the exact dotted
reference instead of broad flags: in
test_safe_nearby_imports_remain_non_failing, after scanning with
_scan_bytes(_craft_global_only_pickle(...)) assert that neither "Global Module
Reference Check" nor "Advanced Global Reference Check" has status FAILED with a
message containing the exact f"{module}.{func}"; in
test_existing_reference_behavior_unchanged, replace the generic
result.has_errors assertion with an assertion that a specific check (e.g.,
"Global Module Reference Check" or "Advanced Global Reference Check") has status
FAILED and its message contains f"{module}.{func}" when scanning
_craft_global_reduce_pickle(module, func), so the tests precisely validate the
exact dotted reference failure/absence.
- Around line 631-641: PICKLESCAN_GAP_REFS is missing an explicit class-level
type annotation; add an import for ClassVar from typing and annotate the
constant as ClassVar[tuple[tuple[str, str], ...]] on the class where
PICKLESCAN_GAP_REFS is defined (preserving the existing value) so the structure
is documented and satisfies the typing guideline.
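The regression the first comment asks for can be sketched end-to-end. `craft_global_only_pickle` is an assumed stand-in for the test file's `_craft_global_only_pickle`; the opcode bytes (`c` for `GLOBAL`, `\x8c` for `SHORT_BINUNICODE`, `0` for `POP`, `.` for `STOP`) follow the pickle protocol.

```python
import pickletools


def craft_global_only_pickle(module: str, name: str) -> bytes:
    """Minimal protocol-0 pickle that only pushes module.name (GLOBAL + STOP)."""
    return b"c" + module.encode() + b"\n" + name.encode() + b"\n."


payload = craft_global_only_pickle("numpy", "load")

# Evasion variant: prepend a SHORT_BINUNICODE "#" (opcode 0x8c, length 1)
# followed by POP ("0"), so a naive leading-comment-token heuristic might bail.
evasive = b"\x8c\x01#" + b"0" + payload

for data in (payload, evasive):
    ops = [op.name for op, _, _ in pickletools.genops(data)]
    assert "GLOBAL" in ops  # the dangerous import stays visible to genops
```

Scanning both payloads and asserting the same failing checks ensures a single embedded comment token cannot suppress detection.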
---
Outside diff comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 3803-3851: _in _extract_globals_advanced the parser uses
pickletools.genops(data) directly which fails to resync on stream separators and
truncates extraction; replace the genops call with the resyncing helper
_genops_with_fallback(data, multi_stream=multiple_pickles) (or call the existing
_genops_with_fallback(...) signature used elsewhere) and iterate the returned
ops the same way, preserving the subsequent
stack_global_refs/_build_symbolic_reference_maps logic and the multiple_pickles
behavior so appended pickles are parsed and security detections remain robust.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f6b5fc89-d9e5-4104-82a4-9e37ec06e70d
📒 Files selected for processing (2)
`modelaudit/scanners/pickle_scanner.py`, `tests/scanners/test_pickle_scanner.py`
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 3822-3827: The GLOBAL/INST branch currently splits arg by a space
which misses "module.function" forms; replace the manual split with the shared
parser by calling _parse_module_function(arg) and use its returned (module,
function) values when adding to globals_found along with op_name (i.e., replace
the split/parts logic in the GLOBAL/INST branch to call _parse_module_function
and add (module, function or "", op_name) to globals_found), ensuring the same
parsing behavior as the rest of the scanner.
In `@tests/scanners/test_pickle_scanner.py`:
- Around line 1137-1151: The test function
test_picklescan_gap_detected_inside_zip_entry uses
tempfile.TemporaryDirectory(); change it to use the pytest tmp_path fixture by
adding tmp_path as a parameter to the test and create zip_path under tmp_path
(e.g. tmp_path / "numpy_loader_payload.zip") instead of using
tempfile.TemporaryDirectory(), then write the ZIP entry there and call scan_file
on str(zip_path) so the test uses the deterministic tmp_path fixture; update any
local variables (zip_path) accordingly.
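The split-by-space fix above can be sketched like this. `parse_module_function` is a hypothetical stand-in for the scanner's shared `_parse_module_function` helper; the space-separated form is how `pickletools` renders `GLOBAL`/`INST` arguments, while dotted strings arrive via `STACK_GLOBAL` string pairs.

```python
def parse_module_function(arg: str) -> tuple[str, str]:
    """Split a GLOBAL/INST argument into (module, function).

    Handles both the "module name" (space-separated) form emitted by
    pickletools and dotted "module.function" strings.
    """
    if " " in arg:
        module, _, function = arg.partition(" ")
    else:
        module, _, function = arg.rpartition(".")
    return module, function
```

Note that `rpartition(".")` keeps dotted module paths intact, so `"torch.serialization.load"` yields `("torch.serialization", "load")` rather than splitting at the first dot.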
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 88bad51b-966e-48d5-af26-d1c9305ca60b
📒 Files selected for processing (2)
`modelaudit/scanners/pickle_scanner.py`, `tests/scanners/test_pickle_scanner.py`
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
CHANGELOG.md (1)
82-149: 🛠️ Refactor suggestion | 🟠 Major — Consolidate duplicate "Fixed" sections.
The `[Unreleased]` changelog has two separate `### Fixed` sections (lines 82 and 124), which violates the Keep a Changelog format. Each category (Added, Changed, Fixed, Security, etc.) should appear at most once. All Fixed entries should be consolidated into a single section.
Note: This is a pre-existing issue not introduced by this PR, but it should be addressed to maintain proper changelog structure. As per coding guidelines, the changelog must follow the Keep a Changelog format.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG.md` around lines 82 - 149, The changelog contains two separate "### Fixed" headings; locate both "### Fixed" headings in CHANGELOG.md and merge their bullet lists into a single "### Fixed" section (preserve existing bullet order/semantics), remove the duplicate "### Fixed" header, and ensure surrounding sections like "### Security" remain distinct and ordered per Keep a Changelog conventions; update only the markdown structure (no content edits) so there is exactly one "Fixed" category under the Unreleased heading.
♻️ Duplicate comments (1)
tests/scanners/test_pickle_scanner.py (1)
1199-1201: ⚠️ Potential issue | 🟠 Major — Use `tmp_path` instead of `tempfile.TemporaryDirectory()` in this new test.
This new regression currently writes ZIP artifacts under a non-deterministic temp dir. Please switch this test to `tmp_path: Path` and create `zip_path = tmp_path / "numpy_loader_payload.zip"`.
As per coding guidelines, "Use deterministic fixtures only in tests; never reference host paths like `/etc/passwd`; create all targets under `tmp_path`."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scanners/test_pickle_scanner.py` around lines 1199 - 1201, The test currently uses tempfile.TemporaryDirectory() to create a non-deterministic temp dir and then builds zip_path with Path(tmp_dir) in tests/scanners/test_pickle_scanner.py (the block that creates "numpy_loader_payload.zip"); change the test to accept/use the pytest tmp_path fixture (tmp_path: Path) and create zip_path = tmp_path / "numpy_loader_payload.zip" instead of using tempfile.TemporaryDirectory(), updating the test signature and any references to tmp_dir accordingly so all artifacts are created under tmp_path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.md`:
- Line 84: Update the CHANGELOG entry to explicitly list all covered primitives
(numpy.load, site.main, _io.FileIO, test.support.script_helper.assert_python_ok,
_osx_support._read_output, _aix_support._read_cmd_output,
_pyrepl.pager.pipe_pager, torch.serialization.load,
torch._inductor.codecache.compile_file) or at minimum append the total count (9)
next to "other validated PickleScan-only loader and execution primitives" so the
entry is precise and searchable; edit the sentence that currently reads about
"and the other validated PickleScan-only loader and execution primitives" to
include either the full list above or the phrase "(9 primitives total)"
alongside the existing four named items.
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 4489-4523: The current logic sets CVE-2025-32434 metadata for any
non-allowlisted REDUCE in a PyTorch-looking file; change it so _reduce_msg and
_reduce_details remain generic by default and only include "cve_id", "cvss",
"cwe", "description", and "remediation" (and the exploit-specific message) when
a dedicated bypass predicate (e.g., is_torch_weights_only_bypass or
detect_weights_only_bypass) returns True for the given associated_global /
ml_context; keep the existing branches keyed on _is_pytorch_file but move the
CVE-specific fields behind that predicate check, preserving the generic REDUCE
finding otherwise and using ml_context.get("overall_confidence", 0) in both
cases.
- Around line 3813-3845: The advanced-global extraction currently returns a
partial set when it stops early (max_opcodes/timeout/parse error) and callers
like _scan_pickle_bytes treat that partial advanced_globals as authoritative;
change _extract_globals_advanced to return a tuple (raw_advanced_globals,
advanced_globals_complete: bool) indicating whether extraction completed, update
_scan_pickle_bytes (and any other callers) to only use advanced_globals to
downgrade MemoryError/parse failures when advanced_globals_complete is True, and
otherwise treat advanced globals as absent/unreliable (do not downgrade); ensure
logging records completeness and add tests covering benign front-loaded and
malicious-suffix pickles to verify downgrades only occur when
advanced_globals_complete is True.
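The tuple-return change could look like the sketch below. The function name, the budget default, and the `(set, bool)` contract are assumptions modeled on the review comment, not the project's actual signature.

```python
import pickletools


def extract_globals_bounded(data: bytes, max_opcodes: int = 100_000):
    """Return (globals_found, complete).

    Callers should only use the set to downgrade MemoryError/parse
    failures when complete is True; otherwise treat it as unreliable.
    """
    globals_found: set[tuple[str, str, str]] = set()
    complete = True
    try:
        for i, (op, arg, _pos) in enumerate(pickletools.genops(data)):
            if i >= max_opcodes:
                complete = False  # opcode budget exhausted: partial result
                break
            if op.name in ("GLOBAL", "INST") and isinstance(arg, str):
                module, _, name = arg.partition(" ")
                globals_found.add((module, name, op.name))
    except Exception:
        complete = False  # parse error: partial result
    return globals_found, complete
```

A truncated stream (no `STOP`) or an exhausted opcode budget both report `complete=False`, so a malicious suffix hidden past the termination point can no longer make the file look benign.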
---
Outside diff comments:
In `@CHANGELOG.md`:
- Around line 82-149: The changelog contains two separate "### Fixed" headings;
locate both "### Fixed" headings in CHANGELOG.md and merge their bullet lists
into a single "### Fixed" section (preserve existing bullet order/semantics),
remove the duplicate "### Fixed" header, and ensure surrounding sections like
"### Security" remain distinct and ordered per Keep a Changelog conventions;
update only the markdown structure (no content edits) so there is exactly one
"Fixed" category under the Unreleased heading.
---
Duplicate comments:
In `@tests/scanners/test_pickle_scanner.py`:
- Around line 1199-1201: The test currently uses tempfile.TemporaryDirectory()
to create a non-deterministic temp dir and then builds zip_path with
Path(tmp_dir) in tests/scanners/test_pickle_scanner.py (the block that creates
"numpy_loader_payload.zip"); change the test to accept/use the pytest tmp_path
fixture (tmp_path: Path) and create zip_path = tmp_path /
"numpy_loader_payload.zip" instead of using tempfile.TemporaryDirectory(),
updating the test signature and any references to tmp_dir accordingly so all
artifacts are created under tmp_path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5599d361-a58c-45fa-8a74-dee0e4e5a1b5
📒 Files selected for processing (3)
`CHANGELOG.md`, `modelaudit/scanners/pickle_scanner.py`, `tests/scanners/test_pickle_scanner.py`
Force-pushed from a3b9835 to dc486a2.
Rebased onto current main, resolved the CHANGELOG.md and pickle test conflicts, and tightened the follow-up QA fixes on top. Additional cleanup in this pass:
Local validation passed:
Fresh GitHub CI is running on head 4b6a6b0.
♻️ Duplicate comments (2)
modelaudit/scanners/pickle_scanner.py (2)
4440-4469: ⚠️ Potential issue | 🟠 Major — Don't map every non-allowlisted PyTorch `REDUCE` to CVE-2025-32434.
This branch still attaches the CVE metadata and exploit-specific wording whenever the file merely looks like PyTorch. Ordinary checkpoints with custom reducers will be reported as a specific `weights_only=True` bypass even when no bypass indicator is present. Keep the generic `REDUCE` finding by default and add the CVE fields only behind a dedicated bypass predicate. As per coding guidelines: Preserve or strengthen security detections; test both benign and malicious samples when adding scanner/feature changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelaudit/scanners/pickle_scanner.py` around lines 4440 - 4469, The current logic in the REDUCE handling (_is_pytorch_file, _reduce_msg, _reduce_details) unconditionally attaches CVE-2025-32434 metadata for any file that looks like PyTorch; instead, keep the generic REDUCE finding by default and only add the CVE-specific fields when a dedicated bypass predicate is true (e.g., when you detect an explicit weights_only=True bypass indicator or other concrete exploit signal in ml_context). Update the branch that builds _reduce_msg/_reduce_details to: 1) create the generic message/details regardless of file type, 2) evaluate a new predicate function/flag (use or add something like detect_weights_only_bypass(ml_context) or ml_context["weights_only_bypass"]) and only when that returns true, extend _reduce_details with cve_id, cvss, cwe, description and remediation and modify the message to reference CVE-2025-32434; otherwise leave the finding as a generic REDUCE-with-custom-global report. Ensure you reference _is_pytorch_file and ml_context when implementing the predicate to avoid false positives.
3804-3815: ⚠️ Potential issue | 🟠 Major — Don't trust `advanced_globals` when extraction is partial or unbounded.
This now buffers the full opcode stream with `list(_genops_with_fallback(...))`, but the result still has no completeness bit and is later used to decide whether a `MemoryError`/parse failure can be downgraded as a legitimate model. A crafted file can front-load benign refs, force early termination or resource pressure, and keep the dangerous suffix out of `advanced_globals`. Please keep bounded extraction here and only use `advanced_globals` in the downgrade gate when extraction completed successfully. Based on learnings: Preserve or strengthen security detections; test both benign and malicious samples when adding scanner/feature changes.
Also applies to: 5016-5045
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelaudit/scanners/pickle_scanner.py` around lines 3804 - 3815, The code unsafely buffers the entire opcode stream by calling list(_genops_with_fallback(...)) inside _extract_globals_advanced which can produce partial/unbounded extraction and be abused; change this to consume the generator in a bounded/streaming way (iterate with an explicit max-opcode or max-bytes limit, or modify _genops_with_fallback to yield a completion flag) and build stack_global_refs via incremental calls to _build_symbolic_reference_maps without calling list(...); ensure the function returns/sets a completion boolean (or raises) and only allow using advanced_globals in the downgrade gate when that completion flag is true (apply the same streaming/completion change wherever advanced_globals is used, including the other block referenced around lines 5016-5045).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 4440-4469: The current logic in the REDUCE handling
(_is_pytorch_file, _reduce_msg, _reduce_details) unconditionally attaches
CVE-2025-32434 metadata for any file that looks like PyTorch; instead, keep the
generic REDUCE finding by default and only add the CVE-specific fields when a
dedicated bypass predicate is true (e.g., when you detect an explicit
weights_only=True bypass indicator or other concrete exploit signal in
ml_context). Update the branch that builds _reduce_msg/_reduce_details to: 1)
create the generic message/details regardless of file type, 2) evaluate a new
predicate function/flag (use or add something like
detect_weights_only_bypass(ml_context) or ml_context["weights_only_bypass"]) and
only when that returns true, extend _reduce_details with cve_id, cvss, cwe,
description and remediation and modify the message to reference CVE-2025-32434;
otherwise leave the finding as a generic REDUCE-with-custom-global report.
Ensure you reference _is_pytorch_file and ml_context when implementing the
predicate to avoid false positives.
- Around line 3804-3815: The code unsafely buffers the entire opcode stream by
calling list(_genops_with_fallback(...)) inside _extract_globals_advanced which
can produce partial/unbounded extraction and be abused; change this to consume
the generator in a bounded/streaming way (iterate with an explicit max-opcode or
max-bytes limit, or modify _genops_with_fallback to yield a completion flag) and
build stack_global_refs via incremental calls to _build_symbolic_reference_maps
without calling list(...); ensure the function returns/sets a completion boolean
(or raises) and only allow using advanced_globals in the downgrade gate when
that completion flag is true (apply the same streaming/completion change
wherever advanced_globals is used, including the other block referenced around
lines 5016-5045).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5ef5046d-436b-4d47-9d2f-872bb2aad136
📒 Files selected for processing (5)
`CHANGELOG.md`, `modelaudit/config/explanations.py`, `modelaudit/scanners/pickle_scanner.py`, `tests/scanners/test_pickle_scanner.py`, `tests/test_why_explanations.py`
Summary
- Registered exact dotted dangerous imports: `numpy.load`, `site.main`, `_io.FileIO`, `test.support.script_helper.assert_python_ok`, `_osx_support._read_output`, `_aix_support._read_cmd_output`, `_pyrepl.pager.pipe_pager`, `torch.serialization.load`, and `torch._inductor.codecache.compile_file`
- Added an `S206` fallback for dangerous `GLOBAL` findings with no module-specific rule mapping

Validation
Summary by CodeRabbit
Bug Fixes & Security
Tests
Documentation