fix: expand dangerous pickle primitive coverage #705
Conversation
Note: Reviews paused — it looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Walkthrough

Registers exact dotted dangerous imports (e.g., `numpy.load`) and adds an `S206` fallback rule code for dangerous `GLOBAL` findings without a module-specific mapping.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Test / Caller
    participant Scanner as PickleScanner
    participant Registry as ExplanationsRegistry
    participant Rules as OpcodeRuleResolver
    participant Reporter as FindingsReporter
    Client->>Scanner: submit bytes/stream
    Scanner->>Scanner: parse -> extract advanced globals (module, name, opcode_name)
    Scanner->>Registry: get_import_explanation("module.name")
    Registry-->>Scanner: exact dotted explanation (if present)
    alt no public rule code for global
        Scanner->>Rules: get_pickle_opcode_rule_code(opcode_name)
        Rules-->>Scanner: rule code / fallback (e.g., "S206")
    end
    Scanner->>Reporter: emit finding (module.name, opcode_name, explanation, rule_code)
    Reporter-->>Client: finding/result
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelaudit/scanners/pickle_scanner.py (1)
4256-4270: ⚠️ Potential issue | 🟠 Major — Fix fallback rule classification in advanced global findings.
`_extract_globals_advanced()` aggregates `GLOBAL`, `INST`, and `STACK_GLOBAL`, but this check hardcodes the opcode as `STACK_GLOBAL` and uses the `S205` fallback. That can misclassify unresolved dangerous `GLOBAL` findings.
Proposed minimal correction:
```diff
-        if not rule_code:
-            rule_code = "S205"  # STACK_GLOBAL/GLOBAL fallback
+        if not rule_code:
+            rule_code = "S206"  # generic unresolved GLOBAL-like import fallback
@@
-            "opcode": "STACK_GLOBAL",
+            "opcode": "GLOBAL_OR_STACK_GLOBAL",
```

Also applies to: 4275-4275
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelaudit/scanners/pickle_scanner.py` around lines 4256 - 4270, In _extract_globals_advanced(), stop hardcoding "STACK_GLOBAL" and "S205": derive the opcode for each finding (e.g., "GLOBAL", "INST", or "STACK_GLOBAL") and use that opcode both in the details map passed to result.add_check and to select a proper fallback rule_code when get_import_rule_code(mod, func) returns falsy (map opcode -> fallback code instead of always using "S205"); update the call sites around get_import_rule_code(mod, func) and result.add_check so the rule_code and the "opcode" field reflect the actual opcode variable rather than the fixed string.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 4465-4479: The _reduce_details dict created in the branch where
_is_pytorch_file is true only sets "cve_id" for CVE-2025-32434; update the
creation of _reduce_details (the dict built alongside variables pos, opcode,
associated_global) to include the full CVE metadata fields required by the
scanner schema: add "cvss" (numeric or string score), "cwe" (identifier),
"description" (brief text describing the vulnerability), and "remediation"
(recommended fix/mitigation), while preserving existing keys like "position",
"opcode", "associated_global", and "ml_context_confidence" (pulled from
ml_context.get("overall_confidence", 0)); ensure these fields are populated with
the appropriate static values for CVE-2025-32434.
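A minimal sketch of the requested `_reduce_details` shape follows. Every value is a placeholder except the keys and the CVE id; the real `cvss`/`cwe` values must be taken from the advisory, and `build_reduce_details` is a hypothetical stand-in for the inline dict the scanner builds.

```python
# Placeholder metadata for CVE-2025-32434; values here are illustrative.
CVE_2025_32434_METADATA = {
    "cve_id": "CVE-2025-32434",
    "cvss": "<score from advisory>",
    "cwe": "<CWE id from advisory>",
    "description": "torch.load weights_only=True deserialization bypass",
    "remediation": "Upgrade PyTorch; never torch.load untrusted checkpoints",
}


def build_reduce_details(pos: int, opcode: str, associated_global: str,
                         ml_context: dict) -> dict:
    """Assemble finding details, preserving the existing keys."""
    return {
        "position": pos,
        "opcode": opcode,
        "associated_global": associated_global,
        "ml_context_confidence": ml_context.get("overall_confidence", 0),
        **CVE_2025_32434_METADATA,
    }
```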
---
Outside diff comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 4256-4270: In _extract_globals_advanced(), stop hardcoding
"STACK_GLOBAL" and "S205": derive the opcode for each finding (e.g., "GLOBAL",
"INST", or "STACK_GLOBAL") and use that opcode both in the details map passed to
result.add_check and to select a proper fallback rule_code when
get_import_rule_code(mod, func) returns falsy (map opcode -> fallback code
instead of always using "S205"); update the call sites around
get_import_rule_code(mod, func) and result.add_check so the rule_code and the
"opcode" field reflect the actual opcode variable rather than the fixed string.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a151a9cc-63e3-4c15-b534-4d307f4896de
📒 Files selected for processing (5)
`CHANGELOG.md`, `modelaudit/config/explanations.py`, `modelaudit/scanners/pickle_scanner.py`, `tests/scanners/test_pickle_scanner.py`, `tests/test_why_explanations.py`
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelaudit/scanners/pickle_scanner.py (1)
3803-3851: ⚠️ Potential issue | 🟠 Major — Use the resyncing parser in `_extract_globals_advanced()`.
This exact-match path still relies on raw `pickletools.genops()`, so once a later stream hits a separator byte or partial-stream parse error, the `if globals_found: return globals_found` path stops extraction with a truncated `advanced_globals` set. That weakens the new exact dangerous-global coverage for appended pickles and also affects the downgrade/error gates that consult `advanced_globals`. Please reuse `_genops_with_fallback(..., multi_stream=multiple_pickles)` here or mirror its resync behavior. Based on learnings: Preserve or strengthen security detections; test both benign and malicious samples when adding scanner/feature changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelaudit/scanners/pickle_scanner.py` around lines 3803 - 3851, _in _extract_globals_advanced the parser uses pickletools.genops(data) directly which fails to resync on stream separators and truncates extraction; replace the genops call with the resyncing helper _genops_with_fallback(data, multi_stream=multiple_pickles) (or call the existing _genops_with_fallback(...) signature used elsewhere) and iterate the returned ops the same way, preserving the subsequent stack_global_refs/_build_symbolic_reference_maps logic and the multiple_pickles behavior so appended pickles are parsed and security detections remain robust.
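What a resyncing wrapper around `pickletools.genops` presumably does can be sketched as follows. The helper name and resync policy are assumptions modeled on the review's description of `_genops_with_fallback`, not the project's actual code.

```python
import io
import pickletools


def genops_with_resync(data: bytes, max_streams: int = 16):
    """Yield (opcode, arg, pos) across concatenated pickle streams.

    On a parse error, skip at least one byte past the failure point and
    retry, so appended pickles after a separator byte are still parsed.
    """
    buf = io.BytesIO(data)
    streams = 0
    while buf.tell() < len(data) and streams < max_streams:
        start = buf.tell()
        try:
            for op, arg, pos in pickletools.genops(buf):
                yield op, arg, pos
            streams += 1
        except Exception:
            # Resync: advance at least one byte, then try the next stream.
            buf.seek(max(buf.tell(), start + 1))


# Two pickles separated by a junk byte: a single naive genops() pass would
# stop after the first stream and miss the dangerous suffix.
benign = b"ccollections\nOrderedDict\n."
malicious = b"cos\nsystem\n."
refs = [arg for op, arg, _ in genops_with_resync(benign + b"\xff" + malicious)
        if op.name == "GLOBAL"]
```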
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/scanners/test_pickle_scanner.py`:
- Around line 1014-1054: Add a regression case that verifies a leading comment
token does not bypass detection: for one entry from PICKLESCAN_GAP_REFS, call
the existing _craft_global_only_pickle and _craft_global_reduce_pickle but
prepend a SHORT_BINUNICODE comment token (i.e. "#..." followed by POP) to the
serialized payload (or add an optional parameter to those helpers to inject that
prefix), then run _scan_bytes on that payload and assert the same failing checks
as in test_picklescan_gap_globals_are_critical_on_import_only and
test_picklescan_gap_globals_are_critical_with_reduce (look for the same check
names, statuses, severities and messages) so embedding a single comment token
does not suppress detection.
- Around line 1064-1085: Tighten the tests to assert on the exact dotted
reference instead of broad flags: in
test_safe_nearby_imports_remain_non_failing, after scanning with
_scan_bytes(_craft_global_only_pickle(...)) assert that neither "Global Module
Reference Check" nor "Advanced Global Reference Check" has status FAILED with a
message containing the exact f"{module}.{func}"; in
test_existing_reference_behavior_unchanged, replace the generic
result.has_errors assertion with an assertion that a specific check (e.g.,
"Global Module Reference Check" or "Advanced Global Reference Check") has status
FAILED and its message contains f"{module}.{func}" when scanning
_craft_global_reduce_pickle(module, func), so the tests precisely validate the
exact dotted reference failure/absence.
- Around line 631-641: PICKLESCAN_GAP_REFS is missing an explicit class-level
type annotation; add an import for ClassVar from typing and annotate the
constant as ClassVar[tuple[tuple[str, str], ...]] on the class where
PICKLESCAN_GAP_REFS is defined (preserving the existing value) so the structure
is documented and satisfies the typing guideline.
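The regression the first comment asks for can be sketched end-to-end. `craft_global_only_pickle` is an assumed stand-in for the test file's `_craft_global_only_pickle`; the opcode bytes (`c` for `GLOBAL`, `\x8c` for `SHORT_BINUNICODE`, `0` for `POP`, `.` for `STOP`) follow the pickle protocol.

```python
import pickletools


def craft_global_only_pickle(module: str, name: str) -> bytes:
    """Minimal protocol-0 pickle that only pushes module.name (GLOBAL + STOP)."""
    return b"c" + module.encode() + b"\n" + name.encode() + b"\n."


payload = craft_global_only_pickle("numpy", "load")

# Evasion variant: prepend a SHORT_BINUNICODE "#" (opcode 0x8c, length 1)
# followed by POP ("0"), so a naive leading-comment-token heuristic might bail.
evasive = b"\x8c\x01#" + b"0" + payload

for data in (payload, evasive):
    ops = [op.name for op, _, _ in pickletools.genops(data)]
    assert "GLOBAL" in ops  # the dangerous import stays visible to genops
```

Scanning both payloads and asserting the same failing checks ensures a single embedded comment token cannot suppress detection.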
---
Outside diff comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 3803-3851: _in _extract_globals_advanced the parser uses
pickletools.genops(data) directly which fails to resync on stream separators and
truncates extraction; replace the genops call with the resyncing helper
_genops_with_fallback(data, multi_stream=multiple_pickles) (or call the existing
_genops_with_fallback(...) signature used elsewhere) and iterate the returned
ops the same way, preserving the subsequent
stack_global_refs/_build_symbolic_reference_maps logic and the multiple_pickles
behavior so appended pickles are parsed and security detections remain robust.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f6b5fc89-d9e5-4104-82a4-9e37ec06e70d
📒 Files selected for processing (2)
`modelaudit/scanners/pickle_scanner.py`, `tests/scanners/test_pickle_scanner.py`
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 3822-3827: The GLOBAL/INST branch currently splits arg by a space
which misses "module.function" forms; replace the manual split with the shared
parser by calling _parse_module_function(arg) and use its returned (module,
function) values when adding to globals_found along with op_name (i.e., replace
the split/parts logic in the GLOBAL/INST branch to call _parse_module_function
and add (module, function or "", op_name) to globals_found), ensuring the same
parsing behavior as the rest of the scanner.
In `@tests/scanners/test_pickle_scanner.py`:
- Around line 1137-1151: The test function
test_picklescan_gap_detected_inside_zip_entry uses
tempfile.TemporaryDirectory(); change it to use the pytest tmp_path fixture by
adding tmp_path as a parameter to the test and create zip_path under tmp_path
(e.g. tmp_path / "numpy_loader_payload.zip") instead of using
tempfile.TemporaryDirectory(), then write the ZIP entry there and call scan_file
on str(zip_path) so the test uses the deterministic tmp_path fixture; update any
local variables (zip_path) accordingly.
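The split-by-space fix above can be sketched like this. `parse_module_function` is a hypothetical stand-in for the scanner's shared `_parse_module_function` helper; the space-separated form is how `pickletools` renders `GLOBAL`/`INST` arguments, while dotted strings arrive via `STACK_GLOBAL` string pairs.

```python
def parse_module_function(arg: str) -> tuple[str, str]:
    """Split a GLOBAL/INST argument into (module, function).

    Handles both the "module name" (space-separated) form emitted by
    pickletools and dotted "module.function" strings.
    """
    if " " in arg:
        module, _, function = arg.partition(" ")
    else:
        module, _, function = arg.rpartition(".")
    return module, function
```

Note that `rpartition(".")` keeps dotted module paths intact, so `"torch.serialization.load"` yields `("torch.serialization", "load")` rather than splitting at the first dot.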
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 88bad51b-966e-48d5-af26-d1c9305ca60b
📒 Files selected for processing (2)
`modelaudit/scanners/pickle_scanner.py`, `tests/scanners/test_pickle_scanner.py`
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
CHANGELOG.md (1)
82-149: 🛠️ Refactor suggestion | 🟠 Major — Consolidate duplicate "Fixed" sections.
The `[Unreleased]` changelog has two separate `### Fixed` sections (lines 82 and 124), which violates the Keep a Changelog format. Each category (Added, Changed, Fixed, Security, etc.) should appear at most once. All Fixed entries should be consolidated into a single section.
Note: This is a pre-existing issue not introduced by this PR, but it should be addressed to maintain proper changelog structure. As per coding guidelines, the changelog must follow the Keep a Changelog format.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG.md` around lines 82 - 149, The changelog contains two separate "### Fixed" headings; locate both "### Fixed" headings in CHANGELOG.md and merge their bullet lists into a single "### Fixed" section (preserve existing bullet order/semantics), remove the duplicate "### Fixed" header, and ensure surrounding sections like "### Security" remain distinct and ordered per Keep a Changelog conventions; update only the markdown structure (no content edits) so there is exactly one "Fixed" category under the Unreleased heading.
♻️ Duplicate comments (1)
tests/scanners/test_pickle_scanner.py (1)
1199-1201: ⚠️ Potential issue | 🟠 Major — Use `tmp_path` instead of `tempfile.TemporaryDirectory()` in this new test.
This new regression currently writes ZIP artifacts under a non-deterministic temp dir. Please switch this test to `tmp_path: Path` and create `zip_path = tmp_path / "numpy_loader_payload.zip"`.
As per coding guidelines, "Use deterministic fixtures only in tests; never reference host paths like `/etc/passwd`; create all targets under `tmp_path`."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scanners/test_pickle_scanner.py` around lines 1199 - 1201, The test currently uses tempfile.TemporaryDirectory() to create a non-deterministic temp dir and then builds zip_path with Path(tmp_dir) in tests/scanners/test_pickle_scanner.py (the block that creates "numpy_loader_payload.zip"); change the test to accept/use the pytest tmp_path fixture (tmp_path: Path) and create zip_path = tmp_path / "numpy_loader_payload.zip" instead of using tempfile.TemporaryDirectory(), updating the test signature and any references to tmp_dir accordingly so all artifacts are created under tmp_path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.md`:
- Line 84: Update the CHANGELOG entry to explicitly list all covered primitives
(numpy.load, site.main, _io.FileIO, test.support.script_helper.assert_python_ok,
_osx_support._read_output, _aix_support._read_cmd_output,
_pyrepl.pager.pipe_pager, torch.serialization.load,
torch._inductor.codecache.compile_file) or at minimum append the total count (9)
next to "other validated PickleScan-only loader and execution primitives" so the
entry is precise and searchable; edit the sentence that currently reads about
"and the other validated PickleScan-only loader and execution primitives" to
include either the full list above or the phrase "(9 primitives total)"
alongside the existing four named items.
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 4489-4523: The current logic sets CVE-2025-32434 metadata for any
non-allowlisted REDUCE in a PyTorch-looking file; change it so _reduce_msg and
_reduce_details remain generic by default and only include "cve_id", "cvss",
"cwe", "description", and "remediation" (and the exploit-specific message) when
a dedicated bypass predicate (e.g., is_torch_weights_only_bypass or
detect_weights_only_bypass) returns True for the given associated_global /
ml_context; keep the existing branches keyed on _is_pytorch_file but move the
CVE-specific fields behind that predicate check, preserving the generic REDUCE
finding otherwise and using ml_context.get("overall_confidence", 0) in both
cases.
- Around line 3813-3845: The advanced-global extraction currently returns a
partial set when it stops early (max_opcodes/timeout/parse error) and callers
like _scan_pickle_bytes treat that partial advanced_globals as authoritative;
change _extract_globals_advanced to return a tuple (raw_advanced_globals,
advanced_globals_complete: bool) indicating whether extraction completed, update
_scan_pickle_bytes (and any other callers) to only use advanced_globals to
downgrade MemoryError/parse failures when advanced_globals_complete is True, and
otherwise treat advanced globals as absent/unreliable (do not downgrade); ensure
logging records completeness and add tests covering benign front-loaded and
malicious-suffix pickles to verify downgrades only occur when
advanced_globals_complete is True.
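The tuple-return change could look like the sketch below. The function name, the budget default, and the `(set, bool)` contract are assumptions modeled on the review comment, not the project's actual signature.

```python
import pickletools


def extract_globals_bounded(data: bytes, max_opcodes: int = 100_000):
    """Return (globals_found, complete).

    Callers should only use the set to downgrade MemoryError/parse
    failures when complete is True; otherwise treat it as unreliable.
    """
    globals_found: set[tuple[str, str, str]] = set()
    complete = True
    try:
        for i, (op, arg, _pos) in enumerate(pickletools.genops(data)):
            if i >= max_opcodes:
                complete = False  # opcode budget exhausted: partial result
                break
            if op.name in ("GLOBAL", "INST") and isinstance(arg, str):
                module, _, name = arg.partition(" ")
                globals_found.add((module, name, op.name))
    except Exception:
        complete = False  # parse error: partial result
    return globals_found, complete
```

A truncated stream (no `STOP`) or an exhausted opcode budget both report `complete=False`, so a malicious suffix hidden past the termination point can no longer make the file look benign.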
---
Outside diff comments:
In `@CHANGELOG.md`:
- Around line 82-149: The changelog contains two separate "### Fixed" headings;
locate both "### Fixed" headings in CHANGELOG.md and merge their bullet lists
into a single "### Fixed" section (preserve existing bullet order/semantics),
remove the duplicate "### Fixed" header, and ensure surrounding sections like
"### Security" remain distinct and ordered per Keep a Changelog conventions;
update only the markdown structure (no content edits) so there is exactly one
"Fixed" category under the Unreleased heading.
---
Duplicate comments:
In `@tests/scanners/test_pickle_scanner.py`:
- Around line 1199-1201: The test currently uses tempfile.TemporaryDirectory()
to create a non-deterministic temp dir and then builds zip_path with
Path(tmp_dir) in tests/scanners/test_pickle_scanner.py (the block that creates
"numpy_loader_payload.zip"); change the test to accept/use the pytest tmp_path
fixture (tmp_path: Path) and create zip_path = tmp_path /
"numpy_loader_payload.zip" instead of using tempfile.TemporaryDirectory(),
updating the test signature and any references to tmp_dir accordingly so all
artifacts are created under tmp_path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5599d361-a58c-45fa-8a74-dee0e4e5a1b5
📒 Files selected for processing (3)
`CHANGELOG.md`, `modelaudit/scanners/pickle_scanner.py`, `tests/scanners/test_pickle_scanner.py`
Force-pushed from a3b9835 to dc486a2.
Rebased onto current main, resolved the CHANGELOG.md and pickle test conflicts, and tightened the follow-up QA fixes on top. Additional cleanup in this pass:
Local validation passed:
Fresh GitHub CI is running on head 4b6a6b0.
♻️ Duplicate comments (2)
modelaudit/scanners/pickle_scanner.py (2)
4440-4469: ⚠️ Potential issue | 🟠 Major — Don't map every non-allowlisted PyTorch `REDUCE` to CVE-2025-32434.
This branch still attaches the CVE metadata and exploit-specific wording whenever the file merely looks like PyTorch. Ordinary checkpoints with custom reducers will be reported as a specific `weights_only=True` bypass even when no bypass indicator is present. Keep the generic `REDUCE` finding by default and add the CVE fields only behind a dedicated bypass predicate. As per coding guidelines: Preserve or strengthen security detections; test both benign and malicious samples when adding scanner/feature changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelaudit/scanners/pickle_scanner.py` around lines 4440 - 4469, The current logic in the REDUCE handling (_is_pytorch_file, _reduce_msg, _reduce_details) unconditionally attaches CVE-2025-32434 metadata for any file that looks like PyTorch; instead, keep the generic REDUCE finding by default and only add the CVE-specific fields when a dedicated bypass predicate is true (e.g., when you detect an explicit weights_only=True bypass indicator or other concrete exploit signal in ml_context). Update the branch that builds _reduce_msg/_reduce_details to: 1) create the generic message/details regardless of file type, 2) evaluate a new predicate function/flag (use or add something like detect_weights_only_bypass(ml_context) or ml_context["weights_only_bypass"]) and only when that returns true, extend _reduce_details with cve_id, cvss, cwe, description and remediation and modify the message to reference CVE-2025-32434; otherwise leave the finding as a generic REDUCE-with-custom-global report. Ensure you reference _is_pytorch_file and ml_context when implementing the predicate to avoid false positives.
3804-3815: ⚠️ Potential issue | 🟠 Major — Don't trust `advanced_globals` when extraction is partial or unbounded.
This now buffers the full opcode stream with `list(_genops_with_fallback(...))`, but the result still has no completeness bit and is later used to decide whether a `MemoryError`/parse failure can be downgraded as a legitimate model. A crafted file can front-load benign refs, force early termination or resource pressure, and keep the dangerous suffix out of `advanced_globals`. Please keep bounded extraction here and only use `advanced_globals` in the downgrade gate when extraction completed successfully. Based on learnings: Preserve or strengthen security detections; test both benign and malicious samples when adding scanner/feature changes.
Also applies to: 5016-5045
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelaudit/scanners/pickle_scanner.py` around lines 3804 - 3815, The code unsafely buffers the entire opcode stream by calling list(_genops_with_fallback(...)) inside _extract_globals_advanced which can produce partial/unbounded extraction and be abused; change this to consume the generator in a bounded/streaming way (iterate with an explicit max-opcode or max-bytes limit, or modify _genops_with_fallback to yield a completion flag) and build stack_global_refs via incremental calls to _build_symbolic_reference_maps without calling list(...); ensure the function returns/sets a completion boolean (or raises) and only allow using advanced_globals in the downgrade gate when that completion flag is true (apply the same streaming/completion change wherever advanced_globals is used, including the other block referenced around lines 5016-5045).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 4440-4469: The current logic in the REDUCE handling
(_is_pytorch_file, _reduce_msg, _reduce_details) unconditionally attaches
CVE-2025-32434 metadata for any file that looks like PyTorch; instead, keep the
generic REDUCE finding by default and only add the CVE-specific fields when a
dedicated bypass predicate is true (e.g., when you detect an explicit
weights_only=True bypass indicator or other concrete exploit signal in
ml_context). Update the branch that builds _reduce_msg/_reduce_details to: 1)
create the generic message/details regardless of file type, 2) evaluate a new
predicate function/flag (use or add something like
detect_weights_only_bypass(ml_context) or ml_context["weights_only_bypass"]) and
only when that returns true, extend _reduce_details with cve_id, cvss, cwe,
description and remediation and modify the message to reference CVE-2025-32434;
otherwise leave the finding as a generic REDUCE-with-custom-global report.
Ensure you reference _is_pytorch_file and ml_context when implementing the
predicate to avoid false positives.
- Around line 3804-3815: The code unsafely buffers the entire opcode stream by
calling list(_genops_with_fallback(...)) inside _extract_globals_advanced which
can produce partial/unbounded extraction and be abused; change this to consume
the generator in a bounded/streaming way (iterate with an explicit max-opcode or
max-bytes limit, or modify _genops_with_fallback to yield a completion flag) and
build stack_global_refs via incremental calls to _build_symbolic_reference_maps
without calling list(...); ensure the function returns/sets a completion boolean
(or raises) and only allow using advanced_globals in the downgrade gate when
that completion flag is true (apply the same streaming/completion change
wherever advanced_globals is used, including the other block referenced around
lines 5016-5045).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5ef5046d-436b-4d47-9d2f-872bb2aad136
📒 Files selected for processing (5)
`CHANGELOG.md`, `modelaudit/config/explanations.py`, `modelaudit/scanners/pickle_scanner.py`, `tests/scanners/test_pickle_scanner.py`, `tests/test_why_explanations.py`
Summary
- Registered exact dotted dangerous imports: `numpy.load`, `site.main`, `_io.FileIO`, `test.support.script_helper.assert_python_ok`, `_osx_support._read_output`, `_aix_support._read_cmd_output`, `_pyrepl.pager.pipe_pager`, `torch.serialization.load`, and `torch._inductor.codecache.compile_file`
- Added an `S206` fallback for dangerous `GLOBAL` findings with no module-specific rule mapping

Validation
Summary by CodeRabbit
Bug Fixes & Security
Tests
Documentation