fix: remove pickle hasattr allowlist entries#692

Open
mldangelo wants to merge 3 commits into main from feat/pickle-hasattr-allowlist

Conversation

@mldangelo
Member

@mldangelo mldangelo commented Mar 13, 2026

Summary

  • remove builtins.hasattr and __builtin__.hasattr from the pickle safe-global allowlist
  • keep the change scoped to allowlist hardening while preserving safe builtin constructors like set, slice, and tuple
  • add targeted regressions for GLOBAL, STACK_GLOBAL, memo recall, and benign-first multi-stream cases, with assertions on the actual failing scanner checks
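
The GLOBAL/REDUCE payloads these regressions exercise can be hand-crafted from a few opcode bytes. A minimal standalone sketch (the helper name and byte layout are illustrative, not the PR's actual test code):

```python
import pickletools

def craft_global_reduce_pickle(module: str, name: str) -> bytes:
    """Protocol-2 pickle: GLOBAL pushes module.name, REDUCE calls it with ()."""
    return (
        b"\x80\x02"                                               # PROTO 2
        + b"c" + module.encode() + b"\n" + name.encode() + b"\n"  # GLOBAL module.name
        + b")"                                                    # EMPTY_TUPLE
        + b"R"                                                    # REDUCE: call name(*())
        + b"."                                                    # STOP
    )

payload = craft_global_reduce_pickle("builtins", "hasattr")
pickletools.dis(payload)  # disassembles the opcodes without executing the payload
```

Disassembling with `pickletools.dis` is safe here because nothing is unpickled; it only decodes the opcode stream a scanner would walk.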

Validation

  • uv run ruff format modelaudit/ tests/
  • uv run ruff check --fix modelaudit/ tests/
  • uv run ruff check modelaudit/ tests/
  • uv run ruff format --check modelaudit/ tests/
  • uv run mypy modelaudit/
  • uv run pytest tests/scanners/test_pickle_scanner.py -q -k "hasattr or builtins or pkgutil or uuid or multi_stream_benign_then_malicious"
  • uv run pytest -n auto -m "not slow and not integration" --maxfail=1

Notes

  • hasattr is not promoted into ALWAYS_DANGEROUS_FUNCTIONS; removing it from ML_SAFE_GLOBALS is sufficient because builtins / __builtin__ already flow through the existing dangerous-module path
  • updated the nearby comment to reflect that policy accurately
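
A simplified model of that policy — the dictionary names mirror the PR's identifiers, but the control flow here is an illustrative sketch, not modelaudit's actual implementation:

```python
# Sketch of the allowlist flow described above: a builtins global is safe
# only if explicitly allowlisted; anything else from a dangerous module is
# flagged, so hasattr needs no ALWAYS_DANGEROUS_FUNCTIONS entry once it is
# dropped from ML_SAFE_GLOBALS. Module/name sets are illustrative.
ML_SAFE_GLOBALS = {
    "builtins": {"set", "slice", "tuple"},      # hasattr no longer listed
    "__builtin__": {"set", "slice", "tuple"},
}
DANGEROUS_MODULES = {"builtins", "__builtin__", "os", "subprocess"}

def classify_global(module: str, name: str) -> str:
    if name in ML_SAFE_GLOBALS.get(module, set()):
        return "safe"
    if module in DANGEROUS_MODULES:
        return "critical"
    return "suspicious"

print(classify_global("builtins", "hasattr"))  # critical
print(classify_global("builtins", "tuple"))    # safe
```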

Summary by CodeRabbit

  • Bug Fixes

    • Hardened pickle safety: attribute-access primitives (e.g., hasattr) are now treated as dangerous and removed from allowlists.
  • Tests

    • Added comprehensive regression tests covering many pickle scenarios to validate hardened detections and messaging.
  • Documentation

    • Changelog updated with the unreleased security fix.

@coderabbitai
Contributor

coderabbitai bot commented Mar 13, 2026

Walkthrough

Removed hasattr from the pickle scanner ML-safe global allowlists and added extensive regression tests verifying builtins.hasattr and __builtin__.hasattr are flagged CRITICAL across GLOBAL/REDUCE, STACK_GLOBAL, memoization (BINPUT/BINGET), and import-only pickle payload patterns.

Changes

Cohort / File(s) Summary
Scanner
modelaudit/scanners/pickle_scanner.py
Removed hasattr from ML_SAFE_GLOBALS for both __builtin__ and builtins; updated comments to note attribute-access primitives (getattr/setattr/delattr/hasattr) are not ML-safe.
Tests
tests/scanners/test_pickle_scanner.py
Added a helper _craft_global_only_pickle and a comprehensive suite of regression tests (multiple new methods) in TestPickleScannerBlocklistHardening that assert builtins.hasattr and __builtin__.hasattr are classified CRITICAL across import-only, GLOBAL, STACK_GLOBAL, REDUCE, and memoization scenarios; also verifies safe builtins remain allowlisted and dangerous builtins still fail.
Changelog
CHANGELOG.md
Added Unreleased > Fixed entry documenting removal of hasattr from the pickle safe-global allowlist.
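
The `_craft_global_only_pickle` helper itself is not shown in this thread; a plausible standalone equivalent, with the byte layout assumed from the standard pickle opcodes:

```python
import pickle

def craft_global_only_pickle(module: str, name: str) -> bytes:
    """Import-only payload: GLOBAL pushes the object, STOP returns it uncalled."""
    return b"\x80\x02c" + module.encode() + b"\n" + name.encode() + b"\n."

# A benign global round-trips normally; the point of the import-only tests
# is that the *import* alone must be enough for the scanner to flag
# dangerous names, with no REDUCE opcode ever invoking them.
assert pickle.loads(craft_global_only_pickle("builtins", "tuple")) is tuple
```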

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I nudged through bytes and seams, a careful little hop,
Found hasattr tucked away — I gave its safety a stop.
Now pickles tremble at the sight of my keen nose,
I thump my foot, I twitch my ears, and tidy up the rows.
Carrot for courage, whiskers for the win — hop!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately summarizes the main change: removing hasattr from pickle allowlist entries.
Docstring Coverage ✅ Passed Docstring coverage is 92.86% which is sufficient. The required threshold is 80.00%.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
CHANGELOG.md (1)

82-147: ⚠️ Potential issue | 🟡 Minor

Consolidate duplicate ### Fixed sections.

The [Unreleased] section contains two separate ### Fixed headings (lines 82 and 122). According to Keep a Changelog format, there should be only one section per category. Merge all Fixed entries under a single ### Fixed heading.

♻️ Suggested consolidation

Remove the duplicate ### Fixed heading at line 122 and keep all Fixed entries under the first ### Fixed section at line 82.

As per coding guidelines: "Add CHANGELOG entry under the existing [Unreleased] section when adding user-visible changes (Keep a Changelog format)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG.md` around lines 82 - 147, Under the [Unreleased] section there are
two "### Fixed" headings; remove the duplicate "### Fixed" block (the second
one) and merge its bullet entries into the first "### Fixed" list so all fixes
appear under a single "### Fixed" heading; update only the duplicate heading and
its bullets (leave the first "### Fixed" heading, the section title
"[Unreleased]", and other sections intact).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/scanners/test_pickle_scanner.py`:
- Around line 698-707: The test test_safe_builtins_remain_allowlisted currently
only asserts no errors; strengthen it by also asserting that none of the
returned issues are from the specific check "Global Module Reference Check" (or
the canonical check name used by your scanner) so regressions in check routing
fail the test. After calling _scan_bytes(...) and asserting result.success,
iterate result.issues (or use a comprehension) and assert no issue.check equals
"Global Module Reference Check" (and/or assert that the set of issue.check
values does not contain that string), referencing the test name
test_safe_builtins_remain_allowlisted and the result/issue fields returned by
_scan_bytes/_craft_global_reduce_pickle.

---

Outside diff comments:
In `@CHANGELOG.md`:
- Around line 82-147: Under the [Unreleased] section there are two "### Fixed"
headings; remove the duplicate "### Fixed" block (the second one) and merge its
bullet entries into the first "### Fixed" list so all fixes appear under a
single "### Fixed" heading; update only the duplicate heading and its bullets
(leave the first "### Fixed" heading, the section title "[Unreleased]", and
other sections intact).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 22e3687e-b7f5-40e5-9dd0-9dd59fea1fee

📥 Commits

Reviewing files that changed from the base of the PR and between 9431fae and 60aef2d.

📒 Files selected for processing (3)
  • CHANGELOG.md
  • modelaudit/scanners/pickle_scanner.py
  • tests/scanners/test_pickle_scanner.py

@mldangelo mldangelo force-pushed the feat/pickle-hasattr-allowlist branch from 92d97dc to 2fb3890 on March 14, 2026 at 00:24
@mldangelo
Member Author

Rebased onto current main, tightened the safe-builtin regression, and reran validation locally.

Changes in this update:

  • keep the hasattr allowlist removal intact on top of current main
  • strengthen test_safe_builtins_remain_allowlisted so it asserts a passed Global Module Reference Check for safe builtins and no failed check on that path
  • preserve the changelog entry under the current single ### Fixed section during rebase

Validation passed:

  • uv run ruff format modelaudit/ tests/
  • uv run ruff check --fix modelaudit/ tests/
  • uv run ruff check modelaudit/ tests/
  • uv run ruff format --check modelaudit/ tests/
  • uv run mypy modelaudit/
  • uv run pytest -n auto -m "not slow and not integration" --maxfail=1 -> 2180 passed, 57 skipped

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/scanners/test_pickle_scanner.py`:
- Around line 698-716: The test test_safe_builtins_remain_allowlisted currently
uses result.has_errors (which only reflects CRITICAL issues) and only checks the
"Global Module Reference Check"; change it to assert that there are no checks
with status == CheckStatus.FAILED that mention the specific
builtins.{safe_builtin} across all checks (e.g., filter result.checks for
entries whose message or name references f"builtins.{safe_builtin}" and assert
none have status FAILED), removing the reliance on result.has_errors; keep the
existing assertions that a passed Global Module Reference Check exists for
builtins.{safe_builtin} and that result.success is true.
- Around line 754-772: The test
test_builtins_hasattr_binput_binget_recall_is_critical currently leaves the
original GLOBAL on the pickle stack because the payload uses BINPUT without
popping, so a resolver that ignores BINGET can still pass; modify the payload
bytes used in this test so that after the BINPUT opcode the original callable is
removed (e.g., add a POP immediately after BINPUT) and optionally add a filler
test case (like test_reduce_pattern_detects_memoized_callable) to ensure the
detection only succeeds when memo lookup via BINGET actually occurs; update the
payload variable in the test to include the POP and any needed filler so the
test enforces true memo recall behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1843f3a7-a992-418d-a58f-217dbcfffef6

📥 Commits

Reviewing files that changed from the base of the PR and between 92d97dc and 2fb3890.

📒 Files selected for processing (3)
  • CHANGELOG.md
  • modelaudit/scanners/pickle_scanner.py
  • tests/scanners/test_pickle_scanner.py

Comment on lines +754 to +772
def test_builtins_hasattr_binput_binget_recall_is_critical(self) -> None:
    """Memoized callable recall via BINPUT/BINGET must keep builtins.hasattr dangerous."""
    payload = b"\x80\x02cbuiltins\nhasattr\nq\x01h\x01(tR."

    result = self._scan_bytes(payload)

    assert result.success
    assert result.has_errors
    failed_reduce_checks = [
        check
        for check in result.checks
        if check.name == "REDUCE Opcode Safety Check"
        and check.status == CheckStatus.FAILED
        and check.severity == IssueSeverity.CRITICAL
    ]
    assert any(check.details.get("associated_global") == "builtins.hasattr" for check in failed_reduce_checks), (
        f"Expected CRITICAL REDUCE finding for builtins.hasattr memo recall, "
        f"got: {[check.details for check in failed_reduce_checks]}"
    )
Contributor


⚠️ Potential issue | 🟠 Major

This payload does not actually force memo recall.

On Line 756, BINPUT memoizes without popping, so the original GLOBAL builtins.hasattr stays on the stack. A resolver that ignores BINGET and simply falls back to the earlier GLOBAL can still satisfy this test, which means the memo-recall bypass could regress unnoticed. POP the original callable after BINPUT (and ideally add filler, like test_reduce_pattern_detects_memoized_callable) so the test only passes when memo resolution really works.

Proposed fix
-        payload = b"\x80\x02cbuiltins\nhasattr\nq\x01h\x01(tR."
+        # Memoize the callable, drop the original stack reference, then recall it.
+        payload = b"\x80\x02cbuiltins\nhasattr\nq\x010h\x01(tR."

Based on learnings, “Preserve or strengthen security detections; test both benign and malicious samples when adding scanner/feature changes.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/scanners/test_pickle_scanner.py` around lines 754 - 772, The test
test_builtins_hasattr_binput_binget_recall_is_critical currently leaves the
original GLOBAL on the pickle stack because the payload uses BINPUT without
popping, so a resolver that ignores BINGET can still pass; modify the payload
bytes used in this test so that after the BINPUT opcode the original callable is
removed (e.g., add a POP immediately after BINPUT) and optionally add a filler
test case (like test_reduce_pattern_detects_memoized_callable) to ensure the
detection only succeeds when memo lookup via BINGET actually occurs; update the
payload variable in the test to include the POP and any needed filler so the
test enforces true memo recall behavior.

@mldangelo
Member Author

Added stronger differential validation to #692.

What I verified against origin/main:

  • on origin/main, builtins.hasattr and __builtin__.hasattr were incorrectly treated as safe for import-only GLOBAL, REDUCE, STACK_GLOBAL, and safe-then-malicious multi-stream payloads
  • on this branch, those same payloads now fail as critical
  • safe nearby builtins like set, slice, and tuple still pass through Global Module Reference Check without errors

What I added to lock that in:

  • explicit import-only regressions for builtins.hasattr and __builtin__.hasattr
  • explicit import-only false-positive guardrails for safe builtins
  • explicit safe-then-malicious multi-stream import-only regression coverage
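
The benign-then-malicious multi-stream case can be reproduced with the stdlib alone. This sketch is not the PR's scanner internals; it only illustrates why a scanner must keep walking concatenated pickle streams past the first STOP (the payload bytes are the same import-only pattern discussed above):

```python
import io
import pickle
import pickletools

benign = pickle.dumps([1, 2, 3], protocol=2)          # no GLOBAL opcodes
malicious = b"\x80\x02cbuiltins\nhasattr\n."          # import-only GLOBAL stream
stream = benign + malicious

# genops consumes exactly one pickle stream (up to its STOP opcode), so a
# scanner that stops after the first, benign stream never sees the GLOBAL
# in the second. Looping until the buffer is exhausted surfaces it.
buf = io.BytesIO(stream)
globals_seen = []
while buf.tell() < len(stream):
    for opcode, arg, _pos in pickletools.genops(buf):
        if opcode.name in ("GLOBAL", "STACK_GLOBAL"):
            globals_seen.append(arg)

print(globals_seen)  # ['builtins hasattr']
```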

Validation passed after the update:

  • uv run pytest tests/scanners/test_pickle_scanner.py -q -k "hasattr or safe_builtins_remain_allowlisted or safe_import_only_builtins_remain_allowlisted"
  • uv run ruff format modelaudit/ tests/
  • uv run ruff check --fix modelaudit/ tests/
  • uv run ruff check modelaudit/ tests/
  • uv run ruff format --check modelaudit/ tests/
  • uv run mypy modelaudit/
  • uv run pytest -n auto -m "not slow and not integration" --maxfail=1 -> 2184 passed, 57 skipped

Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
tests/scanners/test_pickle_scanner.py (2)

740-778: ⚠️ Potential issue | 🟡 Minor

Tighten safe-builtin assertions beyond CRITICAL-only and single-check scope.

These tests still allow regressions where builtins.set/slice/tuple fail in non-Global Module Reference Check paths. Assert there are no CheckStatus.FAILED checks mentioning the target builtin across all checks.

Proposed test tightening
     def test_safe_builtins_remain_allowlisted(self) -> None:
         """Safe reconstruction builtins must remain non-failing."""
         for safe_builtin in ["set", "slice", "tuple"]:
             result = self._scan_bytes(self._craft_global_reduce_pickle("builtins", safe_builtin))

             assert result.success
-            assert not result.has_errors, (
-                f"Expected builtins.{safe_builtin} to remain non-failing, got: {[i.message for i in result.issues]}"
-            )
             safe_global_checks = [check for check in result.checks if check.name == "Global Module Reference Check"]
+            failed_builtin_checks = [
+                check
+                for check in result.checks
+                if check.status == CheckStatus.FAILED and f"builtins.{safe_builtin}" in check.message
+            ]
             assert any(
                 check.status == CheckStatus.PASSED and f"builtins.{safe_builtin}" in check.message
                 for check in safe_global_checks
             ), f"Expected passed Global Module Reference Check for builtins.{safe_builtin}"
             assert not any(check.status == CheckStatus.FAILED for check in safe_global_checks), (
                 f"Unexpected failed Global Module Reference Check for builtins.{safe_builtin}: "
                 f"{[check.message for check in safe_global_checks]}"
             )
+            assert not failed_builtin_checks, (
+                f"Expected builtins.{safe_builtin} to stay non-failing across all checks, "
+                f"got: {[check.message for check in failed_builtin_checks]}"
+            )

     def test_safe_import_only_builtins_remain_allowlisted(self) -> None:
         """Safe builtins must remain non-failing for import-only GLOBAL payloads."""
         for safe_builtin in ["set", "slice", "tuple"]:
             result = self._scan_bytes(self._craft_global_only_pickle("builtins", safe_builtin))

             assert result.success
-            assert not result.has_errors, (
-                f"Expected import-only builtins.{safe_builtin} to remain non-failing, "
-                f"got: {[i.message for i in result.issues]}"
-            )
             safe_global_checks = [check for check in result.checks if check.name == "Global Module Reference Check"]
+            failed_builtin_checks = [
+                check
+                for check in result.checks
+                if check.status == CheckStatus.FAILED and f"builtins.{safe_builtin}" in check.message
+            ]
             assert any(
                 check.status == CheckStatus.PASSED and f"builtins.{safe_builtin}" in check.message
                 for check in safe_global_checks
             ), f"Expected passed import-only Global Module Reference Check for builtins.{safe_builtin}"
             assert not any(check.status == CheckStatus.FAILED for check in safe_global_checks), (
                 f"Unexpected failed import-only Global Module Reference Check for builtins.{safe_builtin}: "
                 f"{[check.message for check in safe_global_checks]}"
             )
+            assert not failed_builtin_checks, (
+                f"Expected import-only builtins.{safe_builtin} to stay non-failing across all checks, "
+                f"got: {[check.message for check in failed_builtin_checks]}"
+            )

Based on learnings, “Preserve or strengthen security detections; test both benign and malicious samples when adding scanner/feature changes.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/scanners/test_pickle_scanner.py` around lines 740 - 778, The tests
test_safe_builtins_remain_allowlisted and
test_safe_import_only_builtins_remain_allowlisted only verify the "Global Module
Reference Check" and CRITICAL-only paths, allowing regressions where builtins
like builtins.set/slice/tuple fail in other checks; update each test to also
assert that across all result.checks there is no CheckStatus.FAILED entry whose
message contains the target "builtins.<name>" (and keep the existing assertions
that a passed global check mentions the builtin) so any failed check referencing
the builtin will cause the test to fail.

816-819: ⚠️ Potential issue | 🟠 Major

This payload still does not force true memo recall via BINGET.

BINPUT memoizes but keeps the original callable on the stack, so scanners can pass without actually resolving memo recall. Add POP immediately after BINPUT to make BINGET resolution mandatory.

Proposed payload fix
-        payload = b"\x80\x02cbuiltins\nhasattr\nq\x01h\x01(tR."
+        payload = b"\x80\x02cbuiltins\nhasattr\nq\x010h\x01(tR."
#!/bin/bash
python - <<'PY'
import pickletools

payload_current = b"\x80\x02cbuiltins\nhasattr\nq\x01h\x01(tR."
payload_fixed = b"\x80\x02cbuiltins\nhasattr\nq\x010h\x01(tR."

print("Current payload disassembly:")
pickletools.dis(payload_current)
print("\nFixed payload disassembly:")
pickletools.dis(payload_fixed)
PY

Based on learnings, “Preserve or strengthen security detections; test both benign and malicious samples when adding scanner/feature changes.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/scanners/test_pickle_scanner.py` around lines 816 - 819, The test
payload in test_builtins_hasattr_binput_binget_recall_is_critical uses BINPUT
but does not force memo recall via BINGET; insert a POP opcode immediately after
the BINPUT in the payload (i.e., add the byte 0x30 right after the q\x01
sequence) so the scanner must resolve the memo via BINGET — update the payload
variable in that test (payload = b"...") accordingly to include the POP byte.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@tests/scanners/test_pickle_scanner.py`:
- Around line 740-778: The tests test_safe_builtins_remain_allowlisted and
test_safe_import_only_builtins_remain_allowlisted only verify the "Global Module
Reference Check" and CRITICAL-only paths, allowing regressions where builtins
like builtins.set/slice/tuple fail in other checks; update each test to also
assert that across all result.checks there is no CheckStatus.FAILED entry whose
message contains the target "builtins.<name>" (and keep the existing assertions
that a passed global check mentions the builtin) so any failed check referencing
the builtin will cause the test to fail.
- Around line 816-819: The test payload in
test_builtins_hasattr_binput_binget_recall_is_critical uses BINPUT but does not
force memo recall via BINGET; insert a POP opcode immediately after the BINPUT
in the payload (i.e., add the byte 0x30 right after the q\x01 sequence) so the
scanner must resolve the memo via BINGET — update the payload variable in that
test (payload = b"...") accordingly to include the POP byte.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4f770595-9372-4886-a81c-540534716c69

📥 Commits

Reviewing files that changed from the base of the PR and between 2fb3890 and cef7315.

📒 Files selected for processing (1)
  • tests/scanners/test_pickle_scanner.py
