fix: enforce skops ZIP entry and archive uncompressed size limits to prevent decompression-bomb DoS #702

Merged
mldangelo merged 1 commit into main from codex/fix-skops-scanner-decompression-size-limits on Mar 14, 2026

mldangelo (Member) commented Mar 13, 2026

Motivation

  • The Skops scanner declared max_skops_file_size but never enforced it, and it called zip_file.read() on every entry. That call decompresses the whole entry into memory, which exposes the scanner to a ZIP decompression-bomb (memory exhaustion) attack.
  • The change aims to prevent untrusted .skops archives from forcing large uncompressed allocations during scanning, and to make the declared size limits actually take effect.
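To illustrate why an unbounded zip_file.read() is risky, the standard-library sketch below compresses a run of zero bytes with DEFLATE and compares the entry's compressed size with the uncompressed size recorded in its header. The 20 MB payload and the helper names are arbitrary choices for this illustration, not part of the PR.

```python
import io
import zipfile

# Illustrative payload: 20 MB of zero bytes (size chosen arbitrarily for this sketch).
PAYLOAD_SIZE = 20 * 1024 * 1024


def build_bomb_like_archive(payload_size: int = PAYLOAD_SIZE) -> bytes:
    """Return an in-memory ZIP holding one highly compressible entry."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("README.md", b"\x00" * payload_size)
    return buffer.getvalue()


def compression_ratio(archive_bytes: bytes, name: str) -> float:
    """Ratio of the declared uncompressed size to the compressed size of one entry."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        info = zf.getinfo(name)
        return info.file_size / info.compress_size


archive = build_bomb_like_archive()
print(f"archive size: {len(archive)} bytes")
print(f"README.md expands roughly {compression_ratio(archive, 'README.md'):.0f}x when read")
```

Calling zip_file.read("README.md") on such an archive allocates the full payload even though the archive itself is small, and the allocation grows linearly with whatever payload an attacker chooses.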

Description

  • Add max_zip_entry_read_size (default 10 MB) and a helper _read_zip_entry_safely() that opens entries and bounds reads, returning None for oversized entries.
  • Replace direct zip_file.read(...) calls in _detect_cve_2025_54412, _detect_cve_2025_54413, _detect_cve_2025_54886, _check_protocol_version, and _check_unsafe_joblib_fallback with the safe reader and skip oversized entries.
  • Enforce an archive-level uncompressed size limit by summing file_info.file_size over all entries and comparing the total against max_skops_file_size. The scan fails early if the limit is exceeded, and the total is recorded as archive_uncompressed_size in the metadata.
  • Add unit tests covering the new behavior and resilience (test_rejects_archive_exceeding_uncompressed_size_limit and test_skips_oversized_readme_entry_without_crashing), and keep the existing decompression-bomb file-count test.
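The archive-level check described above can be sketched as a standalone function. This is not the merged implementation: the name archive_uncompressed_size_ok and its (within_limit, total) return shape are hypothetical. It sums ZipInfo.file_size (the uncompressed size recorded in each entry's header) and compares the total against a limit defaulting to the 500 MB max_skops_file_size default shown in the PR. Because it only reads header metadata, it can run before any entry is decompressed.

```python
import io
import zipfile

# Mirrors the max_skops_file_size default shown in the PR (500 MB).
DEFAULT_MAX_SKOPS_FILE_SIZE = 500 * 1024 * 1024


def archive_uncompressed_size_ok(
    zip_file: zipfile.ZipFile, max_total: int = DEFAULT_MAX_SKOPS_FILE_SIZE
) -> tuple[bool, int]:
    """Return (within_limit, total) using header metadata only; nothing is decompressed."""
    total = sum(info.file_size for info in zip_file.infolist())
    return total <= max_total, total


# Usage: two 1 KB entries checked against a 1.5 KB budget should be rejected.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("schema.json", b"a" * 1024)
    zf.writestr("model.bin", b"b" * 1024)

with zipfile.ZipFile(buffer) as zf:
    ok, total = archive_uncompressed_size_ok(zf, max_total=1536)
    print(ok, total)  # False 2048
```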

Testing

  • Ran the Skops scanner unit tests with pytest tests/scanners/test_skops_scanner.py; the modified test suite passed.
  • Added test_rejects_archive_exceeding_uncompressed_size_limit, which asserts that archives whose total uncompressed size exceeds max_skops_file_size fail the scan.
  • Added test_skips_oversized_readme_entry_without_crashing, which asserts that oversized entries are skipped by bounded reads and trigger neither CVE false positives nor crashes.

Summary by CodeRabbit

  • Security & Stability
    • Archive scanner now enforces size limits on individual entries and total uncompressed archive size to prevent processing of excessively large files
    • Improved resilience when processing archives with oversized entries: the scanner skips those entries and continues rather than failing
    • Enhanced handling of problematic archive files to prevent unexpected crashes

coderabbitai bot (Contributor) commented Mar 13, 2026

Walkthrough

The changes add decompression bomb protections to SkopsScanner by introducing a safe ZIP entry reader with per-entry read size limits and cumulative archive uncompressed size validation. Updated scanning paths handle oversized entries gracefully, and new tests validate boundary condition enforcement.

Changes

  • Safe ZIP Entry Reader (modelaudit/scanners/skops_scanner.py): Added a _read_zip_entry_safely() method with a per-entry read size limit (max_zip_entry_read_size), integrated into the CVE detection and protocol/version checks in place of direct ZIP reads. It returns None for oversized entries, which callers skip. Also added an Archive Uncompressed Size Limit check to the scan method, based on the cumulative uncompressed size of all entries.
  • Edge Case Testing (tests/scanners/test_skops_scanner.py): Added two boundary-condition tests: one verifies that exceeding the archive uncompressed size limit fails the scan, and the other confirms that oversized entries are skipped safely without false CVE-2025-54886 positives.
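For experimentation outside the scanner class, the bounded-read pattern of _read_zip_entry_safely() can be written as a free function. The name read_zip_entry_bounded is hypothetical, and the 10 MB default is assumed to match the max_zip_entry_read_size default described in the PR. The header pre-check rejects entries that declare an oversized payload, and the read itself is capped at limit + 1 bytes, so memory stays bounded regardless of what the header says; the one extra byte is what reveals that an entry went over the limit.

```python
from __future__ import annotations

import io
import zipfile

# Assumed default, matching the 10 MB max_zip_entry_read_size default described in the PR.
DEFAULT_ENTRY_LIMIT = 10 * 1024 * 1024


def read_zip_entry_bounded(
    zip_file: zipfile.ZipFile, info: zipfile.ZipInfo, limit: int = DEFAULT_ENTRY_LIMIT
) -> bytes | None:
    """Return the entry's bytes, or None if it is larger than `limit`."""
    # Cheap pre-check on the uncompressed size declared in the ZIP header.
    if info.file_size > limit:
        return None
    # Never pull more than limit + 1 bytes out of the decompressing stream.
    with zip_file.open(info, "r") as entry:
        content = entry.read(limit + 1)
    # Getting the extra byte back means the entry exceeded the limit.
    if len(content) > limit:
        return None
    return content


# Same shape as the oversized-README test archive, with a 128-byte limit.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("README.md", "get_model via joblib.load" * 512)
    zf.writestr("schema.json", '{"version": "1.0"}')

with zipfile.ZipFile(buffer) as zf:
    for info in zf.infolist():
        data = read_zip_entry_bounded(zf, info, limit=128)
        print(info.filename, "skipped" if data is None else f"{len(data)} bytes")
```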

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 ZIP bombs hop no more,
Safe readers guard the door,
Limits checked, sizes bound,
Archive threats unwound!
Decompression fears are shed,

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

  • Description check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly and specifically summarizes the main change: enforcing ZIP entry and archive uncompressed size limits in the Skops scanner to prevent decompression-bomb DoS attacks, which directly matches the primary objective of the pull request.
  • Docstring coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


coderabbitai bot (Contributor) left a review

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelaudit/scanners/skops_scanner.py`:
- Around lines 31-42: The _read_zip_entry_safely function currently returns None for oversized entries and callers silently continue; modify it to record skipped entries and mark the scan as partial: when file_info.file_size > self.max_zip_entry_read_size or the read exceeds the limit, append the file name/path to a new self._skipped_zip_entries list (or similar attribute) and ensure the scanner emits a metadata flag (e.g., partial_scan=True and skipped_entries list) in the final scan result; update any callers that iterate zip entries to still record the skipped name from the return or from self._skipped_zip_entries so results/reporting include the skipped files and a clear partial-scan indicator.
- Line 29: Clamp the configured ZIP entry read size so it can never exceed 10 MB: replace the current assignment to self.max_zip_entry_read_size with logic that reads the configured value via self.config.get("max_zip_entry_read_size", 10 * 1024 * 1024) and sets self.max_zip_entry_read_size to the minimum of that value and 10 * 1024 * 1024 (thus allowing only lowered values), keeping the 10 MB default when unset; update the assignment in scanners/skops_scanner.py where self.max_zip_entry_read_size is set.

In `@tests/scanners/test_skops_scanner.py`:
- Around lines 359-371: Update the test_skips_oversized_readme_entry_without_crashing to also assert that the scanner reports the skipped oversized entry (e.g., via a specific check or metadata field on result) so partial-scan visibility is verified: after calling SkopsScanner.scan(...) inspect result.checks and result.metadata (or the scanner's skip-reporting check name) and assert there is a check or metadata entry indicating README.md was skipped due to exceeding max_zip_entry_read_size, while still asserting no failed CVE-2025-54886 checks; reference test function name, SkopsScanner, result.checks, result.metadata, and CheckStatus to locate and add the new assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f047d18d-a761-4cd2-85fe-8dc5a8ca6b4e

📥 Commits

Reviewing files that changed from the base of the PR and between 698859d and 6ef431d.

📒 Files selected for processing (2)
  • modelaudit/scanners/skops_scanner.py
  • tests/scanners/test_skops_scanner.py

In modelaudit/scanners/skops_scanner.py, line 29:

# Security limits for decompression bomb protection
self.max_file_size = self.config.get("max_skops_file_size", 500 * 1024 * 1024)  # 500MB
self.max_files_in_archive = self.config.get("max_files_in_archive", 10000)
self.max_zip_entry_read_size = self.config.get("max_zip_entry_read_size", 10 * 1024 * 1024)

coderabbitai bot (Contributor) commented:

⚠️ Potential issue | 🟠 Major

Enforce a hard 10 MB upper bound for ZIP entry reads.

Line 29 accepts arbitrarily large config values, which weakens the decompression-bomb protection. Clamp the value to 10 MB and only allow lowering it.

🔧 Proposed fix
 class SkopsScanner(BaseScanner):
     """Scanner for skops serialized files (.skops format)."""

+    MAX_ZIP_ENTRY_READ_SIZE: ClassVar[int] = 10 * 1024 * 1024
+
     name = "skops"
@@
-        self.max_zip_entry_read_size = self.config.get("max_zip_entry_read_size", 10 * 1024 * 1024)
+        configured_zip_entry_limit = int(
+            self.config.get("max_zip_entry_read_size", self.MAX_ZIP_ENTRY_READ_SIZE)
+        )
+        self.max_zip_entry_read_size = min(configured_zip_entry_limit, self.MAX_ZIP_ENTRY_READ_SIZE)

As per coding guidelines: "Cap archive member reads to 10 MB for metadata validation to prevent memory spikes on large pickles".
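The proposed clamp can be exercised in isolation. This sketch assumes a plain dict config and uses a hypothetical helper name, resolve_zip_entry_read_limit; it follows the suggestion's logic of int(...) followed by min(...) against a 10 MB ceiling, so configuration can lower the limit but never raise it.

```python
from typing import Any

# Hard ceiling from the review suggestion: 10 MB per ZIP entry read.
MAX_ZIP_ENTRY_READ_SIZE = 10 * 1024 * 1024


def resolve_zip_entry_read_limit(config: dict[str, Any]) -> int:
    """Return the configured per-entry limit, clamped so it never exceeds the ceiling."""
    configured = int(config.get("max_zip_entry_read_size", MAX_ZIP_ENTRY_READ_SIZE))
    return min(configured, MAX_ZIP_ENTRY_READ_SIZE)


print(resolve_zip_entry_read_limit({}))                                    # 10485760
print(resolve_zip_entry_read_limit({"max_zip_entry_read_size": 128}))      # 128
print(resolve_zip_entry_read_limit({"max_zip_entry_read_size": 1 << 30}))  # 10485760
```

Lowering still works, so a test that configures max_zip_entry_read_size as 128 is unaffected. Note that min only caps the value from above: a zero or negative setting would pass through and cause every non-empty entry to be skipped, so a lower bound may also be worth validating.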

Comment on lines +31 to +42 (modelaudit/scanners/skops_scanner.py)
def _read_zip_entry_safely(self, zip_file: zipfile.ZipFile, file_info: zipfile.ZipInfo) -> bytes | None:
    """Read a ZIP entry with a bounded memory limit."""
    if file_info.file_size > self.max_zip_entry_read_size:
        return None

    with zip_file.open(file_info, "r") as entry:
        content = entry.read(self.max_zip_entry_read_size + 1)

    if len(content) > self.max_zip_entry_read_size:
        return None

    return content

coderabbitai bot (Contributor) commented:

⚠️ Potential issue | 🟠 Major

Don’t silently skip oversized entries without reporting reduced coverage.

Lines 33-40 return None, and callers silently continue. That can yield a successful scan while unread oversized files may contain exploit indicators. Record skipped entries and emit a dedicated check/metadata flag for partial scan visibility.

🔧 Proposed fix
     def __init__(self, config: dict[str, Any] | None = None):
         super().__init__(config)
@@
         self.max_zip_entry_read_size = self.config.get("max_zip_entry_read_size", 10 * 1024 * 1024)
+        self._skipped_oversized_entries: list[str] = []

     def _read_zip_entry_safely(self, zip_file: zipfile.ZipFile, file_info: zipfile.ZipInfo) -> bytes | None:
         """Read a ZIP entry with a bounded memory limit."""
         if file_info.file_size > self.max_zip_entry_read_size:
+            self._skipped_oversized_entries.append(file_info.filename)
             return None
@@
         if len(content) > self.max_zip_entry_read_size:
+            self._skipped_oversized_entries.append(file_info.filename)
             return None
@@
     def scan(self, path: str) -> ScanResult:
@@
+        self._skipped_oversized_entries = []
@@
                 self._check_unsafe_joblib_fallback(zip_file, result, path)
+
+                if self._skipped_oversized_entries:
+                    result.add_check(
+                        name="Oversized ZIP Entries Skipped",
+                        passed=False,
+                        message=(
+                            f"Skipped {len(self._skipped_oversized_entries)} oversized archive entries; "
+                            "security detection coverage is partial"
+                        ),
+                        severity=IssueSeverity.WARNING,
+                        location=path,
+                        details={
+                            "max_zip_entry_read_size": self.max_zip_entry_read_size,
+                            "skipped_entries": self._skipped_oversized_entries[:20],
+                            "skipped_count": len(self._skipped_oversized_entries),
+                        },
+                    )

Based on learnings: "Preserve or strengthen security detections; test both benign and malicious samples when adding scanner/feature changes".
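A self-contained sketch of the skip-tracking idea follows. The class TrackingZipReader, its skipped attribute, and the coverage_metadata() keys are hypothetical; the keys loosely follow the names suggested in the review (partial_scan, skipped_entries, skipped_count) and are not ModelAudit APIs. The point is that every None returned by the bounded reader leaves a record, so the final result can report a partial scan instead of passing silently.

```python
from __future__ import annotations

import io
import zipfile
from typing import Any


class TrackingZipReader:
    """Bounded ZIP entry reader that remembers which entries it refused to read."""

    def __init__(self, limit: int = 10 * 1024 * 1024, report_cap: int = 20) -> None:
        self.limit = limit
        self.report_cap = report_cap  # max number of names listed in the metadata
        self.skipped: list[str] = []

    def read(self, zip_file: zipfile.ZipFile, info: zipfile.ZipInfo) -> bytes | None:
        # Declared size too large: record the entry and skip it.
        if info.file_size > self.limit:
            self.skipped.append(info.filename)
            return None
        # Bounded read; the extra byte detects entries that exceed the limit.
        with zip_file.open(info, "r") as entry:
            content = entry.read(self.limit + 1)
        if len(content) > self.limit:
            self.skipped.append(info.filename)
            return None
        return content

    def coverage_metadata(self) -> dict[str, Any]:
        """Summarize skipped entries so callers can flag a partial scan."""
        return {
            "partial_scan": bool(self.skipped),
            "skipped_entries": self.skipped[: self.report_cap],
            "skipped_count": len(self.skipped),
            "max_zip_entry_read_size": self.limit,
        }


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("README.md", "get_model via joblib.load" * 512)
    zf.writestr("schema.json", '{"version": "1.0"}')

reader = TrackingZipReader(limit=128)
with zipfile.ZipFile(buffer) as zf:
    for info in zf.infolist():
        reader.read(zf, info)
print(reader.coverage_metadata())
```

A reader or scanner object reused across files would need to clear skipped at the start of each scan, which is what the proposed self._skipped_oversized_entries = [] line in scan() does.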

Comment on lines +359 to +371 (tests/scanners/test_skops_scanner.py)
def test_skips_oversized_readme_entry_without_crashing(self, tmp_path: Path) -> None:
    """Oversized archive entries should be skipped by bounded reads."""
    skops_file = tmp_path / "oversized_readme.skops"
    with zipfile.ZipFile(skops_file, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("README.md", "get_model via joblib.load" * 512)
        zf.writestr("schema.json", '{"version": "1.0"}')

    scanner = SkopsScanner(config={"max_zip_entry_read_size": 128, "max_skops_file_size": 10 * 1024 * 1024})
    result = scanner.scan(str(skops_file))

    assert result.success is True
    cve_checks = [c for c in result.checks if "CVE-2025-54886" in c.name and c.status == CheckStatus.FAILED]
    assert len(cve_checks) == 0

coderabbitai bot (Contributor) commented:

🧹 Nitpick | 🔵 Trivial

Strengthen this test to assert partial-scan visibility for oversized entries.

Right now it only verifies “no crash/no CVE flag.” Once skipped-entry reporting is added, assert that check/metadata too, so silent detection gaps can’t regress.

Based on learnings: "Preserve or strengthen security detections; test both benign and malicious samples when adding scanner/feature changes".
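The extra assertion the review asks for can be prototyped against a stand-in before the real reporting exists. Everything here is hypothetical: scan_entries plays the role of the scanner, and the skipped_entries and partial_scan keys are placeholders for whatever check or metadata the actual fix exposes. The archive contents and the 128-byte limit mirror test_skips_oversized_readme_entry_without_crashing.

```python
from __future__ import annotations

import zipfile
from pathlib import Path
from typing import Any


def scan_entries(path: Path, limit: int) -> dict[str, Any]:
    """Stand-in for a scanner: bounded reads plus a record of skipped entries."""
    skipped: list[str] = []
    read: dict[str, bytes] = {}
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            if info.file_size > limit:
                skipped.append(info.filename)
                continue
            with zf.open(info) as entry:
                data = entry.read(limit + 1)
            if len(data) > limit:
                skipped.append(info.filename)
                continue
            read[info.filename] = data
    return {"read": read, "skipped_entries": skipped, "partial_scan": bool(skipped)}


def test_reports_skipped_oversized_readme(tmp_path: Path) -> None:
    skops_file = tmp_path / "oversized_readme.skops"
    with zipfile.ZipFile(skops_file, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("README.md", "get_model via joblib.load" * 512)
        zf.writestr("schema.json", '{"version": "1.0"}')

    result = scan_entries(skops_file, limit=128)

    # The coverage gap must be visible in the result, not silent.
    assert result["partial_scan"] is True
    assert result["skipped_entries"] == ["README.md"]
    # Small entries are still read normally.
    assert "schema.json" in result["read"]
```

In the real suite, the same two assertions would target result.checks or result.metadata from SkopsScanner.scan(...), alongside the existing check that no CVE-2025-54886 check failed.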

mldangelo merged commit a91577d into main on Mar 14, 2026 (26 checks passed).
mldangelo deleted the codex/fix-skops-scanner-decompression-size-limits branch on March 14, 2026, 07:04.

1 participant