fix: enforce skops ZIP entry and archive uncompressed size limits to prevent decompression-bomb DoS by mldangelo · Pull Request #702 · promptfoo/modelaudit

mldangelo · 2026-03-13T22:48:55Z

Motivation

The Skops scanner previously declared max_skops_file_size but never enforced it, and it called zip_file.read() on every entry, which allowed a ZIP decompression bomb to exhaust memory.
The change aims to prevent untrusted .skops archives from forcing large uncompressed allocations during scanning and to make the declared size limits take effect.
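
To illustrate the concern, here is a minimal sketch (not code from this PR) of how a highly compressible entry stays small on disk yet expands to its full declared size when read without a bound. The entry name `payload.bin` and the 5 MiB size are arbitrary assumptions chosen for the demonstration.

```python
import io
import zipfile


def build_compressible_archive(uncompressed_bytes: int) -> bytes:
    """Return an in-memory ZIP holding one highly compressible entry."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # A run of zero bytes deflates to a tiny fraction of its size.
        zf.writestr("payload.bin", b"\0" * uncompressed_bytes)
    return buffer.getvalue()


archive_bytes = build_compressible_archive(5 * 1024 * 1024)

with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
    info = zf.getinfo("payload.bin")
    # compress_size is what is stored; file_size is what an unbounded read allocates.
    print(f"compressed={info.compress_size} uncompressed={info.file_size}")
    unbounded = zf.read(info)  # allocates the whole uncompressed payload at once
```

The gap between `compress_size` and `file_size` is what an attacker exploits, so limits on the compressed archive alone would not bound memory use.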

Description

Add max_zip_entry_read_size (default 10 MB) and a helper _read_zip_entry_safely() that opens entries and bounds reads, returning None for oversized entries.
Replace direct zip_file.read(...) calls in _detect_cve_2025_54412, _detect_cve_2025_54413, _detect_cve_2025_54886, _check_protocol_version, and _check_unsafe_joblib_fallback with the safe reader and skip oversized entries.
Enforce an archive-level uncompressed size limit by summing file_info.file_size across all entries and comparing the total against max_skops_file_size; the scan fails early if the limit is exceeded, and archive_uncompressed_size is recorded in metadata.
Add unit tests covering the new behavior and resilience (test_rejects_archive_exceeding_uncompressed_size_limit and test_skips_oversized_readme_entry_without_crashing), and keep the existing decompression-bomb file-count test.
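
The archive-level check described above could look roughly like the following sketch. It is a standalone illustration, not the scanner's code: the helper names `total_uncompressed_size` and `exceeds_archive_limit` are hypothetical, and the limits used in the demo are arbitrary.

```python
import io
import zipfile


def total_uncompressed_size(zip_file: zipfile.ZipFile) -> int:
    """Sum the declared uncompressed size of every entry without reading data."""
    return sum(info.file_size for info in zip_file.infolist())


def exceeds_archive_limit(zip_file: zipfile.ZipFile, max_total_size: int) -> bool:
    """Return True when the declared total is above the configured limit."""
    return total_uncompressed_size(zip_file) > max_total_size


# Build a small archive in memory for the demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("schema.json", '{"version": "1.0"}')
    zf.writestr("weights.bin", b"\0" * 4096)

with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as zf:
    declared_total = total_uncompressed_size(zf)
    too_big_for_1k = exceeds_archive_limit(zf, 1024)          # limit below the total
    too_big_for_1m = exceeds_archive_limit(zf, 1024 * 1024)   # limit above the total
```

Because the sum uses only directory metadata, the check can run before any entry is decompressed, which is what lets a scan fail early.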

Testing

Ran the skops scanner unit tests with pytest tests/scanners/test_skops_scanner.py; the modified test suite passed.
Added test_rejects_archive_exceeding_uncompressed_size_limit, which asserts that archives whose total uncompressed size exceeds max_skops_file_size fail.
Added test_skips_oversized_readme_entry_without_crashing which asserts oversized entries are skipped by bounded reads and do not trigger CVE false positives or crashes.

Codex Task

Summary by CodeRabbit

Security & Stability
- Archive scanner now enforces size limits on individual entries and total uncompressed archive size to prevent processing of excessively large files
- Improved resilience when processing archives with oversized entries—scanner continues processing rather than failing
- Enhanced handling of problematic archive files to prevent unexpected crashes

coderabbitai · 2026-03-13T22:49:15Z

Walkthrough

The changes add decompression bomb protections to SkopsScanner by introducing a safe ZIP entry reader with per-entry read size limits and cumulative archive uncompressed size validation. Updated scanning paths handle oversized entries gracefully, and new tests validate boundary condition enforcement.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Safe ZIP Entry Reader `modelaudit/scanners/skops_scanner.py` | Added `_read_zip_entry_safely()` method with per-entry read size limit (max_zip_entry_read_size). Integrated into CVE detection and protocol/version checks to replace direct ZIP reads. Returns None for oversized entries, which are skipped by callers. Added Archive Uncompressed Size Limit check in scan method with cumulative size calculation. |
| Edge Case Testing `tests/scanners/test_skops_scanner.py` | Added two new boundary condition tests: one verifying archive uncompressed size limit enforcement causes scan failure, and another confirming oversized entries are safely skipped without false CVE-2025-54886 positives. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 ZIP bombs hop no more,
Safe readers guard the door,
Limits checked, sizes bound,
Archive threats unwound!
Decompression fears are shed, ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: enforcing ZIP entry and archive uncompressed size limits in the Skops scanner to prevent decompression-bomb DoS attacks, which directly matches the primary objective of the pull request. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelaudit/scanners/skops_scanner.py`:
- Around line 31-42: The _read_zip_entry_safely function currently returns None
for oversized entries and callers silently continue; modify it to record skipped
entries and mark the scan as partial: when file_info.file_size >
self.max_zip_entry_read_size or the read exceeds the limit, append the file
name/path to a new self._skipped_zip_entries list (or similar attribute) and
ensure the scanner emits a metadata flag (e.g., partial_scan=True and
skipped_entries list) in the final scan result; update any callers that iterate
zip entries to still record the skipped name from the return or from
self._skipped_zip_entries so results/reporting include the skipped files and a
clear partial-scan indicator.
- Line 29: Clamp the configured ZIP entry read size so it can never exceed 10
MB: replace the current assignment to self.max_zip_entry_read_size with logic
that reads the configured value via self.config.get("max_zip_entry_read_size",
10 * 1024 * 1024) and sets self.max_zip_entry_read_size to the minimum of that
value and 10 * 1024 * 1024 (thus allowing only lowered values), keeping the 10
MB default when unset; update the assignment in scanners/skops_scanner.py where
self.max_zip_entry_read_size is set.

In `@tests/scanners/test_skops_scanner.py`:
- Around line 359-371: Update the
test_skips_oversized_readme_entry_without_crashing to also assert that the
scanner reports the skipped oversized entry (e.g., via a specific check or
metadata field on result) so partial-scan visibility is verified: after calling
SkopsScanner.scan(...) inspect result.checks and result.metadata (or the
scanner's skip-reporting check name) and assert there is a check or metadata
entry indicating README.md was skipped due to exceeding max_zip_entry_read_size,
while still asserting no failed CVE-2025-54886 checks; reference test function
name, SkopsScanner, result.checks, result.metadata, and CheckStatus to locate
and add the new assertions.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f047d18d-a761-4cd2-85fe-8dc5a8ca6b4e

📥 Commits

Reviewing files that changed from the base of the PR and between 698859d and 6ef431d.

📒 Files selected for processing (2)

modelaudit/scanners/skops_scanner.py
tests/scanners/test_skops_scanner.py

coderabbitai · 2026-03-13T22:53:28Z

modelaudit/scanners/skops_scanner.py

        # Security limits for decompression bomb protection
        self.max_file_size = self.config.get("max_skops_file_size", 500 * 1024 * 1024)  # 500MB
        self.max_files_in_archive = self.config.get("max_files_in_archive", 10000)
+        self.max_zip_entry_read_size = self.config.get("max_zip_entry_read_size", 10 * 1024 * 1024)


⚠️ Potential issue | 🟠 Major

Enforce a hard 10 MB upper bound for ZIP entry reads.

Line 29 accepts arbitrary larger config values, which weakens the decompression-bomb protection goal. Clamp to 10 MB and only allow lowering it.

🔧 Proposed fix

 class SkopsScanner(BaseScanner):
     """Scanner for skops serialized files (.skops format)."""
 
+    MAX_ZIP_ENTRY_READ_SIZE: ClassVar[int] = 10 * 1024 * 1024
+
     name = "skops"
@@
-        self.max_zip_entry_read_size = self.config.get("max_zip_entry_read_size", 10 * 1024 * 1024)
+        configured_zip_entry_limit = int(
+            self.config.get("max_zip_entry_read_size", self.MAX_ZIP_ENTRY_READ_SIZE)
+        )
+        self.max_zip_entry_read_size = min(configured_zip_entry_limit, self.MAX_ZIP_ENTRY_READ_SIZE)

As per coding guidelines: "Cap archive member reads to 10 MB for metadata validation to prevent memory spikes on large pickles".
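
As a standalone illustration of the clamping behavior the reviewer describes, the sketch below resolves the limit from a plain dict. The names `HARD_CAP` and `resolve_entry_read_limit` are hypothetical and do not come from modelaudit; only the `max_zip_entry_read_size` key and the 10 MB figure are taken from the discussion.

```python
HARD_CAP = 10 * 1024 * 1024  # 10 MB ceiling taken from the review comment


def resolve_entry_read_limit(config: dict) -> int:
    """Return the configured per-entry limit, never above HARD_CAP."""
    configured = int(config.get("max_zip_entry_read_size", HARD_CAP))
    # min() lets callers lower the limit but silently ignores attempts to raise it.
    return min(configured, HARD_CAP)


default_limit = resolve_entry_read_limit({})
lowered_limit = resolve_entry_read_limit({"max_zip_entry_read_size": 128})
raised_limit = resolve_entry_read_limit({"max_zip_entry_read_size": 50 * 1024 * 1024})
```

With this shape, a test configuration such as `max_zip_entry_read_size: 128` still works, while a 50 MB setting falls back to the 10 MB ceiling.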

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelaudit/scanners/skops_scanner.py` at line 29, Clamp the configured ZIP entry read size so it can never exceed 10 MB: replace the current assignment to self.max_zip_entry_read_size with logic that reads the configured value via self.config.get("max_zip_entry_read_size", 10 * 1024 * 1024) and sets self.max_zip_entry_read_size to the minimum of that value and 10 * 1024 * 1024 (thus allowing only lowered values), keeping the 10 MB default when unset; update the assignment in scanners/skops_scanner.py where self.max_zip_entry_read_size is set.

coderabbitai · 2026-03-13T22:53:28Z

modelaudit/scanners/skops_scanner.py

+    def _read_zip_entry_safely(self, zip_file: zipfile.ZipFile, file_info: zipfile.ZipInfo) -> bytes | None:
+        """Read a ZIP entry with a bounded memory limit."""
+        if file_info.file_size > self.max_zip_entry_read_size:
+            return None
+
+        with zip_file.open(file_info, "r") as entry:
+            content = entry.read(self.max_zip_entry_read_size + 1)
+
+        if len(content) > self.max_zip_entry_read_size:
+            return None
+
+        return content


⚠️ Potential issue | 🟠 Major

Don’t silently skip oversized entries without reporting reduced coverage.

Lines 33-40 return None, and callers silently continue. That can yield a successful scan while unread oversized files may contain exploit indicators. Record skipped entries and emit a dedicated check/metadata flag for partial scan visibility.

🔧 Proposed fix

     def __init__(self, config: dict[str, Any] | None = None):
         super().__init__(config)
@@
         self.max_zip_entry_read_size = self.config.get("max_zip_entry_read_size", 10 * 1024 * 1024)
+        self._skipped_oversized_entries: list[str] = []
 
     def _read_zip_entry_safely(self, zip_file: zipfile.ZipFile, file_info: zipfile.ZipInfo) -> bytes | None:
         """Read a ZIP entry with a bounded memory limit."""
         if file_info.file_size > self.max_zip_entry_read_size:
+            self._skipped_oversized_entries.append(file_info.filename)
             return None
@@
         if len(content) > self.max_zip_entry_read_size:
+            self._skipped_oversized_entries.append(file_info.filename)
             return None
@@
     def scan(self, path: str) -> ScanResult:
@@
+        self._skipped_oversized_entries = []
@@
         self._check_unsafe_joblib_fallback(zip_file, result, path)
+
+        if self._skipped_oversized_entries:
+            result.add_check(
+                name="Oversized ZIP Entries Skipped",
+                passed=False,
+                message=(
+                    f"Skipped {len(self._skipped_oversized_entries)} oversized archive entries; "
+                    "security detection coverage is partial"
+                ),
+                severity=IssueSeverity.WARNING,
+                location=path,
+                details={
+                    "max_zip_entry_read_size": self.max_zip_entry_read_size,
+                    "skipped_entries": self._skipped_oversized_entries[:20],
+                    "skipped_count": len(self._skipped_oversized_entries),
+                },
+            )

Based on learnings: "Preserve or strengthen security detections; test both benign and malicious samples when adding scanner/feature changes".
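
The idea of recording skipped entries can be tried outside the scanner with a small self-contained sketch. `BoundedZipReader` is a hypothetical stand-in, not the modelaudit API, and the 128-byte limit simply mirrors the value used in the PR's test.

```python
from __future__ import annotations

import io
import zipfile


class BoundedZipReader:
    """Hypothetical stand-in for the scanner's bounded reader."""

    def __init__(self, max_entry_size: int) -> None:
        self.max_entry_size = max_entry_size
        self.skipped_entries: list[str] = []

    def read(self, zip_file: zipfile.ZipFile, info: zipfile.ZipInfo) -> bytes | None:
        # First gate: trust the declared size to reject obvious oversize entries.
        if info.file_size > self.max_entry_size:
            self.skipped_entries.append(info.filename)
            return None
        # Second gate: read at most limit + 1 bytes as a defensive check.
        with zip_file.open(info, "r") as entry:
            content = entry.read(self.max_entry_size + 1)
        if len(content) > self.max_entry_size:
            self.skipped_entries.append(info.filename)
            return None
        return content


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("README.md", "x" * 1024)               # larger than the limit
    zf.writestr("schema.json", '{"version": "1.0"}')   # within the limit

reader = BoundedZipReader(max_entry_size=128)
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as zf:
    contents = {info.filename: reader.read(zf, info) for info in zf.infolist()}

# A non-empty skip list is the signal that coverage was only partial.
partial_scan = bool(reader.skipped_entries)
```

Keeping the skip list on the reader means callers can keep treating `None` as "skip this entry" while a final reporting step still has the names it needs.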

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelaudit/scanners/skops_scanner.py` around lines 31 - 42, The _read_zip_entry_safely function currently returns None for oversized entries and callers silently continue; modify it to record skipped entries and mark the scan as partial: when file_info.file_size > self.max_zip_entry_read_size or the read exceeds the limit, append the file name/path to a new self._skipped_zip_entries list (or similar attribute) and ensure the scanner emits a metadata flag (e.g., partial_scan=True and skipped_entries list) in the final scan result; update any callers that iterate zip entries to still record the skipped name from the return or from self._skipped_zip_entries so results/reporting include the skipped files and a clear partial-scan indicator.

coderabbitai · 2026-03-13T22:53:28Z

tests/scanners/test_skops_scanner.py

+    def test_skips_oversized_readme_entry_without_crashing(self, tmp_path: Path) -> None:
+        """Oversized archive entries should be skipped by bounded reads."""
+        skops_file = tmp_path / "oversized_readme.skops"
+        with zipfile.ZipFile(skops_file, "w", compression=zipfile.ZIP_DEFLATED) as zf:
+            zf.writestr("README.md", "get_model via joblib.load" * 512)
+            zf.writestr("schema.json", '{"version": "1.0"}')
+
+        scanner = SkopsScanner(config={"max_zip_entry_read_size": 128, "max_skops_file_size": 10 * 1024 * 1024})
+        result = scanner.scan(str(skops_file))
+
+        assert result.success is True
+        cve_checks = [c for c in result.checks if "CVE-2025-54886" in c.name and c.status == CheckStatus.FAILED]
+        assert len(cve_checks) == 0


🧹 Nitpick | 🔵 Trivial

Strengthen this test to assert partial-scan visibility for oversized entries.

Right now it only verifies “no crash/no CVE flag.” Once skipped-entry reporting is added, assert that check/metadata too, so silent detection gaps can’t regress.

Based on learnings: "Preserve or strengthen security detections; test both benign and malicious samples when adding scanner/feature changes".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/scanners/test_skops_scanner.py` around lines 359 - 371, Update the test_skips_oversized_readme_entry_without_crashing to also assert that the scanner reports the skipped oversized entry (e.g., via a specific check or metadata field on result) so partial-scan visibility is verified: after calling SkopsScanner.scan(...) inspect result.checks and result.metadata (or the scanner's skip-reporting check name) and assert there is a check or metadata entry indicating README.md was skipped due to exceeding max_zip_entry_read_size, while still asserting no failed CVE-2025-54886 checks; reference test function name, SkopsScanner, result.checks, result.metadata, and CheckStatus to locate and add the new assertions.

fix: bound skops zip entry reads and enforce uncompressed size limit

6ef431d

mldangelo added codex aardvark labels Mar 13, 2026 — with ChatGPT Codex Connector

coderabbitai bot reviewed Mar 13, 2026

View reviewed changes

mldangelo merged commit a91577d into main Mar 14, 2026
26 checks passed

mldangelo deleted the codex/fix-skops-scanner-decompression-size-limits branch March 14, 2026 07:04

github-actions bot mentioned this pull request Mar 13, 2026

chore(main): release 0.2.28 #647

Open
