Conversation
WalkthroughThe changes add decompression bomb protections to SkopsScanner by introducing a safe ZIP entry reader with per-entry read size limits and cumulative archive uncompressed size validation. Updated scanning paths handle oversized entries gracefully, and new tests validate boundary condition enforcement. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelaudit/scanners/skops_scanner.py`:
- Around line 31-42: The _read_zip_entry_safely function currently returns None
for oversized entries and callers silently continue; modify it to record skipped
entries and mark the scan as partial: when file_info.file_size >
self.max_zip_entry_read_size or the read exceeds the limit, append the file
name/path to a new self._skipped_zip_entries list (or similar attribute) and
ensure the scanner emits a metadata flag (e.g., partial_scan=True and
skipped_entries list) in the final scan result; update any callers that iterate
zip entries to still record the skipped name from the return or from
self._skipped_zip_entries so results/reporting include the skipped files and a
clear partial-scan indicator.
- Line 29: Clamp the configured ZIP entry read size so it can never exceed 10
MB: replace the current assignment to self.max_zip_entry_read_size with logic
that reads the configured value via self.config.get("max_zip_entry_read_size",
10 * 1024 * 1024) and sets self.max_zip_entry_read_size to the minimum of that
value and 10 * 1024 * 1024 (thus allowing only lowered values), keeping the 10
MB default when unset; update the assignment in scanners/skops_scanner.py where
self.max_zip_entry_read_size is set.
In `@tests/scanners/test_skops_scanner.py`:
- Around line 359-371: Update the
test_skips_oversized_readme_entry_without_crashing to also assert that the
scanner reports the skipped oversized entry (e.g., via a specific check or
metadata field on result) so partial-scan visibility is verified: after calling
SkopsScanner.scan(...) inspect result.checks and result.metadata (or the
scanner's skip-reporting check name) and assert there is a check or metadata
entry indicating README.md was skipped due to exceeding max_zip_entry_read_size,
while still asserting no failed CVE-2025-54886 checks; reference test function
name, SkopsScanner, result.checks, result.metadata, and CheckStatus to locate
and add the new assertions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f047d18d-a761-4cd2-85fe-8dc5a8ca6b4e
📒 Files selected for processing (2)
modelaudit/scanners/skops_scanner.pytests/scanners/test_skops_scanner.py
| # Security limits for decompression bomb protection | ||
| self.max_file_size = self.config.get("max_skops_file_size", 500 * 1024 * 1024) # 500MB | ||
| self.max_files_in_archive = self.config.get("max_files_in_archive", 10000) | ||
| self.max_zip_entry_read_size = self.config.get("max_zip_entry_read_size", 10 * 1024 * 1024) |
There was a problem hiding this comment.
Enforce a hard 10 MB upper bound for ZIP entry reads.
Line 29 accepts arbitrary larger config values, which weakens the decompression-bomb protection goal. Clamp to 10 MB and only allow lowering it.
🔧 Proposed fix
class SkopsScanner(BaseScanner):
"""Scanner for skops serialized files (.skops format)."""
+ MAX_ZIP_ENTRY_READ_SIZE: ClassVar[int] = 10 * 1024 * 1024
+
name = "skops"
@@
- self.max_zip_entry_read_size = self.config.get("max_zip_entry_read_size", 10 * 1024 * 1024)
+ configured_zip_entry_limit = int(
+ self.config.get("max_zip_entry_read_size", self.MAX_ZIP_ENTRY_READ_SIZE)
+ )
+ self.max_zip_entry_read_size = min(configured_zip_entry_limit, self.MAX_ZIP_ENTRY_READ_SIZE)As per coding guidelines: "Cap archive member reads to 10 MB for metadata validation to prevent memory spikes on large pickles".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelaudit/scanners/skops_scanner.py` at line 29, Clamp the configured ZIP
entry read size so it can never exceed 10 MB: replace the current assignment to
self.max_zip_entry_read_size with logic that reads the configured value via
self.config.get("max_zip_entry_read_size", 10 * 1024 * 1024) and sets
self.max_zip_entry_read_size to the minimum of that value and 10 * 1024 * 1024
(thus allowing only lowered values), keeping the 10 MB default when unset;
update the assignment in scanners/skops_scanner.py where
self.max_zip_entry_read_size is set.
| def _read_zip_entry_safely(self, zip_file: zipfile.ZipFile, file_info: zipfile.ZipInfo) -> bytes | None: | ||
| """Read a ZIP entry with a bounded memory limit.""" | ||
| if file_info.file_size > self.max_zip_entry_read_size: | ||
| return None | ||
|
|
||
| with zip_file.open(file_info, "r") as entry: | ||
| content = entry.read(self.max_zip_entry_read_size + 1) | ||
|
|
||
| if len(content) > self.max_zip_entry_read_size: | ||
| return None | ||
|
|
||
| return content |
There was a problem hiding this comment.
Don’t silently skip oversized entries without reporting reduced coverage.
Lines 33-40 return None, and callers silently continue. That can yield a successful scan while unread oversized files may contain exploit indicators. Record skipped entries and emit a dedicated check/metadata flag for partial scan visibility.
🔧 Proposed fix
def __init__(self, config: dict[str, Any] | None = None):
super().__init__(config)
@@
self.max_zip_entry_read_size = self.config.get("max_zip_entry_read_size", 10 * 1024 * 1024)
+ self._skipped_oversized_entries: list[str] = []
def _read_zip_entry_safely(self, zip_file: zipfile.ZipFile, file_info: zipfile.ZipInfo) -> bytes | None:
"""Read a ZIP entry with a bounded memory limit."""
if file_info.file_size > self.max_zip_entry_read_size:
+ self._skipped_oversized_entries.append(file_info.filename)
return None
@@
if len(content) > self.max_zip_entry_read_size:
+ self._skipped_oversized_entries.append(file_info.filename)
return None
@@
def scan(self, path: str) -> ScanResult:
@@
+ self._skipped_oversized_entries = []
@@
self._check_unsafe_joblib_fallback(zip_file, result, path)
+
+ if self._skipped_oversized_entries:
+ result.add_check(
+ name="Oversized ZIP Entries Skipped",
+ passed=False,
+ message=(
+ f"Skipped {len(self._skipped_oversized_entries)} oversized archive entries; "
+ "security detection coverage is partial"
+ ),
+ severity=IssueSeverity.WARNING,
+ location=path,
+ details={
+ "max_zip_entry_read_size": self.max_zip_entry_read_size,
+ "skipped_entries": self._skipped_oversized_entries[:20],
+ "skipped_count": len(self._skipped_oversized_entries),
+ },
+ )Based on learnings: "Preserve or strengthen security detections; test both benign and malicious samples when adding scanner/feature changes".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelaudit/scanners/skops_scanner.py` around lines 31 - 42, The
_read_zip_entry_safely function currently returns None for oversized entries and
callers silently continue; modify it to record skipped entries and mark the scan
as partial: when file_info.file_size > self.max_zip_entry_read_size or the read
exceeds the limit, append the file name/path to a new self._skipped_zip_entries
list (or similar attribute) and ensure the scanner emits a metadata flag (e.g.,
partial_scan=True and skipped_entries list) in the final scan result; update any
callers that iterate zip entries to still record the skipped name from the
return or from self._skipped_zip_entries so results/reporting include the
skipped files and a clear partial-scan indicator.
| def test_skips_oversized_readme_entry_without_crashing(self, tmp_path: Path) -> None: | ||
| """Oversized archive entries should be skipped by bounded reads.""" | ||
| skops_file = tmp_path / "oversized_readme.skops" | ||
| with zipfile.ZipFile(skops_file, "w", compression=zipfile.ZIP_DEFLATED) as zf: | ||
| zf.writestr("README.md", "get_model via joblib.load" * 512) | ||
| zf.writestr("schema.json", '{"version": "1.0"}') | ||
|
|
||
| scanner = SkopsScanner(config={"max_zip_entry_read_size": 128, "max_skops_file_size": 10 * 1024 * 1024}) | ||
| result = scanner.scan(str(skops_file)) | ||
|
|
||
| assert result.success is True | ||
| cve_checks = [c for c in result.checks if "CVE-2025-54886" in c.name and c.status == CheckStatus.FAILED] | ||
| assert len(cve_checks) == 0 |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Strengthen this test to assert partial-scan visibility for oversized entries.
Right now it only verifies “no crash/no CVE flag.” Once skipped-entry reporting is added, assert that check/metadata too, so silent detection gaps can’t regress.
Based on learnings: "Preserve or strengthen security detections; test both benign and malicious samples when adding scanner/feature changes".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/scanners/test_skops_scanner.py` around lines 359 - 371, Update the
test_skips_oversized_readme_entry_without_crashing to also assert that the
scanner reports the skipped oversized entry (e.g., via a specific check or
metadata field on result) so partial-scan visibility is verified: after calling
SkopsScanner.scan(...) inspect result.checks and result.metadata (or the
scanner's skip-reporting check name) and assert there is a check or metadata
entry indicating README.md was skipped due to exceeding max_zip_entry_read_size,
while still asserting no failed CVE-2025-54886 checks; reference test function
name, SkopsScanner, result.checks, result.metadata, and CheckStatus to locate
and add the new assertions.
Motivation
max_skops_file_sizebut never enforced it, while callingzip_file.read()on every entry which enables a zip-decompression bomb memory exhaustion attack.Description
max_zip_entry_read_size(default 10 MB) and a helper_read_zip_entry_safely()that opens entries and bounds reads, returningNonefor oversized entries.zip_file.read(...)calls in_detect_cve_2025_54412,_detect_cve_2025_54413,_detect_cve_2025_54886,_check_protocol_version, and_check_unsafe_joblib_fallbackwith the safe reader and skip oversized entries.file_info.file_sizefor all entries and compare againstmax_skops_file_size, failing the scan early if exceeded and recordingarchive_uncompressed_sizein metadata.test_rejects_archive_exceeding_uncompressed_size_limitandtest_skips_oversized_readme_entry_without_crashingand keep existing decompression-bomb file-count test.Testing
pytest tests/scanners/test_skops_scanner.pyand the modified test suite passed.test_rejects_archive_exceeding_uncompressed_size_limitwhich asserts archives whose total uncompressed size exceedmax_skops_file_sizefail.test_skips_oversized_readme_entry_without_crashingwhich asserts oversized entries are skipped by bounded reads and do not trigger CVE false positives or crashes.Codex Task
Summary by CodeRabbit