
Commit af0d486

fix(container-scanner): handle empty/malformed grype output without traceback
Three failure modes in the container sub-scanner runners produced either a JSONDecodeError traceback or a silent downgrade of scanner coverage:

1. Grype writes a 0-byte ``grype-results.json`` (common when its catalog/image-resolution step fails after the output handle is created). Because ``output_file.exists()`` is True, the existing "no output" guard was skipped, and ``json.loads("")`` blew up with a bare JSONDecodeError that propagated to the user.
2. Grype's ``except Exception: logger.exception(...); return []`` path swallowed the error after dumping a Python traceback to stderr. The engine never recorded ``scanner_errors["grype"]``, so the summary reported "Grype: 0 findings" as if the scan had succeeded.
3. Trivy had a similar shape: ``logger.exception`` on parse failure printed a traceback even though the runner's raise-on-failure contract was already correct.

Fix

- New shared helper ``_validate_scanner_output(scanner_name, output_file, result)`` checks, in order: returncode == 0, output file exists, file size > 0. On any failure it logs at ERROR level (clipped stderr, no traceback) and raises a ``RuntimeError`` with a consistent message shape (``"<scanner> scan failed (exit N): <stderr>"`` for a non-zero exit, with analogous "produced no output file" and "produced empty output file" messages for the other two checks), so the orchestrator records it under ``scanner_errors`` consistently.
- ``_run_grype`` and ``_run_trivy`` both call the helper after their subprocess returns, then translate JSON parse errors (and any other parser error) into the same ``RuntimeError`` shape instead of a bare JSONDecodeError or a silent ``return []``.
- ``logger.exception(...)`` is replaced by ``logger.error(...)`` on the parse-failure paths, so users no longer see a Python traceback for an expected scanner failure mode.

Behavior preserved

- Healthy scans (zero exit, JSON-parseable output) take the same fast path through the parser.
- The orchestrator's existing try/except around ``_run_grype`` / ``_run_trivy`` already catches ``RuntimeError`` and records it under ``scanner_errors``; this PR makes the runners *actually* raise that shape instead of silencing failures.
- Trivy still propagates findings when grype fails, and vice versa: partial scan coverage is preserved while the failed scanner is surfaced separately.

Tests (+13)

- ``TestValidateScannerOutput`` (6 cases): helper acceptance matrix. A healthy run passes silently; a non-zero exit raises with a stderr breadcrumb; a missing file raises; a 0-byte file raises; stderr plus a 0-byte file raises with the breadcrumb; short stderr is preserved verbatim.
- ``TestRunGrype`` (4 cases, the user's exact acceptance matrix): non-zero exit + empty file, zero exit + empty file, malformed JSON, valid JSON happy path.
- ``TestRunTrivy`` (2 cases): mirror coverage, exercising the same failure-mode contract via the shared validator.
- ``TestOrchestratorRecordsScannerError`` (1 case): end-to-end assertion that grype's RuntimeError flows up to ``scanner_errors["grype"]`` AND trivy's findings still propagate in the same scan. This closes the loop on "preserve partial results from other scanners but clearly surface the failure."

Validation

- Full SDK suite: 1451 passed (+13 from this PR), 8 skipped.
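The partial-results contract described above can be illustrated with a minimal sketch. The orchestrator itself is not part of this diff, so `scan_image`, the `runners` mapping, the fake trivy/grype callables and the placeholder CVE id are all hypothetical; only the `scanner_errors` name and the "catch `RuntimeError`, record it, keep the other scanner's findings" behavior come from the commit message.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)

# Hypothetical runner shape: image reference in, list of findings out;
# raises RuntimeError when the scanner run is unhealthy.
Runner = Callable[[str], list]


def scan_image(
    image_ref: str, runners: dict[str, Runner],
) -> tuple[list, dict[str, str]]:
    """Run every sub-scanner, keeping partial results and recording failures."""
    findings: list = []
    scanner_errors: dict[str, str] = {}
    for name, run in runners.items():
        try:
            findings.extend(run(image_ref))
        except RuntimeError as exc:
            # A failed scanner is surfaced instead of counting as 0 findings,
            # and it does not discard what the other scanners found.
            scanner_errors[name] = str(exc)
            logger.error("%s failed for %s: %s", name, image_ref, exc)
    return findings, scanner_errors


def fake_trivy(image_ref: str) -> list:
    # Placeholder finding; the id is made up for illustration.
    return [{"scanner": "trivy", "id": "CVE-0000-0000", "image": image_ref}]


def fake_grype(image_ref: str) -> list:
    raise RuntimeError("grype scan produced empty output file (exit 0): no output")


findings, scanner_errors = scan_image(
    "example/image:latest", {"trivy": fake_trivy, "grype": fake_grype},
)
for name in ("trivy", "grype"):
    if name in scanner_errors:
        print(f"{name}: FAILED ({scanner_errors[name]})")
    else:
        count = sum(1 for f in findings if f["scanner"] == name)
        print(f"{name}: {count} findings")
```

The summary now distinguishes "grype failed" from "grype found nothing", while trivy's finding is still reported in the same scan.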
1 parent 267e4be commit af0d486

2 files changed

Lines changed: 398 additions & 24 deletions


argus/container/scanner.py

Lines changed: 94 additions & 24 deletions
```diff
@@ -1,5 +1,6 @@
 """Scan container images with trivy and grype, deduplicate findings."""
 
+import json
 import logging
 import shutil
 import tempfile
```
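The new `import json` lets the runners name `json.JSONDecodeError` explicitly. The self-contained sketch below shows why that clause is listed before the broad `except Exception`: the contents of a 0-byte file decode to a `JSONDecodeError`, and since `JSONDecodeError` subclasses `ValueError`, a broader clause placed first would absorb it. `parse_matches` and `classify` are hypothetical stand-ins, not the project's `_parser` module, and the `matches` key is an assumed output shape.

```python
import json


def parse_matches(raw: str) -> list:
    """Hypothetical stand-in for a scanner parser: decode, then read a key."""
    data = json.loads(raw)   # may raise json.JSONDecodeError
    return data["matches"]   # may raise KeyError on a schema mismatch


def classify(raw: str) -> str:
    try:
        return f"ok: {len(parse_matches(raw))} matches"
    except json.JSONDecodeError as exc:
        # Listed first: JSONDecodeError is a ValueError subclass, so the
        # broad clause below would otherwise swallow it.
        return f"JSON parse error: {exc}"
    except Exception as exc:
        return f"parse error: {exc!r}"


print(classify(""))                 # contents of a 0-byte output file
print(classify('{"matches": ['))    # truncated output
print(classify('{"results": []}'))  # valid JSON, unexpected shape
print(classify('{"matches": []}'))  # healthy
```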
```diff
@@ -232,6 +233,70 @@ def _container_vol_args(
     return args
 
 
+def _validate_scanner_output(
+    scanner_name: str,
+    output_file: Path,
+    result,
+) -> None:
+    """Raise RuntimeError when a sub-scanner's run looks unhealthy.
+
+    Container sub-scanners (trivy, grype) all hand off via the same
+    "subprocess writes JSON to a file path, we parse it" contract.
+    They all have the same failure modes:
+
+    1. subprocess exits non-zero — DB pull failed, image not
+       resolvable, registry auth missing, etc. Anything that prints
+       to stderr and bails before producing meaningful output.
+    2. output file isn't there — scanner crashed mid-run.
+    3. output file exists but is 0 bytes — common when a wrapper
+       process (e.g. ``docker run --rm``) exits non-zero from a
+       different stage than the scanner itself, leaving a stub
+       file from the redirect.
+
+    All three modes need to surface the scanner's own stderr (clipped
+    for terminal sanity), use ERROR-level logging without dumping a
+    Python traceback, and raise a single ``RuntimeError`` shape so
+    the caller can record it under ``scanner_errors`` consistently.
+
+    JSON parse failure is intentionally NOT validated here — every
+    sub-scanner uses a different parser, so the per-runner caller
+    owns that check (and translates exceptions into RuntimeError
+    via the same shape).
+    """
+    stderr = result.stderr.strip()[:500] if result.stderr else ""
+    stderr_label = stderr or "no stderr"
+
+    if result.returncode != 0:
+        logger.error(
+            "%s exited non-zero (%d): %s",
+            scanner_name, result.returncode, stderr_label,
+        )
+        raise RuntimeError(
+            f"{scanner_name} scan failed (exit {result.returncode}): "
+            f"{stderr or 'no output'}"
+        )
+
+    if not output_file.exists():
+        logger.error(
+            "%s produced no output file (exit %d): %s",
+            scanner_name, result.returncode, stderr_label,
+        )
+        raise RuntimeError(
+            f"{scanner_name} scan produced no output file "
+            f"(exit {result.returncode}): {stderr or 'no output'}"
+        )
+
+    if output_file.stat().st_size == 0:
+        logger.error(
+            "%s produced 0-byte output (exit %d): %s",
+            scanner_name, result.returncode, stderr_label,
+        )
+        raise RuntimeError(
+            f"{scanner_name} scan produced empty output file "
+            f"(exit {result.returncode}): {stderr or 'no output'}"
+        )
+
+
 def _run_trivy(
     image_ref: str, tmp_path: Path, local: bool = False,
 ) -> list[Finding]:
```
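The three checks in `_validate_scanner_output` can be exercised without a real scanner by handing it `subprocess.CompletedProcess` objects and temporary files. The sketch below uses a condensed restatement of the helper (same check order and message wording, logging omitted) instead of importing `argus.container.scanner`; `validate` and `outcome` are hypothetical names and the stderr strings are invented for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def validate(name: str, output_file: Path, result: subprocess.CompletedProcess) -> None:
    """Condensed restatement of _validate_scanner_output (logging omitted)."""
    stderr = result.stderr.strip()[:500] if result.stderr else ""
    detail = f"(exit {result.returncode}): {stderr or 'no output'}"
    if result.returncode != 0:
        raise RuntimeError(f"{name} scan failed {detail}")
    if not output_file.exists():
        raise RuntimeError(f"{name} scan produced no output file {detail}")
    if output_file.stat().st_size == 0:
        raise RuntimeError(f"{name} scan produced empty output file {detail}")


def outcome(output_file: Path, returncode: int, stderr: str = "") -> str:
    """Return "ok" or the RuntimeError message for one simulated run."""
    result = subprocess.CompletedProcess(
        args=["grype"], returncode=returncode, stdout="", stderr=stderr,
    )
    try:
        validate("grype", output_file, result)
    except RuntimeError as exc:
        return str(exc)
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    healthy = tmp_path / "healthy.json"
    healthy.write_text('{"matches": []}')
    empty = tmp_path / "empty.json"
    empty.write_text("")
    missing = tmp_path / "missing.json"  # never created

    results = {
        "healthy": outcome(healthy, 0),
        "non-zero exit": outcome(empty, 1, "failed to pull vulnerability DB\n"),
        "missing file": outcome(missing, 0),
        "0-byte file": outcome(empty, 0),
        "0-byte file + stderr": outcome(empty, 0, "warning: no packages cataloged\n"),
    }

for case, message in results.items():
    print(f"{case}: {message}")
```

Because the exit-code check runs first, the "no output file" and "empty output file" messages can only ever report `exit 0`; a non-zero exit with an empty file is always reported as `scan failed`.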
```diff
@@ -318,21 +383,21 @@ def _run_trivy(
         logger.error("trivy binary not found")
         return []
 
-    if not output_file.exists():
-        stderr = result.stderr.strip()[:500]
-        logger.error(
-            "trivy produced no output (exit %d): %s",
-            result.returncode, stderr,
-        )
-        raise RuntimeError(
-            f"trivy scan failed (exit {result.returncode}): {stderr or 'no output'}"
-        )
+    _validate_scanner_output("trivy", output_file, result)
 
     try:
         return _parser.parse_trivy_results(output_file)
-    except Exception:
-        logger.exception("Failed to parse trivy results for %s", image_ref)
-        raise
+    except json.JSONDecodeError as exc:
+        logger.error("trivy output JSON parse error for %s: %s", image_ref, exc)
+        raise RuntimeError(
+            f"trivy output JSON parse error: {exc}"
+        ) from exc
+    except Exception as exc:
+        # Non-decode parser errors (schema mismatch, missing keys) —
+        # log without traceback and re-raise as RuntimeError so the
+        # engine catches it as a structured scanner_errors entry.
+        logger.error("trivy output parse error for %s: %s", image_ref, exc)
+        raise RuntimeError(f"trivy output parse error: {exc}") from exc
 
 
 def _run_grype(
```
def _run_grype(
```diff
@@ -407,21 +472,26 @@ def _run_grype(
         logger.error("grype binary not found")
         return []
 
-    if not output_file.exists():
-        stderr = result.stderr.strip()[:500]
-        logger.error(
-            "grype produced no output (exit %d): %s",
-            result.returncode, stderr,
-        )
-        raise RuntimeError(
-            f"grype scan failed (exit {result.returncode}): {stderr or 'no output'}"
-        )
+    _validate_scanner_output("grype", output_file, result)
 
     try:
         return _parser.parse_grype_results(output_file)
-    except Exception:
-        logger.exception("Failed to parse grype results for %s", image_ref)
-        return []
+    except json.JSONDecodeError as exc:
+        # The user-reported regression: grype writes a 0-byte file
+        # when its image-resolution / catalog step fails after the
+        # output handle is created. Validation above catches that
+        # branch first, but JSON-shape errors (truncated output,
+        # malformed schema) still land here. Raise instead of
+        # swallowing — the engine catches the RuntimeError and
+        # records it under scanner_errors so the summary reflects
+        # reality instead of silently dropping grype's contribution.
+        logger.error("grype output JSON parse error for %s: %s", image_ref, exc)
+        raise RuntimeError(
+            f"grype output JSON parse error: {exc}"
+        ) from exc
+    except Exception as exc:
+        logger.error("grype output parse error for %s: %s", image_ref, exc)
+        raise RuntimeError(f"grype output parse error: {exc}") from exc
 
 
 def _run_syft(image_ref: str, tmp_path: Path) -> None:
```
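The four `TestRunGrype` cases named in the commit message can be reproduced end to end with a stand-in scanner: a throwaway Python child process that writes given contents to the output path and exits with a chosen status, mimicking grype creating its output file and then failing. This is a sketch, not the project's test code; `FAKE_GRYPE`, `CASES` and `run_fake_grype` are hypothetical, the runner condenses the validator and JSON-parse translation from the diff, and the `matches` key is an assumed output shape.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for grype: writes argv[2] to the path in argv[1],
# then exits with the status given in argv[3].
FAKE_GRYPE = (
    "import pathlib, sys; "
    "pathlib.Path(sys.argv[1]).write_text(sys.argv[2]); "
    "sys.exit(int(sys.argv[3]))"
)

# (file contents, exit status) for each acceptance case.
CASES = {
    "non-zero exit + empty file": ("", 1),
    "zero exit + empty file": ("", 0),
    "malformed JSON": ('{"matches": [', 0),
    "valid JSON": ('{"matches": []}', 0),
}


def run_fake_grype(content: str, exit_code: int, output_file: Path) -> list:
    """Condensed runner: validate the subprocess result, then parse."""
    result = subprocess.run(
        [sys.executable, "-c", FAKE_GRYPE, str(output_file), content, str(exit_code)],
        capture_output=True,
        text=True,
    )
    stderr = result.stderr.strip()[:500]
    detail = f"(exit {result.returncode}): {stderr or 'no output'}"
    if result.returncode != 0:
        raise RuntimeError(f"grype scan failed {detail}")
    if not output_file.exists():
        raise RuntimeError(f"grype scan produced no output file {detail}")
    if output_file.stat().st_size == 0:
        raise RuntimeError(f"grype scan produced empty output file {detail}")
    try:
        return json.loads(output_file.read_text())["matches"]
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"grype output JSON parse error: {exc}") from exc


outcomes = {}
with tempfile.TemporaryDirectory() as tmp:
    for index, (case, (content, exit_code)) in enumerate(CASES.items()):
        output_file = Path(tmp) / f"grype-results-{index}.json"
        try:
            matches = run_fake_grype(content, exit_code, output_file)
            outcomes[case] = f"ok ({len(matches)} matches)"
        except RuntimeError as exc:
            outcomes[case] = str(exc)

for case, outcome in outcomes.items():
    print(f"{case}: {outcome}")
```

Only the valid-JSON case returns findings; the other three become `RuntimeError` messages that an orchestrator can record, where the old grype path would have returned `[]` for the parse failures.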
