Commit dbdd858

feat(container): emit canonical argus-results.json + persist raw scanner output
Two related fixes for the container-scan flow, addressing user reports that:

1. ``argus view`` doesn't display container vulnerabilities: the container-scan flow only wrote a domain-shaped ``container-scan.json`` (per-image counts, ``container_count``, etc.) which the viewers don't know how to render. The viewers consume the canonical ``argus-results.json`` shape produced by source scans.
2. The ``argus-results/`` dir doesn't preserve the raw per-scanner output files (``trivy-results.json``, ``grype-results.json``, ``syft-sbom.json``); they live in a tempdir that gets wiped at the end of ``scan_image``. Users who want forensics, audit trails, or manual triage have nowhere to look.

Both rooted in the same architectural drift: the container flow diverged from the source-scan output contract. This PR re-aligns it.

Canonical ScanSummary for container scans

- ``_cmd_container_scan`` now builds a canonical ``ScanSummary`` alongside the existing ``ContainerScanSummary``: each container target maps to ``ScanResult(scanner=f"container/<name>", findings=combined, metadata={image_ref, build_success, scanner_errors, scan_error})``.
- The JSON reporter writes that to ``argus-results.json`` unconditionally (matches the source-scan canonical-artifact contract from PR #111).
- The SARIF reporter now consumes the same canonical summary instead of building a one-off conversion locally.
- The domain-shaped ``container-scan.json`` (with ``container_count``, per-image stats) is preserved for backward compat with downstream tooling that consumes it; it just lives alongside the canonical artifact rather than instead of it.
- ``argus view`` opens container scan results without any new code on the viewer side; it just sees ``ScanResult`` rows named ``container/<image>`` and renders them like any other scanner.

Raw scanner output persistence

- ``scan_image`` gains a ``raw_output_dir: Path | None`` parameter. When set, copies ``trivy-results.json``, ``grype-results.json``, and ``syft-sbom.json`` into that directory before the tempdir is cleaned up. Best-effort: copy errors log a warning but don't fail the scan.
- ``ContainerEngine`` reads ``_raw_output_root`` from its config dict (the dispatcher sets this) and threads a per-target subdir to ``scan_image`` as ``<root>/<target.name>/``.
- ``_cmd_container_scan`` defaults to ON: raw outputs land at ``<output_dir>/raw/<image>/``. Opt out via ``--no-keep-raw`` flag or ``containers.keep_raw: false`` in argus.yml. CLI flag wins on conflict (explicit > implicit).
- 0-byte files are explicitly skipped (they're failure signals upstream; persisting them would make a known-bad output look authoritative on disk).

Documentation

- ``argus.example.yml`` documents ``containers.keep_raw: true`` in the commented schema block, alongside the existing ``images``, ``discover``, and ``scanners`` keys.

Tests (+5)

- ``TestScanImageRawOutputPersistence`` (4 cases): all artifacts copied when dir supplied, no copy when ``raw_output_dir=None``, partial coverage (only trivy ran) doesn't block the others, 0-byte files are explicitly skipped.
- ``TestContainerCanonicalScanSummary`` (1 case): each ContainerScanResult maps to a canonical ScanResult(scanner="container/<name>") with combined findings; metadata lifts onto the ScanResult; round-trips through ``ScanSummary.to_dict()`` unchanged so the viewer gets the same shape it expects.

Validation

- Full SDK suite: 1464 passed (+5 net), 8 skipped.
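
To illustrate what a downstream consumer gains from the canonical shape, here is a minimal sketch that counts findings per container. It relies only on the keys the new test asserts on (``results``, ``scanner``, ``findings``, ``metadata``). The ``container_finding_counts`` helper and the sample payload are hypothetical; the real ``argus-results.json`` may carry additional fields.

```python
from typing import Any


def container_finding_counts(payload: dict[str, Any]) -> dict[str, int]:
    """Count findings per container row in a canonical-shaped payload."""
    counts: dict[str, int] = {}
    for result in payload.get("results", []):
        scanner = result.get("scanner", "")
        # Container rows are named "container/<name>" per the commit message.
        if scanner.startswith("container/"):
            name = scanner.split("/", 1)[1]
            counts[name] = counts.get(name, 0) + len(result.get("findings", []))
    return counts


# Hypothetical sample data, not real scanner output.
sample = {
    "results": [
        {
            "scanner": "container/webapp",
            "findings": [{"id": "CVE-2024-1"}, {"id": "CVE-2024-2"}],
            "metadata": {"image_ref": "myorg/webapp:1.0"},
        },
        {"scanner": "bandit", "findings": [{"id": "B101"}], "metadata": {}},
    ]
}

counts = container_finding_counts(sample)
print(counts)
```

Because container rows share the shape of every other scanner row, the only container-specific logic a consumer needs is the ``container/`` prefix check.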
1 parent d7c94dc commit dbdd858

6 files changed

Lines changed: 296 additions & 9 deletions

argus.example.yml

Lines changed: 9 additions & 0 deletions
@@ -101,6 +101,15 @@ reporting:
 #
 # # Override which sub-scanners run; default is trivy + grype.
 # scanners: [trivy, grype, syft]
+#
+# # Persist raw per-scanner artifacts (trivy-results.json,
+# # grype-results.json, syft-sbom.json) under
+# # ``<output_dir>/raw/<image>/`` alongside the canonical
+# # argus-results.json. Default is true so the artifacts are
+# # available for forensics, audit, or manual triage. Set false
+# # (or pass --no-keep-raw on the CLI) to skip — saves a few MB
+# # per image when running in tight CI environments.
+# keep_raw: true

 # Execution backend configuration
 execution:
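
The interaction between ``keep_raw`` in the config and ``--no-keep-raw`` on the CLI can be sketched as a small truth table. This is a minimal reproduction of the expression used in ``_cmd_container_scan``, assuming the config dict passed in is the ``containers`` section; the ``resolve_keep_raw`` helper and the standalone parser are hypothetical and exist only for illustration.

```python
import argparse


def resolve_keep_raw(containers_config: dict, args: argparse.Namespace) -> bool:
    """Mirror the commit's rule: config defaults to True, the CLI flag can veto."""
    keep_raw_config = containers_config.get("keep_raw", True)
    return bool(keep_raw_config) and not getattr(args, "no_keep_raw", False)


# Hypothetical standalone parser carrying only the new flag.
parser = argparse.ArgumentParser()
parser.add_argument("--no-keep-raw", action="store_true", dest="no_keep_raw")

for cfg in ({}, {"keep_raw": True}, {"keep_raw": False}):
    for argv in ([], ["--no-keep-raw"]):
        parsed = parser.parse_args(argv)
        print(cfg, argv, "->", resolve_keep_raw(cfg, parsed))
```

With this expression the flag can only turn persistence off. When the config says ``keep_raw: false``, nothing on the command line turns it back on, so "CLI flag wins" applies in the opt-out direction only.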

argus/cli.py

Lines changed: 66 additions & 7 deletions
@@ -616,6 +616,18 @@ def _build_scan_parser(subparsers: argparse._SubParsersAction) -> None:
         default=None,
         help="Sub-scanners for container scanning: trivy,grype,syft (default: trivy,grype)",
     )
+    container_group.add_argument(
+        "--no-keep-raw",
+        action="store_true",
+        dest="no_keep_raw",
+        help="Do not persist raw per-scanner output (trivy-results.json, "
+             "grype-results.json, syft-sbom.json) under "
+             "<output_dir>/raw/<image>/. By default raw artifacts are "
+             "kept alongside the canonical argus-results.json so users "
+             "can drill into individual scanner output for forensics or "
+             "manual triage. Set ``containers.keep_raw: false`` in argus.yml "
+             "for the same effect via config.",
+    )

     # ZAP DAST flags (used with: argus scan zap)
     dast_group = scan_parser.add_argument_group(
@@ -1733,6 +1745,18 @@ def _cmd_container_scan(
     output_dir = _make_run_dir(base_dir)
     formats = args.formats or ["terminal", "markdown"]

+    # Decide whether to persist raw per-scanner outputs alongside the
+    # canonical argus-results.json. Default is ON — the user just ran
+    # a scan and would expect those artifacts to be available for
+    # manual triage. Opt out via ``--no-keep-raw`` (CLI) or
+    # ``containers.keep_raw: false`` (argus.yml). CLI flag wins on
+    # conflict, matching the rest of the dispatcher's
+    # explicit-over-implicit posture.
+    keep_raw_config = config.get("keep_raw", True)
+    keep_raw = bool(keep_raw_config) and not getattr(args, "no_keep_raw", False)
+    if keep_raw:
+        config["_raw_output_root"] = str(Path(output_dir) / "raw")
+
     # Run
     try:
         engine = ContainerEngine(config)
@@ -1745,6 +1769,42 @@ def _cmd_container_scan(
         print(f"Error: container scan failed: {exc}", file=sys.stderr)
         return EXIT_ERROR

+    # Build a canonical ScanSummary view of the container results so
+    # the standard reporters (json → argus-results.json, sarif) and
+    # ``argus view`` can consume container scans the same way they
+    # consume source scans. Each container target becomes a
+    # ScanResult; the per-image domain metadata (image_ref, build
+    # status, scanner_errors) lifts onto ScanResult.metadata so the
+    # browser dashboard and exporters surface it.
+    from argus.core.models import ScanResult, ScanSummary
+    canonical_results = [
+        ScanResult(
+            scanner=f"container/{r.name}",
+            findings=list(r.combined_findings),
+            metadata={
+                "image_ref": r.image_ref,
+                "build_success": r.build_success,
+                **(
+                    {"scanner_errors": dict(r.scanner_errors)}
+                    if r.scanner_errors else {}
+                ),
+                **(
+                    {"scan_error": r.scan_error}
+                    if getattr(r, "scan_error", None) else {}
+                ),
+            },
+        )
+        for r in summary.results
+    ]
+    canonical_summary = ScanSummary(results=canonical_results)
+
+    # Always emit argus-results.json — same canonical-artifact
+    # contract the source-scan flow established. ``argus view`` and
+    # the audit manifest both consume this regardless of what the
+    # user listed in ``formats``.
+    from argus.reporters import get_reporter
+    get_reporter("json").report(canonical_summary, output_dir)
+
     # Reports
     for fmt in formats:
         if fmt == "markdown":
@@ -1755,16 +1815,15 @@ def _cmd_container_scan(
         elif fmt == "terminal":
             _print_container_terminal(summary)
         elif fmt == "json":
+            # Domain-shaped per-image summary (container_count etc.)
+            # lives at container-scan.json. The canonical
+            # argus-results.json was already written above; this
+            # is the supplementary domain artifact for tooling that
+            # wants per-image stats without parsing findings.
             _write_container_json(summary, output_dir)
         elif fmt == "sarif":
-            from argus.core.models import ScanResult, ScanSummary
-            from argus.reporters import get_reporter
-            results = [
-                ScanResult(scanner=f"container/{r.name}", findings=r.combined_findings)
-                for r in summary.results
-            ]
             sarif_reporter = get_reporter("sarif")
-            sarif_reporter.report(ScanSummary(results=results), output_dir)
+            sarif_reporter.report(canonical_summary, output_dir)

     # Exit code — scanner failures are always non-zero
     scan_failures = getattr(summary, "scan_failures", 0)
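
The conditional ``**(...)`` spreads in the mapping above mean that ``scanner_errors`` and ``scan_error`` appear in the metadata only when they carry information. The sketch below shows the same behaviour with plain ``if`` statements. Both dataclasses are hypothetical stand-ins for the real ``argus.core.models.ScanResult`` and ``ContainerScanResult``, whose actual fields and defaults are not shown in this diff.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ScanResult:  # hypothetical stand-in for argus.core.models.ScanResult
    scanner: str
    findings: list[Any] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class ContainerScanResult:  # hypothetical stand-in; real defaults may differ
    name: str
    image_ref: str
    combined_findings: list[Any] = field(default_factory=list)
    build_success: bool = True
    scanner_errors: dict[str, str] = field(default_factory=dict)
    scan_error: str | None = None


def to_canonical(r: ContainerScanResult) -> ScanResult:
    """Map one container result to a canonical row named container/<name>."""
    metadata: dict[str, Any] = {
        "image_ref": r.image_ref,
        "build_success": r.build_success,
    }
    # Optional keys are added only when they are non-empty.
    if r.scanner_errors:
        metadata["scanner_errors"] = dict(r.scanner_errors)
    if r.scan_error:
        metadata["scan_error"] = r.scan_error
    return ScanResult(
        scanner=f"container/{r.name}",
        findings=list(r.combined_findings),
        metadata=metadata,
    )


clean = to_canonical(ContainerScanResult(name="webapp", image_ref="myorg/webapp:1.0"))
broken = to_canonical(
    ContainerScanResult(
        name="api", image_ref="myorg/api:2.0", scanner_errors={"grype": "timeout"}
    )
)
print(sorted(clean.metadata))
print(sorted(broken.metadata))
```

Omitting empty keys keeps the serialized metadata small for the common clean-scan case, at the cost that consumers must use ``.get()`` rather than index those keys directly.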

argus/container/engine.py

Lines changed: 11 additions & 0 deletions
@@ -6,6 +6,7 @@
 """

 import logging
+from pathlib import Path

 from .builder import build_image
 from .discovery import (
@@ -130,10 +131,20 @@ def _process_target(self, target: ContainerTarget) -> ContainerScanResult:
             self._built_images.append(target.image_ref)

         try:
+            # If the dispatcher set ``_raw_output_root`` in the
+            # config dict, persist this target's raw scanner outputs
+            # under ``<root>/<target.name>/``. Caller controls
+            # whether this is set (CLI flag + config opt-out); the
+            # engine just threads it through.
+            raw_root = self.config.get("_raw_output_root")
+            target_raw_dir = (
+                Path(raw_root) / target.name if raw_root else None
+            )
             return scan_image(
                 target,
                 scanners=self._scanners(),
                 sbom=self._sbom_enabled(),
+                raw_output_dir=target_raw_dir,
             )
         except OSError as exc:
             # Disk full, permission denied, etc.
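
The per-target directory derivation is small enough to show in isolation. The following sketch reproduces the expression from ``_process_target``; the ``target_raw_dir`` function name and the example root path are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def target_raw_dir(config: dict, target_name: str) -> Path | None:
    """Return <root>/<target_name> when a raw root is configured, else None."""
    raw_root = config.get("_raw_output_root")
    return Path(raw_root) / target_name if raw_root else None


enabled = {"_raw_output_root": "argus-results/run-1/raw"}  # hypothetical path
print(target_raw_dir(enabled, "webapp"))
print(target_raw_dir({}, "webapp"))
```

Since ``Path`` joins the name as given, a target name containing a path separator would produce nested directories. Whether target names can contain separators is not visible in this diff.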

argus/container/scanner.py

Lines changed: 36 additions & 0 deletions
@@ -123,6 +123,7 @@ def scan_image(
     target: ContainerTarget,
     scanners: tuple[str, ...] = ("trivy", "grype"),
     sbom: bool = True,
+    raw_output_dir: Path | None = None,
 ) -> ContainerScanResult:
     """Scan a single container image with trivy and/or grype.

@@ -132,7 +133,19 @@ def scan_image(

     For locally-built images, scanners reference the local Docker daemon.
     Per-scanner errors are caught and recorded, not swallowed.
+
+    ``raw_output_dir``: when supplied, the raw scanner output files
+    (``trivy-results.json``, ``grype-results.json``, ``syft-sbom.json``)
+    are copied into this directory before the temp dir is cleaned up.
+    Lets users preserve full per-scanner artifacts for forensics,
+    audit, or manual triage workflows alongside the canonical
+    ``argus-results.json``. ``None`` (the default) means transient
+    output — historic behavior.
     """
+    import shutil as _shutil  # local import to avoid shadowing the
+                              # module-level ``shutil`` reference used
+                              # by ``shutil.which`` checks below.
+
     trivy_findings: list[Finding] = []
     grype_findings: list[Finding] = []
     scanner_errors: dict[str, str] = {}
@@ -164,6 +177,29 @@ def scan_image(
         if sbom and "syft" not in scanners:
             _run_syft(target.image_ref, tmp_path)

+        # Persist raw scanner artifacts (best-effort) before the
+        # tempdir is wiped. We copy whatever files exist; missing
+        # files (e.g. grype failed before writing) just don't get
+        # copied — the structured ``scanner_errors`` already records
+        # why. Errors during copy are non-fatal: the scan succeeded,
+        # the canonical JSON is still emitted upstream.
+        if raw_output_dir is not None:
+            try:
+                raw_output_dir.mkdir(parents=True, exist_ok=True)
+                for fname in (
+                    "trivy-results.json",
+                    "grype-results.json",
+                    "syft-sbom.json",
+                ):
+                    src = tmp_path / fname
+                    if src.exists() and src.stat().st_size > 0:
+                        _shutil.copy2(src, raw_output_dir / fname)
+            except OSError as exc:
+                logger.warning(
+                    "Failed to persist raw scanner outputs to %s: %s",
+                    raw_output_dir, exc,
+                )
+
     combined = deduplicate_findings(trivy_findings, grype_findings)

     return ContainerScanResult(
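
The best-effort copy step can be exercised on its own, without any scanner binaries. The sketch below extracts the same logic into a hypothetical ``persist_raw_outputs`` helper (the real code is inline in ``scan_image``) and runs it against a temp directory holding one valid file and one 0-byte file.

```python
import logging
import shutil
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)

RAW_ARTIFACTS = ("trivy-results.json", "grype-results.json", "syft-sbom.json")


def persist_raw_outputs(tmp_path: Path, raw_output_dir: Path) -> list[str]:
    """Copy non-empty raw artifacts; return the names that were copied."""
    copied: list[str] = []
    try:
        raw_output_dir.mkdir(parents=True, exist_ok=True)
        for fname in RAW_ARTIFACTS:
            src = tmp_path / fname
            # Missing and 0-byte files are skipped rather than copied.
            if src.exists() and src.stat().st_size > 0:
                shutil.copy2(src, raw_output_dir / fname)
                copied.append(fname)
    except OSError as exc:
        # Non-fatal: log and carry on, as the commit does.
        logger.warning(
            "Failed to persist raw scanner outputs to %s: %s", raw_output_dir, exc
        )
    return copied


with tempfile.TemporaryDirectory() as scan_tmp, tempfile.TemporaryDirectory() as out:
    scan_dir = Path(scan_tmp)
    (scan_dir / "trivy-results.json").write_text('{"Results": []}')
    (scan_dir / "grype-results.json").touch()  # 0-byte failure signal
    copied = persist_raw_outputs(scan_dir, Path(out) / "raw" / "app")
    print(copied)  # ['trivy-results.json']
```

Note that ``mkdir`` runs before any file is examined, so the target directory is created even when nothing ends up being copied. That is consistent with the 0-byte test in this commit, which accepts either a missing or an empty directory.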

argus/tests/test_container_scanner_runners.py

Lines changed: 171 additions & 0 deletions
@@ -333,6 +333,177 @@ def fake_run(cmd, **_kwargs):
 # ───────────────────────────────────────────────


+class TestScanImageRawOutputPersistence:
+    """``scan_image(raw_output_dir=...)`` copies raw scanner artifacts
+    into a caller-supplied directory so ``argus-results/<run>/raw/``
+    can preserve trivy/grype/syft per-scanner output for forensics
+    after the underlying tempdir is cleaned up."""
+
+    def _stub_runners(self, monkeypatch, write_files=("trivy", "grype")):
+        """Replace the live scanner runners with stubs that drop the
+        files we'd expect to see on a successful real run. Lets these
+        tests focus on the copy/persistence layer without touching
+        the actual binaries."""
+        from argus.container import scanner as scanner_mod
+
+        def fake_trivy(image_ref, tmp_path, local=False):
+            if "trivy" in write_files:
+                (tmp_path / "trivy-results.json").write_text('{"Results": []}')
+            return []
+
+        def fake_grype(image_ref, tmp_path, local=False):
+            if "grype" in write_files:
+                (tmp_path / "grype-results.json").write_text('{"matches": []}')
+            return []
+
+        def fake_syft(image_ref, tmp_path):
+            if "syft" in write_files:
+                (tmp_path / "syft-sbom.json").write_text('{"artifacts": []}')
+
+        monkeypatch.setattr(scanner_mod, "_run_trivy", fake_trivy)
+        monkeypatch.setattr(scanner_mod, "_run_grype", fake_grype)
+        monkeypatch.setattr(scanner_mod, "_run_syft", fake_syft)
+
+    def test_raw_outputs_copied_when_dir_supplied(self, tmp_path, monkeypatch):
+        from argus.container.scanner import scan_image
+        from argus.container.discovery import ContainerTarget
+
+        self._stub_runners(monkeypatch, write_files=("trivy", "grype", "syft"))
+
+        target = ContainerTarget(name="app", image_ref="myapp:dev")
+        raw_dir = tmp_path / "raw" / "app"
+
+        scan_image(target, sbom=True, raw_output_dir=raw_dir)
+
+        # All three artifacts persisted at the expected names.
+        assert (raw_dir / "trivy-results.json").exists()
+        assert (raw_dir / "grype-results.json").exists()
+        assert (raw_dir / "syft-sbom.json").exists()
+        # Contents survived intact.
+        assert "Results" in (raw_dir / "trivy-results.json").read_text()
+
+    def test_no_copy_when_raw_output_dir_is_none(self, tmp_path, monkeypatch):
+        # Default path — historic behavior — leaves no artifacts on
+        # disk after the tempdir cleanup.
+        from argus.container.scanner import scan_image
+        from argus.container.discovery import ContainerTarget
+
+        self._stub_runners(monkeypatch)
+
+        target = ContainerTarget(name="app", image_ref="myapp:dev")
+        scan_image(target, sbom=False, raw_output_dir=None)
+
+        # No `raw/` directory was created (the test's tmp_path is
+        # otherwise empty).
+        assert not (tmp_path / "raw").exists()
+
+    def test_partial_outputs_persisted_when_some_scanners_skipped(
+        self, tmp_path, monkeypatch,
+    ):
+        # Only trivy ran (grype skipped or failed); the raw dir
+        # contains just trivy's file. Missing files don't block the
+        # copy of the ones that exist.
+        from argus.container.scanner import scan_image
+        from argus.container.discovery import ContainerTarget
+
+        self._stub_runners(monkeypatch, write_files=("trivy",))
+
+        target = ContainerTarget(name="app", image_ref="myapp:dev")
+        raw_dir = tmp_path / "raw" / "app"
+
+        scan_image(
+            target, scanners=("trivy",), sbom=False,
+            raw_output_dir=raw_dir,
+        )
+
+        assert (raw_dir / "trivy-results.json").exists()
+        assert not (raw_dir / "grype-results.json").exists()
+        assert not (raw_dir / "syft-sbom.json").exists()
+
+    def test_zero_byte_files_are_not_persisted(self, tmp_path, monkeypatch):
+        # 0-byte files are an explicit failure signal upstream
+        # (``_validate_scanner_output`` rejects them). Don't copy
+        # them — the persistence layer should never make a 0-byte
+        # file look authoritative on disk.
+        from argus.container import scanner as scanner_mod
+        from argus.container.scanner import scan_image
+        from argus.container.discovery import ContainerTarget
+
+        def fake_trivy(image_ref, tmp_path, local=False):
+            (tmp_path / "trivy-results.json").touch()  # 0-byte
+            return []
+
+        monkeypatch.setattr(scanner_mod, "_run_trivy", fake_trivy)
+        monkeypatch.setattr(scanner_mod, "_run_grype", lambda *a, **kw: [])
+        monkeypatch.setattr(scanner_mod, "_run_syft", lambda *a, **kw: None)
+
+        raw_dir = tmp_path / "raw" / "app"
+        scan_image(
+            ContainerTarget(name="app", image_ref="myapp:dev"),
+            scanners=("trivy",), sbom=False, raw_output_dir=raw_dir,
+        )
+        # Either the dir doesn't exist (nothing copied) or it's empty.
+        if raw_dir.exists():
+            assert not list(raw_dir.iterdir())
+
+
+class TestContainerCanonicalScanSummary:
+    """The container scan flow now also emits the canonical
+    ScanSummary shape (the same one source scans use), so
+    ``argus view`` and the JSON reporter can render container
+    findings without a separate code path."""
+
+    def test_each_target_becomes_a_scanresult_with_combined_findings(
+        self, tmp_path, monkeypatch,
+    ):
+        # Exercises the cli.py snippet that maps ContainerScanResult
+        # → ScanResult(scanner=f"container/{name}", ...). Tests a
+        # representative subset of the conversion in isolation.
+        from argus.core.models import ScanResult, ScanSummary, Finding, Severity
+        from argus.container.scanner import (
+            ContainerScanResult, ContainerScanSummary,
+        )
+
+        f1 = Finding(id="CVE-2024-1", severity=Severity.HIGH, title="t1",
+                     cve="CVE-2024-1", scanner="trivy")
+        f2 = Finding(id="CVE-2024-2", severity=Severity.MEDIUM, title="t2",
+                     cve="CVE-2024-2", scanner="grype")
+
+        container_summary = ContainerScanSummary(
+            results=[
+                ContainerScanResult(
+                    name="webapp",
+                    image_ref="myorg/webapp:1.0",
+                    combined_findings=[f1, f2],
+                    scanner_errors={},
+                ),
+            ],
+        )
+
+        # Mirror cli.py's mapping logic.
+        canonical = ScanSummary(results=[
+            ScanResult(
+                scanner=f"container/{r.name}",
+                findings=list(r.combined_findings),
+                metadata={
+                    "image_ref": r.image_ref,
+                    "build_success": r.build_success,
+                },
+            )
+            for r in container_summary.results
+        ])
+
+        # The canonical summary round-trips through the same
+        # serialization the source-scan flow uses, so ``argus view``
+        # treats container findings identically.
+        as_dict = canonical.to_dict()
+        assert "results" in as_dict
+        assert as_dict["results"][0]["scanner"] == "container/webapp"
+        assert len(as_dict["results"][0]["findings"]) == 2
+        # Per-image metadata lifts onto the ScanResult.
+        assert as_dict["results"][0]["metadata"]["image_ref"] == "myorg/webapp:1.0"
+
+
 class TestOrchestratorRecordsScannerError:
     """Closing the loop: when ``_run_grype`` raises RuntimeError, the
     orchestrator must catch it and record under ``scanner_errors`` so