
Commit 8d184d6

feat(scan): emit canonical argus-results.json + persist raw scanner output (source + container) (#116)
* feat(container): emit canonical argus-results.json + persist raw scanner output

Two related fixes for the container-scan flow, addressing user reports that:

1. ``argus view`` doesn't display container vulnerabilities. The container-scan flow only wrote a domain-shaped ``container-scan.json`` (per-image counts, ``container_count``, etc.), which the viewers don't know how to render. The viewers consume the canonical ``argus-results.json`` shape produced by source scans.
2. The ``argus-results/`` dir doesn't preserve the raw per-scanner output files (``trivy-results.json``, ``grype-results.json``, ``syft-sbom.json``). They live in a tempdir that gets wiped at the end of ``scan_image``, so users who want forensics, audit trails, or manual triage have nowhere to look.

Both are rooted in the same architectural drift: the container flow diverged from the source-scan output contract. This PR re-aligns it.

Canonical ScanSummary for container scans

- ``_cmd_container_scan`` now builds a canonical ``ScanSummary`` alongside the existing ``ContainerScanSummary``: each container target maps to ``ScanResult(scanner=f"container/<name>", findings=combined, metadata={image_ref, build_success, scanner_errors, scan_error})``.
- The JSON reporter writes that to ``argus-results.json`` unconditionally (matches the source-scan canonical-artifact contract from PR #111).
- The SARIF reporter now consumes the same canonical summary instead of building a one-off conversion locally.
- The domain-shaped ``container-scan.json`` (with ``container_count``, per-image stats) is preserved for backward compatibility with downstream tooling that consumes it; it lives alongside the canonical artifact rather than instead of it.
- ``argus view`` opens container scan results without any new code on the viewer side: it sees ``ScanResult`` rows named ``container/<image>`` and renders them like any other scanner.

Raw scanner output persistence

- ``scan_image`` gains a ``raw_output_dir: Path | None`` parameter. When set, it copies ``trivy-results.json``, ``grype-results.json``, and ``syft-sbom.json`` into that directory before the tempdir is cleaned up. The copy is best-effort: errors log a warning but don't fail the scan.
- ``ContainerEngine`` reads ``_raw_output_root`` from its config dict (the dispatcher sets this) and threads a per-target subdir to ``scan_image`` as ``<root>/<target.name>/``.
- ``_cmd_container_scan`` defaults to ON: raw outputs land at ``<output_dir>/raw/<image>/``. Opt out via the ``--no-keep-raw`` flag or ``containers.keep_raw: false`` in argus.yml. The CLI flag wins on conflict (explicit > implicit).
- 0-byte files are explicitly skipped (they're failure signals upstream; persisting them would make a known-bad output look authoritative on disk).

Documentation

- ``argus.example.yml`` documents ``containers.keep_raw: true`` in the commented schema block, alongside the existing ``images``, ``discover``, and ``scanners`` keys.

Tests (+5)

- ``TestScanImageRawOutputPersistence`` (4 cases): all artifacts copied when a dir is supplied, no copy when ``raw_output_dir=None``, partial coverage (only trivy ran) doesn't block the others, 0-byte files are explicitly skipped.
- ``TestContainerCanonicalScanSummary`` (1 case): each ContainerScanResult maps to a canonical ScanResult(scanner="container/<name>") with combined findings; metadata lifts onto the ScanResult; the result round-trips through ``ScanSummary.to_dict()`` unchanged so the viewer gets the same shape it expects.

Validation

- Full SDK suite: 1464 passed (+5 net), 8 skipped.

* feat(scan): persist raw per-scanner output for source scans

Extend the raw-output preservation already in place for container scans to cover source scans. ArgusEngine.run() now accepts raw_output_dir and copies each scanner's results.json / *.sarif / stdout.txt under <output_dir>/raw/<scanner>/ alongside the canonical argus-results.json. This is the same posture as the container flow, so forensics and manual triage have the same surface area regardless of which scan path produced the findings.

The CLI gains a unified --no-keep-raw flag (moved out of the container-only group), and reporting.keep_raw replaces the container-scoped containers.keep_raw key. The CLI flag wins on conflict; the default remains keep-raw=true.

---------

Co-authored-by: eFAILution <eFAILution@users.noreply.github.com>
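The opt-out precedence described in the commit message (default on, config can disable, CLI flag wins) can be sketched as a small pure function. This is an illustrative sketch only: ``resolve_keep_raw`` and its parameter names are hypothetical and do not appear in the commit, and treating ``None`` as "key absent" is a simplifying assumption.

```python
from typing import Optional


def resolve_keep_raw(
    reporting_keep_raw: Optional[bool] = None,  # reporting.keep_raw; None if the key is absent
    legacy_keep_raw: Optional[bool] = None,     # containers.keep_raw; None if the key is absent
    no_keep_raw_flag: bool = False,             # True when --no-keep-raw was passed
) -> bool:
    """Hypothetical helper: decide whether raw scanner output is persisted."""
    # The unified reporting key takes priority over the legacy container key;
    # with neither present, the default is to keep raw output.
    if reporting_keep_raw is not None:
        configured = reporting_keep_raw
    elif legacy_keep_raw is not None:
        configured = legacy_keep_raw
    else:
        configured = True
    # The CLI flag can only switch persistence off, never back on.
    return bool(configured) and not no_keep_raw_flag
```

One consequence of this shape is that "CLI wins" applies in one direction only: the flag can disable persistence that the config enables, but there is no flag to re-enable it when the config says false.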
1 parent d7c94dc commit 8d184d6

9 files changed

Lines changed: 495 additions & 10 deletions


argus.example.yml

Lines changed: 14 additions & 0 deletions
@@ -55,6 +55,15 @@ reporting:
   severity_threshold: high
   output_dir: "./argus-results"

+  # Persist each scanner's raw output (results.json, *.sarif,
+  # stdout.txt) under ``<output_dir>/raw/<scanner>/`` alongside the
+  # canonical argus-results.json. Default true — useful for
+  # forensics, audit trails, and manual triage. Set false (or pass
+  # --no-keep-raw on the CLI) to skip; saves a few MB per scan in
+  # tight CI environments. Applies to both source scans (``argus
+  # scan``) and container scans (``argus scan container``).
+  keep_raw: true
+
 # Container lifecycle targets (consumed by ``argus scan container``).
 # Defining anything under this top-level ``containers:`` key activates
 # config-driven container scans — no need to pass --image / --discover
@@ -101,6 +110,11 @@ reporting:
 #
 # # Override which sub-scanners run; default is trivy + grype.
 # scanners: [trivy, grype, syft]
+#
+# Note: raw scanner-output preservation is configured via the
+# unified ``reporting.keep_raw`` knob above — the same flag covers
+# trivy/grype/syft container outputs and source-scan scanner
+# outputs. No separate container-only setting is needed.

 # Execution backend configuration
 execution:

argus/cli.py

Lines changed: 96 additions & 7 deletions
@@ -590,6 +590,19 @@ def _build_scan_parser(subparsers: argparse._SubParsersAction) -> None:
         help="Disable DB cache volume mounts. Forces scanners to re-download "
              "vulnerability databases on every container run.",
     )
+    scan_parser.add_argument(
+        "--no-keep-raw",
+        action="store_true",
+        dest="no_keep_raw",
+        help="Do not persist raw per-scanner output files alongside the "
+             "canonical argus-results.json. Source scans normally drop "
+             "each scanner's results.json / *.sarif / stdout.txt under "
+             "<output_dir>/raw/<scanner>/; container scans drop "
+             "trivy-results.json / grype-results.json / syft-sbom.json "
+             "under <output_dir>/raw/<image>/. Pass --no-keep-raw to "
+             "skip that step in tight CI environments. The same effect "
+             "is available via 'reporting.keep_raw: false' in argus.yml.",
+    )

     # Container-specific flags (used with: argus scan container)
     container_group = scan_parser.add_argument_group(
@@ -1135,6 +1148,15 @@ def _load_container_config(args: argparse.Namespace) -> dict:
                 )
             config = dict(containers_section)

+            # Pull ``reporting.keep_raw`` from the same file so the
+            # container handler honors the unified config knob — same
+            # default-True semantics as ``_cmd_source_scan``. Stashed
+            # under a synthetic underscore key so it doesn't collide
+            # with any future ``containers:`` field a user might add.
+            reporting_section = file_config.get("reporting", {})
+            if isinstance(reporting_section, dict) and "keep_raw" in reporting_section:
+                config["_reporting_keep_raw"] = bool(reporting_section["keep_raw"])
+
     # CLI overrides — explicit > implicit. --image and --discover both
     # OVERWRITE the corresponding config keys so the user's intent is
     # unambiguous (and so we don't accidentally double-scan an image
@@ -1248,6 +1270,18 @@ def _cmd_source_scan(args: argparse.Namespace) -> int:

     log.info("Argus scan starting")

+    # Decide whether to persist raw per-scanner outputs alongside the
+    # canonical argus-results.json. Default ON — users running
+    # ``argus scan`` reasonably expect each scanner's raw results
+    # (results.json / *.sarif / stdout.txt) to be available for
+    # forensics or manual triage. Opt out via ``--no-keep-raw``
+    # (CLI) or ``reporting.keep_raw: false`` (argus.yml). CLI flag
+    # wins on conflict, matching the dispatcher's
+    # explicit-over-implicit posture used throughout.
+    keep_raw_config = getattr(config.reporting, "keep_raw", True)
+    keep_raw = bool(keep_raw_config) and not getattr(args, "no_keep_raw", False)
+    raw_output_root = str(Path(output_dir) / "raw") if keep_raw else None
+
     # Build engine and register scanners
     engine = ArgusEngine(config)

@@ -1352,6 +1386,7 @@ def _cmd_source_scan(args: argparse.Namespace) -> int:
                 use_default_excludes=not getattr(args, "no_default_excludes", False),
                 sbom_path=str(info.path),
                 sbom_format=info.format,
+                raw_output_dir=raw_output_root,
             )
         except Exception as exc:
             log.error(
@@ -1392,6 +1427,7 @@ def _cmd_source_scan(args: argparse.Namespace) -> int:
             allow_local_versions=getattr(args, "allow_local_versions", False),
             no_cache=getattr(args, "no_cache", False),
             use_default_excludes=not getattr(args, "no_default_excludes", False),
+            raw_output_dir=raw_output_root,
         )
     if args.verbose and getattr(engine, "_last_resolutions", None):
         from argus.core.tool_config import format_resolutions_for_display
@@ -1733,6 +1769,24 @@ def _cmd_container_scan(
     output_dir = _make_run_dir(base_dir)
     formats = args.formats or ["terminal", "markdown"]

+    # Decide whether to persist raw per-scanner outputs alongside the
+    # canonical argus-results.json. Default is ON — the user just ran
+    # a scan and would expect those artifacts to be available for
+    # manual triage. Opt out via ``--no-keep-raw`` (CLI) or
+    # ``containers.keep_raw: false`` (argus.yml). CLI flag wins on
+    # conflict, matching the rest of the dispatcher's
+    # explicit-over-implicit posture.
+    # ``reporting.keep_raw`` is the unified config home for raw-output
+    # preservation; the legacy ``containers.keep_raw`` is still read
+    # as a fallback so configs from earlier in this PR's lifecycle
+    # don't break. CLI ``--no-keep-raw`` wins over both.
+    keep_raw_config = config.get(
+        "_reporting_keep_raw", config.get("keep_raw", True),
+    )
+    keep_raw = bool(keep_raw_config) and not getattr(args, "no_keep_raw", False)
+    if keep_raw:
+        config["_raw_output_root"] = str(Path(output_dir) / "raw")
+
     # Run
     try:
         engine = ContainerEngine(config)
@@ -1745,6 +1799,42 @@ def _cmd_container_scan(
         print(f"Error: container scan failed: {exc}", file=sys.stderr)
         return EXIT_ERROR

+    # Build a canonical ScanSummary view of the container results so
+    # the standard reporters (json → argus-results.json, sarif) and
+    # ``argus view`` can consume container scans the same way they
+    # consume source scans. Each container target becomes a
+    # ScanResult; the per-image domain metadata (image_ref, build
+    # status, scanner_errors) lifts onto ScanResult.metadata so the
+    # browser dashboard and exporters surface it.
+    from argus.core.models import ScanResult, ScanSummary
+    canonical_results = [
+        ScanResult(
+            scanner=f"container/{r.name}",
+            findings=list(r.combined_findings),
+            metadata={
+                "image_ref": r.image_ref,
+                "build_success": r.build_success,
+                **(
+                    {"scanner_errors": dict(r.scanner_errors)}
+                    if r.scanner_errors else {}
+                ),
+                **(
+                    {"scan_error": r.scan_error}
+                    if getattr(r, "scan_error", None) else {}
+                ),
+            },
+        )
+        for r in summary.results
+    ]
+    canonical_summary = ScanSummary(results=canonical_results)
+
+    # Always emit argus-results.json — same canonical-artifact
+    # contract the source-scan flow established. ``argus view`` and
+    # the audit manifest both consume this regardless of what the
+    # user listed in ``formats``.
+    from argus.reporters import get_reporter
+    get_reporter("json").report(canonical_summary, output_dir)
+
     # Reports
     for fmt in formats:
         if fmt == "markdown":
@@ -1755,16 +1845,15 @@ def _cmd_container_scan(
         elif fmt == "terminal":
             _print_container_terminal(summary)
         elif fmt == "json":
+            # Domain-shaped per-image summary (container_count etc.)
+            # lives at container-scan.json. The canonical
+            # argus-results.json was already written above; this
+            # is the supplementary domain artifact for tooling that
+            # wants per-image stats without parsing findings.
             _write_container_json(summary, output_dir)
         elif fmt == "sarif":
-            from argus.core.models import ScanResult, ScanSummary
-            from argus.reporters import get_reporter
-            results = [
-                ScanResult(scanner=f"container/{r.name}", findings=r.combined_findings)
-                for r in summary.results
-            ]
             sarif_reporter = get_reporter("sarif")
-            sarif_reporter.report(ScanSummary(results=results), output_dir)
+            sarif_reporter.report(canonical_summary, output_dir)

     # Exit code — scanner failures are always non-zero
     scan_failures = getattr(summary, "scan_failures", 0)
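The container-to-canonical mapping in ``_cmd_container_scan`` can be tried in isolation with stand-in types. The sketch below is an assumption-laden illustration: ``StubContainerResult``, ``StubScanResult``, and ``to_canonical`` are hypothetical names, and the real ``ScanResult`` and ``ContainerScanResult`` classes in Argus may carry more fields than shown here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class StubContainerResult:
    """Stand-in for a per-image container result (hypothetical)."""
    name: str
    image_ref: str
    build_success: bool
    combined_findings: List[Any] = field(default_factory=list)
    scanner_errors: Dict[str, str] = field(default_factory=dict)
    scan_error: Optional[str] = None


@dataclass
class StubScanResult:
    """Stand-in for the canonical per-scanner result (hypothetical)."""
    scanner: str
    findings: List[Any]
    metadata: Dict[str, Any]


def to_canonical(r: StubContainerResult) -> StubScanResult:
    # Always-present metadata first; error keys are added only when
    # they carry information, as in the diff's conditional ** spreads.
    metadata: Dict[str, Any] = {
        "image_ref": r.image_ref,
        "build_success": r.build_success,
    }
    if r.scanner_errors:
        metadata["scanner_errors"] = dict(r.scanner_errors)
    if getattr(r, "scan_error", None):
        metadata["scan_error"] = r.scan_error
    return StubScanResult(
        scanner=f"container/{r.name}",
        findings=list(r.combined_findings),  # copy, so later mutation does not leak
        metadata=metadata,
    )
```

Omitting empty error keys keeps the serialized metadata small for healthy images, at the cost that consumers must treat ``scanner_errors`` and ``scan_error`` as optional.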

argus/container/engine.py

Lines changed: 11 additions & 0 deletions
@@ -6,6 +6,7 @@
 """

 import logging
+from pathlib import Path

 from .builder import build_image
 from .discovery import (
@@ -130,10 +131,20 @@ def _process_target(self, target: ContainerTarget) -> ContainerScanResult:
             self._built_images.append(target.image_ref)

         try:
+            # If the dispatcher set ``_raw_output_root`` in the
+            # config dict, persist this target's raw scanner outputs
+            # under ``<root>/<target.name>/``. Caller controls
+            # whether this is set (CLI flag + config opt-out); the
+            # engine just threads it through.
+            raw_root = self.config.get("_raw_output_root")
+            target_raw_dir = (
+                Path(raw_root) / target.name if raw_root else None
+            )
             return scan_image(
                 target,
                 scanners=self._scanners(),
                 sbom=self._sbom_enabled(),
+                raw_output_dir=target_raw_dir,
             )
         except OSError as exc:
             # Disk full, permission denied, etc.

argus/container/scanner.py

Lines changed: 36 additions & 0 deletions
@@ -123,6 +123,7 @@ def scan_image(
     target: ContainerTarget,
     scanners: tuple[str, ...] = ("trivy", "grype"),
     sbom: bool = True,
+    raw_output_dir: Path | None = None,
 ) -> ContainerScanResult:
     """Scan a single container image with trivy and/or grype.

@@ -132,7 +133,19 @@ def scan_image(

     For locally-built images, scanners reference the local Docker daemon.
     Per-scanner errors are caught and recorded, not swallowed.
+
+    ``raw_output_dir``: when supplied, the raw scanner output files
+    (``trivy-results.json``, ``grype-results.json``, ``syft-sbom.json``)
+    are copied into this directory before the temp dir is cleaned up.
+    Lets users preserve full per-scanner artifacts for forensics,
+    audit, or manual triage workflows alongside the canonical
+    ``argus-results.json``. ``None`` (the default) means transient
+    output — historic behavior.
     """
+    import shutil as _shutil  # local import to avoid shadowing the
+                              # module-level ``shutil`` reference used
+                              # by ``shutil.which`` checks below.
+
     trivy_findings: list[Finding] = []
     grype_findings: list[Finding] = []
     scanner_errors: dict[str, str] = {}
@@ -164,6 +177,29 @@ def scan_image(
         if sbom and "syft" not in scanners:
             _run_syft(target.image_ref, tmp_path)

+        # Persist raw scanner artifacts (best-effort) before the
+        # tempdir is wiped. We copy whatever files exist; missing
+        # files (e.g. grype failed before writing) just don't get
+        # copied — the structured ``scanner_errors`` already records
+        # why. Errors during copy are non-fatal: the scan succeeded,
+        # the canonical JSON is still emitted upstream.
+        if raw_output_dir is not None:
+            try:
+                raw_output_dir.mkdir(parents=True, exist_ok=True)
+                for fname in (
+                    "trivy-results.json",
+                    "grype-results.json",
+                    "syft-sbom.json",
+                ):
+                    src = tmp_path / fname
+                    if src.exists() and src.stat().st_size > 0:
+                        _shutil.copy2(src, raw_output_dir / fname)
+            except OSError as exc:
+                logger.warning(
+                    "Failed to persist raw scanner outputs to %s: %s",
+                    raw_output_dir, exc,
+                )
+
     combined = deduplicate_findings(trivy_findings, grype_findings)

     return ContainerScanResult(
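The best-effort copy step added to ``scan_image`` can be exercised on its own. The following is a minimal sketch under stated assumptions: ``persist_raw_outputs`` is a hypothetical standalone helper (the commit inlines this logic), and returning the list of copied file names is an addition made here purely so the behavior is easy to check.

```python
import logging
import shutil
from pathlib import Path
from typing import List, Optional

logger = logging.getLogger(__name__)

# File names taken from the diff above.
RAW_FILES = ("trivy-results.json", "grype-results.json", "syft-sbom.json")


def persist_raw_outputs(tmp_path: Path, raw_output_dir: Optional[Path]) -> List[str]:
    """Copy non-empty raw scanner files out of tmp_path; never raise on I/O errors."""
    copied: List[str] = []
    if raw_output_dir is None:
        return copied  # transient output: nothing is persisted
    try:
        raw_output_dir.mkdir(parents=True, exist_ok=True)
        for fname in RAW_FILES:
            src = tmp_path / fname
            # Missing files and 0-byte files are both skipped: an empty file
            # is treated as a failure signal, not as a valid result.
            if src.exists() and src.stat().st_size > 0:
                shutil.copy2(src, raw_output_dir / fname)
                copied.append(fname)
    except OSError as exc:
        # Best-effort: a failed copy is logged but does not fail the scan.
        logger.warning(
            "Failed to persist raw scanner outputs to %s: %s", raw_output_dir, exc
        )
    return copied
```

Because the ``try`` wraps the whole loop, the first ``OSError`` stops any remaining copies; wrapping each file individually would be an alternative if partial persistence after an error mattered.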

argus/core/config.py

Lines changed: 9 additions & 0 deletions
@@ -44,6 +44,14 @@ class ReportingConfig:
     formats: list[str] = field(default_factory=lambda: ["terminal"])
     severity_threshold: Optional[Severity] = None
     output_dir: str = "./argus-results"
+    # When True, the engine persists each scanner's raw output files
+    # (results.json / *.sarif / stdout.txt) under
+    # ``<output_dir>/raw/<scanner>/`` alongside the canonical
+    # ``argus-results.json``. Default ON since most users running
+    # ``argus scan`` would expect the artifacts to be available for
+    # forensics or manual triage; opt out via ``--no-keep-raw`` (CLI)
+    # or ``reporting.keep_raw: false`` for tight CI environments.
+    keep_raw: bool = True


 @dataclass
@@ -208,6 +216,7 @@ def _parse_reporting_config(raw: dict | None) -> ReportingConfig:
         formats=raw.get("formats", ["terminal"]),
         severity_threshold=_parse_severity(raw.get("severity_threshold")),
         output_dir=raw.get("output_dir", "./argus-results"),
+        keep_raw=bool(raw.get("keep_raw", True)),
     )


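One property of the ``bool(raw.get("keep_raw", True))`` coercion is worth seeing concretely: an unquoted YAML ``false`` arrives as a Python ``False``, but a quoted ``"false"`` arrives as a non-empty string, which ``bool()`` treats as true. The sketch below illustrates that; ``parse_keep_raw`` mirrors the coercion in the diff (the ``raw or {}`` guard is an assumption about how ``None`` is handled), and ``parse_keep_raw_strict`` is a hypothetical alternative that is not part of the commit.

```python
from typing import Any, Optional


def parse_keep_raw(raw: Optional[dict]) -> bool:
    """Same coercion as the diff: plain bool() over the configured value."""
    return bool((raw or {}).get("keep_raw", True))


def parse_keep_raw_strict(raw: Optional[dict]) -> bool:
    """Hypothetical stricter variant that understands common string spellings."""
    value: Any = (raw or {}).get("keep_raw", True)
    if isinstance(value, str):
        return value.strip().lower() not in {"false", "no", "off", "0", ""}
    return bool(value)


# A quoted "false" in YAML is a string, so plain bool() keeps raw output on.
quoted = {"keep_raw": "false"}
print(parse_keep_raw(quoted), parse_keep_raw_strict(quoted))
```

Whether the stricter parsing is desirable is a design choice; rejecting non-boolean values with a config error would be another reasonable option.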
argus/core/engine.py

Lines changed: 33 additions & 0 deletions
@@ -35,6 +35,7 @@ def __init__(self, config: ArgusConfig):
         self._no_cache: bool = False
         self._sbom_path: str | None = None
         self._sbom_format: str | None = None
+        self._raw_output_root: str | None = None

     def register_scanner(self, scanner: Scanner) -> None:
         """Register a scanner instance for use by the engine."""
@@ -59,6 +60,7 @@ def run(
         use_default_excludes: bool = True,
         sbom_path: str | None = None,
         sbom_format: str | None = None,
+        raw_output_dir: str | None = None,
     ) -> ScanSummary:
         """Run scanners and return an aggregated ScanSummary.

@@ -79,6 +81,14 @@ def run(
                 attribute is True, auto-enables them regardless of
                 argus.yml, and threads the SBOM path through
                 ``config_dict['sbom_path']``.
+            raw_output_dir: when set, ``_run_in_container`` copies each
+                scanner's raw output files (``results.json``,
+                ``stdout.txt``, ``*.sarif``) into
+                ``<raw_output_dir>/<scanner_name>/`` before the
+                per-scanner tempdir is cleaned up. Mirrors the
+                container-scan flow's ``raw/`` artifact preservation
+                so users can drill into individual scanner output
+                regardless of which scan flow produced it.
         """
         from .exclusions import build_exclusion_set, log_exclusion_set

@@ -87,6 +97,7 @@ def run(
         self._use_default_excludes = use_default_excludes
         self._sbom_path = sbom_path
         self._sbom_format = sbom_format
+        self._raw_output_root = raw_output_dir

         # Validate sbom_format if provided
         if sbom_format is not None and sbom_format not in SBOM_FORMAT_EXTENSIONS:
@@ -710,6 +721,28 @@ def _run_in_container(
                 result_files = [stdout_file]
                 logger.debug("No output files — captured stdout (%d bytes)", len(proc.stdout))

+            # Persist raw scanner output (best-effort) before the
+            # tempdir is wiped. Mirrors the container-scan flow's
+            # ``raw/`` artifact preservation: every scanner gets its
+            # own subdir under ``<raw_output_root>/<scanner.name>/``
+            # so ``argus-results.json`` (the canonical artifact)
+            # lives next to the per-scanner files (results.json,
+            # *.sarif, stdout.txt) for forensics or manual triage.
+            # Errors during copy are non-fatal — the scan succeeded,
+            # the canonical JSON is still emitted upstream.
+            if self._raw_output_root and result_files:
+                try:
+                    target_dir = Path(self._raw_output_root) / scanner.name
+                    target_dir.mkdir(parents=True, exist_ok=True)
+                    for src in result_files:
+                        if src.exists() and src.stat().st_size > 0:
+                            shutil.copy2(src, target_dir / src.name)
+                except OSError as exc:
+                    logger.warning(
+                        "Failed to persist raw output for '%s' under %s: %s",
+                        scanner.name, self._raw_output_root, exc,
+                    )
+
             if result_files:
                 logger.debug(
                     "Output files: %s",
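With both flows writing under ``<output_dir>/raw/<name>/``, a triage script can enumerate what was preserved for a run. The helper below is a hypothetical sketch, not part of Argus: ``list_raw_artifacts`` is an invented name, and it assumes only the directory layout described in the diff (one subdirectory per scanner or image, containing plain files).

```python
from pathlib import Path
from typing import Dict, List


def list_raw_artifacts(output_dir: Path) -> Dict[str, List[str]]:
    """Map each raw/<name>/ subdirectory to the sorted file names inside it."""
    raw_root = Path(output_dir) / "raw"
    if not raw_root.is_dir():
        # Raw persistence was disabled for this run, or nothing was written.
        return {}
    layout: Dict[str, List[str]] = {}
    for sub in sorted(p for p in raw_root.iterdir() if p.is_dir()):
        layout[sub.name] = sorted(f.name for f in sub.iterdir() if f.is_file())
    return layout
```

An empty result does not distinguish "opted out via ``--no-keep-raw``" from "every scanner produced empty output"; a caller that needs to tell these apart would have to consult the run's configuration as well.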
