Commit 538d94a

feat(scan): persist raw per-scanner output for source scans
Extend the raw-output preservation already in place for container scans to cover source scans. ArgusEngine.run() now accepts raw_output_dir and copies each scanner's results.json / *.sarif / stdout.txt under <output_dir>/raw/<scanner>/ alongside the canonical argus-results.json. This is the same posture as the container flow, so forensics and manual triage have the same surface area regardless of which scan path produced the findings. The CLI gains a unified --no-keep-raw flag (moved out of the container-only group), and reporting.keep_raw replaces the container-scoped containers.keep_raw key, which is still read as a fallback when reporting.keep_raw is not set. The CLI flag wins on conflict; the default remains keep-raw=true.
1 parent dbdd858 commit 538d94a

6 files changed

Lines changed: 222 additions & 24 deletions

argus.example.yml

Lines changed: 13 additions & 8 deletions
@@ -55,6 +55,15 @@ reporting:
   severity_threshold: high
   output_dir: "./argus-results"

+  # Persist each scanner's raw output (results.json, *.sarif,
+  # stdout.txt) under ``<output_dir>/raw/<scanner>/`` alongside the
+  # canonical argus-results.json. Default true — useful for
+  # forensics, audit trails, and manual triage. Set false (or pass
+  # --no-keep-raw on the CLI) to skip; saves a few MB per scan in
+  # tight CI environments. Applies to both source scans (``argus
+  # scan``) and container scans (``argus scan container``).
+  keep_raw: true
+
 # Container lifecycle targets (consumed by ``argus scan container``).
 # Defining anything under this top-level ``containers:`` key activates
 # config-driven container scans — no need to pass --image / --discover
@@ -102,14 +111,10 @@ reporting:
 # # Override which sub-scanners run; default is trivy + grype.
 # scanners: [trivy, grype, syft]
 #
-# # Persist raw per-scanner artifacts (trivy-results.json,
-# # grype-results.json, syft-sbom.json) under
-# # ``<output_dir>/raw/<image>/`` alongside the canonical
-# # argus-results.json. Default is true so the artifacts are
-# # available for forensics, audit, or manual triage. Set false
-# # (or pass --no-keep-raw on the CLI) to skip — saves a few MB
-# # per image when running in tight CI environments.
-# keep_raw: true
+# Note: raw scanner-output preservation is configured via the
+# unified ``reporting.keep_raw`` knob above — the same flag covers
+# trivy/grype/syft container outputs and source-scan scanner
+# outputs. No separate container-only setting is needed.

 # Execution backend configuration
 execution:

argus/cli.py

Lines changed: 43 additions & 13 deletions
@@ -590,6 +590,19 @@ def _build_scan_parser(subparsers: argparse._SubParsersAction) -> None:
         help="Disable DB cache volume mounts. Forces scanners to re-download "
              "vulnerability databases on every container run.",
     )
+    scan_parser.add_argument(
+        "--no-keep-raw",
+        action="store_true",
+        dest="no_keep_raw",
+        help="Do not persist raw per-scanner output files alongside the "
+             "canonical argus-results.json. Source scans normally drop "
+             "each scanner's results.json / *.sarif / stdout.txt under "
+             "<output_dir>/raw/<scanner>/; container scans drop "
+             "trivy-results.json / grype-results.json / syft-sbom.json "
+             "under <output_dir>/raw/<image>/. Pass --no-keep-raw to "
+             "skip that step in tight CI environments. The same effect "
+             "is available via 'reporting.keep_raw: false' in argus.yml.",
+    )

     # Container-specific flags (used with: argus scan container)
     container_group = scan_parser.add_argument_group(
@@ -616,18 +629,6 @@ def _build_scan_parser(subparsers: argparse._SubParsersAction) -> None:
         default=None,
         help="Sub-scanners for container scanning: trivy,grype,syft (default: trivy,grype)",
     )
-    container_group.add_argument(
-        "--no-keep-raw",
-        action="store_true",
-        dest="no_keep_raw",
-        help="Do not persist raw per-scanner output (trivy-results.json, "
-             "grype-results.json, syft-sbom.json) under "
-             "<output_dir>/raw/<image>/. By default raw artifacts are "
-             "kept alongside the canonical argus-results.json so users "
-             "can drill into individual scanner output for forensics or "
-             "manual triage. Set ``containers.keep_raw: false`` in argus.yml "
-             "for the same effect via config.",
-    )

     # ZAP DAST flags (used with: argus scan zap)
     dast_group = scan_parser.add_argument_group(
@@ -1147,6 +1148,15 @@ def _load_container_config(args: argparse.Namespace) -> dict:
                 )
             config = dict(containers_section)

+            # Pull ``reporting.keep_raw`` from the same file so the
+            # container handler honors the unified config knob — same
+            # default-True semantics as ``_cmd_source_scan``. Stashed
+            # under a synthetic underscore key so it doesn't collide
+            # with any future ``containers:`` field a user might add.
+            reporting_section = file_config.get("reporting", {})
+            if isinstance(reporting_section, dict) and "keep_raw" in reporting_section:
+                config["_reporting_keep_raw"] = bool(reporting_section["keep_raw"])
+
     # CLI overrides — explicit > implicit. --image and --discover both
     # OVERWRITE the corresponding config keys so the user's intent is
     # unambiguous (and so we don't accidentally double-scan an image
@@ -1260,6 +1270,18 @@ def _cmd_source_scan(args: argparse.Namespace) -> int:

     log.info("Argus scan starting")

+    # Decide whether to persist raw per-scanner outputs alongside the
+    # canonical argus-results.json. Default ON — users running
+    # ``argus scan`` reasonably expect each scanner's raw results
+    # (results.json / *.sarif / stdout.txt) to be available for
+    # forensics or manual triage. Opt out via ``--no-keep-raw``
+    # (CLI) or ``reporting.keep_raw: false`` (argus.yml). CLI flag
+    # wins on conflict, matching the dispatcher's
+    # explicit-over-implicit posture used throughout.
+    keep_raw_config = getattr(config.reporting, "keep_raw", True)
+    keep_raw = bool(keep_raw_config) and not getattr(args, "no_keep_raw", False)
+    raw_output_root = str(Path(output_dir) / "raw") if keep_raw else None
+
     # Build engine and register scanners
     engine = ArgusEngine(config)

@@ -1364,6 +1386,7 @@ def _cmd_source_scan(args: argparse.Namespace) -> int:
                 use_default_excludes=not getattr(args, "no_default_excludes", False),
                 sbom_path=str(info.path),
                 sbom_format=info.format,
+                raw_output_dir=raw_output_root,
             )
         except Exception as exc:
             log.error(
@@ -1404,6 +1427,7 @@ def _cmd_source_scan(args: argparse.Namespace) -> int:
             allow_local_versions=getattr(args, "allow_local_versions", False),
             no_cache=getattr(args, "no_cache", False),
             use_default_excludes=not getattr(args, "no_default_excludes", False),
+            raw_output_dir=raw_output_root,
         )
     if args.verbose and getattr(engine, "_last_resolutions", None):
         from argus.core.tool_config import format_resolutions_for_display
@@ -1752,7 +1776,13 @@ def _cmd_container_scan(
     # ``containers.keep_raw: false`` (argus.yml). CLI flag wins on
     # conflict, matching the rest of the dispatcher's
     # explicit-over-implicit posture.
-    keep_raw_config = config.get("keep_raw", True)
+    # ``reporting.keep_raw`` is the unified config home for raw-output
+    # preservation; the legacy ``containers.keep_raw`` is still read
+    # as a fallback so configs from earlier in this PR's lifecycle
+    # don't break. CLI ``--no-keep-raw`` wins over both.
+    keep_raw_config = config.get(
+        "_reporting_keep_raw", config.get("keep_raw", True),
+    )
     keep_raw = bool(keep_raw_config) and not getattr(args, "no_keep_raw", False)
     if keep_raw:
         config["_raw_output_root"] = str(Path(output_dir) / "raw")
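The precedence described in the comments above (CLI flag over `reporting.keep_raw`, over the legacy `containers.keep_raw`, over the default of true) can be checked in isolation. The following is a minimal sketch, not code from the commit: `resolve_keep_raw` is a hypothetical helper name, and the dict keys simply mirror the ones the container handler reads.

```python
from __future__ import annotations


def resolve_keep_raw(config: dict, no_keep_raw_flag: bool) -> bool:
    """Hypothetical helper mirroring the lookup chain in the diff.

    ``_reporting_keep_raw`` (copied from ``reporting.keep_raw``) is
    preferred; the legacy ``keep_raw`` key under ``containers:`` is the
    fallback; True is the default. The CLI flag can only turn it off.
    """
    configured = config.get("_reporting_keep_raw", config.get("keep_raw", True))
    return bool(configured) and not no_keep_raw_flag


if __name__ == "__main__":
    cases = [
        ({}, False),                                                 # nothing set
        ({"keep_raw": False}, False),                                # legacy key only
        ({"_reporting_keep_raw": True, "keep_raw": False}, False),   # unified key wins
        ({"_reporting_keep_raw": True}, True),                       # CLI flag wins
    ]
    for cfg, flag in cases:
        print(cfg, "no_keep_raw=", flag, "->", resolve_keep_raw(cfg, flag))
```

Because the flag is opt-out only, there is no way for the CLI to re-enable raw output once the config has disabled it; that matches the diff, which defines `--no-keep-raw` but no positive counterpart.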

argus/core/config.py

Lines changed: 9 additions & 0 deletions
@@ -44,6 +44,14 @@ class ReportingConfig:
     formats: list[str] = field(default_factory=lambda: ["terminal"])
     severity_threshold: Optional[Severity] = None
     output_dir: str = "./argus-results"
+    # When True, the engine persists each scanner's raw output files
+    # (results.json / *.sarif / stdout.txt) under
+    # ``<output_dir>/raw/<scanner>/`` alongside the canonical
+    # ``argus-results.json``. Default ON since most users running
+    # ``argus scan`` would expect the artifacts to be available for
+    # forensics or manual triage; opt out via ``--no-keep-raw`` (CLI)
+    # or ``reporting.keep_raw: false`` for tight CI environments.
+    keep_raw: bool = True


 @dataclass
@@ -208,6 +216,7 @@ def _parse_reporting_config(raw: dict | None) -> ReportingConfig:
         formats=raw.get("formats", ["terminal"]),
         severity_threshold=_parse_severity(raw.get("severity_threshold")),
         output_dir=raw.get("output_dir", "./argus-results"),
+        keep_raw=bool(raw.get("keep_raw", True)),
     )

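One property of the `bool(raw.get("keep_raw", True))` coercion is worth illustrating: a YAML loader normally yields a real boolean for an unquoted `false`, but a quoted `"false"` arrives as a non-empty string, and `bool("false")` is `True` in Python. The sketch below shows that behaviour and one possible stricter alternative. `parse_keep_raw` is a hypothetical name and is not part of the commit; whether Argus needs this strictness is an assumption.

```python
from __future__ import annotations

_TRUE_WORDS = {"true", "yes", "on", "1"}
_FALSE_WORDS = {"false", "no", "off", "0"}


def parse_keep_raw(raw: dict | None) -> bool:
    """Hypothetical stricter parser for ``reporting.keep_raw``.

    Accepts real booleans and a small set of string spellings; anything
    else raises instead of being silently coerced to True.
    """
    value = (raw or {}).get("keep_raw", True)
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in _TRUE_WORDS:
            return True
        if lowered in _FALSE_WORDS:
            return False
    raise ValueError(f"reporting.keep_raw must be a boolean, got {value!r}")


if __name__ == "__main__":
    # Plain coercion treats any non-empty string as True.
    print(bool("false"))                           # True
    # The stricter parser maps the quoted spelling to False.
    print(parse_keep_raw({"keep_raw": "false"}))   # False
    print(parse_keep_raw(None))                    # True (default)
```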

argus/core/engine.py

Lines changed: 33 additions & 0 deletions
@@ -35,6 +35,7 @@ def __init__(self, config: ArgusConfig):
         self._no_cache: bool = False
         self._sbom_path: str | None = None
         self._sbom_format: str | None = None
+        self._raw_output_root: str | None = None

     def register_scanner(self, scanner: Scanner) -> None:
         """Register a scanner instance for use by the engine."""
@@ -59,6 +60,7 @@ def run(
         use_default_excludes: bool = True,
         sbom_path: str | None = None,
         sbom_format: str | None = None,
+        raw_output_dir: str | None = None,
     ) -> ScanSummary:
         """Run scanners and return an aggregated ScanSummary.

@@ -79,6 +81,14 @@ def run(
                 attribute is True, auto-enables them regardless of
                 argus.yml, and threads the SBOM path through
                 ``config_dict['sbom_path']``.
+            raw_output_dir: when set, ``_run_in_container`` copies each
+                scanner's raw output files (``results.json``,
+                ``stdout.txt``, ``*.sarif``) into
+                ``<raw_output_dir>/<scanner_name>/`` before the
+                per-scanner tempdir is cleaned up. Mirrors the
+                container-scan flow's ``raw/`` artifact preservation
+                so users can drill into individual scanner output
+                regardless of which scan flow produced it.
         """
         from .exclusions import build_exclusion_set, log_exclusion_set

@@ -87,6 +97,7 @@ def run(
         self._use_default_excludes = use_default_excludes
         self._sbom_path = sbom_path
         self._sbom_format = sbom_format
+        self._raw_output_root = raw_output_dir

         # Validate sbom_format if provided
         if sbom_format is not None and sbom_format not in SBOM_FORMAT_EXTENSIONS:
@@ -710,6 +721,28 @@ def _run_in_container(
                 result_files = [stdout_file]
                 logger.debug("No output files — captured stdout (%d bytes)", len(proc.stdout))

+            # Persist raw scanner output (best-effort) before the
+            # tempdir is wiped. Mirrors the container-scan flow's
+            # ``raw/`` artifact preservation: every scanner gets its
+            # own subdir under ``<raw_output_root>/<scanner.name>/``
+            # so ``argus-results.json`` (the canonical artifact)
+            # lives next to the per-scanner files (results.json,
+            # *.sarif, stdout.txt) for forensics or manual triage.
+            # Errors during copy are non-fatal — the scan succeeded,
+            # the canonical JSON is still emitted upstream.
+            if self._raw_output_root and result_files:
+                try:
+                    target_dir = Path(self._raw_output_root) / scanner.name
+                    target_dir.mkdir(parents=True, exist_ok=True)
+                    for src in result_files:
+                        if src.exists() and src.stat().st_size > 0:
+                            shutil.copy2(src, target_dir / src.name)
+                except OSError as exc:
+                    logger.warning(
+                        "Failed to persist raw output for '%s' under %s: %s",
+                        scanner.name, self._raw_output_root, exc,
+                    )
+
             if result_files:
                 logger.debug(
                     "Output files: %s",
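The best-effort copy step added to `_run_in_container` can be exercised outside the engine. The following is a standalone sketch under stated assumptions: `persist_raw_output` is a hypothetical free function (in the commit the logic lives inline in the method), and returning the list of copied paths is an addition made here only to keep the example easy to inspect.

```python
from __future__ import annotations

import logging
import shutil
import tempfile
from pathlib import Path

logger = logging.getLogger("raw_output_sketch")


def persist_raw_output(
    result_files: list[Path],
    raw_output_root: str | None,
    scanner_name: str,
) -> list[Path]:
    """Copy non-empty result files into <raw_output_root>/<scanner_name>/.

    Best-effort: an OSError is logged and swallowed, so a failed copy
    never fails the scan. Zero-byte files are skipped.
    """
    copied: list[Path] = []
    if not raw_output_root or not result_files:
        return copied
    try:
        target_dir = Path(raw_output_root) / scanner_name
        target_dir.mkdir(parents=True, exist_ok=True)
        for src in result_files:
            if src.exists() and src.stat().st_size > 0:
                dest = target_dir / src.name
                shutil.copy2(src, dest)  # copy2 also preserves timestamps
                copied.append(dest)
    except OSError as exc:
        logger.warning(
            "Failed to persist raw output for '%s' under %s: %s",
            scanner_name, raw_output_root, exc,
        )
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "work"
        work.mkdir()
        (work / "results.json").write_text('{"findings": []}')
        (work / "stdout.txt").touch()  # zero bytes: should be skipped
        kept = persist_raw_output(
            [work / "results.json", work / "stdout.txt"],
            str(Path(tmp) / "raw"),
            "demo-scanner",
        )
        print([p.name for p in kept])
```

As in the diff, the target directory is created before file sizes are checked, so a run that produces only zero-byte files can leave an empty per-scanner directory behind; the zero-byte test in `test_engine.py` allows for exactly that.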

argus/tests/test_engine.py

Lines changed: 121 additions & 0 deletions
@@ -805,6 +805,127 @@ def mock_run(cmd, **_kwargs):
         result = engine._run_in_container(scanner, "/src", {})
         assert "execution_failed" not in result.metadata

+    def test_raw_output_dir_persists_per_scanner_files(self, monkeypatch, tmp_path):
+        """When ``raw_output_dir`` is set, the engine copies each
+        scanner's raw output (results.json / *.sarif / stdout.txt)
+        into ``<raw_output_dir>/<scanner.name>/`` before the tempdir
+        is wiped. Mirrors the container-scan flow's ``raw/`` artifact
+        preservation so source scans aren't an inconsistent
+        second-class case."""
+        engine = self._make_engine()
+        scanner = self._make_scanner(parse_results=lambda f: [])
+        engine._raw_output_root = str(tmp_path / "raw")
+
+        monkeypatch.setattr(engine, "_pull_image", lambda img: True)
+        monkeypatch.setattr(engine, "_get_image_digest", lambda img: "sha256:abc")
+
+        def mock_run(cmd, **_kwargs):
+            for i, arg in enumerate(cmd):
+                if ":/output" in str(arg):
+                    Path(arg.split(":")[0]).joinpath("results.json").write_text(
+                        '{"findings": []}'
+                    )
+                    break
+            return subprocess.CompletedProcess(
+                args=cmd, returncode=0, stdout="", stderr="",
+            )
+
+        monkeypatch.setattr(subprocess, "run", mock_run)
+        engine._run_in_container(scanner, "/src", {})
+
+        # File landed at <raw_output_root>/<scanner_name>/results.json
+        # — the per-scanner subdir keeps multi-scanner runs from
+        # colliding on common filenames.
+        persisted = tmp_path / "raw" / scanner.name / "results.json"
+        assert persisted.exists()
+        # Contents survived intact.
+        assert "findings" in persisted.read_text()
+
+    def test_raw_output_dir_none_does_not_persist_files(
+        self, monkeypatch, tmp_path,
+    ):
+        """Default behavior — ``raw_output_dir`` unset — leaves no
+        per-scanner artifacts on disk after the run. Confirms the
+        copy step is opt-in, not always-on."""
+        engine = self._make_engine()
+        scanner = self._make_scanner(parse_results=lambda f: [])
+        # Explicitly None (the default).
+        engine._raw_output_root = None
+
+        monkeypatch.setattr(engine, "_pull_image", lambda img: True)
+        monkeypatch.setattr(engine, "_get_image_digest", lambda img: "sha256:abc")
+
+        def mock_run(cmd, **_kwargs):
+            for i, arg in enumerate(cmd):
+                if ":/output" in str(arg):
+                    Path(arg.split(":")[0]).joinpath("results.json").write_text("{}")
+                    break
+            return subprocess.CompletedProcess(
+                args=cmd, returncode=0, stdout="", stderr="",
+            )
+
+        monkeypatch.setattr(subprocess, "run", mock_run)
+        engine._run_in_container(scanner, "/src", {})
+
+        # tmp_path is otherwise untouched.
+        assert list(tmp_path.iterdir()) == []
+
+    def test_raw_output_persists_stdout_fallback(self, monkeypatch, tmp_path):
+        """Some scanners write to stdout instead of a file (e.g.
+        ClamAV). The engine captures that as ``stdout.txt`` in the
+        per-scanner tempdir; raw-output preservation should pick it
+        up the same way it picks up regular result files."""
+        engine = self._make_engine()
+        scanner = self._make_scanner(parse_results=lambda f: [])
+        engine._raw_output_root = str(tmp_path / "raw")
+
+        monkeypatch.setattr(engine, "_pull_image", lambda img: True)
+        monkeypatch.setattr(engine, "_get_image_digest", lambda img: "sha256:abc")
+
+        def mock_run(cmd, **_kwargs):
+            return subprocess.CompletedProcess(
+                args=cmd, returncode=0,
+                stdout="scanner output line 1\nscanner output line 2\n",
+                stderr="",
+            )
+
+        monkeypatch.setattr(subprocess, "run", mock_run)
+        engine._run_in_container(scanner, "/src", {})
+
+        persisted = tmp_path / "raw" / scanner.name / "stdout.txt"
+        assert persisted.exists()
+        assert "scanner output line" in persisted.read_text()
+
+    def test_raw_output_skips_zero_byte_files(self, monkeypatch, tmp_path):
+        """0-byte files are explicit failure signals upstream
+        (``_validate_scanner_output`` rejects them in the container
+        sub-scanners). Don't persist them — making known-broken
+        output look authoritative on disk would mislead anyone
+        triaging from the saved artifacts."""
+        engine = self._make_engine()
+        scanner = self._make_scanner(parse_results=lambda f: [])
+        engine._raw_output_root = str(tmp_path / "raw")
+
+        monkeypatch.setattr(engine, "_pull_image", lambda img: True)
+        monkeypatch.setattr(engine, "_get_image_digest", lambda img: "sha256:abc")
+
+        def mock_run(cmd, **_kwargs):
+            for i, arg in enumerate(cmd):
+                if ":/output" in str(arg):
+                    Path(arg.split(":")[0]).joinpath("results.json").touch()
+                    break
+            return subprocess.CompletedProcess(
+                args=cmd, returncode=0, stdout="", stderr="",
+            )
+
+        monkeypatch.setattr(subprocess, "run", mock_run)
+        engine._run_in_container(scanner, "/src", {})
+
+        target_dir = tmp_path / "raw" / scanner.name
+        # Either no dir created or empty — never a 0-byte stub.
+        if target_dir.exists():
+            assert not list(target_dir.iterdir())
+
     def test_container_custom_entrypoint(self, monkeypatch):
         engine = self._make_engine()
         scanner = self._make_scanner(container_entrypoint="/bin/custom")

docs/cli-reference.md

Lines changed: 3 additions & 3 deletions
@@ -64,8 +64,8 @@ argus scan [-h] [--path PATH] [--config CONFIG]
            [--interface {terminal,browser}] [--fail-fast]
            [--fail-on-scanner-error] [--timeout SECONDS]
            [--no-parallel] [--allow-local-versions] [--no-cache]
-           [--discover [PATH]] [--image REF] [--scanners SCANNERS]
-           [--no-keep-raw] [--target URL] [--port PORT]
+           [--no-keep-raw] [--discover [PATH]] [--image REF]
+           [--scanners SCANNERS] [--target URL] [--port PORT]
            [--env KEY=VALUE] [--scan-type {baseline,full}]
            [--startup-timeout STARTUP_TIMEOUT]
            [scanner]
@@ -100,6 +100,7 @@ argus scan [-h] [--path PATH] [--config CONFIG]
 | `--no-parallel` | Run scanners sequentially instead of concurrently. | `false` |
 | `--allow-local-versions` | Allow local tool versions that differ from argus-pinned versions. Use in airgapped environments where tool updates are constrained. | `false` |
 | `--no-cache` | Disable DB cache volume mounts. Forces scanners to re-download vulnerability databases on every container run. | `false` |
+| `--no-keep-raw` | Do not persist raw per-scanner output files alongside the canonical argus-results.json. Source scans normally drop each scanner's results.json / *.sarif / stdout.txt under <output_dir>/raw/<scanner>/; container scans drop trivy-results.json / grype-results.json / syft-sbom.json under <output_dir>/raw/<image>/. Pass --no-keep-raw to skip that step in tight CI environments. The same effect is available via 'reporting.keep_raw: false' in argus.yml. | `false` |

 **Container Scanning:**
@@ -108,7 +109,6 @@ argus scan [-h] [--path PATH] [--config CONFIG]
 | `--discover` | Discover Dockerfiles in PATH (default: current directory) | |
 | `--image` | Container image to scan (can be repeated) | |
 | `--scanners` | Sub-scanners for container scanning: trivy,grype,syft (default: trivy,grype) | |
-| `--no-keep-raw` | Do not persist raw per-scanner output (trivy-results.json, grype-results.json, syft-sbom.json) under <output_dir>/raw/<image>/. By default raw artifacts are kept alongside the canonical argus-results.json so users can drill into individual scanner output for forensics or manual triage. Set ``containers.keep_raw: false`` in argus.yml for the same effect via config. | `false` |

 **Dast Scanning:**
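For triage after a scan, the documented layout (`<output_dir>/raw/<scanner>/` for source scans, `<output_dir>/raw/<image>/` for container scans) can be walked with a few lines of standard-library code. This is a minimal sketch: `list_raw_artifacts` is a hypothetical helper, not an Argus API, and it assumes only the one-subdirectory-per-scanner layout described in the flag's help text.

```python
from __future__ import annotations

from pathlib import Path


def list_raw_artifacts(output_dir: str) -> dict[str, list[str]]:
    """Map each subdirectory of <output_dir>/raw/ to its sorted file names.

    Returns an empty dict when raw output was not kept (for example when
    the scan ran with --no-keep-raw).
    """
    raw_root = Path(output_dir) / "raw"
    if not raw_root.is_dir():
        return {}
    return {
        sub.name: sorted(p.name for p in sub.iterdir() if p.is_file())
        for sub in sorted(raw_root.iterdir())
        if sub.is_dir()
    }


if __name__ == "__main__":
    # "./argus-results" is the default reporting.output_dir in the diff.
    for scanner, files in list_raw_artifacts("./argus-results").items():
        print(f"{scanner}: {', '.join(files) or '(empty)'}")
```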
