fix(engine): make container /output dir writable for non-root scanner users (#110)

eFAILution · web-flow · commit 79def1acb1d2 · 2026-05-05T12:51:00.000-04:00
Container scanners running as a non-root USER (e.g. our custom
images, which all use ``USER argus``, uid 1000) couldn't write
``/output/results.json`` on hosts where the invoking user has a
different uid (uid 501 on macOS being the canonical case). Python's
``tempfile.TemporaryDirectory`` creates dirs with mode 0o700, so
the cross-uid write fails with EACCES and the scanner exits without
writing anything. Argus' "produced no output files" warning was the
only signal, and the scan still rolled up to ``Status: PASS``.
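
The default mode is easy to confirm on a POSIX host. The snippet below is
a minimal sketch using only the standard library, not Argus code; the
variable names are illustrative.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as output_dir:
    # Keep only the permission bits of the freshly created directory.
    mode = stat.S_IMODE(os.stat(output_dir).st_mode)
    # The directory is created for the owning user only, so the "group"
    # and "other" bits are clear and a different uid cannot write into it.
    group_or_other_bits = mode & 0o077
    other_can_write = bool(mode & stat.S_IWOTH)
    print(oct(mode), other_can_write)
```

A process inside the container that runs as a different uid falls under
the "other" bits, which is why the write to ``/output`` is refused.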

Symptom matrix observed by a user on argus 0.7.2.dev75:
  - bandit, opengrep, lint-dockerfile (custom images, uid 1000):
    "permission denied" on /output/results.json
  - gitleaks (official image, runs as root): worked fine
  - supply-chain (custom image, uid 1000): silent no-op

Fix
- ``ArgusEngine._run_in_container``: ``os.chmod(output_dir, 0o777)``
  immediately after ``TemporaryDirectory.__enter__``. Mode 0o777 is
  safe because the dir lives under ``tempfile.gettempdir()``, has a
  random name, holds only one scan's transient output (no secrets),
  and is removed at the end of the with-block.
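
The shape of the fix can be sketched in isolation. This is a hedged
illustration, assuming a POSIX host: ``run_with_output_dir`` and
``fake_scanner`` are hypothetical stand-ins for the container launch, not
the engine's actual code.

```python
import os
import stat
import tempfile
from pathlib import Path


def run_with_output_dir(run_scanner):
    """Hypothetical stand-in for the container-launch step."""
    with tempfile.TemporaryDirectory() as output_dir:
        # An explicit chmod is not filtered by the process umask, so the
        # directory really ends up rwx for owner, group, and other.
        os.chmod(output_dir, 0o777)
        mode = stat.S_IMODE(os.stat(output_dir).st_mode)
        run_scanner(output_dir)
        files = sorted(p.name for p in Path(output_dir).iterdir())
    # Leaving the with-block removes the directory and its contents.
    return output_dir, mode, files


def fake_scanner(output_dir):
    # Pretend to be a scanner writing its results file.
    Path(output_dir, "results.json").write_text("{}")


path, mode, files = run_with_output_dir(fake_scanner)
print(oct(mode), files, os.path.exists(path))
```

The chmod has to happen before the scanner starts, since the bind mount
exposes the host directory's permissions to the container as-is.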

Visibility for partial-execution failures
- When a container produces no output files and no stdout, mark the
  ScanResult with ``metadata["execution_failed"] = True`` plus an
  ``execution_failure_reason`` string carrying the exit code and a
  clipped stderr. This gives reporters and CI gates something
  structured to act on instead of grepping log lines.
- ``TerminalReporter._print_status`` surfaces a clear warning row
  above the PASS/FAIL line listing the failed scanners and pointing
  at ``--fail-on-scanner-error`` for hard CI gating. Threshold-
  driven PASS/FAIL is unchanged — the warning is a *separate* signal.
- New ``--fail-on-scanner-error`` flag on ``argus scan`` (off by
  default, so existing behavior is unchanged): when set and any
  scanner had ``execution_failed=True``, the run exits ``EXIT_ERROR``
  even if the threshold check passes. CI users who require every
  configured scanner to actually run can opt in.
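
The marking and gating described above can be sketched as two small
functions. This is an illustration only: the helper names are
hypothetical, and the numeric exit-code values are assumptions (the real
constants are defined in ``argus/cli.py`` and are not shown here).

```python
from typing import Dict, List

# Assumed values for illustration; not taken from the Argus source.
EXIT_SUCCESS, EXIT_FINDINGS, EXIT_ERROR = 0, 1, 2


def execution_failure_metadata(
    result_files: List[str], returncode: int, stderr: str
) -> Dict[str, object]:
    """Return the extra metadata for a run, empty if output was produced."""
    if result_files:
        return {}
    clipped = stderr.strip()[:400]  # keep the reason short
    if clipped:
        reason = f"no output files (exit={returncode}). stderr: {clipped}"
    else:
        reason = f"no output files and no stdout (exit={returncode})"
    return {"execution_failed": True, "execution_failure_reason": reason}


def resolve_exit_code(
    passed: bool,
    sbom_batch_failures: List[str],
    failed_scanners: List[str],
    fail_on_scanner_error: bool,
) -> int:
    """Apply the precedence: findings, SBOM errors, opt-in scanner gate."""
    if not passed:
        return EXIT_FINDINGS
    if sbom_batch_failures:
        return EXIT_ERROR
    if fail_on_scanner_error and failed_scanners:
        return EXIT_ERROR
    return EXIT_SUCCESS


meta = execution_failure_metadata(
    [], 13, "cannot open /output/results.json: permission denied"
)
failed = ["bandit"] if meta.get("execution_failed") else []
# Without the flag the run still succeeds; with it, the run errors out.
print(resolve_exit_code(True, [], failed, fail_on_scanner_error=False))
print(resolve_exit_code(True, [], failed, fail_on_scanner_error=True))
```

Keeping the gate last in the chain means a threshold failure still wins,
so the flag only changes the outcome of runs that would otherwise pass.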

Tests (+6)
- ``test_engine.TestRunInContainer.test_container_output_dir_is_world_writable``
  — regression: tempdir mode is 0o777 by the time docker run starts.
- ``test_container_no_output_marks_execution_failed`` — exit-13 +
  empty /output produces a ScanResult with execution_failed metadata
  and a stderr-bearing reason.
- ``test_container_with_output_does_not_mark_execution_failed`` —
  successful runs stay clean.
- ``test_terminal.test_report_warns_on_execution_failure_above_pass_status``
  — terminal reporter prints the warning row above PASS, names the
  failed scanners, mentions the --fail-on-scanner-error flag.
- ``test_report_no_warning_when_all_scanners_produced_output`` —
  successful runs don't get the warning row.
- ``test_cli.test_fail_on_scanner_error_flag`` — flag parses as
  False by default and True when supplied.

Validation
- Real-world smoke test: ``argus scan bandit`` against the argus
  repo on a host with uid 501. Before: 0 findings, "produced no
  output files" warning. After: 153 findings, ``Output files:
  ['results.json']``, no warnings.
- Full SDK suite: 1400 passed.

Co-authored-by: eFAILution <eFAILution@users.noreply.github.com>
diff --git a/argus/cli.py b/argus/cli.py
@@ -522,6 +522,15 @@ def _build_scan_parser(subparsers: argparse._SubParsersAction) -> None:
         action="store_true",
         help="Abort immediately if any scanner fails instead of continuing.",
     )
+    scan_parser.add_argument(
+        "--fail-on-scanner-error",
+        action="store_true",
+        help="Exit non-zero when any scanner produced no output (typically "
+             "a uid-mismatch on /output, container crash, or wrong "
+             "entrypoint). Default behavior treats these as warnings so "
+             "partial scans still surface findings; opt in for hard CI "
+             "gates that require every configured scanner to actually run.",
+    )
     scan_parser.add_argument(
         "--timeout",
         type=int,
@@ -1301,11 +1310,29 @@ def _cmd_source_scan(args: argparse.Namespace) -> int:
     #   2. Otherwise, if any SBOM failed hard during a batch → EXIT_ERROR.
     #      This always fires AFTER every SBOM in the batch was attempted;
     #      we never abort the loop on the first failure.
-    #   3. Otherwise → EXIT_SUCCESS.
+    #   3. Otherwise, if --fail-on-scanner-error is set AND any scanner
+    #      produced no output → EXIT_ERROR. Opt-in so existing default
+    #      "warn but pass" behavior stays unchanged.
+    #   4. Otherwise → EXIT_SUCCESS.
+    scanner_execution_failures = [
+        r.scanner for r in summary.results
+        if r.metadata.get("execution_failed")
+    ]
     if not summary.passed:
         exit_code = EXIT_FINDINGS
     elif sbom_batch_failures:
         exit_code = EXIT_ERROR
+    elif (
+        getattr(args, "fail_on_scanner_error", False)
+        and scanner_execution_failures
+    ):
+        log.error(
+            "Exiting non-zero: %d scanner(s) produced no output (%s) and "
+            "--fail-on-scanner-error is set.",
+            len(scanner_execution_failures),
+            ", ".join(scanner_execution_failures),
+        )
+        exit_code = EXIT_ERROR
     else:
         exit_code = EXIT_SUCCESS
     finalize_manifest(manifest, summary=summary, exit_code=exit_code, output_dir=output_dir)
@@ -2026,6 +2053,7 @@ def _generate_zsh_completion(scanners: str) -> str:
                         '--no-spinner[Disable spinner]'
                         '--no-timestamp[Flat output directory]'
                         '--fail-fast[Abort on first failure]'
+                        '--fail-on-scanner-error[Exit non-zero if any scanner produced no output]'
                         '--timeout[Per-scanner timeout]:seconds:'
                         '--no-parallel[Run scanners sequentially]'
                         '--allow-local-versions[Skip version enforcement]'
@@ -2136,7 +2164,7 @@ def _generate_bash_completion(scanners: str) -> str:
                 --scan-type) COMPREPLY=($(compgen -W "baseline full" -- "$cur")); return ;;
                 --path|-p|--output-dir|-o|--config|-c|--output-vars) COMPREPLY=($(compgen -d -- "$cur")); return ;;
             esac
-            COMPREPLY=($(compgen -W "--path --config --output-dir --severity-threshold --format --interface --output-vars --list --verbose --no-spinner --no-timestamp --fail-fast --timeout --no-cache --no-parallel --allow-local-versions" -- "$cur"))
+            COMPREPLY=($(compgen -W "--path --config --output-dir --severity-threshold --format --interface --output-vars --list --verbose --no-spinner --no-timestamp --fail-fast --fail-on-scanner-error --timeout --no-cache --no-parallel --allow-local-versions" -- "$cur"))
             ;;
         report)
             if [ "$COMP_CWORD" -eq 2 ]; then
diff --git a/argus/core/engine.py b/argus/core/engine.py
@@ -609,6 +609,26 @@ def _run_in_container(
         abs_path = str(Path(path).resolve())
 
         with tempfile.TemporaryDirectory() as output_dir:
+            # Make the host-side temp dir world-writable BEFORE the
+            # container starts. Python's TemporaryDirectory creates
+            # dirs with mode 0o700 (owner-only). When a scanner image
+            # runs as a non-root user (e.g., the bandit and opengrep
+            # custom images, which use ``USER argus``, uid 1000) and the
+            # invoking host user has a different uid (commonly 501 on
+            # macOS), the container's process can't write
+            # ``/output/results.json`` and we get the silent "produced
+            # no output files" failure mode.
+            #
+            # Mode 0o777 is safe here:
+            #  - dir lives under ``tempfile.gettempdir()`` (host-only,
+            #    not network-shared)
+            #  - random name from ``mkdtemp`` (collision-resistant)
+            #  - removed at the end of this with-block
+            #  - holds only one scan's transient output (no secrets;
+            #    findings travel through ``parse_results`` and end up
+            #    in the user-specified output_dir, never here).
+            os.chmod(output_dir, 0o777)
+
             docker_cmd = [
                 self._runtime, "run", "--rm",
                 "-v", f"{abs_path}:/workspace:ro",
@@ -727,6 +747,26 @@ def _run_in_container(
 
             findings = []
             metadata_extra = {}
+            # Track scanner execution failures distinctly from "ran and
+            # found nothing". A scanner that produced no output files and
+            # no stdout most likely failed to run — could not write to
+            # /output (uid mismatch), crashed without flushing, or had
+            # the wrong entrypoint chain. We mark these on the ScanResult
+            # so the CLI / reporters can surface them, and so consumers
+            # who want hard CI gates can opt into ``--fail-on-scanner-error``
+            # without having to grep our log lines.
+            if not result_files:
+                metadata_extra["execution_failed"] = True
+                stderr_clipped = proc.stderr.strip()[:400]
+                if stderr_clipped:
+                    metadata_extra["execution_failure_reason"] = (
+                        f"no output files (exit={proc.returncode}). "
+                        f"stderr: {stderr_clipped}"
+                    )
+                else:
+                    metadata_extra["execution_failure_reason"] = (
+                        f"no output files and no stdout (exit={proc.returncode})"
+                    )
             if result_files and hasattr(scanner, "parse_results"):
                 parsed = scanner.parse_results(result_files[0])
                 # parse_results may return either a list of Findings,
diff --git a/argus/reporters/terminal.py b/argus/reporters/terminal.py
@@ -100,6 +100,24 @@ def _print_scanner_results(self, summary: ScanSummary) -> None:
             print()
 
     def _print_status(self, summary: ScanSummary) -> None:
+        # Scanner-execution failures (no output produced) are flagged
+        # separately so a single bad scanner image doesn't quietly slip
+        # past a "Status: PASS" line. The PASS/FAIL line still reflects
+        # *threshold* outcome; this row reflects *execution* outcome.
+        # CI callers who want hard-fail behavior on missing output use
+        # ``--fail-on-scanner-error``.
+        failed = [
+            r.scanner for r in summary.results
+            if r.metadata.get("execution_failed")
+        ]
+        if failed:
+            names = ", ".join(failed)
+            print(f"Warning: {len(failed)} scanner(s) produced no output: {names}")
+            print("  These scanners likely failed to execute (uid mismatch on")
+            print("  /output mount, crashed, or wrong entrypoint). Re-run with")
+            print("  --verbose for stderr; pass --fail-on-scanner-error to fail")
+            print("  the scan when this happens.")
+
         if summary.passed:
             print("Status: PASS")
         else:
diff --git a/argus/tests/reporters/test_terminal.py b/argus/tests/reporters/test_terminal.py
@@ -77,6 +77,53 @@ def test_report_fail_status_with_threshold(self, capsys):
         output = capsys.readouterr().out
         assert "FAIL" in output
 
+    def test_report_warns_on_execution_failure_above_pass_status(self, capsys):
+        """Scanners that produced no output (execution_failed=True in
+        metadata) get a clear warning in the terminal output so a single
+        bad scanner image doesn't quietly slip past the PASS line."""
+        reporter = TerminalReporter()
+        summary = ScanSummary(
+            results=[
+                ScanResult(scanner="gitleaks", findings=[]),  # ran fine
+                ScanResult(
+                    scanner="bandit", findings=[],
+                    metadata={
+                        "execution_failed": True,
+                        "execution_failure_reason": (
+                            "no output files (exit=13). stderr: "
+                            "cannot open /output/results.json: permission denied"
+                        ),
+                    },
+                ),
+                ScanResult(
+                    scanner="opengrep", findings=[],
+                    metadata={"execution_failed": True},
+                ),
+            ],
+            severity_threshold=None,
+        )
+        reporter.report(summary)
+        output = capsys.readouterr().out
+
+        # Failed scanners are named, count is correct, and the hint
+        # points at --fail-on-scanner-error for hard CI gating.
+        assert "2 scanner(s) produced no output" in output
+        assert "bandit" in output
+        assert "opengrep" in output
+        assert "--fail-on-scanner-error" in output
+        # PASS status still renders below — execution failure is a
+        # separate signal from threshold compliance.
+        assert "PASS" in output
+
+    def test_report_no_warning_when_all_scanners_produced_output(self, capsys):
+        """Successful runs must not get the warning row."""
+        reporter = TerminalReporter()
+        summary = _make_summary()
+        reporter.report(summary)
+
+        output = capsys.readouterr().out
+        assert "produced no output" not in output
+
     def test_report_empty_results(self, capsys):
         reporter = TerminalReporter()
         summary = ScanSummary()
diff --git a/argus/tests/test_cli.py b/argus/tests/test_cli.py
@@ -812,6 +812,17 @@ def test_sbom_flag_default_none(self):
         args = parser.parse_args(["scan"])
         assert args.sbom is None
 
+    def test_fail_on_scanner_error_flag(self):
+        """--fail-on-scanner-error is opt-in (default False) so existing
+        ``argus scan`` users keep getting partial-scan PASS behavior;
+        CI callers who want hard fails set the flag."""
+        parser = build_parser()
+        args = parser.parse_args(["scan"])
+        assert args.fail_on_scanner_error is False
+
+        args = parser.parse_args(["scan", "--fail-on-scanner-error"])
+        assert args.fail_on_scanner_error is True
+
     def test_interface_flag_terminal(self):
         parser = build_parser()
         args = parser.parse_args(["scan", "--interface", "terminal"])
diff --git a/argus/tests/test_engine.py b/argus/tests/test_engine.py
@@ -1,5 +1,6 @@
 """Tests for argus.core.engine — ArgusEngine."""
 
+import os
 import subprocess
 from pathlib import Path
 
@@ -719,6 +720,91 @@ def mock_run(cmd, **kwargs):
         assert result.findings[0].title == "from stdout"
         assert captured_file["path"].name == "stdout.txt"
 
+    def test_container_output_dir_is_world_writable(self, monkeypatch):
+        """Regression: scanners running as USER non-root (uid 1000)
+        couldn't write /output/results.json on hosts with uid != 1000
+        because Python's TemporaryDirectory creates dirs mode 0o700.
+        Engine now chmods the dir 0o777 right after creation."""
+        engine = self._make_engine()
+        scanner = self._make_scanner()
+
+        monkeypatch.setattr(engine, "_pull_image", lambda img: True)
+        monkeypatch.setattr(engine, "_get_image_digest", lambda img: "sha256:abc")
+
+        captured_mode = {}
+
+        def mock_run(cmd, **_kwargs):
+            # The chmod happens before docker run, so by the time we're
+            # invoked the host-side temp dir already has the new mode.
+            for arg in cmd:
+                if ":/output" in str(arg):
+                    host_dir = arg.split(":")[0]
+                    captured_mode["mode"] = os.stat(host_dir).st_mode & 0o777
+                    Path(host_dir).joinpath("results.json").write_text("{}")
+                    break
+            return subprocess.CompletedProcess(
+                args=cmd, returncode=0, stdout="", stderr="",
+            )
+
+        monkeypatch.setattr(subprocess, "run", mock_run)
+
+        engine._run_in_container(scanner, "/src", {})
+        # 0o777 means rwx for owner, group, other — every container
+        # uid can write to /output regardless of its image's USER.
+        assert captured_mode["mode"] == 0o777
+
+    def test_container_no_output_marks_execution_failed(self, monkeypatch):
+        """Empty result_files + no stdout means the container ran but
+        produced nothing. Mark the ScanResult so reporters and the
+        --fail-on-scanner-error gate can surface it instead of silently
+        rolling it up as an empty PASS."""
+        engine = self._make_engine()
+        scanner = self._make_scanner()
+
+        monkeypatch.setattr(engine, "_pull_image", lambda img: True)
+        monkeypatch.setattr(engine, "_get_image_digest", lambda img: "sha256:abc")
+
+        def mock_run(cmd, **_kwargs):
+            # Container exits 13 (permission denied) without writing
+            # anything to /output — the exact bug in the user report.
+            return subprocess.CompletedProcess(
+                args=cmd, returncode=13, stdout="",
+                stderr="cannot open /output/results.json: permission denied",
+            )
+
+        monkeypatch.setattr(subprocess, "run", mock_run)
+
+        result = engine._run_in_container(scanner, "/src", {})
+        assert result.findings == []
+        assert result.metadata.get("execution_failed") is True
+        # Reason carries the stderr for the user — they shouldn't have
+        # to bump the log level to find out why.
+        reason = result.metadata.get("execution_failure_reason", "")
+        assert "permission denied" in reason
+        assert "exit=13" in reason
+
+    def test_container_with_output_does_not_mark_execution_failed(self, monkeypatch):
+        """Successful container runs must not get the failure marker."""
+        engine = self._make_engine()
+        scanner = self._make_scanner(parse_results=lambda f: [])
+
+        monkeypatch.setattr(engine, "_pull_image", lambda img: True)
+        monkeypatch.setattr(engine, "_get_image_digest", lambda img: "sha256:abc")
+
+        def mock_run(cmd, **_kwargs):
+            for arg in cmd:
+                if ":/output" in str(arg):
+                    Path(arg.split(":")[0]).joinpath("results.json").write_text("[]")
+                    break
+            return subprocess.CompletedProcess(
+                args=cmd, returncode=0, stdout="", stderr="",
+            )
+
+        monkeypatch.setattr(subprocess, "run", mock_run)
+
+        result = engine._run_in_container(scanner, "/src", {})
+        assert "execution_failed" not in result.metadata
+
     def test_container_custom_entrypoint(self, monkeypatch):
         engine = self._make_engine()
         scanner = self._make_scanner(container_entrypoint="/bin/custom")
diff --git a/docs/cli-reference.md b/docs/cli-reference.md
@@ -1,6 +1,6 @@
 # Argus CLI Reference (v0.7.2)
 
-> Auto-generated from argparse definitions on 2026-05-02.
+> Auto-generated from argparse definitions on 2026-05-05.
 > Do not edit manually — run `python -m scripts.ci.gen_cli_docs` to regenerate.
 
 Argus Security Scanner — comprehensive security scanning for your codebase
@@ -62,10 +62,11 @@ argus scan [-h] [--path PATH] [--config CONFIG]
                   [--output-vars FILE] [--exclude PATTERNS]
                   [--no-default-excludes] [--dry-run] [--sbom PATH]
                   [--interface {terminal,browser}] [--fail-fast]
-                  [--timeout SECONDS] [--no-parallel] [--allow-local-versions]
-                  [--no-cache] [--discover [PATH]] [--image REF]
-                  [--scanners SCANNERS] [--target URL] [--port PORT]
-                  [--env KEY=VALUE] [--scan-type {baseline,full}]
+                  [--fail-on-scanner-error] [--timeout SECONDS]
+                  [--no-parallel] [--allow-local-versions] [--no-cache]
+                  [--discover [PATH]] [--image REF] [--scanners SCANNERS]
+                  [--target URL] [--port PORT] [--env KEY=VALUE]
+                  [--scan-type {baseline,full}]
                   [--startup-timeout STARTUP_TIMEOUT]
                   [scanner]
 ```
@@ -94,6 +95,7 @@ argus scan [-h] [--path PATH] [--config CONFIG]
 | `--sbom` | Scan a pre-built SBOM or directory of SBOMs (CycloneDX JSON/XML, SPDX JSON/tag-value, or Syft JSON). When PATH is a directory, argus walks it recursively, sniffs each file, and scans every SBOM it finds. Auto-enables all SBOM-capable scanners (osv, grype, trivy) regardless of argus.yml. Filesystem scanners (bandit, gitleaks, ...) are skipped since they have nothing to scan. |  |
 | `--interface`, `-i` | After the scan completes, open a viewer on the just-written results. 'terminal' launches the TUI (requires 'argus-security[terminal]'); 'browser' launches the local web UI (requires 'argus-security[browser]'). (terminal, browser) |  |
 | `--fail-fast` | Abort immediately if any scanner fails instead of continuing. | `false` |
+| `--fail-on-scanner-error` | Exit non-zero when any scanner produced no output (typically a uid-mismatch on /output, container crash, or wrong entrypoint). Default behavior treats these as warnings so partial scans still surface findings; opt in for hard CI gates that require every configured scanner to actually run. | `false` |
 | `--timeout` | Per-scanner timeout in seconds. Scanners exceeding this limit are killed. |  |
 | `--no-parallel` | Run scanners sequentially instead of concurrently. | `false` |
 | `--allow-local-versions` | Allow local tool versions that differ from argus-pinned versions. Use in airgapped environments where tool updates are constrained. | `false` |