Skip to content

Commit 651c10b

Browse files
authored
fix(scanners): correct execution-failure signaling and OSV container entrypoint (#125)
* fix(scanners): correct execution-failure signaling and OSV container entrypoint Five related correctness fixes around scanner execution and status reporting. Each is a small, targeted change with a regression test; the linux-default behavior is unchanged on every path that wasn't broken. * OSV container exit 127: ``container_entrypoint`` is now the absolute ``/osv-scanner`` path. Docker's ``--entrypoint`` override does not consult the image's ``$PATH``, and ``ghcr.io/google/osv-scanner`` declares ``ENTRYPOINT ["/osv-scanner"]``. The bare name resolved nowhere. * lint-yaml false PASS: yamllint exit >= 2 with empty stdout used to flow through ``_parse_output('') -> []`` and render ``Status: PASS``, hiding bad-config failures. ``YamllintLinter.scan`` now branches on the exit code: 0 = clean, 1 = findings, >= 2 with empty findings = ``execution_failed`` + reason from stderr. * Unified failure metadata contract: ``run_subprocess_scan`` now emits ``metadata['execution_failed']`` + ``metadata['execution_failure_reason']`` on every failure path (FileNotFoundError, TimeoutExpired, no-output). Previously it used ``metadata['error']`` while the engine's container path used ``execution_failed``; the divergence meant local-path failures were invisible to the reporter's warning row and to ``--fail-on-scanner-error``. * Reporter degraded status: ``TerminalReporter`` now prints ``Status: PASS (degraded - some scanners did not run)`` when the threshold check passes but at least one scanner has ``execution_failed=True``. Threshold compliance and execution success are independent signals; the previous output had a ``Warning: ...`` row directly followed by ``Status: PASS`` which contradicted itself. ``ScanSummary.passed`` semantics are unchanged. * Windows chmod skip: ``os.chmod(output_dir, 0o777)`` in ``_run_in_container`` is now guarded by ``platform.system() != 'Windows'``. NTFS doesn't honor POSIX bits, ``os.chmod`` only flips the read-only attribute on Windows, and the macOS uid-mismatch failure mode the chmod guards against doesn't apply on Windows (Docker Desktop's bind-mount uid mapping is different). Linux/macOS keep the chmod - covered by a new test that asserts mode 0o777 on non-Windows hosts. Tests: * New: ``test_yamllint.py`` covers the full exit-code matrix (0 / 1 / 1+findings / 2+empty / 99+empty) * New: OSV ``test_container_entrypoint_uses_absolute_path`` regression * New: terminal ``test_report_status_pass_degraded_when_any_scanner_failed``, ``test_report_status_pass_clean_when_no_failures``, ``test_report_status_fail_takes_priority_over_degraded`` * New: engine ``test_chmod_skipped_on_windows`` and ``test_chmod_runs_on_non_windows`` - explicit Linux-safety guard * Updated: ``test_scanner_template`` and ``test_scanner_scan_methods`` switched from the old ``metadata['error']`` key to ``metadata['execution_failed']`` to match the unified contract Full SDK suite: 1640 passed, 8 skipped. .ai/errors.yaml: added three new entries (exit-127 entrypoint, yamllint silent-failure, and reporter degraded-status) so future agents recognize the patterns. * fix(scanners): surface parse failures distinctly and use per-scanner failure reasons Follow-up correctness pass on top of the previous patch. The first round addressed the silent-failure paths; this round addresses how those signals are presented and adds a fourth state the user asked for explicitly. Adapter exit-code semantics (locked in by tests): * OSV-scanner: 0 = no vulns, 1 = vulns found (happy path), >= 2 with empty output = real runtime failure. Engine parses results.json whenever the file exists; exit code is irrelevant. * yamllint: 0 = clean, 1 = lint violations (happy path), >= 2 with empty stdout = real runtime/config failure. Changes: * New 'parse_failed' state. The user asked for four distinct states: ran cleanly / ran-with-findings / didn't run / ran-but-output-unparsable. The fourth state was previously a raised exception that propagated up to the engine's exception handler and got rolled up as a generic "scanner failed". Now: - engine._run_in_container wraps scanner.parse_results() in try/except. On exception: metadata['parse_failed']=True and metadata['parse_failure_reason']=<exc + clipped output head>. Other scanners' results keep flowing. - scanner_template.run_subprocess_scan does the same for the local-execution path. - parse_failed is orthogonal to execution_failed; reporters render them in separate warning blocks. * Reporter no longer emits generic guesses. The previous "uid mismatch / crashed / wrong entrypoint" boilerplate was misleading for OSV exit-1-with-findings and yamllint binary-not-found cases. TerminalReporter now prints the per-scanner reason (execution_failure_reason or parse_failure_reason) verbatim, in bullet form, with the scanner name. Status label gets a richer breakdown: "PASS (degraded - 1 did not run, 1 unparsable)". * yamllint hardened against FileNotFoundError. is_available() is checked before scan(), but the binary can disappear in the gap (CI cleanup race, manual uninstall). scan() now catches FileNotFoundError and returns a clean execution_failed ScanResult instead of letting the exception escape into the engine's exception handler (which would render a stack trace). * --fail-on-scanner-error fires on parse_failed too. Both states represent "the scan didn't fully succeed"; from a CI gating perspective they're equivalent. Cross-platform safety: * No platform-specific branches added or modified. Linux scanner flows are unchanged - the new parse-failed wrapping fires only when the parser raises (a previously-broken path everywhere). * Adapter-specific exit-code handling stays at the adapter layer (yamllint owns its >= 2 rule, OSV owns its "exit code irrelevant to parsing" rule). No global "non-zero means failure" logic was added. * Existing output formats and result schemas stable: parse_failed is additive metadata; ScanSummary.passed semantics unchanged (still threshold-only). Tests added (8 new): * OSV: parse_results succeeds-with-findings on real fixture (exit-1 acceptance case) * OSV: parse_results returns [] on zero-findings fixture without raising * OSV: parse_results raises on malformed JSON (so the engine wrapper catches as parse_failed) * yamllint: FileNotFoundError returns execution_failed, no exception escapes * engine: parse exception in container path -> parse_failed (not execution_failed, not raised) * terminal: warning block shows per-scanner reason, no generic text * terminal: parse_failed renders distinctly from execution_failed * terminal: degraded status label lists both kinds when both present Tests updated (1): * test_scanner_template's test_unexpected_exception_propagates renamed/rewritten to test_parse_exception_emits_parse_failed_metadata reflecting the new contract. Full SDK suite: 1648 passed, 8 skipped. .ai/errors.yaml: degraded-status entry refreshed with new label shape; new entry added for the parse_failed pattern. * fix(windows): yamllint AppLocker fallback and UTF-8 encoding for container output Two Windows-specific correctness fixes. Linux behavior is unchanged. Bug 1 — yamllint PermissionError on Windows (AppLocker / SRP): On Windows hosts where Software Restriction Policy or AppLocker blocks executable launches from user AppData paths, the direct ``yamllint.exe`` invocation raises ``PermissionError [WinError 5] Access is denied`` mid-scan. The python interpreter itself is typically whitelisted, so loading the same package via ``python -m yamllint`` works on the same machine and is argv-compatible (yamllint's __main__ accepts the same flags). Fix: ``YamllintLinter._run_with_windows_fallback`` wraps the subprocess.run call. On ``sys.platform == 'win32'``, a PermissionError/OSError on the direct invocation triggers a retry with [sys.executable, '-m', 'yamllint'] + cmd[1:]. FileNotFoundError is re-raised unchanged so 'yamllint not installed' still renders as a clean execution_failed (the fallback wouldn't help — yamllint isn't installed at all). Linux is fully bypassed: AppLocker doesn't exist there, so a PermissionError on Linux indicates a genuine permission bug (chmod, mount options) the user needs to see, not a policy case the fallback compensates for. The Linux invocation path is byte-identical to before — same single subprocess.run call, same PATH lookup, same exception handling. Bug 2 — UnicodeDecodeError reading container output: Docker container output (and most CLI scanner output) is UTF-8. ``subprocess.run(text=True)`` and ``Path.read_text()`` fall back to the platform default when ``encoding=`` is omitted — cp1252 on Windows. Any non-ASCII byte (CVE descriptions with accented characters, file paths with unicode segments, scanner banners with arrow glyphs) raises UnicodeDecodeError mid-scan. Fix: explicit ``encoding='utf-8', errors='replace'`` on every container-output consumer: * engine._run_in_container's docker subprocess.run * scanner_template.run_subprocess_scan's subprocess.run * yamllint's subprocess.run (both direct + Windows fallback) * All 14 scanner.parse_results .read_text() calls (bandit, checkov, clamav, container, gitleaks, grype, opengrep, osv, supply_chain, trivy, trivy_iac, zap) * Engine + scanner_template parse-failed file-head reads errors='replace' over 'strict' is intentional: a security tool that shows U+FFFD on undecodable bytes is better than one that crashes on otherwise-usable scanner output. Linux unaffected since LANG=C.UTF-8 is the universal default; this only changes behavior on hosts whose default ISN'T UTF-8. Tests added (5 new): * yamllint: PermissionError on Windows triggers python -m fallback, second subprocess call uses sys.executable -m yamllint * yamllint: PermissionError on Linux does NOT fall back, raised through to outer handler as execution_failed * yamllint: subprocess.run gets encoding='utf-8' + errors='replace' * engine: docker subprocess.run gets encoding='utf-8' + errors='replace' * OSV: parse_results round-trips non-ASCII UTF-8 content (CVE summaries with accents, file paths with unicode) without raising — the user's reported decode crash on byte 0x8f at position 18203 Full SDK suite: 1653 passed, 8 skipped. .ai/errors.yaml: two new entries — Windows AppLocker pattern and the cp1252 UnicodeDecodeError pattern, both with the exact platform-guard rationale. --------- Co-authored-by: eFAILution <eFAILution@users.noreply.github.com>
1 parent 3ccf528 commit 651c10b

24 files changed

Lines changed: 1199 additions & 107 deletions

.ai/errors.yaml

Lines changed: 135 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -374,3 +374,138 @@ scn_detector_errors:
374374
how: "Set enable_ai_fallback: true and provide ANTHROPIC_API_KEY"
375375
- description: "Review manually"
376376
how: "MANUAL_REVIEW is the expected safe default for unrecognized changes"
377+
378+
# ==============================================================================
379+
# SCANNER EXECUTION FAILURES
380+
# ==============================================================================
381+
382+
- pattern: "exit 127|exec.*not found"
383+
category: scanner-execution
384+
context: "Container scanner exits 127 immediately (e.g. osv-scanner)"
385+
386+
root_cause: |
387+
Docker's --entrypoint override does NOT consult the image's $PATH
388+
when given a bare binary name. Some images declare an absolute
389+
entrypoint (e.g. ghcr.io/google/osv-scanner uses
390+
ENTRYPOINT ["/osv-scanner"]) and require the absolute path on
391+
the override. A bare ``osv-scanner`` resolves nowhere and exits
392+
127.
393+
394+
solution:
395+
steps:
396+
- "Run ``docker image inspect <image>`` and read ``Config.Entrypoint``"
397+
- "Set ``container_entrypoint`` on the scanner class to the absolute path from that field"
398+
verification: "Engine strips argv[0] for ENTRYPOINT-based images, so build_args may keep the bare name"
399+
400+
reference: "argus/scanners/osv.py — see container_entrypoint = '/osv-scanner'"
401+
402+
- pattern: "yamllint.*PASS.*0 findings|lint-yaml.*passes despite errors"
403+
category: scanner-execution
404+
context: "Linter exits non-zero but reports zero findings and Status: PASS"
405+
406+
root_cause: |
407+
Tools like yamllint use exit codes to differentiate happy-path
408+
lint failures (exit 1, with parseable findings on stdout) from
409+
runtime errors (exit ≥ 2, empty stdout). Naive callers that map
410+
empty stdout to ``[]`` lose the runtime-error signal entirely
411+
and surface ``Status: PASS`` even when the tool failed to run.
412+
413+
solution:
414+
steps:
415+
- "Distinguish exit codes: 0 = clean, 1 = findings, ≥2 = real failure"
416+
- "When exit ≥ 2 with empty findings, set metadata['execution_failed']=True and metadata['execution_failure_reason']=<stderr summary>"
417+
- "Use the same shape the engine container path emits — the terminal reporter and --fail-on-scanner-error key off these exact field names"
418+
419+
reference: "argus/linters/yamllint.py — scan() exit-code branching"
420+
421+
- pattern: "scanner produced no output|execution_failed"
422+
category: scanner-execution
423+
context: "Reporter shows Warning row but Status: PASS contradicts it"
424+
425+
root_cause: |
426+
Threshold compliance (``ScanSummary.passed``) and execution
427+
success are independent signals. A scanner that fails to run
428+
produces zero findings, which alone passes any threshold. The
429+
reporter must label PASS as ``PASS (degraded — some scanners
430+
did not run)`` whenever any scanner has
431+
metadata['execution_failed']=True, otherwise the Warning above
432+
and the Status below contradict each other.
433+
434+
solution:
435+
steps:
436+
- "Reporter checks for any result.metadata.get('execution_failed')"
437+
- "If passed and any failed: print 'Status: PASS (degraded — N did not run, M unparsable)'"
438+
- "Add --fail-on-scanner-error in CI for hard-fail behavior"
439+
440+
reference: "argus/reporters/terminal.py::_print_status"
441+
442+
- pattern: "0 findings.*known to be vulnerable|JSONDecodeError.*results.json"
443+
category: scanner-execution
444+
context: "Scanner produced output but parser couldn't interpret it (third state)"
445+
446+
root_cause: |
447+
Distinct from execution-failure: the scanner ran, exited, and
448+
wrote a results file — we just couldn't parse what came out
449+
(schema drift, truncated JSON, mixed text+JSON). Previously
450+
this surfaced as a stack trace from the engine's exception
451+
handler and got rolled up as a generic "scanner failed".
452+
Reporters and CI gates now have a fourth state for it:
453+
``parse_failed`` + ``parse_failure_reason`` (carries the
454+
exception type and a clipped output head).
455+
456+
solution:
457+
steps:
458+
- "Engine's container path wraps scanner.parse_results in try/except"
459+
- "On any Exception, set metadata['parse_failed']=True and metadata['parse_failure_reason']"
460+
- "scanner_template.run_subprocess_scan does the same for local execution"
461+
- "TerminalReporter renders parse_failed in its own warning block"
462+
- "--fail-on-scanner-error fires on parse_failed too"
463+
464+
reference: "argus/core/engine.py::_run_in_container — try/except around parse_results"
465+
466+
# ==============================================================================
467+
# WINDOWS-SPECIFIC SCANNER ERRORS
468+
# ==============================================================================
469+
470+
- pattern: "PermissionError.*WinError 5.*Access is denied|yamllint.*Access is denied"
471+
category: scanner-execution
472+
context: "yamllint launches with PermissionError on Windows hosts with AppLocker / SRP"
473+
474+
root_cause: |
475+
AppLocker or Software Restriction Policy on Windows blocks
476+
executable launches from user AppData paths (where pip --user
477+
and virtualenv typically install scripts). The python
478+
interpreter itself is whitelisted, so loading the same
479+
package via ``python -m yamllint`` works on the same
480+
machine.
481+
482+
solution:
483+
steps:
484+
- "YamllintLinter._run_with_windows_fallback wraps subprocess.run"
485+
- "On sys.platform == 'win32', PermissionError/OSError triggers a retry with [sys.executable, '-m', 'yamllint'] + cmd[1:]"
486+
- "FileNotFoundError still propagates so 'yamllint not installed' renders cleanly"
487+
- "Linux/macOS bypass the fallback — Linux PermissionError is a genuine bug, not a policy case"
488+
489+
reference: "argus/linters/yamllint.py::_run_with_windows_fallback"
490+
491+
- pattern: "UnicodeDecodeError.*charmap codec.*can't decode byte"
492+
category: scanner-execution
493+
context: "Scanner result decode fails on Windows with cp1252 default encoding"
494+
495+
root_cause: |
496+
Docker container output (and most CLI tool output) is UTF-8.
497+
``subprocess.run(text=True)`` and ``Path.read_text()`` fall
498+
back to the platform default encoding when ``encoding=`` is
499+
omitted — cp1252 on Windows. Any non-ASCII byte (CVE
500+
descriptions, accented file paths, scanner banners) raises
501+
UnicodeDecodeError mid-scan.
502+
503+
solution:
504+
steps:
505+
- "Engine docker subprocess: encoding='utf-8', errors='replace'"
506+
- "scanner_template subprocess: same"
507+
- "All scanner.parse_results read_text() calls: same"
508+
- "Yamllint subprocess + Windows fallback: same"
509+
note: "errors='replace' is preferred over 'strict' — a security tool showing � is better than crashing on otherwise-usable output"
510+
511+
reference: "argus/core/engine.py and every argus/scanners/*.py read_text()"

argus/cli.py

Lines changed: 24 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1693,20 +1693,37 @@ def _cmd_source_scan(args: argparse.Namespace) -> int:
16931693
r.scanner for r in summary.results
16941694
if r.metadata.get("execution_failed")
16951695
]
1696+
scanner_parse_failures = [
1697+
r.scanner for r in summary.results
1698+
if r.metadata.get("parse_failed")
1699+
]
16961700
if not summary.passed:
16971701
exit_code = EXIT_FINDINGS
16981702
elif sbom_batch_failures:
16991703
exit_code = EXIT_ERROR
17001704
elif (
17011705
getattr(args, "fail_on_scanner_error", False)
1702-
and scanner_execution_failures
1706+
and (scanner_execution_failures or scanner_parse_failures)
17031707
):
1704-
log.error(
1705-
"Exiting non-zero: %d scanner(s) produced no output (%s) and "
1706-
"--fail-on-scanner-error is set.",
1707-
len(scanner_execution_failures),
1708-
", ".join(scanner_execution_failures),
1709-
)
1708+
# Both states represent "the scan didn't fully succeed":
1709+
# execution_failed = couldn't run; parse_failed = ran but
1710+
# output unintelligible. From a CI gating perspective they're
1711+
# equivalent — the user asked for a hard fail when scanners
1712+
# don't deliver clean results.
1713+
if scanner_execution_failures:
1714+
log.error(
1715+
"Exiting non-zero: %d scanner(s) did not run cleanly "
1716+
"(%s) and --fail-on-scanner-error is set.",
1717+
len(scanner_execution_failures),
1718+
", ".join(scanner_execution_failures),
1719+
)
1720+
if scanner_parse_failures:
1721+
log.error(
1722+
"Exiting non-zero: %d scanner(s) produced unparsable "
1723+
"output (%s) and --fail-on-scanner-error is set.",
1724+
len(scanner_parse_failures),
1725+
", ".join(scanner_parse_failures),
1726+
)
17101727
exit_code = EXIT_ERROR
17111728
else:
17121729
exit_code = EXIT_SUCCESS

argus/core/engine.py

Lines changed: 79 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,7 @@
22

33
import logging
44
import os
5+
import platform
56
import shutil
67
import subprocess
78
import tempfile
@@ -674,7 +675,15 @@ def _run_in_container(
674675
# - holds only one scan's transient output (no secrets;
675676
# findings travel through ``parse_results`` and end up
676677
# in the user-specified output_dir, never here).
677-
os.chmod(output_dir, 0o777)
678+
#
679+
# Skip on Windows: NTFS doesn't honor POSIX bits, ``os.chmod``
680+
# only flips the read-only attribute, and Docker Desktop on
681+
# Windows handles uid mapping for bind mounts differently
682+
# (it doesn't suffer from the macOS uid-mismatch failure mode
683+
# this guard exists for). Calling ``chmod 0o777`` there is
684+
# at best a no-op and at worst confusing in stack traces.
685+
if platform.system() != "Windows":
686+
os.chmod(output_dir, 0o777)
678687

679688
docker_cmd = [
680689
self._runtime, "run", "--rm",
@@ -738,10 +747,21 @@ def _run_in_container(
738747
)
739748

740749
start = time.monotonic()
750+
# Docker container output is always UTF-8. Without
751+
# ``encoding='utf-8'``, ``text=True`` falls back to the
752+
# platform default — cp1252 on Windows — which raises
753+
# ``UnicodeDecodeError`` on any non-ASCII byte the
754+
# scanner emits (CVE descriptions, file paths with
755+
# non-ASCII characters, etc.). ``errors='replace'`` is
756+
# a safe fallback over ``strict``: a security tool
757+
# showing ``�`` is better than crashing the whole
758+
# scan on output we'd otherwise be able to use.
741759
proc = subprocess.run(
742760
docker_cmd,
743761
capture_output=True,
744762
text=True,
763+
encoding="utf-8",
764+
errors="replace",
745765
)
746766
elapsed = int((time.monotonic() - start) * 1000)
747767

@@ -854,35 +874,65 @@ def _run_in_container(
854874
f"no output files and no stdout (exit={proc.returncode})"
855875
)
856876
if result_files and hasattr(scanner, "parse_results"):
857-
parsed = scanner.parse_results(result_files[0])
858-
# parse_results may return either a list of Findings,
859-
# a ``(list, int)`` tuple (legacy passed_count channel,
860-
# used by linters), or a ``(list, dict)`` tuple (extra
861-
# metadata merged into ScanResult.metadata — used by
862-
# Grype to flag "source.target=unknown" which means
863-
# "couldn't identify packages" rather than "nothing
864-
# vulnerable").
865-
if isinstance(parsed, tuple):
866-
findings, extra = parsed
867-
if isinstance(extra, int):
868-
metadata_extra["passed_count"] = extra
869-
elif isinstance(extra, dict):
870-
metadata_extra.update(extra)
871-
# Warn at the engine layer too so the signal is
872-
# visible even when a reporter doesn't render
873-
# per-scanner metadata.
874-
if "warning" in extra:
875-
logger.warning(
876-
"Scanner '%s': %s",
877-
scanner.name, extra["warning"],
878-
)
877+
try:
878+
parsed = scanner.parse_results(result_files[0])
879+
except Exception as exc:
880+
# Scanner produced output but the parser couldn't
881+
# interpret it (e.g. osv-scanner v2 rev'd its
882+
# schema, truncated output, mixed text+JSON). This
883+
# is a third state distinct from "execution failed"
884+
# and "ran clean" — we surface it as
885+
# ``parse_failed`` so the reporter can show "OSV
886+
# produced 12KB of output we couldn't parse" rather
887+
# than the misleading "no output produced". The
888+
# parser bug doesn't crash the rest of the scan;
889+
# other scanners' results are still useful.
890+
head = ""
891+
try:
892+
head = result_files[0].read_text(
893+
encoding="utf-8", errors="replace",
894+
)[:200]
895+
except OSError:
896+
head = "<unreadable>"
897+
metadata_extra["parse_failed"] = True
898+
metadata_extra["parse_failure_reason"] = (
899+
f"{type(exc).__name__}: {exc}. "
900+
f"output head: {head!r}"
901+
)
902+
logger.warning(
903+
"Scanner '%s' produced output but parse failed: %s",
904+
scanner.name, exc,
905+
)
906+
findings = []
879907
else:
880-
findings = parsed
881-
logger.debug(
882-
"Parsed %d finding(s) from %s",
883-
len(findings),
884-
result_files[0].name,
885-
)
908+
# parse_results may return either a list of Findings,
909+
# a ``(list, int)`` tuple (legacy passed_count channel,
910+
# used by linters), or a ``(list, dict)`` tuple (extra
911+
# metadata merged into ScanResult.metadata — used by
912+
# Grype to flag "source.target=unknown" which means
913+
# "couldn't identify packages" rather than "nothing
914+
# vulnerable").
915+
if isinstance(parsed, tuple):
916+
findings, extra = parsed
917+
if isinstance(extra, int):
918+
metadata_extra["passed_count"] = extra
919+
elif isinstance(extra, dict):
920+
metadata_extra.update(extra)
921+
# Warn at the engine layer too so the signal is
922+
# visible even when a reporter doesn't render
923+
# per-scanner metadata.
924+
if "warning" in extra:
925+
logger.warning(
926+
"Scanner '%s': %s",
927+
scanner.name, extra["warning"],
928+
)
929+
else:
930+
findings = parsed
931+
logger.debug(
932+
"Parsed %d finding(s) from %s",
933+
len(findings),
934+
result_files[0].name,
935+
)
886936

887937
return ScanResult(
888938
scanner=scanner.name,

0 commit comments

Comments
 (0)