Skip to content

refactor(scanners): unify tool_version + scan + build_args into a single SDK pattern#117

Merged
eFAILution merged 3 commits intofeat/argus-portabilityfrom
refactor/sdk-simplicity-quick-wins
May 6, 2026
Merged

refactor(scanners): unify tool_version + scan + build_args into a single SDK pattern#117
eFAILution merged 3 commits intofeat/argus-portabilityfrom
refactor/sdk-simplicity-quick-wins

Conversation

@eFAILution
Copy link
Copy Markdown
Collaborator

@eFAILution eFAILution commented May 6, 2026

Description

Establish — and validate end-to-end — a single SDK pattern for adding/removing/swapping security tools. Both humans and AI contributors land on the same shape.

The complexity audit run before this PR identified scanner duplication as the highest-leverage simplification target: every scanner rolled its own ~17-line tool_version(), ~30-line scan(), and a parallel _build_command() (local) + container_args() (container) pair. Those parallel methods had measurably drifted — OSV's _build_command was on the v1 --sbom flag while container_args used v2 -L; opengrep's local path used --output while the container path emitted to --output-file; bandit's exit-code semantics differed. The dual structure made every drift invisible in code review.

This PR collapses all three shapes into one pattern, applied across every scanner that fits, with documentation updates so the next contributor sees the pattern as the documented shape — not a thing they have to grep for.

Changes Made

  • Added new scanner/workflow
  • Modified existing scanner/workflow
  • Updated documentation
  • Fixed bug (drive-by; see below)
  • Other (please specify)

Details

Three new SDK helpers (replacing per-scanner duplication):

Helper Replaces What
argus.core.version.parse_tool_version(cmd, regex) ~17-line subprocess+regex+exception block per scanner Runs <tool> --version, captures via regex, narrowly handles missing-binary / timeout / OSError
argus.core.scanner_template.run_subprocess_scan(scanner, path, config) ~30-line tempdir + subprocess + error-handling boilerplate per scanner's scan() Single scan() body becomes one line: return run_subprocess_scan(self, path, config)
argus.core.scanner_template.ScanPaths(workspace, output) parallel _build_command(path, output, config) + container_args(config) methods Frozen value object passed to a single build_args(paths, config) method that returns the FULL argv. Engine drops argv[0] automatically when the class declares container_entrypoint = "<bin>"

Each scanner now has the same shape:

class MyScanner:
    name = "my-scanner"
    container_image = get_image("my-scanner")
    container_entrypoint = "my-tool"  # if image has ENTRYPOINT

    def scan(self, path, config=None) -> ScanResult:
        return run_subprocess_scan(self, path, config)

    def build_args(self, paths: ScanPaths, config: dict) -> list[str]:
        return ["my-tool", "scan", paths.workspace,
                "--format", "json", "--output", paths.output]

    def is_available(self): ...
    def install_command(self): ...
    def tool_version(self):
        return parse_tool_version(["my-tool", "--version"], r"^my-tool (\\\\S+)")
    def parse_results(self, output_path): ...

Migrated scanners: bandit, gitleaks, opengrep, osv, trivy_iac, checkov, clamav (tool_version only), grype (tool_version only), supply_chain (tool_version only), hadolint (tool_version only). The seven that fit the full template each lost ~100 lines of boilerplate.

Engine bridge: _run_in_container calls scanner.build_args(ScanPaths(workspace=\"/workspace\", output=\"/output/results.json\"), config) when present, falling back to the legacy container_args(config) for the four scanners that don't fit the template (clamav text output, container orchestrator, zap Docker-only, trivy/grype standalone SBOM modes, supply_chain multi-tool). The fallback branch will be removed in a follow-up once those scanners are dealt with case-by-case — they all have structurally-different flows that don't fit a single subprocess template.

Drive-by simplifications discovered during the same complexity review:

  • argus.core.models.Severity — replace 4 hand-rolled comparison methods with @functools.total_ordering + __lt__ (-16 lines)
  • argus.init._check_local_readiness — narrow except Exception to except ImportError. The broad catch hid bugs in the readiness logic itself; per ADR-016 those should surface. Test updated to assert the new loud-on-real-bug posture
  • argus.init.run_init banner — drop the per-line time.sleep scroll effect (~1-2s of cosmetic delay on every interactive argus init)
  • OSV v1/v2 drift: local path now uses osv-scanner v2 syntax (-L plus --output-file) consistently with the container path, instead of the deprecated v1 --sbom and --output flags it had drifted to. Exactly the kind of bug the unification fixes.

Net diff: 32 files changed, +1060 / -759 (the +1060 is mostly the two new helper modules + their 27 unit tests; net code reduction across the migrated scanners is ~300 lines).

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • Tested with different scanner combinations

Test Results

  • New helpers (full coverage of edge cases):
    • parse_tool_version: 13 unit tests — happy paths (capture group, multiline, stderr fallback, optional v-prefix, custom group index, compiled-regex input) + failure paths (no match, missing binary, timeout, OSError, empty output, unexpected exception propagates)
    • run_subprocess_scan + ScanPaths: 14 unit tests — happy paths (write findings, pass workspace through, propagate config, unpack passed_count tuple, unpack metadata-dict tuple, fall back to stdout, override output filename) + failure paths (missing binary, timeout, no output + no stdout, unexpected exception propagates)
  • Full SDK suite: 1497 passed, 8 skipped (was 1469 before this PR; +28 net from new helper tests minus 0 regressions)
  • Pre-push action suite: clean (validated via the husky pre-push hook running pytest)
  • Pre-commit hooks: all green (yamllint, codespell, hardcoded-secrets, etc.)

Security Considerations

  • No security impact
  • Security enhancement
  • Potential security implications (explain below)

Security Details

The exception-narrowing changes (init._check_local_readiness, tool_version() clauses, run_subprocess_scan failure handling) move the codebase toward ADR-016's loud-on-failure posture by no longer swallowing arbitrary exceptions. Only narrowly-named expected categories (missing binary, timeout, OS error) get translated to None / error-metadata; everything else propagates. No scanner output processing is touched.

AI Context Updates (.ai/)

  • .ai/architecture.yaml updated (if components/structure changed)
  • .ai/workflows.yaml updated (if commands/tasks changed)
  • .ai/decisions.yaml updated (if design decision made)
  • .ai/errors.yaml updated (if common error addressed)
  • N/A - No .ai/ updates needed

.ai/workflows.yaml::add_new_scanner now lists scan_template, build_args, tool_version, and secret_redaction under a helpers: block so an AI agent reading the workflow sees all three pattern entry points at once.

Checklist

  • Code follows project style guidelines
  • Documentation updated (CONTRIBUTING.md scanner-template rewritten + protocol table updated; .ai/workflows.yaml updated)
  • Changelog updated (if applicable)
  • All tests pass
  • Reviewed by at least one maintainer
  • Reviewed CONTRIBUTING.md guidelines

For New Scanners/Actions (if applicable)

  • N/A — no new scanners/actions in this PR

Follow-ups (not blocking)

The 4 scanners that don't fit the subprocess template all have structurally-different flows; they need bespoke treatment, not a forced abstraction:

  • clamav — text output, no -o flag (parses stdout)
  • container — orchestrator, runs trivy + grype + syft as a fan-out
  • zap — Docker-only, no local mode
  • trivy / grype — standalone SBOM-mode wrappers; their main vulnerability scanning lives inside the container orchestrator
  • supply_chain — multi-tool (zizmor + actionlint), custom orchestration

Once those are addressed (which may mean: keep custom scan(), but still adopt build_args(ScanPaths) for argv consistency), the engine's legacy container_args fallback branch can be deleted.

Each scanner previously rolled its own ~17-line tool_version() that ran
<tool> --version, swallowed all exceptions, and parsed the output with
ad-hoc string slicing. The parsers had drifted (some checked first
line, some last; some stripped a leading v, some didn't), the
exception lists had drifted (except (TimeoutExpired, FileNotFoundError,
Exception) — where Exception already covers the others), and the
timeouts had drifted between 5s and 10s for no clear reason.

Extract argus.core.version.parse_tool_version(cmd, regex, *, group=1,
timeout=5.0) which:
- Runs the subprocess and narrowly handles missing-binary, timeout,
  and OSError (returning None). ADR-016: bugs in subprocess.run
  itself surface, not silently translate to None.
- Applies the regex with re.MULTILINE so anchors like ^ behave
  predictably across multi-line tool banners.
- Falls back to stderr when stdout is empty (some Java tools).

Each scanner's tool_version() shrinks to a 1-line return:
  return parse_tool_version(["bandit", "--version"], r"^bandit (\S+)")

Refactored: bandit, clamav, gitleaks, opengrep, trivy, trivy_iac, osv,
supply_chain, checkov, hadolint. Grype keeps custom JSON parsing
(grype version -o json doesn't fit a single regex) but tightens its
exception clause from `except (TimeoutExpired, JSONDecodeError,
Exception)` to the narrowly-named exception types.

Documented in CONTRIBUTING.md (Adding a Scanner via the SDK) and
.ai/workflows.yaml (add_new_scanner) so the next contributor — human
or AI — sees the helper as the documented pattern, not just a thing
they can grep for.

Drive-by simplifications discovered during the same complexity
review:
- argus.core.models.Severity: replace 4 hand-rolled comparison
  methods with @functools.total_ordering + __lt__ (-16 lines).
- argus.init._check_local_readiness: narrow `except Exception` to
  `except ImportError`. The broad catch hid bugs in the readiness
  logic itself behind a generic "no readiness shown" display; per
  ADR-016 those should surface. Test updated to assert the new
  loud-on-real-bug posture.
- argus.init.run_init banner: drop the per-line time.sleep scroll
  effect (~1-2s of cosmetic delay on every interactive `argus init`).
@codecov
Copy link
Copy Markdown

codecov Bot commented May 6, 2026

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

🔒 Argus Container Security Scan

Branch: refactor/sdk-simplicity-quick-wins
Commit: 3836b25

📊 Combined Findings Summary

🚨 Critical ⚠️ High 🟡 Medium 🔵 Low 📦 Total 🔢 Unique
1 22 61 64 148 148

Scanned: 4 containers | Build Failures: 0

📦 Container Breakdown

Container Image 🚨 Crit ⚠️ High 🟡 Med 🔵 Low Total Unique Status
cli ghcr.io/huntridge-labs/argus/cli:3836b25471d0ebfe92f71965214c9efa72e67297 1 11 15 1 28 28
scanner-bandit ghcr.io/huntridge-labs/argus/scanner-bandit:3836b25471d0ebfe92f71965214c9efa72e67297 0 0 1 0 1 1
scanner-opengrep ghcr.io/huntridge-labs/argus/scanner-opengrep:3836b25471d0ebfe92f71965214c9efa72e67297 0 7 41 63 111 111
scanner-supply-chain ghcr.io/huntridge-labs/argus/scanner-supply-chain:3836b25471d0ebfe92f71965214c9efa72e67297 0 4 4 0 8 8

🔍 Detailed Findings by Container

🚨 cli - 28 vulnerabilities (22 unique)

Image: ghcr.io/huntridge-labs/argus/cli:3836b25471d0ebfe92f71965214c9efa72e67297

Combined (Deduplicated)

🚨 Critical ⚠️ High 🟡 Medium 🔵 Low Total Unique
1 11 15 1 28 22
🔷 Trivy Scanner (28 findings, 22 unique)
CVE Severity Package Version Fixed
CVE-2025-68121 🚨 CRITICAL stdlib v1.24.11 1.24.13, 1.25.7, 1.26.0-rc.3
CVE-2026-32280 ⚠️ HIGH stdlib v1.26.1 1.25.9, 1.26.2
CVE-2026-32281 ⚠️ HIGH stdlib v1.26.1 1.25.9, 1.26.2
CVE-2026-32283 ⚠️ HIGH stdlib v1.26.1 1.25.9, 1.26.2
CVE-2026-33810 ⚠️ HIGH stdlib v1.26.1 1.26.2
CVE-2025-61726 ⚠️ HIGH stdlib v1.24.11 1.24.12, 1.25.6
CVE-2025-61728 ⚠️ HIGH stdlib v1.24.11 1.24.12, 1.25.6
CVE-2026-25679 ⚠️ HIGH stdlib v1.24.11 1.25.8, 1.26.1
CVE-2026-32280 ⚠️ HIGH stdlib v1.24.11 1.25.9, 1.26.2
CVE-2026-32281 ⚠️ HIGH stdlib v1.24.11 1.25.9, 1.26.2
CVE-2026-32283 ⚠️ HIGH stdlib v1.24.11 1.25.9, 1.26.2
CVE-2026-34040 ⚠️ HIGH github.com/docker/docker v28.5.2+incompatible 29.3.1
CVE-2026-3219 🟡 MEDIUM pip 26.0.1 N/A
CVE-2026-32282 🟡 MEDIUM stdlib v1.26.1 1.25.9, 1.26.2
CVE-2026-32288 🟡 MEDIUM stdlib v1.26.1 1.25.9, 1.26.2
CVE-2026-32289 🟡 MEDIUM stdlib v1.26.1 1.25.9, 1.26.2
CVE-2025-11579 🟡 MEDIUM github.com/nwaples/rardecode/v2 v2.1.0 2.2.0
CVE-2025-58058 🟡 MEDIUM github.com/ulikunitz/xz v0.5.12 0.5.15
CVE-2025-47914 🟡 MEDIUM golang.org/x/crypto v0.35.0 0.45.0
CVE-2025-58181 🟡 MEDIUM golang.org/x/crypto v0.35.0 0.45.0
CVE-2025-61730 🟡 MEDIUM stdlib v1.24.11 1.24.12, 1.25.6
CVE-2026-27142 🟡 MEDIUM stdlib v1.24.11 1.25.8, 1.26.1
CVE-2026-32282 🟡 MEDIUM stdlib v1.24.11 1.25.9, 1.26.2
CVE-2026-32288 🟡 MEDIUM stdlib v1.24.11 1.25.9, 1.26.2
CVE-2026-32289 🟡 MEDIUM stdlib v1.24.11 1.25.9, 1.26.2
CVE-2026-33997 🟡 MEDIUM github.com/docker/docker v28.5.2+incompatible 29.3.1
CVE-2026-41506 🟡 MEDIUM github.com/go-git/go-git/v5 v5.17.2 5.18.0
CVE-2026-27139 🔵 LOW stdlib v1.24.11 1.25.8, 1.26.1
⚓ Grype Scanner (0 findings, 0 unique)

✅ No vulnerabilities detected by Grype

🟡 scanner-bandit - 1 vulnerabilities (1 unique)

Image: ghcr.io/huntridge-labs/argus/scanner-bandit:3836b25471d0ebfe92f71965214c9efa72e67297

Combined (Deduplicated)

🚨 Critical ⚠️ High 🟡 Medium 🔵 Low Total Unique
0 0 1 0 1 1
🔷 Trivy Scanner (1 findings, 1 unique)
CVE Severity Package Version Fixed
CVE-2026-3219 🟡 MEDIUM pip 26.0.1 N/A
⚓ Grype Scanner (0 findings, 0 unique)

✅ No vulnerabilities detected by Grype

⚠️ scanner-opengrep - 113 vulnerabilities (49 unique)

Image: ghcr.io/huntridge-labs/argus/scanner-opengrep:3836b25471d0ebfe92f71965214c9efa72e67297

Combined (Deduplicated)

🚨 Critical ⚠️ High 🟡 Medium 🔵 Low Total Unique
0 7 41 63 113 49
🔷 Trivy Scanner (113 findings, 48 unique)
CVE Severity Package Version Fixed
CVE-2026-4878 ⚠️ HIGH libcap2 1:2.75-10+b8 N/A
CVE-2025-69720 ⚠️ HIGH libncursesw6 6.5+20250216-2 N/A
CVE-2026-29111 ⚠️ HIGH libsystemd0 257.9-1~deb13u1 N/A
CVE-2025-69720 ⚠️ HIGH libtinfo6 6.5+20250216-2 N/A
CVE-2026-29111 ⚠️ HIGH libudev1 257.9-1~deb13u1 N/A
CVE-2025-69720 ⚠️ HIGH ncurses-base 6.5+20250216-2 N/A
CVE-2025-69720 ⚠️ HIGH ncurses-bin 6.5+20250216-2 N/A
CVE-2026-27456 🟡 MEDIUM bsdutils 1:2.41-5 N/A
CVE-2026-3184 🟡 MEDIUM bsdutils 1:2.41-5 N/A
CVE-2026-27456 🟡 MEDIUM libblkid1 2.41-5 N/A
CVE-2026-3184 🟡 MEDIUM libblkid1 2.41-5 N/A
CVE-2026-4046 🟡 MEDIUM libc-bin 2.41-12+deb13u2 N/A
CVE-2026-4437 🟡 MEDIUM libc-bin 2.41-12+deb13u2 N/A
CVE-2026-4438 🟡 MEDIUM libc-bin 2.41-12+deb13u2 N/A
CVE-2026-5435 🟡 MEDIUM libc-bin 2.41-12+deb13u2 N/A
CVE-2026-5450 🟡 MEDIUM libc-bin 2.41-12+deb13u2 N/A
CVE-2026-5928 🟡 MEDIUM libc-bin 2.41-12+deb13u2 N/A
CVE-2026-4046 🟡 MEDIUM libc6 2.41-12+deb13u2 N/A
CVE-2026-4437 🟡 MEDIUM libc6 2.41-12+deb13u2 N/A
CVE-2026-4438 🟡 MEDIUM libc6 2.41-12+deb13u2 N/A
CVE-2026-5435 🟡 MEDIUM libc6 2.41-12+deb13u2 N/A
CVE-2026-5450 🟡 MEDIUM libc6 2.41-12+deb13u2 N/A
CVE-2026-5928 🟡 MEDIUM libc6 2.41-12+deb13u2 N/A
CVE-2026-27456 🟡 MEDIUM liblastlog2-2 2.41-5 N/A
CVE-2026-3184 🟡 MEDIUM liblastlog2-2 2.41-5 N/A
CVE-2026-34743 🟡 MEDIUM liblzma5 5.8.1-1 N/A
CVE-2026-27456 🟡 MEDIUM libmount1 2.41-5 N/A
CVE-2026-3184 🟡 MEDIUM libmount1 2.41-5 N/A
CVE-2026-27456 🟡 MEDIUM libsmartcols1 2.41-5 N/A
CVE-2026-3184 🟡 MEDIUM libsmartcols1 2.41-5 N/A
CVE-2026-40225 🟡 MEDIUM libsystemd0 257.9-1~deb13u1 N/A
CVE-2026-40226 🟡 MEDIUM libsystemd0 257.9-1~deb13u1 N/A
CVE-2026-4105 🟡 MEDIUM libsystemd0 257.9-1~deb13u1 N/A
CVE-2026-40225 🟡 MEDIUM libudev1 257.9-1~deb13u1 N/A
CVE-2026-40226 🟡 MEDIUM libudev1 257.9-1~deb13u1 N/A
CVE-2026-4105 🟡 MEDIUM libudev1 257.9-1~deb13u1 N/A
CVE-2026-27456 🟡 MEDIUM libuuid1 2.41-5 N/A
CVE-2026-3184 🟡 MEDIUM libuuid1 2.41-5 N/A
CVE-2026-27456 🟡 MEDIUM login 1:4.16.0-2+really2.41-5 N/A
CVE-2026-3184 🟡 MEDIUM login 1:4.16.0-2+really2.41-5 N/A
CVE-2026-27456 🟡 MEDIUM mount 2.41-5 N/A
CVE-2026-3184 🟡 MEDIUM mount 2.41-5 N/A
CVE-2026-5958 🟡 MEDIUM sed 4.9-2 N/A
CVE-2026-5704 🟡 MEDIUM tar 1.35+dfsg-3.1 N/A
CVE-2026-27456 🟡 MEDIUM util-linux 2.41-5 N/A
CVE-2026-3184 🟡 MEDIUM util-linux 2.41-5 N/A
CVE-2026-27171 🟡 MEDIUM zlib1g 1:1.3.dfsg+really1.3.1-1+b1 N/A
CVE-2026-3219 🟡 MEDIUM pip 26.0.1 N/A
CVE-2011-3374 🔵 LOW apt 3.0.3 N/A
TEMP-0841856-B18BAF 🔵 LOW bash 5.2.37-2+b8 N/A

...and 63 more

⚓ Grype Scanner (0 findings, 0 unique)

✅ No vulnerabilities detected by Grype

⚠️ scanner-supply-chain - 8 vulnerabilities (8 unique)

Image: ghcr.io/huntridge-labs/argus/scanner-supply-chain:3836b25471d0ebfe92f71965214c9efa72e67297

Combined (Deduplicated)

🚨 Critical ⚠️ High 🟡 Medium 🔵 Low Total Unique
0 4 4 0 8 8
🔷 Trivy Scanner (8 findings, 8 unique)
CVE Severity Package Version Fixed
CVE-2026-32280 ⚠️ HIGH stdlib v1.26.1 1.25.9, 1.26.2
CVE-2026-32281 ⚠️ HIGH stdlib v1.26.1 1.25.9, 1.26.2
CVE-2026-32283 ⚠️ HIGH stdlib v1.26.1 1.25.9, 1.26.2
CVE-2026-33810 ⚠️ HIGH stdlib v1.26.1 1.26.2
CVE-2026-3219 🟡 MEDIUM pip 26.0.1 N/A
CVE-2026-32282 🟡 MEDIUM stdlib v1.26.1 1.25.9, 1.26.2
CVE-2026-32288 🟡 MEDIUM stdlib v1.26.1 1.25.9, 1.26.2
CVE-2026-32289 🟡 MEDIUM stdlib v1.26.1 1.25.9, 1.26.2
⚓ Grype Scanner (0 findings, 0 unique)

✅ No vulnerabilities detected by Grype


Generated by Argus

Establish the second half of the SDK scanner pattern. Each scanner
previously rolled three near-identical, drift-prone shapes:

1. ``scan()`` — 30-line tempdir + subprocess + returncode-check +
   output-file-check + parse + ScanResult-build boilerplate.
2. ``_build_command(path, output, config)`` — local-execution argv.
3. ``container_args(config)`` — container-execution argv, hand-mirrored
   from _build_command.

The two argv methods had drifted measurably — OSV's _build_command was
on the v1 ``--sbom`` flag while container_args used v2 ``-L``;
opengrep's local path used ``--output`` while the container path
emitted to ``--output-file``; bandit's exit-code semantics differed.
The dual structure made every drift invisible in code review.

This commit adds two pieces of scaffolding:

* ``argus.core.scanner_template.ScanPaths`` — frozen dataclass with
  ``workspace`` and ``output``. Local execution sets host paths;
  container execution sets ``/workspace`` and ``/output/<file>``. The
  scanner doesn't care which.
* ``argus.core.scanner_template.run_subprocess_scan(scanner, path,
  config)`` — runs ``scanner.build_args(paths, config)`` in a tempdir,
  reads the output file, builds a ScanResult. Falls back to stdout
  capture when no output file is produced. Narrowly handles missing
  binaries / timeouts / OSError per ADR-016 — bugs in
  parse_results propagate.

The scanner protocol grows one method:

    def build_args(self, paths: ScanPaths, config: dict) -> list[str]:
        return ["my-tool", "scan", paths.workspace,
                "--format", "json",
                "--output", paths.output]

…which replaces the parallel _build_command + container_args pair.
Single source of truth for argv. The engine drops argv[0] when the
class declares ``container_entrypoint = "<bin>"`` (the image has
``ENTRYPOINT``), so the same method works for both execution paths.

Migrated to the new pattern: bandit, gitleaks, opengrep, trivy_iac,
checkov, osv. Each scanner's body shrinks by ~100 lines.

Engine ``_run_in_container`` builds a ScanPaths and calls
``scanner.build_args(paths, config)`` when present, falling back to
the legacy ``container_args(config)`` method only for scanners not
yet migrated (clamav, container, zap, trivy, grype, supply_chain).
The legacy branch will be dropped in a follow-up once the remaining
scanners — which all have structurally-different flows (text output,
multi-tool orchestration, Docker-only) — are dealt with on a
case-by-case basis.

Documentation updated:
* CONTRIBUTING.md "Adding a Scanner via the SDK" rewritten with the
  new ~50-line example scanner module and an updated protocol table.
  Explicitly calls out when to skip the template (grype, clamav,
  supply_chain shapes).
* .ai/workflows.yaml::add_new_scanner gains a ``scan_template`` and
  ``build_args`` entry under the helpers block so the next
  contributor — human or AI — sees both helpers as the documented
  pattern.

Side fix: OSV's local path now uses osv-scanner v2 syntax (``-L``
plus ``--output-file``) consistently with the container path, instead
of the deprecated v1 ``--sbom`` and ``--output`` flags it had drifted
to.

Tests: 1497 passed (was 1495 before this commit; +14 template-helper
tests, -2 from removing dead-method tests, +0 from net OSV migration).
Total net: -193 lines across 12 files.
@eFAILution eFAILution changed the title refactor(scanners): unify tool_version() via parse_tool_version helper refactor(scanners): unify tool_version + scan + build_args into a single SDK pattern May 6, 2026
Add a ``containers:`` block listing argus's own four images
(scanner-bandit, scanner-opengrep, scanner-supply-chain, cli) with
their Dockerfile paths so ``argus scan container --config argus.yml``
runs the full trivy + grype + syft sweep without needing CLI flags.

Validated end-to-end against the local checkout:
  - argus scan --config argus.yml: 6 scanners, 163 findings (mostly LOW
    B404/B603 subprocess warnings, by design for a security tool that
    wraps subprocess CLIs).
  - argus scan container --config argus.yml: 4 images built locally,
    230s wall, 141 total findings (98 unique).

Sets up the .github/workflows/build-containers.yml simplification
follow-up where the scan job's hand-rolled trivy-action + grype-action
+ inline Python glue (~80 lines) collapses to a single
``argus scan container --config argus.yml`` invocation, with argus.yml
as the single source of truth for the image list.
@eFAILution eFAILution merged commit 4f03e7b into feat/argus-portability May 6, 2026
21 checks passed
@eFAILution eFAILution deleted the refactor/sdk-simplicity-quick-wins branch May 6, 2026 02:20
eFAILution added a commit that referenced this pull request May 6, 2026
Scanners with custom ``scan()`` flows that don't fit the standard
``build_args(ScanPaths) -> list[str]`` contract (linters that walk
the workspace and invoke their tool per file) used to AttributeError
inside ``_run_in_container`` when the engine routed them through the
container path. Combined with PR #117's silent-drop loophole, that
made ``lint-dockerfile`` disappear from canonical results entirely
when hadolint was not installed locally.

Engine change: in ``_run_scanner``, the auto/docker branch now
checks for ``build_args`` or ``container_args`` before entering
``_run_in_container``. When neither is present:

* backend=auto: log a debug message and fall through to the local
  path (which calls ``scanner.scan(path, config)`` directly).
* backend=docker: raise a clear RuntimeError naming the constraint
  ("scanner has container_image but no build_args/container_args
  method") so users know to either implement build_args or relax
  the backend.

HadolintLinter cleanup: collapse the per-Dockerfile subprocess loop
into a single ``hadolint --format json file1 file2 ...`` invocation.
Hadolint accepts multiple paths and emits one combined JSON array
with each finding's source ``file`` field intact, so the parser
still produces correct ``location`` strings without threading the
path back through the caller. Drops one process spawn per Dockerfile
on every scan.

Roadmap: docs/developer/SDK-ROADMAP.md adds a FileDiscoveryScanner
template entry under Known Issues. The engine fallback gives every
linter a working escape hatch today, but the real abstraction would
be a base class that handles workspace walks, file globbing,
container vs local routing, and batched invocation centrally so the
six existing linters stop reimplementing it. Deferred until a
second linter contributor copy-pastes the boilerplate; trigger to
revisit documented inline.

Two new regression tests in ``TestDockerExecutionBackend``:
test_auto_backend_defers_to_scan_when_no_build_args (the
lint-dockerfile fix path) and
test_docker_backend_rejects_scanner_without_build_args (the loud
error when the user opted into container-only). 1519 SDK tests pass.
eFAILution added a commit that referenced this pull request May 6, 2026
Scanners with custom ``scan()`` flows that don't fit the standard
``build_args(ScanPaths) -> list[str]`` contract (linters that walk
the workspace and invoke their tool per file) used to AttributeError
inside ``_run_in_container`` when the engine routed them through the
container path. Combined with PR #117's silent-drop loophole, that
made ``lint-dockerfile`` disappear from canonical results entirely
when hadolint was not installed locally.

Engine change: in ``_run_scanner``, the auto/docker branch now
checks for ``build_args`` or ``container_args`` before entering
``_run_in_container``. When neither is present:

* backend=auto: log a debug message and fall through to the local
  path (which calls ``scanner.scan(path, config)`` directly).
* backend=docker: raise a clear RuntimeError naming the constraint
  ("scanner has container_image but no build_args/container_args
  method") so users know to either implement build_args or relax
  the backend.

HadolintLinter cleanup: collapse the per-Dockerfile subprocess loop
into a single ``hadolint --format json file1 file2 ...`` invocation.
Hadolint accepts multiple paths and emits one combined JSON array
with each finding's source ``file`` field intact, so the parser
still produces correct ``location`` strings without threading the
path back through the caller. Drops one process spawn per Dockerfile
on every scan.

Roadmap: docs/developer/SDK-ROADMAP.md adds a FileDiscoveryScanner
template entry under Known Issues. The engine fallback gives every
linter a working escape hatch today, but the real abstraction would
be a base class that handles workspace walks, file globbing,
container vs local routing, and batched invocation centrally so the
six existing linters stop reimplementing it. Deferred until a
second linter contributor copy-pastes the boilerplate; trigger to
revisit documented inline.

Two new regression tests in ``TestDockerExecutionBackend``:
test_auto_backend_defers_to_scan_when_no_build_args (the
lint-dockerfile fix path) and
test_docker_backend_rejects_scanner_without_build_args (the loud
error when the user opted into container-only). 1519 SDK tests pass.
eFAILution added a commit that referenced this pull request May 6, 2026
* fix(engine): defer to scanner.scan when build_args is missing

Scanners with custom ``scan()`` flows that don't fit the standard
``build_args(ScanPaths) -> list[str]`` contract (linters that walk
the workspace and invoke their tool per file) used to AttributeError
inside ``_run_in_container`` when the engine routed them through the
container path. Combined with PR #117's silent-drop loophole, that
made ``lint-dockerfile`` disappear from canonical results entirely
when hadolint was not installed locally.

Engine change: in ``_run_scanner``, the auto/docker branch now
checks for ``build_args`` or ``container_args`` before entering
``_run_in_container``. When neither is present:

* backend=auto: log a debug message and fall through to the local
  path (which calls ``scanner.scan(path, config)`` directly).
* backend=docker: raise a clear RuntimeError naming the constraint
  ("scanner has container_image but no build_args/container_args
  method") so users know to either implement build_args or relax
  the backend.

HadolintLinter cleanup: collapse the per-Dockerfile subprocess loop
into a single ``hadolint --format json file1 file2 ...`` invocation.
Hadolint accepts multiple paths and emits one combined JSON array
with each finding's source ``file`` field intact, so the parser
still produces correct ``location`` strings without threading the
path back through the caller. Drops one process spawn per Dockerfile
on every scan.

Roadmap: docs/developer/SDK-ROADMAP.md adds a FileDiscoveryScanner
template entry under Known Issues. The engine fallback gives every
linter a working escape hatch today, but the real abstraction would
be a base class that handles workspace walks, file globbing,
container vs local routing, and batched invocation centrally so the
six existing linters stop reimplementing it. Deferred until a
second linter contributor copy-pastes the boilerplate; trigger to
revisit documented inline.

Two new regression tests in ``TestDockerExecutionBackend``:
test_auto_backend_defers_to_scan_when_no_build_args (the
lint-dockerfile fix path) and
test_docker_backend_rejects_scanner_without_build_args (the loud
error when the user opted into container-only). 1519 SDK tests pass.

* feat(linter): make HadolintLinter docker-aware so lint-dockerfile works without local hadolint

The original framing in PR #120 was that ``lint-dockerfile`` requires
the user to install hadolint locally OR wait for the
FileDiscoveryScanner template. The user pushed back — argus has the
official ``hadolint/hadolint:v2.14.0`` Docker image declared on the
linter, so the engine should be using it instead of complaining the
local binary is missing.

Two changes make that work:

1. Engine: in the auto/no-build_args defer path, hand off to
   ``scanner.scan(path, config)`` unconditionally instead of falling
   through to the is_available() gate. Scanners without build_args
   are signaling that they own dispatch internally — including the
   choice between local execution and the docker fallback. The
   is_available() gate was preventing scan() from ever being called
   when the local binary was absent, even though scan() could have
   handled it.

2. HadolintLinter.scan(): when ``self.is_available()`` returns False,
   construct a ``docker run`` command against ``self.container_image``
   instead of trying to invoke ``hadolint`` directly. Workspace
   mounts read-only at /workspace; discovered Dockerfile paths get
   translated to their /workspace/... equivalents. Hadolint accepts
   multiple file paths in one invocation, so the batched-call shape
   from the prior commit carries through cleanly.

   Bug along the way: the hadolint image has empty ENTRYPOINT and
   ``CMD = ["/bin/hadolint", "-"]``. Passing args at the end of
   ``docker run`` replaces CMD entirely, so the first arg becomes
   the command. Include the binary name explicitly as the first arg.

Verified end-to-end against this repo's checkout:

    $ argus scan lint-dockerfile --severity-threshold none
    INFO  DL3018  /workspace/docker/Dockerfile.cli:19 - Pin versions...
    INFO  DL4006  /workspace/docker/Dockerfile.cli:29 - Set the SHELL option...
    ... 11 findings across 3 Dockerfiles
    Status: PASS

Real lint findings flowing through, no local install required.

Doesn't ship a unit test for the docker subprocess path because mocking
``shutil.which("docker")`` plus the ``docker run`` invocation reliably
across pytest runs requires more plumbing than the value justifies for
a 25-line method that's verified end-to-end above. The
test_auto_backend_defers_to_scan_when_no_build_args test from this
PR's prior commit covers the engine handoff.

* feat(linters): bundle Python tools, add docker fallback for terraform, swap jshint for eslint

Three coupled changes that close the "linter requires local install"
gap surfaced during the lint-dockerfile validation.

1. Bundle yamllint and flake8 as core dependencies. They are pure
   Python packages, argus is itself Python, so a clean
   ``pip install argus-security`` now leaves both linters runnable
   with no extra setup. Linting is core scan capability rather than
   an optional extra. Native-binary linters (hadolint, terraform,
   tflint) keep their container-fallback path instead.

2. Add docker fallback to TerraformLinter, mirroring the hadolint
   shape from earlier in this PR. terraform fmt + validate run via
   the official ``hashicorp/terraform:1.9.8`` image; tflint via
   ``ghcr.io/terraform-linters/tflint:v0.55.1`` (its official image).
   Workspace mounts read-write because terraform init writes
   ``.terraform/`` plugin state that validate then reads. Drive-by:
   tool_version() switches to ``parse_tool_version`` for consistency
   with the rest of the SDK.

3. Replace jshint with eslint. jshint is niche, last meaningful
   release was 2022, and there is no trusted official container —
   eslint is the de-facto JavaScript / TypeScript linter, has
   active maintenance, and the ``pipelinecomponents/eslint`` image
   keeps argus out of the business of maintaining a Node container.
   The new EslintLinter handles the no-config case gracefully (info
   row, not a failure) since a project without an eslint.config.js
   isn't an error worth crashing on. Also extends the language list
   to ``javascript, typescript`` and adds ESLint config detection
   to argus init's project-signal walk.

Container manifest grew three entries — ``terraform``, ``tflint``,
``eslint`` — pinned to specific versions so Renovate keeps them
current the same way it tracks the existing tool images. Removed
``argus/linters/jshint.py`` and its registry slot; ``lint-javascript``
now resolves to ``EslintLinter``.

Tests: 1519 passing. End-to-end verified locally — argus scan lint-yaml
runs cleanly via the bundled yamllint package without any "produced no
output" warning.

* fix(containers): pin pipelinecomponents/eslint to :latest, no semver tags exist

CI's manifest check caught that pipelinecomponents/eslint:0.20.0 doesn't
exist — the upstream image tags are commit-SHA based (amd64-6df2a47,
arm64-2b4c228, etc.) plus ``:latest`` and ``:edge``, no semver. Pin
``:latest`` and rely on the renovate.yaml ``pinDigests: true`` rule to
append an immutable ``@sha256:...`` digest on the first Renovate run.
Renovate will then bump the digest on the same 7-day stability lag it
applies to every other tracked image.

Verified locally: ``python -m scripts.ci.check_container_images`` now
reports "All images resolve" with the new entry.

* test(eslint): cover EslintLinter scan() and helpers

Adds 36 unit tests for the new EslintLinter, covering metadata,
config detection across every recognized eslint config filename plus
package.json eslintConfig, local + docker command construction, JSON
message parsing across severity mappings, and the scan() flow under
each branching condition (no config, local eslint, docker fallback,
neither available, JSON parse error, tool-error exit code, clean
empty-output case).

Brings codecov/patch coverage above the threshold for PR #120.

---------

Co-authored-by: eFAILution <eFAILution@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant