Skip to content

Commit b2266bf

Browse files
authored
feat(secrets): credential resolver + zap config-passthrough wiring (#142)
Implements the first slice of ADR-024. Adds a shared credential resolver any scanner can call, two optional engine hooks for passing env vars and bind mounts into containers, and the actual zap config keys that turn `argus.yml` into the source of truth for ZAP tuning. Credential resolver (`argus/core/secrets.py`): - `resolve_secret(config, field)` accepts either `<field>` (literal, back-compat) or `<field>_env` (env-var name reference, preferred). Precedence: stdin_override > _env > literal. Returns None when nothing is configured or the named env var is unset. - `looks_like_literal_secret()` flags vendor-prefixed tokens (gh*, AKIA, AIza, glpat-, sk_live_, npm_, xox*-) at config-load so pasted secrets don't get committed. - `validate_env_var_name()` enforces POSIX shell identifier rules. - Stdlib-only. 42 tests in `argus/tests/core/test_secrets.py`. Schema integration (`argus/core/schema.py`): - `_validate_secret_field()` enforces the either-form contract. Errors on invalid env-var names; warns on vendor-shaped literals and on dual-set fields. - Extends `_SCANNER_KNOWN_KEYS` with the new ZAP keys (api_spec, rules_file, cmd_options, max_duration_minutes, healthcheck_url, app_image_ref, app_ports, auth, *_env credential variants). - `_validate_zap_auth()` walks the nested `scanners.zap.auth.*` block with the same credential rules. 18 tests in `argus/tests/core/test_schema_scanners.py`. Engine hooks (`argus/core/engine.py`): - `container_env(config) -> dict[str, str]` — extra `-e` flags. None values are filtered so unset env-var refs don't leak as `NAME=None`. - `container_mounts(config) -> [(host, container)]` — extra read-only bind mounts. Non-existent host paths are skipped with a WARNING rather than aborting the scan. 5 tests in `argus/tests/test_engine.py`. ZAP scanner (`argus/scanners/zap.py`): - `container_args(config)` resolves the right ZAP script (zap-baseline.py / zap-full-scan.py / zap-api-scan.py) from `scan_type` and `api_spec`, then assembles -t / -J / -c / -n / -T flags and appends `cmd_options` verbatim. - `container_env(config)` exports `ZAP_REGISTRY_USERNAME` / `_PASSWORD` for registry auth and `ZAP_AUTH_USERNAME` / `_PASSWORD` for web-app auth (ZAP context-file placeholders substitute these natively). - `container_mounts(config)` binds `rules_file` and `auth.context_file` into the container at the known `/zap/wrk/` paths ZAP expects. - `scan()` (local-binary path) catches FileNotFoundError when zap-cli is missing rather than letting it escape, and warns when container-only keys are set on the local backend. - 21 tests in `argus/tests/scanners/test_zap.py`. Container scanner (`argus/scanners/container.py`): - `_build_env()` now calls `resolve_secret()` for parity. No surface change — existing literal `registry_username` / `_password` still work; new `*_env` form also works. Docs: - `docs/config-reference.md` — new credential-fields section, ZAP key table, four worked examples (registry auth, baseline, API scan, authenticated scan). - `docs/scanners.md` — new "Configuring ZAP via argus.yml" subsection in the ZAP section. - `argus.example.yml` — commented ZAP tuning + auth block. - `.ai/architecture.yaml` — `core/secrets.py` and engine-hook notes added to the SDK structure (both component copies). - `.ai/errors.yaml` — modernized the "registry auth failed" pattern to recommend `*_env` over literal credentials. Deferred to follow-up PRs (per ADR-024 task list): - Healthcheck poll (needs an engine pre-scan hook) - `--registry-password-stdin` CLI flag - Composite-action trim (.github/actions/scanner-zap surface) - Migration guide Full suite: 3065 passed, 2 skipped. Co-authored-by: eFAILution <eFAILution@users.noreply.github.com>
1 parent cd0b13f commit b2266bf

14 files changed

Lines changed: 1442 additions & 33 deletions

.ai/architecture.yaml

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -33,11 +33,12 @@ components:
3333
"core/models.py": "Severity enum, Finding/ScanResult/ScanSummary dataclasses"
3434
"core/scanner.py": "Scanner Protocol definition"
3535
"core/config.py": "ArgusConfig loading from argus.yml"
36-
"core/engine.py": "ArgusEngine orchestrating scanners and aggregating results (delegates to core/prewarm.py for background image pre-warm + lazy pulls in parallel mode)"
36+
"core/engine.py": "ArgusEngine orchestrating scanners and aggregating results (delegates to core/prewarm.py for background image pre-warm + lazy pulls in parallel mode). Container path honors three optional Scanner hooks: container_args/build_args (CLI argv), container_env(config)->dict (extra -e flags; None values are filtered so unset env-var-name refs don't leak as 'NAME=None'), and container_mounts(config)->[(host,container)] (extra read-only bind mounts; non-existent host paths are skipped with a WARNING rather than aborting the scan)."
3737
"core/prewarm.py": "ImagePrewarmer — best-effort background container image pulls. Dedup'd by image ref, capped concurrency (execution.prewarm_workers, default 4), opt-out via execution.prewarm_images: false. Skipped when pull_policy=never. Pre-warm failure falls back to inline pull in _run_in_container."
3838
"core/exclusions.py": "Path exclusion set (builtins + .gitignore + config + CLI), ``**``-glob matcher"
3939
"core/tool_config.py": "Auto-discovery of per-scanner canonical config files (.bandit, .checkov.yaml, trivy.yaml, osv-scanner.toml, semgrep.yml)"
4040
"core/sbom.py": "SBOM format detection (CycloneDX JSON/XML, SPDX JSON/tag-value, Syft JSON) for ``argus scan --sbom``"
41+
"core/secrets.py": "Credential resolution for any scanner config field. resolve_secret(config, field) accepts either <field> (literal, warned at config-load if vendor-shaped via looks_like_literal_secret) or <field>_env (env-var name reference; reads os.environ at scan time). validate_env_var_name enforces POSIX shell identifier rules. Stdlib-only. Used by scanners.container and scanners.zap; future scanners with credential needs use the same helper. See ADR-024."
4142
"core/findings_view.py": "Shared UI-free logic for findings display — ViewState, SEVERITY_ORDER, finding_detail_rows, compute_summary, diff_scans (scan-over-scan bucketing keyed off (scanner, id, location)). Consumed by argus view terminal (TUI ``D`` keybind, DiffScreen) and argus view browser (web UI ``/diff`` route)."
4243
"viewers/": "`argus view` interfaces (optional extras)"
4344
"viewers/__init__.py": "ViewerUnavailable shared exception"
@@ -465,11 +466,12 @@ docsite:
465466
"core/models.py": "Severity enum, Finding/ScanResult/ScanSummary dataclasses"
466467
"core/scanner.py": "Scanner Protocol definition"
467468
"core/config.py": "ArgusConfig loading from argus.yml"
468-
"core/engine.py": "ArgusEngine orchestrating scanners and aggregating results (delegates to core/prewarm.py for background image pre-warm + lazy pulls in parallel mode)"
469+
"core/engine.py": "ArgusEngine orchestrating scanners and aggregating results (delegates to core/prewarm.py for background image pre-warm + lazy pulls in parallel mode). Container path honors three optional Scanner hooks: container_args/build_args (CLI argv), container_env(config)->dict (extra -e flags; None values are filtered so unset env-var-name refs don't leak as 'NAME=None'), and container_mounts(config)->[(host,container)] (extra read-only bind mounts; non-existent host paths are skipped with a WARNING rather than aborting the scan)."
469470
"core/prewarm.py": "ImagePrewarmer — best-effort background container image pulls. Dedup'd by image ref, capped concurrency (execution.prewarm_workers, default 4), opt-out via execution.prewarm_images: false. Skipped when pull_policy=never. Pre-warm failure falls back to inline pull in _run_in_container."
470471
"core/exclusions.py": "Path exclusion set (builtins + .gitignore + config + CLI), ``**``-glob matcher"
471472
"core/tool_config.py": "Auto-discovery of per-scanner canonical config files (.bandit, .checkov.yaml, trivy.yaml, osv-scanner.toml, semgrep.yml)"
472473
"core/sbom.py": "SBOM format detection (CycloneDX JSON/XML, SPDX JSON/tag-value, Syft JSON) for ``argus scan --sbom``"
474+
"core/secrets.py": "Credential resolution for any scanner config field. resolve_secret(config, field) accepts either <field> (literal, warned at config-load if vendor-shaped via looks_like_literal_secret) or <field>_env (env-var name reference; reads os.environ at scan time). validate_env_var_name enforces POSIX shell identifier rules. Stdlib-only. Used by scanners.container and scanners.zap; future scanners with credential needs use the same helper. See ADR-024."
473475
"core/findings_view.py": "Shared UI-free logic for findings display — ViewState, SEVERITY_ORDER, finding_detail_rows, compute_summary, diff_scans (scan-over-scan bucketing keyed off (scanner, id, location)). Consumed by argus view terminal (TUI ``D`` keybind, DiffScreen) and argus view browser (web UI ``/diff`` route)."
474476
"viewers/": "`argus view` interfaces (optional extras)"
475477
"viewers/__init__.py": "ViewerUnavailable shared exception"

.ai/errors.yaml

Lines changed: 29 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -146,13 +146,36 @@ error_patterns:
146146

147147
solution:
148148
steps:
149-
- "Add registry credentials to GitHub Secrets"
150-
- "Pass credentials to workflow"
149+
- "Store credentials as environment variables (CI runner env, .envrc, etc.) — never as YAML literals."
150+
- "Reference the env-var *name* from argus.yml using the <field>_env shape; argus.core.secrets.resolve_secret reads os.environ at scan time."
151+
- "For composite-action consumers on GitHub Actions, the runner-side env can still be wired from `${{ secrets.X }}`."
151152
code: |
152-
with:
153-
registry_username: ${{ secrets.REGISTRY_USER }}
154-
registry_password: ${{ secrets.REGISTRY_TOKEN }}
155-
warning: "Never put credentials in config file"
153+
# Preferred — argus.yml
154+
scanners:
155+
container:
156+
registry_username_env: REGISTRY_USER # name of env var
157+
registry_password_env: REGISTRY_TOKEN # name of env var
158+
ci_example: |
159+
# GitHub Actions: export the secret into the runner env
160+
- name: Run argus scan
161+
env:
162+
REGISTRY_USER: ${{ secrets.REGISTRY_USER }}
163+
REGISTRY_TOKEN: ${{ secrets.REGISTRY_TOKEN }}
164+
run: argus scan --config argus.yml
165+
legacy_supported: |
166+
# Still works (warned at config-load if the value matches
167+
# a known vendor secret prefix):
168+
scanners:
169+
container:
170+
registry_username: "user"
171+
registry_password: "literal"
172+
warning: |
173+
Never commit literal credentials to argus.yml. The
174+
<field>_env shape keeps secret values out of VCS entirely;
175+
argus.core.secrets emits a warning if a literal value
176+
matches a known vendor-secret prefix (gh*, AKIA, AIza,
177+
glpat-, etc.) at config-load time so the leak is caught
178+
before the scan runs.
156179
157180
# ==============================================================================
158181
# PR COMMENT ERRORS

argus.example.yml

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -39,10 +39,30 @@ scanners:
3939
# enabled: false
4040
# image_ref: "myapp:latest"
4141
# scanners: "trivy,grype,syft"
42+
# # Private-registry auth: name an env var, never paste a literal.
43+
# # The runner / shell exports the value; argus reads it at scan time.
44+
# registry_username_env: REGISTRY_USER
45+
# registry_password_env: REGISTRY_TOKEN
4246

4347
# zap:
4448
# enabled: false
4549
# target_url: "http://localhost:3000"
50+
# #
51+
# # Optional tuning (decided in ADR-024; container backend only):
52+
# #
53+
# # scan_type: baseline # baseline | full | api
54+
# # api_spec: "http://localhost:3000/openapi.json" # auto-switches to api scan
55+
# # rules_file: ".zap/rules.tsv" # ZAP ignore rules; mounted into container
56+
# # max_duration_minutes: 30 # hard cap on scan time
57+
# # cmd_options: # appended verbatim after built-in flags
58+
# # - "-z"
59+
# # - "-config view.locale=en_GB"
60+
# #
61+
# # Authenticated scan via ZAP context file (user-authored):
62+
# # auth:
63+
# # context_file: ".zap/context.xml"
64+
# # username_env: ZAP_APP_USER # env-var *name*
65+
# # password_env: ZAP_APP_PASSWORD
4666

4767
reporting:
4868
# ``argus-results.json`` is the canonical scan artifact and is

argus/core/engine.py

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -832,6 +832,48 @@ def _run_in_container(
832832
docker_cmd.extend(["--entrypoint", entrypoint])
833833
logger.debug("Overriding entrypoint: %s", entrypoint)
834834

835+
# Optional per-scanner env vars — e.g. credentials resolved
836+
# via ``argus.core.secrets.resolve_secret``. Scanners that
837+
# need to pass authentication or runtime parameters to the
838+
# tool implement ``container_env(config) -> dict[str, str]``;
839+
# the engine forwards each entry as ``-e NAME=VALUE``.
840+
# Values are passed by content, not by name-passthrough,
841+
# so unset host env vars don't accidentally inherit.
842+
if hasattr(scanner, "container_env"):
843+
extra_env = scanner.container_env(config or {}) or {}
844+
for env_name, env_value in extra_env.items():
845+
if env_value is None:
846+
continue
847+
docker_cmd.extend(["-e", f"{env_name}={env_value}"])
848+
if extra_env:
849+
logger.debug(
850+
"Scanner '%s' injected %d env var(s) into container",
851+
scanner.name, len([v for v in extra_env.values() if v is not None]),
852+
)
853+
854+
# Optional per-scanner read-only bind mounts — e.g. ZAP
855+
# context files or ignore-rules files. Scanners return a
856+
# list of ``(host_path, container_path)`` tuples from
857+
# ``container_mounts(config)``. Host paths are resolved to
858+
# absolute paths; entries pointing at non-existent files
859+
# are skipped with a warning so a typo'd config doesn't
860+
# take down the whole scan.
861+
if hasattr(scanner, "container_mounts"):
862+
for mount in scanner.container_mounts(config or {}) or []:
863+
host_path, container_path = mount
864+
abs_host = str(Path(host_path).resolve())
865+
if not Path(abs_host).exists():
866+
logger.warning(
867+
"Scanner '%s' requested mount of '%s' but the "
868+
"path does not exist — skipping",
869+
scanner.name, host_path,
870+
)
871+
continue
872+
docker_cmd.extend(["-v", f"{abs_host}:{container_path}:ro"])
873+
logger.debug(
874+
"Scanner mount: %s → %s (ro)", abs_host, container_path,
875+
)
876+
835877
# Prefer the unified ``build_args(ScanPaths)`` shape (single
836878
# source of truth for both local and container CLI args).
837879
# Fall back to legacy ``container_args(config)`` for scanners

argus/core/schema.py

Lines changed: 143 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,11 @@
1010
import logging
1111
from typing import Any
1212

13+
from argus.core.secrets import (
14+
looks_like_literal_secret,
15+
validate_env_var_name,
16+
)
17+
1318
logger = logging.getLogger("argus")
1419

1520
# ── Schema definition ────────────────────────────────────────────────
@@ -30,7 +35,30 @@
3035
# Scanner-specific keys that are valid in extra
3136
"image_ref", "target_url", "scanners", "scan_type",
3237
"framework", "check", "skip_check", "config",
38+
# Credential fields (either form: literal or <field>_env)
3339
"registry_username", "registry_password",
40+
"registry_username_env", "registry_password_env",
41+
# ZAP-specific tuning (decided in ADR-024)
42+
"api_spec", "rules_file", "cmd_options",
43+
"max_duration_minutes", "healthcheck_url",
44+
"app_image_ref", "app_ports",
45+
"auth", # nested block; sub-keys validated separately
46+
}
47+
48+
# ZAP web-app auth sub-block keys (under scanners.zap.auth.*)
49+
_ZAP_AUTH_KEYS = {
50+
"context_file",
51+
"username", "username_env",
52+
"password", "password_env",
53+
}
54+
55+
# Credential fields per scanner — drives validate_secret_field rules.
56+
# Each entry is (scanner_name, field_path).
57+
_CREDENTIAL_FIELDS: dict[str, tuple[str, ...]] = {
58+
"container": ("registry_username", "registry_password"),
59+
"zap": ("registry_username", "registry_password"),
60+
# zap.auth.username / zap.auth.password handled separately
61+
# because they live in a nested block.
3462
}
3563

3664
# Known reporting keys
@@ -158,7 +186,36 @@ def _validate_scanner(path: str, data: Any) -> list[ConfigError]:
158186
if "exclude" in data and not isinstance(data["exclude"], str):
159187
errors.append(ConfigError(f"{path}.exclude", "Must be a comma-separated string"))
160188

161-
# Warn on unknown keys
189+
# Credential fields — validate either-form contract (literal or *_env).
190+
# Scanner name is the last path segment ("scanners.<name>").
191+
scanner_name = path.rsplit(".", 1)[-1]
192+
for cred_field in _CREDENTIAL_FIELDS.get(scanner_name, ()):
193+
errors.extend(_validate_secret_field(data, cred_field, path))
194+
195+
# ZAP web-app auth sub-block (nested under scanners.zap.auth.*)
196+
if scanner_name == "zap" and "auth" in data:
197+
errors.extend(_validate_zap_auth(f"{path}.auth", data["auth"]))
198+
199+
# ZAP cmd_options must be a list of strings
200+
if scanner_name == "zap" and "cmd_options" in data:
201+
opts = data["cmd_options"]
202+
if not isinstance(opts, list) or not all(isinstance(o, str) for o in opts):
203+
errors.append(ConfigError(
204+
f"{path}.cmd_options",
205+
"Must be a list of strings (passed verbatim to the ZAP CLI)",
206+
))
207+
208+
# ZAP max_duration_minutes must be a positive int
209+
if scanner_name == "zap" and "max_duration_minutes" in data:
210+
v = data["max_duration_minutes"]
211+
if not isinstance(v, int) or isinstance(v, bool) or v <= 0:
212+
errors.append(ConfigError(
213+
f"{path}.max_duration_minutes",
214+
f"Must be a positive integer, got {v!r}",
215+
))
216+
217+
# Warn on unknown keys (after credential / nested-block handling so
218+
# we don't double-warn on the keys we already validated).
162219
for key in data:
163220
if key not in _SCANNER_KNOWN_KEYS:
164221
errors.append(ConfigError(
@@ -170,6 +227,91 @@ def _validate_scanner(path: str, data: Any) -> list[ConfigError]:
170227
return errors
171228

172229

230+
def _validate_secret_field(
231+
data: dict, field: str, path: str,
232+
) -> list[ConfigError]:
233+
"""Validate a credential field that follows the <field>/<field>_env contract.
234+
235+
- ``<field>_env`` must be a valid POSIX shell identifier.
236+
- ``<field>`` literal is allowed but warned if it looks like a
237+
known vendor secret (gh*, AKIA, AIza, etc.).
238+
- Both set is a warning (the resolver uses _env).
239+
"""
240+
errors: list[ConfigError] = []
241+
env_field = f"{field}_env"
242+
243+
if env_field in data:
244+
name = data[env_field]
245+
if not isinstance(name, str):
246+
errors.append(ConfigError(
247+
f"{path}.{env_field}",
248+
f"Must be a string environment variable name, "
249+
f"got {type(name).__name__}",
250+
))
251+
elif not validate_env_var_name(name):
252+
errors.append(ConfigError(
253+
f"{path}.{env_field}",
254+
f"'{name}' is not a valid environment variable name "
255+
f"(must match [A-Za-z_][A-Za-z0-9_]*)",
256+
))
257+
258+
if field in data:
259+
value = data[field]
260+
if isinstance(value, str) and looks_like_literal_secret(value):
261+
errors.append(ConfigError(
262+
f"{path}.{field}",
263+
f"Looks like a literal vendor secret. Prefer "
264+
f"'{env_field}: <ENV_VAR_NAME>' to keep credentials "
265+
f"out of argus.yml.",
266+
level="warning",
267+
))
268+
269+
if field in data and env_field in data:
270+
errors.append(ConfigError(
271+
f"{path}.{field}",
272+
f"Both '{field}' and '{env_field}' are set — only one "
273+
f"should be used. '{env_field}' takes precedence at resolution.",
274+
level="warning",
275+
))
276+
277+
return errors
278+
279+
280+
def _validate_zap_auth(path: str, data: Any) -> list[ConfigError]:
281+
"""Validate the ``scanners.zap.auth`` sub-block.
282+
283+
Holds the ZAP context-file path plus credential references for
284+
web-app authentication. Credentials follow the same <field> /
285+
<field>_env contract as registry auth.
286+
"""
287+
errors: list[ConfigError] = []
288+
289+
if not isinstance(data, dict):
290+
errors.append(ConfigError(
291+
path, f"Must be a mapping, got {type(data).__name__}",
292+
))
293+
return errors
294+
295+
for key in data:
296+
if key not in _ZAP_AUTH_KEYS:
297+
errors.append(ConfigError(
298+
f"{path}.{key}",
299+
f"Unknown auth key '{key}'. "
300+
f"Valid keys: {', '.join(sorted(_ZAP_AUTH_KEYS))}",
301+
level="warning",
302+
))
303+
304+
if "context_file" in data and not isinstance(data["context_file"], str):
305+
errors.append(ConfigError(
306+
f"{path}.context_file", "Must be a string path",
307+
))
308+
309+
errors.extend(_validate_secret_field(data, "username", path))
310+
errors.extend(_validate_secret_field(data, "password", path))
311+
312+
return errors
313+
314+
173315
def _validate_reporting(path: str, data: Any) -> list[ConfigError]:
174316
"""Validate the reporting config block."""
175317
errors: list[ConfigError] = []

0 commit comments

Comments
 (0)