test(dream-cli): BATS regression shield for 5 dream-cli / supporting behaviors#1018
test(dream-cli): BATS regression shield for 5 dream-cli / supporting behaviors#1018yasinBursali wants to merge 12 commits intoLight-Heart-Labs:mainfrom
Conversation
|
Audit follow-up: keep draft / fix the pipefail fallback before merge. This adds useful BATS coverage, but the branch still needs the #1008-style tolerant guard: with |
Maintainer audit on PR Light-Heart-Labs#998 (Lightheartdevs, 2026-04-28) flagged that under `set -euo pipefail`, `_check_version_compat` must tolerate a `.env` lacking `DREAM_VERSION` and fall back to `.version` / `manifest.json`. The audit-required form is the trailing `|| true` on the DREAM_VERSION grep pipeline: _COMPAT_INSTALLED_VER=$(grep '^DREAM_VERSION=' "$INSTALL_DIR/.env" 2>/dev/null \ | sed -n '1p' | cut -d= -f2 | tr -d '[:space:]' || true) The rebase onto current main (which carries Light-Heart-Labs#1008's tolerance fix) preserved this form via conflict resolution; this test locks it in. `tests/test-dream-cli-version-compat-pipefail.sh` (4 cases, wired into `make test`): 1. dream-cli runs under strict mode (set -e or stricter). 2. The DREAM_VERSION grep pipeline ends with `|| true` (matched against active code; comment lines stripped before grep so the rationale block can't satisfy the assertion on its own). 3. `.version` and `manifest.json` fallback branches still exist (they are the destinations the `|| true` enables reaching). 4. Anti-regression: the bare-close form (no trailing `|| true`) does NOT appear — catches a future "cleanup" PR removing the guard. Behavioural test was attempted but rejected: `_check_version_compat` depends on `warn`/`log`/`_manifest_field`/`_semver_*` and several file-scope color vars, so isolating it requires too much harness setup. The dream-cli BATS suite (Light-Heart-Labs#1018) carries the behavioural coverage at the right layer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Promote shell option from set -e to set -eo pipefail so failures in
upstream pipeline commands propagate and surface the real exit code
(not the status of the last stage). Aligns dream-cli with the project
standard (CLAUDE.md: 'set -euo pipefail everywhere').
Scope is deliberately narrow — nounset (-u) is intentionally not added
here because it requires a full bare-variable audit.
Side-effect audit:
- Two '| head -1' sites converted to '| sed -n 1p' (SIGPIPE-safe under
pipefail; upstream producers no longer die when head closes the pipe):
* _check_version_compat: grep DREAM_VERSION .env
* preset import: tar tzf archive listing
- '|| true' sites reviewed individually; all are intentional tolerance
of known-benign failures (missing optional env keys, idempotent stops,
arithmetic increment quirks) and were left in place.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…mark' Before: 'dream chat' against a dead backend printed '[dream] Sending to ...', emitted an empty line, and exited 0. 'dream benchmark' propagated that silent failure and happily reported sub-second 'Excellent' performance while the LLM was not even running. Root cause: curl was invoked with '-s' (silent, no error-show, no fail-on-HTTP-error) and the response was parsed with 'jq -r .choices[0].message.content // .error.message // "Error: no response"', which masks every failure path behind a human-friendly string on stdout. Fix: - Add a /v1/models pre-flight probe with --fail-with-body and --max-time 3, so an unreachable llama-server fails loudly before the user sees 'Sending to ...'. - Use --show-error --fail-with-body --max-time 30 on the completion call; non-2xx HTTP responses now fail the command and the body is shown. - Parse responses with 'jq -er' so missing '.choices[0].message.content' exits non-zero; on failure we re-parse '.error.message' and surface it through 'error' (exit 1). - cmd_benchmark: capture cmd_chat output via 'if ! response=...' so a failed chat aborts the benchmark instead of reporting a false-positive duration. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ssign
Three commands previously printed warnings or pseudo-success messages
while exiting 0, breaking scripted callers that rely on exit codes to
detect failure.
- cmd_disable: 'sr_resolve' returns its input unchanged for unknown
names, so 'dream disable bogus-service' used to silently stop a
non-existent container ('|| true') and print a success-shaped
message. Mirror the validation pattern already in cmd_enable: if
neither the built-in nor user-extension directory exists for the
resolved id, 'error "Unknown service: <input>"' (exit 1).
- cmd_agent logs: on a fresh install with no host-agent log file, the
logs subcommand warned and returned 0. Return 1 so 'dream agent
logs' fails fast when there is nothing to tail.
- _gpu_reassign: the NVIDIA-only guard and three prerequisite guards
(jq missing, topo lib missing, assign script missing) all exited 0
after warning, so automation could not tell whether reassignment
succeeded or was skipped. Each 'warn ...; return' becomes 'warn ...;
return 1'.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous implementation used a narrow regex anchored to `=`:
grep -qiE "(secret|pass|token|key)="
which required the keyword to appear immediately before `=`. It caught
`_SECRET=`, `_KEY=`, `_TOKEN=`, but missed `_PASSWORD=` (because `pass`
is followed by `word`, not `=`), `_SALT=` (not in the alternation), and
failed-open for any future secret name not ending in a matching suffix.
Real env keys that leaked in plaintext from `dream config show`:
- LANGFUSE_DB_PASSWORD
- LANGFUSE_SALT
- LANGFUSE_CLICKHOUSE_PASSWORD
- LANGFUSE_REDIS_PASSWORD
- LANGFUSE_INIT_USER_PASSWORD
- OPENCODE_SERVER_PASSWORD
Fix: drive masking from `.env.schema.json` — the canonical source of
truth that already marks the above keys `"secret": true`. Keys listed
there are masked; everything else is shown raw. When the schema or jq
is unavailable (older installs, minimal environments), fall back to a
case-insensitive substring match on secret/password/pass/token/key/salt
so the default behaviour errs on the side of masking rather than
leaking.
The schema already marks every known-leaking key above as
`"secret": true`; no schema change is required. A small number of
operational keys (TOKEN_SPY_PORT, TOKEN_SPY_URL, TARGET_API_KEY,
ANTHROPIC_API_KEY, OPENAI_API_KEY, TOGETHER_API_KEY, LIVEKIT_API_KEY,
LANGFUSE_PROJECT_PUBLIC_KEY) are not marked secret in the schema —
most are legitimately public (ports, URLs, publishable keys) or are
user-supplied. Left for a follow-up review; the regex fallback covers
them defensively if the schema ever goes missing.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`dream config validate` crashed on every fresh macOS install. The cause was a shebang mismatch: `scripts/validate-env.sh` is `#!/bin/bash`, and macOS ships /bin/bash 3.2 (licensing-frozen) which has no `declare -A`. The script uses associative arrays at lines 75, 76, 77, and 176, so it aborted immediately with `declare: -A: invalid option` before doing any work. Two-part fix: - dream-cli invokes `validate-env.sh` through "$BASH" (the currently- running shell, guaranteed Bash 4+ by the version check at the top of dream-cli itself). This works regardless of the target script's shebang. Linux and WSL2 are unaffected (/bin/bash is already 4+ there, so the invocation is a no-op change). - `validate-env.sh` now starts with a Bash 4+ guard that prints a helpful "install a modern Bash" message and exits 1 before any `declare -A` is reached. This protects users running the script standalone (e.g. from the installer or CI), not just the dream-cli path. Wording matches the existing guard at the top of dream-cli. `scripts/validate-manifests.sh` does not use associative arrays, so it is invoked unchanged. Other `declare -A` users audited but not touched in this PR (fix is the same pattern — either call through "$BASH" or add a top-of-script Bash 4+ guard, depending on how each is invoked): scripts/pre-download.sh scripts/dream-test-functional.sh lib/service-registry.sh lib/progress.sh installers/phases/03-features.sh Follow-up audit can cover each of these in scope. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…asking Two small follow-up fixes flagged by the tester for the config subcommand hardening: 1. tests/test-validate-env.sh invoked scripts/validate-env.sh by raw path, so shebang resolution picked /bin/bash = Bash 3.2 on macOS. The new Bash-4+ guard in validate-env.sh correctly blocks this invocation, which regressed the previously-passing macOS test cases (missing-.env and missing-schema, both expected exit 3 via early file-check) along with the Bash-4-dependent ones. Invoke the script through "$BASH" in the test harness, mirroring the dream-cli invocation path. One helper variable (VALIDATE_ENV_BASH), five invocation sites updated. Linux CI was unaffected; this fixes the macOS-only harness break. 2. Added "*bearer*" to the fallback substring match in cmd_config show's _cmd_config_is_secret, so FOO_BEARER=... does not leak when the schema is unavailable. Schema-driven path already covers the real DreamServer secrets; this closes a theoretical gap in the defensive fallback. Did not add *auth* (would false-positive on AUTHOR_NAME- style keys) or other substrings — bearer is the only real gap the tester surfaced. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The schema-authoritative check in the previous commit leaks keys that are present in .env.schema.json with secret: false, which today miscovers five real upstream-provider credentials (TARGET_API_KEY, ANTHROPIC_API_KEY, OPENAI_API_KEY, TOGETHER_API_KEY, LIVEKIT_API_KEY). A malformed schema (valid JSON but empty jq output) has the same silent-leak failure mode. After a schema-miss under _schema_loaded=1, fall through to the keyword substring match instead of returning "not secret" immediately. Schema still defines what IS definitely a secret; the keyword pass adds defense in depth for schema gaps. Over-masking of LANGFUSE_PROJECT_PUBLIC_KEY, TOKEN_SPY_PORT, and TOKEN_SPY_URL is acceptable — 'show' should default to over-masking, and raw values remain available via cat .env. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…date The four commands that shell out to `docker compose` (restart, start, stop, update) previously let the raw per-container-state-transition output pass through with no summary and no visible error banner. A failure like "dream-llama-ready is unhealthy" was one line buried in 40+ lines of state transitions. Introduce `_compose_run_with_summary` that runs compose with `--progress quiet`, captures stdout+stderr to a mktemp log, and either prints a "<verb> — done" success line or a red "<verb> failed:" banner that surfaces up to 20 lines matching error|unhealthy|failed|dependency and points the operator at the preserved full log. Wrap all four callers. For `cmd_update`, keep the existing `if ! …; error` guard on both the pull and up-recreate calls so the actionable rollback hint is preserved on top of the surfaced error context. Use `cmd || rc=\$?` to capture the compose exit code (safe under `set -e` and not reliant on `\$?` after `if…fi`, which is spec'd to 0 on the not-taken branch). The grep | sed | head pipeline is safe under `set -e` without pipefail — this branch does not have pipefail, so no SIGPIPE / no-match exit concerns. Platform impact: - macOS / Linux / Windows (WSL2): identical — no platform branching. `docker compose --progress quiet`, `grep -iE`, `sed`, `head`, `mktemp` are all portable.
…iables
Adds nounset (-u) to the shell options on top of the existing pipefail
foundation, catching typos in variable names at runtime instead of
silently expanding to empty strings. The audit added ${VAR:-} guards
to 6 variables that are legitimately conditional (missing positional
args for preset diff, sparse associative-array lookups when comparing
preset ext/env maps, and a registry lookup for service dirs without
valid manifests). All other variables in the file are unconditionally
set before use; no guard added.
Audited by running every read-only subcommand end-to-end in a fake
INSTALL_DIR and against a live install: help, version, list, status,
status-json, config show, gpu status/topology/validate/assignment,
model current/list, mode, stt current/status, preset list/save/load/
delete/export/import/diff, agent status, doctor --json, template
list/preview, audit, chat (against down backend), benchmark,
enable/disable/purge (against both valid-service and unknown paths).
Also verified with minimal environment (env -i PATH HOME BASH
INSTALL_DIR) to catch implicit TERM/USER/LANG assumptions.
Specific bugs caught during audit (each reproducible on main tip +
pipefail foundation):
- preset diff: bare \$2/\$3 crashed immediately with no args.
- preset diff: ext1/env1 associative-array reads for keys only present
in the other preset crashed partway through comparison.
- enable <svc>: bare SERVICE_DEPENDS[\$service_id] crashed for service
directories that exist on disk but have no valid manifest (edge case
user-extensions with malformed manifest).
Platform impact:
- macOS / Linux / Windows (WSL2): identical. Portable Bash 4+ parameter
expansion. No platform branching. No variables that are set on one
OS but not another were found during the audit — all guarded reads
guard against missing registry entries or unprovided positional args,
neither of which is platform-specific.
Scope note: this PR depends on the pipefail addition from
fix/dream-cli-pipefail-exit-codes and is intended to land as a
follow-up after that merges upstream.
…+ preset diff Pins _cmd_config_is_secret and _cmd_config_load_secret_schema behavior from both dream config show (#392) and dream preset diff (#431). Covers schema-driven masking, keyword fallback, and schema absence/malformation edge cases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…guards test-compose-summary-wrapper.bats pins _compose_run_with_summary success/error paths + zero-grep-match fallback + exit-code propagation (#406). test-dream-cli-flags.bats pins dream-cli shell-flag hygiene: set -euo pipefail at line 6, nounset safety under minimal env, static assertions on conditional-var syntax (#410). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…r_resolve) test-functional-resilience.bats pins dream-test-functional.sh set -e resilience: summary emission even under all-fail, sentinel delivery, bounded set +e/-e around test functions (#428). test-sr-resolve.bats pins sr_resolve 8-case matrix including the dream- prefix strip added post-PR-10 (#430). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
14c3437 to
eb94419
Compare
|
Pushed audit follow-up — rebased onto current Audit resolution: The audit said "preserve the #1008-style tolerant guard: with Post-rebase commit list (12 commits, was 18 — 5 staging-merge commits collapsed, 1 was unrelated):
The BATS test commits ( Audit's "rerun the dream-cli BATS tests": 53/53 PASS.
Sibling PR overlap: this branch's first 6-8 commits duplicate content from #998, #994, #1002, #1016 (all currently open). When those PRs merge to main, this PR's rebase collapses further (the duplicated commits become no-ops). When ALL of #998/#994/#1002/#1016 are on main, this PR reduces to exactly 3 BATS-only commits + 5 new files. Suggested merge order to minimize rebase churn:
— or any subset thereof; rebase is automatic each time. This PR effectively serves as a regression shield — locking in the behavioral expectations of #994/#998/#1002/#1016 so a future PR that touches dream-cli can't silently break the secret-masking rules, the compose-wrapper banner contract, the pipefail tolerance, or the |
Each test file pins behavior introduced by a different unmerged PR. Merge order below:
test-config-masking.bats_cmd_config_is_secret+ schema cachetest-compose-summary-wrapper.bats_compose_run_with_summary(was #1001 — closed/superseded)test-dream-cli-flags.batsset -euo pipefail+ nounset guardstest-functional-resilience.batsdream-test-functional.shset-e resiliencetest-sr-resolve.batssr_resolveprefix-strip (was #1001 — closed/superseded)Once all 5 merge (#994, #998, #1002, #1003, #1016), rebase drops the merge commits and the PR diff becomes exactly 5 new
.batsfiles (+890 LoC total, zero existing-file edits).Note (apr-25): the original #1001 was closed in favor of its superset #1016; the test files in this PR target the same
_compose_run_with_summaryandsr_resolvebehavior, which now ships in #1016. No code change needed in this PR — only the rebase target shifts to post-#1016 main.What
5 new BATS test files in
dream-server/tests/bats-tests/locking in the post-PR-12 dream-cli refactors:_cmd_config_is_secretbehavior from bothdream config showanddream preset diffpaths. Covers schema-driven masking, 7-keyword fallback (secret/password/pass/token/key/salt/bearer), case-insensitive, missing+malformed schema, zero-secret schema.dockerstub with 3 modes (success / fail-keyword / fail-nomatch). Covers wrapper success/failure banners, log removal/preservation, keyword matching, zero-grep-match warn fallback underset -euo pipefail, argv passthrough, exit-code propagation.set -euo pipefailat L6, no weakerset -e[^u], no bare| head -1,sed -n '1p'idiom presence,${FOO:-default}conditional-var count,env -inounset safety.test_*_functional()functions via awk injection. Covers summary emission under all-fail, first-fail-at-counter-0 not aborting (the fix(installer): correct dependency validation path during installation #428 core bug), all-pass exit 0, mixed pass/fail runs all four, static asserts on counter arithmetic and boundedset +e/-e.declare -Ain service-registry.sh creates function-local arrays when sourced from bats setup()). Covers 8-case matrix (exact ID, alias, dream-prefix strip, unknown-dream passthrough, unknown non-dream, empty, dream-convention-false container) + 2 correctness guards.Testing
bats tests/bats-tests/test-*.bats— 53/53 pass locally on macOS (homebrew bash 5.3)bash -n+shellcheckon all 5 files cleanPlatform Impact
test-linux.ymlrunsrun-bats.shwhich auto-discoverstests/bats-tests/*.bats. All 53 new tests will run.📋 Chain status (operator's reminder — apr-25)
This PR is a regression-shield test pack that depends on 5 base PRs landing first:
Before this PR can be promoted:
All 5 of the above must land. The order in which they land doesn't matter for this PR — each test file independently pins behavior from one of them. Once all 5 are in main, the rebase strips the staging-base merge commits and the diff collapses to 5 new
.batsfiles only (no edits to existing files).After all 5 deps merge, the operator must:
gh pr ready 1018).Kept DRAFT until then as a mechanical safeguard.