Skip to content

test(dream-cli): BATS regression shield for 5 dream-cli / supporting behaviors#1018

Draft
yasinBursali wants to merge 12 commits intoLight-Heart-Labs:mainfrom
yasinBursali:test/dream-cli-bats-coverage
Draft

test(dream-cli): BATS regression shield for 5 dream-cli / supporting behaviors#1018
yasinBursali wants to merge 12 commits intoLight-Heart-Labs:mainfrom
yasinBursali:test/dream-cli-bats-coverage

Conversation

@yasinBursali
Copy link
Copy Markdown
Contributor

@yasinBursali yasinBursali commented Apr 23, 2026

⚠️ Draft — depends on 5 upstream PRs merging first

Each test file pins behavior introduced by a different unmerged PR. Merge order below:

Test file Pins behavior from
test-config-masking.bats #994_cmd_config_is_secret + schema cache
test-compose-summary-wrapper.bats #1016_compose_run_with_summary (was #1001 — closed/superseded)
test-dream-cli-flags.bats #998 + #1002set -euo pipefail + nounset guards
test-functional-resilience.bats #1003dream-test-functional.sh set-e resilience
test-sr-resolve.bats #1016sr_resolve prefix-strip (was #1001 — closed/superseded)

Once all 5 merge (#994, #998, #1002, #1003, #1016), rebase drops the merge commits and the PR diff becomes exactly 5 new .bats files (+890 LoC total, zero existing-file edits).

Note (apr-25): the original #1001 was closed in favor of its superset #1016; the test files in this PR target the same _compose_run_with_summary and sr_resolve behavior, which now ships in #1016. No code change needed in this PR — only the rebase target shifts to post-#1016 main.

What

5 new BATS test files in dream-server/tests/bats-tests/ locking in the post-PR-12 dream-cli refactors:

  • test-config-masking.bats (17 cases) — _cmd_config_is_secret behavior from both dream config show and dream preset diff paths. Covers schema-driven masking, 7-keyword fallback (secret/password/pass/token/key/salt/bearer), case-insensitive, missing+malformed schema, zero-secret schema.
  • test-compose-summary-wrapper.bats (11 cases) — PATH-injected docker stub with 3 modes (success / fail-keyword / fail-nomatch). Covers wrapper success/failure banners, log removal/preservation, keyword matching, zero-grep-match warn fallback under set -euo pipefail, argv passthrough, exit-code propagation.
  • test-dream-cli-flags.bats (8 cases) — pins set -euo pipefail at L6, no weaker set -e[^u], no bare | head -1, sed -n '1p' idiom presence, ${FOO:-default} conditional-var count, env -i nounset safety.
  • test-functional-resilience.bats (7 cases) — stubs test_*_functional() functions via awk injection. Covers summary emission under all-fail, first-fail-at-counter-0 not aborting (the fix(installer): correct dependency validation path during installation #428 core bug), all-pass exit 0, mixed pass/fail runs all four, static asserts on counter arithmetic and bounded set +e/-e.
  • test-sr-resolve.bats (10 cases) — subshell-isolated runs (critical: declare -A in service-registry.sh creates function-local arrays when sourced from bats setup()). Covers 8-case matrix (exact ID, alias, dream-prefix strip, unknown-dream passthrough, unknown non-dream, empty, dream-convention-false container) + 2 correctness guards.

Testing

  • bats tests/bats-tests/test-*.bats53/53 pass locally on macOS (homebrew bash 5.3)
  • bash -n + shellcheck on all 5 files clean
  • Full staging suite: 290 pass, 6 pre-existing failures in unrelated test files (docker-phase, path-utils, preflight-phase, requirements-phase) — zero new failures from my files
  • Pre-commit hooks clean

Platform Impact

  • Linux CI (primary): test-linux.yml runs run-bats.sh which auto-discovers tests/bats-tests/*.bats. All 53 new tests will run.
  • macOS dev: passes with homebrew bash.
  • Windows: no BATS runner exists; tests are not executed there.

📋 Chain status (operator's reminder — apr-25)

This PR is a regression-shield test pack that depends on 5 base PRs landing first:

#994  (READY)
#998  (DRAFT, blocked by #1008) ──┐
#1002 (DRAFT, blocked by #998)    ├──▶ #1018 (this — DRAFT)
#1003 (READY)                     │
#1016 (DRAFT, blocked by #999) ───┘

Before this PR can be promoted:
All 5 of the above must land. The order in which they land doesn't matter for this PR — each test file independently pins behavior from one of them. Once all 5 are in main, the rebase strips the staging-base merge commits and the diff collapses to 5 new .bats files only (no edits to existing files).

After all 5 deps merge, the operator must:

  1. Rebase this branch onto post-{1008,994,998,1002,1003,999,1016} main. All "merge: ... as staging base" commits will drop out automatically (their content is in main).
  2. Promote from DRAFT → ready (gh pr ready 1018).

Kept DRAFT until then as a mechanical safeguard.

@Lightheartdevs
Copy link
Copy Markdown
Collaborator

Audit follow-up: keep draft / fix the pipefail fallback before merge.

This adds useful BATS coverage, but the branch still needs the #1008-style tolerant guard: with set -euo pipefail, _check_version_compat must not exit just because .env lacks DREAM_VERSION. Please rebase on current main, preserve that fallback behavior, and rerun the dream-cli BATS tests.

yasinBursali added a commit to yasinBursali/DreamServer that referenced this pull request Apr 29, 2026
Maintainer audit on PR Light-Heart-Labs#998 (Lightheartdevs, 2026-04-28) flagged that
under `set -euo pipefail`, `_check_version_compat` must tolerate a
`.env` lacking `DREAM_VERSION` and fall back to `.version` /
`manifest.json`. The audit-required form is the trailing `|| true`
on the DREAM_VERSION grep pipeline:

    _COMPAT_INSTALLED_VER=$(grep '^DREAM_VERSION=' "$INSTALL_DIR/.env" 2>/dev/null \
        | sed -n '1p' | cut -d= -f2 | tr -d '[:space:]' || true)

The rebase onto current main (which carries Light-Heart-Labs#1008's tolerance fix)
preserved this form via conflict resolution; this test locks it in.

`tests/test-dream-cli-version-compat-pipefail.sh` (4 cases, wired
into `make test`):

1. dream-cli runs under strict mode (set -e or stricter).
2. The DREAM_VERSION grep pipeline ends with `|| true` (matched against
   active code; comment lines stripped before grep so the rationale
   block can't satisfy the assertion on its own).
3. `.version` and `manifest.json` fallback branches still exist (they
   are the destinations the `|| true` enables reaching).
4. Anti-regression: the bare-close form (no trailing `|| true`)
   does NOT appear — catches a future "cleanup" PR removing the guard.

Behavioural test was attempted but rejected: `_check_version_compat`
depends on `warn`/`log`/`_manifest_field`/`_semver_*` and several
file-scope color vars, so isolating it requires too much harness
setup. The dream-cli BATS suite (Light-Heart-Labs#1018) carries the behavioural
coverage at the right layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
yasinBursali and others added 12 commits April 29, 2026 04:18
Promote shell option from set -e to set -eo pipefail so failures in
upstream pipeline commands propagate and surface the real exit code
(not the status of the last stage). Aligns dream-cli with the project
standard (CLAUDE.md: 'set -euo pipefail everywhere').

Scope is deliberately narrow — nounset (-u) is intentionally not added
here because it requires a full bare-variable audit.

Side-effect audit:
- Two '| head -1' sites converted to '| sed -n 1p' (SIGPIPE-safe under
  pipefail; upstream producers no longer die when head closes the pipe):
    * _check_version_compat: grep DREAM_VERSION .env
    * preset import:          tar tzf archive listing
- '|| true' sites reviewed individually; all are intentional tolerance
  of known-benign failures (missing optional env keys, idempotent stops,
  arithmetic increment quirks) and were left in place.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…mark'

Before: 'dream chat' against a dead backend printed '[dream] Sending to
...', emitted an empty line, and exited 0. 'dream benchmark' propagated
that silent failure and happily reported sub-second 'Excellent'
performance while the LLM was not even running.

Root cause: curl was invoked with '-s' (silent, no error-show, no
fail-on-HTTP-error) and the response was parsed with 'jq -r
.choices[0].message.content // .error.message // "Error: no response"',
which masks every failure path behind a human-friendly string on stdout.

Fix:
- Add a /v1/models pre-flight probe with --fail-with-body and
  --max-time 3, so an unreachable llama-server fails loudly before the
  user sees 'Sending to ...'.
- Use --show-error --fail-with-body --max-time 30 on the completion
  call; non-2xx HTTP responses now fail the command and the body is
  shown.
- Parse responses with 'jq -er' so missing '.choices[0].message.content'
  exits non-zero; on failure we re-parse '.error.message' and surface
  it through 'error' (exit 1).
- cmd_benchmark: capture cmd_chat output via 'if ! response=...' so a
  failed chat aborts the benchmark instead of reporting a false-positive
  duration.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ssign

Three commands previously printed warnings or pseudo-success messages
while exiting 0, breaking scripted callers that rely on exit codes to
detect failure.

- cmd_disable: 'sr_resolve' returns its input unchanged for unknown
  names, so 'dream disable bogus-service' used to silently stop a
  non-existent container ('|| true') and print a success-shaped
  message. Mirror the validation pattern already in cmd_enable: if
  neither the built-in nor user-extension directory exists for the
  resolved id, 'error "Unknown service: <input>"' (exit 1).

- cmd_agent logs: on a fresh install with no host-agent log file, the
  logs subcommand warned and returned 0. Return 1 so 'dream agent
  logs' fails fast when there is nothing to tail.

- _gpu_reassign: the NVIDIA-only guard and three prerequisite guards
  (jq missing, topo lib missing, assign script missing) all exited 0
  after warning, so automation could not tell whether reassignment
  succeeded or was skipped. Each 'warn ...; return' becomes 'warn ...;
  return 1'.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous implementation used a narrow regex anchored to `=`:

    grep -qiE "(secret|pass|token|key)="

which required the keyword to appear immediately before `=`. It caught
`_SECRET=`, `_KEY=`, `_TOKEN=`, but missed `_PASSWORD=` (because `pass`
is followed by `word`, not `=`), `_SALT=` (not in the alternation), and
failed-open for any future secret name not ending in a matching suffix.

Real env keys that leaked in plaintext from `dream config show`:
  - LANGFUSE_DB_PASSWORD
  - LANGFUSE_SALT
  - LANGFUSE_CLICKHOUSE_PASSWORD
  - LANGFUSE_REDIS_PASSWORD
  - LANGFUSE_INIT_USER_PASSWORD
  - OPENCODE_SERVER_PASSWORD

Fix: drive masking from `.env.schema.json` — the canonical source of
truth that already marks the above keys `"secret": true`. Keys listed
there are masked; everything else is shown raw. When the schema or jq
is unavailable (older installs, minimal environments), fall back to a
case-insensitive substring match on secret/password/pass/token/key/salt
so the default behaviour errs on the side of masking rather than
leaking.

The schema already marks every known-leaking key above as
`"secret": true`; no schema change is required. A small number of
operational keys (TOKEN_SPY_PORT, TOKEN_SPY_URL, TARGET_API_KEY,
ANTHROPIC_API_KEY, OPENAI_API_KEY, TOGETHER_API_KEY, LIVEKIT_API_KEY,
LANGFUSE_PROJECT_PUBLIC_KEY) are not marked secret in the schema —
most are legitimately public (ports, URLs, publishable keys) or are
user-supplied. Left for a follow-up review; the regex fallback covers
them defensively if the schema ever goes missing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`dream config validate` crashed on every fresh macOS install. The
cause was a shebang mismatch: `scripts/validate-env.sh` is
`#!/bin/bash`, and macOS ships /bin/bash 3.2 (licensing-frozen) which
has no `declare -A`. The script uses associative arrays at lines 75,
76, 77, and 176, so it aborted immediately with
`declare: -A: invalid option` before doing any work.

Two-part fix:

- dream-cli invokes `validate-env.sh` through "$BASH" (the currently-
  running shell, guaranteed Bash 4+ by the version check at the top of
  dream-cli itself). This works regardless of the target script's
  shebang. Linux and WSL2 are unaffected (/bin/bash is already 4+
  there, so the invocation is a no-op change).

- `validate-env.sh` now starts with a Bash 4+ guard that prints a
  helpful "install a modern Bash" message and exits 1 before any
  `declare -A` is reached. This protects users running the script
  standalone (e.g. from the installer or CI), not just the dream-cli
  path. Wording matches the existing guard at the top of dream-cli.

`scripts/validate-manifests.sh` does not use associative arrays, so it
is invoked unchanged. Other `declare -A` users audited but not touched
in this PR (fix is the same pattern — either call through "$BASH" or
add a top-of-script Bash 4+ guard, depending on how each is invoked):

  scripts/pre-download.sh
  scripts/dream-test-functional.sh
  lib/service-registry.sh
  lib/progress.sh
  installers/phases/03-features.sh

Follow-up audit can cover each of these in scope.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…asking

Two small follow-up fixes flagged by the tester for the config subcommand
hardening:

1. tests/test-validate-env.sh invoked scripts/validate-env.sh by raw path,
   so shebang resolution picked /bin/bash = Bash 3.2 on macOS. The new
   Bash-4+ guard in validate-env.sh correctly blocks this invocation,
   which regressed the previously-passing macOS test cases (missing-.env
   and missing-schema, both expected exit 3 via early file-check) along
   with the Bash-4-dependent ones. Invoke the script through "$BASH" in
   the test harness, mirroring the dream-cli invocation path. One helper
   variable (VALIDATE_ENV_BASH), five invocation sites updated. Linux CI
   was unaffected; this fixes the macOS-only harness break.

2. Added "*bearer*" to the fallback substring match in cmd_config show's
   _cmd_config_is_secret, so FOO_BEARER=... does not leak when the
   schema is unavailable. Schema-driven path already covers the real
   DreamServer secrets; this closes a theoretical gap in the defensive
   fallback. Did not add *auth* (would false-positive on AUTHOR_NAME-
   style keys) or other substrings — bearer is the only real gap the
   tester surfaced.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The schema-authoritative check in the previous commit leaks keys that are
present in .env.schema.json with secret: false, which today miscovers five
real upstream-provider credentials (TARGET_API_KEY, ANTHROPIC_API_KEY,
OPENAI_API_KEY, TOGETHER_API_KEY, LIVEKIT_API_KEY). A malformed schema
(valid JSON but empty jq output) has the same silent-leak failure mode.

After a schema-miss under _schema_loaded=1, fall through to the keyword
substring match instead of returning "not secret" immediately. Schema
still defines what IS definitely a secret; the keyword pass adds defense
in depth for schema gaps. Over-masking of LANGFUSE_PROJECT_PUBLIC_KEY,
TOKEN_SPY_PORT, and TOKEN_SPY_URL is acceptable — 'show' should default
to over-masking, and raw values remain available via cat .env.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…date

The four commands that shell out to `docker compose` (restart, start, stop,
update) previously let the raw per-container-state-transition output pass
through with no summary and no visible error banner. A failure like
"dream-llama-ready is unhealthy" was one line buried in 40+ lines of state
transitions.

Introduce `_compose_run_with_summary` that runs compose with `--progress
quiet`, captures stdout+stderr to a mktemp log, and either prints a
"<verb> — done" success line or a red "<verb> failed:" banner that
surfaces up to 20 lines matching error|unhealthy|failed|dependency and
points the operator at the preserved full log.

Wrap all four callers. For `cmd_update`, keep the existing `if ! …; error`
guard on both the pull and up-recreate calls so the actionable rollback
hint is preserved on top of the surfaced error context.

Use `cmd || rc=\$?` to capture the compose exit code (safe under `set -e`
and not reliant on `\$?` after `if…fi`, which is spec'd to 0 on the
not-taken branch). The grep | sed | head pipeline is safe under `set -e`
without pipefail — this branch does not have pipefail, so no SIGPIPE /
no-match exit concerns.

Platform impact:
- macOS / Linux / Windows (WSL2): identical — no platform branching.
  `docker compose --progress quiet`, `grep -iE`, `sed`, `head`, `mktemp`
  are all portable.
…iables

Adds nounset (-u) to the shell options on top of the existing pipefail
foundation, catching typos in variable names at runtime instead of
silently expanding to empty strings. The audit added ${VAR:-} guards
to 6 variables that are legitimately conditional (missing positional
args for preset diff, sparse associative-array lookups when comparing
preset ext/env maps, and a registry lookup for service dirs without
valid manifests). All other variables in the file are unconditionally
set before use; no guard added.

Audited by running every read-only subcommand end-to-end in a fake
INSTALL_DIR and against a live install: help, version, list, status,
status-json, config show, gpu status/topology/validate/assignment,
model current/list, mode, stt current/status, preset list/save/load/
delete/export/import/diff, agent status, doctor --json, template
list/preview, audit, chat (against down backend), benchmark,
enable/disable/purge (against both valid-service and unknown paths).
Also verified with minimal environment (env -i PATH HOME BASH
INSTALL_DIR) to catch implicit TERM/USER/LANG assumptions.

Specific bugs caught during audit (each reproducible on main tip +
pipefail foundation):
- preset diff: bare \$2/\$3 crashed immediately with no args.
- preset diff: ext1/env1 associative-array reads for keys only present
  in the other preset crashed partway through comparison.
- enable <svc>: bare SERVICE_DEPENDS[\$service_id] crashed for service
  directories that exist on disk but have no valid manifest (edge case
  user-extensions with malformed manifest).

Platform impact:
- macOS / Linux / Windows (WSL2): identical. Portable Bash 4+ parameter
  expansion. No platform branching. No variables that are set on one
  OS but not another were found during the audit — all guarded reads
  guard against missing registry entries or unprovided positional args,
  neither of which is platform-specific.

Scope note: this PR depends on the pipefail addition from
fix/dream-cli-pipefail-exit-codes and is intended to land as a
follow-up after that merges upstream.
…+ preset diff

Pins _cmd_config_is_secret and _cmd_config_load_secret_schema
behavior from both dream config show (#392) and dream preset diff
(#431). Covers schema-driven masking, keyword fallback, and schema
absence/malformation edge cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…guards

test-compose-summary-wrapper.bats pins _compose_run_with_summary
success/error paths + zero-grep-match fallback + exit-code
propagation (#406).

test-dream-cli-flags.bats pins dream-cli shell-flag hygiene: set
-euo pipefail at line 6, nounset safety under minimal env, static
assertions on conditional-var syntax (#410).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…r_resolve)

test-functional-resilience.bats pins dream-test-functional.sh set -e
resilience: summary emission even under all-fail, sentinel delivery,
bounded set +e/-e around test functions (#428).

test-sr-resolve.bats pins sr_resolve 8-case matrix including the
dream- prefix strip added post-PR-10 (#430).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yasinBursali yasinBursali force-pushed the test/dream-cli-bats-coverage branch from 14c3437 to eb94419 Compare April 29, 2026 01:23
@yasinBursali
Copy link
Copy Markdown
Contributor Author

Pushed audit follow-up — rebased onto current upstream/main, conflict resolved, BATS tests rerun.

Audit resolution:

The audit said "preserve the #1008-style tolerant guard: with set -euo pipefail, _check_version_compat must not exit just because .env lacks DREAM_VERSION." This branch carried commit dfaa4c8f from #998's chain which had stripped the || true from the DREAM_VERSION grep pipeline. The rebase hit the expected conflict at the _check_version_compat site; resolution kept upstream's || true form (post-#1008), with an inline rationale comment citing this audit + #1008 + #998 so a future "cleanup" reviewer cannot remove it.

Post-rebase commit list (12 commits, was 18 — 5 staging-merge commits collapsed, 1 was unrelated):

  1. 85190576 — pipefail + audit pipe/|| true sites (post-conflict-resolution)
  2. 9c369281 — surface LLM failures in dream chat / dream benchmark
  3. 6cce8310 — exit-code contract for disable / agent logs / gpu reassign
  4. 1f93a422, 399eaad1, 4ac075f7, eee4bc65 — schema-driven secret masking + Bash-4 guard + helper promotion (fix(dream-cli): schema-driven secret masking + macOS Bash 4 validation #994 content)
  5. 8022c8d7 — compose summary wrapper for restart/start/stop/update
  6. 654d3801 — set -u + nounset guards
  7. 0e21d0d8 — config-masking BATS coverage (17 cases)
  8. 9ec161b6 — compose-wrapper + flags BATS coverage (8+11 = 19 cases)
  9. eb944194 — sr_resolve + functional-resilience BATS coverage (10+7 = 17 cases)

The BATS test commits (0e21d0d8, 9ec161b6, eb944194) total 53 cases across 5 files.

Audit's "rerun the dream-cli BATS tests":

$ tests/bats/bats-core/bin/bats tests/bats-tests/test-config-masking.bats \
                                tests/bats-tests/test-compose-summary-wrapper.bats \
                                tests/bats-tests/test-dream-cli-flags.bats \
                                tests/bats-tests/test-functional-resilience.bats \
                                tests/bats-tests/test-sr-resolve.bats
1..53
ok 1 ... ok 53

53/53 PASS.

make lint green, bash -n dream-cli clean.

Sibling PR overlap: this branch's first 6-8 commits duplicate content from #998, #994, #1002, #1016 (all currently open). When those PRs merge to main, this PR's rebase collapses further (the duplicated commits become no-ops). When ALL of #998/#994/#1002/#1016 are on main, this PR reduces to exactly 3 BATS-only commits + 5 new files.

Suggested merge order to minimize rebase churn:

— or any subset thereof; rebase is automatic each time.

This PR effectively serves as a regression shield — locking in the behavioral expectations of #994/#998/#1002/#1016 so a future PR that touches dream-cli can't silently break the secret-masking rules, the compose-wrapper banner contract, the pipefail tolerance, or the sr_resolve prefix-strip.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants