test(probe): whole-app coverage — dubbing, i18n, engines, security, migration, dictation, design#247
Conversation
…ation, dictation, design, coverage-critic Broadens the probe harness from one happy-path spec per layer to whole-app feature coverage (web, backend, dictation, clone, design), keeping the Actor/Judge split and offline-by-default + enable-on-demand for heavy paths. New specs + judges (one subprocess boot shared across backend-touching specs): - dubbing (L4): segment duration-ratio, SRT/VTT well-formed, export-archive contents, output language-ID (advisory) - i18n: locale files valid JSON (gate); orphan-keys + coverage (advisory). NOTE: surfaced a real bug — all 20 non-en locales carry gallery.cat_*/ bootstrap.lines keys absent from the en reference (reported, not gated). - engine matrix: active engine available + every unavailable engine explains why (11 TTS / 7 ASR backends via /engines/*) - loopback security: system routes reject non-loopback origins (403) - DB migration: alembic UPGRADE on the seeded omnivoice_data fixture - Coverage Critic: every declared layer still has a spec (gate) + API inventory - dictation: streaming-ASR WebSocket /ws/transcribe registered + handshake - voice design: reuses the audio-correctness ladder - real ASR round-trip: enable-on-demand (PROBE_E2E=1) Enriched _boot_runner.py to capture engines/asr/loopback/openapi/ws in ONE isolated boot (conftest boot_capture session fixture); added env.seeded_data_dir. 13 specs total. probe suite 74 passed / 5 skipped; full repo 687 passed, 0 failures. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughThis PR expands the probe testing framework with comprehensive backend validation. It enhances the boot runner to capture engine matrices, loopback rejection, websocket routes, and OpenAPI paths; adds judge modules for dubbing, i18n, engine, and coverage validation; introduces eight new probe specifications; and wires test implementations for each probe domain. ChangesProbe Test Suite Expansion
Sequence Diagram(s)sequenceDiagram
participant Tester as Probe Test Runner
participant Boot as _boot_runner.py
participant Env as seeded/fresh Env
participant Client as TestClient
participant App as FastAPI_app
Tester->>Boot: invoke capture_first_run with data-dir
Boot->>Env: prepare data-dir (fresh or seeded)
Boot->>Client: start app under TestClient
Client->>App: GET /engines/tts, GET /engines/asr
Client->>App: GET /system/info (non-lifespan loopback check)
Client->>App: open /ws/transcribe websocket
Boot->>Boot: snapshot DB files pre/post and collect app.openapi()/ws routes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
| Filename | Overview |
|---|---|
| tests/probe/specs/coverage_critic.probe.yaml | Drift-gate spec for the probe suite. The newly added dictation layer is not included in the required list, so the gate won't catch future deletion of dictation.probe.yaml. |
| tests/probe/test_probe_migration.py | Migration probe — run_judges is correctly placed inside the with seeded_data_dir() block so the path_exists check sees the DB before teardown. |
| tests/probe/judges/dubbing.py | New dubbing judges: duration-ratio, SRT/VTT well-formedness, archive membership, and pluggable language-ID — all deterministic and well-guarded against zero/missing inputs. |
| tests/probe/judges/i18n.py | i18n judges covering valid-JSON gate, orphan-key detection (advisory), and coverage reporting. Docstring now correctly labels locale_no_orphan_keys as advisory. |
| tests/probe/_boot_runner.py | Shared-boot subprocess extended to capture engine matrices, loopback-reject status, dictation WS handshake, OpenAPI inventory, and WS route table in a single app boot. |
| tests/probe/conftest.py | Session-scoped boot_capture fixture added; shares a single subprocess boot across engine, security, coverage, and dictation specs. |
Sequence Diagram
sequenceDiagram
participant pytest
participant boot_capture (session fixture)
participant _boot_runner subprocess
participant FastAPI app
pytest->>boot_capture (session fixture): request (once per session)
boot_capture (session fixture)->>_boot_runner subprocess: spawn (fresh data dir)
_boot_runner subprocess->>FastAPI app: TestClient enter (lifespan start)
_boot_runner subprocess->>FastAPI app: GET /health, /system/info, /model/status
_boot_runner subprocess->>FastAPI app: GET /engines/tts, /engines/asr
_boot_runner subprocess->>FastAPI app: GET /system/info (non-loopback client → 403)
_boot_runner subprocess->>FastAPI app: WS /ws/transcribe (handshake only)
_boot_runner subprocess->>FastAPI app: app.openapi() + app.routes
_boot_runner subprocess->>boot_capture (session fixture): write JSON capture
boot_capture (session fixture)->>pytest: return capture dict (shared)
pytest->>test_probe_engines: run
pytest->>test_probe_security: run
pytest->>test_probe_dictation: run
pytest->>test_probe_coverage: run
Note over test_probe_engines,test_probe_coverage: all read from shared boot_capture dict
pytest->>test_probe_migration: run (own seeded_data_dir + subprocess)
pytest->>test_probe_i18n: run (reads real locale files directly)
pytest->>test_probe_dubbing: run (offline synthetic data)
pytest->>test_probe_design: run (offline synthetic audio)
Reviews (3): Last reviewed commit: "fix(probe): ASCII x in dubbing detail (r..." | Re-trigger Greptile
There was a problem hiding this comment.
Actionable comments posted: 12
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/probe/_boot_runner.py`:
- Around line 78-80: Replace storing the raw exception string in
ctx["ws_transcribe_error"] (inside the except Exception as exc block) with a
sanitized value: set ctx["ws_transcribe_connected"]=False as now but assign
ctx["ws_transcribe_error"] to a non-sensitive identifier such as the exception
type (type(exc).__name__) or a short normalized reason (e.g.,
"WebSocketHandshakeError" or exc.__class__.__name__ + optional safe suffix),
truncated to 200 chars if needed; ensure no user paths or secret-like substrings
from str(exc) are persisted.
- Around line 92-99: The current logic sets ctx["db_created"] =
bool(ctx["db_path"]) which flags seeded runs as "created"; instead, detect
pre-boot DB state and post-boot DB state separately: search data_dir before the
boot and store that result (e.g., ctx["pre_db_path"] or ctx["pre_existing_db"]),
then run the boot and re-scan into ctx["post_db_path"] (or ctx["db_path"]) and
set ctx["db_created"] = True only when pre-boot had no DB and post-boot does;
update any code that reads ctx["db_created"] to use this fresh-dir-only
semantics and keep the same keys (ctx["db_path"] can represent the post-boot
path while adding the pre-boot key).
In `@tests/probe/judges/coverage.py`:
- Around line 20-23: The loop that loads probe specs currently calls
yaml.safe_load(open(path, encoding="utf-8")) which leaves file descriptors open;
change it to open each spec with a context manager (with open(path,
encoding="utf-8") as fh:) and pass fh to yaml.safe_load so the file is closed
immediately after parsing, then keep the existing logic that sets doc =
yaml.safe_load(...) or {} and appends {"feature": doc.get("feature"), "layer":
doc.get("layer"), "file": os.path.basename(path")} to out.
In `@tests/probe/judges/dubbing.py`:
- Around line 33-46: The loop currently skips segments with non-positive source
duration so an empty or fully-skipped input returns passed=True; change the
logic to track how many segments were actually checked (e.g., add a checked
counter incremented when src > 0), and if checked == 0 set passed=False and
return a JudgeResult indicating zero valid segments (use JudgeResult name
"segments_duration_ratio", passed=False, measured=0 and a clear detail message
like "no valid source durations" or similar); keep the existing outlier
detection using _seg_durations and outliers but set measured to len(outliers)
when checked>0 and compute ok as (checked>0 and not outliers) so malformed
captures won't produce false greens.
In `@tests/probe/judges/i18n.py`:
- Around line 31-49: The current judge passes when no locale files exist; update
locale_valid_json to fail when len(locales) == 0 or when there are parse errors
by computing passed = bool(locales) and passed = passed and not bad (i.e.,
passed only if at least one locale and no bad files), set measured =
len(locales), and update the detail string to reflect three cases: success ("N
locale files parse"), parse errors ("invalid JSON: [...]"), and empty directory
("no locale files found"); modify functions _load_locales and locale_valid_json
references accordingly and add a small regression test that creates an empty
locales directory and asserts that locale_valid_json returns passed=False and
measured=0.
In `@tests/probe/specs/coverage_critic.probe.yaml`:
- Around line 11-15: The gate "layers_have_specs" currently hard-codes the
required layers in the required array (missing "meta"), so update it to derive
required layers from a single source of truth instead of a static list: compute
the required set from $.specs (e.g. collect each spec's layer value) and use
that derived list for layers_have_specs.required (or at minimum add "meta" to
the static array to match the suite), ensuring the coverage_report advisory
continues to reference $.openapi_paths and $.specs unchanged.
In `@tests/probe/specs/dub_export.probe.yaml`:
- Around line 13-17: The probe is missing capture of the dub_audio context key
before the advisory reads it; add a capture entry mapping dub_audio: $.dub_audio
in the capture block (the same place that currently lists segments, srt, vtt,
zip_names) so the advisory and the language-ID actor step have a real source
value instead of relying on synthetic injection; apply the same change to the
other capture block referenced around lines 27-29 to ensure both advisory usages
get dub_audio captured.
In `@tests/probe/specs/migration.probe.yaml`:
- Around line 9-13: The probe currently only checks boot health via judge.checks
entries status_eq ($.health_status) and json_field_eq on $.health_body.status,
which doesn't verify data preservation; add at least one seeded-data invariant
check (e.g., assert a known seeded row or count and/or schema-version) by adding
a new judge.checks entry such as json_field_eq or numeric comparison against a
seeded endpoint field (for example assert $.seeded_user.id == <expected> or
$.users_count == <expected> or schema_version == <expected>) so the probe
verifies existing user data or schema version in addition to health.
In `@tests/probe/test_probe_asr_e2e.py`:
- Around line 29-33: The test currently gates on PROBE_ASR_SAMPLE using
os.path.exists, which allows directories and causes downstream failures; change
the check to require a regular file by using os.path.isfile on the sample
variable and call pytest.skip if not a file so the ceil of
T.FasterWhisperTranscriber(transcribe...) only runs with a valid file path.
In `@tests/probe/test_probe_design.py`:
- Around line 28-36: The test test_voice_design_verdict currently only calls
probe_spec.run_judges with a prebuilt context so it never exercises the
actor/capture path for the new POST /generate contract; update the test to
either (A) run the spec through the actor step instead of only run_judges
(invoke the probe runner function that executes actors — e.g.,
probe_spec.run_actors or the equivalent API in the test harness) using the
loaded spec from probe_spec.load_spec and the designed_audio so the actual
actor/capture flow is executed, or (B) add explicit assertions after
probe_spec.load_spec that the spec's actions include a request targeting POST
/generate and that the capture path includes $.output_path (inspect the loaded
spec object fields to assert the method/path and capture selector); modify
test_voice_design_verdict to choose one of these fixes so the new /generate
contract and $.output_path capture are validated.
In `@tests/probe/test_probe_engines.py`:
- Around line 20-22: The test is pinned to the "whisperx" engine; change it to
assert the captured ASR active engine instead. Replace the second assertion that
calls E.engine_available(boot_capture["engines_asr"], "whisperx") with
E.engine_available(boot_capture["engines_asr"],
boot_capture["engines_asr"]["active"]) (or otherwise assert that
boot_capture["engines_asr"]["active"] is available) so the probe checks the
recorded active engine rather than a hard-coded id.
In `@tests/probe/test_probe_migration.py`:
- Around line 19-24: The test currently uses db_ok = context["db_created"] (from
tests/probe/_boot_runner.py) which only checks for a DB file and can pass even
if migrations dropped or mutated seeded data; instead, open the database
referenced by context["db_path"] in test_probe_migration.py and assert a
concrete seeded-data invariant (for example: a known table exists, a specific
migrated column is present, or a specific row/value from
tests/fixtures/omnivoice_data is still present). Replace the final assert db_ok
with a targeted assertion that queries the DB for the expected table/row/value
or schema change so the test fails if migrations altered the seeded data
unexpectedly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 1ed63786-68f9-44e6-ad42-2df302409fd3
📒 Files selected for processing (27)
tests/probe/README.mdtests/probe/_boot_runner.pytests/probe/conftest.pytests/probe/env.pytests/probe/judges/coverage.pytests/probe/judges/dictation.pytests/probe/judges/dubbing.pytests/probe/judges/engine.pytests/probe/judges/i18n.pytests/probe/spec.pytests/probe/specs/coverage_critic.probe.yamltests/probe/specs/dictation.probe.yamltests/probe/specs/dub_export.probe.yamltests/probe/specs/engines.probe.yamltests/probe/specs/i18n_parity.probe.yamltests/probe/specs/migration.probe.yamltests/probe/specs/security.probe.yamltests/probe/specs/voice_design.probe.yamltests/probe/test_probe_asr_e2e.pytests/probe/test_probe_coverage.pytests/probe/test_probe_design.pytests/probe/test_probe_dictation.pytests/probe/test_probe_dubbing.pytests/probe/test_probe_engines.pytests/probe/test_probe_i18n.pytests/probe/test_probe_migration.pytests/probe/test_probe_security.py
- coverage.py:22 — use `with open(...)` context to close spec files after
yaml.safe_load (file handle leak)
- _boot_runner.py:80 — store only `type(exc).__name__` for WS errors; drop
raw str(exc) that could leak home paths / secrets into capture JSON
- _boot_runner.py:99 — snapshot DB files before boot; set db_created=True
only when boot creates NEW files (not when fixture already had one)
- dubbing.py:46 — FAIL segments_duration_ratio when validated==0 (guards
against empty/corrupt segment list passing vacuously)
- i18n.py:49 — FAIL locale_valid_json when locales_dir is empty/missing
- i18n.py:7 — fix docstring: locale_no_orphan_keys is advisory, not blocking
- test_probe_i18n.py:59 — assert r.passed is False, not just r.advisory
- coverage_critic.probe.yaml:15 — add "meta" to required layers list
- dub_export.probe.yaml:17 — capture dub_audio in steps before advisory reads it
- migration.probe.yaml:13 — add path_exists(db_path) data-integrity check
- test_probe_asr_e2e.py:33 — os.path.exists → os.path.isfile for PROBE_ASR_SAMPLE
- test_probe_migration.py:24 — assert context["db_path"] (presence) not
db_created (new creation), aligning with the boot_runner fix
Two findings intentionally skipped with reasons (see review thread replies):
test_probe_design.py:36 — offline pattern is intentional; actor step is
bypassed by design throughout the probe suite for CI compatibility
test_probe_engines.py:22 — whisperx pin is intentional; it verifies the
shipped default ASR engine is available out-of-the-box
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/probe/test_probe_migration.py (1)
22-25:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftThe “data integrity” check still never inspects migrated data.
Line 25 only asserts that
db_pathis a non-empty string. The spec already runspath_existson the same value, so this adds no protection, and the test still won't catch a migration that recreates or mutates the seeded DB while leaving a file behind. Please assert a concrete invariant from the copiedomnivoice_datadatabase instead of path presence alone.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/probe/test_probe_migration.py` around lines 22 - 25, The test currently only asserts context["db_path"] exists but doesn't validate migrated data; open the SQLite DB at context["db_path"] (the same variable used earlier) and run a concrete query against the seeded omnivoice_data (e.g., verify a known table exists and contains an expected row or count) and assert the result, so the test fails if the migration recreated or mutated the seeded DB; use the test's sqlite3 connection or existing DB helper to SELECT a known record/rowcount from the omnivoice_data table and assert it matches the expected value.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/probe/judges/dubbing.py`:
- Line 59: Replace the Unicode multiplication sign in the f-string that builds
the detail message so it uses ASCII "x"; specifically update the f-string that
currently reads detail=f"all {validated} segment(s) within [{min_ratio},
{max_ratio}]×" in tests/probe/judges/dubbing.py to use "x" instead of "×" (the
variables validated, min_ratio, and max_ratio remain unchanged).
---
Duplicate comments:
In `@tests/probe/test_probe_migration.py`:
- Around line 22-25: The test currently only asserts context["db_path"] exists
but doesn't validate migrated data; open the SQLite DB at context["db_path"]
(the same variable used earlier) and run a concrete query against the seeded
omnivoice_data (e.g., verify a known table exists and contains an expected row
or count) and assert the result, so the test fails if the migration recreated or
mutated the seeded DB; use the test's sqlite3 connection or existing DB helper
to SELECT a known record/rowcount from the omnivoice_data table and assert it
matches the expected value.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 4f039a49-2902-4b64-a134-b5d039fd1316
📒 Files selected for processing (10)
tests/probe/_boot_runner.pytests/probe/judges/coverage.pytests/probe/judges/dubbing.pytests/probe/judges/i18n.pytests/probe/specs/coverage_critic.probe.yamltests/probe/specs/dub_export.probe.yamltests/probe/specs/migration.probe.yamltests/probe/test_probe_asr_e2e.pytests/probe/test_probe_i18n.pytests/probe/test_probe_migration.py
✅ Files skipped from review due to trivial changes (2)
- tests/probe/specs/coverage_critic.probe.yaml
- tests/probe/specs/migration.probe.yaml
🚧 Files skipped from review as they are similar to previous changes (4)
- tests/probe/judges/coverage.py
- tests/probe/specs/dub_export.probe.yaml
- tests/probe/test_probe_i18n.py
- tests/probe/test_probe_asr_e2e.py
…nside seeded dir Two regressions from the hardening pass: - dubbing.py: replace non-ASCII '×' with 'x' (Ruff ambiguous-unicode → Tests lint fail) - test_probe_migration: move run_judges inside the seeded_data_dir with-block so the new path_exists check sees the DB before the temp dir is torn down (was always failing)
What
Expands the
probeharness from one happy-path spec per layer to whole-app feature coverage — web, backend, dictation, clone, design — keeping the Actor/Judge split (no LLM on the verdict path) and offline-by-default + enable-on-demand for heavy paths. A single subprocess boot is shared across all backend-touching specs.New feature specs
dub_export)i18n_parity)engines)omnivoice_datafixture (backward-compat)/ws/transcriberegistered + accepts loopback handshakePROBE_E2E=1)🐞 Real bug surfaced
The i18n orphan-key check found that all 20 non-
enlocales carrygallery.cat_*(and somebootstrap.lines) keys that are absent from theenreference — so those gallery-category labels don't render for English users. It's reported in the advisory lane (non-blocking), since the fix is a product change, not a harness change. Worth a follow-up to add the missing keys toen.json.Design
boot_capturesession fixture) captures first-run endpoints, engine/ASR matrices, loopback-reject, OpenAPI inventory, and the dictation WS handshake — so the suite pays for ~one boot, not one per spec. Subprocess isolation keeps it from contaminating the rest of the test session.PROBE_E2E=1), live browser, Docker.Verification
uv run pytest tests/probe→ 74 passed, 5 skipped.🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Tests