Skip to content

test(probe): whole-app coverage — dubbing, i18n, engines, security, migration, dictation, design#247

Merged
debpalash merged 3 commits into
mainfrom
debpalash/probe-coverage
Jun 2, 2026
Merged

test(probe): whole-app coverage — dubbing, i18n, engines, security, migration, dictation, design#247
debpalash merged 3 commits into
mainfrom
debpalash/probe-coverage

Conversation

@debpalash

@debpalash debpalash commented Jun 2, 2026

Copy link
Copy Markdown
Owner

What

Expands the probe harness from one happy-path spec per layer to whole-app feature coverage — web, backend, dictation, clone, design — keeping the Actor/Judge split (no LLM on the verdict path) and offline-by-default + enable-on-demand for heavy paths. A single subprocess boot is shared across all backend-touching specs.

New feature specs

Spec Verifies Runs for real here?
dubbing (dub_export) per-segment duration ratio (core dubbing constraint), SRT/VTT well-formed, export-archive contents; output language-ID (advisory) ✅ offline (synthetic)
i18n (i18n_parity) locale files valid JSON (gate); orphan-keys + coverage (advisory) ✅ against real locales
engine matrix (engines) active TTS/ASR engine available + every unavailable engine explains how to install it (11 TTS / 7 ASR backends) ✅ real backend
security system routes reject non-loopback origins (403) ✅ real backend
migration alembic UPGRADE on the seeded omnivoice_data fixture (backward-compat) ✅ real backend
dictation streaming-ASR WebSocket /ws/transcribe registered + accepts loopback handshake ✅ real backend
voice design design output runs the audio-correctness ladder ✅ offline (synthetic)
coverage-critic every declared layer still has a spec (drift gate) + API-surface inventory (advisory) ✅ real (156 routes)
real ASR round-trip TTS→Whisper→WER enable-on-demand (PROBE_E2E=1)

🐞 Real bug surfaced

The i18n orphan-key check found that all 20 non-en locales carry gallery.cat_* (and some bootstrap.lines) keys that are absent from the en reference — so those gallery-category labels don't render for English users. It's reported in the advisory lane (non-blocking), since the fix is a product change, not a harness change. Worth a follow-up to add the missing keys to en.json.

Design

  • One shared subprocess boot (boot_capture session fixture) captures first-run endpoints, engine/ASR matrices, loopback-reject, OpenAPI inventory, and the dictation WS handshake — so the suite pays for ~one boot, not one per spec. Subprocess isolation keeps it from contaminating the rest of the test session.
  • Heavy/live paths skip cleanly: real ASR (PROBE_E2E=1), live browser, Docker.

Verification

  • uv run pytest tests/probe74 passed, 5 skipped.
  • Full repo → 687 passed, 13 skipped, 13 xfailed, 0 failures; no contamination.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Comprehensive probe framework for dubbing, voice-design, streaming ASR/dictation, engine matrices, i18n parity, DB migration checks, and loopback-security; new judges for engines, dubbing/export, subtitles, dictation handshakes, and locale coverage.
  • Documentation

    • README: added "Feature coverage (specs)" section and noted an i18n missing-key surfaced as an advisory.
  • Tests

    • Numerous probe tests added (E2E and synthetic) exercising the new probes and judges.

…ation, dictation, design, coverage-critic

Broadens the probe harness from one happy-path spec per layer to whole-app
feature coverage (web, backend, dictation, clone, design), keeping the
Actor/Judge split and offline-by-default + enable-on-demand for heavy paths.

New specs + judges (one subprocess boot shared across backend-touching specs):
- dubbing (L4): segment duration-ratio, SRT/VTT well-formed, export-archive
  contents, output language-ID (advisory)
- i18n: locale files valid JSON (gate); orphan-keys + coverage (advisory).
  NOTE: surfaced a real bug — all 20 non-en locales carry gallery.cat_*/
  bootstrap.lines keys absent from the en reference (reported, not gated).
- engine matrix: active engine available + every unavailable engine explains
  why (11 TTS / 7 ASR backends via /engines/*)
- loopback security: system routes reject non-loopback origins (403)
- DB migration: alembic UPGRADE on the seeded omnivoice_data fixture
- Coverage Critic: every declared layer still has a spec (gate) + API inventory
- dictation: streaming-ASR WebSocket /ws/transcribe registered + handshake
- voice design: reuses the audio-correctness ladder
- real ASR round-trip: enable-on-demand (PROBE_E2E=1)

Enriched _boot_runner.py to capture engines/asr/loopback/openapi/ws in ONE
isolated boot (conftest boot_capture session fixture); added env.seeded_data_dir.

13 specs total. probe suite 74 passed / 5 skipped; full repo 687 passed, 0 failures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: eb258f15-3767-4ed6-baff-142deb5d0533

📥 Commits

Reviewing files that changed from the base of the PR and between 7a8993a and 0b6d823.

📒 Files selected for processing (2)
  • tests/probe/judges/dubbing.py
  • tests/probe/test_probe_migration.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/probe/test_probe_migration.py
  • tests/probe/judges/dubbing.py

📝 Walkthrough

Walkthrough

This PR expands the probe testing framework with comprehensive backend validation. It enhances the boot runner to capture engine matrices, loopback rejection, websocket routes, and OpenAPI paths; adds judge modules for dubbing, i18n, engine, and coverage validation; introduces eight new probe specifications; and wires test implementations for each probe domain.

Changes

Probe Test Suite Expansion

Layer / File(s) Summary
Boot infrastructure and session fixtures
tests/probe/_boot_runner.py, tests/probe/conftest.py, tests/probe/env.py, tests/probe/README.md
Expands out-of-process boot runner to capture engine matrices, loopback rejection status, websocket routes, and OpenAPI paths. Adds session-scoped boot_capture fixture and seeded_data_dir context manager for test reuse. Documents probe spec coverage in README.
Dubbing pipeline validation
tests/probe/judges/dubbing.py, tests/probe/specs/dub_export.probe.yaml, tests/probe/test_probe_dubbing.py
Judges validate segment duration ratios, SRT/VTT subtitle structure, archive entry patterns, and output language via pluggable detector protocol. Tests cover ratio bounds, subtitle well-formedness, archive matching, and language detection with/without detector.
Internationalization localization checks
tests/probe/judges/i18n.py, tests/probe/specs/i18n_parity.probe.yaml, tests/probe/test_probe_i18n.py
Judges enforce locale JSON validity, detect orphan keys across non-reference locales, and measure translation coverage. Tests validate JSON parsing, orphan-key detection, and coverage measurement against frontend locale files.
Engine matrix and coverage analysis judges
tests/probe/judges/engine.py, tests/probe/judges/coverage.py, tests/probe/spec.py
Engine judges validate backend presence, active engine availability, and unavailability explanations. Coverage judges scan probe specs, derive API route prefixes, and enforce required layer coverage. Registry wiring extends JUDGE_REGISTRY with new judge keys.
Engine matrix and coverage probe tests
tests/probe/specs/engines.probe.yaml, tests/probe/specs/coverage_critic.probe.yaml, tests/probe/test_probe_engines.py, tests/probe/test_probe_coverage.py
Engine probe tests validate TTS/ASR registries and default engine availability. Coverage probe asserts required spec layers are present and validates OpenAPI surface meets thresholds.
Security and loopback rejection validation
tests/probe/specs/security.probe.yaml, tests/probe/test_probe_security.py
Security probe validates non-loopback origins receive HTTP 403 from /system/info. Test verifies boot-captured loopback rejection status is 403.
Dictation WebSocket validation
tests/probe/judges/dictation.py, tests/probe/specs/dictation.probe.yaml, tests/probe/test_probe_dictation.py
Dictation judges validate WebSocket endpoint registration and handshake success. Spec and tests verify /ws/transcribe contract (endpoint present and connection succeeds).
Voice design and audio output validation
tests/probe/specs/voice_design.probe.yaml, tests/probe/test_probe_design.py
Voice design probe exercises POST /generate with fixed parameters, captures audio, and validates decoding, numerical validity, duration/silence, and ASR accuracy. Test includes fixture for deterministic synthetic audio.
Database migration and data integrity
tests/probe/specs/migration.probe.yaml, tests/probe/test_probe_migration.py
Migration probe boots from seeded omnivoice_data fixture and validates health endpoint. Test confirms database integrity after upgrade on pre-existing data.
ASR end-to-end and transcriber wiring
tests/probe/test_probe_asr_e2e.py
ASR tests validate FasterWhisper transcriber API conformance and lazy model loading, plus optional real round-trip validation with word error rate thresholds.

Sequence Diagram(s)

sequenceDiagram
  participant Tester as Probe Test Runner
  participant Boot as _boot_runner.py
  participant Env as seeded/fresh Env
  participant Client as TestClient
  participant App as FastAPI_app
  Tester->>Boot: invoke capture_first_run with data-dir
  Boot->>Env: prepare data-dir (fresh or seeded)
  Boot->>Client: start app under TestClient
  Client->>App: GET /engines/tts, GET /engines/asr
  Client->>App: GET /system/info (non-lifespan loopback check)
  Client->>App: open /ws/transcribe websocket
  Boot->>Boot: snapshot DB files pre/post and collect app.openapi()/ws routes
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 35.09% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title accurately and concisely summarizes the main change: expanding test probe coverage to include multiple new feature areas (dubbing, i18n, engines, security, migration, dictation, design).
Description check ✅ Passed The pull request description is comprehensive and well-structured, covering what was changed, which new specs were added, the design rationale, verification results, and even a real bug discovered. It exceeds template requirements.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch debpalash/probe-coverage

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps

greptile-apps Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR expands the probe harness from a per-layer skeleton into whole-app feature coverage: dubbing timing/export, i18n parity, engine matrices (11 TTS + 7 ASR backends), loopback security, DB migration, streaming-ASR dictation, and voice design. All new specs share a single subprocess boot via the session-scoped boot_capture fixture, and heavy paths (real ASR, browser, Docker) skip cleanly unless opted in.

  • Three bugs flagged by the previous review round were all resolved in commit 7a8993a: the open() file-handle leak in scan_specs, the weak advisory-only assertion in test_probe_i18n, and the misleading "(blocking)" docstring on locale_no_orphan_keys.
  • The dictation layer introduced by this PR is not included in the required drift-gate list inside coverage_critic.probe.yaml, so future deletion of dictation.probe.yaml would go undetected by the coverage critic.
  • The i18n orphan-key check correctly surfaces the real gallery.cat_* / bootstrap.lines gap in en.json as an advisory finding without blocking CI.

Confidence Score: 5/5

Safe to merge — all new judges are deterministic and offline-capable; the shared boot fixture is correctly scoped and doesn't leak temp directories or lifespan state to other tests.

The three bugs identified in the prior review round are confirmed fixed. The only remaining finding is that the new dictation layer was added to the spec set but not registered in the coverage-critic's required list, meaning its spec isn't guarded by the drift gate going forward. That gap doesn't affect the current test run and can be addressed with a one-line YAML change.

tests/probe/specs/coverage_critic.probe.yaml — the required layer list should include dictation to protect the new spec from future drift.

Important Files Changed

Filename Overview
tests/probe/specs/coverage_critic.probe.yaml Drift-gate spec for the probe suite. The newly added dictation layer is not included in the required list, so the gate won't catch future deletion of dictation.probe.yaml.
tests/probe/test_probe_migration.py Migration probe — run_judges is correctly placed inside the with seeded_data_dir() block so the path_exists check sees the DB before teardown.
tests/probe/judges/dubbing.py New dubbing judges: duration-ratio, SRT/VTT well-formedness, archive membership, and pluggable language-ID — all deterministic and well-guarded against zero/missing inputs.
tests/probe/judges/i18n.py i18n judges covering valid-JSON gate, orphan-key detection (advisory), and coverage reporting. Docstring now correctly labels locale_no_orphan_keys as advisory.
tests/probe/_boot_runner.py Shared-boot subprocess extended to capture engine matrices, loopback-reject status, dictation WS handshake, OpenAPI inventory, and WS route table in a single app boot.
tests/probe/conftest.py Session-scoped boot_capture fixture added; shares a single subprocess boot across engine, security, coverage, and dictation specs.

Sequence Diagram

sequenceDiagram
    participant pytest
    participant boot_capture (session fixture)
    participant _boot_runner subprocess
    participant FastAPI app

    pytest->>boot_capture (session fixture): request (once per session)
    boot_capture (session fixture)->>_boot_runner subprocess: spawn (fresh data dir)
    _boot_runner subprocess->>FastAPI app: TestClient enter (lifespan start)
    _boot_runner subprocess->>FastAPI app: GET /health, /system/info, /model/status
    _boot_runner subprocess->>FastAPI app: GET /engines/tts, /engines/asr
    _boot_runner subprocess->>FastAPI app: GET /system/info (non-loopback client → 403)
    _boot_runner subprocess->>FastAPI app: WS /ws/transcribe (handshake only)
    _boot_runner subprocess->>FastAPI app: app.openapi() + app.routes
    _boot_runner subprocess->>boot_capture (session fixture): write JSON capture
    boot_capture (session fixture)->>pytest: return capture dict (shared)

    pytest->>test_probe_engines: run
    pytest->>test_probe_security: run
    pytest->>test_probe_dictation: run
    pytest->>test_probe_coverage: run
    Note over test_probe_engines,test_probe_coverage: all read from shared boot_capture dict

    pytest->>test_probe_migration: run (own seeded_data_dir + subprocess)
    pytest->>test_probe_i18n: run (reads real locale files directly)
    pytest->>test_probe_dubbing: run (offline synthetic data)
    pytest->>test_probe_design: run (offline synthetic audio)
Loading

Fix All in Claude Code

Reviews (3): Last reviewed commit: "fix(probe): ASCII x in dubbing detail (r..." | Re-trigger Greptile

Comment thread tests/probe/judges/coverage.py
Comment thread tests/probe/test_probe_i18n.py Outdated
Comment thread tests/probe/judges/i18n.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/probe/_boot_runner.py`:
- Around line 78-80: Replace storing the raw exception string in
ctx["ws_transcribe_error"] (inside the except Exception as exc block) with a
sanitized value: set ctx["ws_transcribe_connected"]=False as now but assign
ctx["ws_transcribe_error"] to a non-sensitive identifier such as the exception
type (type(exc).__name__) or a short normalized reason (e.g.,
"WebSocketHandshakeError" or exc.__class__.__name__ + optional safe suffix),
truncated to 200 chars if needed; ensure no user paths or secret-like substrings
from str(exc) are persisted.
- Around line 92-99: The current logic sets ctx["db_created"] =
bool(ctx["db_path"]) which flags seeded runs as "created"; instead, detect
pre-boot DB state and post-boot DB state separately: search data_dir before the
boot and store that result (e.g., ctx["pre_db_path"] or ctx["pre_existing_db"]),
then run the boot and re-scan into ctx["post_db_path"] (or ctx["db_path"]) and
set ctx["db_created"] = True only when pre-boot had no DB and post-boot does;
update any code that reads ctx["db_created"] to use this fresh-dir-only
semantics and keep the same keys (ctx["db_path"] can represent the post-boot
path while adding the pre-boot key).

In `@tests/probe/judges/coverage.py`:
- Around line 20-23: The loop that loads probe specs currently calls
yaml.safe_load(open(path, encoding="utf-8")) which leaves file descriptors open;
change it to open each spec with a context manager (with open(path,
encoding="utf-8") as fh:) and pass fh to yaml.safe_load so the file is closed
immediately after parsing, then keep the existing logic that sets doc =
yaml.safe_load(...) or {} and appends {"feature": doc.get("feature"), "layer":
doc.get("layer"), "file": os.path.basename(path")} to out.

In `@tests/probe/judges/dubbing.py`:
- Around line 33-46: The loop currently skips segments with non-positive source
duration so an empty or fully-skipped input returns passed=True; change the
logic to track how many segments were actually checked (e.g., add a checked
counter incremented when src > 0), and if checked == 0 set passed=False and
return a JudgeResult indicating zero valid segments (use JudgeResult name
"segments_duration_ratio", passed=False, measured=0 and a clear detail message
like "no valid source durations" or similar); keep the existing outlier
detection using _seg_durations and outliers but set measured to len(outliers)
when checked>0 and compute ok as (checked>0 and not outliers) so malformed
captures won't produce false greens.

In `@tests/probe/judges/i18n.py`:
- Around line 31-49: The current judge passes when no locale files exist; update
locale_valid_json to fail when len(locales) == 0 or when there are parse errors
by computing passed = bool(locales) and passed = passed and not bad (i.e.,
passed only if at least one locale and no bad files), set measured =
len(locales), and update the detail string to reflect three cases: success ("N
locale files parse"), parse errors ("invalid JSON: [...]"), and empty directory
("no locale files found"); modify functions _load_locales and locale_valid_json
references accordingly and add a small regression test that creates an empty
locales directory and asserts that locale_valid_json returns passed=False and
measured=0.

In `@tests/probe/specs/coverage_critic.probe.yaml`:
- Around line 11-15: The gate "layers_have_specs" currently hard-codes the
required layers in the required array (missing "meta"), so update it to derive
required layers from a single source of truth instead of a static list: compute
the required set from $.specs (e.g. collect each spec's layer value) and use
that derived list for layers_have_specs.required (or at minimum add "meta" to
the static array to match the suite), ensuring the coverage_report advisory
continues to reference $.openapi_paths and $.specs unchanged.

In `@tests/probe/specs/dub_export.probe.yaml`:
- Around line 13-17: The probe is missing capture of the dub_audio context key
before the advisory reads it; add a capture entry mapping dub_audio: $.dub_audio
in the capture block (the same place that currently lists segments, srt, vtt,
zip_names) so the advisory and the language-ID actor step have a real source
value instead of relying on synthetic injection; apply the same change to the
other capture block referenced around lines 27-29 to ensure both advisory usages
get dub_audio captured.

In `@tests/probe/specs/migration.probe.yaml`:
- Around line 9-13: The probe currently only checks boot health via judge.checks
entries status_eq ($.health_status) and json_field_eq on $.health_body.status,
which doesn't verify data preservation; add at least one seeded-data invariant
check (e.g., assert a known seeded row or count and/or schema-version) by adding
a new judge.checks entry such as json_field_eq or numeric comparison against a
seeded endpoint field (for example assert $.seeded_user.id == <expected> or
$.users_count == <expected> or schema_version == <expected>) so the probe
verifies existing user data or schema version in addition to health.

In `@tests/probe/test_probe_asr_e2e.py`:
- Around line 29-33: The test currently gates on PROBE_ASR_SAMPLE using
os.path.exists, which allows directories and causes downstream failures; change
the check to require a regular file by using os.path.isfile on the sample
variable and call pytest.skip if not a file so the ceil of
T.FasterWhisperTranscriber(transcribe...) only runs with a valid file path.

In `@tests/probe/test_probe_design.py`:
- Around line 28-36: The test test_voice_design_verdict currently only calls
probe_spec.run_judges with a prebuilt context so it never exercises the
actor/capture path for the new POST /generate contract; update the test to
either (A) run the spec through the actor step instead of only run_judges
(invoke the probe runner function that executes actors — e.g.,
probe_spec.run_actors or the equivalent API in the test harness) using the
loaded spec from probe_spec.load_spec and the designed_audio so the actual
actor/capture flow is executed, or (B) add explicit assertions after
probe_spec.load_spec that the spec's actions include a request targeting POST
/generate and that the capture path includes $.output_path (inspect the loaded
spec object fields to assert the method/path and capture selector); modify
test_voice_design_verdict to choose one of these fixes so the new /generate
contract and $.output_path capture are validated.

In `@tests/probe/test_probe_engines.py`:
- Around line 20-22: The test is pinned to the "whisperx" engine; change it to
assert the captured ASR active engine instead. Replace the second assertion that
calls E.engine_available(boot_capture["engines_asr"], "whisperx") with
E.engine_available(boot_capture["engines_asr"],
boot_capture["engines_asr"]["active"]) (or otherwise assert that
boot_capture["engines_asr"]["active"] is available) so the probe checks the
recorded active engine rather than a hard-coded id.

In `@tests/probe/test_probe_migration.py`:
- Around line 19-24: The test currently uses db_ok = context["db_created"] (from
tests/probe/_boot_runner.py) which only checks for a DB file and can pass even
if migrations dropped or mutated seeded data; instead, open the database
referenced by context["db_path"] in test_probe_migration.py and assert a
concrete seeded-data invariant (for example: a known table exists, a specific
migrated column is present, or a specific row/value from
tests/fixtures/omnivoice_data is still present). Replace the final assert db_ok
with a targeted assertion that queries the DB for the expected table/row/value
or schema change so the test fails if migrations altered the seeded data
unexpectedly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 1ed63786-68f9-44e6-ad42-2df302409fd3

📥 Commits

Reviewing files that changed from the base of the PR and between 24a00be and 88e75e7.

📒 Files selected for processing (27)
  • tests/probe/README.md
  • tests/probe/_boot_runner.py
  • tests/probe/conftest.py
  • tests/probe/env.py
  • tests/probe/judges/coverage.py
  • tests/probe/judges/dictation.py
  • tests/probe/judges/dubbing.py
  • tests/probe/judges/engine.py
  • tests/probe/judges/i18n.py
  • tests/probe/spec.py
  • tests/probe/specs/coverage_critic.probe.yaml
  • tests/probe/specs/dictation.probe.yaml
  • tests/probe/specs/dub_export.probe.yaml
  • tests/probe/specs/engines.probe.yaml
  • tests/probe/specs/i18n_parity.probe.yaml
  • tests/probe/specs/migration.probe.yaml
  • tests/probe/specs/security.probe.yaml
  • tests/probe/specs/voice_design.probe.yaml
  • tests/probe/test_probe_asr_e2e.py
  • tests/probe/test_probe_coverage.py
  • tests/probe/test_probe_design.py
  • tests/probe/test_probe_dictation.py
  • tests/probe/test_probe_dubbing.py
  • tests/probe/test_probe_engines.py
  • tests/probe/test_probe_i18n.py
  • tests/probe/test_probe_migration.py
  • tests/probe/test_probe_security.py

Comment thread tests/probe/_boot_runner.py Outdated
Comment thread tests/probe/_boot_runner.py Outdated
Comment thread tests/probe/judges/coverage.py
Comment thread tests/probe/judges/dubbing.py
Comment thread tests/probe/judges/i18n.py
Comment thread tests/probe/specs/migration.probe.yaml
Comment thread tests/probe/test_probe_asr_e2e.py
Comment thread tests/probe/test_probe_design.py
Comment thread tests/probe/test_probe_engines.py
Comment thread tests/probe/test_probe_migration.py Outdated
- coverage.py:22 — use `with open(...)` context to close spec files after
  yaml.safe_load (file handle leak)
- _boot_runner.py:80 — store only `type(exc).__name__` for WS errors; drop
  raw str(exc) that could leak home paths / secrets into capture JSON
- _boot_runner.py:99 — snapshot DB files before boot; set db_created=True
  only when boot creates NEW files (not when fixture already had one)
- dubbing.py:46 — FAIL segments_duration_ratio when validated==0 (guards
  against empty/corrupt segment list passing vacuously)
- i18n.py:49 — FAIL locale_valid_json when locales_dir is empty/missing
- i18n.py:7 — fix docstring: locale_no_orphan_keys is advisory, not blocking
- test_probe_i18n.py:59 — assert r.passed is False, not just r.advisory
- coverage_critic.probe.yaml:15 — add "meta" to required layers list
- dub_export.probe.yaml:17 — capture dub_audio in steps before advisory reads it
- migration.probe.yaml:13 — add path_exists(db_path) data-integrity check
- test_probe_asr_e2e.py:33 — os.path.exists → os.path.isfile for PROBE_ASR_SAMPLE
- test_probe_migration.py:24 — assert context["db_path"] (presence) not
  db_created (new creation), aligning with the boot_runner fix

Two findings intentionally skipped with reasons (see review thread replies):
  test_probe_design.py:36 — offline pattern is intentional; actor step is
    bypassed by design throughout the probe suite for CI compatibility
  test_probe_engines.py:22 — whisperx pin is intentional; it verifies the
    shipped default ASR engine is available out-of-the-box

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/probe/test_probe_migration.py (1)

22-25: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

The “data integrity” check still never inspects migrated data.

Line 25 only asserts that db_path is a non-empty string. The spec already runs path_exists on the same value, so this adds no protection, and the test still won't catch a migration that recreates or mutates the seeded DB while leaving a file behind. Please assert a concrete invariant from the copied omnivoice_data database instead of path presence alone.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/probe/test_probe_migration.py` around lines 22 - 25, The test currently
only asserts context["db_path"] exists but doesn't validate migrated data; open
the SQLite DB at context["db_path"] (the same variable used earlier) and run a
concrete query against the seeded omnivoice_data (e.g., verify a known table
exists and contains an expected row or count) and assert the result, so the test
fails if the migration recreated or mutated the seeded DB; use the test's
sqlite3 connection or existing DB helper to SELECT a known record/rowcount from
the omnivoice_data table and assert it matches the expected value.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/probe/judges/dubbing.py`:
- Line 59: Replace the Unicode multiplication sign in the f-string that builds
the detail message so it uses ASCII "x"; specifically update the f-string that
currently reads detail=f"all {validated} segment(s) within [{min_ratio},
{max_ratio}]×" in tests/probe/judges/dubbing.py to use "x" instead of "×" (the
variables validated, min_ratio, and max_ratio remain unchanged).

---

Duplicate comments:
In `@tests/probe/test_probe_migration.py`:
- Around line 22-25: The test currently only asserts context["db_path"] exists
but doesn't validate migrated data; open the SQLite DB at context["db_path"]
(the same variable used earlier) and run a concrete query against the seeded
omnivoice_data (e.g., verify a known table exists and contains an expected row
or count) and assert the result, so the test fails if the migration recreated or
mutated the seeded DB; use the test's sqlite3 connection or existing DB helper
to SELECT a known record/rowcount from the omnivoice_data table and assert it
matches the expected value.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 4f039a49-2902-4b64-a134-b5d039fd1316

📥 Commits

Reviewing files that changed from the base of the PR and between 88e75e7 and 7a8993a.

📒 Files selected for processing (10)
  • tests/probe/_boot_runner.py
  • tests/probe/judges/coverage.py
  • tests/probe/judges/dubbing.py
  • tests/probe/judges/i18n.py
  • tests/probe/specs/coverage_critic.probe.yaml
  • tests/probe/specs/dub_export.probe.yaml
  • tests/probe/specs/migration.probe.yaml
  • tests/probe/test_probe_asr_e2e.py
  • tests/probe/test_probe_i18n.py
  • tests/probe/test_probe_migration.py
✅ Files skipped from review due to trivial changes (2)
  • tests/probe/specs/coverage_critic.probe.yaml
  • tests/probe/specs/migration.probe.yaml
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/probe/judges/coverage.py
  • tests/probe/specs/dub_export.probe.yaml
  • tests/probe/test_probe_i18n.py
  • tests/probe/test_probe_asr_e2e.py

Comment thread tests/probe/judges/dubbing.py Outdated
Comment thread tests/probe/test_probe_migration.py
…nside seeded dir

Two regressions from the hardening pass:
- dubbing.py: replace non-ASCII '×' with 'x' (Ruff ambiguous-unicode → Tests lint fail)
- test_probe_migration: move run_judges inside the seeded_data_dir with-block so the
  new path_exists check sees the DB before the temp dir is torn down (was always failing)
@debpalash debpalash merged commit ef98aae into main Jun 2, 2026
15 checks passed
@debpalash debpalash deleted the debpalash/probe-coverage branch June 12, 2026 10:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant