feat(#666): per-role voice modality negotiation — declaration-first, two-phase validation, OTEL stamps #670
…tion-first + litellm advisory

Implements AC4a, AC4b scenarios from specs/voice-modality-negotiation.feature:
- resolve_modality(): declaration wins, litellm advisory warns on mismatch, both directions
- ModalityNegotiationError: shared exception type for setup/connect validation (AC6, AC7)
- ModalityTier: audio-in, stt-bridge, text

Foundational module; all other bundles depend on this.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Audio-capable models (advisory audio-in) now receive raw audio parts; text-only models strip audio exactly as before. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
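The per-tier audio handling in the simulator can be sketched as follows. This assumes OpenAI-style chat content parts, where audio arrives as `{"type": "input_audio", ...}`; the helper `prepare_user_content`, the stand-in `ModalityTier` enum, and the placeholder string are hypothetical and may not match the simulator's actual code.

```python
from enum import Enum
from typing import Any, Dict, List


class ModalityTier(str, Enum):
    # Tier values as named in this PR; the enum itself is a stand-in.
    AUDIO_IN = "audio-in"
    STT_BRIDGE = "stt-bridge"
    TEXT = "text"


AUDIO_PLACEHOLDER = "[audio omitted]"  # hypothetical placeholder text


def prepare_user_content(
    parts: List[Dict[str, Any]], tier: ModalityTier
) -> List[Dict[str, Any]]:
    """Keep raw audio parts for audio-in models; replace them with text otherwise."""
    if tier is ModalityTier.AUDIO_IN:
        return list(parts)
    out: List[Dict[str, Any]] = []
    for part in parts:
        if part.get("type") == "input_audio":
            # Text-only tiers never see raw audio, only a marker where it was.
            out.append({"type": "text", "text": AUDIO_PLACEHOLDER})
        else:
            out.append(part)
    return out
```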
…olver — AC3a, AC3b, AC3c, AC9 gpt-audio-mini now correctly resolves to audio-in (was missed by old list). gpt-4o now correctly takes the transcript path (litellm advisory=False). include_audio explicit override still wins (AC3c preserved). transcribe_segments unchanged for text-modality judges (AC9 regression passes). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
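A minimal sketch of the judge's audio-versus-transcript decision after this change, assuming the resolved tier is already known. `judge_sends_audio` is a hypothetical name; the point is the precedence: an explicit `include_audio` wins (AC3c), and otherwise only the `audio-in` tier receives raw audio.

```python
from enum import Enum
from typing import Optional


class ModalityTier(str, Enum):
    # Stand-in for the PR's tier enum.
    AUDIO_IN = "audio-in"
    STT_BRIDGE = "stt-bridge"
    TEXT = "text"


def judge_sends_audio(tier: ModalityTier, include_audio: Optional[bool] = None) -> bool:
    """Return True if the judge should get raw audio, False for the transcript path."""
    # An explicit include_audio always takes precedence over the resolved tier.
    if include_audio is not None:
        return include_audio
    # Otherwise only audio-in judges get audio; text and stt-bridge read transcripts.
    return tier is ModalityTier.AUDIO_IN
```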
Static impossible combo (audio-in × mulaw/8000) raises ModalityNegotiationError at setup. Live transport failure at first-connect re-raises as ModalityNegotiationError with requirement token. interrupt(after_words=N) capability gate moved to first-connect (before first turn). dtmf gate unchanged (AC8b regression). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
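A sketch of the static (setup-time) check, under the assumption that adapters advertise `input_formats` as (encoding, sample rate) pairs. `AudioFormat`, `UNSUPPORTED_FOR_AUDIO_IN`, and this `validate_modality_setup` body are illustrative; the PR only states that `audio-in` × mulaw/8000 must fail at setup with a message naming both.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ModalityTier(str, Enum):
    AUDIO_IN = "audio-in"
    STT_BRIDGE = "stt-bridge"
    TEXT = "text"


class ModalityNegotiationError(Exception):
    """Stand-in for the PR's shared setup/connect validation error."""


@dataclass(frozen=True)
class AudioFormat:
    # Hypothetical shape of one entry in adapter.capabilities.input_formats.
    encoding: str
    sample_rate: int


# Assumption: formats an audio-in model cannot consume. The PR names only mulaw/8000.
UNSUPPORTED_FOR_AUDIO_IN = {AudioFormat("mulaw", 8000)}


def validate_modality_setup(tier: ModalityTier, input_formats: List[AudioFormat]) -> None:
    """Fail fast at setup when no adapter format can serve the declared tier."""
    if tier is not ModalityTier.AUDIO_IN:
        return  # text and stt-bridge go through transcripts, so any format works
    usable = [f for f in input_formats if f not in UNSUPPORTED_FOR_AUDIO_IN]
    if not usable:
        offered = ", ".join(f"{f.encoding}/{f.sample_rate}" for f in input_formats)
        raise ModalityNegotiationError(
            f"modality 'audio-in' needs a supported audio input format; "
            f"adapter offers only: {offered}"
        )
```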
UserSimulatorAgent(modality="audio-in") and JudgeAgent(modality="text") now accepted. Declaration reaches resolve_modality() as the explicit declaration arg. Documented in docstrings (user-facing). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…utes — AC5, AC5b scenario.modality.<role>.resolved and scenario.modality.<role>.tier stamped on root span at the start of each turn. Populated in run() from resolve_modality() for UserSimulatorAgent (simulator) and JudgeAgent (judge). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
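A sketch of how the per-role stamps could be assembled into one flat dict for a single `set_attributes` call on the root span. The two base keys mirror the snippet quoted in CodeRabbit's review; `modality_span_attributes` is a hypothetical helper. It emits both `.resolved` and `.tier`, matching the later commit that restores `.resolved` because the feature spec requires both keys.

```python
from typing import Dict, Mapping


def modality_span_attributes(run_id: str, resolutions: Mapping[str, str]) -> Dict[str, str]:
    """Build one flat attribute dict for a single set_attributes() call."""
    attrs: Dict[str, str] = {
        "langwatch.origin": "simulation",
        "scenario.run_id": run_id,
    }
    for role, tier_value in resolutions.items():
        # Both keys per role, e.g. scenario.modality.judge.tier = "text".
        attrs[f"scenario.modality.{role}.resolved"] = tier_value
        attrs[f"scenario.modality.{role}.tier"] = tier_value
    return attrs
```

With the OpenTelemetry API, the resulting dict can be passed directly to `span.set_attributes(...)`.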
…y field added — AC10b Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Walkthrough
Introduces per-role voice modality negotiation by adding a new …

Changes
Voice Modality Negotiation per Role
Sequence Diagram(s)

```mermaid
sequenceDiagram
    rect rgba(70, 130, 180, 0.5)
    Note over ScenarioExecutor: run() — modality resolution
    end
    ScenarioExecutor->>resolve_modality: declaration=simulator.modality, model_id
    resolve_modality-->>ScenarioExecutor: (ModalityTier, warnings)
    ScenarioExecutor->>resolve_modality: declaration=judge.modality, model_id
    resolve_modality-->>ScenarioExecutor: (ModalityTier, warnings)
    ScenarioExecutor->>ScenarioExecutor: store _modality_resolutions
    rect rgba(60, 179, 113, 0.5)
    Note over ScenarioExecutor: _voice_connect_all() — three-phase validation
    end
    ScenarioExecutor->>validate_modality_setup: tier + adapter.capabilities.input_formats
    validate_modality_setup-->>ScenarioExecutor: OK or ModalityNegotiationError
    ScenarioExecutor->>VoiceAgentAdapter: connect()
    VoiceAgentAdapter-->>ScenarioExecutor: OK or PendingTransportError → ModalityNegotiationError
    ScenarioExecutor->>ScenarioExecutor: scan script steps for _requires_streaming_transcripts
    ScenarioExecutor->>VoiceAgentAdapter: check capabilities.streaming_transcripts
    VoiceAgentAdapter-->>ScenarioExecutor: missing → UnsupportedCapabilityError
    rect rgba(178, 102, 255, 0.5)
    Note over ScenarioExecutor: _new_turn() — OTEL stamping
    end
    ScenarioExecutor->>LangwatchRootSpan: set_attributes(scenario.modality.<role>.tier)
```
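The third connect-time phase in the diagram (scan script steps for `_requires_streaming_transcripts`, then check `capabilities.streaming_transcripts`) can be sketched like this. The step classes and `check_streaming_capability` are hypothetical stand-ins; only the two attribute names come from the diagram.

```python
from dataclasses import dataclass
from typing import Iterable


class UnsupportedCapabilityError(Exception):
    """Stand-in for the PR's capability error."""


@dataclass
class Capabilities:
    # Mirrors the capabilities.streaming_transcripts flag in the diagram.
    streaming_transcripts: bool = False


class SayStep:
    """Hypothetical script step with no transport requirements."""

    def __init__(self, text: str) -> None:
        self.text = text


class InterruptAfterWordsStep:
    """Hypothetical interrupt(after_words=N) step; needs live word counts."""

    _requires_streaming_transcripts = True

    def __init__(self, after_words: int) -> None:
        self.after_words = after_words


def check_streaming_capability(steps: Iterable[object], capabilities: Capabilities) -> None:
    """Fail at connect time, before the first turn, instead of mid-script."""
    needs_streaming = any(
        getattr(step, "_requires_streaming_transcripts", False) for step in steps
    )
    if needs_streaming and not capabilities.streaming_transcripts:
        raise UnsupportedCapabilityError(
            "script requires capabilities.streaming_transcripts, which this adapter lacks"
        )
```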
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Capture and emit warnings from resolve_modality() calls at three call sites (simulator setup, judge setup, voice agent setup) instead of silently discarding them via underscore binding. The resolver contract requires all warnings be logged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
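A sketch of the call-site pattern this commit describes: bind the resolver's warnings and log each one, rather than discarding them via `_`. The `(tier, warnings)` return shape follows the sequence diagram; `log_resolution` and the logger name are hypothetical.

```python
import logging
from typing import List, Tuple

logger = logging.getLogger("scenario.voice.modality")  # hypothetical logger name


def log_resolution(role: str, result: Tuple[str, List[str]]) -> str:
    """Unpack a (tier, warnings) result and log every warning for this role."""
    tier, warnings = result  # previously discarded as `tier, _ = ...`
    for message in warnings:
        logger.warning("modality[%s]: %s", role, message)
    return tier
```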
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ing contract failure The "Capability matrix is rendered into adapter docs" scenario in voice-agents.feature was missing a required @unit/@integration/@e2e tag, causing test_feature_file_contract tests to fail on main and on this PR. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Audio content dict literals in AgentInput tests are valid at runtime but pyright can't narrow them to ChatCompletionMessageParam. Suppressed with # type: ignore[arg-type] — same pattern already used in this file. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Actionable comments posted: 1
🧹 Nitpick comments (2)
python/scenario/voice/modality_resolver.py (1)
62-65: 💤 Low value

Consider using `raise ... from None` for cleaner exception context.

When converting `ValueError` to `ModalityNegotiationError`, the original `ValueError` context isn't useful to users; they only need to know the declaration is invalid and see the valid values. Suppressing the chain makes tracebacks cleaner.

♻️ Proposed fix

```diff
 try:
     declared_tier = ModalityTier(declaration)
 except ValueError:
-    raise ModalityNegotiationError(
+    raise ModalityNegotiationError(
         f"Unknown modality declaration {declaration!r}; valid values: "
         + ", ".join(t.value for t in ModalityTier)
-    )
+    ) from None
```
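To illustrate what `from None` changes: the new exception's `__cause__` stays `None` and `__suppress_context__` becomes `True`, so the traceback omits the `ValueError`, although `__context__` still references it for debugging. The enum and exception below are stand-ins for the PR's types, and `parse_declaration` is a hypothetical name.

```python
from enum import Enum


class ModalityTier(str, Enum):
    AUDIO_IN = "audio-in"
    STT_BRIDGE = "stt-bridge"
    TEXT = "text"


class ModalityNegotiationError(Exception):
    """Stand-in for the PR's exception type."""


def parse_declaration(declaration: str) -> ModalityTier:
    try:
        # Enum lookup by value raises ValueError for unknown strings.
        return ModalityTier(declaration)
    except ValueError:
        raise ModalityNegotiationError(
            f"Unknown modality declaration {declaration!r}; valid values: "
            + ", ".join(t.value for t in ModalityTier)
        ) from None
```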
python/scenario/scenario_executor.py (1)
462-464: ⚡ Quick win

Redundant OTEL attribute stamping.

Lines 463–464 both set `scenario.modality.{role}.resolved` and `scenario.modality.{role}.tier` to the same `tier_value`. This duplication increases trace size without adding information.

Consider setting only one attribute per role, or clarify in a comment why both are needed.

♻️ Suggested simplification

```diff
 attrs = {
     "langwatch.origin": "simulation",
     "scenario.run_id": self._scenario_run_id,
 }
 for role, tier_value in getattr(self, '_modality_resolutions', {}).items():
-    attrs[f"scenario.modality.{role}.resolved"] = tier_value
     attrs[f"scenario.modality.{role}.tier"] = tier_value
 self._trace.root_span.set_attributes(attrs)
```
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@python/scenario/scenario_executor.py`:
- Around line 772-776: The exception handler in the connect() error path is
catching all exceptions with `except Exception as e:` and wrapping them as
ModalityNegotiationError, which masks the true nature of non-modality failures
like network timeouts or authentication errors. According to AC7, only
PendingTransportError should be caught and wrapped as ModalityNegotiationError.
Change the except clause to specifically catch only PendingTransportError
instead of the broad Exception class, allowing other exceptions to propagate
unchanged so their true nature is visible to the caller.
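A sketch of the narrowed handler this comment asks for: only `PendingTransportError` is re-raised as `ModalityNegotiationError`, and everything else propagates unchanged. Both exception classes are stand-ins, and `connect_with_modality_check` is a hypothetical wrapper around the adapter's `connect()`.

```python
from typing import Callable


class PendingTransportError(Exception):
    """Stand-in: the live transport cannot honor the negotiated modality."""


class ModalityNegotiationError(Exception):
    """Stand-in for the PR's shared validation error."""


def connect_with_modality_check(connect: Callable[[], None], requirement: str) -> None:
    """Wrap only transport-level modality failures; let everything else through."""
    try:
        connect()
    except PendingTransportError as e:
        # Keep the cause chain here: the transport detail is what users need.
        raise ModalityNegotiationError(
            f"live transport cannot satisfy modality requirement {requirement!r}"
        ) from e
    # TimeoutError, auth failures, and bugs are not caught, so they surface as-is.
```

Unlike the declaration-parsing case, this wrapper keeps the chain with `from e`, because the underlying transport error explains why the mismatch happened.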
---
Nitpick comments:
In `@python/scenario/scenario_executor.py`:
- Around line 462-464: The OTEL attribute setting in the loop over
self._modality_resolutions is redundant: both lines 463 and 464 assign the same
tier_value to different attribute keys for the same role. Remove the duplicate
line that sets the scenario.modality.{role}.tier attribute (keeping only
scenario.modality.{role}.resolved), or if both attributes are actually needed
for different purposes, add a clarifying comment explaining why the duplication
is intentional. The loop structure iterating over the modality resolutions
should be preserved, but eliminate the unnecessary second attrs assignment per
role iteration.
In `@python/scenario/voice/modality_resolver.py`:
- Around line 62-65: The ModalityNegotiationError being raised in the modality
validation block does not suppress the exception context chain, which clutters
the traceback with unnecessary information about the original ValueError. Add
`from None` to the raise statement for ModalityNegotiationError to suppress the
exception context chain and provide a cleaner traceback to users that focuses
only on the helpful information about the invalid declaration and valid values.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d2ca81ac-66b3-42dc-be22-29af72edfea2
📒 Files selected for processing (16)
- python/scenario/judge_agent.py
- python/scenario/scenario_executor.py
- python/scenario/user_simulator_agent.py
- python/scenario/voice/__init__.py
- python/scenario/voice/modality_resolver.py
- python/scenario/voice/script_steps.py
- python/tests/test_capability_matrix.py
- python/tests/test_judge_agent.py
- python/tests/test_public_modality_api.py
- python/tests/test_user_simulator_agent.py
- python/tests/voice/test_judge_audio_transcribe.py
- python/tests/voice/test_judge_voice.py
- python/tests/voice/test_modality_resolver.py
- python/tests/voice/test_modality_stamps.py
- python/tests/voice/test_modality_validation.py
- specs/voice-agents.feature
Broad `except Exception` was masking network timeouts, auth errors, and bugs as ModalityNegotiationError. Only PendingTransportError signals a live-transport modality mismatch per AC7. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Review verdict: READY
Reviewed at:
Scope: 1 file, 1 line Python
Fixes applied this review pass
Non-blocking (Decide / New Issue)
…engthen AC5b test
- Drop `scenario.modality.{role}.resolved` stamp (was identical to `.tier`);
keep only `scenario.modality.{role}.tier` which is the canonical key.
- Expand `test_ac5b_stt_bridge_tier_stamped_correctly` to exercise the full
declaration → `resolve_modality()` → span-stamp path instead of bypassing
the resolver via direct `_modality_resolutions` injection.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
python/tests/voice/test_modality_stamps.py (1)
131-136: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Baseline test currently doesn’t exercise the “attribute missing” path it describes.

The docstring/comment says `_modality_resolutions` is intentionally unset, but `_new_turn_with_resolutions(executor, {})` sets it to an empty dict on Line 63. This weakens coverage of the real `getattr(..., {})` fallback path in `_new_turn()`.

Suggested minimal test fix

```diff
-def _new_turn_with_resolutions(
+def _new_turn_with_resolutions(
     executor: ScenarioExecutor,
-    resolutions: dict,
+    resolutions: dict | None,
 ) -> dict:
@@
-    executor._modality_resolutions = resolutions
+    if resolutions is not None:
+        executor._modality_resolutions = resolutions
     executor._new_turn()
@@
-    captured = _new_turn_with_resolutions(executor, {})
+    captured = _new_turn_with_resolutions(executor, None)
```
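A self-contained illustration of why the `None` sentinel matters: passing `{}` creates the attribute, while passing `None` leaves it unset so the `getattr(self, "_modality_resolutions", {})` fallback is what actually runs. `FakeExecutor` is a hypothetical stand-in for `ScenarioExecutor`'s stamping logic, not the real class.

```python
from typing import Dict, Optional


class FakeExecutor:
    """Hypothetical stand-in for ScenarioExecutor's stamping logic."""

    def __init__(self) -> None:
        self.stamped: Dict[str, str] = {}

    def _new_turn(self) -> None:
        # Same fallback shape as the production code under review.
        for role, tier in getattr(self, "_modality_resolutions", {}).items():
            self.stamped[f"scenario.modality.{role}.tier"] = tier


def new_turn_with_resolutions(
    executor: FakeExecutor, resolutions: Optional[Dict[str, str]]
) -> Dict[str, str]:
    # None leaves the attribute unset, so _new_turn() hits the getattr fallback.
    if resolutions is not None:
        executor._modality_resolutions = resolutions
    executor._new_turn()
    return executor.stamped
```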
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@python/tests/voice/test_modality_stamps.py`:
- Line 118: The unpacking assignment in the resolve_modality function call
unpacks a warnings variable that is never used, which triggers Ruff linting
warnings. Rename the unused warnings binding to either _warnings or _
(underscore) to explicitly indicate that this variable is intentionally not
used. This will satisfy the linter and make the intent clear to future readers.
---
Outside diff comments:
In `@python/tests/voice/test_modality_stamps.py`:
- Around line 131-136: The test is not actually exercising the attribute-missing
path because the helper function _new_turn_with_resolutions sets
_modality_resolutions to an empty dict on line 63, contradicting the test's
intent. To test the true fallback behavior, modify the test to call the actual
_new_turn function directly on the executor (without using
_new_turn_with_resolutions) so that the getattr fallback path in the production
code is genuinely invoked when _modality_resolutions is not set.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d4f24884-951a-4899-be0e-2967773c958f
📒 Files selected for processing (2)
- python/scenario/scenario_executor.py
- python/tests/voice/test_modality_stamps.py
💤 Files with no reviewable changes (1)
- python/scenario/scenario_executor.py
…nts spy tests (AC5/AC5b/AC9)

Three prove-it gaps closed:
- AC5: stamp scenario.modality.<role>.resolved alongside .tier in _new_turn(); the feature spec requires both exact keys, and previously only .tier was set.
- AC5b: add spy test asserting transcribe_segments is invoked with the judge's VoiceRecording when modality='stt-bridge', not just inferred from tier stamp.
- AC9: add spy test asserting transcribe_segments still runs for a text-modality gpt-4o judge after the substring-list to resolver change; confirms regression path.

All 59 AC-relevant tests green.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
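The spy-test pattern these commits describe can be sketched with `unittest.mock.patch.object(..., wraps=...)`, which records calls while still running the real method. `Judge` and `VoiceRecording` below are hypothetical minimal stand-ins, not the library's classes.

```python
from typing import List
from unittest import mock


class VoiceRecording:
    """Hypothetical stand-in for a recorded conversation."""

    def __init__(self, segments: List[str]) -> None:
        self.segments = segments


class Judge:
    """Hypothetical judge: non-audio tiers evaluate a transcript."""

    def __init__(self, tier: str) -> None:
        self.tier = tier

    def transcribe_segments(self, recording: VoiceRecording) -> str:
        return " ".join(recording.segments)

    def evaluate(self, recording: VoiceRecording) -> str:
        if self.tier == "audio-in":
            return "<raw audio sent to model>"
        return self.transcribe_segments(recording)


def assert_transcribes(tier: str) -> None:
    judge = Judge(tier)
    recording = VoiceRecording(["hello", "world"])
    # wraps= keeps the real behavior while recording every call.
    with mock.patch.object(judge, "transcribe_segments", wraps=judge.transcribe_segments) as spy:
        result = judge.evaluate(recording)
    spy.assert_called_once_with(recording)
    assert result == "hello world"
```

Spying on the call, rather than inferring it from the tier stamp, is what distinguishes these tests from the earlier stamp-only assertions.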
…ve duplicate imports messages=[audio_message] failed pyright because dict[str, Unknown] is not assignable to ChatCompletionMessageParam; cast(Any, ...) matches the existing pattern at line 263. Also removes the unused duplicate VoiceRecording import in both spy test functions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Automated low-risk assessment: this PR was evaluated against the repository's Low-Risk Pull Requests procedure and does not qualify as low risk.
This PR requires a manual review before merging.
Why
Today the voice user simulator always strips audio to text (STT bridge) with no capability check, and the judge guesses audio support from a hardcoded model-name substring list that misses `gpt-audio-mini`. Silent degradation is the root cause of #664. This PR ships the observable spine: per-role modality declaration with declaration-first resolution, loud two-phase validation, and OTEL span stamps so a degraded run is always visible.

Closes #666
What changed
- `modality_resolver.py`: new core module. Declaration beats litellm advisory; a mismatch emits a `WARNING` rather than silently overriding; unknown declarations raise `ModalityNegotiationError`.
- `UserSimulatorAgent` and `JudgeAgent` wired to the resolver: `modality="audio-in" | "text" | "stt-bridge"` accepted as a public param (AC0); effective tier resolved once per role at scenario init; the simulator strips or keeps audio parts accordingly; the judge delegates audio-capability detection to the resolver instead of a hardcoded substring list.
- … `audio-in` + non-streaming adapter) raise at `ScenarioExecutor` init; live-transport mismatches raise at first `connect()`, before the first turn (AC6–AC8b).
- `scenario.modality.<role>.resolved` and `scenario.modality.<role>.tier` written per role into the active span at `_new_turn()` (AC5, AC5b).

How it works
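A minimal sketch of declaration-first resolution as described in this PR. `advisory_supports_audio` is a boolean standing in for the litellm capability lookup (the real call is not shown here); the warning text and the choice not to warn for `stt-bridge` declarations are assumptions.

```python
from enum import Enum
from typing import List, Optional, Tuple


class ModalityTier(str, Enum):
    AUDIO_IN = "audio-in"
    STT_BRIDGE = "stt-bridge"
    TEXT = "text"


class ModalityNegotiationError(Exception):
    """Stand-in for the PR's shared validation error."""


def resolve_modality(
    declaration: Optional[str], advisory_supports_audio: bool
) -> Tuple[ModalityTier, List[str]]:
    """Declaration first; the advisory only fills gaps or produces warnings."""
    advisory = ModalityTier.AUDIO_IN if advisory_supports_audio else ModalityTier.TEXT
    if declaration is None:
        # No declaration: fall back to the advisory (e.g. gpt-4o -> text).
        return advisory, []
    try:
        declared = ModalityTier(declaration)
    except ValueError:
        valid = ", ".join(t.value for t in ModalityTier)
        raise ModalityNegotiationError(
            f"Unknown modality declaration {declaration!r}; valid values: {valid}"
        ) from None
    messages: List[str] = []
    if declared is ModalityTier.AUDIO_IN and not advisory_supports_audio:
        # AC4a: keep the declared tier, but report that the advisory disagrees.
        messages.append("declared 'audio-in' but advisory reports no audio input support")
    elif declared is ModalityTier.TEXT and advisory_supports_audio:
        # AC4b: the same mismatch in the other direction.
        messages.append("declared 'text' but advisory reports audio input support")
    return declared, messages
```

The caller is expected to log every returned warning; the declared tier is never silently overridden.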
Resolution rules:
… `WARNING` (AC4a, AC4b).

Test plan
Key new test files:
- `tests/voice/test_modality_resolver.py`: AC4a, AC4b
- `tests/voice/test_modality_validation.py`: AC6–AC8b
- `tests/voice/test_modality_stamps.py`: AC5, AC5b (tier + resolved keys)
- `tests/test_judge_agent.py::test_ac9_transcribe_segments_invoked_for_text_judge`: AC9 spy
- `tests/test_judge_agent.py::test_ac5b_stt_bridge_judge_invokes_transcribe_segments`: AC5b spy

Regression:
`test_gemini_is_detected_as_audio_capable` intentionally removed: the resolver now delegates to the litellm advisory rather than an internal substring list; the equivalent behavior is covered by `TestNoDeclaration.test_no_declaration_advisory_true_returns_audio_in`.

How I can prove I was successful
Show-it-working demo (actual run output — 2026-06-15)
Script: `uv run python /tmp/demo_voice_modality.py` in the worktree venv.

Exercises: per-role resolution across 6 combos, static impossible-combo exception, OTEL span attrs via `_new_turn()`.

Acceptance Criteria Evidence Table
| Criterion | Evidence |
|---|---|
| `modality=` public param on both agents | `user_simulator_agent.py:162`, `judge_agent.py:251`; 4 `test_ac0_*` tests pass |
| | `test_audio_in_simulator_retains_audio_parts` pass; demo: `gpt-audio-mini` → `tier='audio-in'` |
| | `test_text_simulator_strips_audio_with_placeholders` pass |
| `gpt-audio-mini` judge receives audio | `test_gpt_audio_mini_judge_receives_audio` pass |
| `gpt-4o` judge (no decl) → transcript path | `test_gpt4o_judge_no_declaration_takes_transcript_path` pass; demo: `gpt-4o` advisory=False → `tier='text'` |
| `include_audio=False` wins | `test_explicit_include_audio_false_wins` pass |
| `audio-in` on advisory-text → declared tier + `WARNING` | `tier='audio-in'`; `TestAC4a` (5 tests) pass |
| `text` on advisory-audio → declared tier + `WARNING` | `tier='text'`; `TestAC4b` (4 tests) pass |
| `scenario.modality.<role>.resolved` AND `.tier` per role | `test_ac5_*` tests assert `.resolved` + `.tier`; `scenario_executor.py:462-464` |
| `transcribe_segments` spy | `resolve_modality(decl='stt-bridge')` → `STT_BRIDGE` ✅; `span.tier='stt-bridge'` ✅; `test_ac5b_stt_bridge_judge_invokes_transcribe_segments` spies `transcribe_segments(recording)` ✅ |
| | `ModalityNegotiationError` message contains `'audio-in'` + `'mulaw'`; `TestAC6StaticValidation` (7 tests) pass |
| `ModalityNegotiationError` before first turn | `test_ac7_live_transport_failure_raises_before_first_turn` + `test_ac7_error_is_modality_negotiation_error_not_pending_transport` pass |
| `interrupt(after_words=N)` gate fires at connect, not step-exec | `test_ac8a_interrupt_after_words_raises_at_connect_not_step_execution` pass |
| | `test_ac8b_dtmf_gate_unchanged` pass |
| `transcribe_segments` runs over VoiceRecording for text judge (regression) | `test_ac9_transcribe_segments_invoked_for_text_judge` spies `transcribe_segments` called with `VoiceRecording` for gpt-4o (TEXT tier) ✅; `judge_agent.py:474` |
| | `uv run python scripts/gen_capability_matrix.py` → "No changes: capability-matrix.mdx" |

Verdict: 16/16 PASS, ALL PASS