
fix(voice/#648): terminal drain on non-audio completion (EL + WebSocket) #693

Open
drewdrewthis wants to merge 4 commits into main from fix/648-voice-audio-gated-drain

Conversation

drewdrewthis commented Jun 21, 2026

Collaborator

Why

The ElevenLabs (hosted ConvAI) and generic WebSocket adapters shared an audio-gated receive loop that returned ONLY on an audio frame, so a turn completing without audio drained to the response_timeout deadline and raised — a latent hang surfaced by /sweep during PR #647 (which fixed the same anti-pattern in the OpenAI Realtime adapter for #646).

Closes #648

What changed

  • Reused the empty-AudioChunk terminal pattern from #646/PR #647 (proven for OpenAI Realtime; same idiom as Gemini Live / Pipecat) rather than inventing a new signal: recv_audio returns an empty chunk on a non-audio terminal, so the base _drain_agent_response loop exits through its existing empty-chunk break.
  • ElevenLabs (elevenlabs.py recv_audio / elevenlabs.ts onMessage): treat client_tool_call as a tool-only terminal — this adapter has no client_tool_result path, so the hosted agent can never follow up with spoken audio. PY now also catches ConnectionClosed; the TS socket-close terminal was already handled by onSocketClose.
  • Generic WebSocket (websocket.py recv_audio): a clean server close (end of stream) returns empty instead of letting ConnectionClosed propagate — the drain only catches asyncio.TimeoutError, so an unhandled close crashed the turn.
  • Narrowest correct change per adapter; the normal audio path is untouched.
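
For readers unfamiliar with the pattern, here is a minimal sketch of the empty-chunk terminal idea described in the list above. It is not the adapter code from this PR: `ScriptedSocket`, the event dictionaries, the simplified `drain` loop, and the local `AudioChunk` / `ConnectionClosed` stand-ins are all assumptions made so the example runs with the standard library alone.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class AudioChunk:
    """Stand-in for the library's AudioChunk; only `data` is modelled here."""
    data: bytes


class ConnectionClosed(Exception):
    """Stand-in for websockets.exceptions.ConnectionClosed."""


class ScriptedSocket:
    """Hypothetical fake socket: replays queued frames, raising queued exceptions."""

    def __init__(self, frames):
        self._frames = list(frames)

    async def recv(self):
        frame = self._frames.pop(0)
        if isinstance(frame, Exception):
            raise frame
        return frame


async def recv_audio(ws, timeout: float) -> AudioChunk:
    """Return audio when it arrives, or an empty chunk on a non-audio terminal."""
    while True:
        try:
            event = await asyncio.wait_for(ws.recv(), timeout)
        except ConnectionClosed:
            return AudioChunk(data=b"")  # socket close ends the turn
        if event["type"] == "audio":
            return AudioChunk(data=event["payload"])
        if event["type"] == "client_tool_call":
            return AudioChunk(data=b"")  # tool-only turn: no audio will follow
        # Any other event is ignored and the loop keeps waiting.


async def drain(ws, timeout: float) -> bytes:
    """Simplified drain loop: stop as soon as an empty chunk is returned."""
    merged = b""
    while True:
        chunk = await recv_audio(ws, timeout)
        if not chunk.data:
            break
        merged += chunk.data
    return merged


tool_only = asyncio.run(drain(ScriptedSocket([{"type": "client_tool_call"}]), 1.0))
closed = asyncio.run(
    drain(ScriptedSocket([{"type": "audio", "payload": b"\x01\x02"}, ConnectionClosed()]), 1.0)
)
```

In this sketch both turns end without waiting for the timeout: the tool-only turn yields no audio, and the turn that closes after one frame keeps the audio it already received.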

Test plan

  • python/tests/voice/test_audio_gated_drain.py (new): cd python && PYTHONPATH=. uv run pytest tests/voice/test_audio_gated_drain.py → 7 passed. Covers EL client_tool_call→empty, EL socket-close→empty, WebSocket socket-close→empty (each socket-close case parametrized over both ConnectionClosedOK and ConnectionClosedError, since production catches the base ConnectionClosed), plus normal-audio no-regression for both adapters. Terminal cases give recv_audio a long budget under a short outer asyncio.wait_for ceiling, so the un-fixed adapter fails fast (hang→timeout / propagated ConnectionClosed) instead of stalling the suite.
  • javascript/src/voice/adapters/__tests__/elevenlabs.test.ts: added a client_tool_call→empty terminal test (red without the fix: receiveAudio rejects with a timeout); client_tool_call was removed from the "swallows unknown events" case since it is now a terminal. cd javascript && pnpm exec vitest run src/voice/adapters/__tests__/ → 140 passed / 1 skipped.
  • cd javascript && pnpm typecheck → clean.
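
The "long budget under a short ceiling" technique from the test plan can be sketched roughly as follows. The two `*_recv_audio` coroutines are hypothetical models of the un-fixed and fixed adapters, and the 30 s budget and 0.2 s ceiling are illustrative values, not the ones used in the real test module.

```python
import asyncio


async def hanging_recv_audio(timeout: float) -> bytes:
    """Models the un-fixed adapter: waits out the whole recv budget, then raises."""
    await asyncio.sleep(timeout)
    raise asyncio.TimeoutError("no audio frame arrived")


async def fixed_recv_audio(timeout: float) -> bytes:
    """Models the fixed adapter: a terminal event yields empty bytes right away."""
    return b""


async def run_under_ceiling(recv_audio, budget: float = 30.0, ceiling: float = 0.2):
    """Give recv a long budget, but cap the whole call with a short outer ceiling."""
    try:
        return await asyncio.wait_for(recv_audio(budget), timeout=ceiling)
    except asyncio.TimeoutError:
        return "timed out at ceiling"


fixed_result = asyncio.run(run_under_ceiling(fixed_recv_audio))
unfixed_result = asyncio.run(run_under_ceiling(hanging_recv_audio))
```

The outer ceiling is what keeps a regression cheap to detect: the hanging variant is cut off after the short ceiling rather than after the full recv budget.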

Human verification

backend-only, no UI surface — this is a pure library/transport fix in the voice adapter drain logic (python/scenario/voice/adapters/{elevenlabs,websocket}.py, javascript/src/voice/adapters/elevenlabs.ts); there is no front-end, route, or visual surface to exercise, so there is no screenshot/recording to attach.

To verify by hand:

  1. Green at HEAD (both languages).
    • cd python && PYTHONPATH=. uv run pytest tests/voice/test_audio_gated_drain.py → 7 passed.
    • cd javascript && pnpm exec vitest run src/voice/adapters/__tests__/ → 140 passed / 1 skipped; pnpm typecheck → clean.
  2. Red without the fix (the discriminating check). Revert only the source files and re-run the suites above:
    git checkout origin/main -- python/scenario/voice/adapters/elevenlabs.py python/scenario/voice/adapters/websocket.py javascript/src/voice/adapters/elevenlabs.ts
    
    The terminal-case tests then FAIL (Python: 3 failed, 2 passed; TS: the #648 client_tool_call test times out after ~2s), confirming the tests gate the bug rather than passing vacuously. Restore with git checkout HEAD -- <same paths>.

How I can prove I was successful

No playable artifact — this is a pure library/transport fix. AC1's gating surface is the base drain loop (_drain_agent_response / drainAgentResponse) exiting on an empty chunk, which the unit tests sit directly on (they assert the returned AudioChunk(data=b""), not a code-path trace). The proof is falsifiability:

  • Hand-verified red-without-fix → green-with-fix (both languages). Against origin/main source the terminal-case tests FAIL (Python: 3 failed, 2 passed; TS: the #648 client_tool_call test fails when receiveAudio times out after ~2s). With the fix, all pass. (Note: CI only ever runs HEAD, which is all-green, so the discriminating red state is not visible from CI alone; it was verified locally by reverting just the source files and re-running.)
  • A full scenario run against a live tool-only EL turn is not cheaply reproducible — the provisioned test agent never emits client_tool_call (per the issue body) — so the transport-layer unit proof on the gating contract is the correct surface.

Anything surprising?

#567 (post-interrupt re-engage on EL hosted) is a separate follow-up that also touches the EL adapter; per the plan it should be driven AFTER this lands to avoid conflicts. Local pnpm lint reports a large pre-existing import-order/eslint-disable baseline across many untouched files (identical count with and without this diff) — not introduced here.

drewdrewthis and others added 2 commits June 21, 2026 10:15
…bSocket py)

The ElevenLabs and generic WebSocket adapters shared an audio-gated receive
loop that returned ONLY on an audio frame, so a turn completing without audio
drained to the response_timeout deadline and raised — a latent hang surfaced by
/sweep during PR #647 (#646 fixed the same anti-pattern in OpenAI Realtime).

Mirror the #646/PR647 reference pattern (and the Gemini Live / Pipecat idiom):
return an empty AudioChunk on a non-audio terminal so the base
_drain_agent_response loop exits cleanly instead of hanging to the deadline.

- elevenlabs.py recv_audio: a socket close (ConnectionClosed) and a
  client_tool_call (tool-only turn, no client_tool_result path) each return an
  empty chunk.
- websocket.py recv_audio: a clean server close (end of stream) returns empty.
- elevenlabs.ts onMessage: client_tool_call resolves the active receiver with an
  empty chunk (socket close was already handled by onSocketClose).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…lose, tool-only)

Prove the audio-gated drain returns an empty AudioChunk (not a timeout/raise) on
a non-audio completion, and that the normal audio path still drains (no regression).

- python/tests/voice/test_audio_gated_drain.py (new): EL client_tool_call, EL
  socket close, and WebSocket socket close each return an empty chunk; normal
  audio for both adapters still returns the decoded payload. Terminal-case calls
  use a long recv budget under a short outer asyncio.wait_for ceiling, so an
  un-fixed adapter fails fast instead of stalling the suite.
- elevenlabs.test.ts: add a client_tool_call terminal test; drop client_tool_call
  from the "swallows unknown events" case (it is now a terminal, not swallowed).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@drewdrewthis drewdrewthis self-assigned this Jun 21, 2026
@coderabbitai

coderabbitai Bot commented Jun 21, 2026


Walkthrough

The ElevenLabs and generic WebSocket voice adapters had a latent hang: their audio-gated drain loops only returned on audio events, so tool-only turns or clean socket closes would wait until timeout. This PR adds client_tool_call handling and ConnectionClosed catch paths in both adapters (Python and JavaScript) that resolve the drain immediately with an empty AudioChunk, and covers all three terminal paths with new and updated tests.

Changes

Audio-gated drain hang fix

| Layer | File(s) | Summary |
| --- | --- | --- |
| WebSocket base adapter: ConnectionClosed terminal path | python/scenario/voice/adapters/websocket.py | recv_audio now catches websockets.exceptions.ConnectionClosed around self._ws.recv() and returns AudioChunk(data=b"") instead of propagating the exception. Docstring distinguishes end-of-stream from timeout. |
| ElevenLabs Python adapter: client_tool_call and ConnectionClosed terminal paths | python/scenario/voice/adapters/elevenlabs.py | recv_audio adds an in-method websockets import, a ConnectionClosed catch returning an empty AudioChunk, and a client_tool_call branch that logs and returns an empty AudioChunk. Module and method docstrings document the new termination semantics. |
| ElevenLabs JavaScript adapter: client_tool_call terminal path | javascript/src/voice/adapters/elevenlabs.ts | onMessage adds a client_tool_call handler that shifts the active receiveAudio waiter and resolves it with an empty AudioChunk, or drops the terminal if no waiter exists. Docstring documents tool-only turns and socket-close termination. |
| Python test suite: drain termination and normal audio coverage | python/tests/voice/test_audio_gated_drain.py | New test module with _scripted_ws mock and _BytesAudioProtocol covers ElevenLabs client_tool_call→empty chunk, socket close (both variants)→empty chunk, normal audio→non-empty PCM, and drain-level tool-only termination; similar coverage for generic WebSocket adapter with fast-fail timeout ceiling. |
| JavaScript test updates: client_tool_call terminal coverage | javascript/src/voice/adapters/__tests__/elevenlabs.test.ts | The "swallows interruption and unknown events" test now emits agent_response_metadata instead of client_tool_call, and a dedicated client_tool_call (tool-only turn) test was added asserting receiveAudio resolves to zero-length PCM. |

Possibly related issues

  • #648: This PR fixes the exact hang identified in issue #648 by adding terminal-event handling (client_tool_call and socket close) to the ElevenLabs and WebSocket adapters, which now return empty AudioChunk values instead of timing out.
  • #567: Related follow-up concerning post-interrupt and scripted next-turn timeouts due to server-VAD not re-engaging; both affect the same adapter receive path but target different timeout scenarios.

Suggested labels

ai-reviewed, prove-it-clean

Suggested reviewers

  • 0xdeafcafe
  • Aryansharma28
  • sergioestebance
  • rogeriochaves

🐇 A tool call arrives with no sound to play,
the drain used to hang — what a long, silent day!
Now an empty AudioChunk signals "all done,"
and ConnectionClosed no longer makes callers run.
Hops clean through the socket, tests pass in a flash —
no more timeout hangs, no dramatic crash! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 69.23% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title concisely summarizes the main change: fixing terminal drain behavior on non-audio completion for ElevenLabs and WebSocket adapters. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description clearly explains the bug (audio-gated receive loops causing hangs on non-audio turns), the solution (returning empty AudioChunk on terminal events), and provides comprehensive test coverage and verification steps across both Python and TypeScript implementations. |


@drewdrewthis drewdrewthis marked this pull request as ready for review June 21, 2026 10:24

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/tests/voice/test_audio_gated_drain.py`:
- Around line 51-79: Add full type annotations to the `_scripted_ws` function
and the nested `fake_recv` function to comply with strict typing requirements.
Change the `frames` parameter type from `list` to `Sequence` from
`collections.abc` for better flexibility, add a return type annotation to
`_scripted_ws`, and add a return type annotation to the `fake_recv` async
function. Import `Sequence` from `collections.abc` at the top of the file if not
already present.
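
As an illustration of what the requested annotations might look like, here is a sketch of a typed `_scripted_ws` helper. The body is an assumption (the real helper in test_audio_gated_drain.py is not shown in this thread), and the `MagicMock` return type and `object` element type are placeholders chosen only to make the signature concrete.

```python
import asyncio
from collections.abc import Sequence
from unittest.mock import MagicMock


def _scripted_ws(frames: Sequence[object]) -> MagicMock:
    """Hypothetical helper: a mock socket whose recv() replays `frames` in order."""
    remaining = list(frames)

    async def fake_recv() -> object:
        frame = remaining.pop(0)
        if isinstance(frame, BaseException):
            raise frame  # scripted failure, e.g. a simulated socket close
        return frame

    ws = MagicMock()
    ws.recv = fake_recv
    return ws


# Accepting Sequence rather than list means a tuple script is also valid.
ws = _scripted_ws(("frame-1", RuntimeError("closed")))
first = asyncio.run(ws.recv())
```

Widening the parameter to `Sequence` only loosens what callers may pass; the helper still copies the frames into its own list before consuming them.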

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f579411d-8edc-4b90-af52-3235bdb4cf1b

📥 Commits

Reviewing files that changed from the base of the PR and between b819849 and 2423a92.

📒 Files selected for processing (5)
  • javascript/src/voice/adapters/__tests__/elevenlabs.test.ts
  • javascript/src/voice/adapters/elevenlabs.ts
  • python/scenario/voice/adapters/elevenlabs.py
  • python/scenario/voice/adapters/websocket.py
  • python/tests/voice/test_audio_gated_drain.py

Comment thread python/tests/voice/test_audio_gated_drain.py Outdated
drewdrewthis and others added 2 commits June 21, 2026 10:36
…rop rationale

Address non-blocking review feedback on PR #693 (no production-logic change):
- test_audio_gated_drain.py: parametrize the EL + WebSocket socket-close tests
  over both ConnectionClosedOK (clean) and ConnectionClosedError (abnormal),
  since production catches the base ConnectionClosed — both subclasses must
  terminate the drain cleanly.
- elevenlabs.ts: expand the client_tool_call comment to explain why dropping the
  terminal when no receiver is in flight is safe (a terminal carries no payload;
  the drain always parks a waiter first) and why queuing an empty sentinel would
  be worse (a spurious empty turn on the next receiveAudio).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Address review feedback (Metz/Beck, Fowler) on PR #693 — test-only:
- Add test_elevenlabs_tool_only_turn_drain_exits_cleanly: drives
  _drain_agent_response end-to-end on a tool-only turn and asserts it returns an
  empty merged turn instead of raising FirstChunkTimeoutError. Guards the bug at
  the level it was reported (a drain-level hang), above the recv_audio unit
  tests. Red without the fix (drain hangs to the outer ceiling).
- Rename the recv_audio unit tests *_terminates_drain -> *_returns_empty_chunk
  so the names match what they assert (the recv contract, not the drain loop).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/tests/voice/test_audio_gated_drain.py`:
- Line 93: Add explicit type annotations to the async test function signatures
that are missing them. For the async test functions
test_elevenlabs_client_tool_call_returns_empty_chunk, the test at line 127, the
test at line 168, and the test at line 224, add explicit return type annotations
(typically -> None for test functions). Additionally, for the parametrized test
functions that have a close_cls parameter (at lines 127 and 224), add explicit
type annotations to this parameter. Ensure all function signatures include
complete type information to satisfy pyright type checking in strict mode as
required by the project's coding guidelines.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e6e8ae61-7212-4906-b4ba-0f1505fd0853

📥 Commits

Reviewing files that changed from the base of the PR and between 5827647 and 3f9b4ac.

📒 Files selected for processing (1)
  • python/tests/voice/test_audio_gated_drain.py

Comment thread python/tests/voice/test_audio_gated_drain.py
@drewdrewthis

Collaborator Author

Review verdict: READY

Reviewed at: 3f9b4ac · Run: /review (own-PR)

10-agent fan-out (principles, hygiene, security, test, proof-reviewer + 4 personas/design-soundness + drift) ran at 5827647; the delta since is test-only polish addressing the feedback below (unit-test renames + a drain-level regression test), so per the methodology's cosmetic/test-only skip-guard the persona/design/drift lenses were not re-run.

No blocking concerns. No <!-- review-thread --> threads were opened — every finding was non-blocking, and the actionable ones are already fixed in this PR. Both required checks (python-complete, javascript-complete) are green at this SHA.

Addressed in this PR (review feedback)

  • [metz-beck][fowler] Unit tests named *_terminates_drain overclaimed (they exercise recv_audio, not the drain) → renamed to *_returns_empty_chunk, and added a drain-level guard test_elevenlabs_tool_only_turn_drain_exits_cleanly that drives _drain_agent_response end-to-end (red without the fix).
  • [test][proof] Socket-close tests only covered ConnectionClosedOK while production catches the base ConnectionClosed → parametrized the EL + WebSocket socket-close tests over both ConnectionClosedOK and ConnectionClosedError.
  • [principles][test] PY (pull-loop return) vs TS (active-waiter resolve, drop-on-no-waiter) terminal divergence → expanded the TS comment to explain why dropping a terminal with no receiver is safe (no payload to preserve; the drain always parks a waiter first) and why queuing a sentinel would be worse.
  • [proof] Red-without-fix state isn't visible from CI (which only runs green HEAD) → PR body "How I can prove" now states it is hand-verified by reverting source and re-running.

Non-blocking (Decide / New Issue)

  • [fowler] (New Issue, out of scope): the empty-AudioChunk "terminal" is an undeclared protocol now repeated across ~all voice adapters (the base drain breaks on not chunk.data; AudioChunk has no is_terminal/terminal()). Recommend a follow-up to name the sentinel on AudioChunk + document the abstract recv_audio contract, before a future adapter copies it again. Cross-adapter refactor beyond #648; recommended for maintainers, not filed.
  • [uncle-bob][security] (Decide): generic websocket.py swallows an abnormal close (ConnectionClosedError/1006) silently, while the EL twin logger.debugs. A parity debug line would aid bring-your-own-backend diagnostics. Deferred to keep the "narrowest correct change per adapter" the issue asked for.
  • [hygiene][uncle-bob][fowler] (Decide, no change): per-method import websockets in recv_audio. It matches the established 6-call-site lazy-import convention (principles confirmed) and is necessary, not redundant — Python function-local imports do not cross methods, so connect()'s import is not visible in recv_audio.
  • [uncle-bob] (Decide): new AudioChunk({ data: new Uint8Array(0) }) appears in ~3 TS spots; a named emptyChunk()/terminal sentinel would read better — best done with the AudioChunk.terminal() follow-up above.
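
The lazy-import point above (a function-local import is not visible in other methods) can be checked with a small standalone sketch. The `Adapter` class and its method names are hypothetical, and the stdlib `wave` module stands in for `websockets` so the example has no third-party dependency.

```python
class Adapter:
    """Hypothetical adapter used only to show the scoping rule."""

    def connect(self) -> str:
        import wave  # bound as a local name inside connect() only
        return wave.__name__

    def recv_without_import(self) -> str:
        # `wave` was never bound in this scope or at module level.
        return wave.__name__

    def recv_with_import(self) -> str:
        import wave  # each method needs its own function-local import
        return wave.__name__


adapter = Adapter()
adapter.connect()
try:
    adapter.recv_without_import()
    outcome = "no error"
except NameError:
    outcome = "NameError"
with_import = adapter.recv_with_import()
```

Calling `connect()` first does not help the second method: the module is cached in `sys.modules`, but the name is still unbound in the other method's scope.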

Lenses that passed clean

  • [design-soundness] PASS — reuses the library-native websockets ConnectionClosed end-of-stream signal and the repo's own empty-chunk sentinel; the "use async for instead" alternative was refuted (websockets.__aiter__ re-raises ConnectionClosedError, which the parametrized tests require to terminate cleanly, so explicit except ConnectionClosed is the more correct mechanism).
  • [drift] PASS — delivery matches the plan/issue exactly: the 3 named sites only, commit-per-AC honored, #567 correctly left out as a separate follow-up, no scope creep.
  • [completeness] PASS: 2/2 issue #648 ACs covered with load-bearing, independently re-run evidence (red-without-fix → green-with-fix, both languages).

Verdict is prose, not a GitHub approval.

@drewdrewthis drewdrewthis added the slack-requested Slack PR review request posted label Jun 21, 2026
@github-actions

Contributor

Automated low-risk assessment

This PR was evaluated against the repository's Low-Risk Pull Requests procedure and does not qualify as low risk.

The PR changes runtime message-handling in the ElevenLabs and generic WebSocket voice adapters so that client_tool_call and socket-close events return an empty AudioChunk instead of propagating ConnectionClosed or timing out. These are changes to integration/transport logic that affect how the system interacts with external/websocket services, so they do not meet the policy's low-risk criteria.

This PR requires a manual review before merging.


Labels

slack-requested Slack PR review request posted

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ElevenLabs + WebSocket voice adapters can hang on a silent/non-audio completion (audio-gated drain, no terminal path)

1 participant