fix(voice): guard response.create on active response in recv_audio#659

Merged
drewdrewthis merged 8 commits into main from fix/657
Jun 17, 2026
drewdrewthis commented Jun 11, 2026

Why

The OpenAI Realtime API silently drops a response.create sent while a response is already in flight. In multi-turn voice scenarios this caused recv_audio to hang until timeout — the committed user audio had no response kicked off, so the recv loop drained forever instead of producing a chunk. Closes #657.

What changed

  • _deferred_response_create flag (separate from _agent_turn_pending) — when user audio arrives while _response_active is true, defer the response.create via this flag instead of sending it immediately (which the server ignores). Keeping the flags separate preserves each flag's single meaning.
  • response.done / response.cancelled handler fires the deferred create — once _response_active clears, the handler sends response.create and re-arms _response_active, so the next recv iteration picks up the response for the committed audio.
  • _agent_turn_pending cleared in deferred path — the deferred path now clears _agent_turn_pending so a stale flag cannot fire a spurious second response.create after the deferred one completes.
  • _deferred_response_create reset in call() turn-start block — prevents a turn interrupted mid-deferral from leaking the flag into the next turn's response.done handler.
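
The four changes above can be sketched together as a toy model. This is a minimal sketch, not the adapter's actual code: `GuardSketch`, `begin_turn`, `commit_user_audio`, `on_server_event`, and the `sent` list are hypothetical names; only the three flag names and the event types come from this PR.

```python
class GuardSketch:
    """Toy model of the deferral guard; not the real OpenAIRealtimeAgentAdapter."""

    def __init__(self) -> None:
        self.sent: list[str] = []          # stand-in for WebSocket sends
        self._response_active = False
        self._agent_turn_pending = False
        self._deferred_response_create = False

    def begin_turn(self) -> None:
        # Turn-start reset: a turn interrupted mid-deferral must not leak the flag.
        self._deferred_response_create = False

    def commit_user_audio(self) -> None:
        # The commit is always sent; only response.create is guarded.
        self.sent.append("input_audio_buffer.commit")
        if self._response_active:
            # A create sent now would be dropped by the server, so defer it.
            self._deferred_response_create = True
        else:
            self.sent.append("response.create")
            self._response_active = True
        # Consume the per-turn signal in both branches.
        self._agent_turn_pending = False

    def on_server_event(self, event_type: str) -> None:
        if event_type in ("response.done", "response.cancelled"):
            self._response_active = False
            if self._deferred_response_create:
                self._deferred_response_create = False
                self.sent.append("response.create")
                self._response_active = True   # re-arm for the deferred response


guard = GuardSketch()
guard.commit_user_audio()                 # normal path: commit + create
guard.commit_user_audio()                 # races the active response: create deferred
guard.on_server_event("response.done")    # deferred create fires here
print(guard.sent)
```

The second `commit_user_audio` call adds only a commit to `sent`; the matching `response.create` appears after `response.done` is handled.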

Test plan

Regression tests in python/tests/voice/test_realtime_response_create_guard.py:

cd python && uv run pytest tests/voice/test_realtime_response_create_guard.py -v

| Test | What it asserts |
| --- | --- |
| test_ac1_response_create_suppressed_while_response_active | guard suppresses response.create when _response_active=True; only commit sent |
| test_ac2_deferred_response_create_fires_after_response_done | deferred create appears in the send log after response.done is received |
| test_ac3_exactly_one_commit_and_one_create | exactly 1 commit + 1 create across the full race sequence |
| test_ac4_agent_turn_branch_still_fires_response_create | agent-turn branch unaffected by the guard (control) |
| test_ac5_normal_path_commit_then_create | normal (non-race) path still fires response.create immediately |
| test_ac6_server_rejection_raises_runtime_error | explicit server rejection of response.create raises RuntimeError |
| test_ac7_race_sequence_returns_audio_chunk_not_timeout | race sequence returns AudioChunk instead of timing out |
| test_deferred_path_clears_agent_turn_pending | deferred path clears _agent_turn_pending (line 428); mutation-test-confirmed |

All 8 pass locally. CI green on 4e2e858d.

Feature file: specs/realtime-response-create-guard.feature

Prove-it 2.7 — live demo output

Guard demonstrated on the real OpenAI Realtime API (gpt-realtime-mini), with no mock WebSocket. Key excerpt from the live wire log:

[   1.633s]     STATE  _deferred_response_create: False -> True  <<< GUARD FIRES
             (user audio committed while response in flight)

[105] RECV <- response.done
[106] SEND -> response.create   *** deferred response.create ON THE WIRE ***
[108] RECV <- response.created  (new response started — deferred turn picked up)

GUARD PROOF: PASS
[PASS] DEFERRED response.create sent AFTER response.done received
       first RECV response.done at log[105], last SEND response.create at log[106]
[PASS] Exactly 2 response.create on the wire (1 normal + 1 deferred)
[PASS] Two input_audio_buffer.commit on the wire (one per user turn)
[PASS] At least 2 response.done received (original + deferred response)

Full 311-line output: /tmp/demo_659_guard_proof_output.txt (run at commit 82d6c52c; guard code unchanged in subsequent commits).
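
The PASS criteria in the excerpt can be checked mechanically over a wire log. A minimal sketch, assuming the log is a list of `(direction, event_type)` tuples; `check_guard_proof` and that log shape are illustrative and are not the demo script's actual format.

```python
def check_guard_proof(log: list[tuple[str, str]]) -> dict[str, bool]:
    """Evaluate the four checks from the excerpt over a (direction, type) log."""

    def positions(direction: str, event_type: str) -> list[int]:
        return [i for i, entry in enumerate(log) if entry == (direction, event_type)]

    creates = positions("SEND", "response.create")
    commits = positions("SEND", "input_audio_buffer.commit")
    dones = positions("RECV", "response.done")
    return {
        # Last create must come after the first response.done was received.
        "deferred_create_after_done": bool(creates and dones) and creates[-1] > dones[0],
        "exactly_two_creates": len(creates) == 2,
        "two_commits": len(commits) == 2,
        "at_least_two_dones": len(dones) >= 2,
    }


# Hand-written log with the same shape as the race sequence (not real output).
sample = [
    ("SEND", "input_audio_buffer.commit"),
    ("SEND", "response.create"),
    ("SEND", "input_audio_buffer.commit"),   # second user turn, create deferred
    ("RECV", "response.done"),
    ("SEND", "response.create"),             # deferred create
    ("RECV", "response.created"),
    ("RECV", "response.done"),
]
print(check_guard_proof(sample))
```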

@drewdrewthis

Sweep: unguarded response.create sends (race with active response)

Checked all response.create send sites in python/ + javascript/. Source: this PR's fix (#657).

Must-fix (out of scope here, follow-up issue to be filed):

| File | Line | Note |
| --- | --- | --- |
| javascript/src/voice/adapters/openai-realtime.ts | 376 | receiveAudio unguarded send; JS adapter has no _responseActive tracking at all |
| javascript/src/voice/adapters/openai-realtime.ts | 680 | sendText bare unconditional send |
| javascript/src/agents/realtime/realtime-agent.adapter.ts | 189 | handleInitialResponse unguarded |
| javascript/src/agents/realtime/realtime-agent.adapter.ts | 231 | handleAudioInput commit+create unguarded |

Review: python/scenario/voice/adapters/openai_realtime.py:691 send_text — unconditional send, same race shape.

Clean: all other occurrences are tests/docs.

@drewdrewthis drewdrewthis added the prove-it-clean All ACs verified by /prove-it at this HEAD label Jun 11, 2026
drewdrewthis commented Jun 11, 2026

Review verdict: READY

Commit: ed4a8359b5ca5a3419f8f91f9d5356f62fe64076
Delta reviewed since: 82d6c52c010e11628245900f25e073efb7ca5c82

All prior blocking threads resolved. Delta review (two fix commits + rebase + uv.lock sync) surfaces no new blocking concerns. All three personas (Uncle Bob, Metz/Beck, Fowler) agree: proceed.


Non-blocking (prose — do not block merge)

From full review (prior):

  • [hygiene] openai_realtime.py:428+431: _agent_turn_pending = False appears identically in both branches; could hoist above the if. Clean, no test impact.
  • [principles] openai_realtime.py:525–528 — deferred response.create fires before the _completed_tool_calls tool-only-turn check at line 539; coupling is undocumented. Decide: add ordering comment or restructure.
  • [test-reviewer] AC2 is a strict subset of AC3. Decide: merge or keep for clarity.
  • [hygiene] test_ac1_ to test_ac7_ naming encodes spec numbers, not behavior. Decide.
  • [test-reviewer] tuple[str, str] annotation inconsistency on _MockWS.log. Minor.

From delta review (82d6c52c → 4e2e858d):

  • [principles] openai_realtime.py:624 — comment reads "per-turn tool-call state" but _deferred_response_create is a race-guard flag, not tool-call state. Broadening the comment to "per-turn state" or splitting with its own note would stop the next engineer misreading the grouping. Decide.
  • [hygiene] test_realtime_response_create_guard.py:441: test_deferred_path_clears_agent_turn_pending breaks the test_acN_ naming convention the rest of the file uses. Decide (rename to test_ac8_deferred_path_consumes_agent_turn_signal or similar).
  • [hygiene][test] test_realtime_response_create_guard.py:468 — second assertion _deferred_response_create is True is a sanity check on branch-taken, not a product invariant. Consider removing; AC1 already confirms the deferred branch ran. Decide.
  • [test][metz-beck] pytest.raises((asyncio.TimeoutError, RuntimeError)) broad catch masks which exit path was taken. If only TimeoutError is expected, tighten. Decide.
  • [metz-beck] Docstring and section comment hard-code "line 428" — line numbers in prose rot on the next edit above them. Reference the behavior ("deferral branch"), not the address. Fix (trivial, but worth noting).

From scoped re-review (39c7810a → ed4a8359, uv.lock sync only):

  • [proof-reviewer] AC-gating files (openai_realtime.py, test_realtime_response_create_guard.py, specs/realtime-response-create-guard.feature) are byte-identical between pre-rebase tip and HEAD. The uv.lock version bump (0.7.30→0.7.31) is inert to all ACs. No new concerns.

New Issues (file as follow-ups):

  • [hygiene] _MockWS defined identically in 3 test files. Extract to tests/voice/conftest.py.
  • [test] _deferred_response_create = False in call() (line 626) has no dedicated test. An interrupted turn leaving the flag set would silently drop the deferred create; a test for that edge belongs here.
  • [fowler][principles] Two turn-start reset sites: call() (lines 626–628) and _drain_agent_response() (transcripts). Asymmetric membership; extract _begin_turn() to prevent next flag landing in only one.
  • [uncle-bob][fowler][metz-beck] 4 booleans (_response_active, _response_ever_active, _agent_turn_pending, _deferred_response_create) model an implicit FSM with ~11 illegal combinations. Known limitation at line 150 (second deferred commit collapses silently) is a symptom. File FSM-refactor issue and assign before the fifth flag arrives.
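
One shape the FSM refactor could take, as a rough sketch only: the state names, the `TRANSITIONS` table, and `step` are hypothetical and do not come from the adapter or from any follow-up issue. It models only the active/deferred part of the lifecycle, not `_response_ever_active` or `_agent_turn_pending`. The point is that one enum plus an explicit transition table leaves no room for contradictory flag combinations.

```python
from enum import Enum, auto


class ResponseState(Enum):
    IDLE = auto()               # no response in flight
    ACTIVE = auto()             # response in flight, nothing queued
    ACTIVE_DEFERRED = auto()    # response in flight, one create queued


# (state, event) -> (next state, messages to send)
TRANSITIONS = {
    (ResponseState.IDLE, "user_commit"): (ResponseState.ACTIVE, ["response.create"]),
    (ResponseState.ACTIVE, "user_commit"): (ResponseState.ACTIVE_DEFERRED, []),
    (ResponseState.ACTIVE, "response.done"): (ResponseState.IDLE, []),
    (ResponseState.ACTIVE_DEFERRED, "response.done"): (
        ResponseState.ACTIVE,
        ["response.create"],
    ),
}


def step(state: ResponseState, event: str) -> tuple[ResponseState, list[str]]:
    """Return (next_state, sends); unknown pairs raise instead of collapsing silently."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.name} on {event}") from None
```

In this sketch a second `user_commit` while in `ACTIVE_DEFERRED` raises `ValueError`, which surfaces the "second deferred commit collapses silently" limitation as an explicit design decision rather than an accident of boolean state.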

Prove-it summary

Live demo on real OpenAI Realtime API (gpt-realtime-mini) at commit 82d6c52c (guard code unchanged since):

  • Guard fires: _deferred_response_create: False → True during active response
  • Deferred response.create sent AFTER response.done (log positions 106 > 105)
  • Exactly 2 response.create on wire; no hang; response completed — GUARD PROOF: PASS

AC-gating files verified byte-identical 4e2e858d → ed4a8359 (0-line diff); test (3.12) CI green at HEAD.

@drewdrewthis drewdrewthis added the in-ai-review Workflow: in-ai-review label Jun 11, 2026
drewdrewthis commented Jun 11, 2026

No description provided.

@drewdrewthis drewdrewthis marked this pull request as ready for review June 11, 2026 12:09
@drewdrewthis drewdrewthis self-assigned this Jun 11, 2026
drewdrewthis pushed a commit that referenced this pull request Jun 15, 2026
… (PY)

When _response_active is True, send_text now sets _deferred_response_create
and returns early instead of firing response.create unconditionally. The
existing response.done handler in recv_audio (from PR #659) consumes the
flag and fires the deferred create after the in-flight response completes.

Covers AC-PY1, AC-PY2, AC-DEFER.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
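
A minimal sketch of the `send_text` guard this commit message describes. `SendTextSketch`, its `sent` list, and the boolean return value are hypothetical; how the text itself reaches the server is not stated in the commit message and is left out, and setting `_response_active` on the immediate path is an assumption.

```python
class SendTextSketch:
    """Toy stand-in for the adapter; only the guard logic is modelled."""

    def __init__(self) -> None:
        self.sent: list[str] = []          # stand-in for WebSocket sends
        self._response_active = False
        self._deferred_response_create = False

    def send_text(self, text: str) -> bool:
        """Return True if response.create went out now, False if it was deferred."""
        # `text` is unused here: delivering the text item is outside this sketch.
        if self._response_active:
            # Leave the create for the response.done handler to fire.
            self._deferred_response_create = True
            return False
        self.sent.append("response.create")
        self._response_active = True       # assumption: mirrors the recv_audio path
        return True
```
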
coderabbitai Bot commented Jun 15, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 70c0a09d-6033-4e0d-813f-7c95a7f4d028

📥 Commits

Reviewing files that changed from the base of the PR and between 4e2e858 and ed4a835.

⛔ Files ignored due to path filters (1)
  • python/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • python/scenario/voice/adapters/openai_realtime.py
  • python/tests/voice/test_realtime_response_create_guard.py
  • specs/realtime-response-create-guard.feature
🚧 Files skipped from review as they are similar to previous changes (1)
  • specs/realtime-response-create-guard.feature

Walkthrough

Adds a _deferred_response_create boolean flag to OpenAIRealtimeAgentAdapter to fix a race where recv_audio sends a duplicate response.create while a response is already in flight. The adapter now defers that send until response.done/response.cancelled, accompanied by eight hermetic pytest regression tests and a Gherkin feature spec.

Changes

response.create Race Guard

| Layer / File(s) | Summary |
| --- | --- |
| Adapter deferral flag and recv_audio / response.done logic (python/scenario/voice/adapters/openai_realtime.py) | Declares _deferred_response_create as a per-instance flag; recv_audio sets it (instead of sending response.create) when _response_active is true and consumes _agent_turn_pending; response.done/response.cancelled handler fires exactly one deferred response.create and re-marks _response_active; flag reset at call() entry prevents state carryover. |
| Hermetic regression tests (AC1–AC7 + deferred-path) (python/tests/voice/test_realtime_response_create_guard.py) | New test file with _MockWS (interleaved chronological log), _make_adapter factory, and eight async tests: response.create suppression, deferred ordering after response.done, exactly-once semantics, agent-turn control, normal-path control, server-rejection RuntimeError, timeout-regression fix, and _agent_turn_pending consumption in the deferred path. |
| Gherkin feature spec (specs/realtime-response-create-guard.feature) | New feature file with Background and seven AC scenarios mirroring the pytest suite, an AC coverage map, and a manual-VAD-only scope note (server-VAD explicitly out of scope). |

Possibly related issues

  • #662 – Identifies that the same _response_active guard pattern introduced here must also be applied to send_text and three JavaScript call sites not in scope for this PR.
  • #663 – Proposes refactoring all response-lifecycle booleans including the newly added _deferred_response_create into an explicit FSM, as a direct follow-up to the state-management patterns introduced here.

Suggested reviewers

  • rogeriochaves
  • 0xdeafcafe
  • Aryansharma28

Poem

🐇 A response was racing, oh what a fright,
Two response.creates sent into the night!
Now a flag defers the call with care,
Until response.done floats through the air.
No more timeouts, no duplicate fire —
The rabbit fixed the race, and raised the bar higher! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 68.18% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title 'fix(voice): guard response.create on active response in recv_audio' clearly summarizes the main change: adding a guard to prevent unconditional response.create sends when a response is active. |
| Description check | ✅ Passed | The description thoroughly explains the race condition, the fix (deferred response.create flag), test coverage, and includes live API proof, all directly related to the changeset. |
| Linked Issues check | ✅ Passed | All objectives from #657 are met: guard added to user-audio branch, response.create deferred via _deferred_response_create flag (instead of reusing _agent_turn_pending), input_audio_buffer.commit remains unconditional, comprehensive hermetic tests verify suppression/ordering/cardinality/regression, and manual-VAD scope maintained. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to fixing the response.create race condition: the adapter flag logic, test coverage, and feature documentation are all aligned with #657 requirements; no unrelated modifications present. |

coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/scenario/voice/adapters/openai_realtime.py`:
- Around line 420-430: In the deferred response path within the conditional
block at lines 420–427 (when `_response_active` is true and you set
`self._deferred_response_create = True`), you must also clear the
`_agent_turn_pending` flag by adding `self._agent_turn_pending = False`.
Currently this flag is only cleared in the else block when `response.create` is
sent immediately, but the deferred path leaves it stale. Clearing it in the
deferred path ensures the per-turn signal is consumed consistently and prevents
an unintended extra `response.create` from firing later at the agent-turn
branch.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5689e4cc-81ca-409b-a622-bf4cb96b70bb

📥 Commits

Reviewing files that changed from the base of the PR and between 21226c0 and fc0ff2d.

📒 Files selected for processing (4)
  • python/scenario/voice/adapters/openai_realtime.py
  • python/tests/voice/test_realtime_response_create_guard.py
  • specs/realtime-response-create-guard.feature
  • specs/voice-agents.feature

@drewdrewthis drewdrewthis added ai-reviewed /review was run on this PR (multi-agent: principles, hygiene, test, security) and removed in-ai-review Workflow: in-ai-review labels Jun 15, 2026
drewdrewthis and others added 2 commits June 16, 2026 15:45
Gherkin spec covering all 7 ACs + AC-scope for the response.create race
condition in OpenAIRealtimeAgentAdapter.recv_audio. All scenarios are
@Unit (hermetic _MockWS, no live API key).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
AC1/AC2/AC3/AC7 assert post-fix behaviour — all fail on current pre-fix code.
AC4/AC5/AC6 are control assertions — all pass on current code.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
drewdrewthis and others added 6 commits June 16, 2026 15:45
Replace send-position proxy assertions (create_idx != commit_idx+1) with
true send-vs-recv ordering using an interleaved chronological log on _MockWS.
Each send() appends ("sent", type) and each recv() appends ("recv", type)
to mock_ws.log; log_index_of_first() looks up either kind.  AC2/AC3/AC7
now assert log_create > log_done — response.create appeared strictly after
response.done was received — which is definitionally correct and cannot
be gamed by adding extra sends.

Remove input_audio_buffer.clear from the deferred path in the adapter.
That send existed only to create a send at index 1 so the now-deleted
position proxy would see create at index 2; there is no protocol reason
for it. The deferred path is now minimal: send response.create, clear
_agent_turn_pending, set _response_active.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
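
The interleaved log described in this commit message might look roughly like the following. A sketch only: `MockWSSketch` is not the test file's `_MockWS`, and it uses synchronous methods and a scripted event list for brevity, whereas the real tests are async. `log_index_of_first` is the helper name from the commit message, but its signature here is assumed.

```python
import json


class MockWSSketch:
    """Records sends and receives in one chronological log."""

    def __init__(self, scripted_events: list[dict]) -> None:
        self._events = list(scripted_events)    # server events to replay, in order
        self.log: list[tuple[str, str]] = []    # ("sent" | "recv", event type)

    def send(self, payload: str) -> None:
        self.log.append(("sent", json.loads(payload)["type"]))

    def recv(self) -> str:
        event = self._events.pop(0)
        self.log.append(("recv", event["type"]))
        return json.dumps(event)


def log_index_of_first(log: list[tuple[str, str]], kind: str, event_type: str) -> int:
    """Index of the first (kind, event_type) entry; raises ValueError if absent."""
    return log.index((kind, event_type))


ws = MockWSSketch([{"type": "response.done"}])
ws.send(json.dumps({"type": "input_audio_buffer.commit"}))
ws.recv()                                           # response.done arrives
ws.send(json.dumps({"type": "response.create"}))    # deferred create goes out
log_create = log_index_of_first(ws.log, "sent", "response.create")
log_done = log_index_of_first(ws.log, "recv", "response.done")
assert log_create > log_done    # create appeared strictly after done was received
```

Because sends and receives share one list, adding extra sends cannot move `response.create` ahead of `response.done`, which is the property the old position-proxy assertion could not guarantee.
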
…w fixes

- Introduce _deferred_response_create (distinct from _agent_turn_pending) for
  the user-audio race guard; _agent_turn_pending retains its single original
  meaning (executor-signalled agent turn).
- Add known-limitation comment at deferral site (boolean collapses; FSM refactor
  as follow-up; manual-VAD makes it unreachable in practice).
- Test: tighten pytest.raises to (asyncio.TimeoutError, RuntimeError), delete
  unused _audio_delta_events helper, trim AC1 docstring, update module docstring
  to reference _deferred_response_create.
- Feature: update coverage-map comments @Unit → @integration; update header
  comment to reference _deferred_response_create.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…e path

When _response_active is True we defer response.create via
_deferred_response_create, but previously _agent_turn_pending was left
set. After the deferred response completes, a subsequent recv_audio call
with no pending audio could satisfy the agent-turn branch (line 442) and
fire a spurious second response.create.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
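
The failure mode in this commit message can be reproduced in a toy model. This is a sketch under assumptions: `simulate` and its `clear_pending` switch are hypothetical, and the agent-turn branch is reduced to "send `response.create` if `_agent_turn_pending` is set and no response is active".

```python
def simulate(clear_pending: bool) -> list[str]:
    """Replay the race once; return every message 'sent' on the wire."""
    sent: list[str] = []
    response_active = True        # a response is already in flight
    agent_turn_pending = True     # executor has signalled an agent turn
    deferred = False

    # 1. User audio is committed while the response is active: defer the create.
    sent.append("input_audio_buffer.commit")
    deferred = True
    if clear_pending:
        agent_turn_pending = False    # the fix: consume the signal here

    # 2. response.done for the in-flight response: fire the deferred create.
    response_active = False
    if deferred:
        deferred = False
        sent.append("response.create")
        response_active = True

    # 3. response.done for the deferred response.
    response_active = False

    # 4. Next recv_audio call with no pending audio reaches the agent-turn branch.
    if agent_turn_pending and not response_active:
        sent.append("response.create")    # spurious if the flag was stale
    return sent


print(simulate(clear_pending=False).count("response.create"))  # 2 (one spurious)
print(simulate(clear_pending=True).count("response.create"))   # 1
```
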
…ending clear

- Add `self._deferred_response_create = False` to the `call()` per-turn
  reset block so an interrupted turn cannot leak the flag into the next
  turn's response.done handler (principles reviewer finding).
- Add `test_deferred_path_clears_agent_turn_pending` — the deferred path
  previously had zero test coverage for the `_agent_turn_pending = False`
  transition at line 428 (proof-reviewer mutation finding: deleting that
  line left all 7 AC tests green). New test enters the deferral branch
  and asserts both flags reach their expected states.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

Automated low-risk assessment

This PR was evaluated against the repository's Low-Risk Pull Requests procedure and does not qualify as low risk.

The PR changes the OpenAIRealtimeAgentAdapter’s runtime behavior for when and how response.create messages are sent to the OpenAI Realtime API (introducing a _deferred_response_create flag and new event-handling behavior). The policy explicitly disallows labeling changes that modify integrations with third‑party systems or external APIs as low risk, so this must go through the normal review process. While the change is limited to adapter logic and tests, it nevertheless alters external API messaging and therefore cannot be auto‑merged as low risk.

This PR requires a manual review before merging.

@drewdrewthis drewdrewthis merged commit 5e844ea into main Jun 17, 2026
18 checks passed
@drewdrewthis drewdrewthis deleted the fix/657 branch June 17, 2026 10:57
@drewdrewthis drewdrewthis restored the fix/657 branch June 17, 2026 11:00
@drewdrewthis drewdrewthis deleted the fix/657 branch June 17, 2026 11:00
Labels

ai-reviewed /review was run on this PR (multi-agent: principles, hygiene, test, security) prove-it-clean All ACs verified by /prove-it at this HEAD

Development

Successfully merging this pull request may close these issues.

OpenAI Realtime adapter fires response.create unconditionally after audio commit — races an active response into a drain timeout