Description
Confirm this is an issue with the Python library and not an underlying OpenAI API
- This is an issue with the Python library
Describe the bug
The intended behavior is to disable server-side VAD for the OpenAI Realtime model. We are using LiveKit to facilitate the websocket connection, but the bug is in the OpenAI library.
In particular, the openai.resources.beta.realtime.AsyncRealtimeConnection.send method uses event.to_json(use_api_names=True, exclude_defaults=True, exclude_unset=True) to serialize the SessionUpdateEvent. The issue is with the exclude_defaults=True parameter, which omits any field whose value equals its default.
We have confirmed that two successive SessionUpdateEvents are composed, so a change from the first event is reflected in the resulting configuration of the second event. This makes the exclude_defaults=True argument particularly problematic, because there is no way to change a value away from its default and then change it back.
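The effect can be sketched without the library. The snippet below is a stdlib-only stand-in, not the library's code: the Session dataclass, the dump helper, and the merge-on-update server state are all assumptions made for illustration, mimicking what exclude_defaults=True does to a field that is explicitly set to its default.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


# Hypothetical stand-in for the library's Session model; only two fields
# are modelled, and both default to None as in the issue description.
@dataclass
class Session:
    voice: Optional[str] = None
    turn_detection: Optional[dict] = None


def dump(session: Session, exclude_defaults: bool) -> dict[str, Any]:
    """Mimic exclude_defaults: drop every field equal to its declared default."""
    out: dict[str, Any] = {}
    for f in fields(session):
        value = getattr(session, f.name)
        if exclude_defaults and value == f.default:
            continue
        out[f.name] = value
    return out


# Assumed server behaviour: each session.update is merged into existing state,
# and the server starts with server-side VAD enabled.
server_state = {"voice": "alloy", "turn_detection": {"type": "server_vad"}}

# The explicit turn_detection=None is dropped because it equals the default.
payload = dump(Session(voice="alloy", turn_detection=None), exclude_defaults=True)
server_state.update(payload)

# Without exclude_defaults, the explicit None reaches the server.
fixed_state = {"voice": "alloy", "turn_detection": {"type": "server_vad"}}
fixed_state.update(dump(Session(voice="alloy", turn_detection=None), exclude_defaults=False))
```

In this sketch, payload contains only voice, so server_state keeps its server_vad entry, while fixed_state ends up with turn_detection set to None.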
There are a couple of problems here. For VAD in particular, although turn_detection defaults to None in Session(BaseModel), the server's actual default is not None but a set of server-side VAD values. When you try to pass None in the SessionUpdateEvent, you can't change the turn_detection value because (1) exclude_defaults=True strips it from the payload and (2) the library's default value is inconsistent with what the server actually uses by default.
There are two solutions:
- Remove exclude_defaults=True
- Update the default turn_detection in Session
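Whichever fix is chosen, the payload on the wire needs to carry an explicit null for turn_detection. As a rough illustration of the target format (our understanding of the session.update event shape; build_session_update is a hypothetical helper, not part of the library), the event can be serialized by hand with the standard json module:

```python
import json
from typing import Any, Optional


def build_session_update(turn_detection: Optional[dict[str, Any]]) -> str:
    """Hypothetical helper: serialize a session.update event by hand,
    always including turn_detection, even when it is None."""
    event = {
        "type": "session.update",
        "session": {"turn_detection": turn_detection},
    }
    return json.dumps(event)


wire = build_session_update(None)
print(wire)  # {"type": "session.update", "session": {"turn_detection": null}}
```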
To Reproduce
Please follow the steps below.
Code snippets
When running python minimal_worker.py console using LiveKit agents on branch dev-1.0 with the following model configuration:
agent = VoiceAgent(
    instructions="You are a helpful assistant that can answer questions and help with tasks.",
    llm=openai.realtime.RealtimeModel(
        model="gpt-4o-realtime-preview-2024-12-17",
        voice="alloy"
    )
)
Then, within the _main_task of RealtimeSession, we hardcode the turn_detection=None parameter as follows:
self._msg_ch.send_nowait(
    SessionUpdateEvent(
        type="session.update",
        session=session_update_event.Session(
            model=self._realtime_model._opts.model,  # type: ignore
            voice=self._realtime_model._opts.voice,  # type: ignore
            input_audio_transcription=input_audio_transcription,
            turn_detection=None
        ),
        event_id=utils.shortuuid("session_update_"),
    )
)
The issue here is that turn_detection never gets updated, as the resulting SessionUpdatedEvent shows. This is related to the problem that this PR was attempting to solve.
For example, we get:
- The SessionCreatedEvent with the default turn_detection. By the way, even after passing the query param turn_detector= to the websocket URI for a null value, it still returns with server-side VAD.
SessionCreatedEvent(..., turn_detection=TurnDetection(create_response=True, interrupt_response=True, prefix_padding_ms=300, silence_duration_ms=200, threshold=0.5, type='server_vad'), ..., type='session.created')
- After passing the turn_detection=None argument to the SessionUpdateEvent as mentioned above, the SessionUpdatedEvent we eventually observe still reports server-side VAD.
SessionUpdatedEvent(..., turn_detection=TurnDetection(create_response=True, interrupt_response=True, prefix_padding_ms=300, silence_duration_ms=200, threshold=0.5, type='server_vad'), ..., type='session.updated')
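For a reproduction script, a small check on the raw server event can make the failure explicit. This is only a sketch: server_vad_still_active is a hypothetical helper, and the event dictionaries below are hand-written examples modelled on the output above rather than captured payloads.

```python
import json


def server_vad_still_active(raw_event: str) -> bool:
    """Return True if a session.created / session.updated payload
    still carries a non-null turn_detection configuration."""
    event = json.loads(raw_event)
    return event.get("session", {}).get("turn_detection") is not None


# Shape of what is currently observed (abbreviated, hand-written).
observed = json.dumps({
    "type": "session.updated",
    "session": {"turn_detection": {"type": "server_vad", "threshold": 0.5}},
})

# Shape expected once turn_detection=None actually reaches the server.
expected = json.dumps({
    "type": "session.updated",
    "session": {"turn_detection": None},
})

print(server_vad_still_active(observed))  # True
print(server_vad_still_active(expected))  # False
```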
OS
macOS
Python version
Python v3.13
Library version
openai v1.66.3