[WIP] Realtime websocket API #282
Draft
CTKnight wants to merge 24 commits into sgl-project:main
Collaborator:
Can you include some description of your work?
- tests for early consumer exit and normal completion
Fix orphaned request leaks on stream aborts
# Conflicts:
#	playground/README.md
#	pyproject.toml
This reverts commit 4d5387e.
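The commits above mention fixing orphaned request leaks on stream aborts, with tests for early consumer exit and normal completion. The PR's actual code is not shown here, so the following is only a general sketch of one common way to get that behaviour in Python: wrap the backend stream in an async generator whose `finally` block always releases the request. `FakeBackend`, `stream_response`, and `release` are hypothetical names invented for this illustration, not identifiers from the PR.

```python
import asyncio
from typing import AsyncIterator, List, Set


class FakeBackend:
    """Hypothetical stand-in for a streaming backend (not the PR's API)."""

    def __init__(self) -> None:
        self.active: Set[str] = set()    # requests currently holding resources
        self.aborted: List[str] = []     # requests released before completion

    async def chunks(self, request_id: str, n: int) -> AsyncIterator[str]:
        self.active.add(request_id)
        for i in range(n):
            await asyncio.sleep(0)       # yield control, as real I/O would
            yield f"chunk-{i}"

    def release(self, request_id: str, *, completed: bool) -> None:
        self.active.discard(request_id)
        if not completed:
            self.aborted.append(request_id)


async def stream_response(
    backend: FakeBackend, request_id: str, n: int
) -> AsyncIterator[str]:
    """Forward chunks and release the request however the stream ends."""
    completed = False
    source = backend.chunks(request_id, n)
    try:
        async for chunk in source:
            yield chunk
        completed = True
    finally:
        # Runs on normal completion, on aclose(), and on cancellation.
        await source.aclose()
        backend.release(request_id, completed=completed)


async def demo():
    backend = FakeBackend()

    # Normal completion: the consumer reads every chunk.
    got = [c async for c in stream_response(backend, "r1", 3)]

    # Early consumer exit: stop after the first chunk, then close the stream.
    stream = stream_response(backend, "r2", 3)
    async for _ in stream:
        break
    await stream.aclose()

    return backend, got
```

Running `asyncio.run(demo())` leaves no request in `backend.active` in either case, and only the aborted request is recorded in `backend.aborted`. Closing the stream explicitly with `aclose()` matters here: a bare `break` out of `async for` does not finalize an async generator by itself.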
Collaborator:
Hi @CTKnight, I am really interested in this feature. Can I ask about its progress and whether I can collaborate?
Motivation
This PR adds a prototype realtime API for interactive Qwen3-Omni conversations. It addresses #59 by enabling low-latency multi-turn interaction with server-side webrtcvad, automatic turn commits, and response interruption when the user starts speaking again.
Modifications
- Adds the /v1/realtime/ws endpoint plus the sglang_omni.realtime package for realtime session orchestration, backend abstractions, websocket event streaming, and streamed assistant audio output.
- Adds an OmniResponseBackend adapter on top of the existing omni client, plus a mock backend for browser smoke tests and frontend development.
- Adds a playground/realtime-ws frontend and launcher for microphone capture, text input, streamed assistant audio playback, and local mock-server testing.
- Adds a [realtime] extra (webrtcvad-wheels, websockets), and documents setup / usage in playground/README.md.
Related Issues
Accuracy Test
N/A. This PR adds transport, session orchestration, and playground code; it does not change model weights or kernel logic.
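As a rough illustration of the session orchestration mentioned above (server-side VAD driving automatic turn commits and response interruption), the sketch below turns per-frame speech flags into turn events. It is not the PR's implementation: `TurnDetector`, `replay`, the event names, and the frame thresholds are all assumptions made for this example. In a real pipeline the boolean flags would presumably come from `webrtcvad.Vad.is_speech(frame, sample_rate)`.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple


@dataclass
class TurnDetector:
    """Debounce per-frame speech flags into turn events (illustrative only)."""

    start_frames: int = 3    # consecutive speech frames that open a user turn
    end_frames: int = 10     # consecutive silent frames that commit the turn
    _speech_run: int = field(default=0, init=False)
    _silence_run: int = field(default=0, init=False)
    _in_turn: bool = field(default=False, init=False)

    def feed(self, is_speech: bool) -> Optional[str]:
        """Consume one frame's VAD decision and return an event name or None."""
        if is_speech:
            self._speech_run += 1
            self._silence_run = 0
            if not self._in_turn and self._speech_run >= self.start_frames:
                self._in_turn = True
                return "speech_started"
        else:
            self._silence_run += 1
            self._speech_run = 0
            if self._in_turn and self._silence_run >= self.end_frames:
                self._in_turn = False
                return "turn_committed"
        return None


def replay(
    flags: Iterable[bool],
    detector: TurnDetector,
    assistant_speaking: bool = False,
) -> List[Tuple[int, str]]:
    """Replay VAD flags and log (frame_index, event) pairs.

    A committed turn is assumed to start an assistant reply; speech that
    begins while the assistant is replying interrupts that reply.
    """
    log: List[Tuple[int, str]] = []
    for i, flag in enumerate(flags):
        event = detector.feed(flag)
        if event == "speech_started" and assistant_speaking:
            log.append((i, "response_interrupted"))
            assistant_speaking = False
        if event is not None:
            log.append((i, event))
        if event == "turn_committed":
            assistant_speaking = True
    return log
```

For example, with `TurnDetector(start_frames=2, end_frames=3)` and the flags `[False, True, True, True, False, False, False, True, True]`, `replay` logs a turn start at frame 2, a commit at frame 6, and an interruption followed by a new turn start at frame 8. Requiring several consecutive frames before reacting is a simple guard against single-frame VAD flicker, at the cost of a little latency.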
Benchmark & Profiling
Not included yet. This is a prototype realtime API / playground PR.
Validation
uv run pytest -q tests/test_realtime_audio_pipeline.py tests/test_realtime_backend_mock.py tests/test_realtime_backend_omni.py tests/test_realtime_media.py tests/test_realtime_session.py tests/test_realtime_utils.py tests/test_realtime_vad.py tests/test_realtime_ws_api.py tests/test_client_media_inputs.py
25 passed in 5.09s
Qwen3-Omni-30B-A3B-Instruct: [Feature] Support Interactive Real-Time API #59 (comment)
Screenshot:
Video:
Screen.Recording.sglang.ws.mov
Note: on macOS, the returned assistant audio was not recorded in the capture, but the UI's assistant audio level indicates audio playback.
Checklist