[WIP] Realtime websocket API#282

Draft
CTKnight wants to merge 24 commits into sgl-project:main from CTKnight:prototype/webrtc-vad

Conversation

@CTKnight CTKnight commented Apr 13, 2026

Motivation

This PR adds a prototype realtime API for interactive Qwen3-Omni conversations. It addresses #59 by enabling low-latency multi-turn interaction with server-side webrtcvad, automatic turn commits, and response interruption when the user starts speaking again.
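For readers unfamiliar with VAD-driven turn commits, here is a minimal sketch of the idea, not the code in this PR. `TurnSegmenter`, its fields, and the 300 ms silence threshold are hypothetical names and values chosen for illustration; the speech detector is passed in as a plain callable so the sketch runs without extra dependencies.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

SAMPLE_RATE = 16000  # Hz; 16-bit mono PCM is assumed
FRAME_MS = 20        # webrtcvad accepts 10, 20, or 30 ms frames
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 2 bytes per sample


@dataclass
class TurnSegmenter:
    """Hypothetical helper: buffers speech frames and commits a turn on trailing silence."""

    is_speech: Callable[[bytes], bool]
    silence_frames_to_commit: int = 15  # about 300 ms at 20 ms frames (assumed value)
    _buffer: List[bytes] = field(default_factory=list)
    _silence_run: int = 0
    _in_speech: bool = False

    def push(self, frame: bytes) -> Optional[bytes]:
        """Feed one frame; return the utterance bytes when a turn ends, else None."""
        if len(frame) != FRAME_BYTES:
            raise ValueError(f"expected {FRAME_BYTES} bytes, got {len(frame)}")
        if self.is_speech(frame):
            self._in_speech = True
            self._silence_run = 0
            self._buffer.append(frame)
            return None
        if not self._in_speech:
            return None  # drop silence before the user starts speaking
        self._silence_run += 1
        self._buffer.append(frame)
        if self._silence_run < self.silence_frames_to_commit:
            return None  # a short pause inside the utterance
        # Enough trailing silence: commit the turn without the silent tail.
        speech_frames = self._buffer[: len(self._buffer) - self._silence_run]
        utterance = b"".join(speech_frames)
        self._buffer.clear()
        self._silence_run = 0
        self._in_speech = False
        return utterance
```

With the `webrtcvad` package installed, the detector could be wired in as `vad = webrtcvad.Vad(2)` and `TurnSegmenter(is_speech=lambda f: vad.is_speech(f, SAMPLE_RATE))`; the aggressiveness mode `2` is an arbitrary choice here.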

Modifications

  • Adds a new /v1/realtime/ws endpoint plus the sglang_omni.realtime package for realtime session orchestration, backend abstractions, websocket event streaming, and streamed assistant audio output.
  • Implements realtime session behaviors for auto-VAD turns, manual push-to-talk, text-only turns in the same session, conversation history across turns, and barge-in / cancellation when a new utterance arrives while the assistant is responding.
  • Adds an OmniResponseBackend adapter on top of the existing omni client, plus a mock backend for browser smoke tests and frontend development.
  • Adds a standalone playground/realtime-ws frontend and launcher for microphone capture, text input, streamed assistant audio playback, and local mock-server testing.
  • Registers the realtime websocket route in the FastAPI server, adds the [realtime] extra (webrtcvad-wheels, websockets), and documents setup / usage in playground/README.md.
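The barge-in behavior described in the list above can be sketched with plain `asyncio` task cancellation. This is an illustration under assumptions, not the session code in this PR: `BargeInSession`, `commit_turn`, `cancel_response`, `wait_idle`, and the `"[cancelled]"` marker are all hypothetical names.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List, Optional


class BargeInSession:
    """Hypothetical session: at most one response runs; a new turn cancels it."""

    def __init__(
        self,
        generate: Callable[[str], AsyncIterator[str]],
        send: Callable[[str], Awaitable[None]],
    ) -> None:
        self._generate = generate
        self._send = send
        self._task: Optional[asyncio.Task] = None

    async def commit_turn(self, text: str) -> None:
        """Start a response for a new user turn, interrupting any active one."""
        await self.cancel_response()
        self._task = asyncio.create_task(self._respond(text))

    async def cancel_response(self) -> None:
        """Cancel the active response, if any, and wait for it to unwind."""
        task, self._task = self._task, None
        if task is not None and not task.done():
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass  # expected: the response task was cancelled on purpose

    async def wait_idle(self) -> None:
        """Wait until the current response, if any, has finished."""
        if self._task is not None:
            await self._task

    async def _respond(self, text: str) -> None:
        try:
            async for chunk in self._generate(text):
                await self._send(chunk)
        except asyncio.CancelledError:
            await self._send("[cancelled]")  # tell the client the response was cut off
            raise


async def demo() -> List[str]:
    sent: List[str] = []

    async def send(message: str) -> None:
        sent.append(message)

    async def generate(text: str) -> AsyncIterator[str]:
        yield f"{text}:0"
        if text == "first":
            await asyncio.Event().wait()  # simulate a response that is still streaming
        yield f"{text}:1"

    session = BargeInSession(generate, send)
    await session.commit_turn("first")
    while not sent:  # let the first chunk go out before the user interrupts
        await asyncio.sleep(0)
    await session.commit_turn("second")  # barge-in: cancels the first response
    await session.wait_idle()
    return sent


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Awaiting the cancelled task before starting the next one keeps the two responses from interleaving on the same connection, at the cost of a short delay while the old task unwinds.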

Related Issues

Accuracy Test

N/A. This PR adds transport, session orchestration, and playground code; it does not change model weights or kernel logic.

Benchmark & Profiling

Not included yet. This is a prototype realtime API / playground PR.

Validation

  • uv run pytest -q tests/test_realtime_audio_pipeline.py tests/test_realtime_backend_mock.py tests/test_realtime_backend_omni.py tests/test_realtime_media.py tests/test_realtime_session.py tests/test_realtime_utils.py tests/test_realtime_vad.py tests/test_realtime_ws_api.py tests/test_client_media_inputs.py
  • Result: 25 passed in 5.09s
  • Manual browser validation was also captured on the issue thread for Qwen3-Omni-30B-A3B-Instruct: [Feature] Support Interactive Real-Time API #59 (comment)

Screenshot:

Realtime websocket playground

Video:

Screen.Recording.sglang.ws.mov

Note: on macOS, the returned assistant audio was not recorded in the capture, but the UI's assistant audio level indicates audio playback.

Checklist

  • Format your code according to pre-commit.
  • Add unit tests.
  • Update documentation / docstrings / example tutorials as needed.
  • Provide throughput / latency benchmark results and accuracy evaluation results as needed. Not applicable for this prototype transport/playground PR.
  • For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.

@FrankLeeeee (Collaborator)

Can you include some description of your work?

@CTKnight changed the title from "Prototype/webrtc vad" to "[WIP] Realtime websocket API" on Apr 15, 2026

@PopSoda2002 (Collaborator)

Hi @CTKnight I am really interested in this feature, can I ask for progress and collaboration?

4 participants