
# Playground

This directory contains multiple playground interfaces for SGLang-Omni.

| Subdirectory | Description |
| --- | --- |
| `web/` | Full-featured HTML/CSS/JS UI served directly by the sglang-omni server. Supports text, audio, image, and video inputs and a built-in file browser. |
| `gradio/` | Lightweight Gradio app that connects to a running server via HTTP. Text chat with streaming, model selector, and generation parameter controls. |
| `realtime-ws/` | Standalone websocket realtime app with server-side VAD, text input, microphone streaming, and streamed assistant audio playback. |
| `tts/` | S2 Pro TTS Gradio app with shared controls for voice cloning plus separate streaming and non-streaming playback modes. |

## Web Playground

The web playground is embedded in the backend: a single process serves both the API and the UI.

```shell
uv pip install -v -e .
./playground/web/start.sh \
  --model-path Qwen/Qwen3-Omni-30B-A3B-Instruct
```

Then open http://localhost:8000 in your browser.
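Once the server is up, you can also verify it programmatically. A minimal sketch, assuming the `/v1/models` endpoint returns the usual OpenAI-compatible shape `{"data": [{"id": ...}]}` (an assumption, not confirmed against this repo):

```python
import json
from urllib.request import urlopen

def parse_models(payload: str) -> list:
    # Assumed OpenAI-compatible response shape: {"data": [{"id": ...}, ...]}
    data = json.loads(payload)
    return [m["id"] for m in data.get("data", [])]

def list_models(base_url: str = "http://localhost:8000") -> list:
    # Query the running playground server for its loaded models.
    with urlopen(f"{base_url}/v1/models", timeout=5) as resp:
        return parse_models(resp.read().decode())
```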

## Realtime WebSocket Playground

Install the project before launching:

```shell
uv pip install -v -e .
```

Launch the backend plus standalone frontend app with one command:

```shell
./playground/realtime-ws/start.sh [--mock] [realtime-options] [backend-options...]
```

Minimal usable commands:

```shell
# local smoke test
./playground/realtime-ws/start.sh --mock

# real model
./playground/realtime-ws/start.sh --model-path Qwen/Qwen3-Omni-30B-A3B-Instruct
```

In normal backend mode, pass the usual speech server flags such as `--model-path`:

```shell
./playground/realtime-ws/start.sh \
  --model-path Qwen/Qwen3-Omni-30B-A3B-Instruct
```

Then open http://localhost:7862 in your browser.

For a browser smoke test without loading any model, launch the mock realtime API:

```shell
./playground/realtime-ws/start.sh --mock
```

That path exercises:

- browser microphone capture over websocket PCM streaming
- server-side VAD turn detection
- automatic response start after speech stops
- streamed assistant audio playback in the browser
- text prompts over the same websocket session

The mock backend returns canned text plus playback of the captured client audio (falling back to a synthetic tone when there is no input audio) instead of calling the inference pipeline.
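The PCM leg of that pipeline is easy to illustrate in isolation. A minimal sketch of the conversion a streaming client typically performs (an illustration, not the playground's actual code): float samples in [-1.0, 1.0] packed as 16-bit little-endian PCM frames:

```python
import struct

def floats_to_pcm16(samples) -> bytes:
    """Convert float samples in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    # Clamp first so out-of-range samples don't overflow the int16 range.
    clamped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(s * 32767) for s in clamped]
    return struct.pack(f"<{len(ints)}h", *ints)
```

Each frame produced this way can be sent as one binary websocket message.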

### Remote browser over SSH port forwarding

Because the transport is plain HTTP + WebSocket, standard SSH port forwarding is enough for remote browser testing.

Example:

```shell
./playground/realtime-ws/start.sh --mock
```

From your local machine, forward the backend port and the frontend port:

```shell
ssh -L 8000:localhost:8000 -L 7862:localhost:7862 user@host
```

For the full launcher help, run:

```shell
./playground/realtime-ws/start.sh --help
```

The websocket playground:

- streams microphone PCM to the backend over /v1/realtime/ws
- runs server-side VAD to auto-trigger one inference turn per utterance
- supports manual push-to-talk and text prompts in the same session
- streams assistant audio back over the websocket and auto-plays it in the browser
- keeps the frontend separate from the inference API server
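The turn-detection behavior above can be sketched with a toy energy-based VAD (an illustration of the idea, not the playground's actual implementation): a turn starts when frame energy crosses a threshold, and ends after a run of quiet frames:

```python
def rms(frame) -> float:
    """Root-mean-square energy of one frame of float samples."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5

class EnergyVAD:
    """Toy VAD: 'start' when RMS exceeds the threshold, 'end' after
    `hang` consecutive quiet frames. Real VADs are model-based."""

    def __init__(self, threshold: float = 0.02, hang: int = 3):
        self.threshold = threshold
        self.hang = hang
        self.in_speech = False
        self.quiet = 0

    def push(self, frame):
        """Return 'start', 'end', or None for each audio frame."""
        loud = rms(frame) > self.threshold
        if not self.in_speech:
            if loud:
                self.in_speech = True
                self.quiet = 0
                return "start"
            return None
        if loud:
            self.quiet = 0
            return None
        self.quiet += 1
        if self.quiet >= self.hang:
            self.in_speech = False
            return "end"
        return None
```

An `"end"` event is what would auto-trigger the single inference turn for that utterance.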

### Custom port (web playground)

To serve the web playground on a different port:

```shell
./playground/web/start.sh \
  --model-path Qwen/Qwen3-Omni-30B-A3B-Instruct \
  --port 8080
```

## Gradio Playground

### Install

```shell
pip install sglang-omni
```

### Launch (one command)

`start.sh` launches the backend server, waits for it to become healthy, then starts the Gradio UI:

```shell
./playground/gradio/start.sh \
  --model-path Qwen/Qwen3-Omni-30B-A3B-Instruct
```

The backend runs on http://localhost:8000 and the Gradio UI on http://localhost:7860. Use `--port` / `--gradio-port` to change them, and `--share` for a public link.

### Connect to an existing server

If you already have a server running, use `app.py` directly:

```shell
python playground/gradio/app.py --api-base http://localhost:8000
```
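Since the Gradio app talks to the server over plain HTTP, any OpenAI-style client can hit the same API directly. A hedged sketch that builds and posts a chat request (the body shape is the usual OpenAI-compatible convention; the field names are assumptions, not taken from this repo's schema):

```python
import json
from urllib.request import Request, urlopen

def build_chat_request(model, prompt, stream=True, temperature=0.7):
    # Assumed OpenAI-compatible request body for /v1/chat/completions.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
        "temperature": temperature,
    }

def post_chat(base_url, payload):
    # POST to the chat completions endpoint listed under Architecture below.
    req = Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    return urlopen(req, timeout=60)
```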

## TTS Playground

### Install

```shell
pip install "sglang-omni[s2pro]"
```

### Launch

```shell
./playground/tts/start.sh --model-path fishaudio/s2-pro
```

The TTS playground starts the S2 Pro backend and exposes two tabs:

- Non-Streaming for final-audio latency measurement
- Streaming for incremental playback from /v1/audio/speech SSE chunks
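Consuming those SSE chunks outside the browser follows the standard `data:` line protocol. A minimal parser sketch (the per-chunk JSON payload and the `[DONE]` sentinel are assumptions about this stream's format, not confirmed from the repo):

```python
import json

def parse_sse_events(raw: str) -> list:
    """Collect the JSON payloads of `data:` lines from an SSE stream body.
    Stops at an assumed `data: [DONE]` sentinel."""
    events = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        body = line[len("data:"):].strip()
        if body == "[DONE]":
            break
        events.append(json.loads(body))
    return events
```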

### Connect to an existing server

```shell
python -m playground.tts.app --api-base http://localhost:8000
```

### SSH tunnel (for remote servers / Docker)

From your local machine:

```shell
ssh -L 8000:localhost:8000 -L 7860:localhost:7860 user@host
```

## Architecture

| Endpoint | Description |
| --- | --- |
| `/` | Web playground UI (index.html, app.js, styles.css) |
| `/v1/chat/completions` | Chat completions (text + audio, streaming) |
| `/v1/audio/speech` | Text-to-speech |
| `/v1/realtime/ws` | Realtime websocket session transport |
| `/v1/models` | List available models |
| `/v1/fs/list` | Browse the server filesystem |
| `/v1/fs/file` | Download a server file |
| `/health` | Health check |