Lightweight wrapper that exposes the standard LinTO streaming API and forwards audio to a running Kyutai moshi-server. WebSocket streaming mode only (no HTTP file transcription or Celery task mode).
See the main README for the WebSocket protocol and general docs. See ENV.md for all environment variables.
Install uv if not already available and install the wrapper requirements:
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv --python 3.9 .venv && source .venv/bin/activate
uv sync --extra moshi
Then run the websocket server from the repository root. By default it reaches the moshi server at KYUTAI_URL=ws://localhost:8080 and listens on port 8001 (the example below sets STREAMING_PORT=8002):
LOG_TRANSCRIPTS=true LOG_LEVEL=INFO FINAL_TRANSCRIPT_DELAY=1.5 STREAMING_PORT=8002 KYUTAI_URL=ws://localhost:8080 uv run main.py -m websocket -e kyutai
The wrapper supports semantic Voice Activity Detection (VAD), using the extra heads of the Moshi 1B model. This detects utterance boundaries more reliably than simple timer-based heuristics alone.
Environment Variables:
| Variable | Default | Description |
|---|---|---|
| USE_SEMANTIC_VAD | true | Enable/disable semantic VAD processing |
| VAD_THRESHOLD | 0.5 | Probability threshold for end-of-utterance detection (0.0-1.0) |
| VAD_HISTORY_SIZE | 3 | Number of consecutive high VAD signals needed for confirmation |
| VAD_DELAY | 0.3 | Delay (seconds) after VAD trigger before sending final transcript |
| VAD_REQUIRE_PUNCTUATION | true | Require both VAD signal AND punctuation for finalization |
| LOG_VAD | false | Log VAD probabilities for debugging |
| LOG_TRANSCRIPTS | false | Log final transcripts with finalization reason |
| FINAL_TRANSCRIPT_DELAY | 1.5 | Fallback timer delay (seconds) when VAD is not triggered |
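
As an illustration of how the settings in the table could combine, here is a minimal Python sketch. It is not the wrapper's actual code: the class name VadFinalizer, the from_env helper, the boolean parsing, the punctuation set and the use of >= for the threshold comparison are all assumptions made for the example. VAD_DELAY and FINAL_TRANSCRIPT_DELAY are timers and are not modelled here.

```python
import os
from collections import deque


class VadFinalizer:
    """Hypothetical end-of-utterance decision based on the documented settings."""

    def __init__(self, threshold=0.5, history_size=3, require_punctuation=True):
        self.threshold = threshold
        self.require_punctuation = require_punctuation
        # Keep only the last `history_size` threshold checks.
        self.history = deque(maxlen=history_size)

    def update(self, vad_prob, partial_text):
        """Record one VAD probability and return True if the utterance looks finished."""
        self.history.append(vad_prob >= self.threshold)
        # Confirmation needs `history_size` consecutive high signals.
        confirmed = len(self.history) == self.history.maxlen and all(self.history)
        if not confirmed:
            return False
        if self.require_punctuation:
            # Assumed punctuation set; the real wrapper may use a different one.
            return partial_text.rstrip().endswith((".", "?", "!"))
        return True


def from_env(env=os.environ):
    """Build a VadFinalizer from environment variables, using the documented defaults."""
    return VadFinalizer(
        threshold=float(env.get("VAD_THRESHOLD", "0.5")),
        history_size=int(env.get("VAD_HISTORY_SIZE", "3")),
        # Assumption: only the literal string "true" (any case) enables the flag.
        require_punctuation=env.get("VAD_REQUIRE_PUNCTUATION", "true").lower() == "true",
    )
```

With the defaults, three consecutive probabilities at or above 0.5 plus a trailing punctuation mark are needed before update returns True. Lowering VAD_THRESHOLD or setting VAD_REQUIRE_PUNCTUATION=false makes finalization more aggressive, which matches the examples that follow.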
Examples:
# Default: Use VAD with punctuation confirmation
uv run main.py -m websocket -e kyutai
# More sensitive VAD (lower threshold)
USE_SEMANTIC_VAD=true VAD_THRESHOLD=0.3 uv run main.py -m websocket -e kyutai
# VAD without requiring punctuation (more aggressive)
VAD_REQUIRE_PUNCTUATION=false uv run main.py -m websocket -e kyutai
# Debug VAD behavior
LOG_VAD=true LOG_TRANSCRIPTS=true LOG_LEVEL=DEBUG uv run main.py -m websocket -e kyutai
# Disable VAD (fallback to original timer-only behavior)
USE_SEMANTIC_VAD=false uv run main.py -m websocket -e kyutai
docker build -t linto-stt-kyutai:latest --build-arg STT_ENGINE=kyutai .
docker run --rm -p 8001:80 \
-e SERVICE_MODE=websocket \
-e STT_ENGINE=kyutai \
-e KYUTAI_URL=ws://host.docker.internal:8080 \
linto-stt-kyutai
The container exposes the same /streaming endpoint as other LinTO engines and forwards requests to a Kyutai server running locally or on the network.
Note: Depending on your NVIDIA GPU architecture, you may need to specify the CUDARC_COMPUTE build argument. For example, for a Turing architecture (e.g., RTX 2080), you would build with:
docker build --build-arg CUDARC_COMPUTE=75 -t moshi-stt:cuda --target runtime .
You can find the compute capability of your GPU on the NVIDIA CUDA GPUs - Compute Capability page. The default is 89 (Ada Lovelace).
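
The values above are the compute capability with the dot removed (7.5 becomes 75, 8.9 becomes 89). A small Python sketch of that conversion follows. The function name cudarc_compute is hypothetical, and the sketch assumes you already know the capability string, for example from the NVIDIA page.

```python
def cudarc_compute(compute_capability: str) -> str:
    """Convert a compute capability such as "7.5" into a CUDARC_COMPUTE value such as "75"."""
    major, minor = compute_capability.strip().split(".")
    # int() rejects non-numeric input early instead of passing it on to docker build.
    return f"{int(major)}{int(minor)}"


# Example: value for a Turing GPU (compute capability 7.5)
turing_value = cudarc_compute("7.5")
```

The result can then be passed as --build-arg CUDARC_COMPUTE=... to docker build.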
docker build -t moshi-stt:cuda --target runtime .
docker build -t moshi-stt:cpu --target runtime-cpu .
docker run --rm --gpus all -p 8080:8080 \
-e RUST_LOG=info \
-it moshi-stt:cuda
A docker-compose.yml file is provided to easily run the entire stack locally. This includes the Moshi server, the LinTO wrapper, and a web client for testing.
The setup uses Docker Compose profiles to select between the CPU and CUDA environments. Navigate to the linto_stt/engines/kyutai directory and use one of the following commands:
To build and run the CPU version:
docker-compose --profile cpu up --build
To build and run the CUDA version, first set the MOSHI_SERVER environment variable, then run the compose command. You can also set the CUDARC_COMPUTE build argument if you need to target a specific CUDA architecture (default is 89 - Ada Lovelace):
export MOSHI_SERVER=moshi-server-cuda
CUDARC_COMPUTE=75 docker compose --profile cuda up --build
Because the Kyutai model has to be downloaded, wait until you see logs like:
moshi-stt-server-cuda | 2025-07-08T22:28:57.767774Z INFO moshi_server: /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/moshi-server-0.6.3/src/main.rs:529: listening on http://0.0.0.0:8080
moshi-stt-server-cuda | 2025-07-08T22:28:57.767895Z INFO moshi_server::batched_asr: /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/moshi-server-0.6.3/src/batched_asr.rs:226: warming-up the asr
moshi-stt-server-cuda | 2025-07-08T22:28:58.231816Z INFO moshi_server::batched_asr: /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/moshi-server-0.6.3/src/batched_asr.rs:228: starting asr loop 64
In either case, the web client will be available at http://localhost:8088/?server=ws://localhost:8001/streaming. Allow microphone usage and transcribe.
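
Instead of watching the logs by hand, a script could poll the Moshi server port until it accepts connections. The sketch below uses only the Python standard library; the helper name wait_for_port and its default timings are assumptions. Note that in the logs above the "listening on" line appears before the ASR warm-up, so an open port does not guarantee that the model is ready.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection; success means something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait and retry.
            time.sleep(interval)
    return False


# Usage (not executed here): wait_for_port("localhost", 8080, timeout=300)
```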
sudo apt update
sudo apt install -y build-essential pkg-config clang cmake libssl-dev git curl wget
sudo apt install -y mold
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
rustup default stable
After running git submodule update --init --recursive, inside linto_stt/engines/kyutai/delayed-streams-modeling:
moshi-server worker --config configs/config-stt-en_fr-hf.toml
To investigate (from the official documentation): moshi-backend --features cuda --config $(moshi-backend default-config) standalone
See local scripts in delayed-streams-modeling. They run with uv.
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone kyutai stuff as submodules
git submodule update --init --recursive
Then, inside the delayed-streams-modeling folder:
uv run scripts/stt_from_mic_rust_server.py # Don't run on WSL. No ALSA for pyaudio
uv run scripts/stt_from_file_rust_server.py audio/bria.mp3
python3 -m http.server --directory webclient 8000
Open your browser (both implementations were tested with Chrome) and use one of the links below, setting the server GET parameter to the address of the LinTO ASR WebSocket server you want to connect to.
http://localhost:8000/audioprocessor.html?server=ws://localhost:8001/streaming
or
http://localhost:8000/worklet.html?server=ws://localhost:8001/streaming
Replace ws://localhost:8001/streaming in the ?server= parameter with the actual address you want to test (the /streaming part remains the same).
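
For testing against several servers, the link can be assembled programmatically. This is a minimal Python sketch; the helper name client_url is hypothetical, and it assumes the web client reads the address from the server query parameter exactly as in the links above.

```python
from urllib.parse import quote


def client_url(page_url, server_host, server_port):
    """Build a web client link that points at a LinTO streaming server."""
    # The /streaming path is fixed; only host and port change.
    server = f"ws://{server_host}:{server_port}/streaming"
    # Keep ':' and '/' unescaped so the link looks like the ones in this README.
    return f"{page_url}?server={quote(server, safe=':/')}"


# Example: the worklet page served by python3 -m http.server on port 8000
worklet_link = client_url("http://localhost:8000/worklet.html", "localhost", 8001)
```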