Commit 8ef2841

Add realtime translation cookbook guide (#2667)
1 parent 2ccacf2 commit 8ef2841

95 files changed

Lines changed: 15484 additions & 0 deletions

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -15,6 +15,8 @@ downloads/
 eggs/
 .eggs/
 lib/
+!examples/voice_solutions/realtime_translation_guide/livekit-translation-demo/lib/
+!examples/voice_solutions/realtime_translation_guide/livekit-translation-demo/lib/**
 lib64
 parts/
 sdist/
@@ -103,6 +105,8 @@ celerybeat.pid
 
 # Environments
 .env
+.env.*
+!.env.example
 .venv
 env/
 venv/

examples/voice_solutions/realtime_translation_guide.mdx

Lines changed: 562 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
.env
.env.*
!.env.example
.git
node_modules
npm-debug.log*
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
OPENAI_API_KEY=
OPENAI_TRANSLATION_MODEL=gpt-realtime-translate
OPENAI_INPUT_TRANSCRIPTION_MODEL=gpt-realtime-whisper
PORT=5173
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
.env
.env.*
!.env.example
node_modules/
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
FROM node:20-slim

WORKDIR /app
ENV NODE_ENV=production

COPY --chown=node:node package.json ./
COPY --chown=node:node src ./src

USER node
EXPOSE 5173
CMD ["node", "src/server.js"]
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
# Browser Realtime Translation Demo

This is a small browser demo for one-way live translation from tab audio. The
server creates a short-lived OpenAI Realtime Translation client secret, and the
browser uses WebRTC to send captured tab audio and play translated speech with
captions.

## Setup

Create a local `.env` file in this demo folder using `.env.example` as the list
of required variables.

Required:

```bash
OPENAI_API_KEY=your-openai-api-key
```

Optional:

```bash
OPENAI_TRANSLATION_MODEL=gpt-realtime-translate
OPENAI_INPUT_TRANSCRIPTION_MODEL=gpt-realtime-whisper
PORT=5173
HOST=127.0.0.1
```
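
The variables above could be resolved on the server roughly as follows. This is a minimal sketch, not the demo's actual code: `loadConfig` is a hypothetical helper, and treating the listed optional values as fallback defaults is an assumption, since the README does not say what the server uses when they are unset.

```javascript
// Hypothetical helper: resolve settings from an env-like object.
// The fallback values mirror the README; using them as defaults is an assumption.
function loadConfig(env) {
  const apiKey = (env.OPENAI_API_KEY ?? "").trim();
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is required");
  }

  // An empty or missing PORT falls back to 5173.
  const port = Number.parseInt(env.PORT || "5173", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return {
    apiKey,
    translationModel: env.OPENAI_TRANSLATION_MODEL || "gpt-realtime-translate",
    inputTranscriptionModel:
      env.OPENAI_INPUT_TRANSCRIPTION_MODEL || "gpt-realtime-whisper",
    port,
    host: env.HOST || "127.0.0.1",
  };
}

// Example: only the required key is set, so every optional value falls back.
const config = loadConfig({ OPENAI_API_KEY: "your-openai-api-key" });
console.log(`${config.host}:${config.port}`);
```

In a Node server the same function could be called with `process.env`. Failing fast on a missing key keeps the error at startup instead of at the first request.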

## Run

From the cookbook repo root:

```bash
cd examples/voice_solutions/realtime_translation_guide/browser-translation-demo
npm install
npm run dev
```

Open the local URL printed by the server. Choose a browser tab with audio, pick
the language you want to hear, and start translation.

## Audio mix

The app includes an Audio mix slider for balancing translated speech with the
original tab audio. By default, it plays 85% translated audio and 15% original
audio, matching the LiveKit demo.

When the selected source is a browser tab, the demo still requests
`suppressLocalAudioPlayback` in the `getDisplayMedia()` audio constraints.
If the browser honors that setting, the slider controls both the translated
audio and the original audio playback from this app. If the browser does not
support local playback suppression, the source tab may continue playing outside
the slider, so lower or mute the source tab if you hear too much original audio.
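
The audio mix described above can be sketched as two small functions. This is an illustration under stated assumptions, not the demo's source: `mixToGains` and `buildDisplayMediaConstraints` are hypothetical names, and the linear crossfade between the two gains is an assumption (the README only gives the 85%/15% default).

```javascript
// Hypothetical helper: map a 0-100 slider value (share of translated audio)
// to a pair of linear gains. The linear crossfade is an assumption.
function mixToGains(translatedPercent) {
  const value = Number(translatedPercent);
  if (!Number.isFinite(value)) {
    throw new RangeError("Slider value must be a finite number");
  }
  const clamped = Math.min(100, Math.max(0, value));
  const translated = clamped / 100;
  return { translated, original: 1 - translated };
}

// Hypothetical helper: constraints to pass to getDisplayMedia().
// suppressLocalAudioPlayback is only a request; a browser that does not
// support it keeps playing the source tab as usual.
function buildDisplayMediaConstraints() {
  return {
    video: true, // getDisplayMedia() rejects a request with video disabled
    audio: { suppressLocalAudioPlayback: true },
  };
}

// Default mix from the README: 85% translated audio.
const gains = mixToGains(85);
console.log(gains, buildDisplayMediaConstraints());
```

In a browser, each gain would typically be applied to a Web Audio `GainNode` or to an audio element's `volume`, and the constraints passed to `navigator.mediaDevices.getDisplayMedia()`.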

## Validation

From the cookbook repo root:

```bash
cd examples/voice_solutions/realtime_translation_guide/browser-translation-demo
npm test
```

To run a live API smoke test:

```bash
npm run smoke
```

## Notes

- The browser uses `getDisplayMedia()` so the user explicitly chooses the source
  tab.
- WebRTC handles browser audio transport, so the browser does not need to
  resample tab audio or manually send PCM chunks.
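
The second note can be illustrated with a short sketch of handing captured tab audio to WebRTC. `attachTabAudio` is a hypothetical helper, not a function from this demo, and the signaling step (exchanging the SDP offer and answer using the client secret) is left out because the diff shown here does not include it.

```javascript
// Hypothetical helper: send captured tab audio over an existing peer connection.
// `pc` is expected to be an RTCPeerConnection and `stream` the MediaStream
// returned by getDisplayMedia().
function attachTabAudio(pc, stream) {
  const audioTracks = stream.getAudioTracks();
  if (audioTracks.length === 0) {
    throw new Error("The selected source has no audio track");
  }
  // addTrack() hands each track to WebRTC, which encodes and transports it,
  // so no manual resampling or PCM chunking is needed here.
  return audioTracks.map((track) => pc.addTrack(track, stream));
}

// Browser usage (not run here):
//   const stream = await navigator.mediaDevices.getDisplayMedia({
//     video: true,
//     audio: { suppressLocalAudioPlayback: true },
//   });
//   const pc = new RTCPeerConnection();
//   attachTabAudio(pc, stream);
```

Taking the connection and stream as parameters keeps the helper free of browser globals, so it can be exercised with simple stand-in objects under `node --test`.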
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
{
  "name": "browser-realtime-translation-demo",
  "version": "0.1.0",
  "private": true,
  "type": "module",
  "engines": {
    "node": ">=20"
  },
  "scripts": {
    "start": "node src/server.js",
    "dev": "node --watch src/server.js",
    "test": "node --test test/*.test.js",
    "smoke": "node scripts/smoke-realtime.mjs"
  }
}
