Commit 1a0d3ba

Add Symbl Sapat transcription guide
Signed-off-by: jtc268 <89586838+jtc268@users.noreply.github.com>
1 parent df97ac5 commit 1a0d3ba

4 files changed

Lines changed: 390 additions & 0 deletions

Lines changed: 321 additions & 0 deletions
@@ -0,0 +1,321 @@
---
title: "Run Symbl.ai Transcription With Sapat"
description:
  "Use Daytona, Sapat, and Symbl.ai to run a reproducible async audio
  transcription workflow for video files."
date: 2026-05-27
author: "jtc268"
tags: ["daytona", "sapat", "symbl-ai", "transcription"]
---

# Run Symbl.ai Transcription With Sapat

# Introduction

Video transcription usually starts as a small utility task. One meeting recording
needs notes. One product demo needs a searchable transcript. One interview needs
to become a written brief before the team forgets the details. The work feels
simple until the same command must run on another machine, with another API key,
against a larger recording, inside a clean environment.

This guide shows how to run a reproducible transcription workflow with
[Daytona](https://www.daytona.io/), [Sapat](https://github.com/nkkko/sapat), and
Symbl.ai. The companion Sapat implementation is open at
[nibzard/sapat#52](https://github.com/nibzard/sapat/pull/52). It adds
`--api symbl`, submits converted MP3 audio to Symbl.ai's async audio workflow,
polls the returned job, retrieves conversation messages, and writes the final
transcript into Sapat's existing `.txt` output path.

![Sapat Symbl.ai transcription flow in Daytona](assets/20260527_run_symbl_transcription_with_sapat_in_daytona.svg)

## TL;DR

- Use Daytona to keep Python, ffmpeg, dependencies, and environment variables in
  one reproducible workspace.
- Use Sapat to convert `.mp4` files to MP3 and route transcription through
  `--api symbl`.
- Use Symbl.ai's [async audio API](../definitions/20260527_definition_async_audio_api.md)
  pattern for longer recorded conversations: submit audio, poll the job, then
  fetch the transcript messages.
- Keep credentials in `.env` or Daytona workspace environment variables. Do not
  commit API keys, audio files, or generated transcripts.

## Prerequisites

You need:

- A Daytona workspace that can open the Sapat repository.
- Python 3.10 or newer for the commands in this guide.
- `ffmpeg` installed in the workspace.
- A Symbl.ai account with either a temporary access token or an app ID and app
  secret.
- A video file in `.mp4` format for the sample run.

The provider in the companion PR does not require the Symbl Python SDK. It uses
the same `requests` dependency that Sapat already uses for its OpenAI and Groq
providers.

## Step 1: Create a Daytona Workspace

Start from the Sapat repository. Daytona will provision a clean workspace around
the repository so the setup can be repeated without relying on local laptop
state.

```bash
daytona create https://github.com/nkkko/sapat --code
```

After the workspace opens, confirm the project has the expected shape:

```bash
ls
```

You should see files such as `README.md`, `pyproject.toml`, `requirements.txt`,
and the `src/sapat` package. If you are testing the Symbl provider before it is
merged upstream, check out the companion PR branch or apply the patch from
[nibzard/sapat#52](https://github.com/nibzard/sapat/pull/52).

## Step 2: Install Dependencies

Create an isolated virtual environment inside the Daytona workspace:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Sapat calls `ffmpeg` to turn video into MP3 before it sends audio to a provider.
Check that `ffmpeg` is available:

```bash
ffmpeg -version
```

If the command is missing, install it in the workspace image or through the
package manager available in your Daytona environment. For Debian-based
workspaces, the usual command is:

```bash
sudo apt-get update
sudo apt-get install -y ffmpeg
```
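
The same availability check can run from Python before a batch job starts. The
sketch below is an illustration rather than code from Sapat: `require_ffmpeg` is
a hypothetical helper name, and the only library call it relies on is
`shutil.which` from the standard library.

```python
import shutil
from typing import Callable, Optional


def require_ffmpeg(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return the path to ffmpeg, or raise if it is not on PATH.

    `which` is injectable so the check can be exercised without ffmpeg installed.
    """
    path = which("ffmpeg")
    if path is None:
        raise RuntimeError("ffmpeg not found on PATH; install it before running Sapat")
    return path
```

Failing early this way avoids discovering the missing binary only after a long
batch has already started.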

## Step 3: Configure Symbl.ai Credentials

The Symbl provider supports two credential modes.

Use an existing access token:

```bash
cat > .env <<'EOF'
SYMBL_ACCESS_TOKEN=replace_with_your_access_token
SYMBL_API_BASE_URL=https://api.symbl.ai/v1
SYMBL_JOB_POLL_INTERVAL_SECONDS=5
SYMBL_JOB_TIMEOUT_SECONDS=600
EOF
```

Or let Sapat generate an access token from an app ID and app secret:

```bash
cat > .env <<'EOF'
SYMBL_APP_ID=replace_with_your_app_id
SYMBL_APP_SECRET=replace_with_your_app_secret
SYMBL_API_BASE_URL=https://api.symbl.ai/v1
SYMBL_JOB_POLL_INTERVAL_SECONDS=5
SYMBL_JOB_TIMEOUT_SECONDS=600
EOF
```

Keep `.env` out of source control. The Sapat repository already ignores `.env`,
and the companion PR adds the Symbl variable names to `.env.example` with
placeholder values only.

Symbl.ai's public docs describe the async audio flow as a three-step process:
submit recorded audio, check the job status, and use the returned conversation
ID to retrieve messages from the Conversations API. The provider follows that
same shape so the Sapat command can stay simple.
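
To make the two credential modes concrete, here is a minimal sketch of how a
provider could choose between them. It is not the code from the companion PR:
`resolve_credentials` is a hypothetical name, and giving the access token
precedence over the app ID and secret is an assumption. The variable names and
the error text come from this guide.

```python
from typing import Mapping, Tuple


def resolve_credentials(env: Mapping[str, str]) -> Tuple[str, Tuple[str, ...]]:
    """Pick a credential mode from environment-style variables.

    Returns ("token", (access_token,)) or ("app", (app_id, app_secret)).
    Token-first precedence is an assumption, not confirmed provider behavior.
    """
    token = env.get("SYMBL_ACCESS_TOKEN", "").strip()
    if token:
        return "token", (token,)
    app_id = env.get("SYMBL_APP_ID", "").strip()
    app_secret = env.get("SYMBL_APP_SECRET", "").strip()
    if app_id and app_secret:
        return "app", (app_id, app_secret)
    raise ValueError("Set SYMBL_ACCESS_TOKEN or both SYMBL_APP_ID and SYMBL_APP_SECRET")
```

Calling `resolve_credentials(os.environ)` after the `.env` file has been loaded
would report which mode is active without printing any secret value.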

## Step 4: Run Sapat With Symbl.ai

Put a test video in the workspace. For this example, assume the file is named
`demo.mp4`.

Run Sapat with the Symbl provider:

```bash
sapat demo.mp4 --quality M --language en --api symbl
```

Behind the scenes, Sapat performs the following steps:

| Step | What happens |
| --- | --- |
| Convert | `ffmpeg` creates `demo.mp3` next to the input video. |
| Submit | Sapat posts the MP3 file to Symbl.ai's async audio endpoint. |
| Poll | Sapat polls the returned job ID until it is complete or times out. |
| Retrieve | Sapat fetches conversation messages and joins them into transcript text. |
| Save | Sapat writes `demo.txt` and removes the temporary MP3 file. |

When the run finishes, open the transcript:

```bash
sed -n '1,80p' demo.txt
```

For a directory of recordings, point Sapat at the directory:

```bash
sapat recordings/ --quality M --language en --api symbl
```

Sapat processes `.mp4` files in that directory and writes one `.txt` file for
each video.
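
The file naming described above can be sketched with `pathlib`. This is an
illustration of the layout only, not Sapat's implementation: `plan_outputs` is a
hypothetical helper, and the non-recursive, case-sensitive `*.mp4` match is an
assumption about how the directory is scanned.

```python
from pathlib import Path
from typing import Iterator, Tuple


def plan_outputs(source: Path) -> Iterator[Tuple[Path, Path, Path]]:
    """Yield (video, temporary_mp3, transcript) paths for a file or a directory."""
    # Assumption: a directory is scanned one level deep for lowercase .mp4 names.
    videos = sorted(source.glob("*.mp4")) if source.is_dir() else [source]
    for video in videos:
        yield video, video.with_suffix(".mp3"), video.with_suffix(".txt")
```

Listing the planned paths before a batch run is a cheap way to confirm which
transcripts will be created or overwritten.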

## Step 5: Tune the Workflow

The default polling settings are conservative:

```bash
SYMBL_JOB_POLL_INTERVAL_SECONDS=5
SYMBL_JOB_TIMEOUT_SECONDS=600
```

Use a shorter polling interval while testing small files. Use a longer timeout
for long meetings or classes. The timeout should reflect your expected audio
length and the service latency you see in practice.
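
How the interval and timeout interact is easiest to see in a small polling
loop. The sketch below is a generic illustration, not the provider's code:
`wait_for_job` and its parameters are hypothetical, and the status strings
`"completed"` and `"failed"` are assumptions about the job API's vocabulary.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], str],
    interval_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll until the job completes, fails, or the timeout elapses."""
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if status == "completed":  # assumed terminal success value
            return
        if status == "failed":  # assumed terminal failure value
            raise RuntimeError("Symbl.ai job failed")
        if clock() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_seconds} seconds")
        sleep(interval_seconds)
```

With the defaults, a job that never finishes is checked roughly every 5 seconds
for about 10 minutes, so raising the timeout adds polls rather than slowing each
one down.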

For language, Sapat accepts simple values such as `en`, `es`, `fr`, or a full
BCP-47 code such as `en-US`. The provider maps common short language names to
Symbl.ai language codes and passes full codes through unchanged.
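
A mapping like this can be as small as a dictionary lookup with a pass-through
default. In the sketch below, only the `en` to `en-US` pair is confirmed by the
companion PR's tests; the `es` and `fr` targets, the `SHORT_TO_FULL` table, and
the `normalize_language` name are assumptions made for illustration.

```python
# Hypothetical table: only "en" -> "en-US" is confirmed by the companion PR's tests.
SHORT_TO_FULL = {"en": "en-US", "es": "es-ES", "fr": "fr-FR"}


def normalize_language(value: str) -> str:
    """Map a known short code to a full code; pass anything else through unchanged."""
    cleaned = value.strip()
    return SHORT_TO_FULL.get(cleaned.lower(), cleaned)
```

Check any full code you pass against Symbl.ai's supported languages list, since
a pass-through design does not validate it.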

## How the Symbl.ai Provider Fits Sapat

Sapat's existing provider model is intentionally small. Every provider receives
the converted MP3 path and returns transcript text or a JSON object containing a
`text` field. That means a provider can use a synchronous API such as OpenAI's
audio transcription endpoint or an async flow such as Symbl.ai without changing
the command a user runs.
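
The contract described here, plain text or an object with a `text` field, can be
normalized in a few lines. This is a sketch of the idea rather than Sapat's
code, and `transcript_text` is a hypothetical helper name.

```python
from collections.abc import Mapping
from typing import Any, Union


def transcript_text(result: Union[str, Mapping[str, Any]]) -> str:
    """Normalize a provider result to plain transcript text."""
    if isinstance(result, str):
        return result
    if isinstance(result, Mapping) and isinstance(result.get("text"), str):
        return result["text"]
    raise TypeError("provider result must be a string or a mapping with a 'text' string")
```

Because both synchronous and async providers end at the same return shape, the
code that writes the `.txt` file does not need to know which kind it called.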

The Symbl provider keeps that contract. It handles the longer async lifecycle
inside the provider class:

1. It checks that the converted audio file exists.
2. It resolves credentials from `SYMBL_ACCESS_TOKEN` or generates a token from
   `SYMBL_APP_ID` and `SYMBL_APP_SECRET`.
3. It posts the MP3 bytes to `/process/audio` with a `languageCode` query
   parameter.
4. It stores the returned `jobId` and `conversationId`.
5. It polls `/job/{jobId}` until the job is complete, failed, or timed out.
6. It calls `/conversations/{conversationId}/messages` and joins message text
   into the transcript returned to Sapat.
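
Steps 3 and 6 can be sketched without making any network calls. The helpers
below are hypothetical and only describe the request and shape the response:
the `audio/mpeg` content type, the top-level `messages` key, and joining with
newlines are assumptions, while the endpoint path, `languageCode` parameter, and
bearer token come from the list above.

```python
from typing import Any, Dict, Mapping


def build_submit_request(base_url: str, access_token: str, language_code: str) -> Dict[str, Any]:
    """Describe the async audio submission without sending it."""
    return {
        "url": f"{base_url.rstrip('/')}/process/audio",
        "params": {"languageCode": language_code},
        "headers": {
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "audio/mpeg",  # assumed MIME type for MP3 uploads
        },
    }


def join_messages(payload: Mapping[str, Any]) -> str:
    """Join non-empty message texts, one message per line (assumed separator)."""
    texts = (str(m.get("text", "")).strip() for m in payload.get("messages", []))
    return "\n".join(t for t in texts if t)
```

The dictionary from `build_submit_request` maps directly onto the `url`,
`params`, and `headers` arguments of `requests.post`, with the MP3 bytes passed
as `data`.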

This shape keeps the user experience stable. The command remains:

```bash
sapat demo.mp4 --api symbl
```

The only difference is that the provider may wait while Symbl.ai processes the
recording. That wait is why the timeout setting matters.

## When to Use This Provider

Use the Symbl.ai route when your workflow benefits from an async conversation
pipeline rather than a one-shot transcription response. Meeting recordings,
customer interviews, research calls, lectures, and webinar recordings are good
fits because they often become more useful when the transcript is tied to a
conversation identifier and can later support richer conversation intelligence.

For quick one-off clips, another provider may be simpler. For private local
transcription, an offline provider may be a better fit. The advantage of keeping
Symbl.ai behind Sapat's `--api` option is that your team can switch providers
without changing the file layout, Daytona workspace, or transcript destination.

## Operational Guardrails

Treat recorded conversations as sensitive data. Keep raw videos, generated MP3
files, and transcripts out of Git unless you have a deliberate publishing
workflow. A clean pattern is:

- Store input recordings in a private workspace folder.
- Run Sapat from that folder.
- Review the generated `.txt` file before sharing it.
- Move approved transcripts into the system that needs them.
- Delete temporary or test recordings when the review is done.

If multiple teammates use the same Daytona workspace template, document the
required environment variables but not the values. The `.env.example` file should
show names and safe placeholders only. Secrets should come from each developer's
workspace environment or secret manager.
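
One way to enforce the placeholder rule is a small check that runs before a
commit. The sketch below is hypothetical and not part of Sapat: it assumes the
`replace_with_` prefix used in this guide marks a safe placeholder, and
`leaked_secret_names` is an illustrative helper name.

```python
from typing import List

# Variable names from this guide that must never carry real values in .env.example.
SECRET_NAMES = {"SYMBL_ACCESS_TOKEN", "SYMBL_APP_ID", "SYMBL_APP_SECRET"}


def leaked_secret_names(env_text: str, placeholder_prefix: str = "replace_with_") -> List[str]:
    """Return secret variable names whose values do not look like placeholders."""
    leaked = []
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        name, value = name.strip(), value.strip().strip("'\"")
        if name in SECRET_NAMES and value and not value.startswith(placeholder_prefix):
            leaked.append(name)
    return leaked
```

An empty list means the file shows names and placeholders only; any returned
name points at a line to fix before the file is shared.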

## Step 6: Validate Before Sharing

Before opening a PR or handing this workflow to teammates, run checks that prove
the provider is wired correctly:

```bash
PYTHONPATH=src python -m unittest discover -s tests -v
PYTHONPATH=src python -m compileall src tests
PYTHONPATH=src python -m sapat.script --help
git diff --check
```

The mocked provider tests in the companion PR cover:

- Submitting an MP3 file with the expected bearer token and content type.
- Converting a short `en` language flag into `en-US`.
- Polling an in-progress job until it completes.
- Reading conversation messages into a final transcript string.
- Generating an access token from `SYMBL_APP_ID` and `SYMBL_APP_SECRET`.
- Raising an error when the async job fails.

These tests do not upload private audio and do not require live Symbl.ai
credentials.

## Common Issues and Troubleshooting

**Problem:** `ffmpeg` is not found.

**Solution:** Install `ffmpeg` in the Daytona workspace and rerun the command.
Sapat cannot submit to Symbl.ai until it has converted the video to MP3.

**Problem:** The command says `Set SYMBL_ACCESS_TOKEN or both SYMBL_APP_ID and
SYMBL_APP_SECRET`.

**Solution:** Confirm `.env` exists in the Sapat project root and that your
terminal session loads it from that directory. Do not commit this file.

**Problem:** The job times out.

**Solution:** Increase `SYMBL_JOB_TIMEOUT_SECONDS`. If the file is unusually
large, test with a shorter clip first so you can confirm credentials and provider
wiring before running the full recording.

**Problem:** The transcript is empty.

**Solution:** Check the Symbl.ai job status in your provider logs and confirm the
conversation messages endpoint returns message objects with `text` fields. Also
check that the audio is audible after conversion.

## Conclusion

Daytona gives the workflow a clean workspace. Sapat gives it a small command
surface. Symbl.ai gives it an async transcription path for recorded audio. Put
together, the setup is easy to rerun: create the workspace, install dependencies,
set credentials, run `sapat --api symbl`, and verify the `.txt` transcript.

That repeatability is the main win. The same flow works for one demo recording,
a batch of class videos, or a set of customer interviews, without requiring each
developer to rediscover the provider wiring on their own machine.

## References

- [Sapat repository](https://github.com/nkkko/sapat)
- [Companion Symbl.ai provider PR](https://github.com/nibzard/sapat/pull/52)
- [Symbl.ai transcription overview](https://symbl.ai/platform/understanding-apis/transcription/)
- [Symbl.ai process conversation overview](https://docs.symbl.ai/docs/overview-process-a-conversation)
- [Symbl.ai supported languages](https://docs.symbl.ai/docs/supported-languages)
- [Symbl.ai async audio example](https://symbl.ai/developers/blog/use-async-audio-api-in-your-react-app/)
Lines changed: 39 additions & 0 deletions

authors/jtc268.md

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
Author: jtc268
Title: Independent Developer
Description: jtc268 works on practical developer tooling, automation, and
open-source integrations, with a focus on reproducible workflows that can be
verified from code, tests, and clear runbooks.
Company Name: Independent
Company Description: Independent open-source contributor.
Author Image: [https://avatars.githubusercontent.com/u/89586838?v=4]
Company Logo Dark:
Company Logo White:
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
---
title: "Async audio API"
description:
  "An async audio API accepts recorded audio, processes it as a background job,
  and returns transcript data after the job completes."
date: 2026-05-27
author: "jtc268"
tags: ["audio", "transcription", "api"]
---

# Async audio API

An async audio API processes recorded audio outside the original request and
returns a job or conversation identifier that clients can poll or receive
through a webhook. This pattern is useful when audio files are too large or too
slow to transcribe during a single HTTP request.

In a transcription workflow, the client uploads or links to an audio file, stores
the returned job ID, waits until processing completes, and then fetches the
transcript or conversation messages from a follow-up endpoint.
