Use this guide when you want Dify, n8n, webhook workers, or another workflow engine to call a private FunASR speech API. Start with the local smoke test in this directory, then replace localhost with the reachable service name inside your network.
cd examples/openai_api
python server.py --model sensevoice --device cuda --port 8000From the workflow host or container:
export FUNASR_BASE_URL=http://<funasr-host>:8000
curl -fsS "$FUNASR_BASE_URL/health"
curl -fsS "$FUNASR_BASE_URL/v1/models"If the workflow engine runs in Docker, localhost usually means the workflow container itself. Use a Docker Compose service name, Kubernetes service name, or LAN host name instead.
Before configuring a low-code tool, you can import the Postman collection and run health, model-list, and transcription requests from a GUI. For schema-driven imports, use the OpenAPI spec. Set FUNASR_BASE_URL, choose a local audio file for the multipart file field, and keep MODEL_ALIAS=sensevoice for the first test.
Every workflow engine eventually needs to send this request shape:
| Field | Value |
|---|---|
| Method | POST |
| URL | http://<funasr-host>:8000/v1/audio/transcriptions |
| Body type | multipart/form-data |
| File field | file |
| Text field | model=sensevoice |
| Text field | response_format=verbose_json |
| Timeout | Set according to maximum audio duration, for example 300 seconds for long files. |
Equivalent curl command:
curl "$FUNASR_BASE_URL/v1/audio/transcriptions" \
-F file=@meeting.wav \
-F model=sensevoice \
-F response_format=verbose_jsonTypical JSON fields to map downstream:
| Path | Use |
|---|---|
text |
Plain transcript for a chatbot, ticket, or knowledge-base step. |
segments |
Timestamps and speaker labels when verbose_json is requested. |
duration |
Audio processing time reported by the API, useful for logs. |
model |
Model alias used for the request. |
Use this pattern when a Dify application receives an uploaded audio file or a URL to internal audio storage.
Configure an HTTP request node or custom tool with:
- Method:
POST - URL:
http://<funasr-host>:8000/v1/audio/transcriptions - Body:
multipart/form-data - File part:
file, bound to the uploaded audio variable - Text parts:
model=sensevoice,response_format=verbose_json - Output variable: map
textas the transcript, and keepsegmentswhen timestamps or speaker labels matter
Some workflow tools pass a file URL rather than raw multipart bytes. In that case, add a small internal worker:
- Dify sends the audio URL and metadata to the worker.
- The worker downloads the file from trusted storage.
- The worker posts multipart data to FunASR.
- The worker returns
text,segments, and operational logs to Dify.
import requests
FUNASR_URL = "http://funasr-api:8000/v1/audio/transcriptions"
def transcribe_from_url(audio_url: str) -> dict:
audio_response = requests.get(audio_url, timeout=120)
audio_response.raise_for_status()
files = {"file": ("audio.wav", audio_response.content, "audio/wav")}
data = {"model": "sensevoice", "response_format": "verbose_json"}
response = requests.post(FUNASR_URL, files=files, data=data, timeout=300)
response.raise_for_status()
return response.json()Keep this worker inside your trusted network and validate allowed URL domains before downloading user-provided links.
A common n8n flow is: trigger -> binary audio data -> HTTP Request -> transcript consumer.
Recommended HTTP Request settings:
| n8n setting | Value |
|---|---|
| Method | POST |
| URL | http://<funasr-host>:8000/v1/audio/transcriptions |
| Send Body | enabled |
| Body Content Type | Form-Data / multipart |
| Binary file field | file |
| Additional form fields | model=sensevoice, response_format=verbose_json |
| Response Format | JSON |
| Timeout | Increase for long recordings. |
After the request, use {{$json.text}} as the transcript. If verbose_json is enabled, route {{$json.segments}} to subtitle, speaker, or QA steps.
Use this when the workflow engine cannot send multipart files reliably or when audio needs pre-processing.
from pathlib import Path
import tempfile
import requests
FUNASR_URL = "http://localhost:8000/v1/audio/transcriptions"
def transcribe_bytes(filename: str, payload: bytes, content_type: str = "audio/wav") -> dict:
with tempfile.NamedTemporaryFile(suffix=Path(filename).suffix or ".wav") as tmp:
tmp.write(payload)
tmp.flush()
with open(tmp.name, "rb") as audio:
response = requests.post(
FUNASR_URL,
files={"file": (filename, audio, content_type)},
data={"model": "sensevoice", "response_format": "verbose_json"},
timeout=300,
)
response.raise_for_status()
return response.json()This worker is also the right place to add audio conversion, file-size checks, request IDs, authentication, and retries.
- Put authentication, TLS, upload-size limits, and rate limits in front of FunASR before sharing it across teams; use the security and gateway guide for proxy and gateway patterns.
- Use
/healthfor workflow readiness checks and/v1/modelsto validate model aliases. - Log request id, audio duration, model alias, response format, device, latency, and error type.
- Set workflow timeouts according to maximum audio duration; split very long recordings before sending them through low-code tools.
- Keep private audio in trusted storage and avoid putting signed URLs, credentials, or transcripts into public logs.
- Run the same workflow with at least one public smoke sample and one realistic private sample before production use.
| Symptom | Fix |
|---|---|
Workflow can call /health but transcription fails |
Confirm the request is multipart/form-data and the binary field is named file. |
localhost connection fails from Dify or n8n |
Use the host, Compose service, or Kubernetes service reachable from the workflow runtime. |
Response has no segments |
Set response_format=verbose_json. |
| Requests time out | Increase HTTP timeout or split long recordings. |
| First request is slow | Preload the model with --model sensevoice and use /health as a readiness check. |
| Unknown model alias | Call /v1/models and use one of the returned aliases. |