feat(gateway): feishu voice message STT via gateway audio attachment
- Add msg_type=audio support to feishu adapter (parse, download, base64 encode)
- Add MediaRef::Audio variant and download_feishu_audio() function
- Add "audio" attachment type to core gateway handler (decode → stt::transcribe)
- Pass SttConfig to gateway handler via GatewayParams
- Update docs/feishu.md and docs/stt.md for multi-platform voice support
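The Rust sketch below shows one way the Feishu adapter could map `msg_type=audio` to a `MediaRef::Audio` and package the downloaded bytes as a base64 `audio` attachment. Only the `MediaRef::Audio` variant name comes from this PR. The `file_key` field, the `Attachment` struct, `media_ref_for()` and `audio_attachment()` are hypothetical. The download step (`download_feishu_audio()`) is left out. The 25MB cap and the `audio/ogg` MIME type come from the docs changes in this PR. A hand-rolled RFC 4648 encoder keeps the sketch dependency-free; a real adapter would more likely use a base64 crate.

```rust
// Hypothetical sketch: apart from MediaRef::Audio, names and fields are illustrative.

/// Media references the Feishu adapter might extract from an incoming message.
#[derive(Debug, PartialEq)]
pub enum MediaRef {
    Image { image_key: String },
    Audio { file_key: String }, // assumed field name
}

/// Attachment forwarded from gateway to core (assumed shape).
#[derive(Debug, PartialEq)]
pub struct Attachment {
    pub kind: &'static str, // e.g. "image", "audio"
    pub mime: String,
    pub data_b64: String,
}

/// Documented download cap for voice messages (25MB).
pub const MAX_AUDIO_BYTES: usize = 25 * 1024 * 1024;

/// Map a Feishu msg_type plus its resource key to a MediaRef; other types yield None.
pub fn media_ref_for(msg_type: &str, key: &str) -> Option<MediaRef> {
    match msg_type {
        "image" => Some(MediaRef::Image { image_key: key.to_string() }),
        "audio" => Some(MediaRef::Audio { file_key: key.to_string() }),
        _ => None,
    }
}

/// Standard base64 (RFC 4648) encoding with '=' padding.
pub fn base64_encode(bytes: &[u8]) -> String {
    const TABLE: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling missing bytes.
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = (b[0] as u32) << 16 | (b[1] as u32) << 8 | b[2] as u32;
        out.push(TABLE[(n >> 18) as usize & 63] as char);
        out.push(TABLE[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { TABLE[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { TABLE[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Wrap downloaded voice bytes as an "audio" attachment, rejecting oversized files.
pub fn audio_attachment(bytes: &[u8]) -> Option<Attachment> {
    if bytes.len() > MAX_AUDIO_BYTES {
        return None;
    }
    Some(Attachment {
        kind: "audio",
        mime: "audio/ogg".to_string(),
        data_b64: base64_encode(bytes),
    })
}
```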
Feishu voice messages (opus/ogg) are downloaded by the gateway, passed as
base64-encoded audio attachments to core, and transcribed via the existing
[stt] infrastructure (Groq Whisper by default). This is the first gateway
platform to support audio — LINE/Telegram can reuse the core-side handler.
Tested: 102 gateway tests + 197 core tests pass. E2E verified.
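The core-side half of the flow (base64 decode, then `stt::transcribe`, then prompt injection) can be sketched the same way. This is a hypothetical illustration, not openab's code. The `Transcriber` trait stands in for `stt::transcribe`, whose real signature is not shown here, and `audio_to_prompt()` is an invented name. The sketch follows the documented behavior: the transcript is injected as `[Voice message transcript]: ...`, and the attachment is skipped silently when STT is disabled (`None`), when the payload is not valid base64, or when transcription fails.

```rust
// Hypothetical sketch of the core-side "audio" attachment path; names are illustrative.

/// Stand-in for openab's stt::transcribe (real signature not shown in this PR).
pub trait Transcriber {
    fn transcribe(&self, audio: &[u8], mime: &str) -> Result<String, String>;
}

/// Decode standard (RFC 4648) base64 with '=' padding; None on malformed input.
pub fn base64_decode(s: &str) -> Option<Vec<u8>> {
    fn val(c: u8) -> Option<u32> {
        match c {
            b'A'..=b'Z' => Some((c - b'A') as u32),
            b'a'..=b'z' => Some((c - b'a' + 26) as u32),
            b'0'..=b'9' => Some((c - b'0' + 52) as u32),
            b'+' => Some(62),
            b'/' => Some(63),
            _ => None,
        }
    }
    let bytes = s.as_bytes();
    if bytes.len() % 4 != 0 {
        return None;
    }
    let n_chunks = bytes.len() / 4;
    let mut out = Vec::with_capacity(n_chunks * 3);
    for (i, chunk) in bytes.chunks(4).enumerate() {
        // Padding is only legal in the final chunk, and at most two '=' characters.
        let pad = chunk.iter().rev().take_while(|&&c| c == b'=').count();
        if pad > 2 || (pad > 0 && i + 1 != n_chunks) {
            return None;
        }
        // Pack the 6-bit values into a left-aligned 24-bit group.
        let mut n = 0u32;
        for &c in &chunk[..4 - pad] {
            n = (n << 6) | val(c)?;
        }
        n <<= 6 * pad as u32;
        let group = [(n >> 16) as u8, (n >> 8) as u8, n as u8];
        out.extend_from_slice(&group[..3 - pad]);
    }
    Some(out)
}

/// Turn a base64 "audio" attachment into a prompt line, or None to skip it silently.
pub fn audio_to_prompt(stt: Option<&dyn Transcriber>, data_b64: &str, mime: &str) -> Option<String> {
    let stt = stt?; // [stt] disabled: skip
    let audio = base64_decode(data_b64)?; // malformed payload: skip
    match stt.transcribe(&audio, mime) {
        // Trimming whitespace around the transcript is a choice made in this sketch.
        Ok(text) => Some(format!("[Voice message transcript]: {}", text.trim())),
        Err(_) => None, // transcription failed: skip
    }
}
```

Modeling STT as a trait keeps the skip logic testable without network calls. A fake transcriber can stand in for Groq Whisper in unit tests.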
|`file`| Text files only (`.txt`, `.py`, `.rs`, `.md`, `.json`, etc., max 512KB). Non-text files (`.pdf`, `.zip`, etc.) are silently ignored. |
+|`audio`| Voice message downloaded (opus/ogg, max 25MB), base64 encoded, forwarded to core. If `[stt]` is enabled, core transcribes via Whisper API and injects `[Voice message transcript]: ...` into the prompt. If STT is disabled or fails, the message is silently skipped. |
|`post`| Rich text: text nodes extracted as prompt, `img` nodes downloaded as image attachments. This is the format Feishu uses when @mention + paste image in a group. |
**Group chat limitation:** Feishu does not allow @mention and image upload in the same message. However, @mention + paste (Ctrl+V) an image works — Feishu sends this as a `post` message containing both the mention and the image. Direct image upload (via the attachment button) cannot include @mention, so the bot will not respond in groups.
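The table describes `post` handling: text nodes become the prompt and `img` nodes become image attachments. This can be sketched as a walk over the post's paragraphs. The sketch rests on several assumptions. The content has already been parsed into a hypothetical `PostNode` enum, even though real Feishu post JSON carries other tags as well. Paragraphs are joined with newlines. `at` (mention) nodes add nothing to the prompt. The returned image keys would then be downloaded like ordinary image attachments.

```rust
// Hypothetical sketch: PostNode and extract_post are illustrative, not openab's types.

/// Simplified, already-parsed Feishu post node.
#[derive(Debug, Clone)]
pub enum PostNode {
    Text(String),
    At { user_id: String },
    Img { image_key: String },
    Other, // any tag this sketch does not handle
}

/// Prompt text plus the image keys to download as attachments.
pub struct PostExtract {
    pub prompt: String,
    pub image_keys: Vec<String>,
}

/// Concatenate text nodes per paragraph and collect img node keys.
pub fn extract_post(paragraphs: &[Vec<PostNode>]) -> PostExtract {
    let mut lines = Vec::new();
    let mut image_keys = Vec::new();
    for para in paragraphs {
        let mut line = String::new();
        for node in para {
            match node {
                PostNode::Text(t) => line.push_str(t),
                PostNode::Img { image_key } => image_keys.push(image_key.clone()),
                // Mentions and unknown tags add nothing to the prompt in this sketch.
                PostNode::At { .. } | PostNode::Other => {}
            }
        }
        // Drop paragraphs that held only images or mentions.
        let line = line.trim();
        if !line.is_empty() {
            lines.push(line.to_string());
        }
    }
    PostExtract { prompt: lines.join("\n"), image_keys }
}
```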
docs/stt.md: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
# Speech-to-Text (STT) for Voice Messages
-openab can automatically transcribe Discord voice message attachments and forward the transcript to your ACP agent as text.
+openab can automatically transcribe voice message attachments (Discord, Feishu, and other gateway platforms) and forward the transcript to your ACP agent as text.
@@ -161,6 +161,6 @@ When disabled, audio attachments are silently skipped with no impact on existing
## Technical Notes
- openab sends `response_format=json` in the transcription request to ensure the response is always parseable JSON. Some local whisper servers default to plain text output without this parameter.
-- The actual MIME type from the Discord attachment is passed through to the STT API (e.g. `audio/ogg`, `audio/mp4`, `audio/wav`).
+- The actual MIME type from the platform attachment is passed through to the STT API (e.g. `audio/ogg` for Discord and Feishu voice messages, `audio/mp4`, `audio/wav`).
- Environment variables in config values are expanded via `${VAR}` syntax (e.g. `api_key = "${GROQ_API_KEY}"`).
- The `api_key` field is auto-detected from the `GROQ_API_KEY` environment variable when using the default Groq endpoint. If you set a custom `base_url` (e.g. local server), auto-detect is disabled to avoid leaking the Groq key to unrelated endpoints — you must set `api_key` explicitly.
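The notes above state two configuration rules: `${VAR}` expansion, and `GROQ_API_KEY` auto-detection that is switched off for a custom `base_url`. Both can be expressed as a small Rust sketch. The function names are hypothetical. The default Groq URL constant is an assumption, because the notes only say "the default Groq endpoint". Expanding unset variables to an empty string is a guess, not documented behavior. The environment is passed in as a lookup function so the logic can be tested without touching the process environment. In real code the lookup could be `&|k: &str| std::env::var(k).ok()`.

```rust
// Hypothetical sketch of the documented api_key rules; not openab's actual code.

/// Assumed default Groq endpoint (not stated in the docs).
pub const DEFAULT_GROQ_BASE_URL: &str = "https://api.groq.com/openai/v1";

/// Expand ${NAME} references via `lookup`; unset names become "" (assumption).
pub fn expand_env(value: &str, lookup: &dyn Fn(&str) -> Option<String>) -> String {
    let mut out = String::new();
    let mut rest = value;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                out.push_str(&lookup(&after[..end]).unwrap_or_default());
                rest = &after[end + 1..];
            }
            None => {
                // Unterminated "${": keep the remainder verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

/// Explicit api_key wins; otherwise fall back to GROQ_API_KEY only for the default endpoint.
pub fn resolve_api_key(
    base_url: Option<&str>,
    api_key: Option<&str>,
    lookup: &dyn Fn(&str) -> Option<String>,
) -> Option<String> {
    if let Some(key) = api_key {
        return Some(expand_env(key, lookup));
    }
    // A custom base_url disables auto-detect so the Groq key is not sent elsewhere.
    let is_default = base_url.map_or(true, |u| u.trim_end_matches('/') == DEFAULT_GROQ_BASE_URL);
    if is_default {
        lookup("GROQ_API_KEY")
    } else {
        None
    }
}
```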