Commit 0079352

canyugs, claude, and chaodu-agent authored
feat(gateway): Google Chat attachment support (image / file / audio + STT) (#762)
* feat(gateway): inbound attachment support for Google Chat

  Implements image / text file / audio download from Google Chat via Media API + service account token, following the PR #731 base64 pattern.

  Changes:
  - GoogleChatMessage: parse attachment[] array (Attachment / AttachmentDataRef / DriveDataRef structs)
  - GoogleChatMediaRef enum: Image / File / Audio variants for typed dispatch
  - parse_attachments(): branches on contentType prefix, skips DRIVE_FILE source
  - download_googlechat_image(): resize → 1200px JPEG q75, max 10MB, GIF preserved
  - download_googlechat_file(): text extension whitelist (.txt/.md/.py/...), max 512KB
  - download_googlechat_audio(): forwarded as-is for core STT pipeline, max 25MB
  - media_url(): percent-encode resource_name as path segment
  - webhook handler: parses attachments, async-downloads via adapter token, populates Content.attachments
  - Empty-text events with attachments are now forwarded (previously dropped)
  - Tests: 11 new (parse, download success/skip/oversized, URL encoding)

  Refs: #731 (Feishu pattern)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(core): STT for Custom Gateway audio attachments

  Extends src/gateway.rs attachment handling to transcribe audio attachments via the existing STT pipeline (previously only Discord/Slack adapters went through download_and_transcribe; Custom Gateway adapters got no audio path even though stt::transcribe was available).

  When a gateway adapter (Feishu, Google Chat, etc.) sends an Attachment with attachment_type = "audio", core now:
  1. Decodes base64 → audio bytes
  2. Calls stt::transcribe with the configured SttConfig
  3. Wraps the transcript as a ContentBlock::Text: "[Voice message transcript]: ..."

  The audio branch is gated on stt_config.enabled: if STT is disabled in config, audio attachments fall through unchanged (same as before).

  Threads stt_config through GatewayParams and run_gateway_adapter.

  This closes the audio attachment gap left by the (now-closed) PR #726 without re-introducing the HTTP MediaStore proxy approach. Pairs with the Google Chat adapter audio download (separate PR); once both land, Google Chat voice/audio attachments work end-to-end.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(gateway): address googlechat attachment review feedback

  Addresses canyugs#4 must-fix items:

  #1+#2 Webhook timeout safety:
  - Spawn background tokio task for attachment downloads so the webhook returns 200 within Google Chat's 30s deadline regardless of how long downloads take.
  - Add 30s per-request timeout to all Media API GET calls, so a single hung connection can no longer stall the download task indefinitely.
  - Refactor event emission into send_googlechat_event helper to share between sync (no-attachment) and async (background download) paths.

  #4 Text file caps (matches Discord/Slack):
  - TEXT_FILE_COUNT_CAP = 5: skip text files past the 5th with a warning.
  - TEXT_TOTAL_CAP = 1 MB: skip text files that would push the running aggregate past 1 MB with a warning.
  - Image and audio attachments are not capped (same as Discord/Slack).

  #6 STT silent failure:
  - When stt::transcribe returns None, push a fallback ContentBlock::Text ("[Voice message — transcription failed for ...]") so the agent knows a voice message was attempted and can ask the user to re-send. Previously the failure was silent and confusing.

  Skipped from issue #4: #3 (streaming download), #5 (cross-adapter refactor; adapters stay independent per design), #7-#10 (cosmetic).

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(gateway): correct media_url encoding, remove lossy UTF-8 round-trip, add spawn panic logging

  - media_url: preserve `/` as literal path separators per Google Chat Media API's RFC 6570 reserved expansion (`{+resourceName}`). Previously all `/` were encoded as `%2F`, which is fragile.
  - download_googlechat_file: base64-encode raw bytes directly instead of round-tripping through String::from_utf8_lossy, which silently replaces invalid bytes with U+FFFD.
  - Spawned attachment download task: log an error if the task panics so silent message drops are diagnosable.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(gateway): address review — remove .env from whitelist, add audio decode fallback

  - Remove `"env"` from TEXT_EXTS whitelist to prevent credential exposure if a user accidentally uploads a .env file.
  - Audio base64 decode failure now produces a fallback text block ("[Voice message — decode failed for X]") instead of silently dropping.
  - Audio attachments when STT is disabled now log at debug level instead of being silently discarded.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(gateway): simplify text file cap logic, defer text allocation to spawn path

  - Flatten nested if/else in File download cap check using early continue, improving readability.
  - Defer text .to_string() allocation to the tokio::spawn path so the no-attachment fast path avoids a heap allocation.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(gateway): address remaining review nits

  - Replace double-spawn panic logging with single spawn + catch_unwind: more idiomatic, same observability.
  - Remove unused content_type from Image/File variants of GoogleChatMediaRef; only Audio needs it. Drops #[allow(dead_code)] on the enum.
  - Pass remaining aggregate budget to download_googlechat_file so Content-Length is checked against the budget before downloading, avoiding wasted bandwidth on files that would exceed the cap.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(gateway): enforce aggregate budget on post-download check, log skipped video attachments

  - download_googlechat_file: post-download size check now uses max_size (min of FILE_MAX_DOWNLOAD and remaining_budget) instead of only FILE_MAX_DOWNLOAD, ensuring TEXT_TOTAL_CAP is respected even when the Content-Length header is absent.
  - parse_attachments: video/ MIME type now gets an explicit info! log and is skipped early, instead of silently failing the text extension whitelist downstream.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: chaodu-agent <chaodu-agent@openab.dev>
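The text file caps described in the commit message (count cap, aggregate cap, and a per-file limit computed as the minimum of FILE_MAX_DOWNLOAD and the remaining budget) can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `TextBudget` type and its methods are hypothetical names, and the byte values assume 512 KB and 1 MB mean 512 * 1024 and 1024 * 1024.

```rust
/// Per-file ceiling for text attachments (assumed to be 512 * 1024 bytes).
const FILE_MAX_DOWNLOAD: usize = 512 * 1024;
/// At most this many text files are forwarded per message.
const TEXT_FILE_COUNT_CAP: usize = 5;
/// Aggregate byte budget across all text files in one message (assumed 1024 * 1024).
const TEXT_TOTAL_CAP: usize = 1024 * 1024;

/// Hypothetical running tally of accepted text files; not a type from the repository.
#[derive(Debug, Default)]
struct TextBudget {
    files_accepted: usize,
    bytes_accepted: usize,
}

impl TextBudget {
    /// Largest size the next text file may have, or `None` when either cap is exhausted.
    /// A caller could compare this against Content-Length before downloading.
    fn max_size_for_next(&self) -> Option<usize> {
        if self.files_accepted >= TEXT_FILE_COUNT_CAP {
            return None;
        }
        let remaining = TEXT_TOTAL_CAP.saturating_sub(self.bytes_accepted);
        if remaining == 0 {
            return None;
        }
        Some(FILE_MAX_DOWNLOAD.min(remaining))
    }

    /// Post-download check: records the file and returns true only if the actual
    /// byte count fits, which also covers responses without a Content-Length header.
    fn try_accept(&mut self, actual_len: usize) -> bool {
        match self.max_size_for_next() {
            Some(max) if actual_len <= max => {
                self.files_accepted += 1;
                self.bytes_accepted += actual_len;
                true
            }
            _ => false,
        }
    }
}
```

Checking the same limit both before and after the download means a missing or wrong Content-Length header cannot push the aggregate past the cap; the pre-check only saves bandwidth.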
1 parent 1f8864c commit 0079352

3 files changed

Lines changed: 770 additions & 30 deletions

docs/google-chat.md

Lines changed: 7 additions & 1 deletion
@@ -143,11 +143,17 @@ working_dir = "/home/agent"
 - Inline code, fenced code blocks: pass through unchanged
 - Tables and other unsupported syntax pass through as-is
 - **Streaming (edit_message)** — when OAB streaming is enabled, the bot edits its initial reply in-place as tokens arrive (typewriter effect)
+- **Inbound attachments** — image, text file, and audio attachments are downloaded via Google Chat Media API and forwarded to the agent as base64 (PR #731 pattern):
+  - Images: resized to ≤1200px JPEG (q75); GIFs preserved. Max 10 MB.
+  - Text files: only known text extensions (`.txt`, `.md`, `.json`, `.py`, `.rs`, etc.). Max 512 KB.
+  - Audio: forwarded as-is for STT processing by core. Max 25 MB.
+  - Drive-sourced attachments are skipped (require separate Drive API integration).
 
 ### Not Supported
 
 - **Reactions** — Google Chat API does not support message reactions on behalf of bots
-- **File/image attachments** — not yet implemented
+- **Outbound attachments** — bot cannot send image/file attachments back to the user yet
+- **Drive-linked attachments** — only `UPLOADED_CONTENT` source is handled; `DRIVE_FILE` source skipped
 
 ## Environment Variables (Gateway)
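The inbound attachment rules documented in this diff amount to a dispatch on attachment source and MIME type prefix. The sketch below illustrates that dispatch under stated assumptions: `classify`, `MediaKind`, and `Decision` are hypothetical names (the commit's own types are GoogleChatMediaRef and parse_attachments, whose signatures are not shown here), the whitelist contains only the five extensions the docs name explicitly, and the size limits assume binary megabytes.

```rust
use std::path::Path;

// Size limits from the docs; the exact byte values are an assumption.
const IMAGE_MAX_BYTES: usize = 10 * 1024 * 1024;
const FILE_MAX_BYTES: usize = 512 * 1024;
const AUDIO_MAX_BYTES: usize = 25 * 1024 * 1024;

// Only the extensions named in the docs; the real whitelist is longer ("etc.").
const TEXT_EXTS: &[&str] = &["txt", "md", "json", "py", "rs"];

#[derive(Debug, PartialEq)]
enum MediaKind {
    Image,
    File,
    Audio,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Download { kind: MediaKind, max_bytes: usize },
    Skip(&'static str),
}

/// Hypothetical classifier mirroring the documented behaviour.
fn classify(source: &str, content_type: &str, file_name: &str) -> Decision {
    // Drive-linked attachments need a separate Drive API integration.
    if source == "DRIVE_FILE" {
        return Decision::Skip("Drive-linked attachment");
    }
    if content_type.starts_with("image/") {
        return Decision::Download { kind: MediaKind::Image, max_bytes: IMAGE_MAX_BYTES };
    }
    if content_type.starts_with("audio/") {
        return Decision::Download { kind: MediaKind::Audio, max_bytes: AUDIO_MAX_BYTES };
    }
    // Video is skipped explicitly rather than failing the extension check below.
    if content_type.starts_with("video/") {
        return Decision::Skip("video attachment");
    }
    // Everything else is treated as a file and must carry a whitelisted extension.
    let ext = Path::new(file_name)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext {
        Some(e) if TEXT_EXTS.iter().any(|known| *known == e.as_str()) => {
            Decision::Download { kind: MediaKind::File, max_bytes: FILE_MAX_BYTES }
        }
        _ => Decision::Skip("extension not in text whitelist"),
    }
}
```

For example, `classify("UPLOADED_CONTENT", "text/plain", "config.env")` yields a `Skip`, consistent with the commit's removal of `"env"` from the whitelist.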
