fix(telegram): retry polling on all errors, not just 409 Conflict #1377
Closed
3koozy wants to merge 1 commit into anthropics:main from
The poll-loop startup in server.ts only retried 409 Conflict errors. Any other transient error (connection timeout, TLS reset, DNS hiccup) caused bot.start() to reject, the catch block to log and return, and polling to stop permanently. The plugin process stayed alive because MCP stdin kept it running, so outbound tools (reply, react, edit_message) continued to work while the bot was completely deaf to inbound Telegram messages.

This was confirmed via strace on a running container: the plugin process had zero TCP sockets, zero syscalls in a 10-second window, and zero network activity, despite being alive with 18 open FDs (all internal: epoll, timerfd, eventfd, /dev/urandom, MCP stdio).

Fix: catch ALL errors (not just 409) and retry with exponential backoff up to 15 seconds. Track consecutive 409s separately so the "another poller is active" give-up logic is preserved. Reset both counters on successful bot.start() via onStart callback.

Addresses the plugin-side root cause of:
- anthropics/claude-code#46744
- anthropics/claude-code#46016
- anthropics/claude-code#46356
- anthropics#870
- anthropics#1345
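The retry policy in the commit message (retry every error, back off exponentially up to 15 seconds, count consecutive 409s separately, reset on a successful start) might look roughly like the sketch below. This is not the code from server.ts. The names `onPollError`, `pollWithRetry`, `Start` and `Sleep`, the 1-second base delay, the give-up threshold of 8, and the choice to reset the 409 counter on a non-409 error are all assumptions made for illustration. `start` stands in for a wrapper around `bot.start()` that forwards an `onStart` callback.

```typescript
const BASE_DELAY_MS = 1000;    // assumed starting delay
const MAX_DELAY_MS = 15000;    // cap stated in the commit message
const MAX_CONSECUTIVE_409 = 8; // assumed give-up threshold

interface RetryState { attempt: number; consecutive409: number }

type Decision =
  | { kind: "retry"; delayMs: number; state: RetryState }
  | { kind: "give-up"; state: RetryState };

// Exponential backoff: 1s, 2s, 4s, 8s, then capped at 15s.
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt - 1), MAX_DELAY_MS);
}

// Pure decision step: every error is retried, but a run of 409s
// ("another poller is active") eventually gives up.
function onPollError(state: RetryState, errorCode: number | undefined): Decision {
  const attempt = state.attempt + 1;
  const consecutive409 = errorCode === 409 ? state.consecutive409 + 1 : 0;
  const next: RetryState = { attempt, consecutive409 };
  if (consecutive409 >= MAX_CONSECUTIVE_409) return { kind: "give-up", state: next };
  return { kind: "retry", delayMs: backoffDelayMs(attempt), state: next };
}

// Hypothetical wrappers: `start` rejects on polling errors and calls
// `onStart` once polling is established; `sleep` waits the given time.
type Start = (onStart: () => void) => Promise<void>;
type Sleep = (ms: number) => Promise<void>;

async function pollWithRetry(start: Start, sleep: Sleep): Promise<"stopped" | "gave-up"> {
  let state: RetryState = { attempt: 0, consecutive409: 0 };
  for (;;) {
    try {
      // Reset both counters once polling has actually started.
      await start(() => { state = { attempt: 0, consecutive409: 0 }; });
      return "stopped"; // start() resolved: polling ended without an error
    } catch (err) {
      const code = (err as { error_code?: number }).error_code;
      const decision = onPollError(state, code);
      state = decision.state;
      if (decision.kind === "give-up") return "gave-up";
      await sleep(decision.delayMs);
    }
  }
}
```

Keeping the decision step pure makes the backoff schedule and the 409 give-up rule easy to check without a network. A real loop would also need to recognise a deliberate shutdown and exit instead of retrying.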
Thanks for your interest! This repo only accepts contributions from Anthropic team members. If you'd like to submit a plugin to the marketplace, please submit your plugin here.
Summary
The poll-loop startup in `server.ts` only retries `409 Conflict` errors. Any other transient error (connection timeout, TLS reset, DNS hiccup) causes `bot.start()` to reject, the catch block to `return`, and polling to stop permanently. The plugin process stays alive because MCP stdin keeps it running, so outbound tools (`reply`, `react`, `edit_message`) continue to work while the bot is completely deaf to inbound Telegram messages.

This is one of the root causes behind the widespread "outbound works, inbound doesn't" reports across 65+ open issues.
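To make the failure mode concrete, here is a small simulation that replays a scripted sequence of start attempts under the 409-only policy and under a retry-everything policy. It is a simplified model, not the plugin's code. The `Outcome` type and both `replay…` functions are hypothetical, delays are omitted, and `ok: true` simply means "polling was established".

```typescript
// One scripted start attempt: success, or failure with an optional
// Telegram error_code (undefined models a network-level error).
type Outcome = { ok: true } | { ok: false; error_code?: number };

interface Result { attempts: number; polling: boolean }

// 409-only policy: a 409 is retried, any other error ends the loop.
function replay409Only(outcomes: Outcome[]): Result {
  let attempts = 0;
  for (const o of outcomes) {
    attempts++;
    if (o.ok) return { attempts, polling: true };
    if (o.error_code === 409) continue; // retried
    return { attempts, polling: false }; // catch block returns; nothing restarts polling
  }
  return { attempts, polling: false };
}

// Retry-everything policy: keep trying until an attempt succeeds.
function replayRetryAll(outcomes: Outcome[]): Result {
  let attempts = 0;
  for (const o of outcomes) {
    attempts++;
    if (o.ok) return { attempts, polling: true };
  }
  return { attempts, polling: false };
}

const script: Outcome[] = [
  { ok: false, error_code: 409 }, // another poller briefly active
  { ok: false },                  // e.g. a connection timeout
  { ok: true },                   // network is back
];

const before = replay409Only(script); // stops at the timeout
const after = replayRetryAll(script); // reaches the successful attempt
```

Under the 409-only policy the timeout on the second attempt ends the loop, so the later successful attempt is never reached. In the real plugin the process keeps running at that point, which is why the stall is silent.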
Root cause analysis
The vulnerable code path (lines 988–1035):
After `return`, the async IIFE exits. The process stays alive because the MCP server's stdin reader keeps the event loop running. Outbound tool calls work (they create fresh HTTP requests), but grammy's poll loop is dead: no `getUpdates` calls, no inbound messages.

Evidence (strace on a stalled plugin)
FD inspection during the stall:
After applying this fix, the plugin maintained 2 ESTABLISHED TCP connections to `149.154.166.110:443` for 3+ hours and recovered from transient errors automatically.

What this PR changes
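- Catches all errors (not just 409) and retries with exponential backoff up to 15 seconds
- Tracks consecutive 409s separately so the "another poller is active" give-up logic is preserved
- Resets both counters on a successful `bot.start()` via the `onStart` callback

What this PR does NOT fix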
There is a second, separate bug in Claude Code itself: after ~2–3 hours, Claude Code's internal handler for `notifications/claude/channel` events stops surfacing inbound messages, even when the plugin is alive and actively polling (confirmed via strace showing active `write` syscalls and ESTABLISHED TCP sockets). This is tracked in anthropics/claude-code#46744 and others. This PR fixes the plugin-side crash; the Claude Code notification handler degradation is a separate issue.

Related issues
Test plan
`bot.stop()` / "Aborted delay") still exits cleanly

🤖 Generated with Claude Code