fix(telegram): retry polling on all errors, not just 409 Conflict by 3koozy · Pull Request #1377 · anthropics/claude-plugins-official

3koozy · 2026-04-12T21:11:37Z

Summary

The poll-loop startup in server.ts only retries 409 Conflict errors. Any other transient error (connection timeout, TLS reset, DNS hiccup) causes bot.start() to reject, the catch block to return, and polling to stop permanently. The plugin process stays alive because MCP stdin keeps it running — so outbound tools (reply, react, edit_message) continue to work while the bot is completely deaf to inbound Telegram messages.

This is one of the root causes behind the widespread "outbound works, inbound doesn't" reports across 65+ open issues.

Root cause analysis

The vulnerable code path (lines 988–1035):

} catch (err) {
  // ...
  if (err instanceof GrammyError && err.error_code === 409) {
    // retry with backoff
    continue
  }
  // ...
  process.stderr.write(`telegram channel: polling failed: ${err}\n`)
  return  // ← ANY non-409 error exits the loop permanently
}

After return, the async IIFE exits. The process stays alive because the MCP server's stdin reader keeps the event loop running. Outbound tool calls work (they create fresh HTTP requests), but grammy's poll loop is dead — no getUpdates calls, no inbound messages.
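The control flow described above can be modelled in a few lines. This is a simplified sketch, not the code from server.ts: `isConflict` and `attemptsBeforeExit` are hypothetical names, errors are plain objects rather than grammy's GrammyError, and both the backoff delay and the 409 give-up limit are left out. It only shows that the first non-409 error ends the loop, no matter what would have come after it.

```typescript
// Hypothetical helper: treats any object with error_code === 409 as a Conflict.
// The real code checks `err instanceof GrammyError` instead.
function isConflict(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { error_code?: unknown }).error_code === 409
  )
}

// Model of the pre-fix control flow: given the errors that successive
// bot.start() calls would reject with, return how many attempts are made
// before the loop exits for good.
function attemptsBeforeExit(errors: unknown[]): number {
  let attempts = 0
  for (const err of errors) {
    attempts++
    if (isConflict(err)) continue // 409: retried (backoff omitted here)
    return attempts // any other error: the loop ends permanently
  }
  return attempts
}

const conflict = { error_code: 409 }
const timeout = new Error("ETIMEDOUT")

console.log(attemptsBeforeExit([conflict, conflict, timeout, conflict])) // 3
console.log(attemptsBeforeExit([timeout, timeout, timeout])) // 1
```

In the second call a single timeout is enough to stop polling, which matches the "outbound works, inbound doesn't" symptom.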

Evidence (strace on a stalled plugin)

$ timeout 10 strace -p <plugin_pid> -e trace=network,write,read -f
strace: Process <pid> attached
strace: Process <pid> detached
(zero output — zero syscalls in 10 seconds)

FD inspection during the stall:

$ for fd in /proc/<pid>/fd/*; do echo "$(basename $fd) -> $(readlink $fd)"; done
0 -> socket:[...]    # MCP stdin (alive)
1 -> socket:[...]    # MCP stdout (alive)
2 -> socket:[...]    # stderr
3 -> /dev/urandom
4 -> anon_inode:[eventpoll]
# ... all internal FDs, ZERO TCP sockets to api.telegram.org

After applying this fix, the plugin maintained 2 ESTABLISHED TCP connections to 149.154.166.110:443 for 3+ hours and recovered from transient errors automatically.

What this PR changes

Catch ALL errors in the poll-loop, not just 409 — retry with exponential backoff (up to 15s)
Track consecutive 409s separately so the "another poller is active" give-up logic (8 attempts) is preserved
Reset counters on successful bot.start() via the onStart callback
Log all retry errors with the error message so operators can see what's happening
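The four changes above could be sketched roughly as follows. This is an illustration under stated assumptions, not the diff from this PR: `decide`, `pollWithRetry`, `is409`, the injected `start` and `sleep` functions, and the 1-second base delay are all hypothetical. Only the 15-second cap and the 8-attempt 409 limit come from the description. Resetting the 409 counter when a non-409 error arrives is also an assumption about what "consecutive" means here.

```typescript
type RetryState = { attempt: number; consecutive409: number }
type Decision = { kind: "retry"; delayMs: number } | { kind: "giveUp" }

const MAX_DELAY_MS = 15000 // backoff cap, from the PR description
const MAX_CONSECUTIVE_409 = 8 // give-up limit, from the PR description
const BASE_DELAY_MS = 1000 // assumed; the PR does not state the base delay

// Hypothetical check; the real code uses `err instanceof GrammyError`.
function is409(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { error_code?: unknown }).error_code === 409
  )
}

// Pure retry policy: updates the counters and says what to do next.
function decide(err: unknown, state: RetryState): Decision {
  state.attempt++
  if (is409(err)) {
    state.consecutive409++
    if (state.consecutive409 >= MAX_CONSECUTIVE_409) return { kind: "giveUp" }
  } else {
    state.consecutive409 = 0 // assumption: a non-409 error breaks the streak
  }
  const delayMs = Math.min(
    BASE_DELAY_MS * Math.pow(2, state.attempt - 1),
    MAX_DELAY_MS,
  )
  return { kind: "retry", delayMs }
}

// Loop sketch. `start` stands in for a wrapper around bot.start() that
// invokes `onStart` once polling is established and resolves when the
// bot is stopped.
async function pollWithRetry(
  start: (onStart: () => void) => Promise<void>,
  sleep: (ms: number) => Promise<void>,
): Promise<void> {
  const state: RetryState = { attempt: 0, consecutive409: 0 }
  for (;;) {
    try {
      await start(() => {
        // Successful start: reset both counters.
        state.attempt = 0
        state.consecutive409 = 0
      })
      return // start() resolved: polling was stopped cleanly
    } catch (err) {
      const d = decide(err, state)
      if (d.kind === "giveUp") {
        console.error(`telegram channel: giving up after ${state.consecutive409} consecutive 409s`)
        return
      }
      console.error(`telegram channel: polling failed (${err}); retrying in ${d.delayMs} ms`)
      await sleep(d.delayMs)
    }
  }
}
```

Keeping the policy in a pure function makes the backoff schedule and the 409 limit easy to check without a network or timers; the loop itself only wires the policy to `start` and `sleep`.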

What this PR does NOT fix

There is a second, separate bug in Claude Code itself: after ~2–3 hours, Claude Code's internal handler for notifications/claude/channel events stops surfacing inbound messages, even when the plugin is alive and actively polling (confirmed via strace showing active write syscalls and ESTABLISHED TCP sockets). This is tracked in anthropics/claude-code#46744 and others. This PR fixes the plugin-side crash; the Claude Code notification handler degradation is a separate issue.

Related issues

claude-code#46744 - Telegram plugin: inbound messages not delivered to conversation (outbound works)
claude-code#46016 - Telegram plugin: inbound messages not delivered to session (outbound works)
claude-code#46356 - Telegram channel plugin: notifications not injected into conversation
#870 - Inbound Telegram DMs silently dropped
#1345 - telegram@0.0.4: inbound messages consumed by server but never rendered as <channel> tag in Claude Code session

Test plan

Verified in Docker container (Debian bookworm, Bun 1.x) for 3+ hours
Confirmed via strace that plugin maintains TCP connections after fix
409 Conflict handling still works (consecutive409 counter, 8-attempt limit)
Graceful shutdown (bot.stop() / "Aborted delay") still exits cleanly
Needs testing on macOS and Windows native

🤖 Generated with Claude Code

The poll-loop startup in server.ts only retried 409 Conflict errors. Any other transient error (connection timeout, TLS reset, DNS hiccup) caused bot.start() to reject, the catch block to log and return, and polling to stop permanently. The plugin process stayed alive because MCP stdin kept it running, so outbound tools (reply, react, edit_message) continued to work while the bot was completely deaf to inbound Telegram messages.

This was confirmed via strace on a running container: the plugin process had zero TCP sockets, zero syscalls in a 10-second window, and zero network activity, despite being alive with 18 open FDs (all internal: epoll, timerfd, eventfd, /dev/urandom, MCP stdio).

Fix: catch ALL errors (not just 409) and retry with exponential backoff up to 15 seconds. Track consecutive 409s separately so the "another poller is active" give-up logic is preserved. Reset both counters on successful bot.start() via onStart callback.

Addresses the plugin-side root cause of:
- anthropics/claude-code#46744
- anthropics/claude-code#46016
- anthropics/claude-code#46356
- anthropics#870
- anthropics#1345

github-actions · 2026-04-12T21:11:45Z

Thanks for your interest! This repo only accepts contributions from Anthropic team members. If you'd like to submit a plugin to the marketplace, please submit your plugin here.

github-actions bot closed this Apr 12, 2026

This was referenced Apr 12, 2026

Channel notification handler stops surfacing inbound messages after ~2-3h (Docker/Linux) anthropics/claude-code#47112 (Open)

fix(telegram): poll loop exits permanently on non-409 errors — root cause of 'inbound stops' reports #1378 (Open)

3koozy wants to merge 1 commit into anthropics:main from 3koozy:fix/telegram-poll-loop-retry
