
Improve Heartbeat Reliability #33

Merged
yiwang merged 7 commits into localgpt-app:main from jcorbin:heartbeat_wen on Feb 17, 2026


@jcorbin (Contributor) commented Feb 17, 2026

So my heartbeats stopped working somewhere between v0.1.3 and the currently unreleased v0.3.0 on main.

I've been trying to diagnose and fix it, mostly by increasing the observability, consistency, and persistence of heartbeat interval timing.

I finally got to a place where I can at least see the problem, so I decided to share my WIP improvements for discussion.

What looks to be happening is some kind of hang (deadlock? livelock? unclear as yet...) involving the TurnGate and the heartbeat runner; see the logs below.
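For context on the TurnGate: in the logs below, the heartbeat *skips* ("TurnGate busy") rather than waiting when an agent turn is in flight, which suggests a non-blocking try-acquire. The following is a minimal std-only sketch of such a gate. TurnGate's real implementation is not part of this PR, so `TurnGate`, `try_acquire`, and `TurnGuard` here are hypothetical. The sketch also shows one way a gate can stay busy indefinitely: a guard that is never dropped.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a turn gate: at most one agent turn
/// (chat message or heartbeat) may hold it at a time.
#[derive(Clone, Default)]
pub struct TurnGate {
    busy: Arc<AtomicBool>,
}

/// RAII guard: the gate is released when this value is dropped.
pub struct TurnGuard {
    busy: Arc<AtomicBool>,
}

impl TurnGate {
    /// Non-blocking acquire: returns None instead of waiting if a turn is in flight.
    pub fn try_acquire(&self) -> Option<TurnGuard> {
        self.busy
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .ok()
            .map(|_| TurnGuard { busy: Arc::clone(&self.busy) })
    }
}

impl Drop for TurnGuard {
    fn drop(&mut self) {
        // Releasing the guard marks the gate free again.
        self.busy.store(false, Ordering::Release);
    }
}

fn main() {
    let gate = TurnGate::default();

    // A chat turn holds the gate...
    let chat_turn = gate.try_acquire().expect("gate starts free");
    // ...so a heartbeat tick sees it busy and skips.
    assert!(gate.try_acquire().is_none());

    // Once the chat turn's guard is dropped, a heartbeat can acquire it.
    drop(chat_turn);
    assert!(gate.try_acquire().is_some()); // temporary guard is dropped right away

    // A leaked guard (e.g. a turn that never completes) keeps the gate busy forever.
    std::mem::forget(gate.try_acquire());
    assert!(gate.try_acquire().is_none());
    println!("ok");
}
```

With a non-blocking gate like this, the heartbeat itself cannot deadlock on the gate; a stuck gate instead shows up as an endless run of "skipping" ticks.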

The commits in this branch are reasonably clear, but to recap my changes:

  • elevated all heartbeat runner logs from debug to info, since running the heartbeat reliably is basically "the one main job" of the server
  • switched to a tokio interval rather than a raw sleep; this keeps the timing period consistent, unperturbed by however long the heartbeat run itself takes
  • persisted the last heartbeat event to a JSON state file and...
  • ...used that state to resume heartbeats on time after a restart, rather than waiting a full heartbeat interval for the first overdue tick
  • differentiated transient skips (ones that might readily go away on a retry) and used them to temporarily shorten the heartbeat interval, with backoff
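The interval and resume-on-restart changes above come down to scheduling arithmetic, sketched here with std only. This is an illustration, not the PR's code. It assumes the persisted state reduces to a Unix timestamp of the last heartbeat (the actual JSON layout isn't shown here), the function names are hypothetical, and the 60s floor is an assumption that merely matches the `first tick scheduled after: 60s` log line. `tokio::time::interval` schedules each tick relative to the previous deadline rather than to when the last run finished; that fixed-rate behavior is what `fixed_rate_starts` models.

```rust
use std::time::Duration;

/// Hypothetical floor on the first delay (assumed; mirrors the logged 60s).
const MIN_FIRST_DELAY: Duration = Duration::from_secs(60);

/// Delay before the first tick after a restart. `last_run_secs` is the last
/// heartbeat time (Unix seconds) read back from persisted state, or None
/// if no state exists yet.
fn first_tick_delay(last_run_secs: Option<u64>, now_secs: u64, interval: Duration) -> Duration {
    let remaining = match last_run_secs {
        // No history: treat the heartbeat as already due.
        None => Duration::ZERO,
        Some(last) => {
            let due = last.saturating_add(interval.as_secs());
            // If overdue (due <= now) this saturates to zero, instead of
            // waiting a whole fresh interval.
            Duration::from_secs(due.saturating_sub(now_secs))
        }
    };
    remaining.max(MIN_FIRST_DELAY)
}

/// Fixed-delay scheduling (sleep after each run): every start is pushed back
/// by the previous run's duration. Returns start offsets from t = 0.
fn fixed_delay_starts(runs: &[Duration], period: Duration) -> Vec<Duration> {
    let mut next = Duration::ZERO;
    runs.iter()
        .map(|run| {
            let start = next;
            next = start + *run + period;
            start
        })
        .collect()
}

/// Fixed-rate scheduling (interval-style): deadlines stay on a fixed grid.
/// An overrunning run delays the next start but not later deadlines,
/// roughly like tokio's default MissedTickBehavior::Burst.
fn fixed_rate_starts(runs: &[Duration], period: Duration) -> Vec<Duration> {
    let mut deadline = Duration::ZERO;
    let mut free_at = Duration::ZERO;
    runs.iter()
        .map(|run| {
            let start = deadline.max(free_at);
            free_at = start + *run;
            deadline += period;
            start
        })
        .collect()
}

fn main() {
    let interval = Duration::from_secs(1800);
    // Restart 10 minutes after the last heartbeat: wait out the remaining 20 minutes.
    assert_eq!(first_tick_delay(Some(1_000), 1_600, interval), Duration::from_secs(1200));
    // Restart long after the last heartbeat: fire soon, not a full interval later.
    assert_eq!(first_tick_delay(Some(1_000), 9_000, interval), MIN_FIRST_DELAY);

    let period = Duration::from_secs(10);
    let runs = [Duration::from_secs(3); 3];
    println!("fixed delay: {:?}", fixed_delay_starts(&runs, period));
    println!("fixed rate:  {:?}", fixed_rate_starts(&runs, period));
}
```

With 3s runs and a 10s period, the sleep-based schedule drifts to starts at 0s, 13s, 26s, while the interval-style schedule stays at 0s, 10s, 20s.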

Right so, now for the current diagnostics:

Feb 17 14:38:02 doral localgpt[524162]: 2026-02-17T19:38:02.222331Z  INFO localgpt_core::memory: Using OpenAI embedding provider: nomic-embed-text:v1.5
Feb 17 14:38:02 doral localgpt[524162]: 2026-02-17T19:38:02.222335Z  INFO localgpt_core::memory: Using OpenAI embedding provider: nomic-embed-text:v1.5
Feb 17 14:38:02 doral localgpt[524162]: 2026-02-17T19:38:02.222366Z  INFO localgpt_core::heartbeat::runner: starting runner with interval: 1800s
Feb 17 14:38:02 doral localgpt[524162]: 2026-02-17T19:38:02.222411Z  INFO localgpt_core::heartbeat::runner: first tick scheduled after: 60s at: Instant { tv_sec: 83333, tv_nsec: 349302103 }
Feb 17 14:38:02 doral localgpt[524162]: 2026-02-17T19:38:02.222550Z  INFO localgpt_server::http: Starting HTTP server on http://127.0.0.1:31327
Feb 17 14:38:02 doral localgpt[524162]: 2026-02-17T19:38:02.233782Z  INFO localgpt_core::memory: Using OpenAI embedding provider: nomic-embed-text:v1.5
Feb 17 14:38:02 doral localgpt[524162]: 2026-02-17T19:38:02.233830Z  INFO localgpt_server::telegram: Telegram bot: paired with user unknown (ID: 8410244203)
Feb 17 14:38:02 doral localgpt[524162]: 2026-02-17T19:38:02.702953Z  INFO localgpt_server::telegram: Starting Telegram bot...
Feb 17 14:38:03 doral localgpt[524162]: 2026-02-17T19:38:03.451915Z  INFO localgpt_core::agent: Created new session: a22f704e-1429-4310-95d1-0435e8fef3e2
Feb 17 14:38:19 doral localgpt[524162]: 2026-02-17T19:38:19.991746Z  INFO teloxide::update_listeners::polling: retrying getting updates in 1s
Feb 17 14:38:19 doral localgpt[524162]: 2026-02-17T19:38:19.991788Z ERROR teloxide::error_handlers: An error from the update listener: Network(reqwest::Error { kind: Request, url: "https://api.telegram.org/token:redacted/GetUpdates", source: TimedOut })
Feb 17 14:39:02 doral localgpt[524162]: 2026-02-17T19:39:02.223416Z  INFO localgpt_core::heartbeat::runner: tick starting at: Instant { tv_sec: 83333, tv_nsec: 350296439 }
Feb 17 14:39:02 doral localgpt[524162]: 2026-02-17T19:39:02.223450Z  INFO localgpt_core::heartbeat::runner: skipping: agent turn in flight (TurnGate busy)
Feb 17 14:39:02 doral localgpt[524162]: 2026-02-17T19:39:02.223456Z  INFO localgpt_core::heartbeat::runner: tick done elapsed: 49.984µs
Feb 17 14:39:02 doral localgpt[524162]: 2026-02-17T19:39:02.223467Z  INFO localgpt_core::heartbeat::runner: transient skip, retry quickly after: 2s
Feb 17 14:39:02 doral localgpt[524162]: 2026-02-17T19:39:02.223655Z  INFO localgpt_core::heartbeat::runner: waiting for next tick
Feb 17 14:39:04 doral localgpt[524162]: 2026-02-17T19:39:04.225230Z  INFO localgpt_core::heartbeat::runner: tick starting at: Instant { tv_sec: 83335, tv_nsec: 352116844 }
Feb 17 14:39:04 doral localgpt[524162]: 2026-02-17T19:39:04.225251Z  INFO localgpt_core::heartbeat::runner: skipping: agent turn in flight (TurnGate busy)
Feb 17 14:39:04 doral localgpt[524162]: 2026-02-17T19:39:04.225259Z  INFO localgpt_core::heartbeat::runner: tick done elapsed: 32.671µs
Feb 17 14:39:04 doral localgpt[524162]: 2026-02-17T19:39:04.225268Z  INFO localgpt_core::heartbeat::runner: transient skip, retry quickly after: 4s
Feb 17 14:39:04 doral localgpt[524162]: 2026-02-17T19:39:04.225400Z  INFO localgpt_core::heartbeat::runner: waiting for next tick
Feb 17 14:39:08 doral localgpt[524162]: 2026-02-17T19:39:08.226698Z  INFO localgpt_core::heartbeat::runner: tick starting at: Instant { tv_sec: 83339, tv_nsec: 353584154 }
Feb 17 14:39:08 doral localgpt[524162]: 2026-02-17T19:39:08.226724Z  INFO localgpt_core::heartbeat::runner: skipping: agent turn in flight (TurnGate busy)
Feb 17 14:39:08 doral localgpt[524162]: 2026-02-17T19:39:08.226729Z  INFO localgpt_core::heartbeat::runner: tick done elapsed: 36.248µs
Feb 17 14:39:08 doral localgpt[524162]: 2026-02-17T19:39:08.226736Z  INFO localgpt_core::heartbeat::runner: transient skip, retry quickly after: 8s
Feb 17 14:39:08 doral localgpt[524162]: 2026-02-17T19:39:08.226887Z  INFO localgpt_core::heartbeat::runner: waiting for next tick

And there the logs stop: we do not, in fact, see another heartbeat runner wakeup after the promised 8s quickened interval.
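The 2s, 4s, 8s retry delays in the log fit a doubling backoff. Here is a minimal std-only sketch of such a schedule. It assumes the 2s base read off the log, a cap at the regular 1800s interval, and a return to that interval once skips stop; all three are assumptions, and `retry_delay` is a hypothetical name, not the runner's API. Every tick in the log prints `tick starting` before it consults the TurnGate, so once the 8s wait elapses a `tick starting` line would be expected whether or not the gate is still busy.

```rust
use std::time::Duration;

/// Delay before the next tick after `consecutive_skips` transient skips in a
/// row: base, 2*base, 4*base, ..., capped at the regular heartbeat interval.
/// Zero skips means "back to the regular interval".
fn retry_delay(consecutive_skips: u32, base: Duration, interval: Duration) -> Duration {
    if consecutive_skips == 0 {
        return interval;
    }
    // 2^(n-1); shifts of 32 or more would overflow u32, so saturate instead.
    let factor = 1u32.checked_shl(consecutive_skips - 1).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(interval, |d| d.min(interval))
}

fn main() {
    let base = Duration::from_secs(2);
    let interval = Duration::from_secs(1800);
    let delays: Vec<Duration> = (1..=4).map(|n| retry_delay(n, base, interval)).collect();
    // The first three correspond to the 2s, 4s, 8s in the log; a fourth
    // consecutive skip would back off to 16s.
    println!("{:?}", delays);
    // Long runs of skips never wait longer than the regular interval.
    assert_eq!(retry_delay(20, base, interval), interval);
}
```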

Will update as I find more, but wanted to share earlier than "fully fixed it" :-)

- everything info: operating the heartbeat is basically the main job
  of the daemon; let's not silence it away in debug logs
- clarify log sites with name tagging
- add an additional "actually doing the thing" log to help puzzled
  operators who are left wondering
@yiwang marked this pull request as ready for review on February 17, 2026 at 21:43
@yiwang merged commit aa78201 into localgpt-app:main on Feb 17, 2026
3 of 5 checks passed
@jcorbin deleted the heartbeat_wen branch on February 18, 2026 at 18:41
2 participants