Push feeder diagnostics to airplanes.live every 10 minutes #79

Merged: d4rken merged 6 commits into dev from feat/diagnostics-collector on May 14, 2026
@d4rken d4rken commented May 13, 2026

Member

Adds an airplanes-diagnostics oneshot service + timer that wakes up every 10 minutes and POSTs a snapshot of the feeder's health to airplanes.live — uptime, CPU load and temperature, memory and disk usage, status of the airplanes-feed / airplanes-mlat / dump978-fa systemd units (including their version strings), and an optional pi_health block decoding vcgencmd get_throttled and timedatectl NTP sync. The push is anonymized: no hostname, MAC, LAN IP, SSID, or Pi serial number is collected or has a slot in the wire schema.
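As an illustration of the throttle decoding mentioned above, here is a minimal sketch of turning the `vcgencmd get_throttled` bitmask into named flags. The bit positions follow the Raspberry Pi documentation. The function name `decode_throttled` and the flag names are hypothetical and are not taken from the PR's wire schema.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the PR's code): decode the bitmask printed by
# `vcgencmd get_throttled`, e.g. "throttled=0x50005", into one flag per line.
# Flag names are illustrative; the real payload field names may differ.
decode_throttled() {
    local raw="${1#throttled=}"   # accept "throttled=0x50005" or bare "0x50005"
    local value=$(( raw ))        # bash arithmetic understands the 0x prefix
    local entry bit name
    for entry in \
        0:under_voltage_now 1:freq_capped_now \
        2:throttled_now 3:soft_temp_limit_now \
        16:under_voltage_occurred 17:freq_capped_occurred \
        18:throttled_occurred 19:soft_temp_limit_occurred
    do
        bit="${entry%%:*}"
        name="${entry#*:}"
        # Emit the flag name only when its bit is set in the mask.
        if (( (value >> bit) & 1 )); then
            printf '%s\n' "$name"
        fi
    done
}
```

Calling `decode_throttled 'throttled=0x50005'` lists the two "now" and two "occurred" flags for under-voltage and throttling, while `throttled=0x0` prints nothing.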

Default: enabled. Toggle with sudo apl-feed diagnostics enable|disable — same canonical writer path (apl_feed_apply under /run/airplanes/feed-env.lock) the on-device webconfig uses, so CLI and UI writes are byte-identical on disk. The collector self-gates on REPORT_STATUS at every tick and re-checks the value right before the POST, so a disable landing mid-run is honoured without one final stale payload.
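The self-gating described above could look roughly like the sketch below. It assumes that `REPORT_STATUS` takes the literal values `enabled` and `disabled`; the PR does not spell out the accepted spellings, so treat those, along with the helper names `read_report_status`, `report_gate`, and `run_tick`, as hypothetical. How an unset value is handled is also not specified; this sketch treats it as unrecognized.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the REPORT_STATUS gate; value spellings are assumed.

# Print the last REPORT_STATUS=... value found in an env file (quotes stripped).
read_report_status() {
    local file="$1" line value=""
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            REPORT_STATUS=*) value="${line#REPORT_STATUS=}" ;;
        esac
    done < "$file"
    value="${value%\"}"
    value="${value#\"}"
    printf '%s' "$value"
}

# Return 0 = push allowed, 1 = disabled, 64 = unrecognized value.
report_gate() {
    case "$(read_report_status "$1")" in
        enabled)  return 0 ;;
        disabled) return 1 ;;
        *)        return 64 ;;
    esac
}

# One timer tick: gate at the start, then gate again right before the POST.
run_tick() {
    local env_file="$1" rc=0
    report_gate "$env_file" || rc=$?
    case "$rc" in
        0) ;;                 # enabled: carry on
        1) return 0 ;;        # disabled: nothing to do, not an error
        *) return "$rc" ;;    # bad config: surface exit 64 to systemd
    esac
    : # probe work (uptime, CPU, services, ...) would happen here
    # Re-read the file so a disable that landed during probing cancels the push.
    report_gate "$env_file" || return 0
    echo "would POST"
}
```

The second `report_gate` call is what keeps a mid-run disable from leaking one final payload.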

Auth uses an Authorization: Bearer alv1.<uuid>.<secret> header sent via curl --config from a 0600 tempfile so the secret never lands in argv. On unrecognized REPORT_STATUS the collector exits 64 so systemd surfaces the bad config; HTTP and transport errors are logged but exit 0 to avoid systemd backoff loops.
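A minimal sketch of the tempfile technique follows, assuming a helper named `write_curl_config` (hypothetical) and a placeholder `$DIAG_URL` for the endpoint. The `header = "..."` line is standard curl config-file syntax. Because `printf` is a shell builtin, the token is not passed as an argument to any external process.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the bearer header into a private curl config file
# and print the file's path. Returns non-zero if the file cannot be written,
# so the caller can refuse to run curl without the Authorization header.
write_curl_config() {
    local token="$1" cfg
    # mktemp creates the file with mode 0600; the umask is belt and braces.
    cfg="$(umask 077; mktemp)" || return 1
    if ! printf 'header = "Authorization: Bearer %s"\n' "$token" > "$cfg"; then
        rm -f "$cfg"
        return 1
    fi
    printf '%s\n' "$cfg"
}

# Usage sketch (DIAG_URL and payload.json are placeholders, not from the PR):
#   cfg="$(write_curl_config "$token")" || exit 1
#   curl --config "$cfg" --data-binary @payload.json "$DIAG_URL"
#   rm -f "$cfg"
```

Only the config file path appears on the curl command line, so the token stays out of process listings.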

d4rken added 5 commits May 13, 2026 23:24
New airplanes-diagnostics.{sh,service,timer} collect uptime, CPU, memory, disk, services, versions, and optional Pi throttle bits, then POST to /api/feeders/diagnostics with an alv1.<uuid>.<secret> bearer in a curl --config tempfile so the token never lands in argv. REPORT_STATUS in feed.env gates the push; unrecognized values exit 64 so systemd surfaces the bad config.
…_health probes

Codex-review followups. The collector ships 0644 (cp scripts/*.sh preserves source mode); systemd would fail with status=203/EXEC on the first timer fire — chmod +x. Replace the jq walk() pruner with an inline _prune def so it works on jq 1.5 packagings (Debian Buster). Split build_pi_health_json so throttle and NTP probes are independent — a broken timedatectl no longer suppresses vcgencmd undervoltage/throttle bits. Fail-loud if the curl --config write fails (so curl never runs without the Authorization header). Bump TimeoutStartSec to 90s for tail-latency tolerance.
Halves the push frequency to ~6 ticks per hour. Bumps the apl-feed status freshness thresholds (ok <= 20 min, warn <= 60 min, stale otherwise) so a single missed tick stays OK after the cadence change.
Operator-facing toggle for the diagnostics push, routing through apl_feed_apply so webconfig and CLI writes share the same lock + validator + canonical bytes. Removes the "hand-edit REPORT_STATUS=" path from the feed.env template comment in favour of the CLI commands.
Codex-review followups for the 5→10 min interval bump. The OnUnitActiveSec=10min + RandomizedDelaySec=30s + systemd's default AccuracySec=1min coalescing pushes worst-case tick spacing to ~11.5 min; one missed tick can leave the previous successful push ~23 min old, plus a 90s slow recovery. Bump the diagnostics_status_line ok bound to 1500s (25 min) so a single missed tick stays green. Add a pre-POST re-read of REPORT_STATUS so an apl-feed diagnostics disable landing during the ~seconds of probe work cancels the in-flight push instead of letting one more payload through.
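The arithmetic behind the 1500s bound can be checked with a short sketch. The variable names and the `freshness` function are hypothetical; the 3600s warn bound is assumed to be carried over unchanged from the earlier threshold commit.

```shell
#!/usr/bin/env bash
# Worked version of the timing argument above (all values in seconds).
interval=600    # OnUnitActiveSec=10min
jitter=30       # RandomizedDelaySec=30s
accuracy=60     # systemd default AccuracySec=1min
timeout=90      # TimeoutStartSec=90s (slow recovery)

worst_spacing=$(( interval + jitter + accuracy ))   # 690 s  = 11.5 min
one_missed=$(( 2 * worst_spacing ))                 # 1380 s = 23 min
worst_age=$(( one_missed + timeout ))               # 1470 s = 24.5 min

# Hypothetical classifier mirroring the status thresholds:
# ok <= 1500 s, warn <= 3600 s (assumed unchanged), stale otherwise.
freshness() {
    local age="$1"
    if (( age <= 1500 )); then
        echo ok
    elif (( age <= 3600 )); then
        echo warn
    else
        echo stale
    fi
}
```

With these numbers the worst case after one missed tick is 1470s, which sits just inside the 1500s bound, so `freshness "$worst_age"` still reports `ok`.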
@d4rken d4rken changed the title Push feeder diagnostics to airplanes.live every 5 minutes Push feeder diagnostics to airplanes.live every 10 minutes May 14, 2026
…ilure

Final codex-review fixes. Set StateDirectoryMode=0755 so that apl-feed status, when run by a user outside the airplanes-feed group, can stat the last-success marker (the directory holds only mtime data, no secrets). Capture jq's exit code when building the payload and exit 0 on failure, rather than POSTing an empty body and collecting a 4xx every 10 minutes indefinitely.
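The payload-failure handling might be sketched as below. The builder command is passed in as arguments so the example does not depend on jq being installed; `build_or_skip` and `tick` are hypothetical names, not functions from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run a payload builder (for example a jq invocation),
# capture its exit status, and skip the push when it fails.
build_or_skip() {
    # Declared on its own line: `local payload="$(cmd)"` would report the
    # exit status of `local`, hiding a failure of the builder.
    local payload
    if ! payload="$("$@")"; then
        echo "payload build failed, skipping this tick" >&2
        return 1
    fi
    printf '%s' "$payload"
}

# One tick: a failed build ends the run with status 0, so the timer simply
# tries again next time and no empty body is ever sent.
tick() {
    local body
    body="$(build_or_skip "$@")" || return 0
    echo "would POST ${#body} bytes"
}
```

For example, `tick false` returns 0 without printing to stdout, while `tick printf '{"a":1}'` reports a 7-byte body.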
@d4rken d4rken merged commit 46b5175 into dev May 14, 2026
12 checks passed
@d4rken d4rken deleted the feat/diagnostics-collector branch May 14, 2026 05:23