Push feeder diagnostics to airplanes.live every 10 minutes #79
Merged
New airplanes-diagnostics.{sh,service,timer} collect uptime, CPU, memory, disk, services, versions, and optional Pi throttle bits, then POST to /api/feeders/diagnostics with an alv1.<uuid>.<secret> bearer in a curl --config tempfile so the token never lands in argv. REPORT_STATUS in feed.env gates the push; unrecognized values exit 64 so systemd surfaces the bad config.
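The bearer-in-a-config-file idea from this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's actual script: the function name `write_curl_config`, the `ENDPOINT` and `TOKEN` variables, and the `payload.json` file name are hypothetical. It relies on `printf` being a shell builtin (as it is in bash and dash), so the token is not passed to a child process on its command line.

```shell
#!/bin/sh
# Hypothetical helper: write a curl config file that carries the bearer header.
# curl reads the header from the file, so the secret never appears in curl's argv.
write_curl_config() {
    token=$1
    old_umask=$(umask)
    umask 077                       # mktemp already creates 0600 files; this is a safeguard
    cfg=$(mktemp) || { umask "$old_umask"; return 1; }
    umask "$old_umask"
    # Fail loudly if the write fails, so curl is never run without the header.
    if ! printf 'header = "Authorization: Bearer %s"\n' "$token" > "$cfg"; then
        rm -f "$cfg"
        return 1
    fi
    printf '%s\n' "$cfg"            # print the path for the caller
}

# Possible usage (commented out so the sketch has no network side effects):
#   cfg=$(write_curl_config "$TOKEN") || exit 1
#   trap 'rm -f "$cfg"' EXIT
#   curl --silent --config "$cfg" --data-binary @payload.json "$ENDPOINT"
```

Writing the header to a file and passing only the file path keeps the token out of process listings such as `ps`, which is the concern the commit message names.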
…_health probes Codex-review followups. The collector ships 0644 (cp scripts/*.sh preserves source mode); systemd would fail with status=203/EXEC on the first timer fire — chmod +x. Replace the jq walk() pruner with an inline _prune def so it works on jq 1.5 packagings (Debian Buster). Split build_pi_health_json so throttle and NTP probes are independent — a broken timedatectl no longer suppresses vcgencmd undervoltage/throttle bits. Fail-loud if the curl --config write fails (so curl never runs without the Authorization header). Bump TimeoutStartSec to 90s for tail-latency tolerance.
Halves the push frequency to ~6 ticks per hour. Bumps the apl-feed status freshness thresholds (ok <= 20 min, warn <= 60 min, stale otherwise) so a single missed tick stays OK after the cadence change.
Operator-facing toggle for the diagnostics push, routing through apl_feed_apply so webconfig and CLI writes share the same lock + validator + canonical bytes. Removes the "hand-edit REPORT_STATUS=" path from the feed.env template comment in favour of the CLI commands.
Codex-review followups for the 5→10 min interval bump. OnUnitActiveSec=10min plus RandomizedDelaySec=30s plus systemd's default AccuracySec=1min coalescing pushes worst-case tick spacing to ~11.5 min, so one missed tick can leave the previous successful push ~23 min old, plus up to 90s for a slow recovery. Bump the diagnostics_status_line ok bound to 1500s (25 min) so a single missed tick stays green. Add a pre-POST re-read of REPORT_STATUS so that an apl-feed diagnostics disable landing during the few seconds of probe work cancels the in-flight push instead of letting one more payload through.
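The timing arithmetic behind the 1500s bound can be checked with a short sketch. The function name `classify_age` is hypothetical (the PR's own function is diagnostics_status_line, whose body is not shown here), and the 3600s warn bound is assumed to carry over unchanged from the earlier "warn <= 60 min" threshold.

```shell
#!/bin/sh
# Assumed thresholds, in seconds since the last successful push.
OK_MAX=1500       # 25 min, the bound this commit introduces
WARN_MAX=3600     # 60 min, assumed unchanged from the earlier commit

# Hypothetical classifier: maps an age in seconds to ok / warn / stale.
classify_age() {
    age=$1
    if [ "$age" -le "$OK_MAX" ]; then
        echo ok
    elif [ "$age" -le "$WARN_MAX" ]; then
        echo warn
    else
        echo stale
    fi
}

# Worst-case tick spacing: OnUnitActiveSec + RandomizedDelaySec + AccuracySec.
worst_tick=$((600 + 30 + 60))          # 690 s = 11.5 min
# One missed tick means two worst-case intervals, plus a 90 s slow run.
one_missed=$((2 * worst_tick + 90))    # 1470 s = 24.5 min
```

Under these assumptions a single missed tick yields an age of at most 1470s, which stays below the 1500s bound, so `classify_age "$one_missed"` reports ok.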
…ilure Final codex-review fixes. StateDirectoryMode=0755 so apl-feed status run as a non-airplanes-feed-group user can stat the last-success marker (the dir contains only mtime data, no secrets). Capture jq's exit code when building the payload and exit 0 on failure rather than letting an empty body POST and 4xx in a 10-minute loop forever.
Adds an airplanes-diagnostics oneshot service + timer that wakes up every 10 minutes and POSTs a snapshot of the feeder's health to airplanes.live: uptime, CPU load and temperature, memory and disk usage, status of the airplanes-feed / airplanes-mlat / dump978-fa systemd units (including their version strings), and an optional pi_health block decoding vcgencmd get_throttled and timedatectl NTP sync. The push is anonymized: no hostname, MAC, LAN IP, SSID, or Pi serial number is collected or has a slot in the wire schema.

Default: enabled. Toggle with sudo apl-feed diagnostics enable|disable, which uses the same canonical writer path (apl_feed_apply under /run/airplanes/feed-env.lock) as the on-device webconfig, so CLI and UI writes are byte-identical on disk. The collector self-gates on REPORT_STATUS at every tick and re-checks the value right before the POST, so a disable landing mid-run is honoured without one final stale payload.

Auth uses an Authorization: Bearer alv1.<uuid>.<secret> header sent via curl --config from a 0600 tempfile so the secret never lands in argv. On unrecognized REPORT_STATUS the collector exits 64 so systemd surfaces the bad config; HTTP and transport errors are logged but exit 0 to avoid systemd backoff loops.
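The gating and exit-code policy described above might look roughly like the sketch below. The value names `enabled` and `disabled`, the treatment of an unset value as enabled, and the function names `read_report_status` and `gate_report_status` are all assumptions for illustration; the PR text does not list the accepted REPORT_STATUS values.

```shell
#!/bin/sh
# Hypothetical reader: returns the last REPORT_STATUS= value from a KEY=value
# env file (the feed.env layout is assumed, not confirmed).
read_report_status() {
    sed -n 's/^REPORT_STATUS=//p' "$1" | tail -n 1
}

# Hypothetical gate: 0 = push, 1 = skip quietly, 64 = unrecognized value.
# "enabled"/"disabled" and "unset means enabled" are assumed conventions.
gate_report_status() {
    case "${1:-enabled}" in
        enabled)  return 0 ;;
        disabled) return 1 ;;
        *)        return 64 ;;
    esac
}

# Possible use at the start of a tick and again right before the POST
# (commented out so the sketch does not exit when sourced):
#   rc=0
#   gate_report_status "$(read_report_status "$FEED_ENV")" || rc=$?
#   case "$rc" in
#       0)  ;;           # continue to probe / POST
#       1)  exit 0 ;;    # disabled: nothing to do, not an error
#       *)  exit 64 ;;   # bad config: let systemd surface it
#   esac
```

Keeping the gate as a function that returns a code, rather than exiting directly, lets the same check run both at tick start and immediately before the POST.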