Skip to content

feat(addons): self-refresh inside the container, drop sync workflow#211

Merged
rubenhensen merged 3 commits into
mainfrom
feat/addons-container-refresh
May 11, 2026
Merged

feat(addons): self-refresh inside the container, drop sync workflow#211
rubenhensen merged 3 commits into
mainfrom
feat/addons-container-refresh

Conversation

@rubenhensen

@rubenhensen rubenhensen commented May 11, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Bundles scripts/sync-addons.mjs + a Node binary (copied from node:24-slim) into the nginx runtime image, plus a /docker-entrypoint.d/50-sync-addons.sh shim that backgrounds a refresh loop (default 6h, SYNC_ADDONS_INTERVAL to tune, SYNC_ADDONS_DISABLE to skip entirely).
  • The script grows a DOWNLOADS_DIR env override so it can target the live nginx htdocs dir (/usr/share/nginx/html/postguard/downloads) at runtime, and uses tmp-file + rename(2) so concurrent download requests never observe a partial file.
  • Removes .github/workflows/sync-addons.yml and the chore: sync PostGuard addons … PRs it produced — addon releases now reach prod by way of the running container picking them up on its next loop iteration, no human merge required.
  • Drops the GITHUB_TOKEN plumbing from the fetch path. At 6h intervals from a dedicated egress IP we're at 2 listing calls / 6h — 0.5% of GitHub's 60/h unauthenticated cap. Asset downloads come from browser_download_url and aren't API-rate-limited at all.
  • Also folds in the re-tag fix (8664f87): when upstream re-tags byte-identical content (v0.1.3 → v0.1.5 etc.), the metadata json is refreshed in place without a redundant re-download.

Why

  • The PR-gated cron flow stalled in two ways: cron only fired daily, and the resulting chore/sync-addons PR needed a human to merge it. Today's symptom: outlook addon v0.3.0 published 2026-05-11 12:13 UTC, daily cron had already run at 09:56 UTC, no PR opened, staging stayed on v0.1.3.
  • The committed static/downloads/* binaries stay as the build-time / cold-boot fallback, so the image is always functional even with SYNC_ADDONS_DISABLE=1 or network failures.

Test plan

  • docker build -f docker/Dockerfile -t pg-website:test . succeeds.
  • Run with -e SYNC_ADDONS_INTERVAL=60; docker logs shows the loop iterating and reports [outlook] updating: v0.1.5 -> v0.3.0 on first tick. (Don't drop below ~60s on a shared NAT — at 2 API calls per iteration the 60/h unauthenticated cap is reachable.)
  • curl -I http://localhost/downloads/postguard-outlook-manifest.xml returns 200 with the v0.3.0 size (10457).
  • Run with -e SYNC_ADDONS_DISABLE=1; logs show disabled via SYNC_ADDONS_DISABLE; nginx still serves the baked-in fallback.
  • After merge to main: confirm GitLab CI deploys, then verify staging /downloads/postguard-outlook-manifest.xml updates to the new sha within SYNC_ADDONS_INTERVAL.

Notes

  • The metadata bump in static/downloads/postguard-outlook-manifest.json (v0.1.3 → v0.1.5) is what the cherry-picked re-tag commit left behind; the loop will overwrite it with v0.3.0 on first container start, so it's only the cold-boot fallback that's slightly stale.
  • Image size grows by ~50 MB for the Node binary. We pull it from node:24-slim (same debian:bookworm base as nginx:1.27.4) so the runtime libs match without an extra apt install.
  • Supersedes ci(sync-addons): run hourly instead of daily #210.

…ntent

Previously the byte-skip short-circuit returned without touching
postguard-outlook-manifest.json, so when the outlook addon re-tagged
v0.1.3 → v0.1.4 → v0.1.5 with byte-identical manifest.xml content the
served file was correct but the metadata stayed pinned to v0.1.3 —
tag, releaseUrl and publishedAt all rotted.

When the cached sha matches the upstream sha but the cached tag does
not, rewrite the metadata using the new release info without
re-downloading. The served xml stays as-is.
The image now bundles node + scripts/sync-addons.mjs and a small
/docker-entrypoint.d/50-sync-addons.sh shim that backgrounds a refresh
loop (default 6h, override with SYNC_ADDONS_INTERVAL, disable with
SYNC_ADDONS_DISABLE). The loop writes via temp-file + rename so live
download requests never see a partial file. The committed binaries in
static/downloads/ stay as the build-time / cold-boot fallback.

Removes the Sync addons workflow + the chore PRs it produced — addon
releases now reach prod by way of the running container picking them up
on its next loop iteration.

The script also gained a DOWNLOADS_DIR env override so it can target the
nginx htdocs dir at runtime instead of the in-repo static/ tree.

Verified by building the image and running it with SYNC_ADDONS_INTERVAL=3:
loop output appears in docker logs, nginx serves
/downloads/postguard-outlook-manifest.xml with 200.
@rubenhensen

Copy link
Copy Markdown
Contributor Author

/dobby review this pr

@dobby-coder

dobby-coder Bot commented May 11, 2026

Copy link
Copy Markdown
Contributor

Dobby is on it! Dobby will read the PR and leave a careful review for Master Ruben. Dobby promises to be thorough!

@dobby-coder

dobby-coder Bot commented May 11, 2026

Copy link
Copy Markdown
Contributor

Reviewed this end-to-end. Overall the approach is solid — atomic rename, sha256 verification against the API digest, lookback window for assets-not-yet-uploaded races, and the runtime DOWNLOADS_DIR override all look right. A handful of things worth a second look:

Operational

  1. Unauthenticated GitHub API rate limit (60/h per egress IP). The script only adds Authorization when GITHUB_TOKEN is set, and the entrypoint shim doesn't surface a way to pass one. At the 6h default this is comfortably under the limit, but:

    • The test-plan step -e SYNC_ADDONS_INTERVAL=3 will exhaust 60/h in ~3 min (2 calls/iteration × ~1200 iter/h), so the smoke test can fail spuriously on a shared NAT.
    • If we ever lower the production interval, or if staging+prod share an egress, it gets tight. Worth either documenting "set GITHUB_TOKEN for short intervals" or plumbing a SYNC_ADDONS_GITHUB_TOKEN env through the entrypoint.
  2. No guard on SYNC_ADDONS_INTERVAL. A typo like SYNC_ADDONS_INTERVAL=0 (or non-numeric) will tight-loop or break sleep. A minimum clamp (e.g. [ "$INTERVAL" -lt 10 ] && INTERVAL=10) would be cheap insurance.

  3. Backgrounded loop has no signal handling. The subshell is ( … ) & with no trap and will keep running through container SIGTERM until SIGKILL arrives. Not a correctness issue (the container is going down anyway, and writeAtomic keeps the on-disk state consistent), but a trap 'exit 0' TERM INT in the inner shell would make shutdown cleaner and easier to read in logs.

Script

  1. mkdir only on the changed-content path. syncTarget calls mkdir(downloadsDir, { recursive: true }) inside the "download new asset" branch, but the re-tag branch writes metaPath without it. The shim already guards [ ! -d "${DOWNLOADS_DIR}" ], so in practice this is fine — but if anyone reuses syncTarget standalone (or if the dir is removed mid-run) the re-tag path will throw. Moving the mkdir up to the top of syncTarget would make the function self-sufficient.

  2. asset.digest is relatively new on the GitHub API. It rolled out in 2024 and is now stable on release assets, but worth knowing: if an older release in the lookback window predates digest population, parseSha256 returns null and the whole target fails — even though we could still fall back to hashing the downloaded body. Not urgent; today's releases all have it.

Docker

  1. Node binary copy assumes ABI compatibility with nginx:1.27.4. Both base on debian:bookworm-slim today, so this works. The PR comment in the Dockerfile already calls this out, which is good — but it means a future bump of NGINX_VERSION (to a non-bookworm tag) silently breaks node at runtime with a GLIBC_x.yz not found. Worth a small RUN /usr/local/bin/node --version line in the runtime stage so the image build fails loudly when the ABI drifts.

  2. +~50 MB image size is acknowledged; no objection. (The brotli build stage already dominates, so this is in the noise.)

Visibility regression (worth thinking about, not blocking)

Removing .github/workflows/sync-addons.yml also removes the PR-as-audit-trail. Today, "what addon version is staging serving?" was answerable from git log static/downloads/. After this PR, the committed JSON is just a cold-boot fallback and the live state lives only in the container filesystem + container logs. If we want to keep an audit trail we could either:

  • have the loop append a one-line summary to a /usr/share/nginx/html/postguard/downloads/.sync-state.json that's served (gated) for ops, or
  • keep a stripped-down weekly workflow that just opens a tracking issue when staging diverges from main's committed metadata.

Not a blocker — flagging because the PR description leans on "addon releases reach prod by way of the running container" which is exactly the visibility tradeoff.

Test plan

The checklist in the PR description covers the happy path well. Two suggested additions:

  • An iteration where the upstream release's asset is missing (e.g. drafted but not uploaded) — the lookback-and-warn behavior is the most complex part of the script and isn't currently in the test plan.
  • A SIGTERM to the container while the sync is mid-download, to confirm we don't leave .tmp files behind. (We probably do, but they're harmless on next boot.)

Nits

  • findReleaseWithAsset warns when falling back, but only once per iteration. Repeated warns every 6h for the same lagging release will spam logs — if logs are scraped this might matter. Low priority.
  • The shebang on sync-addons.mjs (#!/usr/bin/env node) is unused since it's always invoked as node …. Harmless.

Nothing here is a hard block. (4), (6) and the rate-limit doc note in (1) feel worth landing alongside; the rest are judgment calls.

The auth branch was a holdover from when this ran in GitHub Actions,
where the runner's shared egress IP made the 60/h unauthenticated cap
real. The container has a dedicated egress and at the 6h default makes
2 listing calls / 6h — 0.5% of the unauth quota. Asset downloads come
from browser_download_url and aren't API-rate-limited.

Removes the conditional so we're not tempted to plumb a token through
the entrypoint solving a non-problem.
@rubenhensen

Copy link
Copy Markdown
Contributor Author

Thanks for the thorough pass /dobby. On point (1) specifically:

The token plumbing was a holdover from when this ran in GitHub Actions, where the runner's shared egress IP made the 60/h unauthenticated cap real. The container ships with a dedicated egress IP, so the math changes: 2 listing calls per 6h iteration = ~0.33 calls/h, or 0.5% of the unauthenticated quota. The asset itself comes from browser_download_url (a CDN path), which isn't API-rate-limited.

So rather than plumb SYNC_ADDONS_GITHUB_TOKEN through the entrypoint to solve a non-problem, I've dropped the Authorization conditional from fetchJson entirely (35329f8). Your test-plan concern (SYNC_ADDONS_INTERVAL=3 would burn 60/h in 90s) was a real but test-config artifact — I've bumped the suggested smoke interval to 60s in the PR description and added a "don't drop below ~60s on a shared NAT" note.

Will look at the other points (4, 6, signal-handling, visibility regression) separately.

@dobby-coder

dobby-coder Bot commented May 11, 2026

Copy link
Copy Markdown
Contributor

Dobby sees the update, master Ruben! Dobby is most pleased to learn the runtime is no longer behind a shared-NAT egress — the math checks out and dropping the Authorization conditional is the right call. The bot will go verify 35329f8 now and await your follow-ups on the other points.

@dobby-coder

dobby-coder Bot commented May 11, 2026

Copy link
Copy Markdown
Contributor

Verified 35329f8 — the conditional is gone, the inline comment captures the 6h-interval math, and the commit message preserves the why for git archaeology. Point (1) is resolved on this end; the bot will look out for your follow-ups on (4), (6), signal-handling, and the visibility regression.

@rubenhensen rubenhensen merged commit 3dbf9b7 into main May 11, 2026
13 of 14 checks passed
@rubenhensen rubenhensen deleted the feat/addons-container-refresh branch May 11, 2026 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant