Commit a140a2e

fix(infra): correct sparql-proxy and valkey healthchecks
- sparql-proxy: add a public /healthz route to the Caddyfile and point the Docker healthcheck at it. The SPARQL surface (/, /query, /update, /store) is auth-gated, so the old healthcheck on `/` got a 401 and reported the healthy proxy as unhealthy. /healthz exposes no data, only liveness.
- valkey: require SORTINGHAT_REDIS_PASSWORD (matching SortingHat's auth) and pass it in the healthcheck. Without the password requirement, the worker's AUTH was rejected, it crash-looped, and Mordred's identities/enrichment phase stalled.
1 parent 3ed50db commit a140a2e

3 files changed

Lines changed: 20 additions & 2 deletions

infra/open-pulse-stack/docker-compose.grimoirelab.yml

Lines changed: 8 additions & 1 deletion
@@ -18,10 +18,17 @@ services:
   valkey:
     image: valkey/valkey:8
     restart: unless-stopped
+    # SortingHat (server + worker) authenticate with SORTINGHAT_REDIS_PASSWORD,
+    # so Valkey must require that same password — otherwise the worker's AUTH is
+    # rejected ("AUTH called without any password configured"), it crash-loops,
+    # and Mordred's identities/enrichment phase stalls.
+    command: ["valkey-server", "--requirepass", "${SORTINGHAT_REDIS_PASSWORD:-replace-me}"]
     expose:
       - "6379"
     healthcheck:
-      test: ["CMD", "valkey-cli", "--raw", "incr", "ping"]
+      # Must authenticate now that requirepass is set, else the healthcheck fails
+      # and sortinghat_worker's `depends_on: service_healthy` never releases.
+      test: ["CMD", "valkey-cli", "-a", "${SORTINGHAT_REDIS_PASSWORD:-replace-me}", "--no-auth-warning", "--raw", "incr", "ping"]
       retries: 5
     volumes:
       - ${GRIMOIRE_DATA_DIR:-${OPEN_PULSE_DATA_DIR:-./data}/grimoirelab}/valkey:/data
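
The change above makes Valkey require a password, so a client has to send AUTH before any other command gets a normal reply. The following is a minimal sketch of that handshake over a raw socket, using only the Python standard library. The names `encode_command` and `valkey_alive` are hypothetical helpers for illustration and are not part of this commit; the host, port, and password passed in are assumptions about the local setup.

```python
import socket
from typing import Optional


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def valkey_alive(host: str, port: int, password: Optional[str],
                 timeout: float = 3.0) -> bool:
    """Return True if the server answers PING with +PONG.

    When a password is given, AUTH is sent first; a rejected AUTH
    or an unreachable server both count as "not alive".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with sock.makefile("rb") as reader:
                if password is not None:
                    sock.sendall(encode_command("AUTH", password))
                    if not reader.readline().startswith(b"+OK"):
                        return False
                sock.sendall(encode_command("PING"))
                # With requirepass set and no AUTH sent, the reply is an
                # error line (starting with "-") rather than +PONG.
                return reader.readline().startswith(b"+PONG")
    except OSError:
        return False
```

Calling `valkey_alive("valkey", 6379, None)` against a server started with `--requirepass` would be expected to return False, which mirrors why the unauthenticated healthcheck had to change.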

infra/open-pulse-stack/docker-compose.yml

Lines changed: 3 additions & 1 deletion
@@ -341,7 +341,9 @@ services:
       - ${OPEN_PULSE_DATA_DIR:-./data}/sparql-proxy/data:/data
       - ${OPEN_PULSE_DATA_DIR:-./data}/sparql-proxy/config:/config
     healthcheck:
-      test: ["CMD-SHELL", "wget -q --spider http://localhost:7878/ || exit 1"]
+      # Probe the public /healthz route (Caddyfile) — the SPARQL surface on
+      # `/` stays auth-gated, so hitting `/` here would 401 and false-fail.
+      test: ["CMD-SHELL", "wget -q --spider http://localhost:7878/healthz || exit 1"]
       interval: 15s
       timeout: 5s
       retries: 5
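
The healthcheck above passes only when the probe gets a successful HTTP response, which is why a 401 from the auth-gated `/` marked a working proxy as unhealthy. A minimal sketch of the same decision in Python follows. `is_healthy` is a hypothetical helper, not part of this commit, and it only approximates the shell command: it issues a GET, whereas `wget --spider` may issue a HEAD request.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True only when the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError:
        # 4xx/5xx responses (including a 401 from an auth-gated route)
        # are raised as HTTPError and count as unhealthy.
        return False
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False
```

Under the configuration in this commit, `is_healthy("http://localhost:7878/healthz")` would be expected to return True while the proxy is up, and `is_healthy("http://localhost:7878/")` would return False for an unauthenticated caller.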

infra/services/sparql-proxy/Caddyfile

Lines changed: 9 additions & 0 deletions
@@ -21,6 +21,15 @@
 }
 
 :7878 {
+    # ── Liveness probe ────────────────────────────────────────────────
+    # Public (no auth, no data): just confirms Caddy is up. Lets Docker's
+    # healthcheck succeed without exposing the auth-gated SPARQL surface.
+    # Matched by its own `handle` so it terminates before the read-auth /
+    # reverse-proxy directives below.
+    handle /healthz {
+        respond "ok" 200
+    }
+
     # ── Static dashboard UI ───────────────────────────────────────────
     # Single-page BYOK dashboard for talking to Open Pulse services.
     # Mounted from infra/services/sparql-proxy/projects-ui in the host.
