status-page: target finelog's DataFusion SQL dialect, not DuckDB #6569
Live verification against the prod finelog showed its read engine is
Apache DataFusion (53), not DuckDB: `json_extract_string` and `epoch_ms`
don't exist there (it registers only prefix/regexp_matches/contains
UDFs, no JSON functions).
- drop json_extract_string from the checks query: select the raw labels
blob and decode the `probe` name in JS (same as the provisioning path),
partitioning the window by labels (one `{"probe":...}` value per check)
- read collected_at as epoch millis via arrow_cast(...,'Int64') instead
of epoch_ms(), and drop the redundant value::DOUBLE cast (already f64)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 08096a27de
  AND collected_at >= TIMESTAMP '${cutoff}'
  )
- SELECT probe, metric, value::DOUBLE AS value, epoch_ms(collected_at)::BIGINT AS collected_ms
+ SELECT labels, metric, value, arrow_cast(collected_at, 'Int64') AS collected_ms
Convert DataFusion timestamps down to milliseconds
When this query runs against finelog, collected_at is stored as Arrow Timestamp(Microsecond, None) (lib/finelog/rust/src/store/schema.rs documents the microsecond storage unit), so arrow_cast(collected_at, 'Int64') yields microseconds, not epoch milliseconds. The parsed rows are later passed directly to new Date(...), so every probe/provisioning freshness timestamp will be about 1000x too large and render as a far-future/negative-relative time; the same cast in provisioningSql has the same effect.
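The unit mismatch described in this comment can be illustrated with a minimal sketch. The helper below is hypothetical: the follow-up commit mentions a `microsToMillis` helper in finelogQuery.ts, but its actual signature is not shown here, so this version only assumes the value arrives as a number or numeric string of epoch microseconds.

```typescript
// Hypothetical sketch; the real helper in finelogQuery.ts may differ.
// Converts epoch microseconds (what arrow_cast(..., 'Int64') yields for a
// Timestamp(Microsecond, None) column) to epoch milliseconds for new Date().
function microsToMillis(micros: number | string): number {
  // Current epoch micros (~1.7e15) are below Number.MAX_SAFE_INTEGER (~9.0e15),
  // so plain number arithmetic is exact here.
  return Math.floor(Number(micros) / 1000);
}

// 2023-11-14T22:13:20Z expressed in epoch microseconds (illustrative value).
const collectedUs = 1700000000000000;

// Bug: microseconds passed straight to Date are read as milliseconds.
const wrong = new Date(collectedUs);
// Fix: normalize to milliseconds first.
const right = new Date(microsToMillis(collectedUs));

console.log(wrong.getUTCFullYear() > 50000); // far-future year
console.log(right.toISOString());
```

Normalizing once at the row-parsing boundary, rather than at each `new Date(...)` call site, keeps the SQL alias honest about its unit and avoids repeating the division.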
Replace the in-process worker-count ring buffer (lost on every Cloud Run restart) with durable 24h history read from the infra/probes canary's infra.canary.metrics finelog namespace, and add a provisioning success-ratio history chart beside it.
- New server/sources/clusterHistory.ts: two bounded DataFusion queries over the trailing 24h: per-region worker_healthy, and fleet + per-pool provision_ready/outcomes rolled up into a fleet-average plus per-region create-success ratio (zones collapse to regions).
- /api/workers/history now reads finelog; add /api/provisioning/history. Drop the WorkerHistory ring buffer and its 30s sampler; the iris-ping and control-plane latency buffers stay in process.
- Workers panel shows the availability and provisioning charts side by side, sharing one region color map (built off the union of both charts' regions) so a region reads the same color in both.
- Fix the collected_at decode: arrow_cast(...,'Int64') yields epoch microseconds, not millis, so probe timestamps were rendering ~58000 years in the future. Shared microsToMillis + decodeLabels helpers in finelogQuery.ts.
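As a rough illustration of the shared region color map mentioned in this commit, the sketch below assigns colors from the sorted union of both charts' regions. The function name, the palette, and the region names are assumptions for illustration; the PR's actual implementation is not shown in this conversation.

```typescript
// Assumed palette; the dashboard's real colors are not given in the PR text.
const PALETTE: string[] = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"];

// Hypothetical helper: one color per region, derived from the union of the
// regions seen by the availability chart and the provisioning chart.
function buildRegionColorMap(
  availabilityRegions: string[],
  provisioningRegions: string[],
  palette: string[] = PALETTE,
): Map<string, string> {
  // Sort the de-duplicated union so the assignment does not depend on
  // which chart's data arrived first.
  const union = Array.from(
    new Set(availabilityRegions.concat(provisioningRegions)),
  ).sort();
  const colors = new Map<string, string>();
  union.forEach((region, i) => {
    // Wrap around if there are more regions than palette entries.
    colors.set(region, palette[i % palette.length]);
  });
  return colors;
}

// Region names below are placeholders, not the fleet's real regions.
const colorMap = buildRegionColorMap(["us-west", "eu-north"], ["eu-north", "asia-east"]);
console.log(colorMap.get("eu-north"));
```

Building the map from the union, instead of per chart, is what lets a region that appears in only one chart still keep a stable color when it later shows up in the other.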
Address the lint-catalog review of the prior commit: probes.ts and clusterHistory.ts had each re-derived the canary-namespace query scaffold.
- Lift the shared bits into finelogQuery.ts: CANARY_METRICS_NAMESPACE, FLEET_SCOPE, the CanaryMetricRow row type, and the asCanaryRows micros-normalizer. Both sources import them now.
- Rename the probes.ts SQL timestamp alias collected_ms -> collected_us; the cast yields epoch micros and is only normalized to millis by asCanaryRows, so the old name was misleading and inconsistent with clusterHistory.ts.
- Collapse workersHistory/provisioningHistory's identical fetchedAt+cutoff+try/catch bodies into one historyResponse(buildSql, parse) helper.
No behavior change; /api/workers/history, /api/provisioning/history, and /api/probes verified to return identical shapes.
QueryRPC runs Apache DataFusion 53 [1]; verified live against prod finelog, where the first query failed with `Invalid function 'json_extract_string'`.
- Drop `json_extract_string` from the checks query: select the raw `labels` blob and decode the `probe` name app-side (same as the provisioning path), partitioning the window by `labels` (each health check emits exactly one `{"probe": "<name>"}` value)
- Read `collected_at` as epoch millis via `arrow_cast(collected_at, 'Int64')` instead of `epoch_ms()`, and drop the redundant `value::DOUBLE` cast (already f64)
🤖 Generated with Claude Code
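Decoding the `probe` name app-side, as described above, might look like the following sketch. It assumes `labels` arrives as a JSON string of the form `{"probe": "<name>"}`; the function name is hypothetical, and the PR's own `decodeLabels` helper in finelogQuery.ts may be shaped differently.

```typescript
// Hypothetical sketch of app-side label decoding; not the PR's actual helper.
// Returns the probe name, or null when the blob is malformed or has no
// string-valued "probe" key.
function decodeProbeName(labels: string): string | null {
  try {
    const parsed: unknown = JSON.parse(labels);
    if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
      const probe = (parsed as Record<string, unknown>)["probe"];
      return typeof probe === "string" ? probe : null;
    }
  } catch (_err) {
    // Malformed JSON: treat the row as having no probe name.
  }
  return null;
}

// "example-probe" is a placeholder name, not one of the real checks.
console.log(decodeProbeName('{"probe": "example-probe"}'));
```

Returning `null` instead of throwing lets the caller skip a bad row without failing the whole response, which seems a reasonable default for a status page; whether the real code does this is not stated in the PR.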
Footnotes
1. finelog registers only `prefix`/`regexp_matches`/`contains` UDFs (lib/finelog/rust/src/query/udf.rs): no JSON functions and no `epoch_ms`. Typecheck/lint/build green; mock parse re-verified. The SQL dialect is now confirmed against the live server; the remaining check is the parse against real `infra.canary.metrics` data, done locally.