status-page: target finelog's DataFusion SQL dialect, not DuckDB #6569

Merged: ravwojdyla merged 3 commits into main from rav/status-page-datafusion-fix on Jun 23, 2026
@ravwojdyla-agent

  • follow-up to #6565 (status-page: probes panel, synthetic-canary health + provisioning from finelog): that PR's probe queries used DuckDB builtins, but finelog's Query RPC runs Apache DataFusion 53 (see footnote 1). Verified live against prod finelog, where the first query failed with `Invalid function 'json_extract_string'`
  • drop json_extract_string from the checks query: select the raw labels blob and decode the probe name app-side (same as the provisioning path), partitioning the window by labels (each health check emits exactly one {"probe": "<name>"} value)
  • read collected_at as epoch millis via arrow_cast(collected_at, 'Int64') instead of epoch_ms(), and drop the redundant value::DOUBLE cast (already f64)
  • README: DuckDB → DataFusion (no JSON funcs, labels decoded app-side)
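
The app-side decode described in the bullets above might look like the following minimal sketch. It assumes `labels` arrives as a JSON string of the form `{"probe": "<name>"}`; the function name `probeNameFromLabels` is hypothetical and is not taken from this PR.

```typescript
// Hypothetical helper: extract the probe name from a raw labels blob.
// Assumes labels is a JSON string such as {"probe": "dns"}.
function probeNameFromLabels(labels: string): string | null {
  try {
    const parsed: unknown = JSON.parse(labels);
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      const probe = (parsed as Record<string, unknown>)["probe"];
      return typeof probe === "string" ? probe : null;
    }
    return null;
  } catch {
    // Malformed JSON: report "no probe name" instead of failing the whole page.
    return null;
  }
}
```

Returning `null` for malformed or unexpected blobs lets the caller skip a row rather than abort the whole panel; whether the real code does this is not stated in the PR.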

🤖 Generated with Claude Code

Footnotes

  1. finelog registers only prefix/regexp_matches/contains UDFs (lib/finelog/rust/src/query/udf.rs) — no JSON functions and no epoch_ms. typecheck/lint/build green; mock parse re-verified. the SQL dialect is now confirmed against the live server; remaining check is the parse against real infra.canary.metrics data, done locally.

Live verification against the prod finelog showed its read engine is
Apache DataFusion (53), not DuckDB: `json_extract_string` and `epoch_ms`
don't exist there (it registers only prefix/regexp_matches/contains
UDFs, no JSON functions).

- drop json_extract_string from the checks query: select the raw labels
  blob and decode the `probe` name in JS (same as the provisioning path),
  partitioning the window by labels (one `{"probe":...}` value per check)
- read collected_at as epoch millis via arrow_cast(...,'Int64') instead
  of epoch_ms(), and drop the redundant value::DOUBLE cast (already f64)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ravwojdyla-agent added the agent-generated label on Jun 23, 2026

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 08096a27de

   AND collected_at >= TIMESTAMP '${cutoff}'
 )
-SELECT probe, metric, value::DOUBLE AS value, epoch_ms(collected_at)::BIGINT AS collected_ms
+SELECT labels, metric, value, arrow_cast(collected_at, 'Int64') AS collected_ms

[P2] Convert DataFusion timestamps down to milliseconds

When this query runs against finelog, collected_at is stored as Arrow Timestamp(Microsecond, None) (lib/finelog/rust/src/store/schema.rs documents the microsecond storage unit), so arrow_cast(collected_at, 'Int64') yields microseconds, not epoch milliseconds. The parsed rows are later passed directly to new Date(...), so every probe/provisioning freshness timestamp will be about 1000x too large and render as a far-future/negative-relative time; the same cast in provisioningSql has the same effect.
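
The unit mismatch described in this comment can be illustrated with a short sketch. The helper name `usToMs` and the sample value are illustrative assumptions; the only facts taken from the review are that the cast yields epoch microseconds and that `new Date(...)` expects epoch milliseconds.

```typescript
// Hypothetical conversion: epoch microseconds (as produced by arrow_cast on a
// Timestamp(Microsecond) column) to the epoch milliseconds that Date expects.
function usToMs(micros: number): number {
  return Math.floor(micros / 1000);
}

// An arbitrary instant in mid-2026, expressed in microseconds (assumed sample value).
const collectedUs = 1_782_000_000_000_000;

// Bug: passing microseconds straight to Date treats them as milliseconds.
const wrong = new Date(collectedUs);

// Fix: normalize to milliseconds first.
const right = new Date(usToMs(collectedUs));
```

Here `wrong` lands tens of thousands of years in the future, while `right` falls in 2026.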


Replace the in-process worker-count ring buffer (lost on every Cloud Run
restart) with durable 24h history read from the infra/probes canary's
infra.canary.metrics finelog namespace, and add a provisioning
success-ratio history chart beside it.

- New server/sources/clusterHistory.ts: two bounded DataFusion queries
  over the trailing 24h — per-region worker_healthy, and fleet + per-pool
  provision_ready/outcomes rolled up into a fleet-average plus per-region
  create-success ratio (zones collapse to regions).
- /api/workers/history now reads finelog; add /api/provisioning/history.
  Drop the WorkerHistory ring buffer and its 30s sampler; the iris-ping
  and control-plane latency buffers stay in process.
- Workers panel shows the availability and provisioning charts side by
  side, sharing one region color map (built off the union of both charts'
  regions) so a region reads the same color in both.
- Fix the collected_at decode: arrow_cast(...,'Int64') yields epoch
  microseconds, not millis — probe timestamps were rendering ~58000 years
  in the future. Shared microsToMillis + decodeLabels helpers in
  finelogQuery.ts.
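
A minimal sketch of the shared region color map, built from the union of both charts' regions. The palette, the function name `buildRegionColors`, and the sorted ordering are assumptions for illustration; the PR does not show the actual implementation.

```typescript
// Hypothetical palette; the real colors are not shown in this PR.
const PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"];

// Build one region -> color map from the union of the regions in both charts,
// so a region gets the same color wherever it appears.
function buildRegionColors(
  availabilityRegions: string[],
  provisioningRegions: string[],
): Map<string, string> {
  const union = new Set<string>([...availabilityRegions, ...provisioningRegions]);
  // Sort so the assignment does not depend on which chart listed a region first.
  const sorted = Array.from(union).sort();
  const colors = new Map<string, string>();
  sorted.forEach((region, i) => {
    colors.set(region, PALETTE[i % PALETTE.length]);
  });
  return colors;
}
```

Both charts would then look up colors in the same map, so a region missing from one chart still keeps its color in the other.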

Address the lint-catalog review of the prior commit: probes.ts and
clusterHistory.ts had each re-derived the canary-namespace query
scaffold.

- Lift the shared bits into finelogQuery.ts: CANARY_METRICS_NAMESPACE,
  FLEET_SCOPE, the CanaryMetricRow row type, and the asCanaryRows
  micros-normalizer. Both sources import them now.
- Rename the probes.ts SQL timestamp alias collected_ms -> collected_us;
  the cast yields epoch micros and is only normalized to millis by
  asCanaryRows, so the old name was misleading and inconsistent with
  clusterHistory.ts.
- Collapse workersHistory/provisioningHistory's identical
  fetchedAt+cutoff+try/catch bodies into one historyResponse(buildSql,
  parse) helper.

No behavior change; /api/workers/history, /api/provisioning/history, and
/api/probes verified to return identical shapes.
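
The `historyResponse(buildSql, parse)` consolidation could be sketched as follows. The response shape, the injected `runQuery` function, and the `now` parameter are assumptions added to keep the example self-contained; the PR only names the helper and its two arguments.

```typescript
// Hypothetical response shape; the real types in the PR are not shown.
type HistoryResponse<T> =
  | { fetchedAt: string; ok: true; data: T }
  | { fetchedAt: string; ok: false; error: string };

// Hypothetical query runner, injected so the sketch needs no network access.
type RunQuery = (sql: string) => Promise<unknown[]>;

const DAY_MS = 24 * 60 * 60 * 1000;

// One shared body for fetchedAt + 24h cutoff + try/catch; each endpoint
// supplies only its SQL builder and its row parser.
async function historyResponse<T>(
  runQuery: RunQuery,
  buildSql: (cutoffIso: string) => string,
  parse: (rows: unknown[]) => T,
  now: Date = new Date(),
): Promise<HistoryResponse<T>> {
  const fetchedAt = now.toISOString();
  const cutoff = new Date(now.getTime() - DAY_MS).toISOString();
  try {
    const rows = await runQuery(buildSql(cutoff));
    return { fetchedAt, ok: true, data: parse(rows) };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { fetchedAt, ok: false, error };
  }
}
```

Each history endpoint then reduces to one call with its own `buildSql` and `parse`, which is how two identical bodies collapse into one.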
ravwojdyla merged commit dba316f into main on Jun 23, 2026
32 checks passed
ravwojdyla deleted the rav/status-page-datafusion-fix branch on June 23, 2026 04:43