Commit 5c5b02e
feat(task-history): record Ballista stages for distributed queries (spiceai#10831)
* feat(task-history): capture distributed query observability
For distributed Ballista queries, the task_history table now records the
full parent + per-stage tree:
  sql_query (parent)          duration, error, datasets, summary labels
  └── ballista_stage (child)  stage_id, executors, task_count,
                              stage timing, plan tree
Previously the task_history span was created in submit_distributed_internal
but never instrumented across the job's lifetime — it dropped at submission
time with duration ≈ 0 and no error_message. Stage/executor detail was
not recorded at all.
Implementation:
- QueryHandle owns the sql_query span behind Arc<Mutex<Option<Span>>>.
spawn_finalize *takes* it (rather than cloning), so the OTel span closes
when the spawned finalize future drops its instrumented clone — capturing
the query's runtime, not arbitrary post-completion handle lifetime.
- New stage_history module walks the in-process ExecutionGraph at job
completion and emits one child ballista_stage span per stage via the
existing OTel pipeline. Stage labels include executor_histogram,
slowest_task_ms, partitions, attempt_num, total_executor_ms. Stage
input is the plan rendered with ExplainFormat::Tree.
- QueryHandle::cancel() now finalizes the tracker with JobCancelled
before returning, so callers that cancel() then drop the handle
record an accurate cancellation row instead of the Drop guard's
client-disconnected fallback.
- Drop guard finalizes orphaned handles: it cancels the cancel_token
synchronously, then spawns a finalize task that also calls
scheduler.cancel_job so executors don't keep running an unobserved query.
- Errors raised before QueryHandle creation (planning/validation/submission)
emit tracing::error on the parent span so the row's error_message is
populated (mirrors the sync run_internal path).
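The take-versus-clone point in the first bullet can be illustrated with a minimal, std-only sketch. It assumes a hypothetical `Span` type that records its lifetime when dropped (standing in for an OTel span closing) and a hypothetical `take_span` accessor; the real `QueryHandle` and `spawn_finalize` are asynchronous and are not reproduced here.

```rust
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a tracing span: on drop it writes its lifetime
/// into a shared slot, the way an OTel span closes when dropped.
pub struct Span {
    started: Instant,
    closed_after: Arc<Mutex<Option<Duration>>>,
}

impl Drop for Span {
    fn drop(&mut self) {
        *self.closed_after.lock().unwrap() = Some(self.started.elapsed());
    }
}

/// Hypothetical handle that owns the span behind Arc<Mutex<Option<Span>>>.
pub struct QueryHandle {
    span: Arc<Mutex<Option<Span>>>,
}

impl QueryHandle {
    pub fn new(closed_after: Arc<Mutex<Option<Duration>>>) -> Self {
        let span = Span { started: Instant::now(), closed_after };
        Self { span: Arc::new(Mutex::new(Some(span))) }
    }

    /// Moves the span out of the handle; returns None if it was already taken.
    pub fn take_span(&self) -> Option<Span> {
        self.span.lock().unwrap().take()
    }

    pub fn has_span(&self) -> bool {
        self.span.lock().unwrap().is_some()
    }
}

/// Simulates the finalize step: it owns the span and drops it when done, so
/// the span closes here and not when the handle itself is eventually dropped.
pub fn finalize(span: Option<Span>) {
    drop(span);
}
```

Because the span is moved out of the `Option`, the handle can outlive the query without stretching the recorded duration. A clone would leave a second owner alive inside the handle.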
Tests use a process-wide AsyncMutex (TEST_LOCK) to serialize against
shared DF_SLOT, condition-driven wait_for_row_count polling instead of
fixed sleeps, and parent.job_id-scoped assertions so other tests'
task_history rows in the same binary can't pollute our row counts.
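As a rough illustration of condition-driven polling, here is a synchronous, std-only sketch. The actual `wait_for_row_count` helper is presumably async and queries task_history; the signature, the closure parameter, and the 5 ms poll interval below are assumptions made for the example.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Polls `count_rows` until it reports at least `expected` rows or `timeout`
/// elapses. Returns Ok(count) on success and Err(last_seen_count) on timeout,
/// so a failing test can report how many rows it actually observed.
pub fn wait_for_row_count<F>(
    mut count_rows: F,
    expected: usize,
    timeout: Duration,
) -> Result<usize, usize>
where
    F: FnMut() -> usize,
{
    let deadline = Instant::now() + timeout;
    loop {
        let seen = count_rows();
        if seen >= expected {
            return Ok(seen);
        }
        if Instant::now() >= deadline {
            return Err(seen);
        }
        // Short, assumed poll interval; the wait ends as soon as the
        // condition holds instead of always sleeping for a fixed time.
        thread::sleep(Duration::from_millis(5));
    }
}
```

In a real test the closure would count only rows whose job_id matches the parent under test, which is what keeps other tests' rows out of the assertion.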
Requires the matching Ballista fork PR (#38, merged onto spiceai-52.5
as 8afc1b74a605) which persists executor_id on terminal TaskInfo and
bumps get_job_execution_graph to pub.
* Bump ballista to pick up stuck-query detection (spiceai/datafusion-ballista#39)
Updates the ballista pin from 8afc1b74 to 7e9872a5 (the merge commit
of spiceai/datafusion-ballista#39) so the next cluster bench run picks
up the new operator-facing warning when a distributed query stops
making progress.
Surgical lockfile update: only the three ballista source URLs change.
Vortex stays pinned at the spiceai-52 commit (c536c9ae) that the workspace
was already using. Without that pin, running cargo update -p ballista-*
re-resolves the branch-tracked vortex transitive dependency and bumps it to
a newer rev whose arrow-rs version conflicts with the pinned
datafusion-table-providers (CI symptom: no method as_list on dyn Array,
Arc<Schema>: From<Schema> not satisfied, in adbc compilation).
Verified: cargo check -p runtime --offline and cargo check -p spiced
--features adbc --offline both clean.
* feat(task-history): pluggable span middleware + Ballista stage timeline rendering
Introduces two trait-based extension points on the task_history OTel
exporter and removes the hardcoded `ballista_stage` branches that lived
in the exporter:
- `SpanTransform`: mutates a `SpanData` before conversion to a row.
Implementors can adjust timestamps, inject attributes, redact
fields, etc.
- `SpanRetention`: declares a retention dependency between sibling
spans in a batch. Returning `Some(parent_span_id)` from
`parent_dependency(span)` means "keep this span iff that span is
kept by base rules" — expressing the parent/child relationship
directly instead of hardcoding it.
`TaskHistoryExporter` now stores `Vec<Arc<dyn SpanTransform>>` and
`Vec<Arc<dyn SpanRetention>>` with `.with_transform` /
`.with_retention` builder methods. The exporter no longer mentions
any specific span type — base retention checks PLAN_CAPTURE_LABEL and
`min_sql_duration_ms`, retention rules then override per-span.
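The two extension points and the builder methods described above might look roughly like the following std-only sketch. `SpanData` here is a simplified, hypothetical record (not the OTel SDK type), the base rule is reduced to a duration threshold, and `TagSource`, `StageFollowsParent`, and the `export` method name are illustrative assumptions.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Arc;

/// Simplified, hypothetical span record (not the OTel SDK's `SpanData`).
#[derive(Clone, Debug)]
pub struct SpanData {
    pub span_id: u64,
    pub parent_span_id: Option<u64>,
    pub name: String,
    pub duration_ms: u64,
    pub attributes: HashMap<String, String>,
}

impl SpanData {
    pub fn new(span_id: u64, parent_span_id: Option<u64>, name: &str, duration_ms: u64) -> Self {
        Self { span_id, parent_span_id, name: name.to_owned(), duration_ms, attributes: HashMap::new() }
    }
}

/// Mutates a span before it is converted to a row.
pub trait SpanTransform: Send + Sync {
    fn transform(&self, span: &mut SpanData);
}

/// `Some(id)`: keep this span iff span `id` passes the base rules. `None`: abstain.
pub trait SpanRetention: Send + Sync {
    fn parent_dependency(&self, span: &SpanData) -> Option<u64>;
}

#[derive(Default)]
pub struct TaskHistoryExporter {
    min_duration_ms: u64,
    transforms: Vec<Arc<dyn SpanTransform>>,
    retentions: Vec<Arc<dyn SpanRetention>>,
}

impl TaskHistoryExporter {
    pub fn new(min_duration_ms: u64) -> Self {
        Self { min_duration_ms, ..Self::default() }
    }
    pub fn with_transform(mut self, t: Arc<dyn SpanTransform>) -> Self {
        self.transforms.push(t);
        self
    }
    pub fn with_retention(mut self, r: Arc<dyn SpanRetention>) -> Self {
        self.retentions.push(r);
        self
    }

    /// Applies transforms, evaluates the base rule, then lets retention rules override per span.
    pub fn export(&self, mut batch: Vec<SpanData>) -> Vec<SpanData> {
        for span in &mut batch {
            for t in &self.transforms {
                t.transform(span);
            }
        }
        // Base rule (assumed for this sketch): keep spans that ran long enough.
        let kept_by_base: HashSet<u64> = batch
            .iter()
            .filter(|s| s.duration_ms >= self.min_duration_ms)
            .map(|s| s.span_id)
            .collect();
        batch
            .into_iter()
            .filter(|s| match self.retentions.iter().find_map(|r| r.parent_dependency(s)) {
                Some(parent) => kept_by_base.contains(&parent),
                None => kept_by_base.contains(&s.span_id),
            })
            .collect()
    }
}

/// Example transform: injects a marker attribute.
pub struct TagSource;
impl SpanTransform for TagSource {
    fn transform(&self, span: &mut SpanData) {
        span.attributes.insert("source".to_owned(), "example".to_owned());
    }
}

/// Example retention rule: a `ballista_stage` span follows its parent.
pub struct StageFollowsParent;
impl SpanRetention for StageFollowsParent {
    fn parent_dependency(&self, span: &SpanData) -> Option<u64> {
        if span.name == "ballista_stage" { span.parent_span_id } else { None }
    }
}
```

With a 100 ms threshold, a 5 ms stage under a 500 ms query is kept, while a 900 ms stage under a 10 ms query is dropped along with its parent. The exporter itself never names a span type.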
Concrete consumer: `BallistaStageMiddleware` in
`crates/runtime/src/datafusion/query/stage_history.rs` implements both
traits:
- As a `SpanTransform`, it rewrites `start_time` / `end_time` on
`ballista_stage` spans using the `stage_started_at` /
`stage_ended_at` attributes (millis since UNIX epoch), so the
task_history row reflects the actual stage execution window. This
is what makes per-stage execution visible on the timeline view.
- As a `SpanRetention`, it declares that a `ballista_stage` row
depends on its parent `sql_query` row, replacing the prior
"if task.as_ref() == ballista_stage" branch in the exporter.
`BallistaStageMiddleware::pair()` returns a single instance reused as
both trait objects so callers register one Arc in two slots; the three
exporter construction sites (production tracing, generic test util,
distributed_task_history test) all use it.
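A minimal sketch of the "one instance, two trait objects" idea follows. The trait methods are pared down to name-based checks, `StageMiddleware` is a hypothetical stand-in, and the tuple return type of `pair()` is an assumption about its shape, not the confirmed signature.

```rust
use std::sync::Arc;

/// Pared-down stand-ins for the two exporter traits (assumed shapes).
pub trait SpanTransform: Send + Sync {
    /// Returns true if this transform would rewrite a span with this name.
    fn applies_to(&self, span_name: &str) -> bool;
}

pub trait SpanRetention: Send + Sync {
    /// Some(parent) declares a dependency on the parent; None abstains.
    fn parent_dependency(&self, span_name: &str, parent_span_id: Option<u64>) -> Option<u64>;
}

/// Hypothetical middleware implementing both traits on one type.
pub struct StageMiddleware;

impl SpanTransform for StageMiddleware {
    fn applies_to(&self, span_name: &str) -> bool {
        span_name == "ballista_stage"
    }
}

impl SpanRetention for StageMiddleware {
    fn parent_dependency(&self, span_name: &str, parent_span_id: Option<u64>) -> Option<u64> {
        // Non-stage spans and orphan stages (no parent) both yield None.
        if span_name == "ballista_stage" { parent_span_id } else { None }
    }
}

impl StageMiddleware {
    /// Returns the same allocation viewed through both trait objects.
    pub fn pair() -> (Arc<dyn SpanTransform>, Arc<dyn SpanRetention>) {
        let shared = Arc::new(StageMiddleware);
        // Method-call `.clone()` yields Arc<StageMiddleware>, which then
        // coerces to the annotated trait-object type.
        let transform: Arc<dyn SpanTransform> = shared.clone();
        let retention: Arc<dyn SpanRetention> = shared;
        (transform, retention)
    }
}
```

Both returned `Arc`s point at the same value, so any state the middleware later grows would be shared between the transform slot and the retention slot.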
Includes unit tests covering both hooks: time override on the matching
span name, no-op on non-matching names and missing/inverted/zero
attributes; retention reports the parent dependency for stage spans
and abstains for non-stage spans and orphan stages.
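The time-override cases listed in the unit tests can be sketched as follows, using only the standard library. `SpanData` is again a simplified, hypothetical record with integer attributes, and `span_with_attrs` and `override_stage_window` are illustrative names; only the span name, the two attribute keys, and the millis-since-epoch unit come from the description above.

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Simplified, hypothetical span record with integer-valued attributes.
pub struct SpanData {
    pub name: String,
    pub start_time: SystemTime,
    pub end_time: SystemTime,
    pub attributes: HashMap<String, i64>,
}

/// Builds a span whose recorded window is a placeholder (the UNIX epoch).
pub fn span_with_attrs(name: &str, attrs: &[(&str, i64)]) -> SpanData {
    SpanData {
        name: name.to_owned(),
        start_time: UNIX_EPOCH,
        end_time: UNIX_EPOCH,
        attributes: attrs.iter().map(|(k, v)| (k.to_string(), *v)).collect(),
    }
}

/// Reads a millis-since-epoch attribute, treating missing, zero, and
/// negative values as absent.
fn positive_millis(span: &SpanData, key: &str) -> Option<u64> {
    span.attributes
        .get(key)
        .and_then(|ms| u64::try_from(*ms).ok())
        .filter(|ms| *ms > 0)
}

/// Rewrites the span window from the stage attributes. Returns true if the
/// window was rewritten and false if the span was left untouched.
pub fn override_stage_window(span: &mut SpanData) -> bool {
    if span.name != "ballista_stage" {
        return false;
    }
    let start = positive_millis(span, "stage_started_at");
    let end = positive_millis(span, "stage_ended_at");
    match (start, end) {
        // An inverted window (end before start) is rejected as a no-op.
        (Some(start), Some(end)) if end >= start => {
            span.start_time = UNIX_EPOCH + Duration::from_millis(start);
            span.end_time = UNIX_EPOCH + Duration::from_millis(end);
            true
        }
        _ => false,
    }
}
```

Treating every malformed case as a no-op means a stage row falls back to the span's own recorded times instead of showing a nonsensical window on the timeline.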
* feat(observability): propagate request trace context across distributed query boundaries (spiceai#10896)
Issue spiceai#10202. Distributed query executors no longer create a fresh internal
request context; they inherit the originating request's protocol and W3C
trace ids so executor-side metrics, telemetry dimensions, and any future
task_history rows correlate end-to-end with the scheduler.
* New `SpiceRequestContextConfig` (`spice_ctx` prefix) carries
`protocol`, `trace_id`, and `span_id` through DataFusion's
`ConfigExtension` mechanism — round-tripped opaquely by Ballista as
`TaskDefinition` config props, with zero fork changes.
* Registered as a default option extension at all 3 config_producer /
session_builder sites in `runtime/src/cluster/mod.rs`. The scheduler
session_builder now reads any `SpiceRequestContextConfig` set on the
per-job session config and re-injects it on the built session config
(it previously ignored its `_cfg` argument).
* `Query::submit_distributed_internal` populates the extension from the
current `RequestContext` before `create_or_update_session`.
* `resolve_request_context(TaskContext)` shared helper establishes the
canonical lookup order: typed `Arc<RequestContext>` extension →
config extension → `Protocol::Internal` fallback (or `None`).
* `BytesProcessedExec::execute` switched to the helper. Same panic
behavior preserved when no context is found and `fallback_to_new_context`
is off.
* `FlightSqlExec` gains an optional `trace_parent: Option<String>` field
with a `with_trace_parent` builder. `execute()` sets it as a
`traceparent` gRPC metadata header (via `FlightSqlServiceClient::set_header`),
falling back to formatting from the typed `RequestContext` extension
on the `TaskContext` session config when not preset.
`PartialAggregationFlightSqlExec` inherits and forwards on its own
`execute()`.
* W3C span semantics preserved: the propagated `span_id` is the sender's
current span and becomes the parent of any new spans the receiver
creates — task_history's existing `parent_span_id` chain extends
naturally one level per executor hop.
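To make the round trip concrete, here is a std-only sketch of carrying the three fields as string key/value props under the `spice_ctx` prefix. It does not use DataFusion's `ConfigExtension` trait. The `to_props` / `from_props` names and the `prefix.key` layout are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Namespace for the keys, taken from the description above.
pub const PREFIX: &str = "spice_ctx";

/// Hypothetical mirror of the config extension's three fields.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct SpiceRequestContextConfig {
    pub protocol: Option<String>,
    pub trace_id: Option<String>,
    pub span_id: Option<String>,
}

impl SpiceRequestContextConfig {
    /// Serializes the set fields as `spice_ctx.<key>` string props.
    pub fn to_props(&self) -> HashMap<String, String> {
        let mut props = HashMap::new();
        let fields = [
            ("protocol", &self.protocol),
            ("trace_id", &self.trace_id),
            ("span_id", &self.span_id),
        ];
        for (key, value) in fields {
            if let Some(v) = value {
                props.insert(format!("{PREFIX}.{key}"), v.clone());
            }
        }
        props
    }

    /// Rebuilds the config from props, ignoring keys outside the prefix.
    pub fn from_props(props: &HashMap<String, String>) -> Self {
        let get = |key: &str| props.get(&format!("{PREFIX}.{key}")).cloned();
        Self {
            protocol: get("protocol"),
            trace_id: get("trace_id"),
            span_id: get("span_id"),
        }
    }
}
```

Because the values travel as opaque strings, the transport in between only needs to copy key/value pairs. This is consistent with the note that no fork changes were required.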
Tests: 4 config extension roundtrip + 4 resolver lookup-order +
1 `BytesProcessedExec` integration test asserting metric dimensions
carry the correct protocol when only the config extension is present.
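The lookup order exercised by the resolver tests could be sketched like this. `TaskContext` is a hypothetical struct holding only the two sources the resolver reads, the `Http` variant and the `fallback_to_new_context` parameter placement are assumptions, and the real helper's signature may differ.

```rust
use std::sync::Arc;

/// `Internal` comes from the description above; `Http` is an assumed variant.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Protocol {
    Http,
    Internal,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct RequestContext {
    pub protocol: Protocol,
    pub trace_id: Option<String>,
}

/// Hypothetical stand-in exposing only the two sources the resolver reads.
#[derive(Default)]
pub struct TaskContext {
    /// Typed extension set directly on the session config.
    pub typed_extension: Option<Arc<RequestContext>>,
    /// Context decoded from the propagated config extension.
    pub config_extension: Option<RequestContext>,
}

/// Lookup order: typed extension, then config extension, then either a
/// fresh internal context or None depending on `fallback_to_new_context`.
pub fn resolve_request_context(
    ctx: &TaskContext,
    fallback_to_new_context: bool,
) -> Option<Arc<RequestContext>> {
    if let Some(typed) = &ctx.typed_extension {
        return Some(Arc::clone(typed));
    }
    if let Some(from_config) = &ctx.config_extension {
        return Some(Arc::new(from_config.clone()));
    }
    fallback_to_new_context.then(|| {
        Arc::new(RequestContext { protocol: Protocol::Internal, trace_id: None })
    })
}
```

Putting the typed extension first keeps in-process callers on their existing path, and the config extension only takes effect on executors where no typed context exists.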
* chore(deps): bump datafusion-ballista to c25e25b9 (spiceai-52.5)
* fix(flightsql): propagate traceparent in pushed-down aggregate exec
PartialAggregationFlightSqlExec was sending no traceparent header when
its source FlightSqlExec was built by the aggregate-pushdown optimizer
(which leaves trace_parent unset). Fall back to TaskContext like the
non-pushed FlightSqlExec path does, so pushed-down aggregate calls
preserve the same propagation behavior.
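A small sketch of the fallback described in this fix: prefer a preset header value, otherwise format one from the ids available on the context. The header layout follows the W3C Trace Context format (version `00`, 32-hex trace id, 16-hex parent span id, 2-hex flags). The function names, the integer id types, and always setting the sampled flag on the fallback path are assumptions for illustration.

```rust
/// Formats a W3C `traceparent` value. Returns None for all-zero ids, which
/// the W3C Trace Context format treats as invalid.
pub fn format_traceparent(trace_id: u128, span_id: u64, sampled: bool) -> Option<String> {
    if trace_id == 0 || span_id == 0 {
        return None;
    }
    let flags = u8::from(sampled);
    Some(format!("00-{trace_id:032x}-{span_id:016x}-{flags:02x}"))
}

/// Picks the header value to send: the preset value wins; otherwise the
/// value is formatted from the (trace_id, span_id) found on the context.
pub fn effective_traceparent(
    preset: Option<&str>,
    from_context: Option<(u128, u64)>,
) -> Option<String> {
    preset.map(str::to_owned).or_else(|| {
        // Assumption: the fallback path marks the trace as sampled.
        from_context.and_then(|(trace_id, span_id)| format_traceparent(trace_id, span_id, true))
    })
}
```

When neither source is available the function returns `None`, and a caller would simply send no `traceparent` header.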
Also picks up fmt drift in two trace-context files merged via spiceai#10896.
1 parent c6b06dc commit 5c5b02e
20 files changed
Lines changed: 2402 additions & 362 deletions
File tree
- bin/spiced/src
- crates
- data_components
- src
- datafusion-optimizer-rules/src/physical_plan/flightsql
- runtime-datafusion/src
- config
- extension
- runtime
- src
- cluster
- datafusion
- query
- task_history
- tests
- cluster
- utils