[pull] trunk from spiceai:trunk#38
Merged
Merged
Conversation
* Test spice chat in e2e_test_spice_cli * remove temporary changes
pull Bot
pushed a commit
that referenced
this pull request
May 18, 2026
…piceai#10831) * feat(task-history): capture distributed query observability For distributed Ballista queries, the task_history table now records the full parent + per-stage tree: sql_query (parent) duration, error, datasets, summary labels └── ballista_stage (child) stage_id, executors, task_count, stage timing, plan tree Previously the task_history span was created in submit_distributed_internal but never instrumented across the job's lifetime — it dropped at submission time with duration ≈ 0 and no error_message. Stage/executor detail was not recorded at all. Implementation: - QueryHandle owns the sql_query span behind Arc<Mutex<Option<Span>>>. spawn_finalize *takes* it (rather than cloning), so the OTel span closes when the spawned finalize future drops its instrumented clone — capturing the query's runtime, not arbitrary post-completion handle lifetime. - New stage_history module walks the in-process ExecutionGraph at job completion and emits one child ballista_stage span per stage via the existing OTel pipeline. Stage labels include executor_histogram, slowest_task_ms, partitions, attempt_num, total_executor_ms. Stage input is the plan rendered with ExplainFormat::Tree. - QueryHandle::cancel() now finalizes the tracker with JobCancelled before returning, so callers that cancel() then drop the handle record an accurate cancellation row instead of the Drop guard's client-disconnected fallback. - Drop guard finalizes orphaned handles: cancels the cancel_token sync, then spawns a finalize task that also calls scheduler.cancel_job so executors don't keep running an unobserved query. - Errors raised before QueryHandle creation (planning/validation/submission) emit tracing::error on the parent span so the row's error_message is populated (mirrors the sync run_internal path). Tests use a process-wide AsyncMutex (TEST_LOCK) to serialize against shared DF_SLOT, condition-driven wait_for_row_count polling instead of fixed sleeps, and parent.job_id-scoped assertions so other tests' task_history rows in the same binary can't pollute our row counts. Requires the matching Ballista fork PR (#38, merged onto spiceai-52.5 as 8afc1b74a605) which persists executor_id on terminal TaskInfo and bumps get_job_execution_graph to pub. * Bump ballista to pick up stuck-query detection (spiceai/datafusion-ballista#39) Updates the ballista pin from 8afc1b74 to 7e9872a5 (the merge commit of spiceai/datafusion-ballista#39) so the next cluster bench run picks up the new operator-facing warning when a distributed query stops making progress. Surgical lockfile update: only the three ballista source URLs change. Vortex stays pinned at the spiceai-52 commit (c536c9ae) the workspace was already using; without this, running cargo update -p ballista-* re-resolves the branch-tracked vortex transitive and bumps it to a newer rev whose arrow-rs version conflicts with the pinned datafusion-table-providers (CI symptom: no method as_list on dyn Array, Arc<Schema>: From<Schema> not satisfied, in adbc compilation). Verified: cargo check -p runtime --offline and cargo check -p spiced --features adbc --offline both clean. * feat(task-history): pluggable span middleware + Ballista stage timeline rendering Introduces two trait-based extension points on the task_history OTel exporter and removes the hardcoded `ballista_stage` branches that lived in the exporter: - `SpanTransform`: mutates a `SpanData` before conversion to a row. Implementors can adjust timestamps, inject attributes, redact fields, etc. - `SpanRetention`: declares a retention dependency between sibling spans in a batch. Returning `Some(parent_span_id)` from `parent_dependency(span)` means "keep this span iff that span is kept by base rules" — expressing the parent/child relationship directly instead of hardcoding it. `TaskHistoryExporter` now stores `Vec<Arc<dyn SpanTransform>>` and `Vec<Arc<dyn SpanRetention>>` with `.with_transform` / `.with_retention` builder methods. The exporter no longer mentions any specific span type — base retention checks PLAN_CAPTURE_LABEL and `min_sql_duration_ms`, retention rules then override per-span. Concrete consumer: `BallistaStageMiddleware` in `crates/runtime/src/datafusion/query/stage_history.rs` implements both traits: - As a `SpanTransform`, it rewrites `start_time` / `end_time` on `ballista_stage` spans using the `stage_started_at` / `stage_ended_at` attributes (millis since UNIX epoch), so the task_history row reflects the actual stage execution window. This is what makes per-stage execution visible on the timeline view. - As a `SpanRetention`, it declares that a `ballista_stage` row depends on its parent `sql_query` row, replacing the prior "if task.as_ref() == ballista_stage" branch in the exporter. `BallistaStageMiddleware::pair()` returns a single instance reused as both trait objects so callers register one Arc in two slots; the three exporter construction sites (production tracing, generic test util, distributed_task_history test) all use it. Includes unit tests covering both hooks: time override on the matching span name, no-op on non-matching names and missing/inverted/zero attributes; retention reports the parent dependency for stage spans and abstains for non-stage spans and orphan stages. * feat(observability): propagate request trace context across distributed query boundaries (spiceai#10896) Issue spiceai#10202. Distributed query executors no longer create a fresh internal request context; they inherit the originating request's protocol and W3C trace ids so executor-side metrics, telemetry dimensions, and any future task_history rows correlate end-to-end with the scheduler. * New `SpiceRequestContextConfig` (`spice_ctx` prefix) carries `protocol`, `trace_id`, and `span_id` through DataFusion's `ConfigExtension` mechanism — round-tripped opaquely by Ballista as `TaskDefinition` config props, with zero fork changes. * Registered as a default option extension at all 3 config_producer / session_builder sites in `runtime/src/cluster/mod.rs`. The scheduler session_builder now reads any `SpiceRequestContextConfig` set on the per-job session config and re-injects it on the built session config (it previously ignored its `_cfg` argument). * `Query::submit_distributed_internal` populates the extension from the current `RequestContext` before `create_or_update_session`. * `resolve_request_context(TaskContext)` shared helper establishes the canonical lookup order: typed `Arc<RequestContext>` extension → config extension → `Protocol::Internal` fallback (or `None`). * `BytesProcessedExec::execute` switched to the helper. Same panic behavior preserved when no context is found and `fallback_to_new_context` is off. * `FlightSqlExec` gains an optional `trace_parent: Option<String>` field with a `with_trace_parent` builder. `execute()` sets it as a `traceparent` gRPC metadata header (via `FlightSqlServiceClient::set_header`), falling back to formatting from the typed `RequestContext` extension on the `TaskContext` session config when not preset. `PartialAggregationFlightSqlExec` inherits and forwards on its own `execute()`. * W3C span semantics preserved: the propagated `span_id` is the sender's current span and becomes the parent of any new spans the receiver creates — task_history's existing `parent_span_id` chain extends naturally one level per executor hop. Tests: 4 config extension roundtrip + 4 resolver lookup-order + 1 `BytesProcessedExec` integration test asserting metric dimensions carry the correct protocol when only the config extension is present. * chore(deps): bump datafusion-ballista to c25e25b9 (spiceai-52.5) * fix(flightsql): propagate traceparent in pushed-down aggregate exec PartialAggregationFlightSqlExec was sending no traceparent header when its source FlightSqlExec was built by the aggregate-pushdown optimizer (which leaves trace_parent unset). Fall back to TaskContext like the non-pushed FlightSqlExec path does, so pushed-down aggregate calls preserve the same propagation behavior. Also picks up fmt drift in two trace-context files merged via spiceai#10896.
pull Bot
pushed a commit
that referenced
this pull request
May 18, 2026
* fix: Update tpch benchmark snapshots for federated/file[parquet].yaml * fix: Update tpch benchmark snapshots for federated/mssql.yaml * fix: Update tpch benchmark snapshots for federated/glue[csv].yaml * fix: Update tpch benchmark snapshots for federated/oracle.yaml * fix: Update tpch benchmark snapshots for federated/iceberg[catalog].yaml * fix: Update tpch benchmark snapshots for federated/abfs_standard_versioned[parquet].yaml * fix: Update tpch benchmark snapshots for federated/spicecloud[catalog].yaml * fix: Update tpch benchmark snapshots for federated/iceberg[hadoop].yaml * fix: Update tpch benchmark snapshots for federated/mssql[catalog].yaml * fix: Update tpch benchmark snapshots for federated/glue[parquet].yaml * fix: Update tpch benchmark snapshots for federated/dynamodb.yaml * fix: Update tpch benchmark snapshots for federated/glue[catalog].yaml * fix: Update tpch benchmark snapshots for federated/s3[parquet].yaml * fix: Update tpch benchmark snapshots for federated/oracle[catalog].yaml * fix: Update tpch benchmark snapshots for accelerated/s3[parquet]-duckdb[file]-partitioned.yaml * fix: Update tpch benchmark snapshots for accelerated/file[parquet]-arrow.yaml * fix: Update tpch benchmark snapshots for accelerated/dynamodb-arrow.yaml * fix: Update tpch benchmark snapshots for federated/scylladb.yaml * fix: Update tpch benchmark snapshots for accelerated/indexes/file[parquet]-cayenne[file]-indexes.yaml * fix: Update tpch benchmark snapshots for accelerated/indexes/file[parquet]-arrow-indexes.yaml * fix: Update tpch benchmark snapshots for accelerated/file[parquet]-cayenne[file]turso.yaml * fix: Update tpch benchmark snapshots for accelerated/s3[parquet]-arrow-partitioned.yaml * fix: Update tpch benchmark snapshots for accelerated/dynamodb-duckdb[file].yaml * fix: Update tpch benchmark snapshots for accelerated/on_zero_results/file[parquet]-duckdb[file]-on_zero_results.yaml * fix: Update tpch benchmark snapshots for accelerated/on_zero_results/file[parquet]-duckdb[memory]-on_zero_results.yaml * fix: Update tpch benchmark snapshots for accelerated/on_zero_results/file[parquet]-cayenne[file]-on_zero_results.yaml * fix: Update tpch benchmark snapshots for accelerated/s3[parquet]-cayenne[file]-partitioned.yaml * fix: Update tpch benchmark snapshots for accelerated/s3[parquet]-turso[file].yaml * fix: Update tpch benchmark snapshots for accelerated/indexes/file[parquet]-turso[file]-indexes.yaml * fix: Update tpch benchmark snapshots for accelerated/spicecloud-arrow.yaml * fix: Update tpch benchmark snapshots for accelerated/file[parquet]-cayenne[file].yaml * feat(task-history): record Ballista stages for distributed queries (spiceai#10831) * feat(task-history): capture distributed query observability For distributed Ballista queries, the task_history table now records the full parent + per-stage tree: sql_query (parent) duration, error, datasets, summary labels └── ballista_stage (child) stage_id, executors, task_count, stage timing, plan tree Previously the task_history span was created in submit_distributed_internal but never instrumented across the job's lifetime — it dropped at submission time with duration ≈ 0 and no error_message. Stage/executor detail was not recorded at all. Implementation: - QueryHandle owns the sql_query span behind Arc<Mutex<Option<Span>>>. spawn_finalize *takes* it (rather than cloning), so the OTel span closes when the spawned finalize future drops its instrumented clone — capturing the query's runtime, not arbitrary post-completion handle lifetime. - New stage_history module walks the in-process ExecutionGraph at job completion and emits one child ballista_stage span per stage via the existing OTel pipeline. Stage labels include executor_histogram, slowest_task_ms, partitions, attempt_num, total_executor_ms. Stage input is the plan rendered with ExplainFormat::Tree. - QueryHandle::cancel() now finalizes the tracker with JobCancelled before returning, so callers that cancel() then drop the handle record an accurate cancellation row instead of the Drop guard's client-disconnected fallback. - Drop guard finalizes orphaned handles: cancels the cancel_token sync, then spawns a finalize task that also calls scheduler.cancel_job so executors don't keep running an unobserved query. - Errors raised before QueryHandle creation (planning/validation/submission) emit tracing::error on the parent span so the row's error_message is populated (mirrors the sync run_internal path). Tests use a process-wide AsyncMutex (TEST_LOCK) to serialize against shared DF_SLOT, condition-driven wait_for_row_count polling instead of fixed sleeps, and parent.job_id-scoped assertions so other tests' task_history rows in the same binary can't pollute our row counts. Requires the matching Ballista fork PR (#38, merged onto spiceai-52.5 as 8afc1b74a605) which persists executor_id on terminal TaskInfo and bumps get_job_execution_graph to pub. * Bump ballista to pick up stuck-query detection (spiceai/datafusion-ballista#39) Updates the ballista pin from 8afc1b74 to 7e9872a5 (the merge commit of spiceai/datafusion-ballista#39) so the next cluster bench run picks up the new operator-facing warning when a distributed query stops making progress. Surgical lockfile update: only the three ballista source URLs change. Vortex stays pinned at the spiceai-52 commit (c536c9ae) the workspace was already using; without this, running cargo update -p ballista-* re-resolves the branch-tracked vortex transitive and bumps it to a newer rev whose arrow-rs version conflicts with the pinned datafusion-table-providers (CI symptom: no method as_list on dyn Array, Arc<Schema>: From<Schema> not satisfied, in adbc compilation). Verified: cargo check -p runtime --offline and cargo check -p spiced --features adbc --offline both clean. * feat(task-history): pluggable span middleware + Ballista stage timeline rendering Introduces two trait-based extension points on the task_history OTel exporter and removes the hardcoded `ballista_stage` branches that lived in the exporter: - `SpanTransform`: mutates a `SpanData` before conversion to a row. Implementors can adjust timestamps, inject attributes, redact fields, etc. - `SpanRetention`: declares a retention dependency between sibling spans in a batch. Returning `Some(parent_span_id)` from `parent_dependency(span)` means "keep this span iff that span is kept by base rules" — expressing the parent/child relationship directly instead of hardcoding it. `TaskHistoryExporter` now stores `Vec<Arc<dyn SpanTransform>>` and `Vec<Arc<dyn SpanRetention>>` with `.with_transform` / `.with_retention` builder methods. The exporter no longer mentions any specific span type — base retention checks PLAN_CAPTURE_LABEL and `min_sql_duration_ms`, retention rules then override per-span. Concrete consumer: `BallistaStageMiddleware` in `crates/runtime/src/datafusion/query/stage_history.rs` implements both traits: - As a `SpanTransform`, it rewrites `start_time` / `end_time` on `ballista_stage` spans using the `stage_started_at` / `stage_ended_at` attributes (millis since UNIX epoch), so the task_history row reflects the actual stage execution window. This is what makes per-stage execution visible on the timeline view. - As a `SpanRetention`, it declares that a `ballista_stage` row depends on its parent `sql_query` row, replacing the prior "if task.as_ref() == ballista_stage" branch in the exporter. `BallistaStageMiddleware::pair()` returns a single instance reused as both trait objects so callers register one Arc in two slots; the three exporter construction sites (production tracing, generic test util, distributed_task_history test) all use it. Includes unit tests covering both hooks: time override on the matching span name, no-op on non-matching names and missing/inverted/zero attributes; retention reports the parent dependency for stage spans and abstains for non-stage spans and orphan stages. * feat(observability): propagate request trace context across distributed query boundaries (spiceai#10896) Issue spiceai#10202. Distributed query executors no longer create a fresh internal request context; they inherit the originating request's protocol and W3C trace ids so executor-side metrics, telemetry dimensions, and any future task_history rows correlate end-to-end with the scheduler. * New `SpiceRequestContextConfig` (`spice_ctx` prefix) carries `protocol`, `trace_id`, and `span_id` through DataFusion's `ConfigExtension` mechanism — round-tripped opaquely by Ballista as `TaskDefinition` config props, with zero fork changes. * Registered as a default option extension at all 3 config_producer / session_builder sites in `runtime/src/cluster/mod.rs`. The scheduler session_builder now reads any `SpiceRequestContextConfig` set on the per-job session config and re-injects it on the built session config (it previously ignored its `_cfg` argument). * `Query::submit_distributed_internal` populates the extension from the current `RequestContext` before `create_or_update_session`. * `resolve_request_context(TaskContext)` shared helper establishes the canonical lookup order: typed `Arc<RequestContext>` extension → config extension → `Protocol::Internal` fallback (or `None`). * `BytesProcessedExec::execute` switched to the helper. Same panic behavior preserved when no context is found and `fallback_to_new_context` is off. * `FlightSqlExec` gains an optional `trace_parent: Option<String>` field with a `with_trace_parent` builder. `execute()` sets it as a `traceparent` gRPC metadata header (via `FlightSqlServiceClient::set_header`), falling back to formatting from the typed `RequestContext` extension on the `TaskContext` session config when not preset. `PartialAggregationFlightSqlExec` inherits and forwards on its own `execute()`. * W3C span semantics preserved: the propagated `span_id` is the sender's current span and becomes the parent of any new spans the receiver creates — task_history's existing `parent_span_id` chain extends naturally one level per executor hop. Tests: 4 config extension roundtrip + 4 resolver lookup-order + 1 `BytesProcessedExec` integration test asserting metric dimensions carry the correct protocol when only the config extension is present. * chore(deps): bump datafusion-ballista to c25e25b9 (spiceai-52.5) * fix(flightsql): propagate traceparent in pushed-down aggregate exec PartialAggregationFlightSqlExec was sending no traceparent header when its source FlightSqlExec was built by the aggregate-pushdown optimizer (which leaves trace_parent unset). Fall back to TaskContext like the non-pushed FlightSqlExec path does, so pushed-down aggregate calls preserve the same propagation behavior. Also picks up fmt drift in two trace-context files merged via spiceai#10896. * Revert "fix(tests): stabilize flaky SQL search snapshots with score normalization (spiceai#10585)" (spiceai#10891) This reverts commit efc8d00. * Add '#[deny(clippy::missing_trait_methods)]' to wrapper/delegation trait impls (spiceai#10795) * Add '#[deny(clippy::missing_trait_methods)]' to wrapper/delegation trait impls * implement missing * fix: implement missing delegation methods caught by missing_trait_methods lint * fix: rustfmt get_logical_plan signature in indexed.rs * fix: implement missing TableProvider delegation methods in PartitionTableProvider * fix: correct ScanResult return type and IndexedMemTable::scan_with_args delegation - runtime-table-partition/src/provider.rs:490: use datafusion::error::Result instead of bare Result<T> (no type alias in scope; std::result::Result requires two generic args, causing E0107 compile error on all CI jobs) - data_components/src/arrow/indexed.rs: unpack ScanArgs and call self.scan() instead of forwarding to self.inner.scan_with_args(), preserving the index fast-path logic that IndexedMemTable::scan implements * fix: replace redundant closure with method reference (clippy pedantic) * fix: replace redundant closure with method reference in scan_with_args * fix: add missing trait method forwarding impls for #[deny(clippy::missing_trait_methods)] Adds explicit implementations for all provided/default trait methods across wrapper and stub types so the deny attribute compiles cleanly: TableProvider wrappers (delegate to inner): - AcceleratedTable: get_table_definition, get_logical_plan, get_column_default, scan_with_args, statistics, truncate - UpsertDedupTableProvider: same set - EmbeddingTable: scan_with_args, truncate - LocationPruningListingTable: get_logical_plan, get_column_default, scan_with_args, statistics, insert_into, delete_from TableProvider with no inner (None / NotImplemented stubs): - SwappableTableProvider: get_table_definition/get_logical_plan/get_column_default return None (can't borrow from current() temporary Arc); scan_with_args and truncate delegate via current() - DatasetTableProvider: all missing methods return Internal errors or None - RerankUDTFProvider: metadata returns None; DML returns NotImplemented - ReciprocalRankFusion: same pattern CatalogProvider wrapper: - RefreshingCatalogProvider: register_schema, deregister_schema forwarded to inner DataConnector wrappers: - DeferredConnector: all missing methods added; stream methods return false/None (connector is a pre-initialization placeholder); setup/metrics methods delegate to self.inner - EmbeddingConnector: resolve_refresh_mode, initialization_for_dataset - FullTextConnector: initialization_for_dataset Embed wrapper: - TaskEmbed: cache, get_cached_embed, put_cached_embed delegated to self.inner * fix: remove #[deny(clippy::missing_trait_methods)] from AcceleratedTable, DeferredConnector, DatasetTableProvider * linting * fix: implement missing TableProvider trait methods for LocationPruningListingTable and EmbeddingTable - Add and to LocationPruningListingTable, delegating to inner - Add and to EmbeddingTable, delegating to base_table - Add missing imports: std::borrow::Cow and datafusion::logical_expr::LogicalPlan * fix AcceleratedTable * comments * clippy * remove nasa --------- Co-authored-by: Jeadie <jeadie@users.noreply.github.com> * fix: Update tpcds benchmark snapshots for federated/file[parquet].yaml * fix: Update tpcds benchmark snapshots for federated/abfs[parquet].yaml * fix: Update tpcds benchmark snapshots for federated/databricks[delta_lake].yaml * fix: Update tpcds benchmark snapshots for federated/s3[parquet].yaml * fix: Update tpcds benchmark snapshots for accelerated/file[parquet]-arrow.yaml * fix(cayenne): fix Vortex panic on highly compressible data (spiceai#10855) * fix(cayenne): fix Vortex panic on highly compressible columns * Update SHA * Add test * Add test * Fix lint --------- Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com> * fix: Update tpcds benchmark snapshots for accelerated/s3[parquet]-arrow-partitioned.yaml * fix: Update tpcds benchmark snapshots for accelerated/on_zero_results/file[parquet]-duckdb[file]-on_zero_results.yaml * fix: Update tpcds benchmark snapshots for accelerated/on_zero_results/file[parquet]-duckdb[memory]-on_zero_results.yaml * fix: Update tpcds benchmark snapshots for accelerated/on_zero_results/file[parquet]-cayenne[file]-on_zero_results.yaml * fix: Update tpcds benchmark snapshots for accelerated/databricks[delta_lake]-arrow.yaml * fix: Update tpcds benchmark snapshots for accelerated/s3[parquet]-cayenne[file].yaml * fix: Update tpcds benchmark snapshots for accelerated/spicecloud-arrow.yaml * fix: Update tpcds benchmark snapshots for accelerated/file[parquet]-cayenne[file].yaml * fix: Update tpcds benchmark snapshots for accelerated/s3[parquet]-arrow.yaml * fix: Update tpcds benchmark snapshots for accelerated/postgres-arrow.yaml * fix: Update clickbench benchmark snapshots for accelerated/s3[parquet]-turso[file].yaml * fix: Update tpch benchmark snapshots for accelerated/s3[parquet]-arrow.yaml * fix: Update tpch benchmark snapshots for accelerated/s3[parquet]-cayenne[file].yaml * fix: Update tpcds benchmark snapshots for accelerated/s3[parquet]-duckdb[file].yaml * fix: Update tpch benchmark snapshots for accelerated/mysql-arrow.yaml * fix: Update tpch benchmark snapshots for accelerated/postgres-arrow.yaml * Update cargo.lock * fix: Update tpch benchmark snapshots for accelerated/mongodb-arrow.yaml * fix: Update tpch benchmark snapshots for federated/mongodb.yaml --------- Co-authored-by: Spice Benchmark Snapshot Update Bot <spiceaibot@spice.ai> Co-authored-by: Phillip LeBlanc <879445+phillipleblanc@users.noreply.github.com> Co-authored-by: Jack Eadie <jack@spice.ai> Co-authored-by: Jeadie <jeadie@users.noreply.github.com> Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com> Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.1)
Can you help keep this open source service alive? 💖 Please sponsor : )