Skip to content

feat: Add EXPLAIN and EXPLAIN ANALYZE support to federated queries#62

Open
lukekim wants to merge 12 commits into
spiceai-52from
lukim/explain
Open

feat: Add EXPLAIN and EXPLAIN ANALYZE support to federated queries#62
lukekim wants to merge 12 commits into
spiceai-52from
lukim/explain

Conversation

@lukekim
Copy link
Copy Markdown

@lukekim lukekim commented Nov 15, 2025

When federating EXPLAIN or EXPLAIN ANALYZE queries, the unparser now correctly prefixes the unparsed federated query with EXPLAIN or EXPLAIN ANALYZE respectively. This ensures that remote databases receive the full query intent including the EXPLAIN directive.

Changes:

  • Detect LogicalPlan::Explain variant in VirtualExecutionPlan::sql()
  • Extract the inner plan from EXPLAIN and apply table rewrites
  • Prefix the unparsed SQL with 'EXPLAIN ' or 'EXPLAIN ANALYZE '
  • Add test cases for both EXPLAIN and EXPLAIN ANALYZE queries

🗣 Description

🔨 Related Issues

🤔 Concerns

When federating EXPLAIN or EXPLAIN ANALYZE queries, the unparser now
correctly prefixes the unparsed federated query with EXPLAIN or
EXPLAIN ANALYZE respectively. This ensures that remote databases
receive the full query intent including the EXPLAIN directive.

Changes:
- Detect LogicalPlan::Explain variant in VirtualExecutionPlan::sql()
- Extract the inner plan from EXPLAIN and apply table rewrites
- Prefix the unparsed SQL with 'EXPLAIN ' or 'EXPLAIN ANALYZE '
- Add test cases for both EXPLAIN and EXPLAIN ANALYZE queries
When federating EXPLAIN or EXPLAIN ANALYZE queries, the SQL generator now
correctly prefixes the unparsed federated query with EXPLAIN or
EXPLAIN ANALYZE respectively. This ensures that remote databases
receive the full query intent including the EXPLAIN directive.

Changes:
- Import Explain from datafusion::logical_expr
- Detect LogicalPlan::Explain variant in VirtualExecutionPlan::final_sql()
- Extract the inner plan from EXPLAIN and apply all rewrites and optimizations
- Prefix the final SQL with 'EXPLAIN ' or 'EXPLAIN ANALYZE ' based on the analyze flag
- Maintain all existing table rewrites, logical optimizations, and AST transformations
Copilot AI review requested due to automatic review settings March 27, 2026 21:19
@lukekim lukekim changed the base branch from spiceai-50 to spiceai-52 March 27, 2026 21:20
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for federating EXPLAIN and EXPLAIN ANALYZE by ensuring the remote SQL sent by the federation execution plan preserves the EXPLAIN directive while still applying table-reference rewrites.

Changes:

  • Detect LogicalPlan::Explain and generate remote SQL with an EXPLAIN / EXPLAIN ANALYZE prefix.
  • Apply existing table-scan rewrites to the inner plan before unparsing to SQL.
  • Add EXPLAIN-related tests (currently placed under sources/sql).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
datafusion-federation/src/sql/mod.rs Updates VirtualExecutionPlan::final_sql() to unwrap Explain, rewrite/unparse the inner plan, then prefix EXPLAIN or EXPLAIN ANALYZE.
sources/sql/src/lib.rs Introduces a large SQL federation implementation copy with EXPLAIN prefixing and tests (not currently wired into the workspace build).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread sources/sql/src/lib.rs Outdated
Comment thread datafusion-federation/src/sql/mod.rs Outdated
Comment thread datafusion-federation/src/sql/mod.rs Outdated
lukekim added 4 commits March 27, 2026 14:39
When federating EXPLAIN or EXPLAIN ANALYZE queries, the SQL generator now
correctly prefixes the unparsed federated query with EXPLAIN or
EXPLAIN ANALYZE respectively. This ensures that remote databases
receive the full query intent including the EXPLAIN directive.

Changes:
- Detect LogicalPlan::Explain and LogicalPlan::Analyze variants in
  VirtualExecutionPlan::final_sql()
- Extract the inner plan and apply all rewrites and optimizations
- Prefix the final SQL with 'EXPLAIN ' or 'EXPLAIN ANALYZE '
- Extract common rewrite logic into rewrite_plan_to_sql() helper
- Add test cases for both EXPLAIN and EXPLAIN ANALYZE queries
- Remove unused sources/sql/src/lib.rs (not wired into workspace)
# Conflicts:
#	datafusion-federation/src/analyzer/mod.rs
#	datafusion-federation/src/sql/analyzer.rs
#	datafusion-federation/src/sql/analyzer/rewrite_table_scan.rs
Copilot AI review requested due to automatic review settings March 28, 2026 19:59
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread datafusion-federation/src/sql/mod.rs
Copilot AI review requested due to automatic review settings March 28, 2026 20:24
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +188 to 196
let logical_plan = Self::annotate_query_directives(logical_plan)?;

let physical_planner =
DefaultPhysicalPlanner::with_extension_planners(vec![
Arc::new(FederatedPlanner::new()),
]);
physical_planner
.create_physical_plan(logical_plan, session_state)
.create_physical_plan(&logical_plan, session_state)
.await
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FederatedQueryPlanner::create_physical_plan now always clones the entire LogicalPlan because annotate_query_directives returns plan.clone() for the non-EXPLAIN/ANALYZE case. This adds an avoidable allocation/copy on every query. Consider only calling annotate_query_directives (and only allocating a new plan) when the root is LogicalPlan::Explain/LogicalPlan::Analyze; otherwise pass the original logical_plan through to create_physical_plan.

Copilot uses AI. Check for mistakes.
Comment on lines 203 to +246
fn analyze_plan_recursively(
&self,
plan: &LogicalPlan,
is_root: bool,
config: &ConfigOptions,
providers: &HashMap<TableReference, Arc<dyn FederationProvider>>,
explain_context: Option<LogicalPlan>,
) -> Result<(Option<LogicalPlan>, ScanResult)> {
let explain_context = explain_context.or_else(|| Self::explain_context_template(plan));
let mut sole_provider: ScanResult = ScanResult::None;

if let LogicalPlan::Extension(Extension { ref node }) = plan {
if node.name() == "Federated" {
// Avoid attempting double federation
return Ok((None, ScanResult::Ambiguous));
}
}

// Check if this plan node is a leaf that determines the FederationProvider
let (leaf_provider, _) = get_leaf_provider(plan)?;

// Check if the expressions contain, a potentially different, FederationProvider
let exprs_result = self.scan_plan_exprs(plan, providers)?;

// Return early if this is a leaf and there is no ambiguity with the expressions.
if leaf_provider.is_some() && (exprs_result.is_none() || exprs_result == leaf_provider) {
return Ok((None, leaf_provider.into()));
}
// Aggregate leaf & expression providers
sole_provider.merge(leaf_provider);
sole_provider.merge(exprs_result.clone());

let inputs = plan.inputs();
// Return early if there are no sources.
if inputs.is_empty() && sole_provider.is_none() {
return Ok((None, ScanResult::None));
}

// Recursively analyze inputs
let input_results = inputs
.iter()
.map(|i| self.analyze_plan_recursively(i, false, config, providers))
.map(|i| {
self.analyze_plan_recursively(i, false, config, providers, explain_context.clone())
})
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

analyze_plan_recursively now threads explain_context as an owned Option<LogicalPlan> and repeatedly calls explain_context.clone() when recursing into inputs/expressions. For large plans, this can become a lot of deep clones of the same EXPLAIN/ANALYZE wrapper. Consider storing the wrapper as Option<Arc<LogicalPlan>> (cheap clones) or passing a shared reference down the recursion to avoid repeated full-plan cloning.

Copilot uses AI. Check for mistakes.
fn final_sql(&self) -> Result<String> {
let plan = self.plan.clone();
let sql = self.rewrite_plan_to_sql(self.plan.clone())?;
match self.query_type {
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FederatedQueryType::Analyze currently causes federated execution to prefix the remote SQL with EXPLAIN ANALYZE. For most SQL engines, EXPLAIN ANALYZE returns a plan text result set (different schema) rather than the query’s original rows, which will likely break execution because VirtualExecutionPlan/SchemaCastScanExec still expects the inner query’s schema. Consider not prefixing remote SQL for Analyze (let DataFusion’s AnalyzeExec measure execution of the actual query), or adjust the federated execution path to expect/handle the EXPLAIN ANALYZE output schema explicitly.

Suggested change
match self.query_type {
match self.query_type {
// For Analyze queries, do not prefix the remote SQL with EXPLAIN ANALYZE.
// Let DataFusion's AnalyzeExec measure execution of the actual query instead.
Some(FederatedQueryType::Analyze) => Ok(sql),

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants