feat: Add EXPLAIN and EXPLAIN ANALYZE support to federated queries#62
feat: Add EXPLAIN and EXPLAIN ANALYZE support to federated queries#62lukekim wants to merge 12 commits into
Conversation
When federating EXPLAIN or EXPLAIN ANALYZE queries, the unparser now correctly prefixes the unparsed federated query with EXPLAIN or EXPLAIN ANALYZE respectively. This ensures that remote databases receive the full query intent including the EXPLAIN directive. Changes: - Detect LogicalPlan::Explain variant in VirtualExecutionPlan::sql() - Extract the inner plan from EXPLAIN and apply table rewrites - Prefix the unparsed SQL with 'EXPLAIN ' or 'EXPLAIN ANALYZE ' - Add test cases for both EXPLAIN and EXPLAIN ANALYZE queries
When federating EXPLAIN or EXPLAIN ANALYZE queries, the SQL generator now correctly prefixes the unparsed federated query with EXPLAIN or EXPLAIN ANALYZE respectively. This ensures that remote databases receive the full query intent including the EXPLAIN directive. Changes: - Import Explain from datafusion::logical_expr - Detect LogicalPlan::Explain variant in VirtualExecutionPlan::final_sql() - Extract the inner plan from EXPLAIN and apply all rewrites and optimizations - Prefix the final SQL with 'EXPLAIN ' or 'EXPLAIN ANALYZE ' based on the analyze flag - Maintain all existing table rewrites, logical optimizations, and AST transformations
There was a problem hiding this comment.
Pull request overview
Adds support for federating EXPLAIN and EXPLAIN ANALYZE by ensuring the remote SQL sent by the federation execution plan preserves the EXPLAIN directive while still applying table-reference rewrites.
Changes:
- Detect
LogicalPlan::Explainand generate remote SQL with anEXPLAIN/EXPLAIN ANALYZEprefix. - Apply existing table-scan rewrites to the inner plan before unparsing to SQL.
- Add EXPLAIN-related tests (currently placed under
sources/sql).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
datafusion-federation/src/sql/mod.rs |
Updates VirtualExecutionPlan::final_sql() to unwrap Explain, rewrite/unparse the inner plan, then prefix EXPLAIN or EXPLAIN ANALYZE. |
sources/sql/src/lib.rs |
Introduces a large SQL federation implementation copy with EXPLAIN prefixing and tests (not currently wired into the workspace build). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
When federating EXPLAIN or EXPLAIN ANALYZE queries, the SQL generator now correctly prefixes the unparsed federated query with EXPLAIN or EXPLAIN ANALYZE respectively. This ensures that remote databases receive the full query intent including the EXPLAIN directive. Changes: - Detect LogicalPlan::Explain and LogicalPlan::Analyze variants in VirtualExecutionPlan::final_sql() - Extract the inner plan and apply all rewrites and optimizations - Prefix the final SQL with 'EXPLAIN ' or 'EXPLAIN ANALYZE ' - Extract common rewrite logic into rewrite_plan_to_sql() helper - Add test cases for both EXPLAIN and EXPLAIN ANALYZE queries - Remove unused sources/sql/src/lib.rs (not wired into workspace)
# Conflicts: # datafusion-federation/src/analyzer/mod.rs # datafusion-federation/src/sql/analyzer.rs # datafusion-federation/src/sql/analyzer/rewrite_table_scan.rs
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let logical_plan = Self::annotate_query_directives(logical_plan)?; | ||
|
|
||
| let physical_planner = | ||
| DefaultPhysicalPlanner::with_extension_planners(vec![ | ||
| Arc::new(FederatedPlanner::new()), | ||
| ]); | ||
| physical_planner | ||
| .create_physical_plan(logical_plan, session_state) | ||
| .create_physical_plan(&logical_plan, session_state) | ||
| .await |
There was a problem hiding this comment.
FederatedQueryPlanner::create_physical_plan now always clones the entire LogicalPlan because annotate_query_directives returns plan.clone() for the non-EXPLAIN/ANALYZE case. This adds an avoidable allocation/copy on every query. Consider only calling annotate_query_directives (and only allocating a new plan) when the root is LogicalPlan::Explain/LogicalPlan::Analyze; otherwise pass the original logical_plan through to create_physical_plan.
| fn analyze_plan_recursively( | ||
| &self, | ||
| plan: &LogicalPlan, | ||
| is_root: bool, | ||
| config: &ConfigOptions, | ||
| providers: &HashMap<TableReference, Arc<dyn FederationProvider>>, | ||
| explain_context: Option<LogicalPlan>, | ||
| ) -> Result<(Option<LogicalPlan>, ScanResult)> { | ||
| let explain_context = explain_context.or_else(|| Self::explain_context_template(plan)); | ||
| let mut sole_provider: ScanResult = ScanResult::None; | ||
|
|
||
| if let LogicalPlan::Extension(Extension { ref node }) = plan { | ||
| if node.name() == "Federated" { | ||
| // Avoid attempting double federation | ||
| return Ok((None, ScanResult::Ambiguous)); | ||
| } | ||
| } | ||
|
|
||
| // Check if this plan node is a leaf that determines the FederationProvider | ||
| let (leaf_provider, _) = get_leaf_provider(plan)?; | ||
|
|
||
| // Check if the expressions contain, a potentially different, FederationProvider | ||
| let exprs_result = self.scan_plan_exprs(plan, providers)?; | ||
|
|
||
| // Return early if this is a leaf and there is no ambiguity with the expressions. | ||
| if leaf_provider.is_some() && (exprs_result.is_none() || exprs_result == leaf_provider) { | ||
| return Ok((None, leaf_provider.into())); | ||
| } | ||
| // Aggregate leaf & expression providers | ||
| sole_provider.merge(leaf_provider); | ||
| sole_provider.merge(exprs_result.clone()); | ||
|
|
||
| let inputs = plan.inputs(); | ||
| // Return early if there are no sources. | ||
| if inputs.is_empty() && sole_provider.is_none() { | ||
| return Ok((None, ScanResult::None)); | ||
| } | ||
|
|
||
| // Recursively analyze inputs | ||
| let input_results = inputs | ||
| .iter() | ||
| .map(|i| self.analyze_plan_recursively(i, false, config, providers)) | ||
| .map(|i| { | ||
| self.analyze_plan_recursively(i, false, config, providers, explain_context.clone()) | ||
| }) |
There was a problem hiding this comment.
analyze_plan_recursively now threads explain_context as an owned Option<LogicalPlan> and repeatedly calls explain_context.clone() when recursing into inputs/expressions. For large plans, this can become a lot of deep clones of the same EXPLAIN/ANALYZE wrapper. Consider storing the wrapper as Option<Arc<LogicalPlan>> (cheap clones) or passing a shared reference down the recursion to avoid repeated full-plan cloning.
| fn final_sql(&self) -> Result<String> { | ||
| let plan = self.plan.clone(); | ||
| let sql = self.rewrite_plan_to_sql(self.plan.clone())?; | ||
| match self.query_type { |
There was a problem hiding this comment.
FederatedQueryType::Analyze currently causes federated execution to prefix the remote SQL with EXPLAIN ANALYZE. For most SQL engines, EXPLAIN ANALYZE returns a plan text result set (different schema) rather than the query’s original rows, which will likely break execution because VirtualExecutionPlan/SchemaCastScanExec still expects the inner query’s schema. Consider not prefixing remote SQL for Analyze (let DataFusion’s AnalyzeExec measure execution of the actual query), or adjust the federated execution path to expect/handle the EXPLAIN ANALYZE output schema explicitly.
| match self.query_type { | |
| match self.query_type { | |
| // For Analyze queries, do not prefix the remote SQL with EXPLAIN ANALYZE. | |
| // Let DataFusion's AnalyzeExec measure execution of the actual query instead. | |
| Some(FederatedQueryType::Analyze) => Ok(sql), |
When federating EXPLAIN or EXPLAIN ANALYZE queries, the unparser now correctly prefixes the unparsed federated query with EXPLAIN or EXPLAIN ANALYZE respectively. This ensures that remote databases receive the full query intent including the EXPLAIN directive.
Changes:
🗣 Description
🔨 Related Issues
🤔 Concerns