feat: add aggregate pushdown support via GetForeignUpperPaths#549
feat: add aggregate pushdown support via GetForeignUpperPaths#549JohnCari wants to merge 11 commits into
Conversation
Add support for pushing aggregate operations (COUNT, SUM, AVG, MIN, MAX) to foreign data sources via the GetForeignUpperPaths callback. New types: - AggregateKind enum: Represents supported aggregate functions - Aggregate struct: Contains aggregate operation details New ForeignDataWrapper trait methods: - supported_aggregates(): Declare which aggregates the FDW can push down - supports_group_by(): Declare GROUP BY pushdown support - get_aggregate_rel_size(): Cost estimation for aggregate queries - begin_aggregate_scan(): Initialize aggregate query execution New module: - upper.rs: Implements GetForeignUpperPaths callback for aggregate pushdown All new methods have default implementations for backward compatibility.
- Add debug2! logging throughout upper.rs for better observability: - HAVING clause rejection - Unsupported aggregate function names - DISTINCT modifier processing - Extracted aggregates list - GROUP BY columns - Cost estimation values - Update docs/contributing/native.md with aggregate trait methods - Add comprehensive Aggregate Pushdown section to docs/guides/query-pushdown.md
|
Thanks for the PR! Could you format the code so CI can pass? |
|
Yes ofcourse, I will do it at night when I get home and thank you! @burmecia |
|
@burmecia working on this now |
Apply rustfmt formatting to satisfy CI checks: - Break long lines in debug2! macro calls - Adjust line formatting in interface.rs
…er_path - Replace direct List field access with pgrx::PgList::from_pg() for proper iteration - Fix create_foreign_upper_path signature for different PostgreSQL versions: - PG13-16: 9 parameters (no disabled_nodes, no fdw_restrictinfo) - PG17: 10 parameters (added fdw_restrictinfo) - PG18: 11 parameters (added disabled_nodes) - Use conditional compilation (#[cfg(feature = "...")]) for version-specific parameters
Replace pgrx::PgList usage with direct pg_sys::List element access. This avoids type availability issues during cargo test since PgList may not be exported at the pgrx crate level in all configurations. The new list_iter helper function iterates over List elements using raw pointer access to the elements array, which is always available as part of the pg_sys FFI bindings.
- Change type_oid: 0 to type_oid: pgrx::pg_sys::Oid::INVALID in Aggregate::deparse doctest - Change trait method doc examples to rust,ignore since they show implementation patterns with &self parameters that don't compile as standalone functions
|
@burmecia I finished formatting the code, it was harder and took me longer than i expected but finally finished, now all seven checks pass, let me know if you need anything else for this PR. |
|
Hi @burmecia, I sent you a LinkedIn connection request with a note about a question on current Supabase job opportunities. I'd love to work at Supabase—it's a great company with a great product. www.linkedin.com/in/johnuel-carrion |
| ptr::null_mut(), // fdw_outerpath | ||
| #[cfg(any(feature = "pg17", feature = "pg18"))] | ||
| ptr::null_mut(), // fdw_restrictinfo (pg17+ only) | ||
| ptr::null_mut(), // fdw_private |
There was a problem hiding this comment.
the fdw_private is passed null pointer, is this correct?
There was a problem hiding this comment.
Also the extracted aggregates and group_by vectors are computed and then discarded, is this intentional?
Based on PR supabase#549 by JohnCari with critical fix: aggregates and group_by are now stored in FdwState and passed through output_rel->fdw_private to the executor. Previously fdw_private was null, so extracted aggregates were silently discarded. Changes: - Add AggregateKind enum, Aggregate struct with deparse() methods - Add FDW trait methods: supported_aggregates(), supports_group_by(), get_aggregate_rel_size(), begin_aggregate_scan() - Add GetForeignUpperPaths callback in new upper.rs module - Add aggregates/group_by fields to FdwState in scan.rs - begin_foreign_scan detects aggregate path and calls begin_aggregate_scan - EXPLAIN output includes aggregate info when present - PG version compat: PG13-PG18 (disabled_nodes, fdw_restrictinfo)
Based on PR supabase#549 by JohnCari with critical fix: aggregates and group_by are now stored in FdwState and passed through output_rel->fdw_private to the executor. Previously fdw_private was null, so extracted aggregates were silently discarded. Changes: - Add AggregateKind enum, Aggregate struct with deparse() methods - Add FDW trait methods: supported_aggregates(), supports_group_by(), get_aggregate_rel_size(), begin_aggregate_scan() - Add GetForeignUpperPaths callback in new upper.rs module - Add aggregates/group_by fields to FdwState in scan.rs - begin_foreign_scan detects aggregate path and calls begin_aggregate_scan - EXPLAIN output includes aggregate info when present - PG version compat: PG13-PG18 (disabled_nodes, fdw_restrictinfo)
Based on PR supabase#549 by JohnCari with critical fix: aggregates and group_by are now stored in FdwState and passed through output_rel->fdw_private to the executor. Previously fdw_private was null, so extracted aggregates were silently discarded. Changes: - Add AggregateKind enum, Aggregate struct with deparse() methods - Add FDW trait methods: supported_aggregates(), supports_group_by(), get_aggregate_rel_size(), begin_aggregate_scan() - Add GetForeignUpperPaths callback in new upper.rs module - Add aggregates/group_by fields to FdwState in scan.rs - begin_foreign_scan detects aggregate path and calls begin_aggregate_scan - EXPLAIN output includes aggregate info when present - PG version compat: PG13-PG18 (disabled_nodes, fdw_restrictinfo)
* feat: add aggregate pushdown support (COUNT/SUM/AVG/MIN/MAX) Based on PR #549 by JohnCari with critical fix: aggregates and group_by are now stored in FdwState and passed through output_rel->fdw_private to the executor. Previously fdw_private was null, so extracted aggregates were silently discarded. Changes: - Add AggregateKind enum, Aggregate struct with deparse() methods - Add FDW trait methods: supported_aggregates(), supports_group_by(), get_aggregate_rel_size(), begin_aggregate_scan() - Add GetForeignUpperPaths callback in new upper.rs module - Add aggregates/group_by fields to FdwState in scan.rs - begin_foreign_scan detects aggregate path and calls begin_aggregate_scan - EXPLAIN output includes aggregate info when present - PG version compat: PG13-PG18 (disabled_nodes, fdw_restrictinfo) * fix: address CodeRabbit review — validate aggregates and GROUP BY 1. Reject pushdown when SUM/AVG/MIN/MAX has no simple column reference (e.g. SUM(a+b)) to avoid generating invalid SQL like SUM(*) 2. Abort pushdown when any GROUP BY item is not a plain column reference to prevent incomplete GROUP BY clauses 3. Rebuild state.tgts for aggregate paths to reflect the actual output shape (group_by columns + aggregate result columns), preventing ERRCODE_FDW_INVALID_COLUMN_NUMBER from the executor * fix: guard upper module import with PG feature cfg The upper module is conditionally compiled with pg13-pg18 feature flags, but fdw_routine() imported it unconditionally, causing build failures without PG features. Wrap the import and GetForeignUpperPaths assignment with the same cfg guard. * fix: address CodeRabbit round 2 — aggfilter, aggtype, default warning 1. Reject pushdown for aggregates with FILTER clause (aggfilter check) 2. Add type_oid field to Aggregate struct, populated from Aggref::aggtype instead of inferring from the input column type 3. Default begin_aggregate_scan() now emits a warning if called without an override, to surface missing implementations early * fix: fail fast in default begin_aggregate_scan instead of warning Use report_error to abort the transaction when begin_aggregate_scan is called without being overridden. This prevents wrong-results failures when an FDW declares aggregate support but forgets to implement the scan method. * feat: implement aggregate pushdown support for ClickHouse FDW * feat: add aggregate pushdown support for BigQuery and SQL Server FDWs * fix: update explain statements to use EXPLAIN ANALYZE for pushdown tests --------- Co-authored-by: Bo Lu <lv.patrick@gmail.com>
|
This PR has been replaced by #586 . |
Summary
This PR adds support for pushing aggregate operations (COUNT, SUM, AVG, MIN, MAX) to foreign data sources via the PostgreSQL
GetForeignUpperPathscallback. This enables FDWs to execute aggregate queries remotely, significantly reducing data transfer for analytics queries.Key Changes
New types in
interface.rs:AggregateKindenum: Represents supported aggregate functions (Count, CountColumn, Sum, Avg, Min, Max)Aggregatestruct: Contains aggregate operation details including kind, column, distinct flag, and aliassql_name(),deparse(),deparse_with_alias()for SQL generationNew ForeignDataWrapper trait methods:
supported_aggregates(): Declare which aggregates the FDW can push down (default: empty = no pushdown)supports_group_by(): Declare GROUP BY pushdown support (default: false)get_aggregate_rel_size(): Cost estimation for aggregate queriesbegin_aggregate_scan(): Initialize aggregate query executionNew module
upper.rs:GetForeignUpperPathscallback for aggregate pushdownGroupPathExtraDataBackward Compatibility
All new trait methods have default implementations, ensuring existing FDWs continue to work without modification. FDWs opt-in to aggregate pushdown by overriding
supported_aggregates()to return a non-empty vector.Example Usage
Related Issue
Closes #22
Test Plan
SELECT MIN(x), MAX(x), COUNT(*) FROM t)