Skip to content

feat: add aggregate pushdown support via GetForeignUpperPaths#549

Closed
JohnCari wants to merge 11 commits into
supabase:mainfrom
JohnCari:feature/aggregate-pushdown
Closed

feat: add aggregate pushdown support via GetForeignUpperPaths#549
JohnCari wants to merge 11 commits into
supabase:mainfrom
JohnCari:feature/aggregate-pushdown

Conversation

@JohnCari

@JohnCari JohnCari commented Jan 3, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR adds support for pushing aggregate operations (COUNT, SUM, AVG, MIN, MAX) to foreign data sources via the PostgreSQL GetForeignUpperPaths callback. This enables FDWs to execute aggregate queries remotely, significantly reducing data transfer for analytics queries.

Key Changes

  • New types in interface.rs:

    • AggregateKind enum: Represents supported aggregate functions (Count, CountColumn, Sum, Avg, Min, Max)
    • Aggregate struct: Contains aggregate operation details including kind, column, distinct flag, and alias
    • Helper methods: sql_name(), deparse(), deparse_with_alias() for SQL generation
  • New ForeignDataWrapper trait methods:

    • supported_aggregates(): Declare which aggregates the FDW can push down (default: empty = no pushdown)
    • supports_group_by(): Declare GROUP BY pushdown support (default: false)
    • get_aggregate_rel_size(): Cost estimation for aggregate queries
    • begin_aggregate_scan(): Initialize aggregate query execution
  • New module upper.rs:

    • Implements GetForeignUpperPaths callback for aggregate pushdown
    • Extracts aggregate information from GroupPathExtraData
    • Extracts GROUP BY columns from query
    • Creates and registers foreign upper paths for aggregate queries

Backward Compatibility

All new trait methods have default implementations, ensuring existing FDWs continue to work without modification. FDWs opt-in to aggregate pushdown by overriding supported_aggregates() to return a non-empty vector.

Example Usage

impl ForeignDataWrapper<MyError> for MyFdw {
    fn supported_aggregates(&self) -> Vec<AggregateKind> {
        vec![
            AggregateKind::Count,
            AggregateKind::Sum,
            AggregateKind::Avg,
            AggregateKind::Min,
            AggregateKind::Max,
        ]
    }

    fn supports_group_by(&self) -> bool {
        true
    }

    fn begin_aggregate_scan(
        &mut self,
        aggregates: &[Aggregate],
        group_by: &[Column],
        quals: &[Qual],
        options: &HashMap<String, String>,
    ) -> Result<(), MyError> {
        // Build and execute remote aggregate query
        let select_items: Vec<String> = group_by.iter()
            .map(|c| c.name.clone())
            .chain(aggregates.iter().map(|a| a.deparse_with_alias()))
            .collect();
        // Execute query...
        Ok(())
    }
}

Related Issue

Closes #22

Test Plan

  • Verify existing FDWs without aggregate support continue to work unchanged
  • Test COUNT(*) pushdown with a simple FDW implementation
  • Test GROUP BY pushdown with multiple columns
  • Test multiple aggregates in single query (e.g., SELECT MIN(x), MAX(x), COUNT(*) FROM t)
  • Verify HAVING clause correctly prevents pushdown (not supported)
  • Verify unsupported aggregates correctly fall back to local execution
  • Test EXPLAIN output shows aggregate pushdown when applicable

Add support for pushing aggregate operations (COUNT, SUM, AVG, MIN, MAX)
to foreign data sources via the GetForeignUpperPaths callback.

New types:
- AggregateKind enum: Represents supported aggregate functions
- Aggregate struct: Contains aggregate operation details

New ForeignDataWrapper trait methods:
- supported_aggregates(): Declare which aggregates the FDW can push down
- supports_group_by(): Declare GROUP BY pushdown support
- get_aggregate_rel_size(): Cost estimation for aggregate queries
- begin_aggregate_scan(): Initialize aggregate query execution

New module:
- upper.rs: Implements GetForeignUpperPaths callback for aggregate pushdown

All new methods have default implementations for backward compatibility.
- Add debug2! logging throughout upper.rs for better observability:
  - HAVING clause rejection
  - Unsupported aggregate function names
  - DISTINCT modifier processing
  - Extracted aggregates list
  - GROUP BY columns
  - Cost estimation values

- Update docs/contributing/native.md with aggregate trait methods
- Add comprehensive Aggregate Pushdown section to docs/guides/query-pushdown.md
@burmecia

burmecia commented Jan 5, 2026

Copy link
Copy Markdown
Member

Thanks for the PR! Could you format the code so CI can pass?

@JohnCari

JohnCari commented Jan 5, 2026

Copy link
Copy Markdown
Contributor Author

Yes ofcourse, I will do it at night when I get home and thank you! @burmecia

@JohnCari

JohnCari commented Jan 8, 2026

Copy link
Copy Markdown
Contributor Author

@burmecia working on this now

Apply rustfmt formatting to satisfy CI checks:
- Break long lines in debug2! macro calls
- Adjust line formatting in interface.rs
…er_path

- Replace direct List field access with pgrx::PgList::from_pg() for proper iteration
- Fix create_foreign_upper_path signature for different PostgreSQL versions:
  - PG13-16: 9 parameters (no disabled_nodes, no fdw_restrictinfo)
  - PG17: 10 parameters (added fdw_restrictinfo)
  - PG18: 11 parameters (added disabled_nodes)
- Use conditional compilation (#[cfg(feature = "...")]) for version-specific parameters
Replace pgrx::PgList usage with direct pg_sys::List element access.
This avoids type availability issues during cargo test since PgList
may not be exported at the pgrx crate level in all configurations.

The new list_iter helper function iterates over List elements using
raw pointer access to the elements array, which is always available
as part of the pg_sys FFI bindings.
- Change type_oid: 0 to type_oid: pgrx::pg_sys::Oid::INVALID in Aggregate::deparse doctest
- Change trait method doc examples to rust,ignore since they show implementation patterns
  with &self parameters that don't compile as standalone functions
@JohnCari

JohnCari commented Jan 8, 2026

Copy link
Copy Markdown
Contributor Author

@burmecia I finished formatting the code, it was harder and took me longer than i expected but finally finished, now all seven checks pass, let me know if you need anything else for this PR.

@JohnCari

Copy link
Copy Markdown
Contributor Author

Hi @burmecia, I sent you a LinkedIn connection request with a note about a question on current Supabase job opportunities. I'd love to work at Supabase—it's a great company with a great product. www.linkedin.com/in/johnuel-carrion

@burmecia burmecia added the core label Feb 19, 2026
ptr::null_mut(), // fdw_outerpath
#[cfg(any(feature = "pg17", feature = "pg18"))]
ptr::null_mut(), // fdw_restrictinfo (pg17+ only)
ptr::null_mut(), // fdw_private

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the fdw_private is passed null pointer, is this correct?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the extracted aggregates and group_by vectors are computed and then discarded, is this intentional?

wenerme added a commit to wenerme/wrappers that referenced this pull request Mar 5, 2026
Based on PR supabase#549 by JohnCari with critical fix: aggregates and group_by
are now stored in FdwState and passed through output_rel->fdw_private
to the executor. Previously fdw_private was null, so extracted aggregates
were silently discarded.

Changes:
- Add AggregateKind enum, Aggregate struct with deparse() methods
- Add FDW trait methods: supported_aggregates(), supports_group_by(),
  get_aggregate_rel_size(), begin_aggregate_scan()
- Add GetForeignUpperPaths callback in new upper.rs module
- Add aggregates/group_by fields to FdwState in scan.rs
- begin_foreign_scan detects aggregate path and calls begin_aggregate_scan
- EXPLAIN output includes aggregate info when present
- PG version compat: PG13-PG18 (disabled_nodes, fdw_restrictinfo)
wenerme added a commit to wenerme/wrappers that referenced this pull request Mar 23, 2026
Based on PR supabase#549 by JohnCari with critical fix: aggregates and group_by
are now stored in FdwState and passed through output_rel->fdw_private
to the executor. Previously fdw_private was null, so extracted aggregates
were silently discarded.

Changes:
- Add AggregateKind enum, Aggregate struct with deparse() methods
- Add FDW trait methods: supported_aggregates(), supports_group_by(),
  get_aggregate_rel_size(), begin_aggregate_scan()
- Add GetForeignUpperPaths callback in new upper.rs module
- Add aggregates/group_by fields to FdwState in scan.rs
- begin_foreign_scan detects aggregate path and calls begin_aggregate_scan
- EXPLAIN output includes aggregate info when present
- PG version compat: PG13-PG18 (disabled_nodes, fdw_restrictinfo)
wenerme added a commit to wenerme/wrappers that referenced this pull request Mar 26, 2026
Based on PR supabase#549 by JohnCari with critical fix: aggregates and group_by
are now stored in FdwState and passed through output_rel->fdw_private
to the executor. Previously fdw_private was null, so extracted aggregates
were silently discarded.

Changes:
- Add AggregateKind enum, Aggregate struct with deparse() methods
- Add FDW trait methods: supported_aggregates(), supports_group_by(),
  get_aggregate_rel_size(), begin_aggregate_scan()
- Add GetForeignUpperPaths callback in new upper.rs module
- Add aggregates/group_by fields to FdwState in scan.rs
- begin_foreign_scan detects aggregate path and calls begin_aggregate_scan
- EXPLAIN output includes aggregate info when present
- PG version compat: PG13-PG18 (disabled_nodes, fdw_restrictinfo)
burmecia added a commit that referenced this pull request May 1, 2026
* feat: add aggregate pushdown support (COUNT/SUM/AVG/MIN/MAX)

Based on PR #549 by JohnCari with critical fix: aggregates and group_by
are now stored in FdwState and passed through output_rel->fdw_private
to the executor. Previously fdw_private was null, so extracted aggregates
were silently discarded.

Changes:
- Add AggregateKind enum, Aggregate struct with deparse() methods
- Add FDW trait methods: supported_aggregates(), supports_group_by(),
  get_aggregate_rel_size(), begin_aggregate_scan()
- Add GetForeignUpperPaths callback in new upper.rs module
- Add aggregates/group_by fields to FdwState in scan.rs
- begin_foreign_scan detects aggregate path and calls begin_aggregate_scan
- EXPLAIN output includes aggregate info when present
- PG version compat: PG13-PG18 (disabled_nodes, fdw_restrictinfo)

* fix: address CodeRabbit review — validate aggregates and GROUP BY

1. Reject pushdown when SUM/AVG/MIN/MAX has no simple column reference
   (e.g. SUM(a+b)) to avoid generating invalid SQL like SUM(*)
2. Abort pushdown when any GROUP BY item is not a plain column reference
   to prevent incomplete GROUP BY clauses
3. Rebuild state.tgts for aggregate paths to reflect the actual output
   shape (group_by columns + aggregate result columns), preventing
   ERRCODE_FDW_INVALID_COLUMN_NUMBER from the executor

* fix: guard upper module import with PG feature cfg

The upper module is conditionally compiled with pg13-pg18 feature flags,
but fdw_routine() imported it unconditionally, causing build failures
without PG features. Wrap the import and GetForeignUpperPaths assignment
with the same cfg guard.

* fix: address CodeRabbit round 2 — aggfilter, aggtype, default warning

1. Reject pushdown for aggregates with FILTER clause (aggfilter check)
2. Add type_oid field to Aggregate struct, populated from Aggref::aggtype
   instead of inferring from the input column type
3. Default begin_aggregate_scan() now emits a warning if called without
   an override, to surface missing implementations early

* fix: fail fast in default begin_aggregate_scan instead of warning

Use report_error to abort the transaction when begin_aggregate_scan is
called without being overridden. This prevents wrong-results failures
when an FDW declares aggregate support but forgets to implement the
scan method.

* feat: implement aggregate pushdown support for ClickHouse FDW

* feat: add aggregate pushdown support for BigQuery and SQL Server FDWs

* fix: update explain statements to use EXPLAIN ANALYZE for pushdown tests

---------

Co-authored-by: Bo Lu <lv.patrick@gmail.com>
@burmecia

burmecia commented May 1, 2026

Copy link
Copy Markdown
Member

This PR has been replaced by #586 .

@burmecia burmecia closed this May 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GetForeignUpperPaths & aggregate pushdown

2 participants