feat: add aggregate pushdown support (COUNT/SUM/AVG/MIN/MAX) (#586)
* feat: add aggregate pushdown support (COUNT/SUM/AVG/MIN/MAX)
Based on PR #549 by JohnCari, with a critical fix: aggregates and group_by
are now stored in FdwState and passed through output_rel->fdw_private
to the executor. Previously fdw_private was null, so extracted aggregates
were silently discarded.
Changes:
- Add AggregateKind enum, Aggregate struct with deparse() methods
- Add FDW trait methods: supported_aggregates(), supports_group_by(),
get_aggregate_rel_size(), begin_aggregate_scan()
- Add GetForeignUpperPaths callback in new upper.rs module
- Add aggregates/group_by fields to FdwState in scan.rs
- begin_foreign_scan detects aggregate path and calls begin_aggregate_scan
- EXPLAIN output includes aggregate info when present
- PG version compat: PG13-PG18 (disabled_nodes, fdw_restrictinfo)
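To make the first item in the list above concrete, here is a minimal sketch of what an `AggregateKind` enum and an `Aggregate` struct with `deparse()` methods could look like. Only the type and method names come from the commit message; the fields (`column`, `distinct`) and the lowercase SQL output are assumptions for illustration, not the actual definitions in the crate.

```rust
/// The five aggregate functions named in the commit title.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AggregateKind {
    Count,
    Sum,
    Avg,
    Min,
    Max,
}

impl AggregateKind {
    /// SQL function name to emit in the remote query.
    pub fn deparse(self) -> &'static str {
        match self {
            AggregateKind::Count => "count",
            AggregateKind::Sum => "sum",
            AggregateKind::Avg => "avg",
            AggregateKind::Min => "min",
            AggregateKind::Max => "max",
        }
    }
}

/// Hypothetical shape: the field names here are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Aggregate {
    pub kind: AggregateKind,
    /// `None` means there is no column argument, as in `count(*)`.
    pub column: Option<String>,
    pub distinct: bool,
}

impl Aggregate {
    /// Render the aggregate as a SQL expression.
    /// Note that a `None` column on anything but `Count` would render as
    /// `sum(*)` and the like, which is invalid SQL, so callers are assumed
    /// to validate the aggregate before deparsing it.
    pub fn deparse(&self) -> String {
        let func = self.kind.deparse();
        match (&self.column, self.distinct) {
            (Some(col), true) => format!("{}(distinct {})", func, col),
            (Some(col), false) => format!("{}({})", func, col),
            (None, _) => format!("{}(*)", func),
        }
    }
}
```

Keeping `deparse()` on both types lets an FDW reuse the function name on its own when the remote dialect needs a different argument syntax.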
* fix: address CodeRabbit review — validate aggregates and GROUP BY
1. Reject pushdown when SUM/AVG/MIN/MAX has no simple column reference
(e.g. SUM(a+b)) to avoid generating invalid SQL like SUM(*)
2. Abort pushdown when any GROUP BY item is not a plain column reference
to prevent incomplete GROUP BY clauses
3. Rebuild state.tgts for aggregate paths to reflect the actual output
shape (group_by columns + aggregate result columns), preventing
ERRCODE_FDW_INVALID_COLUMN_NUMBER from the executor
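The three review fixes above can be sketched as a single validation pass. The `Expr` and `AggCall` types below are hypothetical, simplified stand-ins for the Postgres planner nodes, and `plan_pushdown` is an illustrative name rather than a function from the PR. The sketch is also more conservative than the commit text in one respect: it rejects `count` over a non-column expression too, which is an assumption.

```rust
/// Hypothetical, simplified expression standing in for a Postgres node.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// A plain column reference such as `amount`.
    Column(String),
    /// Anything else, such as `a + b` (the payload is only display text).
    Other(String),
}

/// Hypothetical aggregate call as extracted from the query.
#[derive(Debug, Clone, PartialEq)]
pub struct AggCall {
    /// Lowercase function name: "count", "sum", ...
    pub func: String,
    /// `None` represents a `*` argument.
    pub arg: Option<Expr>,
}

/// Return the column name if the expression is a plain column reference.
fn plain_column(expr: &Expr) -> Option<&str> {
    match expr {
        Expr::Column(name) => Some(name.as_str()),
        Expr::Other(_) => None,
    }
}

/// Build the output shape for an aggregate path: group-by columns first,
/// then one entry per aggregate. Returns `None` when pushdown must be
/// abandoned so that Postgres aggregates locally instead.
pub fn plan_pushdown(group_by: &[Expr], aggs: &[AggCall]) -> Option<Vec<String>> {
    let mut output = Vec::new();
    // Abort on the first GROUP BY item that is not a plain column, so an
    // incomplete GROUP BY clause is never sent to the remote side.
    for item in group_by {
        output.push(plain_column(item)?.to_string());
    }
    for agg in aggs {
        let rendered = match (&agg.func[..], &agg.arg) {
            ("count", None) => "count(*)".to_string(),
            // sum(*), avg(*), and so on would be invalid SQL.
            (_, None) => return None,
            (func, Some(arg)) => format!("{}({})", func, plain_column(arg)?),
        };
        output.push(rendered);
    }
    Some(output)
}
```

Returning the rebuilt column list from the same function that validates keeps the target list and the remote query in step, which is the property the third fix is after.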
* fix: guard upper module import with PG feature cfg
The upper module is conditionally compiled with pg13-pg18 feature flags,
but fdw_routine() imported it unconditionally, causing build failures
without PG features. Wrap the import and GetForeignUpperPaths assignment
with the same cfg guard.
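The guard pattern described above can be illustrated with a small stand-alone sketch. The `FdwRoutine` struct, its field, and the reduced feature list are hypothetical placeholders and not the real pgrx types; the point is only that the module and every use of it sit behind the same `cfg` condition.

```rust
// The module only exists when one of the PG features is enabled
// (feature list shortened for the sketch).
#[cfg(any(feature = "pg13", feature = "pg14"))]
mod upper {
    pub fn get_foreign_upper_paths() -> &'static str {
        "upper paths"
    }
}

/// Hypothetical stand-in for the routine table handed to Postgres.
pub struct FdwRoutine {
    pub get_foreign_upper_paths: Option<fn() -> &'static str>,
}

pub fn fdw_routine() -> FdwRoutine {
    // Without any PG feature the binding is never mutated.
    #[allow(unused_mut)]
    let mut routine = FdwRoutine {
        get_foreign_upper_paths: None,
    };

    // Same condition as on the module, so this reference to `upper`
    // disappears together with the module itself.
    #[cfg(any(feature = "pg13", feature = "pg14"))]
    {
        routine.get_foreign_upper_paths = Some(upper::get_foreign_upper_paths);
    }

    routine
}
```

Built without any of the features, the callback simply stays unset and the crate still compiles, which is the behavior this fix restores.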
* fix: address CodeRabbit round 2 — aggfilter, aggtype, default warning
1. Reject pushdown for aggregates with FILTER clause (aggfilter check)
2. Add type_oid field to Aggregate struct, populated from Aggref::aggtype
instead of inferring from the input column type
3. Default begin_aggregate_scan() now emits a warning if called without
an override, to surface missing implementations early
* fix: fail fast in default begin_aggregate_scan instead of warning
Use report_error to abort the transaction when begin_aggregate_scan is
called without being overridden. This prevents wrong-results failures
when an FDW declares aggregate support but forgets to implement the
scan method.
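A rough illustration of the fail-fast default follows. The real method calls `report_error` to abort the transaction; since that needs a Postgres backend, this sketch returns an `Err` instead. The trait is a trimmed-down, hypothetical stand-in, and the parameter and return types are assumptions.

```rust
/// Hypothetical, trimmed-down stand-in for the FDW trait.
pub trait ForeignDataWrapper {
    /// Aggregates this wrapper claims it can push down (none by default).
    fn supported_aggregates(&self) -> Vec<&'static str> {
        Vec::new()
    }

    /// Default implementation fails instead of silently doing nothing,
    /// so a wrapper that advertises aggregates but forgets this method
    /// cannot return wrong results.
    fn begin_aggregate_scan(
        &mut self,
        _aggregates: &[String],
        _group_by: &[String],
    ) -> Result<(), String> {
        Err("begin_aggregate_scan is not implemented for this FDW".to_string())
    }
}

/// Declares aggregate support but forgets to override the scan method.
pub struct Forgetful;

impl ForeignDataWrapper for Forgetful {
    fn supported_aggregates(&self) -> Vec<&'static str> {
        vec!["count"]
    }
}

/// Overrides the scan method and records the requested output columns.
pub struct Complete {
    pub columns: Vec<String>,
}

impl ForeignDataWrapper for Complete {
    fn supported_aggregates(&self) -> Vec<&'static str> {
        vec!["count", "sum"]
    }

    fn begin_aggregate_scan(
        &mut self,
        aggregates: &[String],
        group_by: &[String],
    ) -> Result<(), String> {
        // Group-by columns first, then the aggregate expressions.
        self.columns = group_by.to_vec();
        self.columns.extend_from_slice(aggregates);
        Ok(())
    }
}
```

An error at scan start is loud and immediate, whereas a warning could be missed while the query returns rows that were never aggregated.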
* feat: implement aggregate pushdown support for ClickHouse FDW
* feat: add aggregate pushdown support for BigQuery and SQL Server FDWs
* fix: update explain statements to use EXPLAIN ANALYZE for pushdown tests
---------
Co-authored-by: Bo Lu <lv.patrick@gmail.com>
**Supported shapes**: scalar aggregates, `group by` over plain columns, with
or without a `where` clause. Pushdown also works when the foreign `table`
option is a sub-query.

```sql
-- All of these run as a single aggregate query on BigQuery:
select count(*) from bigquery.my_table;
select id, sum(amount) from bigquery.my_table group by id;
select count(distinct name) from bigquery.my_table where id = 1;
```

**Cases that are not pushed down**: the query still returns the correct
result, but the aggregation happens in Postgres after fetching the rows:

- The query has a `having` clause
- The aggregate has a `filter (where …)` clause
- A `distinct` modifier is used on anything other than `count`
- The aggregate's argument is not a plain column (for example `sum(a + 1)`)
- A `group by` item is not a plain column (for example `group by id + 1`)
- The aggregate function is not in the list above (for example `stddev`, `string_agg`)
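
The fallback rules listed above amount to a small eligibility check. The sketch below encodes them in Rust, the language of the wrapper itself. All type and function names are hypothetical, and the set of supported functions (`count`, `sum`, `avg`, `min`, `max`) is taken from the commit title rather than from the list this README section refers to.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Func {
    Count,
    Sum,
    Avg,
    Min,
    Max,
    /// Any other aggregate, for example `stddev` or `string_agg`.
    Unsupported,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Arg {
    /// `*`, as in `count(*)`.
    Star,
    /// A plain column reference.
    Column,
    /// Any other expression, for example `a + 1`.
    Expression,
}

pub struct AggShape {
    pub func: Func,
    pub arg: Arg,
    pub distinct: bool,
    pub has_filter: bool,
}

pub struct QueryShape {
    pub has_having: bool,
    /// True when every `group by` item is a plain column (or there is none).
    pub group_by_plain_columns: bool,
    pub aggregates: Vec<AggShape>,
}

/// Return the first reason the query cannot be pushed down, or `None`
/// when it can run as a single aggregate query on the remote side.
pub fn pushdown_blocker(q: &QueryShape) -> Option<&'static str> {
    if q.has_having {
        return Some("having clause");
    }
    if !q.group_by_plain_columns {
        return Some("group by item is not a plain column");
    }
    for agg in &q.aggregates {
        if agg.func == Func::Unsupported {
            return Some("unsupported aggregate function");
        }
        if agg.has_filter {
            return Some("filter clause");
        }
        if agg.distinct && agg.func != Func::Count {
            return Some("distinct on a non-count aggregate");
        }
        match agg.arg {
            Arg::Expression => return Some("argument is not a plain column"),
            Arg::Star if agg.func != Func::Count => {
                return Some("only count accepts *");
            }
            _ => {}
        }
    }
    None
}
```

When the function returns a reason, the query is still answered correctly; Postgres simply fetches the rows and aggregates them locally.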

## Inserting Rows & the Streaming Buffer

This foreign data wrapper uses BigQuery’s `insertAll` API method to create a `streamingBuffer` with an associated partition time. **Within that partition time, the data cannot be updated, deleted, or fully exported**. Only after that time has elapsed (up to 90 minutes, according to [BigQuery’s documentation](https://cloud.google.com/bigquery/docs/streaming-data-into-bigquery)) can you perform these operations.
@@ -256,3 +288,67 @@ where id = 1;
delete from bigquery.people
where id = 2;
```

### Aggregate Query Examples

These examples assume an `orders` table on BigQuery and a matching foreign