[Ideas] Switch to Streaming Hashagg for multiple phase aggregation in postgres planner. #1411
Unanswered
avamingli
asked this question in
Ideas / Feature Requests
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Description
Currently, the Postgres planner selectively employs streaming aggregation in specific scenarios, such as DISTINCT operations or when parallelism is involved.
However, this capability is not generically available for other multi-phase aggregation plans.
I've been looking at our TPC-DS results and the performance difference with Orca.
Recent benchmarking against the Orca optimizer using TPC-DS has revealed a consistent performance gap: even when the final plans are very similar, Orca tends to be faster.
The root cause has been identified: Orca, by design, defaults to using Streaming Hash Aggregation for multi-phase aggregations, while the Postgres planner does not.
Analysis of Streaming vs. Non-Streaming Aggregation
The choice between streaming and non-streaming (hash) aggregation involves a trade-off:
Benefits of Streaming: It can push aggregation computations higher up the plan tree, which often enhances parallelism and reduces the need for heavy data materialization.
Drawbacks of Streaming: It may convert potential disk I/O pressure into network pressure, as grouped data is streamed between nodes.
It's hard for the planner to weigh this precisely because it depends on data distribution and duplication. But the empirical data from our benchmarks is clear: streaming is a net win overall.
In the best-case scenario—where data per segment is largely unique—the non-streaming mode may force a full spill of all data to disk, while the streaming mode incurs almost no additional overhead.
So, I propose we switch the Postgres planner to use Streaming Hash Aggregation for these cases to match Orca's approach and get that performance boost.
Use case/motivation
No response
Related issues
No response
Are you willing to submit a PR?
Beta Was this translation helpful? Give feedback.
All reactions