Make UNION Parallel #1202
Description
In PostgreSQL, the UNION operator can use parallel processing through a Parallel Append node over its subnodes, for example:
explain (costs off, verbose) select * from t1 union select * from t2;
                    QUERY PLAN
--------------------------------------------------
 HashAggregate
   Output: t1.a, t1.b
   Group Key: t1.a, t1.b
   ->  Gather
         Output: t1.a, t1.b
         Workers Planned: 3
         ->  Parallel Append
               ->  Parallel Seq Scan on public.t1
                     Output: t1.a, t1.b
               ->  Parallel Seq Scan on public.t2
                     Output: t2.a, t2.b
(11 rows)
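As a rough mental model of the plan above (a toy simulation, not PostgreSQL's executor code): each worker repeatedly claims the next unread block from a scan that still has blocks, Gather merges the worker outputs, and HashAggregate removes duplicates over all output columns. The block size, the round-robin scheduling, and every name in the sketch are illustrative assumptions.

```python
from collections import deque


def parallel_append_union(tables, n_workers=3, block_size=2):
    """Toy model: Parallel Append over shared scans, then Gather, then HashAggregate."""
    # Each "Parallel Seq Scan" is modeled as a shared queue of row blocks.
    scans = [
        deque(rows[i:i + block_size] for i in range(0, len(rows), block_size))
        for rows in tables
    ]
    worker_output = [[] for _ in range(n_workers)]
    worker = 0
    # Round-robin stands in for concurrent workers; every block is claimed exactly once.
    while any(scans):
        for scan in scans:
            if scan:
                worker_output[worker].extend(scan.popleft())
                worker = (worker + 1) % n_workers
    # Gather: merge the rows produced by all workers.
    gathered = [row for out in worker_output for row in out]
    # HashAggregate grouped on every output column: drop duplicate rows.
    return set(gathered)


plan_t1 = [(1, 1), (2, 2), (3, 3)]
plan_t2 = [(2, 2), (3, 3), (4, 4)]
plan_result = parallel_append_union([plan_t1, plan_t2])
```

Because each block is handed to exactly one worker, the result matches a serial `UNION` regardless of how the blocks are split among workers.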
However, in CBDB, UNION subqueries may include Motion nodes, and using Parallel Append over them can produce incorrect results. The cause is the way workers compete for subnodes: some subnodes are executed by a single worker, while others may be processed by several workers, and when a worker completes its work on a multi-worker subnode, it marks that subnode as finished. When the subnodes are only Scan nodes, such as Parallel Scans on partitioned tables, this issue does not arise.
Despite these challenges, as an MPP database, CBDB can still support parallel processing for UNION operations. This is possible when multiple workers execute Append nodes within a slice and the data is well distributed (hashed across multiple segments of the cluster) according to the output columns of the subqueries.
Use case/motivation
No response
Related issues
No response
Are you willing to submit a PR?
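The distribution condition described in the description can be illustrated with a small sketch (a simplified model, not CBDB code). If every subquery's rows are hash-distributed on the full set of output columns, identical rows always land on the same segment, so each segment can deduplicate its own Append output and the concatenation of the per-segment results is already the global UNION. The segment count, the function names, and the use of Python's built-in `hash` as the distribution function are assumptions made for illustration.

```python
def segment_of(row, n_segments):
    # Stand-in for the cluster's hash distribution on all output columns.
    return hash(row) % n_segments


def union_per_segment(subqueries, n_segments=3):
    # Append: each segment sees only the rows, from every subquery, hashed to it.
    segments = [[] for _ in range(n_segments)]
    for rows in subqueries:
        for row in rows:
            segments[segment_of(row, n_segments)].append(row)
    # Deduplicate locally on each segment, with no cross-segment redistribution.
    local_results = [set(seg) for seg in segments]
    # Concatenate the per-segment results into the final answer.
    return [row for seg in local_results for row in seg]


seg_t1 = [(1, 1), (2, 2), (3, 3)]
seg_t2 = [(2, 2), (3, 3), (4, 4)]
seg_result = union_per_segment([seg_t1, seg_t2])
```

If the rows were distributed on something other than the output columns, two copies of the same row could sit on different segments, and the local deduplication would no longer be enough.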
Replies: 1 comment
Done in #1213
To be fixed later.