Make UNION Parallel #1202

avamingli · 2025-07-01T04:17:47Z

avamingli
Jul 1, 2025
Collaborator

Description

In PostgreSQL, a UNION query can be executed in parallel through a Parallel Append node whose subnodes are the individual subqueries:

explain (costs off, verbose) select * from t1 union select * from t2;
                    QUERY PLAN
--------------------------------------------------
 HashAggregate
   Output: t1.a, t1.b
   Group Key: t1.a, t1.b
   ->  Gather
         Output: t1.a, t1.b
         Workers Planned: 3
         ->  Parallel Append
               ->  Parallel Seq Scan on public.t1
                     Output: t1.a, t1.b
               ->  Parallel Seq Scan on public.t2
                     Output: t2.a, t2.b
(11 rows)

However, in CBDB, we face challenges when UNION subqueries include Motion nodes, which can lead to incorrect results if Parallel Append is used. This is due to the competition among workers for subnodes: some subnodes are executed by a single worker while others may be processed by multiple workers.

When a worker completes a multi-worker subnode, it marks that subnode as finished.
This is acceptable in PostgreSQL, but in CBDB the presence of Motion nodes complicates the process: marking completion prematurely can cause the Motion Sender to stop early, resulting in data loss for other workers.
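
The failure mode described above can be illustrated with a toy model in Python. This is only a sketch under stated assumptions, not CBDB code: it assumes each worker has its own incoming stream of rows per subnode (standing in for a Motion receiver), that a shared per-subnode `finished` flag is set as soon as any one worker drains its own stream, and that workers run one after another. The function name `run_parallel_aware_append` and the data layout are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[int, str]


def run_parallel_aware_append(streams: List[List[List[Row]]]) -> List[Row]:
    """Toy model of a Parallel-aware Append over Motion-fed subnodes.

    streams[w][s] holds the rows that worker w would receive from subnode s.
    Assumption: the `finished` flags are shared by all workers, as in a
    parallel-aware node that coordinates through shared state.
    """
    num_subnodes = len(streams[0])
    finished = [False] * num_subnodes  # shared state across workers
    received: List[Row] = []

    # Simplified schedule: workers run one after another.
    for worker_streams in streams:
        for s in range(num_subnodes):
            if finished[s]:
                # Another worker already marked this subnode as done, so
                # this worker never reads its own rows from subnode s.
                continue
            received.extend(worker_streams[s])
            finished[s] = True  # premature: other workers still have rows
    return received
```

With two workers and two subnodes, for example `streams = [[[(1, "a")], [(3, "c")]], [[(2, "b")], [(4, "d")]]]`, the model returns only the first worker's rows, and rows `(2, "b")` and `(4, "d")` are never read.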

For cases involving only Scan nodes—such as Parallel Scans on partitioned tables—this issue does not arise.
However, for UNION ALL and similar scenarios, the use of Parallel Append is unsafe. That's another problem and we should consider disabling it in the future.

Despite these challenges, as an MPP database, CBDB has the potential to support parallel processing for UNION operations. This can be achieved if multiple workers execute Append nodes within a slice and the data is well-distributed (hashed across multiple segments relative to the cluster) according to the output columns of the subqueries.
Specifically, we can implement a Parallel-oblivious Append approach, where multiple workers operate independently without sharing state, rather than a Parallel-aware Append that requires coordination among workers.
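
The idea behind the Parallel-oblivious approach can be sketched in Python. This is a minimal illustration under assumptions, not the implementation from the CBDB planner: rows of both inputs are hash-distributed on their full output columns, each worker appends and deduplicates its own share without any shared state, and the per-worker results are simply concatenated. The names `hash_distribute` and `parallel_oblivious_union` are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[int, int]


def hash_distribute(rows: List[Row], num_workers: int) -> List[List[Row]]:
    """Send each row to a worker chosen by hashing all output columns."""
    buckets: List[List[Row]] = [[] for _ in range(num_workers)]
    for row in rows:
        buckets[hash(row) % num_workers].append(row)
    return buckets


def parallel_oblivious_union(t1: List[Row], t2: List[Row],
                             num_workers: int) -> List[Row]:
    """UNION where every worker runs its own Append independently."""
    t1_parts = hash_distribute(t1, num_workers)
    t2_parts = hash_distribute(t2, num_workers)

    result: List[Row] = []
    for w in range(num_workers):
        # Plain Append of this worker's share; no state shared with others.
        appended = t1_parts[w] + t2_parts[w]
        # Local deduplication (the role HashAggregate plays in the plan).
        # Equal rows always hash to the same worker, so duplicates cannot
        # survive across workers.
        result.extend(set(appended))
    return result
```

Because equal rows always land on the same worker, local deduplication is enough to produce the same set of rows as a single-process UNION; the order of the output rows is not defined.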

Use case/motivation

No response

Related issues

No response

Are you willing to submit a PR?

Yes I am willing to submit a PR!

Answered by avamingli


avamingli · 2025-07-09T05:04:23Z

avamingli
Jul 9, 2025
Collaborator Author

> Specifically, we can implement a Parallel-oblivious Append approach, where multiple workers operate independently without sharing state, rather than a Parallel-aware Append that requires coordination among workers.

Done in #1213

> However, for UNION ALL and similar scenarios, the use of Parallel Append is unsafe. That's another problem and we should consider disabling it in the future.

To be fixed later.

0 replies
