Skip to content

[Improve][connector-doris] Improved doris source enumerator splits allocation algorithm for subtasks #9108

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 9, 2025

Conversation

JeremyXin
Copy link
Contributor

…location algorithm for subtasks

Purpose of this pull request

Similar to pr #8453, improving doris source enumerator splits allocation algorithm for subtasks and add UT.

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request improves the Doris source enumerator’s splits allocation algorithm for subtasks and adds corresponding unit tests. Key changes include:

  • Refactoring the split allocation algorithm to sort splits and assign them using an AtomicInteger.
  • Adding unit tests for both file-based and Doris-specific source enumerators.
  • Minor formatting corrections in the PartitionDefinition’s toString method.

Reviewed Changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated no comments.

File Description
seatunnel-translation/seatunnel-translation-base/src/test/java/org/apache/seatunnel/translation/source/ParallelSourceTest.java Introduces/refactors tests for file source split enumeration and adds tests for Doris source split enumeration.
seatunnel-connectors-v2/connector-doris/src/test/java/org/apache/seatunnel/connectors/doris/split/DorisSourceSplitEnumeratorTest.java Adds tests validating the Doris source split enumeration and allocation logic.
seatunnel-connectors-v2/connector-doris/src/main/java/org/apache/seatunnel/connectors/doris/source/split/DorisSourceSplitEnumerator.java Refactors split assignment to use a sorted list and AtomicInteger for load balancing.
seatunnel-connectors-v2/connector-doris/src/main/java/org/apache/seatunnel/connectors/doris/rest/PartitionDefinition.java Fixes minor formatting in the toString implementation.
Files not reviewed (2)
  • pom.xml: Language not supported
  • seatunnel-translation/seatunnel-translation-base/pom.xml: Language not supported
Comments suppressed due to low confidence (1)

seatunnel-translation/seatunnel-translation-base/src/test/java/org/apache/seatunnel/translation/source/ParallelSourceTest.java:183

  • Consider adding edge-case tests for the allocateFiles method (e.g. when fileSize is less than parallelism) to ensure the allocation logic behaves correctly in these scenarios.
public int allocateFiles(int id, int parallelism, int fileSize) {


@Slf4j
public class ParallelSourceTest {

@Test
void testParallelSourceForPollingFileAllocation() throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @JeremyXin . Why change this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Hisoka-X I wrote this test method for the file connector before. This time, I want to use the Mockito method to optimize this unit test method.

@Hisoka-X
Copy link
Member

Hisoka-X commented Apr 8, 2025

waiting test case passes.

Copy link
Member

@Hisoka-X Hisoka-X left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JeremyXin

@Hisoka-X Hisoka-X changed the title [Improve][connector-doris] Improved doris source enumerator splits allocation algorithm for subtasks。 [Improve][connector-doris] Improved doris source enumerator splits allocation algorithm for subtasks Apr 9, 2025
@hailin0 hailin0 merged commit 5f55e31 into apache:dev Apr 9, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants