Limit to 3 pipelines per node per source #5792

Draft
wants to merge 2 commits into
base: test-solution-stability

Conversation

rdettai
Collaborator

@rdettai rdettai commented Jun 10, 2025

Description

Closes #4470
Closes #5747
Closes #4630

This addresses two problems: only one merge pipeline can run per indexer at any given time, and, when using Kafka, a single node systematically ends up with all the pipelines of a source, even when the number of pipelines for that source is rather large.

How was this PR tested?

Added a unit test; more should be added.

@fmassot
Collaborator

fmassot commented Jun 10, 2025

why 3?

@rdettai
Collaborator Author

rdettai commented Jun 10, 2025

> why 3?

It's a ratio proposed by Paul here. The rationale is that merges are about 3x faster than indexing, so if you start saturating the system, merges would still be able to keep up with this ratio.
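The arithmetic behind that ratio can be sketched as follows. This is only an illustration, not code from the PR: the function name `merges_keep_up` and the `merge_speedup` parameter are hypothetical, and it assumes a single merge pipeline has to absorb the output of all indexing pipelines it serves on the node.

```rust
/// Hypothetical helper (not part of the PR): returns true when one merge
/// pipeline can absorb the output of `num_indexing_pipelines`.
///
/// Assumption: a merge pipeline processes data `merge_speedup` times faster
/// than a single indexing pipeline produces it (3 in the discussion above).
fn merges_keep_up(num_indexing_pipelines: u32, merge_speedup: u32) -> bool {
    // Each indexing pipeline produces 1 unit of work per unit of time,
    // the merge pipeline consumes `merge_speedup` units in the same time.
    num_indexing_pipelines <= merge_speedup
}
```

Under these assumptions, 3 saturated indexing pipelines are the most that one merge pipeline can follow, and a 4th would make merges fall behind.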

@rdettai rdettai requested a review from fulmicoton-dd June 10, 2025 13:50
@rdettai
Collaborator Author

rdettai commented Jun 10, 2025

There are 2 issues with the logic so far:

  • when you have 4 pipelines, they will be split into 3 and 1, which is not ideally balanced
  • upon new iterations, the extra pipeline might end up on other nodes (e.g. 3 on one node, and 1 on each of the other nodes)
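To make the first issue concrete, here is a minimal sketch comparing a greedy fill under a per-node cap with an even spread over the same number of nodes. Both functions are hypothetical illustrations and are not the scheduler's actual logic; they only show why 4 pipelines with a cap of 3 come out as 3 and 1 rather than 2 and 2.

```rust
/// Hypothetical greedy fill: each node is filled up to `cap` pipelines
/// before the next node is used.
fn greedy_split(num_pipelines: usize, cap: usize) -> Vec<usize> {
    assert!(cap > 0, "cap must be positive");
    let mut remaining = num_pipelines;
    let mut per_node = Vec::new();
    while remaining > 0 {
        let assigned = remaining.min(cap);
        per_node.push(assigned);
        remaining -= assigned;
    }
    per_node
}

/// Hypothetical balanced split: uses the same number of nodes as the greedy
/// fill, but spreads the pipelines as evenly as possible across them.
fn balanced_split(num_pipelines: usize, cap: usize) -> Vec<usize> {
    assert!(cap > 0, "cap must be positive");
    if num_pipelines == 0 {
        return Vec::new();
    }
    // Smallest node count that respects the cap (ceiling division).
    let num_nodes = (num_pipelines + cap - 1) / cap;
    let base = num_pipelines / num_nodes;
    let extra = num_pipelines % num_nodes;
    // The first `extra` nodes take one more pipeline than the others.
    (0..num_nodes)
        .map(|i| if i < extra { base + 1 } else { base })
        .collect()
}
```

With 4 pipelines and a cap of 3, `greedy_split` yields `[3, 1]` while `balanced_split` yields `[2, 2]`; both respect the cap, but only the second is evenly loaded.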

@rdettai rdettai marked this pull request as draft June 11, 2025 08:33
@rdettai rdettai force-pushed the explicit-scheduling-rescaling branch from c065f94 to 1b57434 Compare June 11, 2025 09:20
@rdettai
Collaborator Author

rdettai commented Jun 11, 2025

Actually, this is wrong for now. I thought shards in the simplified problem were indexing pipelines in the physical plan, but that is not the case. Shards in the simplified problem are physical shards, and they are mapped to pipelines in

fn convert_scheduling_solution_to_physical_plan_single_node_single_source(

Base automatically changed from explicit-scheduling-rescaling to main June 11, 2025 12:47
@rdettai rdettai force-pushed the limit-3-pipelines-per-node branch from 2c62834 to 0648c42 Compare June 11, 2025 12:47
@rdettai rdettai changed the base branch from main to test-solution-stability June 11, 2025 12:50
@rdettai rdettai force-pushed the limit-3-pipelines-per-node branch from 0648c42 to e6c4ebd Compare June 11, 2025 19:52
2 participants