MEP: Channel Exclusive Mode for QueryCoord#11

Open
weiliu1031 wants to merge 4 commits into milvus-io:main from
weiliu1031:mep/channel-exclusive-mode

@weiliu1031

Summary

Add comprehensive design document for Channel Exclusive Mode feature in QueryCoord.

This MEP describes the architecture, implementation, and operational considerations for channel-level resource isolation in Milvus 2.6.

Key Features

  • Channel-centric load balancing with dedicated node assignments
  • Automatic enable/disable based on cluster state
  • Runtime-refreshable configuration (no restart required)
  • Graceful fallback when resources are insufficient
  • Complete migration and rollback procedures
  • Rolling upgrade support with automatic recovery

Design Highlights

Architecture

  • Copy-on-Write pattern for thread-safe replica modifications
  • ChannelNodeInfo proto map for O(1) channel-to-node lookups
  • ReplicaObserver for continuous state monitoring (1-second interval)
  • ChannelLevelScoreBalancer with graceful fallback to segment-level balancing

Key Components

  • Replica: Immutable state with channel-to-node mappings
  • mutableReplica: Transient state for COW updates
  • ChannelLevelScoreBalancer: Channel-aware load balancing with outbound node handling
  • ReplicaObserver: Dynamic enable/disable based on configuration
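
The split between an immutable Replica and a transient mutableReplica can be illustrated with a minimal Python sketch. This is not the QueryCoord implementation: the method names `copy_for_write` and `into_replica`, and the use of plain node-ID tuples in place of the ChannelNodeInfo proto, are assumptions made for illustration only.

```python
from types import MappingProxyType
from typing import Dict, List, Mapping, Tuple


class Replica:
    """Immutable snapshot of channel name -> node IDs (hypothetical shape)."""

    def __init__(self, channel_nodes: Mapping[str, Tuple[int, ...]]):
        # Wrap a private copy in a read-only view so readers cannot mutate it.
        self.channel_nodes = MappingProxyType(dict(channel_nodes))

    def copy_for_write(self) -> "MutableReplica":
        # Copy every node list so edits never touch the published snapshot.
        return MutableReplica({ch: list(n) for ch, n in self.channel_nodes.items()})


class MutableReplica:
    """Transient working copy, discarded once into_replica() is called."""

    def __init__(self, channel_nodes: Dict[str, List[int]]):
        self._channel_nodes = channel_nodes

    def assign(self, channel: str, node_id: int) -> None:
        nodes = self._channel_nodes.setdefault(channel, [])
        if node_id not in nodes:
            nodes.append(node_id)

    def into_replica(self) -> Replica:
        # Freeze the working copy into a new immutable snapshot.
        return Replica({ch: tuple(n) for ch, n in self._channel_nodes.items()})


old = Replica({"channel_0": (1, 2)})
draft = old.copy_for_write()
draft.assign("channel_0", 3)
new = draft.into_replica()
```

Readers holding `old` keep seeing a consistent view while `new` is prepared and then swapped in, which is the property the Copy-on-Write bullet relies on.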

Configuration

  • Default balancer: ChannelLevelScoreBalancer
  • ChannelExclusiveNodeFactor: 1 (minimum nodes per channel)
  • Runtime-refreshable without service restart

Migration Path

  • Automatic enablement when sufficient nodes are available
  • Automatic disablement when nodes are insufficient or the balancer is changed
  • Rolling upgrade support with temporary violations during StoppingBalancer phase
  • Automatic recovery after upgrade completion
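
A rough sketch of the enable/disable decision described above, in Python. The exact condition is an assumption: it treats "sufficient nodes" as at least `ChannelExclusiveNodeFactor` nodes per channel, and the function name is hypothetical.

```python
def should_enable_exclusive_mode(
    balancer: str,
    node_count: int,
    channel_count: int,
    node_factor: int = 1,
) -> bool:
    """Return True when exclusive mode could be active (assumed rule)."""
    # Assumption: the mode only applies under the channel-level balancer.
    if balancer != "ChannelLevelScoreBalancer":
        return False
    # Nothing to isolate when the replica has no channels.
    if channel_count <= 0:
        return False
    # Assumption: every channel needs at least node_factor nodes of its own.
    return node_count >= channel_count * node_factor
```

Under this rule, a periodic observer would re-evaluate the function each tick and flip the mode when the result changes.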

Related

  • Issue: #47500
  • PR: #47505

Document Structure

  1. Summary: High-level overview
  2. Motivation: Problem statement and goals
  3. Design Details:
    • Architecture overview and system components
    • Core data structures with design rationale
    • Channel exclusive mode lifecycle
    • Node assignment algorithm
    • ChannelLevelScoreBalancer implementation
    • Configuration parameters with examples
    • Complete data flow examples
  4. Compatibility, Deprecation, and Migration Plan:
    • Enable/disable procedures
    • Rolling upgrade scenarios
    • Performance impact warnings
  5. Test Plan: Unit, integration, and system tests
  6. References: Code files and configuration

Checklist

  • Followed MEP template structure
  • Included all required sections (Summary, Motivation, Design Details, Compatibility, Test Plan, References)
  • Added configuration examples and default values
  • Documented migration and rollback procedures
  • Included rolling upgrade considerations
  • Added performance impact warnings
  • Included test plan and verification checklist
  • Referenced related issues and PRs
  • Used proper MEP file naming (YYYYMMDD-descriptive-name.md)

issue: #47500, #47505

Add comprehensive design document for Channel Exclusive Mode feature in QueryCoord.

This MEP describes:
- Architecture overview and system components
- Core data structures (ChannelNodeInfo, Replica, mutableReplica)
- Channel exclusive mode lifecycle and activation conditions
- Node assignment algorithm with even distribution
- ChannelLevelScoreBalancer implementation details
- Configuration parameters and examples
- Complete data flow examples including node removal scenarios
- Migration and rollback procedures
- Rolling upgrade considerations
- Resource impact warnings and best practices
- Test plan and verification checklist

The design enables channel-level resource isolation with automatic
enable/disable based on cluster state and runtime configuration.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: weiliu1031
To complete the pull request process, please assign liliu-z after the PR has been reviewed.
You can assign the PR to them by writing /assign @liliu-z in a comment when ready.

The full list of commands accepted by this bot can be found here.

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Replace MEP format with the original design document format from
the milvus repository. This version maintains the original chapter
structure (1-10) and formatting without MEP metadata header.

The document provides comprehensive coverage of:
- Background and motivation
- Architecture overview and system components
- Core data structures with design rationale
- Channel exclusive mode lifecycle
- Node assignment algorithm
- ChannelLevelScoreBalancer implementation
- Configuration parameters and examples
- Complete data flow examples
- Migration and rollback procedures
- Rolling upgrade considerations
- Conclusion and references

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
…cklist

Add the following enhancements to support complete documentation requirements:

1. Related Issues and PRs section at the beginning:
   - Reference to issue #47500
   - Reference to implementation PR #47505
   - Release version (2.6.0)

2. New Chapter 9: Test Plan and Verification
   - Unit tests for replica logic, balancer, and observer
   - Integration test scenarios for end-to-end validation
   - System tests for load testing and resource impact
   - Production verification checklist (pre/during/post-deployment)
   - Rolling upgrade verification steps
   - Regression testing critical paths
   - Performance benchmarks and baselines

3. Enhanced References (Chapter 11)
   - Added related issues and PRs section
   - Added documentation references
   - Added related work references

4. Updated metadata:
   - Document version 1.0 → 1.1
   - Added status and last updated timestamp

The document now provides:
- ✓ Related issues and PRs explicitly referenced
- ✓ Comprehensive test plan with detailed test cases
- ✓ Production verification checklist
- ✓ Performance benchmarks and impact assessment
- ✓ All sections from checklist: Migration, Rollback, Rolling Upgrade, Performance Warnings, Test Plan

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
@liliu-z
Member

liliu-z commented Feb 4, 2026

Design Review Comments

Thanks for the comprehensive design document! I have two questions/suggestions:

1. How to avoid massive handoff when toggling exclusive mode?

The document acknowledges that enabling/disabling channel exclusive mode triggers massive rebalancing with 50-80% CPU spikes and 2-5x latency increase. This could be a significant operational risk.

Question: Has the design considered any strategies to reduce the handoff storm during mode transitions?

Some potential approaches to consider:

  • Gradual migration: Instead of moving all segments at once, migrate in batches with rate limiting
  • Preserve existing mappings: When enabling exclusive mode, prioritize keeping current channel-node relationships and only adjust conflicts
  • Lazy enforcement: After config change, don't trigger immediate migration; let natural balancing cycles gradually converge
  • Soft → Hard isolation: Mark target assignments first, place new segments according to new rules, let old segments migrate gradually or age out

2. Can the "remainder" nodes be shared across all channels?

Current design (7 nodes, 3 channels):

channel_0: [1, 2, 3]  ← gets 3 nodes
channel_1: [4, 5]     ← gets 2 nodes  
channel_2: [6, 7]     ← gets 2 nodes

The first channel always gets the extra node(s), which seems unfair.

Suggested alternative:

channel_0: [1, 2] + [7](shared)
channel_1: [3, 4] + [7](shared)
channel_2: [5, 6] + [7](shared)

Each channel gets 2 dedicated nodes, and node 7 becomes a "shared/overflow" node that all channels can use.
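
To make the suggested split concrete, here is a small Python sketch. It is only an illustration of the proposal; the function name and the dedicated/shared return shape are hypothetical, not part of the design document.

```python
from typing import Dict, List, Tuple


def split_dedicated_and_shared(
    nodes: List[int], channels: List[str]
) -> Tuple[Dict[str, List[int]], List[int]]:
    """Give each channel an equal dedicated slice; leftovers become shared."""
    if not channels:
        raise ValueError("at least one channel is required")
    per_channel = len(nodes) // len(channels)
    dedicated = {
        ch: nodes[i * per_channel:(i + 1) * per_channel]
        for i, ch in enumerate(channels)
    }
    # Remainder nodes are not owned by any single channel.
    shared = nodes[per_channel * len(channels):]
    return dedicated, shared


dedicated, shared = split_dedicated_and_shared(
    [1, 2, 3, 4, 5, 6, 7], ["channel_0", "channel_1", "channel_2"]
)
```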

Benefits:

  • Fairer resource distribution across channels
  • Better resource utilization - shared node can serve whichever channel needs it most
  • More flexible for handling load imbalance

Considerations:

  • How to schedule segments from multiple channels on the shared node?
  • May need to introduce new concepts like shared_nodes or overflow_nodes
  • Need to clarify if this conflicts with the "exclusive" design philosophy

Looking forward to your thoughts on these!


**Document Version**: 1.1 (with Test Plan and Verification)
**Date**: 2026-02-04
**Author**: Milvus QueryCoord Team
Member

.... leave ur own name plz

…n document

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
@weiliu1031
Author

Thanks for the detailed review and thoughtful suggestions! Let me address both questions:

Response to Question 1: Avoiding Massive Handoff During Mode Transitions

The current implementation already includes several mechanisms to mitigate the handoff storm:

Gradual Balancing (Already Implemented):

  • When channel exclusive mode is toggled, the system does NOT trigger an immediate full-scale rebalancing
  • Instead, it follows the normal balancing rhythm to gradually converge to the target state
  • The balancing tasks are generated incrementally over multiple balancing cycles, not all at once

Existing Rate Limiting Capabilities:
We already have multiple mechanisms to control the balancing pace and priority:

  • Balancing intervals: Configurable cycle time between balance rounds
  • Task concurrency limits: Maximum number of concurrent segment/channel movements
  • Priority scheduling: Critical operations (e.g., stopping node evacuation) take precedence over regular balancing
  • Resource-aware throttling: Balance tasks are throttled based on node CPU/memory/IO utilization
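
As a simplified illustration of how a concurrency limit spreads the work over several balancing cycles, consider the Python sketch below. The function and parameter names are hypothetical, and real task generation also depends on priorities and node load, which this sketch ignores.

```python
from typing import Iterator, List


def plan_cycles(pending_moves: List[str], max_tasks_per_cycle: int) -> Iterator[List[str]]:
    """Yield the moves issued in each balancing cycle, capped per cycle."""
    if max_tasks_per_cycle <= 0:
        raise ValueError("max_tasks_per_cycle must be positive")
    for start in range(0, len(pending_moves), max_tasks_per_cycle):
        # Each cycle takes at most max_tasks_per_cycle moves from the backlog.
        yield pending_moves[start:start + max_tasks_per_cycle]


moves = [f"segment_{i}" for i in range(5)]
cycles = list(plan_cycles(moves, max_tasks_per_cycle=2))
```

With five pending moves and a cap of two, the backlog drains over three cycles instead of one burst.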

Remaining Concerns:
While these mechanisms significantly reduce the impact, we acknowledge that:

  • The performance impact cannot be completely eliminated during mode transitions
  • For latency-sensitive clusters with high query loads, even gradual rebalancing can cause noticeable degradation
  • This is why the document includes strong warnings and recommends performing mode transitions during low-traffic maintenance windows

The current design strikes a balance between:

  • Operational safety (gradual migration with rate limiting)
  • Convergence speed (reaching target state in reasonable time)
  • System complexity (avoiding overly complex migration orchestration)

Response to Question 2: Shared "Remainder" Nodes

The shared node approach is not recommended for the following reasons:

1. Design Complexity and Philosophical Inconsistency

Hybrid Mode Challenges:

  • If some nodes are channel-exclusive while others are channel-shared, we would need:
    • More complex balancing algorithms to manage both exclusive and shared node groups
    • Separate code paths for handling shared vs. exclusive nodes
    • Additional metadata to track which nodes are shared and which channels can use them

Loss of "Exclusive" Semantics:

  • This is no longer true channel exclusive mode if channels share resources
  • The original goal of strict isolation between channels is compromised
  • It becomes a "mostly exclusive with fallback to shared" hybrid model, which is harder to reason about

2. Severe Query Performance Bottleneck

Hotspot Problem:
Consider a collection with 10 channels:

  • Exclusive nodes: Each handles queries for 1 channel → query load = 1x
  • Shared node: Handles queries for all 10 channels → query load = 10x

The shared node becomes an artificial hotspot:

  • Disproportionate query QPS concentration on the shared node
  • CPU/memory/IO contention turns the shared node into a single point of failure
  • Query latency for all channels degrades due to shared node saturation

Defeats the Purpose:

  • The whole point of channel exclusive mode is to prevent resource contention between channels
  • Introducing shared nodes reintroduces the exact problem we're trying to solve
  • The isolation guarantees and predictable performance benefits are lost

3. Current Design Rationale

Why "First Channel Gets Extra Nodes" is Acceptable:

The current design (e.g., 7 nodes / 3 channels → [3, 2, 2]) is a simple, pragmatic solution:

  1. Deterministic: Channel assignment is based on sorted channel names, ensuring consistency across restarts
  2. Minimal Complexity: No special handling for shared nodes
  3. Fair in Practice:
    • The imbalance is minor (at most 1 extra node)
    • Channels are typically created at collection load time and rarely change
    • If load imbalance becomes an issue, users can adjust channelExclusiveNodeFactor or add more nodes
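
The allocation described above can be sketched in Python as follows. This is an illustration under assumptions rather than the QueryCoord code: the function name is hypothetical, and the remainder is handed out one node at a time to the earliest channels in sorted order, which matches the [3, 2, 2] example and the "at most 1 extra node" statement.

```python
from typing import Dict, List


def assign_nodes(nodes: List[int], channels: List[str]) -> Dict[str, List[int]]:
    """Split nodes across channels; earlier sorted channels get the extras."""
    if not channels:
        raise ValueError("at least one channel is required")
    ordered_channels = sorted(channels)  # deterministic across restarts
    ordered_nodes = sorted(nodes)
    base, extra = divmod(len(ordered_nodes), len(ordered_channels))
    assignment: Dict[str, List[int]] = {}
    cursor = 0
    for i, channel in enumerate(ordered_channels):
        # The first `extra` channels receive one additional node each.
        size = base + (1 if i < extra else 0)
        assignment[channel] = ordered_nodes[cursor:cursor + size]
        cursor += size
    return assignment


result = assign_nodes([1, 2, 3, 4, 5, 6, 7], ["channel_2", "channel_0", "channel_1"])
```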

Alternative Solutions for Better Fairness (if needed in the future):

  • Round-robin assignment: Instead of allocating all extra nodes to the first channel, distribute them round-robin across channels
    • Example: with 7 nodes / 3 channels, the allocation rotates between [3, 2, 2] (extra node on the first channel), [2, 3, 2] (second channel), and [2, 2, 3] (third channel)
    • This could be done by rotating the channel order in each balancing cycle
  • User-defined affinity: Allow administrators to explicitly specify node assignments per channel
  • Load-based adjustment: Dynamically adjust node allocation based on actual query load per channel

However, these enhancements add complexity and are not critical for the initial implementation.
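
For reference, the rotation idea from the round-robin bullet could look roughly like this in Python. It only computes per-channel node counts, the function name is hypothetical, and it is a sketch of a possible future enhancement, not something the current design implements.

```python
from typing import List


def rotated_sizes(node_count: int, channel_count: int, cycle: int) -> List[int]:
    """Per-channel node counts, shifting the extra nodes by one per cycle."""
    if channel_count <= 0:
        raise ValueError("channel_count must be positive")
    base, extra = divmod(node_count, channel_count)
    offset = cycle % channel_count
    # Channel i gets an extra node when it falls in the rotated window.
    return [
        base + (1 if (i - offset) % channel_count < extra else 0)
        for i in range(channel_count)
    ]


sizes_by_cycle = [rotated_sizes(7, 3, cycle) for cycle in range(3)]
```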


Summary

  1. Question 1: The design already includes gradual balancing and rate limiting. The performance impact warning remains necessary for latency-sensitive workloads.

  2. Question 2: Shared nodes introduce significant complexity and defeat the purpose of channel exclusive mode by creating query hotspots. The current "extra nodes to first channel" approach is simple, deterministic, and the fairness gap is minimal.

Let me know if you'd like to discuss these trade-offs further or if there are other aspects of the design you'd like me to clarify!
