
Update design doc for force promote failover#12

Open
bigsheeper wants to merge 1 commit into milvus-io:main from bigsheeper:feat/force-promote-failover

@bigsheeper
Contributor

Summary

Add design document for the force promote feature that enables failover in Milvus cross-cluster replication.

Key Design Points

  • Force Promote API: Add a force_promote flag to UpdateReplicateConfiguration that immediately promotes a secondary cluster to a standalone primary
  • Empty Config Requirement: A force promote request must leave the clusters and topology fields empty; the new config is auto-constructed from existing meta
  • WithSecondaryClusterResourceKey API: Acquires an exclusive cluster-level lock and verifies that the cluster is currently a secondary
  • Ignore Field: Marks incomplete broadcasts to be skipped, preventing stale messages from overwriting the force promote config
  • Transaction Rollback: TxnBuffer rolls back all uncommitted transactions when it processes the force promote message
  • DDL Fixing: Incomplete broadcasts are marked with ignore=true before being supplemented to the remaining vchannels

Related

@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bigsheeper
To complete the pull request process, please assign zhengbuqian after the PR has been reviewed.
You can assign the PR to them by writing /assign @zhengbuqian in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

- Remove force_promote_timestamp field (replaced by ignore field)
- Add ignore field to AlterReplicateConfigMessageHeader for incomplete message handling
- Update constraints: empty cluster/topology fields required, config auto-constructed
- Document WithSecondaryClusterResourceKey() broadcaster API
- Update flow diagram with TxnBuffer transaction rollback
- Document ignore field handling across 7 locations
- Add an "Alternatives considered" section: user-specified config, timestamp-based detection
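The interaction between the ignore field and the TxnBuffer rollback listed above can be illustrated with a small Go model. Everything here is hypothetical: AlterReplicateConfigHeader is a simplified stand-in that keeps only a cluster list, a force-promote marker and the Ignore flag, and txnBuffer, configApplier and supplementIncomplete are illustrative names, not Milvus code. Incomplete broadcasts are re-sent to the remaining vchannels with Ignore set, so every vchannel finishes the DDL without the stale config overwriting the force promote config, while the force promote message itself discards all uncommitted transactions.

```go
package main

import "fmt"

// AlterReplicateConfigHeader is a simplified, hypothetical stand-in for the real header.
type AlterReplicateConfigHeader struct {
	Clusters     []string // replicate config carried by the message
	ForcePromote bool     // true for the force promote message itself
	Ignore       bool     // true: deliver the message but do not apply its config
}

// supplementIncomplete re-sends an incomplete broadcast to the vchannels that
// never received it, marking every copy ignore=true before delivery.
func supplementIncomplete(h AlterReplicateConfigHeader, remaining []string) map[string]AlterReplicateConfigHeader {
	out := make(map[string]AlterReplicateConfigHeader, len(remaining))
	h.Ignore = true // h is a copy, so the caller's header is unchanged
	for _, vch := range remaining {
		out[vch] = h
	}
	return out
}

// txnBuffer holds uncommitted transactions keyed by transaction ID.
type txnBuffer struct {
	pending map[int64][]string
}

// rollbackAll drops every uncommitted transaction and reports how many were dropped.
func (b *txnBuffer) rollbackAll() int {
	n := len(b.pending)
	b.pending = make(map[int64][]string)
	return n
}

// configApplier applies AlterReplicateConfig messages on one vchannel.
type configApplier struct {
	current []string
	txns    *txnBuffer
}

// apply returns false when the message is skipped because of the ignore field.
func (a *configApplier) apply(h AlterReplicateConfigHeader) bool {
	if h.Ignore {
		return false // stale config must never overwrite the force promote config
	}
	if h.ForcePromote {
		a.txns.rollbackAll() // uncommitted transactions cannot complete after failover
	}
	a.current = h.Clusters
	return true
}

func main() {
	a := &configApplier{txns: &txnBuffer{pending: map[int64][]string{7: {"insert"}}}}
	a.apply(AlterReplicateConfigHeader{Clusters: []string{"cluster-b"}, ForcePromote: true})

	stale := AlterReplicateConfigHeader{Clusters: []string{"cluster-a", "cluster-b"}}
	for vch, h := range supplementIncomplete(stale, []string{"vch-1", "vch-2"}) {
		fmt.Println(vch, "applied:", a.apply(h))
	}
	fmt.Println(a.current, len(a.txns.pending))
}
```

Marking the supplemented copies instead of dropping them keeps the broadcast complete on every vchannel, which is what lets the DDL fixing step finish without a timestamp-based check.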

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
bigsheeper force-pushed the feat/force-promote-failover branch from 6ad4e55 to f9536d7 on February 5, 2026 13:08
bigsheeper changed the title from "Add design doc for force promote failover" to "Update design doc for force promote failover" on Feb 5, 2026

2 participants