Recommended way to set up baseline + incremental sync between two Milvus standalone instances? #49527
Replies: 4 comments
-
CDC doesn't replicate indexes, and it shouldn't; it only replicates DML and DDL operations.
We don't have documentation for CDC yet; @bigsheeper is working on it.
-
Q1: Yes, recent Milvus 2.6.x versions include a new built-in CDC, but the docs have not been updated yet. The main functionality works, though there still seem to be some issues to resolve. A CDC design doc is here: https://github.com/milvus-io/milvus-design-docs/blob/main/design_docs/20260203-primary_standby_failover_user_guide.md Wait for @bigsheeper for more input.
-
@xiaofan-luan @yhmo Thanks for the clarification from both of you. I also did a small empty-start verification based on the built-in CDC direction. My previous test failed because I only started
milvus run cdc
The CDC container used the same source-side configuration and connected to the source etcd. After that, I re-ran
Small empty-start test result:
So, for a small empty-start test, standalone + Woodpecker + built-in CDC works when the source-side CDC component is started separately. However, this only verifies the empty-start incremental path. My original production scenario still requires a baseline step for existing data, because the source has about 72 million vectors. I still need to verify:
I will continue with a small backup/restore + CDC baseline test next.
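For anyone repeating this kind of empty-start check, one possible way to confirm that incremental writes have reached the target is to poll row counts on both sides until they match. This is only a sketch: the helper names `count_rows` and `wait_for_row_parity` are hypothetical, the `client` argument is assumed to be a pymilvus `MilvusClient` (or anything with a compatible `query` method), and the `count(*)` query form is an assumption about the client, not something confirmed in this thread. Equal row counts are a weak signal and do not prove the data is identical.

```python
import time
from typing import Callable


def count_rows(client, collection_name: str) -> int:
    """Return the row count of a collection.

    Assumption: `client.query` accepts output_fields=["count(*)"] and
    returns a list like [{"count(*)": N}], as pymilvus MilvusClient does.
    """
    res = client.query(
        collection_name=collection_name,
        filter="",
        output_fields=["count(*)"],
    )
    return int(res[0]["count(*)"])


def wait_for_row_parity(
    source_count: Callable[[], int],
    target_count: Callable[[], int],
    timeout_s: float = 60.0,
    poll_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the target row count equals the source row count.

    Returns True once the counts match, or False if they still differ
    when the timeout expires.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if source_count() == target_count():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

In use, `source_count` and `target_count` would be small lambdas that call `count_rows` against the source and target clients for the same collection name.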
-
@gphmath Thanks for the detailed verification. Your empty-start result is consistent with the expected behavior: standalone + Woodpecker + built-in CDC can work when the source-side CDC component is started separately.
For the production case with existing data, I would recommend setting up two fresh empty clusters, enabling/configuring CDC from the beginning, and then re-inserting the full dataset into the source cluster.
The backup/restore + CDC baseline path is technically more complex. It requires careful alignment of collection metadata, pchannel/vchannel mapping, and checkpoints. It is not a simple plug-and-play path, and misalignment can lead to missing or duplicated incremental data.
So for now, if re-inserting the data is feasible, that is the recommended path:
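As an illustration of the re-insert step only (not the CDC configuration itself, which this thread does not document), a batched loader for the source cluster might look like the sketch below. The names `batched` and `reinsert` are hypothetical, the default batch size is arbitrary, and `client` is assumed to be a pymilvus `MilvusClient` or any object exposing a compatible `insert(collection_name=..., data=...)` method.

```python
from typing import Iterable, Iterator, List


def batched(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final partial batch, if any.
        yield batch


def reinsert(client, collection_name: str, rows: Iterable[dict],
             batch_size: int = 1000) -> int:
    """Insert all rows into the source collection in batches.

    Returns the number of rows sent. Assumption: `client.insert` takes
    `collection_name` and `data` keyword arguments, as MilvusClient does.
    """
    total = 0
    for batch in batched(rows, batch_size):
        client.insert(collection_name=collection_name, data=batch)
        total += len(batch)
    return total
```

Because `rows` can be any iterable, a dataset of tens of millions of vectors can be streamed from disk rather than held in memory.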
-
Goal
I want to set up primary-standby replication / read-write separation between two Milvus standalone instances.
Deployment target:
Expected behavior:
Version constraint
Milvus 2.6.14 is not a hard requirement.
However, I prefer a recent 2.6.x version or another currently recommended version, because my previous performance, recall, and capacity tests were mainly done around Milvus 2.6.x.
Older versions are acceptable only if they are officially recommended for standalone primary-standby replication and have clear documentation or a verified working example.
If the only working standalone replication solution requires an older Milvus version, I would like to know:
Acceptable options
I am not strictly tied to a specific MQ type or CDC implementation.
Acceptable options include:
The main requirement is:
Is there any officially recommended or community-verified way to achieve baseline + incremental sync between two Milvus standalone instances?
Current candidate solution
The most promising option I heard so far is:
Milvus 2.6.14 standalone
Some Milvus staff informally suggested that standalone + Woodpecker + built-in CDC + restore secondary might be possible, but I could not find an official end-to-end guide for this setup.
So I would like to confirm whether this combination is supported, or whether another standalone solution is recommended instead.
Questions
Is primary-standby replication between two Milvus standalone instances supported in any recent Milvus version?
If yes, what is the recommended version and architecture?
Is there an official or community-verified standalone solution for:
For Milvus 2.6.x, is the following combination supported?
Milvus standalone
milvus-backup restore secondary
If built-in CDC is supported for standalone:
Is update_replicate_configuration expected to work with standalone endpoints?
If external milvus-cdc server is the recommended standalone solution:
For initializing existing data, what is the recommended baseline method?
milvus-backup restore
milvus-backup restore secondary
If using CDC after baseline restore, how should the following be aligned?
If standalone primary-standby replication is not recommended or not supported, is cluster/operator mode currently the only recommended solution for baseline + incremental sync?
Previous tests
I have already tested normal backup/restore between two Milvus standalone instances, and it can restore data across instances.
However, a normal restore creates a different collection_id on the target, so I am concerned it cannot be used directly as the CDC baseline.
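The collection_id difference can be checked with a short script along these lines. This is a sketch under assumptions: the helper names are hypothetical, the URIs are placeholders, and it assumes that pymilvus `MilvusClient.describe_collection` returns a dict containing a `collection_id` key. It only reproduces the observation; whether CDC actually requires matching IDs is not confirmed in this thread.

```python
from typing import Tuple


def collection_ids_match(source_desc: dict, target_desc: dict) -> bool:
    """Return True only if both descriptions carry the same collection_id."""
    src_id = source_desc.get("collection_id")
    return src_id is not None and src_id == target_desc.get("collection_id")


def describe_both(source_uri: str, target_uri: str,
                  collection_name: str) -> Tuple[dict, dict]:
    """Fetch the collection description from both instances.

    pymilvus is imported lazily so the comparison helper above can be
    used without the client library installed.
    """
    from pymilvus import MilvusClient

    source = MilvusClient(uri=source_uri)
    target = MilvusClient(uri=target_uri)
    return (
        source.describe_collection(collection_name=collection_name),
        target.describe_collection(collection_name=collection_name),
    )
```

A typical call would pass the two standalone endpoints (for example, placeholder URIs such as `http://source-host:19530` and `http://target-host:19530`) to `describe_both` and feed the result to `collection_ids_match`.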
I also tested:
Milvus 2.6.14 standalone
Observed result:
MqTtMsgStream can not consume the message from streaming service
So I am now looking for a verified standalone solution instead of blindly trying different combinations.
Expected answer
Please clarify which of the following is true:
If a standalone solution exists, it would be very helpful to provide: