Strimzi Kafka external PVC [High Availability and Disaster Recovery] #8750
Replies: 3 comments 8 replies
-
There are some issues with Argo and PVC deletion; there are some other threads about that.

Using MirrorMaker to duplicate the Kafka cluster and its data can definitely be used for DR / backup. But it is a fairly expensive way, as you need to run a second cluster and, in public clouds, also pay for the data transfers. There is no simple way to switch traffic between the clusters. You would need to reconfigure all clients to make sure they connect to the backup cluster, or use some network infrastructure that would do it for you (including severing existing connections to the old cluster etc.). And assuming you use them as active-passive, which would be typical for backup scenarios, there is also no good way to reverse the mirroring flow. You would basically start with a new cluster as a new backup after the switch. So operations-wise, it is not completely straightforward either.

Deleting the operator Deployment itself should not really delete anything. That should happen only when you delete the custom resources (e.g. by deleting them directly or by deleting the CRDs, which will cause garbage collection of the custom resources). You can to some extent prevent that, for example by setting finalizers on the CRDs / CRs. It is up to you to evaluate what the right measure is for your situation and which scenarios you want to have covered. I'm just offering this as an option.
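To make the finalizer idea a bit more concrete, here is a minimal sketch that builds a JSON merge patch adding a finalizer to a custom resource. The finalizer name `example.com/protect-kafka` and the helper `add_finalizer_patch` are made up for illustration and are not something Strimzi provides. Keep in mind that nothing removes such a finalizer automatically: a deleted resource will sit in `Terminating` until someone removes it by hand.

```python
import json


def add_finalizer_patch(existing_finalizers, finalizer):
    """Build a JSON merge patch that adds `finalizer` to a resource.

    A JSON merge patch replaces arrays wholesale, so the finalizers the
    resource already has must be passed in and carried over.
    """
    finalizers = list(existing_finalizers)
    if finalizer not in finalizers:
        finalizers.append(finalizer)
    return {"metadata": {"finalizers": finalizers}}


# Hypothetical finalizer name, chosen only for this example.
patch = add_finalizer_patch([], "example.com/protect-kafka")
print(json.dumps(patch))
```

The printed JSON could then be applied with something like `kubectl patch kafka my-cluster --type=merge -p '<patch>'`, where `my-cluster` is a placeholder for the name of your Kafka resource.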
-
We ran into the same PVC lifecycle problem: deleting Strimzi CRDs triggers garbage collection of PVCs, and node failures can cause permanent data loss. MirrorMaker2 helps with replication but requires a full secondary cluster and complex failover.

We built the Strimzi Backup Operator to solve this. It is a Strimzi-native Kubernetes operator, written in Rust with kube-rs. Key features: cron-based scheduled backups, point-in-time recovery with millisecond precision, topic filtering, consumer group offset restore, retention policies, and Prometheus metrics.

Would love feedback from the Strimzi community.
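For readers wondering what point-in-time recovery with millisecond precision means in practice, here is a generic sketch. It is not taken from the operator's code; the `Record` type and the `records_up_to` helper are invented for illustration. It relies only on the fact that Kafka record timestamps are milliseconds since the Unix epoch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for one backed-up Kafka record."""
    offset: int
    timestamp_ms: int  # milliseconds since the Unix epoch
    value: bytes


def records_up_to(records, target_ms):
    """Keep only records whose timestamp is at or before `target_ms`.

    Timestamps set by producers are not guaranteed to increase with the
    offset, so every record is checked instead of stopping at the first
    one that is too new.
    """
    return [r for r in records if r.timestamp_ms <= target_ms]


backup = [
    Record(offset=0, timestamp_ms=1_700_000_000_000, value=b"a"),
    Record(offset=1, timestamp_ms=1_700_000_000_500, value=b"b"),
    Record(offset=2, timestamp_ms=1_700_000_000_250, value=b"c"),
]
restored = records_up_to(backup, target_ms=1_700_000_000_300)
print([r.offset for r in restored])
```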
-
Thanks @scholzj, @im-konge, and @ppatierno for the feedback! Fair points! We've addressed everything raised here in v0.1.0.

@ppatierno Great idea! We'd love to be listed in awesome-strimzi! As for why Rust: we wanted a small, fast operator binary with a low memory footprint and strong compile-time safety. The kube-rs ecosystem has matured nicely and the resulting container image is ~15MB.

Thanks again for the kind words. Please let me or the team know of anything else; we are happy to make any further adjustments the maintainers suggest.
-
Hi everyone,
I'm testing Strimzi Kafka and we are trying to work out how to achieve high availability and disaster recovery.
We recently saw some weird behavior where a cluster went missing, and we were only able to get it back by deleting the operator and recreating everything with Argo.
Based on that, we started looking at how to ensure that the PVCs don't get deleted.
Do you know of any approaches? For example, using MirrorMaker and having two operators, so that we can switch the traffic in case of an issue?
Or is there some way to use an external PVC, so that if I delete the entire operator the data is still stored somewhere?