[ANSIENG-5796] Add KRaft migration rollback playbook with auto-phase detection #2492
Open
Ishika Paliwal (ishikaa-p) wants to merge 3 commits into
…o-phase detection

Implements automated rollback of a ZooKeeper-to-KRaft migration back to ZooKeeper-only mode, covering Phases 2, 3, and 4. Phase 5 is detected and blocked as irreversible.

New files:
- playbooks/kraft_migration_rollback.yml: entry point, auto-routes by detected phase, callable via ZKtoKraftMigration.yml --tags rollback
- playbooks/tasks/detect_migration_phase.yml: Jolokia and config-based detection, no kafka-migration-check dependency, handles Jolokia-unavailable fallback for mid-rollback re-runs
- playbooks/validations/kraft_migration_rollback_validations.yml: pre-flight checks including ZK health and FINALIZED guard

Modified:
- playbooks/ZKtoKraftMigration.yml: adds rollback entry point via import_playbook with tags rollback

Phase 3 order matches the tested implementation: stop controllers first, clean ZK znodes, clean broker metadata in parallel, then do a rolling broker restart. Phase 4 uses template-based config regeneration for correct authorizer handling, with a two-run workflow for non-RBAC ACL clusters.
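The Jolokia-unavailable fallback mentioned for detect_migration_phase.yml could look roughly like the sketch below. This is not the file's actual content. It assumes an HTTP Jolokia agent reachable on the controller host, and the `rollback_jolokia_port` variable, its 7770 default, and the fact names are hypothetical placeholders.

```yaml
# Hypothetical sketch, not the actual contents of detect_migration_phase.yml.
# Assumes a plain-HTTP Jolokia agent on the controller host; the port variable
# and its default are placeholders.
- name: Read ZkMigrationState from Jolokia without failing when it is down
  ansible.builtin.uri:
    url: "http://localhost:{{ rollback_jolokia_port | default(7770) }}/jolokia/read/kafka.controller:type=KafkaController,name=ZkMigrationState/Value"
    return_content: true
  register: zk_migration_state
  failed_when: false   # an unreachable agent is expected mid-rollback

- name: Record whether Jolokia answered and which state it reported
  ansible.builtin.set_fact:
    jolokia_available: "{{ (zk_migration_state.status | default(-1)) == 200 }}"
    zk_migration_state_value: "{{ (zk_migration_state.content | default('{}') | from_json).value | default('unknown') }}"

- name: Signal that config-based detection is needed
  ansible.builtin.debug:
    msg: "Jolokia unavailable; falling back to config-based phase detection"
  when: not (jolokia_available | bool)
```

Setting `failed_when: false` keeps a stopped controller from aborting the play, so a re-run after the controllers are already down can still reach the config-based checks.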
…os for all three migration phases
Add three Molecule scenarios that run migration to a target phase then
execute the rollback playbook to verify full round-trip correctness:
- plaintext-rhel-rollback-phase2 (PREMIGRATION state):
  Controllers provisioned, brokers untouched. Verifies controller
  service stops cleanly without any broker changes.
- plaintext-rhel-rollback-phase3 (HYBRID_DUAL_WRITE state):
  Full migrate_to_dual_write (precheck → controller → Jolokia
  validation → broker serial restart → wait ZkMigrationState=1).
  Verifies one rolling broker restart restores ZK-only mode.
- plaintext-rhel-rollback-phase4 (PURE_DUAL_WRITE state):
  Dual-write + broker KRaft migration (process.roles=broker set,
  zookeeper.connect removed). Controllers intentionally left in
  dual-write mode. Verifies three-restart rollback path.
Each scenario includes molecule.yml (minimal ZK + broker + dedicated
controller platform), prepare.yml (unconditional ZK cluster install),
verify.yml (asserts broker.id, zookeeper.connect present; process.roles,
migration flag absent; controller service stopped).
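The verify.yml assertions described above might be sketched as follows. The properties path, the `confluent-kcontroller.service` unit name, the group names in the `when` conditions, and the reading of "migration flag" as `zookeeper.metadata.migration.enable` are all assumptions for illustration, not taken from the PR.

```yaml
# Hypothetical verify tasks; paths, unit name, and group names are assumptions.
- name: Read the broker configuration
  ansible.builtin.slurp:
    src: /etc/kafka/server.properties
  register: broker_config
  when: "'kafka_broker' in group_names"

- name: Assert the broker is back in ZooKeeper-only mode
  vars:
    props: "{{ broker_config.content | b64decode }}"
  ansible.builtin.assert:
    that:
      - "props is search('^broker[.]id=', multiline=True)"
      - "props is search('^zookeeper[.]connect=', multiline=True)"
      - "props is not search('^process[.]roles=', multiline=True)"
      - "props is not search('^zookeeper[.]metadata[.]migration[.]enable=', multiline=True)"
  when: "'kafka_broker' in group_names"

- name: Gather service facts on controller hosts
  ansible.builtin.service_facts:
  when: "'kafka_controller' in group_names"

- name: Assert the controller service is not running
  ansible.builtin.assert:
    that:
      - "(ansible_facts.services['confluent-kcontroller.service'] | default({})).state | default('stopped') != 'running'"
  when: "'kafka_controller' in group_names"
```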
Three converge files at molecule/ root follow ZKtoKraftMigration.yml
play order exactly: migration_precheck → kafka_controller → Jolokia
validation → kafka_broker (serial) → wait → [broker KRaft steps for
phase4] → kraft_migration_rollback.
Also fix ZKtoKraftMigration.yml rollback import: add `never` tag so the
rollback playbook does not execute during normal migration runs when no
--tags filter is passed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
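For illustration, the rollback import with the `never` tag could look like the fragment below. It is a minimal sketch that assumes the rollback playbook sits next to ZKtoKraftMigration.yml under playbooks/. The play name is made up, and the actual import in the PR may differ.

```yaml
# Hypothetical excerpt of playbooks/ZKtoKraftMigration.yml.
# With the special `never` tag, the imported plays are skipped unless one of
# their tags (here `rollback`) is requested explicitly via --tags.
- name: Roll back an in-progress ZK to KRaft migration
  ansible.builtin.import_playbook: kraft_migration_rollback.yml
  tags:
    - rollback
    - never
```

Without `never`, a plain run with no `--tags` filter would execute every imported play, including the rollback.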
…ack molecule scenarios
Remove ${JOB_BASE_NAME}${BUILD_NUMBER} env var suffixes from container
names — only 3 of ~60 scenarios use them (plaintext-basic-rhel,
custom-user-plaintext-rhel, plaintext-rhel-customrepo). All other
scenarios, including all migration-related ones, use static names.
Use static names consistent with the majority pattern.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- playbooks/kraft_migration_rollback.yml: a single-entry-point rollback playbook that auto-detects the migration phase (PREMIGRATION / HYBRID_DUAL_WRITE / PURE_DUAL_WRITE) via Jolokia + config inspection and runs the correct rollback steps without requiring manual `--tags` selection
- playbooks/tasks/detect_migration_phase.yml: reusable phase detection task file; handles Jolokia-unavailable fallback for mid-rollback re-runs (controllers stopped after Phase 4 Steps 1–4)
- playbooks/validations/kraft_migration_rollback_validations.yml: pre-flight checks (ZK health, inventory structure, phase detection, rollback feasibility)
- `--tags rollback` entry point in ZKtoKraftMigration.yml; the `never` tag prevents accidental execution during normal migration runs

What's tested by the Molecule scenarios
- plaintext-rhel-rollback-phase2
- plaintext-rhel-rollback-phase3
- plaintext-rhel-rollback-phase4

Each scenario's converge mirrors the ZKtoKraftMigration.yml play order exactly: migration_precheck → kafka_controller → Jolokia validation → kafka_broker serial → wait for ZkMigrationState=1 → [broker KRaft lineinfile + restart for Phase 4] → rollback.

Usage
Test plan
- molecule test -s plaintext-rhel-rollback-phase2
- molecule test -s plaintext-rhel-rollback-phase3
- molecule test -s plaintext-rhel-rollback-phase4
- … plaintext-basic-rhel with MIGRATION_CONVERGE) are unaffected by the `never` tag change on the rollback import

Related
JIRA: https://confluentinc.atlassian.net/browse/ANSIENG-5796
🤖 Generated with Claude Code