Closed
Description
Overview of the Issue
If PRS fails during the initialisation of a shard, and the failure happens while the new primary is being promoted, specifically while it is writing to the topo-server such that the write succeeds but the call fails with a timeout, then the tablet won't change its internal display state to that of a primary tablet. VTOrc sees this failure and tries to fix it by calling UndoDemotePrimary, but that doesn't change the type of the tablet to PRIMARY. It only fixes the MySQL-level settings, which leaves the cluster without a primary at all.
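The failure mode described above can be illustrated with a toy model. This is only a sketch, not Vitess code: ToyTopo, ToyTablet, TopoTimeout and the lose_ack flag are hypothetical names invented for the illustration, and the ordering of steps inside promote is an assumption. It shows how a topo write that is applied but reported as a timeout leaves the tablet type unchanged, and how a repair that only touches MySQL-level settings does not restore it.

```python
from dataclasses import dataclass
from typing import Optional


class TopoTimeout(Exception):
    """Hypothetical error: the caller saw a timeout, but the write was applied."""


@dataclass
class ToyTopo:
    # Hypothetical stand-in for the shard record kept in the topo-server.
    shard_primary: Optional[str] = None

    def write_primary(self, alias: str, lose_ack: bool = False) -> None:
        # The write itself succeeds and is stored.
        self.shard_primary = alias
        if lose_ack:
            # The acknowledgement is lost, so the caller only sees a timeout.
            raise TopoTimeout("write applied, acknowledgement lost")


@dataclass
class ToyTablet:
    alias: str
    tablet_type: str = "REPLICA"
    mysql_read_only: bool = True

    def promote(self, topo: ToyTopo, lose_ack: bool = False) -> None:
        # Assumed ordering: write to topo first, then update the tablet type.
        topo.write_primary(self.alias, lose_ack=lose_ack)
        # Never reached when the topo call raises a timeout.
        self.tablet_type = "PRIMARY"

    def undo_demote_primary(self) -> None:
        # Models a repair that only fixes MySQL-level settings.
        self.mysql_read_only = False


topo = ToyTopo()
tablet = ToyTablet(alias="zone1-100")

try:
    tablet.promote(topo, lose_ack=True)
except TopoTimeout:
    pass  # the promotion is reported as failed

tablet.undo_demote_primary()  # the attempted repair

print(topo.shard_primary, tablet.tablet_type, tablet.mysql_read_only)
```

In this model the topo record names the tablet as the shard primary and MySQL is writable, yet no tablet reports the PRIMARY type, which mirrors the state described in this issue.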
Reproduction Steps
- Run PRS and simulate a failure that happens before the new primary tablet has promoted itself.
Binary Version
main
Operating System and Environment details
-