Bug Report: EmergencyReparentShard issues #18788

@arthurschreiber

Description

Overview of the Issue

I noticed two small-ish issues with EmergencyReparentShard:

  • When the DemotePrimary call fails (e.g. because the old primary is unreachable due to a network outage), we still attempt to call SetReplicationSource on it, which "forcefully" switches it to REPLICA. I don't think this necessarily makes sense: if we couldn't demote the primary, I think it's better to leave it running and let it demote itself once it notices that a different primary has been elected. As it stands, depending on whether the old primary is available again at that point, there are two different flows by which it gets switched to REPLICA.
  • The call to SetReplicationSource can keep executing even after EmergencyReparentShard has been cancelled. I don't think that's intentional, and it can lead to behavior that is very hard to understand.

Reproduction Steps

N/A

Binary Version

N/A

Operating System and Environment details

N/A

Log Fragments
