Feature Request: Add Leader Election Support for High Availability
Summary
The GKE operator currently does not support running multiple instances safely. This prevents high-availability deployments and risks split-brain behavior, with several instances reconciling the same resources at once, if more than one instance is accidentally deployed.
Problem
Current State:
- Only a single operator instance is supported
- No protection against multiple instances processing the same resources
- No built-in high availability or failover mechanism
- Risk of race conditions and conflicts if scaled beyond 1 replica
Impact:
- Single point of failure for GKE cluster management
- Cannot scale operator for performance or availability
- Manual intervention required if operator pod fails
- Production deployments lack resilience
Proposed Solution
Implement leader election functionality to enable:
- Safe Multi-Instance Deployment: Multiple operator pods can run simultaneously with only one actively processing resources
- Automatic Failover: If the leader fails, another instance automatically takes over
- High Availability: Eliminates single point of failure
- Configurable Behavior: Allow disabling leader election for development scenarios
Technical Requirements
Core Functionality
Configuration
- --leader-election flag (default: enabled)
Deployment Updates
Example Use Cases
Production High Availability:
kubectl scale deployment/gke-config-operator --replicas=3 -n cattle-system
Development Mode:
./gke-operator --leader-election=false
Acceptance Criteria
Technical Implementation Notes
- Use the existing github.com/rancher/wrangler/v3/pkg/leader library for consistency
- Leader election resource name: gke-operator-leader
- Lease resources created in the same namespace as the operator deployment
- Required RBAC: coordination.k8s.io/leases with full permissions
Related Work
This follows the same pattern as other Rancher operators (EKS, AKS) and aligns with Kubernetes controller best practices for leader election.
Priority
Medium-High - This enables production-grade high availability deployments and is essential for enterprise use cases requiring resilient infrastructure management.