# Feature Request: Add Leader Election Support for High Availability

## Summary

The GKE operator cannot currently run multiple instances safely. This prevents high-availability deployments and risks split-brain behavior if more than one instance is deployed by accident.

## Problem

**Current State:**
- Supports only a single operator instance
- No protection against multiple instances processing the same resources
- No built-in high availability or failover mechanism
- Risk of race conditions and conflicts if the deployment is scaled beyond one replica

**Impact:**
- Single point of failure for GKE cluster management
- Cannot scale the operator for performance or availability
- Manual intervention required if the operator pod fails
- Production deployments lack resilience

## Proposed Solution

Implement leader election functionality to enable:

1. **Safe Multi-Instance Deployment**: Multiple operator pods can run simultaneously with only one actively processing resources
2. **Automatic Failover**: If the leader fails, another instance automatically takes over
3. **High Availability**: Eliminates single point of failure
4. **Configurable Behavior**: Allow disabling leader election for development scenarios

## Technical Requirements

### Core Functionality
- [ ] Implement leader election using the Kubernetes `coordination.k8s.io` Lease API
- [ ] Only the leader instance runs controllers and processes GKE resources
- [ ] Non-leader instances wait on standby until they can acquire leadership
- [ ] Leadership transfers automatically when the leader fails
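
The holder and expiry rules behind these requirements can be illustrated with a small in-memory model. This is only a sketch: the `lease` type and `tryAcquireOrRenew` function are hypothetical names, and the real implementation would rely on a Lease object stored in the API server rather than on local state.

```go
package main

import (
	"fmt"
	"time"
)

// lease is a simplified, in-memory stand-in for a coordination.k8s.io Lease.
// The real object lives in the API server; this model only captures the
// holder and expiry rules and is not safe for concurrent use.
type lease struct {
	holder    string        // identity of the current leader, "" if none
	renewedAt time.Time     // last time the holder renewed the lease
	duration  time.Duration // how long a renewal stays valid
}

// tryAcquireOrRenew reports whether candidate holds the lease after the call.
// A candidate wins if the lease is unheld, already its own, or expired.
func (l *lease) tryAcquireOrRenew(candidate string, now time.Time) bool {
	expired := now.Sub(l.renewedAt) > l.duration
	if l.holder != "" && l.holder != candidate && !expired {
		return false // another instance holds a live lease; stay on standby
	}
	l.holder = candidate
	l.renewedAt = now
	return true
}

func main() {
	start := time.Now()
	l := &lease{duration: 15 * time.Second}

	// pod-a finds the lease unheld and becomes leader.
	fmt.Println(l.tryAcquireOrRenew("pod-a", start))
	// pod-b finds a live lease held by pod-a and stays on standby.
	fmt.Println(l.tryAcquireOrRenew("pod-b", start.Add(5*time.Second)))
	// pod-a has stopped renewing, so the lease has expired and pod-b takes over.
	fmt.Println(l.tryAcquireOrRenew("pod-b", start.Add(30*time.Second)))
}
```

In this model, failover is simply the third call: once the previous leader stops renewing, the next candidate to check the lease acquires it.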

### Configuration
- [ ] Add `--leader-election` flag (default: enabled)
- [ ] Support namespace-aware leader election (the lease is created in the operator's own namespace)
- [ ] Grant the RBAC permissions required for lease management
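
As a rough sketch of the flag handling, the snippet below parses `--leader-election` with Go's standard `flag` package and defaults it to enabled. The `options` struct, the `parseOptions` helper, and the help text are illustrative assumptions, not the operator's existing code.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the settings relevant to leader election.
// The struct and its field name are hypothetical.
type options struct {
	leaderElection bool
}

// parseOptions parses args with a dedicated FlagSet so that it can be
// exercised without touching the process-wide flag state.
func parseOptions(args []string) (options, error) {
	var opts options
	fs := flag.NewFlagSet("gke-operator", flag.ContinueOnError)
	// Leader election is on by default; pass --leader-election=false to disable it.
	fs.BoolVar(&opts.leaderElection, "leader-election", true,
		"run leader election before starting controllers")
	err := fs.Parse(args)
	return opts, err
}

func main() {
	opts, err := parseOptions(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("leader election enabled: %t\n", opts.leaderElection)
}
```

Defaulting to enabled means an accidental scale-up is safe without extra configuration, while a single local process can opt out explicitly.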

### Deployment Updates
- [ ] Update RBAC to include lease permissions
- [ ] Add a `POD_NAMESPACE` environment variable for namespace detection
- [ ] Ensure backward compatibility with existing deployments
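
Namespace detection could look roughly like the following. `POD_NAMESPACE` would typically be populated in the Deployment through the downward API (`fieldRef` on `metadata.namespace`). The `detectNamespace` helper and the `cattle-system` fallback are assumptions made for illustration; the fallback is borrowed from the `kubectl scale` example in this issue.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// fallbackNamespace is an assumed default used when POD_NAMESPACE is unset,
// for example when the operator runs outside a cluster.
const fallbackNamespace = "cattle-system"

// detectNamespace returns the namespace in which the lease should be created.
// getenv is injected (normally os.Getenv) to keep the function easy to test.
func detectNamespace(getenv func(string) string) string {
	if ns := strings.TrimSpace(getenv("POD_NAMESPACE")); ns != "" {
		return ns
	}
	return fallbackNamespace
}

func main() {
	fmt.Println("lease namespace:", detectNamespace(os.Getenv))
}
```

Falling back to a fixed namespace keeps existing manifests that do not set `POD_NAMESPACE` working, which is one possible way to meet the backward-compatibility item above.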

## Example Use Cases

**Production High Availability:**
```bash
kubectl scale deployment/gke-config-operator --replicas=3 -n cattle-system
```

**Development Mode:**
```bash
./gke-operator --leader-election=false
```

## Acceptance Criteria

- [ ] Multiple operator instances can run safely without conflicts
- [ ] Only one instance processes GKE cluster operations at a time
- [ ] Failover happens automatically when the leader instance becomes unavailable
- [ ] No breaking changes to existing single-instance deployments
- [ ] Leader election can be disabled for development/testing
- [ ] Logs indicate leader election status (for example, when an instance acquires or loses leadership)

## Technical Implementation Notes

- Use the existing `github.com/rancher/wrangler/v3/pkg/leader` library for consistency
- Leader election resource name: `gke-operator-leader`
- Lease resources are created in the same namespace as the operator deployment
- Required RBAC: full permissions on `leases` in the `coordination.k8s.io` API group

## Related Work

This follows the same pattern as other Rancher operators (EKS, AKS) and aligns with Kubernetes controller best practices for leader election.

## Priority

**Medium-High** - This enables production-grade high availability deployments and is essential for enterprise use cases requiring resilient infrastructure management.
