Support horizontal scaling via label-based sharding #269
Description
What problem are you facing?
We run provider-terraform at scale with several hundred Workspaces. Currently, deploying multiple replicas with --leader-election results in active/passive HA — only one replica reconciles all Workspaces while the others sit idle. This is because all replicas compete for the same leader election lease.
As the number of Workspaces grows, the single active replica becomes a throughput bottleneck. Each reconciliation involves running terraform init/plan/apply, which is CPU-intensive and time-consuming, meaning a single controller can only process a limited number of Workspaces per reconciliation cycle.
This has been raised before in #212 (closed by stale bot) and is closely related to crossplane/crossplane#2411 (partitioning by ProviderConfig). The provider-ansible project has a working sharding implementation using an event filter, which shows precedent for this pattern in the Crossplane ecosystem.
How could provider-terraform be improved?
Add a --shard-name flag (and SHARD_NAME env var) that enables label-based sharding. When set, the controller:
- Configures the informer cache with a label selector via cache.ByObject, so it only watches Workspaces labeled terraform.crossplane.io/shard=<name>
- Uses a per-shard leader election lease (e.g., crossplane-leader-election-provider-terraform-<shard-name>), allowing multiple shards to be active simultaneously
- Keeps the workspace garbage collector listing ALL Workspaces (unfiltered) so it can safely determine which directories are orphaned, avoiding cross-shard deletion
When --shard-name is not set, behavior remains identical to today — full backward compatibility.
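To make the shape of this concrete, here is a minimal sketch of the shard-dependent naming and selector logic. The helper names (`leaseName`, `workspaceSelector`) are hypothetical, not existing provider-terraform code; in the actual implementation the selector would be wired into controller-runtime's informer cache options and the lease ID into the manager's leader election options.

```go
package main

import "fmt"

// shardLabel is the proposed label key workspaces would carry.
const shardLabel = "terraform.crossplane.io/shard"

// leaseName returns the leader-election lease ID. With no shard set it
// keeps today's single lease name, preserving backward compatibility;
// with a shard set, each shard gets its own lease and can be active
// concurrently with the others.
func leaseName(shard string) string {
	const base = "crossplane-leader-election-provider-terraform"
	if shard == "" {
		return base
	}
	return base + "-" + shard
}

// workspaceSelector returns the label selector the informer cache would
// be configured with (via cache.ByObject in controller-runtime). An
// empty result means "watch all Workspaces", i.e. current behavior.
func workspaceSelector(shard string) string {
	if shard == "" {
		return ""
	}
	return fmt.Sprintf("%s=%s", shardLabel, shard)
}

func main() {
	for _, shard := range []string{"", "a"} {
		fmt.Printf("shard=%q lease=%q selector=%q\n",
			shard, leaseName(shard), workspaceSelector(shard))
	}
}
```

The key property is that both functions degrade to today's behavior when the shard name is empty, so the flag is strictly opt-in.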
What alternatives have you considered?
- Hash-based sharding (e.g., consistent hashing on workspace UID): More automatic, but harder to reason about operationally and difficult to rebalance.
- ProviderConfig-based partitioning (crossplane/crossplane#2411): Requires deeper Crossplane runtime changes; label-based sharding can be done purely within the provider.
- Event filter approach (as in provider-ansible): Works but less efficient than informer-level filtering since events still arrive at the controller before being discarded.
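To illustrate why the hash-based alternative is hard to rebalance, here is a toy sketch using naive modulo assignment over an FNV hash of the workspace UID (`shardFor` is a hypothetical helper, not provider code; a true consistent-hashing ring would reduce, but not eliminate, movement when the shard count changes).

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor deterministically assigns a workspace UID to one of n shards.
// Naive modulo assignment: changing n remaps most UIDs, which is the
// operational rebalancing problem mentioned above.
func shardFor(uid string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(uid))
	return h.Sum32() % n
}

func main() {
	uids := []string{"uid-1", "uid-2", "uid-3", "uid-4", "uid-5", "uid-6"}
	moved := 0
	for _, u := range uids {
		// Count workspaces whose shard assignment changes when we
		// scale from 3 shards to 4.
		if shardFor(u, 3) != shardFor(u, 4) {
			moved++
		}
	}
	fmt.Printf("%d of %d workspaces change shards when scaling 3 -> 4\n",
		moved, len(uids))
}
```

With explicit labels, by contrast, rebalancing is just relabeling the chosen Workspaces, and an operator can see exactly which shard owns which Workspace.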
Environment
- provider-terraform v1.1.1
- Crossplane 2.2.0
- Kubernetes 1.32
I have a working implementation and have tested it locally with 3 concurrent shards across 6 Workspaces. Happy to open a PR if this approach looks reasonable.