[Bug]: Table resource treats externally-managed replicas as drift, deleting them every reconciliation cycle #2024
Description
Is there an existing issue for this?
- I have searched the existing issues
Affected Resource(s)
dynamodb.aws.upbound.io — Table, TableReplica
Resource MRs required to reproduce the bug
Table MR (manages the primary DynamoDB table):
```yaml
apiVersion: dynamodb.aws.upbound.io/v1beta2
kind: Table
metadata:
  annotations:
    crossplane.io/external-name: my-global-table
spec:
  managementPolicies:
    - Create
    - Update
    - Delete
    - Observe
  providerConfigRef:
    name: default
  forProvider:
    region: us-west-2
    attribute:
      - name: id
        type: "N"
    hashKey: id
    billingMode: PAY_PER_REQUEST
    streamEnabled: true
    streamViewType: NEW_AND_OLD_IMAGES
    pointInTimeRecovery:
      enabled: true
    serverSideEncryption:
      enabled: true
    deletionProtectionEnabled: false
```

TableReplica MR (manages a replica in a second region):
```yaml
apiVersion: dynamodb.aws.upbound.io/v1beta1
kind: TableReplica
spec:
  managementPolicies:
    - Create
    - Update
    - Delete
    - Observe
  providerConfigRef:
    name: default
  forProvider:
    region: us-east-1
    globalTableArnRef:
      name: <table-mr-name>
    pointInTimeRecovery: true
    deletionProtectionEnabled: false
```

Steps to Reproduce
- Create a Table MR for a DynamoDB table with streaming enabled (required for global tables).
- Create a TableReplica MR referencing the Table MR, with a replica in a different region.
- Wait for both resources to become Ready (~2-5 minutes).
- Wait for the Table MR to re-reconcile (dependent on the configured poll interval — 10 minutes in our environment).
- Observe that the Table MR issues an `UpdateTable` API call that deletes the replica.
- Observe that the TableReplica MR detects the missing replica and recreates it.
- The cycle repeats every reconciliation period indefinitely.
What happened?
Expected Result
The Table MR should not interfere with replicas managed by the TableReplica MR. Once created, replicas should remain stable and available.
This mirrors the Terraform provider, which explicitly documents that the `replica` block of `aws_dynamodb_table` and the `aws_dynamodb_table_replica` resource are mutually exclusive, and recommends `lifecycle { ignore_changes = [replica] }` when using the separate replica resource.
Actual Result
The Table MR's spec.forProvider has no replica field, so the provider treats observed replicas as drift and removes them on every reconciliation cycle. The TableReplica MR then detects the missing replica and recreates it. This creates a destructive loop:
- For small/empty tables: replicas are down a significant portion of the time (deletion takes ~1 minute and recreation ~2 minutes of each 10-minute reconciliation cycle, so the replica is unavailable roughly 30% of the time)
- For large production tables: replicas could be permanently unavailable if recreation takes longer than the configured reconciliation interval
CloudTrail evidence showing the delete/recreate cycle (10-minute poll interval in our environment):
CloudTrail events — without workaround
Timestamps show the Table MR (SESSION_A) deleting the replica every ~10 minutes, and the TableReplica MR (SESSION_B) immediately recreating it:
```
13:24:13 UpdateTable SESSION_B → replicaUpdates: create us-east-1 ✓ (initial creation)
13:33:40 UpdateTable SESSION_A → replicaUpdates: delete us-east-1 (Table MR drift correction)
13:34:43 UpdateTable SESSION_B → replicaUpdates: create us-east-1 ✗ (ERROR: replica still exists)
13:34:47 UpdateTable SESSION_B → replicaUpdates: create us-east-1 ✓ (TableReplica MR recreates)
13:43:51 UpdateTable SESSION_A → replicaUpdates: delete us-east-1 (cycle repeats)
13:46:15 UpdateTable SESSION_B → replicaUpdates: create us-east-1 ✗ (ERROR: replica still exists)
... table deleted and recreated by Table MR at 14:14 ...
14:16:15 UpdateTable SESSION_B → replicaUpdates: create us-east-1 ✓
14:25:45 UpdateTable SESSION_A → replicaUpdates: delete us-east-1
14:28:01 UpdateTable SESSION_B → replicaUpdates: create us-east-1 ✓
14:35:46 UpdateTable SESSION_A → replicaUpdates: delete us-east-1
14:39:08 UpdateTable SESSION_B → replicaUpdates: create us-east-1 ✓
14:45:52 UpdateTable SESSION_A → replicaUpdates: delete us-east-1
14:49:18 UpdateTable SESSION_B → replicaUpdates: create us-east-1 ✓
14:56:19 UpdateTable SESSION_A → replicaUpdates: delete us-east-1
14:59:20 UpdateTable SESSION_B → replicaUpdates: create us-east-1 ✓
```
Full redacted CloudTrail logs can be provided on request.
CloudTrail events — with initProvider workaround
After adding initProvider.replica: [{}] to the Table MR, the cycle stops completely. Only the initial replica creation and table setup events appear, with no subsequent delete/recreate activity:
```
16:17:23 UpdateContinuousBackups SESSION_A → enable PITR ✓
16:17:29 UpdateTable SESSION_B → replicaUpdates: create us-east-1 ✓
16:17:49 CreateTable SESSION_B → replica created in us-east-1 ✓ (AWS internal)
16:18:05 UpdateContinuousBackups SESSION_B → enable PITR on replica ✓
(no further events — replica remains stable)
```
Workaround
Add `initProvider.replica: [{}]` to the Table MR spec. This tells Crossplane the replica field was set at init time and should not be enforced during subsequent reconciliations:

```yaml
apiVersion: dynamodb.aws.upbound.io/v1beta2
kind: Table
spec:
  forProvider:
    # ... table config ...
  initProvider:
    replica:
      - {}
```

This is the Crossplane equivalent of Terraform's `lifecycle { ignore_changes = [replica] }`.
Relevant Error Output Snippet
The Table MR itself does not report errors — it successfully deletes the replica each cycle. The errors appear on the AWS side when the TableReplica MR tries to recreate a replica while the previous deletion is still in progress:
```
ValidationException: Failed to create a the new replica of table with name: 'my-global-table' because one or more replicas already existed as tables.
```

Crossplane Version
1.17.6
Provider Version
1.23.1
Kubernetes Version
1.34
Kubernetes Distribution
EKS
Additional Info
Root cause: The underlying Terraform `aws_dynamodb_table` resource includes a `replica` configuration block. When the Upbound provider observes the DynamoDB table, it sees replicas in the AWS state. Since `spec.forProvider` doesn't include `replica`, the provider treats them as drift and issues an `UpdateTable` call to remove them.
The Terraform documentation for aws_dynamodb_table explicitly warns:
> Do not use the `replica` configuration block of `aws_dynamodb_table` together with `aws_dynamodb_table_replica` as the two configuration options are mutually exclusive.
Terraform solves this with `lifecycle { ignore_changes = [replica] }`. Crossplane's closest equivalent is `initProvider`, which prevents a field from being enforced after initial creation.
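For comparison, the Terraform-side pattern looks roughly like this (resource names, provider aliases, and the table config are illustrative, not taken from our environment):

```hcl
resource "aws_dynamodb_table" "example" {
  name         = "my-global-table"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "id"

  attribute {
    name = "id"
    type = "N"
  }

  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES"

  # Ignore replicas on this resource so the separate
  # aws_dynamodb_table_replica resource can manage them
  # without the two fighting each other.
  lifecycle {
    ignore_changes = [replica]
  }
}

resource "aws_dynamodb_table_replica" "us_east_1" {
  provider         = aws.us_east_1
  global_table_arn = aws_dynamodb_table.example.arn
}
```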
Suggestion: The provider could automatically ignore the `replica` field on the Table resource when it detects that TableReplica resources reference the same table, or at minimum document the `initProvider` workaround for users managing replicas with the separate TableReplica resource.