
[Bug]: Table resource treats externally-managed replicas as drift, deleting them every reconciliation cycle #2024

@eolatham

Description

Is there an existing issue for this?

  • I have searched the existing issues

Affected Resource(s)

dynamodb.aws.upbound.io/v1beta2 Table, dynamodb.aws.upbound.io/v1beta1 TableReplica

Resource MRs required to reproduce the bug

Table MR (manages the primary DynamoDB table):

apiVersion: dynamodb.aws.upbound.io/v1beta2
kind: Table
metadata:
  annotations:
    crossplane.io/external-name: my-global-table
spec:
  managementPolicies:
    - Create
    - Update
    - Delete
    - Observe
  providerConfigRef:
    name: default
  forProvider:
    region: us-west-2
    attribute:
      - name: id
        type: "N"
    hashKey: id
    billingMode: PAY_PER_REQUEST
    streamEnabled: true
    streamViewType: NEW_AND_OLD_IMAGES
    pointInTimeRecovery:
      enabled: true
    serverSideEncryption:
      enabled: true
    deletionProtectionEnabled: false

TableReplica MR (manages a replica in a second region):

apiVersion: dynamodb.aws.upbound.io/v1beta1
kind: TableReplica
spec:
  managementPolicies:
    - Create
    - Update
    - Delete
    - Observe
  providerConfigRef:
    name: default
  forProvider:
    region: us-east-1
    globalTableArnRef:
      name: <table-mr-name>
    pointInTimeRecovery: true
    deletionProtectionEnabled: false

Steps to Reproduce

  1. Create a Table MR for a DynamoDB table with streaming enabled (required for global tables).
  2. Create a TableReplica MR referencing the Table MR, with a replica in a different region.
  3. Wait for both resources to become Ready (~2-5 minutes).
  4. Wait for the Table MR to re-reconcile (dependent on the configured poll interval — 10 minutes in our environment).
  5. Observe the Table MR issues an UpdateTable API call that deletes the replica.
  6. Observe the TableReplica MR detects the missing replica and recreates it.
  7. The cycle repeats every reconciliation period indefinitely.
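The loop can also be observed from the cluster and AWS side. These are illustrative commands, not part of the original report; the MR name `my-global-table` and the placeholder `<table-replica-mr-name>` are assumptions matching the manifests above:

```shell
# Watch both MRs. Note the Table MR's Synced/Ready conditions can stay True
# even while it deletes the replica, so the replica list from AWS is the
# clearer signal of the delete/recreate cycle.
kubectl get table.dynamodb.aws.upbound.io my-global-table -w
kubectl get tablereplica.dynamodb.aws.upbound.io <table-replica-mr-name> -w

# Poll the replica list directly from DynamoDB (primary table's region).
# During the destructive loop this flips between a populated list and empty.
aws dynamodb describe-table \
  --table-name my-global-table \
  --region us-west-2 \
  --query 'Table.Replicas'
```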

What happened?

Expected Result

The Table MR should not interfere with replicas managed by the TableReplica MR. Once created, replicas should remain stable and available.

This is the same pattern as the Terraform provider, which explicitly documents that aws_dynamodb_table.replica and aws_dynamodb_table_replica are mutually exclusive and recommends lifecycle { ignore_changes = [replica] } when using the separate replica resource.

Actual Result

The Table MR's spec.forProvider has no replica field, so the provider treats observed replicas as drift and removes them on every reconciliation cycle. The TableReplica MR then detects the missing replica and recreates it. This creates a destructive loop:

  • For small/empty tables: the replica is unavailable for roughly 3 minutes of every 10-minute reconciliation cycle in our environment (~1 min for the delete plus ~2 min for the recreate)
  • For large production tables: the replica could be permanently unavailable if recreation takes longer than the configured reconciliation interval

CloudTrail evidence showing the delete/recreate cycle (10-minute poll interval in our environment):

CloudTrail events — without workaround

Timestamps show the Table MR (SESSION_A) deleting the replica every ~10 minutes, and the TableReplica MR (SESSION_B) immediately recreating it:

13:24:13  UpdateTable  SESSION_B  → replicaUpdates: create us-east-1   ✓  (initial creation)
13:33:40  UpdateTable  SESSION_A  → replicaUpdates: delete us-east-1      (Table MR drift correction)
13:34:43  UpdateTable  SESSION_B  → replicaUpdates: create us-east-1   ✗  (ERROR: replica still exists)
13:34:47  UpdateTable  SESSION_B  → replicaUpdates: create us-east-1   ✓  (TableReplica MR recreates)
13:43:51  UpdateTable  SESSION_A  → replicaUpdates: delete us-east-1      (cycle repeats)
13:46:15  UpdateTable  SESSION_B  → replicaUpdates: create us-east-1   ✗  (ERROR: replica still exists)
  ... table deleted and recreated by Table MR at 14:14 ...
14:16:15  UpdateTable  SESSION_B  → replicaUpdates: create us-east-1   ✓
14:25:45  UpdateTable  SESSION_A  → replicaUpdates: delete us-east-1
14:28:01  UpdateTable  SESSION_B  → replicaUpdates: create us-east-1   ✓
14:35:46  UpdateTable  SESSION_A  → replicaUpdates: delete us-east-1
14:39:08  UpdateTable  SESSION_B  → replicaUpdates: create us-east-1   ✓
14:45:52  UpdateTable  SESSION_A  → replicaUpdates: delete us-east-1
14:49:18  UpdateTable  SESSION_B  → replicaUpdates: create us-east-1   ✓
14:56:19  UpdateTable  SESSION_A  → replicaUpdates: delete us-east-1
14:59:20  UpdateTable  SESSION_B  → replicaUpdates: create us-east-1   ✓

Full redacted CloudTrail logs can be provided on request.

CloudTrail events — with initProvider workaround

After adding initProvider.replica: [{}] to the Table MR, the cycle stops completely. Only the initial replica creation and table setup events appear, with no subsequent delete/recreate activity:

16:17:23  UpdateContinuousBackups  SESSION_A  → enable PITR                 ✓
16:17:29  UpdateTable              SESSION_B  → replicaUpdates: create us-east-1  ✓
16:17:49  CreateTable              SESSION_B  → replica created in us-east-1      ✓ (AWS internal)
16:18:05  UpdateContinuousBackups  SESSION_B  → enable PITR on replica      ✓
(no further events — replica remains stable)

Workaround

Add initProvider.replica: [{}] to the Table MR spec. This tells Crossplane the replica field was set at init time and should not be enforced during subsequent reconciliations:

apiVersion: dynamodb.aws.upbound.io/v1beta2
kind: Table
spec:
  forProvider:
    # ... table config ...
  initProvider:
    replica:
      - {}

This is the Crossplane equivalent of Terraform's lifecycle { ignore_changes = [replica] }.
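For a Table MR that already exists, the workaround can be applied in place with a merge patch. A sketch, assuming the MR is also named `my-global-table` (hypothetical name, not from the original report):

```shell
# Add initProvider.replica to an existing Table MR so the provider stops
# enforcing the (empty) replica field against observed state.
kubectl patch table.dynamodb.aws.upbound.io my-global-table \
  --type merge \
  -p '{"spec":{"initProvider":{"replica":[{}]}}}'
```

Whether an initProvider change takes full effect on an already-created resource may depend on provider behavior; setting the field before creating the Table MR is the safest path.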

Relevant Error Output Snippet

The Table MR itself does not report errors — it successfully deletes the replica each cycle. The errors appear on the AWS side when the TableReplica MR tries to recreate a replica while the previous deletion is still in progress:


ValidationException: Failed to create a the new replica of table with name: 'my-global-table' because one or more replicas already existed as tables.

Crossplane Version

1.17.6

Provider Version

1.23.1

Kubernetes Version

1.34

Kubernetes Distribution

EKS

Additional Info

Root cause: The underlying Terraform aws_dynamodb_table resource includes a replica configuration block. When the Upbound provider observes the DynamoDB table, it sees replicas in the AWS state. Since spec.forProvider doesn't include replica, the provider treats them as drift and issues an UpdateTable call to remove them.

The Terraform documentation for aws_dynamodb_table explicitly warns:

Do not use the replica configuration block of aws_dynamodb_table together with aws_dynamodb_table_replica as the two configuration options are mutually exclusive.

Terraform solves this with lifecycle { ignore_changes = [replica] }. Crossplane's closest equivalent is initProvider, which prevents a field from being enforced after initial creation.
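For comparison, a sketch of the Terraform-side pattern the provider documentation recommends; the resource names, provider alias, and table attributes here are illustrative, not taken from the original report:

```hcl
# Manage replicas with the separate aws_dynamodb_table_replica resource and
# tell the table resource to ignore the replica block entirely.
resource "aws_dynamodb_table" "this" {
  name             = "my-global-table"
  billing_mode     = "PAY_PER_REQUEST"
  hash_key         = "id"
  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES"

  attribute {
    name = "id"
    type = "N"
  }

  lifecycle {
    # Replicas are owned by aws_dynamodb_table_replica below; without this,
    # the table resource would see them as drift and remove them.
    ignore_changes = [replica]
  }
}

resource "aws_dynamodb_table_replica" "us_east_1" {
  provider         = aws.us_east_1 # provider alias for the replica region
  global_table_arn = aws_dynamodb_table.this.arn
}
```

The `initProvider.replica: [{}]` workaround above plays the role of `ignore_changes = [replica]` in this pattern.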

Suggestion: The provider could automatically ignore the replica field on the Table resource when it detects that TableReplica resources reference the same table, or at minimum document the initProvider workaround for users managing replicas with the separate TableReplica resource.
