Description
Context
The Terraform provider drives app creation and scaling through non-idempotent POST endpoints, notably:
- POST /apps/{id}/cvms/{vm_uuid}/replicas
- POST /cvms (commit a provisioned app/CVM create)
Without an API-level deduplication mechanism, a client cannot safely retry those requests after a transient 409/429/503 or after losing the response.
Problem
The same logical operation can be applied more than once if the request reaches the backend but the client sees a retryable failure or disconnect:
- replica creation can overshoot the desired count
- initial app/CVM creation can create duplicates or leave the client unsure whether creation succeeded
The provider now avoids retrying non-idempotent create POSTs to prevent duplicate resources, but that shifts the failure mode into hard apply failures on transient backend errors. In other words, client-side caution avoids corruption, but we still lack a way to recover cleanly.
Proposed fix
Add support for an optional Idempotency-Key header (or equivalent request field) on non-idempotent create endpoints, starting with:
- POST /apps/{id}/cvms/{vm_uuid}/replicas
- POST /cvms
Suggested semantics:
- identical key + same authenticated caller + same endpoint/payload shape within a bounded window returns the original result instead of creating a second resource
- conflicting payload for the same key returns a clear client error
- response should include enough identity (app_id, vm_uuid, resource id) for the caller to reconcile state
Why this matters
This would let Terraform and other clients safely retry after transient infrastructure failures without risking duplicate creates.
Current workaround
The provider side now:
- retries replay-safe writes such as PATCH, DELETE, and action-style POSTs like /start and /stop
- does not retry non-idempotent create POSTs
- still pre-counts replicas before scaling, which helps normal sequential applies but does not solve lost-response/concurrent-request races
That means transient failures during create/scale remain user-visible until the API provides idempotent create semantics.
References
- PR feat(terraform): introduce Phala Cloud Terraform provider (app-first beta) #195 review
- terraform/internal/provider/resource_app.go reconcileReplicas()
- terraform/internal/provider/resource_shared.go commitAndCreate()