Skip to content

azurerm_firewall_policy_rule_collection_group - use *WithoutTimeout CRUD to prevent timeout during mutex wait#32094

Open
sanderaernouts wants to merge 1 commit intohashicorp:mainfrom
sanderaernouts:fix/firewall-policy-rcg-timeout-before-lock
Open

azurerm_firewall_policy_rule_collection_group - use *WithoutTimeout CRUD to prevent timeout during mutex wait#32094
sanderaernouts wants to merge 1 commit intohashicorp:mainfrom
sanderaernouts:fix/firewall-policy-rcg-timeout-before-lock

Conversation

@sanderaernouts
Copy link
Copy Markdown

Summary

Azure enforces serial processing on firewall policy rule collection groups within the same policy. Concurrent requests receive HTTP 409 AnotherOperationInProgress. The provider already serializes these operations with locks.ByName, but the SDK-managed timeout context started before lock acquisition. Time spent waiting on the mutex consumed the 30-minute budget, causing later-queued operations to fail with context.DeadlineExceeded.

This switches from Create/Read/Update/Delete to CreateWithoutTimeout/ReadWithoutTimeout/UpdateWithoutTimeout/DeleteWithoutTimeout so the resource controls its own timeout context. The lock is acquired first; the timeout starts only after the lock is held.

This is the pattern the SDK recommends for cross-resource synchronization:

"there are cases where operation synchronization across concurrent resources is necessary in the resource logic, such as a mutex, to prevent remote system errors. Since these operations would have an indeterminate timeout that scales with the number of resources, this allows resources to control timeout behavior."

-- CreateWithoutTimeout documentation (source)

Changes

  • Register CRUD functions with *WithoutTimeout variants instead of the legacy Create/Read/Update/Delete fields
  • In CreateUpdate and Delete: acquire mutex before creating the timeout context
  • Update function signatures to func(ctx context.Context, d *pluginsdk.ResourceData, meta interface{}) diag.Diagnostics
  • Wrap all error returns with diag.FromErr()

Supersedes

This replaces the approach in #32081, which moved the lock before the timeout but kept the legacy CRUD registration. Maintainers correctly pointed out that the timeout must cover the full operation. The *WithoutTimeout pattern achieves both: the timeout covers the full API operation, and lock wait time does not eat into the timeout budget.

… CRUD to prevent timeout during mutex wait

Azure enforces serial processing on firewall policy rule collection groups
within the same policy. Concurrent requests receive HTTP 409
AnotherOperationInProgress. The provider already serializes these
operations with locks.ByName, but the SDK-managed timeout context started
before lock acquisition. Time spent waiting on the mutex consumed the
30-minute timeout budget, causing later-queued operations to fail with
context.DeadlineExceeded.

Switch from Create/Read/Update/Delete to CreateWithoutTimeout/
ReadWithoutTimeout/UpdateWithoutTimeout/DeleteWithoutTimeout so the
resource controls its own timeout context. The lock is now acquired first,
and the timeout starts only after the lock is held.

This is the pattern the SDK recommends for cross-resource synchronization:

  "there are cases where operation synchronization across concurrent
  resources is necessary in the resource logic, such as a mutex, to
  prevent remote system errors. Since these operations would have an
  indeterminate timeout that scales with the number of resources, this
  allows resources to control timeout behavior."

  -- terraform-plugin-sdk CreateWithoutTimeout documentation
     https://pkg.go.dev/github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema#Resource.CreateWithoutTimeout
     https://github.com/hashicorp/terraform-plugin-sdk/blob/main/helper/schema/resource.go#L381-L413
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant