azurerm_firewall_policy_rule_collection_group - fix timeout expiring while waiting for policy lock #32081
Conversation
When multiple firewall policy rule collection groups target the same firewall policy, Terraform processes them in parallel but Azure enforces serial processing (responding with `409 AnotherOperationInProgress`). The provider already serializes these operations using `locks.ByName()`, but the timeout context was created before the lock was acquired. This meant time spent waiting for the lock consumed the 30-minute timeout budget, causing later operations to fail with `context.DeadlineExceeded`.

Move lock acquisition before the timeout context creation in both CreateUpdate and Delete so that each operation gets its full timeout budget regardless of how long it waited for the lock.
|
Thanks for the PR, however, the timeouts should apply to the create/update operation, not to a specific API call. If you need to increase the timeout for your resource specifically, see https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/firewall_policy_rule_collection_group#timeouts |
|
@mstroob I see your point. However, Azure does not allow concurrent updates to firewall resources, yet the azurerm provider tries to concurrently create or update rule collection groups on the same firewall policy, for example. Lately the updates for rule collection groups have been taking longer and longer, causing us to hit the timeout. The current locking mechanism seems like a workaround for this. Because the timeout context is created before the lock is acquired, the timeout effectively covers all preceding rule collection group updates as well. Increasing the timeout is not a real fix for the problem. We now often see updates taking 10–15 minutes, so a policy with, say, 10 rule collection groups that all require an update would force me to set a 150-minute timeout to work around the fact that the provider attempts concurrent operations on a resource that does not allow them. My PR was an attempt to "fix" this, but it would indeed still cause create/update operations to "hang", now without a timeout. Do you see another way to fix this? The real fix, in my opinion, would be for the provider not to attempt parallel create/update/delete operations on the same firewall policy ID to begin with. In that case the lock here could be removed as well, because the locking would be done at a higher level. |
|
@mstroob here is an alternate implementation, in draft PR #32094. Though technically very similar, the key difference is this: the timeout still covers the full API operation (create/update/delete + polling), it just starts after the mutex is acquired, so lock contention across concurrent rule collection groups on the same policy doesn't eat into the timeout budget. |
|
Hi @sanderaernouts I would suggest checking the issues to see if there is a similar one, or opening a new one and including the required details, especially a way to reproduce the issue, as there may be another underlying problem that causes the very slow updates that could be fixed, rather than just removing the timeout. |
Summary
Move lock acquisition before timeout context creation in the `azurerm_firewall_policy_rule_collection_group` CreateUpdate and Delete functions.

Problem

We recently started seeing timeouts when deploying multiple `azurerm_firewall_policy_rule_collection_group` resources that target the same firewall policy. Terraform processes them in parallel, but Azure enforces serial processing on the same firewall policy, responding with `409 AnotherOperationInProgress` for concurrent requests.

The provider already serializes these operations using `locks.ByName()`, which prevents the 409 errors. However, the timeout context (`context.WithTimeout` with a 30-minute deadline) was created before the lock was acquired. When N goroutines compete for the same lock, all N timers start simultaneously. Goroutines that wait for the lock have their timeout budget consumed by wait time, eventually causing `context.DeadlineExceeded` errors that manifest as operation timeouts.

Fix
Move lock acquisition before the timeout context creation, so each operation gets its full timeout regardless of how long it waited for the lock:
Applied to both `resourceFirewallPolicyRuleCollectionGroupCreateUpdate` and `resourceFirewallPolicyRuleCollectionGroupDelete`.

Side effects

- The existing read (`client.Get`) in CreateUpdate is now inside the lock scope

Note
All 6 firewall resource files have this same timeout-before-lock pattern. This PR only fixes `firewall_policy_rule_collection_group_resource.go`. The same fix can be applied to the other resources separately if needed.