
azurerm_kubernetes_cluster_node_pool - Subnet name-based mutex causes false serialization across different VNets/regions #32002

@Reasonably

Description


Is there an existing issue for this?

  • I have searched the existing issues

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave comments along the lines of "+1", "me too" or "any updates", they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment and review the contribution guide to help.

Terraform Version

1.14.0

AzureRM Provider Version

4.64.0

Affected Resource(s)/Data Source(s)

azurerm_kubernetes_cluster_node_pool

Terraform Configuration Files

# Two AKS clusters in different regions/resource groups with same-named subnets
# Nodepools are created sequentially instead of in parallel

module "cluster_east" {
  source = "./modules/aks"

  location            = "eastus"
  resource_group_name = "rg-east"
  vnet_name           = "vnet-east"
  subnet_name         = "aks-nodes"       # Same name as cluster_west
  pod_subnet_name     = "aks-pods"        # Same name as cluster_west
  cluster_name        = "aks-east"
  nodepool_name       = "workload"
}

module "cluster_west" {
  source = "./modules/aks"

  location            = "westus2"
  resource_group_name = "rg-west"
  vnet_name           = "vnet-west"
  subnet_name         = "aks-nodes"       # Same name as cluster_east
  pod_subnet_name     = "aks-pods"        # Same name as cluster_east
  cluster_name        = "aks-west"
  nodepool_name       = "workload"
}

# Inside the module: azurerm_kubernetes_cluster_node_pool with vnet_subnet_id and pod_subnet_id

Debug Output/Panic Output

Note: The output below is the expected debug output, reconstructed from the log statements in internal/locks/mutexkv.go rather than captured from an actual run. The configuration has already been fully applied, and re-applying it solely to capture debug logs would take another 10-15 minutes per nodepool.

# TF_LOG=DEBUG terraform apply 2>&1 | grep -E 'Lock|Unlock'
# Shows sequential lock acquisition on the same mutex key despite different subnets:

[DEBUG] provider.terraform-provider-azurerm: Locking "azurerm_subnet.aks-nodes"
[DEBUG] provider.terraform-provider-azurerm: Locked "azurerm_subnet.aks-nodes"
# ... Nodepool A creation takes 10-15 minutes ...
[DEBUG] provider.terraform-provider-azurerm: Unlocking "azurerm_subnet.aks-nodes"
[DEBUG] provider.terraform-provider-azurerm: Unlocked "azurerm_subnet.aks-nodes"

# Only AFTER Nodepool A completes, Nodepool B starts:
[DEBUG] provider.terraform-provider-azurerm: Locking "azurerm_subnet.aks-nodes"
[DEBUG] provider.terraform-provider-azurerm: Locked "azurerm_subnet.aks-nodes"
# ... Nodepool B creation takes 10-15 minutes ...
[DEBUG] provider.terraform-provider-azurerm: Unlocking "azurerm_subnet.aks-nodes"
[DEBUG] provider.terraform-provider-azurerm: Unlocked "azurerm_subnet.aks-nodes"

Expected Behaviour

When two AKS clusters exist in different VNets, different resource groups, and different regions, their nodepool creation operations should run in parallel, even if the subnets happen to share the same name (e.g., aks-nodes). The subnets are completely independent Azure resources with different resource IDs.

Actual Behaviour

Nodepool creation is serialized: the second nodepool blocks on a provider-internal mutex until the first nodepool creation fully completes (including Azure API polling). This adds 10-15 minutes of unnecessary wait time to every terraform apply.

The root cause is that the provider's internal locking mechanism uses only the subnet name (e.g., aks-nodes) as the mutex key, without any qualification by VNet, resource group, subscription, or region. Two completely unrelated subnets that happen to share the same name collide on the same sync.Mutex.
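The collision can be sketched with a simplified named-mutex map in the style of the provider's mutexkv helper (this is an illustrative sketch, not the provider's actual code; the full-resource-ID key format shown for the fix is hypothetical):

```go
package main

import (
	"fmt"
	"sync"
)

// mutexKV is a simplified sketch of the named-mutex pattern: one
// lazily created sync.Mutex per string key, so any two callers that
// compute the same key contend on the same lock.
type mutexKV struct {
	mu    sync.Mutex
	store map[string]*sync.Mutex
}

func newMutexKV() *mutexKV {
	return &mutexKV{store: make(map[string]*sync.Mutex)}
}

// get returns the mutex for key, creating it on first use.
func (m *mutexKV) get(key string) *sync.Mutex {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.store[key]; !ok {
		m.store[key] = &sync.Mutex{}
	}
	return m.store[key]
}

func main() {
	kv := newMutexKV()

	// Keying by bare subnet name: both clusters resolve to the SAME
	// mutex, so their nodepool operations serialize.
	east := kv.get("aks-nodes")
	west := kv.get("aks-nodes")
	fmt.Println("name-keyed mutexes shared:", east == west) // true

	// Keying by the full Azure resource ID instead: the two subnets
	// resolve to distinct mutexes and no longer contend. (IDs below
	// are illustrative, matching the config in this issue.)
	eastID := kv.get("/subscriptions/sub/resourceGroups/rg-east/providers/Microsoft.Network/virtualNetworks/vnet-east/subnets/aks-nodes")
	westID := kv.get("/subscriptions/sub/resourceGroups/rg-west/providers/Microsoft.Network/virtualNetworks/vnet-west/subnets/aks-nodes")
	fmt.Println("id-keyed mutexes shared:", eastID == westID) // false
}
```

Since the resource already carries vnet_subnet_id and pod_subnet_id as full resource IDs, keying the mutex on those IDs (rather than the parsed subnet name) would preserve the intended serialization within one subnet while letting unrelated subnets proceed in parallel.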

Steps to Reproduce

  1. Create two AKS clusters in different regions/resource groups, each with its own VNet
  2. Use the same subnet name (e.g., aks-nodes) in both VNets -- this is a very common naming convention
  3. Add a nodepool to each cluster (either via azurerm_kubernetes_cluster_node_pool or via default_node_pool with additional pools)
  4. Run terraform apply
  5. Observe in Azure Portal or debug logs that the second nodepool waits for the first to complete before starting

Important Factoids

No response

References
