
azurerm_kubernetes_cluster_node_pool: fix subnet lock to use resource ID instead of name #32001

Open
Reasonably wants to merge 1 commit into hashicorp:main from Reasonably:fix/nodepool-subnet-lock-by-id

Reasonably commented Mar 19, 2026

Community Note

  • Please vote on this PR by adding a 👍 reaction to the original PR to help the community and maintainers prioritize this request.
  • Please do not leave "+1" or "me too" comments, they generate extra noise for PR reviewers.

Description

During nodepool creation, azurerm_kubernetes_cluster_node_pool acquires its subnet mutex through locks.MultipleByName, using only the subnet name (e.g., "nodesubnet") as the lock key. This causes false-positive lock contention when two nodepools in different VNets or clusters use subnets with the same name, serializing operations that could safely run in parallel.

Root Cause

In kubernetes_cluster_node_pool_resource.go (lines 652-665), the lock key is constructed from the subnet name only:

subnetsToLock = append(subnetsToLock, podSubnetID.SubnetName)
subnetsToLock = append(subnetsToLock, nodeSubnetID.SubnetName)
locks.MultipleByName(&subnetsToLock, network.SubnetResourceName)

Since locks.MultipleByName creates a global mutex key of "azurerm_subnet." + subnetName, two completely independent subnets in different VNets with the same name (e.g., "nodesubnet") contend on the same mutex. The lock is held for the entire nodepool creation API call plus polling (5-20 minutes), so the second operation waits unnecessarily.
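
The key collision can be illustrated with a minimal standalone sketch. It does not import the provider's locks package: nameKey and subnetID are hypothetical helpers, nameKey only mirrors the "azurerm_subnet." + subnetName format described above, and the subscription ID, resource group, and VNet names are placeholders.

```go
package main

import "fmt"

// subnetResourceName mirrors the "azurerm_subnet" prefix mentioned in the description.
const subnetResourceName = "azurerm_subnet"

// nameKey is a hypothetical helper that builds a name-based lock key:
// the resource type, a dot, then the bare subnet name.
func nameKey(resourceType, name string) string {
	return resourceType + "." + name
}

// subnetID is a hypothetical helper that formats a full Azure subnet resource ID.
func subnetID(subscription, resourceGroup, vnet, subnet string) string {
	return fmt.Sprintf(
		"/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/%s/subnets/%s",
		subscription, resourceGroup, vnet, subnet,
	)
}

func main() {
	// Placeholder values: two subnets with the same name in different VNets.
	const sub = "00000000-0000-0000-0000-000000000000"
	idA := subnetID(sub, "rg-example", "vnet-a", "nodesubnet")
	idB := subnetID(sub, "rg-example", "vnet-b", "nodesubnet")

	// Name-based keys ignore the VNet, so both subnets map to one key.
	keyA := nameKey(subnetResourceName, "nodesubnet")
	keyB := nameKey(subnetResourceName, "nodesubnet")
	fmt.Println("name keys collide:", keyA == keyB)

	// Full resource IDs include the VNet, so the keys stay distinct.
	fmt.Println("ID keys collide:", idA == idB)
}
```

With name-based keys both subnets resolve to "azurerm_subnet.nodesubnet", while the full resource IDs differ in the virtualNetworks segment.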

Fix

Replace locks.MultipleByName with locks.MultipleByID, which uses the full Azure resource ID as the lock key. This ensures:

  • Operations on the same actual subnet are still serialized (protecting against the original race condition)
  • Operations on different subnets with the same name proceed in parallel

This approach is consistent with the existing pattern in container_group_resource.go (line 758), which already uses locks.ByID(subnet.ID()).
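
The intended behavior can be sketched with a simplified per-key mutex store. This is an illustration under stated assumptions, not the provider's implementation: keyedLocks and secondCanProceed are hypothetical names, the resource IDs are placeholders, and sync.Mutex.TryLock (Go 1.18+) is used only to make the outcome observable without waiting.

```go
package main

import (
	"fmt"
	"sync"
)

// keyedLocks is a hypothetical, simplified stand-in for a per-key mutex store.
type keyedLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newKeyedLocks() *keyedLocks {
	return &keyedLocks{locks: map[string]*sync.Mutex{}}
}

// get returns the mutex for key, creating it on first use.
func (k *keyedLocks) get(key string) *sync.Mutex {
	k.mu.Lock()
	defer k.mu.Unlock()
	m, ok := k.locks[key]
	if !ok {
		m = &sync.Mutex{}
		k.locks[key] = m
	}
	return m
}

// secondCanProceed holds the lock for first and reports whether a second
// operation keyed by second could start without waiting.
func secondCanProceed(first, second string) bool {
	store := newKeyedLocks()
	store.get(first).Lock()
	defer store.get(first).Unlock()

	acquired := store.get(second).TryLock()
	if acquired {
		store.get(second).Unlock()
	}
	return acquired
}

func main() {
	// Placeholder resource IDs for same-named subnets in different VNets.
	idA := "/subscriptions/sub/resourceGroups/rg/providers/Microsoft.Network/virtualNetworks/vnet-a/subnets/nodesubnet"
	idB := "/subscriptions/sub/resourceGroups/rg/providers/Microsoft.Network/virtualNetworks/vnet-b/subnets/nodesubnet"

	// Name-based keys: both operations share one key, so the second must wait.
	fmt.Println(secondCanProceed("azurerm_subnet.nodesubnet", "azurerm_subnet.nodesubnet")) // false

	// ID-based keys: different subnets, so the second proceeds immediately.
	fmt.Println(secondCanProceed(idA, idB)) // true

	// ID-based keys on the same subnet are still serialized.
	fmt.Println(secondCanProceed(idA, idA)) // false
}
```

The third case is the one the original lock was added for: operations on the same actual subnet still wait for each other.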

Related Issues / PRs

Changes

  • internal/services/containers/kubernetes_cluster_node_pool_resource.go: Changed locks.MultipleByName(&subnetsToLock, network.SubnetResourceName) to locks.MultipleByID(&subnetIDsToLock), using full subnet resource IDs as lock keys
  • internal/services/containers/kubernetes_cluster_node_pool_resource_test.go: Added TestAccKubernetesClusterNodePool_parallelCrossVNetSameSubnetName acceptance test that creates two AKS clusters with nodepools in different VNets using identically-named subnets

Testing

  • go build ./internal/services/containers/... passes
  • go vet ./internal/services/containers/... passes
  • New acceptance test TestAccKubernetesClusterNodePool_parallelCrossVNetSameSubnetName (requires Azure environment)
  • Existing acceptance test TestAccKubernetesClusterNodePool_parallelPodSubnet still passes

New Test: TestAccKubernetesClusterNodePool_parallelCrossVNetSameSubnetName

Creates:

  • 1 resource group
  • 2 virtual networks with non-overlapping address spaces
  • 2 subnets with the same name ("nodesubnet") in different VNets
  • 2 AKS clusters, each in a different VNet
  • 2 nodepools (one per cluster), which should be created in parallel with the fix applied

… ID instead of name

The subnet mutex in nodepool creation used `locks.MultipleByName` with
just the subnet name as the lock key. This caused false positive lock
contention when two nodepools in different VNets/clusters used subnets
with the same name (e.g., "nodesubnet"), serializing operations that
could safely run in parallel.

Switch to `locks.MultipleByID` which uses the full Azure resource ID as
the lock key, ensuring that only operations on the same actual subnet
are serialized. This is consistent with the approach already used in
`container_group_resource.go`.

Add acceptance test `TestAccKubernetesClusterNodePool_parallelCrossVNetSameSubnetName`
to verify parallel nodepool creation across different VNets with
identically-named subnets.
@Reasonably Reasonably closed this Mar 19, 2026
@Reasonably Reasonably reopened this Mar 19, 2026
@Reasonably Reasonably requested review from a team, WodansSon and magodo as code owners March 19, 2026 10:17
@Reasonably Reasonably changed the title azurerm_kubernetes_cluster_node_pool: fix subnet lock to use resource ID instead of name WIP: azurerm_kubernetes_cluster_node_pool: fix subnet lock to use resource ID instead of name Mar 19, 2026
@Reasonably Reasonably closed this Mar 19, 2026
@Reasonably Reasonably reopened this Mar 19, 2026
@Reasonably Reasonably changed the title WIP: azurerm_kubernetes_cluster_node_pool: fix subnet lock to use resource ID instead of name azurerm_kubernetes_cluster_node_pool: fix subnet lock to use resource ID instead of name Mar 19, 2026
@Reasonably Reasonably changed the title azurerm_kubernetes_cluster_node_pool: fix subnet lock to use resource ID instead of name azurerm_kubernetes_cluster_node_pool: fix subnet lock to use resource ID instead of name Mar 20, 2026

Development

Successfully merging this pull request may close these issues.

azurerm_kubernetes_cluster_node_pool - Subnet name-based mutex causes false serialization across different VNets/regions