Description
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
- Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
- If you are interested in working on this issue or have submitted a pull request, please leave a comment.
- If an issue is assigned to a user, that user is claiming responsibility for the issue.
- Customers working with a Google Technical Account Manager or Customer Engineer can ask them to reach out internally to expedite investigation and resolution of this issue.
Terraform Version & Provider Version(s)
Terraform v1.11.3
on darwin_arm64
+ provider registry.terraform.io/hashicorp/google v6.27.0
Affected Resource(s)
google_container_cluster
Terraform Configuration
This is based on the example usage in the resource documentation, with some necessary changes for my test environment. It uses a small shared subnet, and node_locations and default_max_pods_per_node are tuned down to match.
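(Note: the cna_network data source referenced below is defined in my environment roughly as follows; the network and host project names here are placeholders, not my real values.)

data "google_compute_network" "cna_network" {
  # Placeholder names; the actual shared VPC network and host project are environment-specific.
  name    = "shared-vpc-network"
  project = "shared-vpc-host-project"
}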
data "google_compute_subnetwork" "cna_subnet" {
name = "gke-cluster-confluence-us-east4-05"
project = data.google_compute_network.cna_network.project
}
data "google_project" "project" {}
resource "google_service_account" "workload_identity_test" {
account_id = "workload-identity-test"
display_name = "Workload Identity Test Service Account"
}
resource "google_container_cluster" "workload_identity_test" {
name = "workload-identity-test"
location = "us-east4"
node_locations = [
"us-east4-b",
"us-east4-c",
]
initial_node_count = 1
node_config {
service_account = google_service_account.workload_identity_test.email
oauth_scopes = [
"https://www.googleapis.com/auth/cloud-platform"
]
}
default_max_pods_per_node = 56
network = data.google_compute_network.cna_network.id
subnetwork = data.google_compute_subnetwork.cna_subnet.id
ip_allocation_policy {
cluster_secondary_range_name = data.google_compute_subnetwork.cna_subnet.secondary_ip_range[0].range_name
}
enterprise_config {
desired_tier = "ENTERPRISE"
}
workload_identity_config {
workload_pool = "${data.google_project.project.project_id}.svc.id.goog"
}
timeouts {
create = "120m"
update = "120m"
read = "120m"
}
}
Debug Output
https://gist.github.com/phardy/d1ee5b1cdf4ae5d8ec17f2c3b5aa147b
Expected Behavior
I expect a new cluster to be created, with the default node pool still attached. I expect this to take some time; experience testing similar configurations shows cluster creation takes at least 30 minutes, with more time needed to add a workload identity pool afterwards.
Actual Behavior
The terraform apply fails after ~40 minutes, despite the 120m timeouts specified in the resource, with this error:
Error: Error waiting for creating GKE cluster: All cluster resources were brought up, but: 2 nodes out of 2 are unhealthy.
Inspecting the cluster in the GCP console shows the same error reported by Terraform. Inspecting the default node pool shows it reporting all nodes OK. Running a terraform plan at this point indicates the existing cluster is tainted, and proposes destroying it and creating a new cluster.
Steps to reproduce
terraform apply
Important Factoids
As mentioned alongside the Terraform configuration, I'm using a small shared subnet. The primary IPv4 range for this subnet is a /28, and it has a single secondary IPv4 /24 range.
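To make the sizing concrete, a subnet with equivalent ranges could be declared roughly like this (the CIDR addresses, network name, and secondary range name are illustrative; only the /28 and /24 sizes match my environment):

resource "google_compute_subnetwork" "gke_small_shared" {
  name          = "gke-cluster-confluence-us-east4-05"
  region        = "us-east4"
  network       = "shared-vpc-network" # illustrative network name
  ip_cidr_range = "10.0.0.0/28"        # small primary IPv4 range
  secondary_ip_range {
    range_name    = "pods"             # illustrative secondary range name
    ip_cidr_range = "10.4.0.0/24"      # single secondary IPv4 /24 range
  }
}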
I've tested an identical configuration without the workload_identity_config block, which created successfully. I then added the workload_identity_config block, making the final config identical to what I've pasted here. Terraform successfully modifies the cluster to enable workload identity, although this takes approximately 30 minutes to apply.
My initial attempts at creating this used a separately managed node pool, as sketched below. However, this also fails: my understanding from reading the documentation and experimenting is that the provider creates the cluster with a default node pool and then deletes it, and it is this initial default node pool creation that is unsuccessful.
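That variant looked roughly like this (a sketch, with an arbitrary node pool name; the cluster arguments otherwise mirror the configuration above):

resource "google_container_cluster" "workload_identity_test" {
  # ... same arguments as in the configuration above, plus:
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "workload_identity_test" {
  name       = "workload-identity-test-pool" # arbitrary name
  location   = "us-east4"
  cluster    = google_container_cluster.workload_identity_test.name
  node_count = 1
  node_config {
    service_account = google_service_account.workload_identity_test.email
    oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]
  }
}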
I've also attempted creating this with the default service account (omitting service_account and oauth_scopes from the node_config block). This makes no change to the Actual Behavior described above.
References
No response