Stabilize first-run GCP BYOC provisioning #24

Draft
Alexey Soldatchenko (soldatchenko) wants to merge 4 commits into main from fix/gcp-service-agent-and-gke-iam-order

Alexey Soldatchenko (soldatchenko) commented Jun 23, 2026

Contributor

Adds first-run stability fixes for GCP BYOC deployments.

  • Initializes the Cloud Storage service agent during managed BYOC bootstrap so CMEK-backed GCS buckets can be provisioned reliably in fresh customer projects
  • Uses google_storage_project_service_account instead of constructing the GCS service-agent email manually
  • Wires GKE Workload Identity IAM bindings from the actual cluster workload pool output, avoiding timing/readback issues during first apply
  • Ignores GKE Autopilot’s normalized database_encryption.state readback to avoid Terraform drift when GKE reports a more specific encryption state than the provider accepts as input
  • Updates the managed BYOC bootstrap script and README to call out the Cloud Storage service-agent requirement
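
The Autopilot drift fix in the list above can be pictured with a `lifecycle` block. This is only a sketch, not the PR's actual diff: the resource name, cluster name, location, and the `kms_key_id` variable are all assumptions made for illustration.

```hcl
# Hypothetical input: full resource ID of the KMS key used for secrets encryption.
variable "kms_key_id" {
  type = string
}

resource "google_container_cluster" "autopilot" {
  name             = "example-autopilot" # assumed name
  location         = "us-central1"       # assumed region
  enable_autopilot = true

  database_encryption {
    state    = "ENCRYPTED"
    key_name = var.kms_key_id
  }

  lifecycle {
    # Ignore only the state attribute, so a more specific value read back
    # from GKE does not show up as a diff on the next plan.
    ignore_changes = [database_encryption[0].state]
  }
}
```

Scoping `ignore_changes` to the single attribute, rather than the whole `database_encryption` block, keeps Terraform tracking changes to `key_name`.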

Backward compatibility:

  • No breaking changes
  • No existing inputs or resources are removed or renamed
  • Existing deployments should continue using the same module interface

Comment thread modules/gke-iam/main.tf

Contributor Author

This avoids deriving the Workload Identity member from the project data source inside this submodule. During first-run GKE creation, the real workload pool is produced by the cluster, and passing it in explicitly prevents Terraform from evaluating a null or unknown project value before the cluster output is available. The resulting IAM member keeps the same format: serviceAccount:<project-id>.svc.id.goog[<namespace>/<service-account>].
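
As a rough illustration of the binding described in this comment, the submodule could take the workload pool as an input and build the member string from it. The variable and resource names below are hypothetical; only the member format and the use of an explicitly passed workload pool come from the comment.

```hcl
# All variable names here are assumptions for illustration.
variable "workload_pool" {
  type        = string
  description = "Workload Identity pool, e.g. <project-id>.svc.id.goog"
}

variable "namespace" {
  type = string
}

variable "k8s_service_account" {
  type = string
}

variable "gcp_service_account_id" {
  type        = string
  description = "Fully qualified name: projects/<project-id>/serviceAccounts/<email>"
}

resource "google_service_account_iam_member" "workload_identity" {
  service_account_id = var.gcp_service_account_id
  role               = "roles/iam.workloadIdentityUser"

  # The member depends on the passed-in pool, not on a project data source,
  # so Terraform waits for the cluster output before creating the binding.
  member = "serviceAccount:${var.workload_pool}[${var.namespace}/${var.k8s_service_account}]"
}
```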

Comment thread modules/storage/main.tf

Contributor Author

This replaces manual construction of the GCS service agent email with the provider data source for the project’s actual Cloud Storage service agent. In normal projects this resolves to the same service-<project-number>@gs-project-accounts.iam.gserviceaccount.com identity, but it is more reliable for fresh BYOC projects because it asks GCP for the service agent instead of assuming that it already exists and matches the naming convention.
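
A minimal sketch of the pattern, assuming the service agent is then granted access to a CMEK key: the `project_id` and `kms_key_id` variables and the resource labels are hypothetical, and the real module may grant the role differently.

```hcl
# Hypothetical inputs.
variable "project_id" {
  type = string
}

variable "kms_key_id" {
  type = string
}

# Ask GCP for the project's Cloud Storage service agent instead of
# assembling the email from the project number.
data "google_storage_project_service_account" "gcs" {
  project = var.project_id
}

# Let the service agent use the key for CMEK-backed buckets.
resource "google_kms_crypto_key_iam_member" "gcs_cmek" {
  crypto_key_id = var.kms_key_id
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member        = "serviceAccount:${data.google_storage_project_service_account.gcs.email_address}"
}
```

Because the IAM member references the data source, Terraform orders the lookup before the grant without an explicit `depends_on`.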

Comment thread main.tf

Contributor Author

This keeps the existing-cluster path working. When deploy_gke_cluster = true, we use the workload pool returned by the cluster this module creates. When deploy_gke_cluster = false, there is no module.gke-cluster[0] to read from, but gke-iam still needs a workload pool to create the IAM bindings for the Kubernetes service account. The fallback preserves the existing default GKE Workload Identity pool format: <project-id>.svc.id.goog.
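
The fallback could look roughly like the sketch below. The `workload_pool` output name, the `./modules/gke-cluster` path, and the module input names are assumptions; `deploy_gke_cluster`, `module.gke-cluster[0]`, and `modules/gke-iam` are taken from this PR.

```hcl
variable "project_id" {
  type = string
}

variable "deploy_gke_cluster" {
  type    = bool
  default = true
}

module "gke-cluster" {
  count      = var.deploy_gke_cluster ? 1 : 0
  source     = "./modules/gke-cluster" # assumed path
  project_id = var.project_id          # assumed input name
}

locals {
  # Use the pool reported by the created cluster when there is one,
  # otherwise fall back to the default pool format for the project.
  workload_pool = var.deploy_gke_cluster ? module.gke-cluster[0].workload_pool : "${var.project_id}.svc.id.goog"
}

module "gke-iam" {
  source        = "./modules/gke-iam"
  workload_pool = local.workload_pool # assumed input name
}
```

Only the selected branch of the conditional is evaluated, so the `[0]` index is not read when the cluster module has `count = 0`.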
