Race condition: aws_vpc_endpoint lookup fails when wait_for_create_complete = false and additional control-plane security groups are used #155

Description

@coderabbitai

Summary

When a user sets wait_for_create_complete = false and also supplies aws_additional_control_plane_security_group_ids, the modules/additional-cp-sg submodule may attempt to look up the PrivateLink VPC endpoint before it exists, causing the apply to fail.

Files affected

  • main.tf (root) – instantiates module.rhcs_hcp_additional_controlplane_sg without an explicit dependency on cluster readiness
  • modules/additional-cp-sg/main.tf – contains data "aws_vpc_endpoint" "control_plane" that queries the endpoint by tag api.openshift.com/id = <cluster_id>

How a user perceives the error

  1. User configures the root module with:
    wait_for_create_complete                        = false
    aws_additional_control_plane_security_group_ids = ["sg-xxxxxxxx"]
    private                                         = true
  2. Terraform starts the apply. Because wait_for_create_complete = false, the rhcs_cluster_rosa_hcp resource returns as soon as the cluster creation request is accepted, not when the cluster (and its PrivateLink VPC endpoint) is fully provisioned.
  3. Terraform proceeds to evaluate data "aws_vpc_endpoint" "control_plane" in modules/additional-cp-sg/main.tf, which filters by the tag api.openshift.com/id.
  4. Because the PrivateLink endpoint does not yet exist, Terraform returns an error similar to:
    Error: no matching VPC Endpoint found
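
For reference, the lookup in step 3 presumably resembles the sketch below. It is reconstructed from the description above, not copied from modules/additional-cp-sg/main.tf; the variable name cluster_id is an assumption, and the real data source may carry additional arguments.

```hcl
# Hypothetical input; the submodule's actual variable name may differ.
variable "cluster_id" {
  type        = string
  description = "ID of the ROSA HCP cluster whose PrivateLink endpoint is looked up."
}

# The aws_vpc_endpoint data source must match exactly one endpoint.
# If no endpoint carries this tag yet, the read fails and the apply stops.
data "aws_vpc_endpoint" "control_plane" {
  tags = {
    "api.openshift.com/id" = var.cluster_id
  }
}
```

Because the data source has no retry or wait behaviour of its own, the read succeeds or fails depending only on whether the endpoint already exists at the moment Terraform evaluates it.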
    

Root cause

wait_for_create_complete only controls whether Terraform waits for the ROSA HCP cluster to reach a Ready state. The PrivateLink VPC endpoint tagged with the cluster ID is created by the ROSA control plane as part of cluster provisioning, so once the cluster is Ready the endpoint can safely be assumed to exist. When waiting is skipped, however, nothing guarantees that the endpoint is present by the time the submodule's data source runs.

Suggested fix direction

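  • Add an explicit depends_on in module "rhcs_hcp_additional_controlplane_sg" (root main.tf) on the rosa_cluster_hcp resource/module, and surface a clear validation or documentation note that this submodule requires the cluster to be fully ready. depends_on only orders operations; with wait_for_create_complete = false the cluster resource still returns before the endpoint exists, which is why the validation or note is needed as well.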
  • Alternatively, document that wait_for_create_complete must be true (or left at its default) whenever aws_additional_control_plane_security_group_ids is set.
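
One way the validation could be expressed is a precondition that rejects the unsupported combination before any lookup is attempted. This is a minimal sketch, not the module's actual code: the resource name additional_cp_sg_guard is hypothetical, the variable defaults shown are assumptions, and terraform_data requires Terraform 1.4 or later.

```hcl
# Assumed declarations; the root module's real defaults may differ.
variable "wait_for_create_complete" {
  type    = bool
  default = true
}

variable "aws_additional_control_plane_security_group_ids" {
  type    = list(string)
  default = []
}

# Hypothetical guard resource: it creates nothing in AWS and exists only
# to carry the precondition, which is checked before the guard is applied.
resource "terraform_data" "additional_cp_sg_guard" {
  lifecycle {
    precondition {
      condition = (
        var.wait_for_create_complete ||
        length(var.aws_additional_control_plane_security_group_ids) == 0
      )
      error_message = "aws_additional_control_plane_security_group_ids requires wait_for_create_complete = true, because the PrivateLink VPC endpoint only exists once the cluster is ready."
    }
  }
}
```

With this in place, the configuration from the reproduction steps would fail with the message above instead of with the VPC endpoint lookup error. If the real variable can be null, the length() call would need a null check first.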
