Problem
When scanning an external EKS cluster via WIF (IRSA), the operator's init container runs `aws eks update-kubeconfig` to generate a kubeconfig for the target cluster. The resulting kubeconfig contains the cluster's API server endpoint as returned by the EKS API.
EKS clusters with both public and private endpoints enabled use split-horizon DNS:
- Queries from outside the VPC resolve to the public endpoint IP
- Queries from within the VPC resolve to the private endpoint IP
This means scanner pods running inside a VPC will always resolve the target cluster's API server hostname to its private IP, even when a public endpoint exists. If the scanner and target clusters are in the same VPC but have separate security groups, or if they're in different VPCs without peering/transit gateway, the scanner pods get `i/o timeout` errors trying to reach the private IP.
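The split-horizon behavior is easy to confirm by resolving the cluster endpoint from both vantage points; a minimal sketch, assuming `dig` is available and reusing the placeholder hostname from the error below:

```sh
# From outside the VPC (e.g. a workstation): resolves to the public endpoint IP(s)
dig +short <cluster-id>.gr7.eu-central-1.eks.amazonaws.com

# From a node or pod inside the VPC: the same name resolves to the private ENI IP(s)
dig +short <cluster-id>.gr7.eu-central-1.eks.amazonaws.com
```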
What we observed
```
x unable to create runtime for asset error="Get \"https://<cluster-id>.gr7.eu-central-1.eks.amazonaws.com/version?timeout=32s\": dial tcp 10.0.2.136:443: i/o timeout"
```
The hostname resolved to 10.0.2.136 (a private IP on a subnet behind the target cluster's primary security group), even though the cluster has a public endpoint available.
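One way to reproduce this resolution from inside the scanner cluster (illustrative only; the pod name and image are arbitrary):

```sh
# Throwaway pod in the scanner cluster; expect the private IP (here 10.0.2.136) in the answer
kubectl run dns-check --rm -it --restart=Never --image=busybox -- \
  nslookup <cluster-id>.gr7.eu-central-1.eks.amazonaws.com
```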
Workaround applied in e2e tests
Added an explicit security group rule allowing ingress from the scanner cluster's node security group to the target cluster's primary security group on port 443:
resource "aws_security_group_rule" "scanner_to_target_api" {
type = "ingress"
from_port = 443
to_port = 443
protocol = "tcp"
security_group_id = module.eks_target[0].cluster_primary_security_group_id
source_security_group_id = module.eks.node_security_group_id
}
Key detail: the correct security groups to use are:
- Destination: the EKS-managed primary security group (attached to the API server ENIs), NOT the module-managed cluster security group
- Source: the node security group (where pod traffic originates), NOT the cluster security group
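For environments not provisioned through this Terraform module, a rough CLI equivalent (a sketch; the scanner node security group ID is a placeholder you would look up yourself). The EKS-managed primary security group corresponds to the `clusterSecurityGroupId` reported by `describe-cluster`:

```sh
# Find the target cluster's EKS-managed primary security group
TARGET_SG=$(aws eks describe-cluster --name my-target-cluster \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' --output text)

# Allow the scanner cluster's node security group to reach the API server on 443
aws ec2 authorize-security-group-ingress \
  --group-id "$TARGET_SG" \
  --protocol tcp --port 443 \
  --source-group <scanner-node-security-group-id>
```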
Scope
This affects all cloud providers where the operator generates a kubeconfig via CLI (`aws eks update-kubeconfig`, `gcloud container clusters get-credentials`), not just EKS. Any scenario where DNS resolves to an unreachable private IP will fail.
Scenarios:
- Same VPC, different security groups (our e2e case) — fixable with SG rules
- Different VPCs, no peering — requires VPC peering, transit gateway, or PrivateLink
- Private-only clusters — scanner must be in the same network or have a route to the private endpoint
- Cross-region — private endpoints are not reachable cross-region
Proposed improvements
1. Document networking requirements
Add documentation explaining:
- The split-horizon DNS behavior and its impact on cross-cluster scanning
- Required security group rules when scanner and target are in the same VPC
- Network topology requirements for different VPC / cross-region scenarios
2. Support endpoint override in MondooAuditConfig CRD (optional)
Currently the WIF external cluster spec only identifies the target cluster by `clusterName` and `region` (plus the `roleArn` to assume):
```yaml
externalClusters:
  - name: target-cluster
    workloadIdentity:
      provider: eks
      eks:
        region: eu-central-1
        clusterName: my-target-cluster
        roleArn: arn:aws:iam::123456789:role/scanner-role
```
Consider adding an optional `endpoint` field that lets users override the API server address. This would allow users to specify a reachable endpoint (e.g., a public IP, a VPC endpoint, or a load balancer) when the default DNS resolution doesn't work:

```yaml
endpoint: https://public-ip-or-custom-endpoint:443
```
The init container would pass this through to the kubeconfig generation step if the AWS CLI supports an endpoint override, or patch the kubeconfig's `server` field after generation.
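One way the init container could apply the override, sketched under the assumption that `kubectl` is available in the init image; `update-kubeconfig` names the cluster entry after the cluster ARN, so that is the entry to patch:

```sh
# Generate the kubeconfig as today
aws eks update-kubeconfig --name my-target-cluster --region eu-central-1 \
  --kubeconfig /tmp/kubeconfig

# Rewrite the server URL for the cluster entry that update-kubeconfig created
kubectl config --kubeconfig /tmp/kubeconfig set-cluster \
  arn:aws:eks:eu-central-1:123456789:cluster/my-target-cluster \
  --server https://public-ip-or-custom-endpoint:443
```

Note that if the override is a bare IP rather than a hostname covered by the API server certificate, TLS verification may fail, so an override that keeps a certificate-covered DNS name is likely the safer choice.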
3. Consider public endpoint preference (optional)
When both public and private endpoints are available, the operator could query the EKS API for endpoint configuration and prefer the public endpoint when running from outside the target cluster's node network. This is complex to detect reliably and may not be desirable in all cases.
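If this route is explored, the endpoint configuration itself is straightforward to read from the EKS API; a sketch of the relevant query:

```sh
# Inspect the target cluster's endpoint access configuration
aws eks describe-cluster --name my-target-cluster \
  --query 'cluster.resourcesVpcConfig.{publicAccess:endpointPublicAccess,privateAccess:endpointPrivateAccess,publicCidrs:publicAccessCidrs}'
```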