WIF External Cluster Scanning: Private Endpoint Connectivity #1442

@slntopp

Description

Problem

When scanning an external EKS cluster via WIF (IRSA), the operator's init container runs aws eks update-kubeconfig to generate a kubeconfig for the target cluster. The resulting kubeconfig contains the cluster's API server endpoint as returned by the EKS API.
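
For reference, the generated command looks roughly like the following; the exact flags are an operator internal, and the values here are taken from the CRD example further down (the kubeconfig path is illustrative):

aws eks update-kubeconfig \
  --name my-target-cluster \
  --region eu-central-1 \
  --role-arn arn:aws:iam::123456789:role/scanner-role \
  --kubeconfig /tmp/kubeconfig   # illustrative path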

EKS clusters with both public and private endpoints enabled use split-horizon DNS:

  • Queries from outside the VPC resolve to the public endpoint IP
  • Queries from within the VPC resolve to the private endpoint IP

This means scanner pods running inside a VPC will always resolve the target cluster's API server hostname to its private IP, even when a public endpoint exists. If the scanner and target clusters are in the same VPC but have separate security groups, or if they're in different VPCs without peering/transit gateway, the scanner pods get i/o timeout errors trying to reach the private IP.
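
One way to see the split-horizon behavior directly (the hostname placeholder matches the error below; assumes dig is available locally and busybox can run in the scanner cluster):

# From outside the VPC, the endpoint resolves to public IPs
dig +short <cluster-id>.gr7.eu-central-1.eks.amazonaws.com

# From a pod inside the VPC, the same name resolves to private ENI IPs
# (e.g. 10.0.2.136), even when the public endpoint is enabled
kubectl run dns-test --rm -it --image=busybox --restart=Never -- \
  nslookup <cluster-id>.gr7.eu-central-1.eks.amazonaws.com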

What we observed

x unable to create runtime for asset error="Get \"https://<cluster-id>.gr7.eu-central-1.eks.amazonaws.com/version?timeout=32s\": dial tcp 10.0.2.136:443: i/o timeout"

The hostname resolved to 10.0.2.136 (a private IP on a subnet behind the target cluster's primary security group), even though the cluster has a public endpoint available.

Workaround applied in e2e tests

Added an explicit security group rule allowing ingress from the scanner cluster's node security group to the target cluster's primary security group on port 443:

resource "aws_security_group_rule" "scanner_to_target_api" {
  type                     = "ingress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  security_group_id        = module.eks_target[0].cluster_primary_security_group_id
  source_security_group_id = module.eks.node_security_group_id
}

Key detail: the correct security groups to use are:

  • Destination: the EKS-managed primary security group (attached to the API server ENIs), NOT the module-managed cluster security group
  • Source: the node security group (where pod traffic originates), NOT the cluster security group
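
For reference, the EKS-managed primary security group can be confirmed via the EKS API (cluster name as in the CRD example below):

# Returns the EKS-managed primary SG attached to the API server ENIs
aws eks describe-cluster \
  --name my-target-cluster \
  --region eu-central-1 \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' \
  --output text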

Scope

This affects all cloud providers where the operator generates a kubeconfig via CLI (aws eks update-kubeconfig, gcloud container clusters get-credentials), not just EKS. Any scenario where DNS resolves to an unreachable private IP will fail.

Scenarios:

  1. Same VPC, different security groups (our e2e case) — fixable with SG rules
  2. Different VPCs, no peering — requires VPC peering, transit gateway, or PrivateLink (see the sketch after this list)
  3. Private-only clusters — scanner must be in the same network or have a route to the private endpoint
  4. Cross-region — private endpoints are not reachable cross-region
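
For scenario 2, a minimal sketch of the peering approach (VPC IDs are placeholders; route table entries and the security group rule above are still required):

# Request and accept a peering connection between the two VPCs
aws ec2 create-vpc-peering-connection \
  --vpc-id <scanner-vpc-id> \
  --peer-vpc-id <target-vpc-id>
aws ec2 accept-vpc-peering-connection \
  --vpc-peering-connection-id <pcx-id>
# Each VPC's route tables then need a route to the other VPC's CIDR via the
# peering connection (aws ec2 create-route --vpc-peering-connection-id ...)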

Proposed improvements

1. Document networking requirements

Add documentation explaining:

  • The split-horizon DNS behavior and its impact on cross-cluster scanning
  • Required security group rules when scanner and target are in the same VPC
  • Network topology requirements for different VPC / cross-region scenarios

2. Support endpoint override in MondooAuditConfig CRD (optional)

Currently the EKS section of the WIF external cluster spec only accepts region, clusterName, and roleArn — there is no way to override the endpoint:

externalClusters:
  - name: target-cluster
    workloadIdentity:
      provider: eks
      eks:
        region: eu-central-1
        clusterName: my-target-cluster
        roleArn: arn:aws:iam::123456789:role/scanner-role

Consider adding an optional endpoint field that lets users override the API server address. This would allow users to specify a reachable endpoint (e.g., a public IP, a VPC endpoint, or a load balancer) when the default DNS resolution doesn't work:

        endpoint: https://public-ip-or-custom-endpoint:443

The init container would pass this to aws eks update-kubeconfig --endpoint <url> (supported by the AWS CLI) or patch the kubeconfig after generation.
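
A sketch of the patch-after-generation variant, assuming update-kubeconfig's default naming (it names the cluster entry after the cluster ARN; the endpoint and path are illustrative):

# Rewrite the server URL of the generated cluster entry in place
kubectl config set-cluster \
  arn:aws:eks:eu-central-1:123456789:cluster/my-target-cluster \
  --server=https://public-ip-or-custom-endpoint:443 \
  --kubeconfig /tmp/kubeconfig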

3. Consider public endpoint preference (optional)

When both public and private endpoints are available, the operator could query the EKS API for endpoint configuration and prefer the public endpoint when running from outside the target cluster's node network. This is complex to detect reliably and may not be desirable in all cases.
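
The endpoint configuration itself is easy to query; the hard part is deciding, from inside a pod, whether the private endpoint is actually reachable. A sketch of the query half:

# Shows whether the target cluster has public/private endpoints enabled
aws eks describe-cluster \
  --name my-target-cluster \
  --region eu-central-1 \
  --query 'cluster.resourcesVpcConfig.[endpointPublicAccess,endpointPrivateAccess]' \
  --output text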

Labels: bug, documentation, enhancement