
Commit 738f4ae

fix: use one big vnet and attach AKS clusters to it to avoid creating bastion multiple times
1 parent ddbcdcc commit 738f4ae

8 files changed

Lines changed: 830 additions & 411 deletions

File tree

e2e/README.md

Lines changed: 111 additions & 19 deletions
@@ -20,38 +20,130 @@ From a high-level, for each scenario,
2020
To write an E2E scenario,
2121

2222
- choose a testing cluster. There are a few defined
23-
in [cluster.go](https://github.com/Azure/AgentBaker/blob/dev/e2e/cluster.go), e.g,
24-
- ClusterKubenetAirgap
25-
- ClusterAzureNetwork
23+
in [cache.go](cache.go), e.g.,
2624
- ClusterKubenet
25+
- ClusterAzureNetwork
26+
- ClusterAzureOverlayNetwork
27+
- ClusterCiliumNetwork
2728
- use `NodeBootstrappingConfiguration` (`nbc`) to set up your scenario. It is used to invoke the primary
2829
node-bootstrapping
2930
API [GetLatestNodeBootstrapping](https://github.com/Azure/AgentBaker/blob/2e730b5a498c5be9b082d912fd08ac9346582db9/pkg/agent/bakerapi.go#L14).
3031
To modify agent pool properties, you usually need to set both `nbc.containerService.properties.AgentPoolProfiles[0].xxx`
3132
as well as `nbc.agentPoolProfile`, because RP sets the properties this way when it invokes AgentBaker,
3233
and in e2e we follow the pattern.
3334
- use `VMConfigMutator` to set VMSS properties such as SKU when needed.
34-
Check [vmss](https://github.com/Azure/AgentBaker/blob/dev/e2e/vmss.go) for other configs.
35+
Check [vmss](vmss.go) for other configs.
3536
If you change the SKU, you must also set `nbc.agentPoolProfile.VMSize` to match the VMSS SKU.
3637
- use `Validator` to include your own verification of the VM's live state, such as file existence, sysctl settings, etc.
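
The "set both" rule from the list above can be sketched as follows. This is a minimal illustration, not AgentBaker code: the structs are simplified stand-ins that mirror only the field paths named in the text, `setVMSize` is a hypothetical helper, and the SKU string is an arbitrary example.

```go
package main

import "fmt"

// Stand-in types for illustration only. They copy just the field paths
// mentioned in the README; the real AgentBaker types have many more fields.
type AgentPoolProfile struct {
	Name   string
	VMSize string
}

type Properties struct {
	AgentPoolProfiles []*AgentPoolProfile
}

type ContainerService struct {
	Properties *Properties
}

type NodeBootstrappingConfiguration struct {
	ContainerService *ContainerService
	AgentPoolProfile *AgentPoolProfile
}

// setVMSize is a hypothetical helper. It writes the VM size to both places
// that carry agent pool settings so the two copies cannot drift apart.
func setVMSize(nbc *NodeBootstrappingConfiguration, size string) {
	nbc.ContainerService.Properties.AgentPoolProfiles[0].VMSize = size
	nbc.AgentPoolProfile.VMSize = size
}

func main() {
	nbc := &NodeBootstrappingConfiguration{
		ContainerService: &ContainerService{
			Properties: &Properties{
				AgentPoolProfiles: []*AgentPoolProfile{{Name: "nodepool1"}},
			},
		},
		AgentPoolProfile: &AgentPoolProfile{Name: "nodepool1"},
	}

	// Arbitrary example SKU; it should match whatever the VMSS mutator sets.
	setVMSize(nbc, "Standard_D2s_v3")

	fmt.Println(nbc.ContainerService.Properties.AgentPoolProfiles[0].VMSize)
	fmt.Println(nbc.AgentPoolProfile.VMSize)
}
```

Funnelling the change through one helper keeps the two profiles consistent, which is the property the scenario depends on when the VMSS SKU is changed.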
3738

39+
## Infrastructure Architecture
40+
41+
All E2E clusters share a single VNet and Azure Bastion in the `abe2e-{location}` resource group. This
42+
avoids creating a per-cluster Bastion (~10 min each) and ensures all clusters are reachable from a
43+
single SSH entry point.
44+
45+
```mermaid
46+
graph TB
47+
subgraph RG["abe2e-{location} Resource Group"]
48+
subgraph VNET["abe2e-shared-vnet (10.0.0.0/8)"]
49+
BASTION_SUBNET["AzureBastionSubnet<br/>10.0.0.0/26"]
50+
FW_SUBNET["AzureFirewallSubnet<br/>10.0.1.0/24"]
51+
KUBENET_SUBNET["aks-subnet-abe2e-kubenet-v5<br/>10.1.0.0/20"]
52+
AZNET_SUBNET["aks-subnet-abe2e-azure-network-v4<br/>10.1.16.0/20"]
53+
OVERLAY_SUBNET["aks-subnet-abe2e-azure-overlay-...<br/>10.1.32.0/20"]
54+
MORE_SUBNETS["... more cluster subnets"]
55+
end
56+
BASTION["abe2e-shared-bastion<br/>(Standard SKU, Tunneling)"]
57+
FIREWALL["abe2e-fw<br/>(Azure Firewall)"]
58+
end
59+
60+
subgraph MC_KUBENET["MC_abe2e-kubenet-v5 Resource Group"]
61+
VMSS_K["VMSS (system pool)"]
62+
VMSS_K_TEST["VMSS (test VMs)"]
63+
RT_K["Route Table<br/>(pod routes + firewall)"]
64+
end
65+
66+
subgraph MC_AZNET["MC_abe2e-azure-network-v4 Resource Group"]
67+
VMSS_A["VMSS (system pool)"]
68+
VMSS_A_TEST["VMSS (test VMs)"]
69+
end
70+
71+
BASTION --> BASTION_SUBNET
72+
FIREWALL --> FW_SUBNET
73+
VMSS_K --> KUBENET_SUBNET
74+
VMSS_K_TEST --> KUBENET_SUBNET
75+
RT_K -.->|associated| KUBENET_SUBNET
76+
VMSS_A --> AZNET_SUBNET
77+
VMSS_A_TEST --> AZNET_SUBNET
78+
79+
DEV["Developer / CI"]
80+
DEV -->|SSH via tunnel| BASTION
81+
BASTION -->|"connects to any VM<br/>in shared VNet"| VMSS_K_TEST
82+
BASTION -->|"connects to any VM<br/>in shared VNet"| VMSS_A_TEST
83+
```
84+
85+
### Shared Infrastructure Setup
86+
87+
The shared infrastructure is created **automatically** on first test run via cached idempotent
88+
functions — no separate setup script is needed.
89+
90+
| Resource | Name | Details |
91+
|----------|------|---------|
92+
| VNet | `abe2e-shared-vnet` | `10.0.0.0/8` — supports ~4096 `/20` cluster subnets |
93+
| Bastion | `abe2e-shared-bastion` | Standard SKU with tunneling enabled for native SSH |
94+
| Bastion Subnet | `AzureBastionSubnet` | `10.0.0.0/26` (required by Azure Bastion) |
95+
| Firewall Subnet | `AzureFirewallSubnet` | `10.0.1.0/24` (subnet created with the shared infra; the firewall itself is created on demand) |
96+
97+
Each AKS cluster gets its own `/20` subnet (4091 usable IPs) in the shared VNet. The subnet is
98+
named `aks-subnet-{clusterName}`.
99+
100+
### How It Works
101+
102+
1. **`CachedEnsureSharedInfra`** — runs once per location per test run. Creates/verifies the shared
103+
VNet, Bastion, and Firewall subnet.
104+
2. **`CachedEnsureClusterSubnet`** — runs once per cluster. Creates/verifies the cluster's dedicated
105+
subnet in the shared VNet.
106+
3. Each cluster model sets `VnetSubnetID` on the agent pool profile (BYOV — Bring Your Own VNet).
107+
4. AKS creates VMSS and route tables in the `MC_` resource group, but uses the shared VNet's subnet.
108+
5. SSH to test VMs goes through the shared Bastion, which can reach any VM in the VNet.
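
The "runs once per location" and "runs once per cluster" behaviour in steps 1 and 2 can be pictured as a keyed once-cache. The sketch below is an assumption about the general pattern, not the code in `cache.go`: `onceCache`, `get`, and the `westus3` key are all hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// entry holds the result of one setup call and guards it with sync.Once.
type entry[V any] struct {
	once sync.Once
	val  V
	err  error
}

// onceCache memoizes an expensive, idempotent setup function per key
// (for example per location or per cluster name).
type onceCache[K comparable, V any] struct {
	mu      sync.Mutex
	entries map[K]*entry[V]
}

// get runs create at most once for each key and returns the cached result
// to every later caller, including concurrent ones.
func (c *onceCache[K, V]) get(key K, create func(K) (V, error)) (V, error) {
	c.mu.Lock()
	if c.entries == nil {
		c.entries = make(map[K]*entry[V])
	}
	e, ok := c.entries[key]
	if !ok {
		e = &entry[V]{}
		c.entries[key] = e
	}
	c.mu.Unlock()

	// The map lock is released before the slow call so other keys are not blocked.
	e.once.Do(func() { e.val, e.err = create(key) })
	return e.val, e.err
}

func main() {
	var cache onceCache[string, string]
	calls := 0

	// Stand-in for a slow "create or verify" call against Azure.
	ensure := func(location string) (string, error) {
		calls++
		return "abe2e-" + location, nil
	}

	for i := 0; i < 3; i++ {
		rg, _ := cache.get("westus3", ensure)
		fmt.Println(rg)
	}
	fmt.Println("create calls:", calls) // 1
}
```

Because the underlying operation is idempotent, caching only saves time within a test run; a fresh run simply re-verifies the resources that already exist.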
109+
110+
### Test Flow
111+
38112
```mermaid
39113
sequenceDiagram
40-
E2E->>+ARM: Get or Create AKS Cluster
41-
ARM-->>-E2E: Cluster details
42-
E2E->>+AgentBakerCode: Fetch VM Configuration (include CSE)
43-
AgentBakerCode-->>-E2E: VM Configuration
44-
E2E->>+ARM: Create VM using fetched VM Config in cluster network
45-
ARM-->>-E2E: VM instance
46-
E2E->>+Bastion: Create SSH Tunnel
47-
Bastion->>+VM: Forward SSH Connection
48-
E2E->>VM: Healthcheck via SSH Tunnel
49-
VM-->>E2E: Healthcheck OK
50-
E2E->>+KubeAPI: Verify Node Ready
51-
KubeAPI-->>-E2E: Node Ready
52-
E2E->>VM: Execute test validators via SSH Tunnel
53-
VM-->>-E2E: Test results
54-
Bastion-->>-E2E: Close SSH Tunnel
114+
participant CI as Developer / CI
115+
participant Infra as Shared Infra (cached)
116+
participant ARM as Azure Resource Manager
117+
participant AB as AgentBaker API
118+
participant Bastion as Shared Bastion
119+
participant VM as Test VM
120+
participant K8s as Kube API Server
121+
122+
CI->>Infra: Ensure shared VNet + Bastion
123+
Infra-->>CI: Ready (cached after first run)
124+
125+
CI->>Infra: Ensure cluster subnet
126+
Infra-->>CI: Subnet ID
127+
128+
CI->>ARM: Create/Get AKS cluster (BYOV subnet)
129+
ARM-->>CI: Cluster details
130+
131+
CI->>AB: Generate CSE + CustomData
132+
AB-->>CI: VM configuration
133+
134+
CI->>ARM: Create VMSS in cluster subnet
135+
ARM-->>CI: VM instance
136+
137+
CI->>Bastion: SSH tunnel to VM private IP
138+
Bastion->>VM: Forward SSH connection
139+
140+
CI->>VM: Run health checks + validators
141+
VM-->>CI: Results
142+
143+
CI->>K8s: Verify node ready
144+
K8s-->>CI: Node ready ✓
145+
146+
Bastion-->>CI: Close tunnel
55147
```
56148

57149
## Running Locally
