To write an E2E scenario:

- choose a testing cluster. There are a few defined
  in [cache.go](cache.go), e.g.:
  - ClusterKubenet
  - ClusterAzureNetwork
  - ClusterAzureOverlayNetwork
  - ClusterCiliumNetwork
- use `NodeBootstrappingConfiguration` (`nbc`) to set up your scenario. It is used to invoke the primary
  node-bootstrapping API [GetLatestNodeBootstrapping](https://github.com/Azure/AgentBaker/blob/2e730b5a498c5be9b082d912fd08ac9346582db9/pkg/agent/bakerapi.go#L14).
  To modify agent pool properties, you usually need to set both `nbc.containerService.properties.AgentPoolProfiles[0].xxx`
  and `nbc.agentPoolProfile`. This is because the RP sets the properties in both places when it invokes AgentBaker,
  and the E2E tests follow the same pattern.
- use `VMConfigMutator` to set VMSS properties, such as the SKU, when needed.
  Check [vmss](vmss.go) for other configs.
  If you change the SKU, you must also set `nbc.agentPoolProfile.VMSize` to match it.
- use `Validator` to add your own verification of the VM's live state, such as file existence, sysctl settings, etc.
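
A minimal sketch of the "set it everywhere" rule for the VM size, in Go. All types here are hypothetical, trimmed-down stand-ins that mirror the field paths named above. The real AgentBaker datamodel types have many more fields. `VMSSConfig` stands in for whatever VMSS model a `VMConfigMutator` receives. `setVMSize` and `checkConsistent` are illustrative helpers, not functions from the repo.

```go
package main

import "fmt"

// Hypothetical stand-ins for the datamodel types referenced above.
type AgentPoolProfile struct {
	Name   string
	VMSize string
}

type Properties struct {
	AgentPoolProfiles []*AgentPoolProfile
}

type ContainerService struct {
	Properties *Properties
}

type NodeBootstrappingConfiguration struct {
	ContainerService *ContainerService
	AgentPoolProfile *AgentPoolProfile
}

// VMSSConfig is a hypothetical stand-in for the VMSS model a VMConfigMutator edits.
type VMSSConfig struct {
	SKU string
}

// setVMSize writes the same size to both agent pool copies in nbc and to the VMSS SKU.
func setVMSize(nbc *NodeBootstrappingConfiguration, vmss *VMSSConfig, size string) {
	nbc.ContainerService.Properties.AgentPoolProfiles[0].VMSize = size
	nbc.AgentPoolProfile.VMSize = size
	vmss.SKU = size
}

// checkConsistent returns an error if any of the three copies disagree.
func checkConsistent(nbc *NodeBootstrappingConfiguration, vmss *VMSSConfig) error {
	pool := nbc.ContainerService.Properties.AgentPoolProfiles[0].VMSize
	if pool != nbc.AgentPoolProfile.VMSize {
		return fmt.Errorf("AgentPoolProfiles[0].VMSize %q != agentPoolProfile.VMSize %q", pool, nbc.AgentPoolProfile.VMSize)
	}
	if vmss.SKU != nbc.AgentPoolProfile.VMSize {
		return fmt.Errorf("VMSS SKU %q != agentPoolProfile.VMSize %q", vmss.SKU, nbc.AgentPoolProfile.VMSize)
	}
	return nil
}

func main() {
	nbc := &NodeBootstrappingConfiguration{
		ContainerService: &ContainerService{Properties: &Properties{
			AgentPoolProfiles: []*AgentPoolProfile{{Name: "nodepool2", VMSize: "Standard_D2ds_v5"}},
		}},
		AgentPoolProfile: &AgentPoolProfile{Name: "nodepool2", VMSize: "Standard_D2ds_v5"},
	}
	vmss := &VMSSConfig{SKU: "Standard_D2ds_v5"}

	// Changing only the VMSS SKU leaves nbc stale.
	vmss.SKU = "Standard_D4ds_v5"
	fmt.Println(checkConsistent(nbc, vmss)) // VMSS SKU "Standard_D4ds_v5" != agentPoolProfile.VMSize "Standard_D2ds_v5"

	// Updating all three places together keeps them in sync.
	setVMSize(nbc, vmss, "Standard_D4ds_v5")
	fmt.Println(checkConsistent(nbc, vmss)) // <nil>
}
```

Routing every size change through one helper means a scenario cannot update the VMSS SKU and forget the `nbc` copies, or the reverse.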

## Infrastructure Architecture

All E2E clusters share a single VNet and a single Azure Bastion in the `abe2e-{location}` resource group. This
avoids creating a Bastion per cluster (about 10 minutes each) and makes every cluster reachable from a
single SSH entry point.

```mermaid
graph TB
    subgraph RG["abe2e-{location} Resource Group"]
        subgraph VNET["abe2e-shared-vnet (10.0.0.0/8)"]
            BASTION_SUBNET["AzureBastionSubnet<br/>10.0.0.0/26"]
            FW_SUBNET["AzureFirewallSubnet<br/>10.0.1.0/24"]
            KUBENET_SUBNET["aks-subnet-abe2e-kubenet-v5<br/>10.1.0.0/20"]
            AZNET_SUBNET["aks-subnet-abe2e-azure-network-v4<br/>10.1.16.0/20"]
            OVERLAY_SUBNET["aks-subnet-abe2e-azure-overlay-...<br/>10.1.32.0/20"]
            MORE_SUBNETS["... more cluster subnets"]
        end
        BASTION["abe2e-shared-bastion<br/>(Standard SKU, Tunneling)"]
        FIREWALL["abe2e-fw<br/>(Azure Firewall)"]
    end

    subgraph MC_KUBENET["MC_abe2e-kubenet-v5 Resource Group"]
        VMSS_K["VMSS (system pool)"]
        VMSS_K_TEST["VMSS (test VMs)"]
        RT_K["Route Table<br/>(pod routes + firewall)"]
    end

    subgraph MC_AZNET["MC_abe2e-azure-network-v4 Resource Group"]
        VMSS_A["VMSS (system pool)"]
        VMSS_A_TEST["VMSS (test VMs)"]
    end

    BASTION --> BASTION_SUBNET
    FIREWALL --> FW_SUBNET
    VMSS_K --> KUBENET_SUBNET
    VMSS_K_TEST --> KUBENET_SUBNET
    RT_K -.->|associated| KUBENET_SUBNET
    VMSS_A --> AZNET_SUBNET
    VMSS_A_TEST --> AZNET_SUBNET

    DEV["Developer / CI"]
    DEV -->|SSH via tunnel| BASTION
    BASTION -->|"connects to any VM<br/>in shared VNet"| VMSS_K_TEST
    BASTION -->|"connects to any VM<br/>in shared VNet"| VMSS_A_TEST
```

### Shared Infrastructure Setup

The shared infrastructure is created **automatically** on the first test run by cached, idempotent
functions, so no separate setup script is needed.
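
"Cached, idempotent" combines two separate ideas, and the sketch below shows both. An idempotent get-or-create makes it safe to run the setup again. A per-key, once-per-process cache makes repeated calls within a run cheap. Everything in the block is hypothetical and exists only to illustrate the pattern. `fakeARM` is an in-memory stand-in for Azure Resource Manager. `ensureVNet` and `onceByKey` are not the E2E suite's actual helpers.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errNotFound plays the role of an ARM "404 Not Found" in this sketch.
var errNotFound = errors.New("not found")

// fakeARM is a hypothetical in-memory stand-in for Azure Resource Manager.
type fakeARM struct {
	mu        sync.Mutex
	resources map[string]bool // resource ID -> exists
	gets      int             // number of read calls
	creates   int             // number of create calls
}

func (a *fakeARM) get(id string) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.gets++
	if !a.resources[id] {
		return errNotFound
	}
	return nil
}

func (a *fakeARM) create(id string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.creates++
	a.resources[id] = true
}

// ensureVNet is idempotent: it verifies the VNet exists and creates it only when missing.
func ensureVNet(a *fakeARM, location string) (string, error) {
	id := "abe2e-" + location + "/abe2e-shared-vnet"
	err := a.get(id)
	if err == nil {
		return id, nil // already there: verify only
	}
	if !errors.Is(err, errNotFound) {
		return "", err
	}
	a.create(id)
	return id, nil
}

// entry holds the single result for one cache key.
type entry struct {
	once sync.Once
	val  string
	err  error
}

// onceByKey runs fn at most once per key for the cache's lifetime (one test run).
// Concurrent callers with the same key block until the first call finishes and share its result.
type onceByKey struct {
	mu      sync.Mutex
	entries map[string]*entry
}

func (c *onceByKey) do(key string, fn func() (string, error)) (string, error) {
	c.mu.Lock()
	e, ok := c.entries[key]
	if !ok {
		e = &entry{}
		c.entries[key] = e
	}
	c.mu.Unlock()
	e.once.Do(func() { e.val, e.err = fn() }) // note: errors are cached too
	return e.val, e.err
}

func main() {
	arm := &fakeARM{resources: map[string]bool{}}
	run1 := &onceByKey{entries: map[string]*entry{}}

	// Many tests in the same run ask for the same location concurrently.
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = run1.do("westus3", func() (string, error) { return ensureVNet(arm, "westus3") })
		}()
	}
	wg.Wait()
	fmt.Printf("gets=%d creates=%d\n", arm.gets, arm.creates) // gets=1 creates=1

	// A new test run starts with an empty cache: it re-verifies but does not re-create.
	run2 := &onceByKey{entries: map[string]*entry{}}
	_, _ = run2.do("westus3", func() (string, error) { return ensureVNet(arm, "westus3") })
	fmt.Printf("gets=%d creates=%d\n", arm.gets, arm.creates) // gets=2 creates=1
}
```

The cache keeps ARM traffic to one call per key within a run. Idempotency makes a new run, which starts with an empty cache, safe against infrastructure left over from an earlier run. The sketch also caches errors, and a real implementation might prefer to retry them instead.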

| Resource | Name | Details |
|----------|------|---------|
| VNet | `abe2e-shared-vnet` | `10.0.0.0/8`; room for ~4096 `/20` cluster subnets |
| Bastion | `abe2e-shared-bastion` | Standard SKU with tunneling enabled for native SSH |
| Bastion Subnet | `AzureBastionSubnet` | `10.0.0.0/26` (name and minimum `/26` size required by Azure Bastion) |
| Firewall Subnet | `AzureFirewallSubnet` | `10.0.1.0/24` (created with the shared infra; the firewall itself is deployed on demand) |

Each AKS cluster gets its own `/20` subnet (4091 usable IPs) in the shared VNet. The subnet is
named `aks-subnet-{clusterName}`.

### How It Works

1. **`CachedEnsureSharedInfra`**: runs once per location per test run and creates or verifies the shared
   VNet, Bastion, and Firewall subnet.
2. **`CachedEnsureClusterSubnet`**: runs once per cluster and creates or verifies the cluster's dedicated
   subnet in the shared VNet.
3. Each cluster model sets `VnetSubnetID` on the agent pool profile (BYOV, "bring your own VNet").
4. AKS creates the VMSS and route tables in the cluster's `MC_` resource group, but places the nodes in the
   cluster's subnet in the shared VNet.
5. SSH to test VMs goes through the shared Bastion, which can reach any VM in the VNet.
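
For step 3, `VnetSubnetID` holds the ARM resource ID of the cluster's subnet. The sketch below builds that ID in the standard Azure format from the resource-group, VNet, and subnet names given above. `subnetID` and `agentPoolProfile` are hypothetical names used only for illustration. The subscription ID is a placeholder.

```go
package main

import "fmt"

// agentPoolProfile is a hypothetical stand-in carrying only the field from step 3.
type agentPoolProfile struct {
	Name         string
	VnetSubnetID string
}

// subnetID builds a subnet's ARM resource ID from the shared-infra naming conventions.
func subnetID(subscriptionID, location, clusterName string) string {
	return fmt.Sprintf(
		"/subscriptions/%s/resourceGroups/abe2e-%s/providers/Microsoft.Network/virtualNetworks/abe2e-shared-vnet/subnets/aks-subnet-%s",
		subscriptionID, location, clusterName,
	)
}

func main() {
	pool := agentPoolProfile{Name: "nodepool1"}
	// BYOV: point the agent pool at the cluster's subnet in the shared VNet.
	pool.VnetSubnetID = subnetID("00000000-0000-0000-0000-000000000000", "westus3", "abe2e-kubenet-v5")
	fmt.Println(pool.VnetSubnetID)
}
```

For the kubenet cluster in the diagram, this prints an ID ending in `/virtualNetworks/abe2e-shared-vnet/subnets/aks-subnet-abe2e-kubenet-v5`.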

### Test Flow

```mermaid
sequenceDiagram
    participant CI as Developer / CI
    participant Infra as Shared Infra (cached)
    participant ARM as Azure Resource Manager
    participant AB as AgentBaker API
    participant Bastion as Shared Bastion
    participant VM as Test VM
    participant K8s as Kube API Server

    CI->>Infra: Ensure shared VNet + Bastion
    Infra-->>CI: Ready (cached after first run)

    CI->>Infra: Ensure cluster subnet
    Infra-->>CI: Subnet ID

    CI->>ARM: Create/Get AKS cluster (BYOV subnet)
    ARM-->>CI: Cluster details

    CI->>AB: Generate CSE + CustomData
    AB-->>CI: VM configuration

    CI->>ARM: Create VMSS in cluster subnet
    ARM-->>CI: VM instance

    CI->>Bastion: SSH tunnel to VM private IP
    Bastion->>VM: Forward SSH connection

    CI->>VM: Run health checks + validators
    VM-->>CI: Results

    CI->>K8s: Verify node ready
    K8s-->>CI: Node ready ✓

    Bastion-->>CI: Close tunnel
```

## Running Locally