
Commit 1f4b1b3

Enable parallel CI execution with cluster isolation and local testing support (#24)
* feat: Enable parallel CI execution with isolated infrastructure

  Add atomic subnet allocation and cluster-specific resource isolation to
  allow multiple CI jobs to run simultaneously without conflicts.

  Key changes:
  - Atomic subnet allocation with file locking (100.64.X.0/24 ranges)
  - Cluster-specific BMC emulator ports (8000 + subnet_id)
  - Unique storage pool paths per cluster
  - Network isolation (bridges, libvirt networks)
  - Comprehensive network diagnostics for troubleshooting
  - Orphaned bridge cleanup and conflict detection
  - Storage pool activation and management improvements
  - Container lifecycle fixes for sushy-tools

  Infrastructure reliability improvements:
  - Handle "already active" storage pools gracefully
  - Remove orphaned bridges from previous runs
  - Add layer 2 network diagnostics (ebtables, arptables, nftables)
  - Network initialization wait for ARP resolution
  - Bridge IP assignment timeout handling
  - Clean up dev-scripts -1 suffix pools
  - Firewall rules for BMC endpoints

  This enables parallel PR testing and nightly runs without resource
  conflicts or cross-contamination between jobs.

* refactor: Standardize working directory handling across all scripts

  Refactor all scripts to auto-construct WORKING_DIR from BASE_WORKING_DIR
  when not explicitly set, enabling cluster-specific working directories.

  This supports:
  - Parallel CI execution (each cluster gets own directory)
  - Local testing (explicit WORKING_DIR)
  - CI environment (BASE_WORKING_DIR set, auto-construct path)
  - Workflow steps with if: always() (work even if setup-working-dir failed)

  Scripts updated with auto-construction pattern:
  - get_landing_zone_ip.sh - Find Landing Zone IP from environment file
  - generate_environment_json.sh - Create environment metadata
  - generate_enclave_vars.sh - Generate Ansible variables
  - install_enclave.sh - Enclave installation
  - verify_landing_zone.sh - Landing Zone verification
  - collect_ci_artifacts.sh - Artifact collection

  Pattern: Auto-construct from BASE_WORKING_DIR + ENCLAVE_CLUSTER_NAME,
  with appropriate fallbacks or errors for each script's needs.

* refactor: Move CI logic to Makefile and scripts for local testing

  Refactor GitHub Actions workflows to use Makefile targets and scripts,
  enabling full CI flow to run locally for development and debugging.

  New scripts (dual-mode: local terminal + GitHub Actions):
  - verify_cluster.sh - Cluster deployment verification
  - verify_cleanup.sh - Infrastructure cleanup verification
  - verify_networks.sh - Network infrastructure validation
  - setup_working_dir.sh - Working directory setup
  - collect_step_logs.sh - Log collection
  - preflight_checks.sh - Environment validation
  - generate_cluster_name.sh - Unique cluster naming

  New Makefile targets:
  - verify-cluster, verify-cleanup, verify-networks
  - setup-working-dir, collect-step-logs
  - preflight-checks, ci-flow-connected, ci-flow-disconnected

  Workflow changes:
  - Inline bash → make targets (e2e-deployment, infra-verify, nightlies)
  - Remove setup-infrastructure action (direct make calls instead)
  - Refactor preflight-checks action (delegate to script)
  - Add allocate-subnet action usage
  - Restore check-e2e-needed to pr-validation runner (intentional change)
  - Restore e2e-deployment to enclave-large runner
  - Restore infra-verify to enclave-small runner
  - Un-comment "Clean existing infrastructure" step

  Benefits:
  - Run full CI locally: make ci-flow-connected
  - Debug individual steps: make verify-cluster
  - Test changes before CI: make environment && make provision-landing-zone
  - Consistent logic: same code for CI and local

  Documentation:
  - docs/LOCAL_TESTING.md - Comprehensive local testing guide
  - README.md - New Makefile targets section
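The commit message describes the WORKING_DIR auto-construction pattern only in words. A minimal sketch of how such a fallback could look is given here. The function name `resolve_working_dir` and the exact path layout (`BASE_WORKING_DIR/ENCLAVE_CLUSTER_NAME`) are assumptions for illustration; the scripts touched by this commit are not shown in this part of the diff and may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from the commit): resolve the working
# directory in the order the commit message describes.
resolve_working_dir() {
    if [ -n "${WORKING_DIR:-}" ]; then
        # 1. An explicit WORKING_DIR wins (local testing).
        printf '%s\n' "$WORKING_DIR"
    elif [ -n "${BASE_WORKING_DIR:-}" ] && [ -n "${ENCLAVE_CLUSTER_NAME:-}" ]; then
        # 2. Otherwise build a cluster-specific path (CI).
        #    The "<base>/<cluster name>" layout is an assumption.
        printf '%s\n' "${BASE_WORKING_DIR}/${ENCLAVE_CLUSTER_NAME}"
    else
        # 3. Nothing to build from: report the problem and fail.
        echo "ERROR: set WORKING_DIR, or BASE_WORKING_DIR and ENCLAVE_CLUSTER_NAME" >&2
        return 1
    fi
}
```

A script could then start with `WORKING_DIR="$(resolve_working_dir)" || exit 1`. Scripts that run under `if: always()` might prefer a softer fallback than an error, which is the per-script variation the message alludes to.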
1 parent 997e4ce commit 1f4b1b3

28 files changed

Lines changed: 3112 additions & 524 deletions
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+name: Allocate Subnet
+description: Allocate unique subnet for cluster to avoid IP conflicts in parallel CI runs
+
+outputs:
+  subnet-id:
+    description: 'Allocated subnet ID (2-254)'
+    value: ${{ steps.allocate.outputs.subnet-id }}
+
+runs:
+  using: composite
+  steps:
+    - name: Allocate unique subnet for cluster
+      id: allocate
+      shell: bash
+      run: |
+        echo "## Subnet Allocation" >> $GITHUB_STEP_SUMMARY
+        echo "" >> $GITHUB_STEP_SUMMARY
+        echo "Allocating unique subnet for cluster: ${ENCLAVE_CLUSTER_NAME}" >> $GITHUB_STEP_SUMMARY
+
+        # Allocate subnet and get network configuration
+        # Use BASE_WORKING_DIR for shared allocation file across all clusters
+        # Script outputs environment variable assignments with all network details
+        ALLOCATION_OUTPUT=$(WORKING_DIR="${BASE_WORKING_DIR}" bash scripts/allocate_subnet.sh allocate)
+
+        if [ -z "$ALLOCATION_OUTPUT" ]; then
+          echo "❌ Failed to allocate subnet"
+          echo "❌ Failed to allocate subnet" >> $GITHUB_STEP_SUMMARY
+          exit 1
+        fi
+
+        # Source the environment variables from the script output
+        eval "$ALLOCATION_OUTPUT"
+
+        # Export to GitHub environment and outputs
+        echo "subnet-id=$ENCLAVE_SUBNET_ID" >> $GITHUB_OUTPUT
+        echo "ENCLAVE_SUBNET_ID=$ENCLAVE_SUBNET_ID" >> $GITHUB_ENV
+        echo "ENCLAVE_BMC_NETWORK=$ENCLAVE_BMC_NETWORK" >> $GITHUB_ENV
+        echo "ENCLAVE_CLUSTER_NETWORK=$ENCLAVE_CLUSTER_NETWORK" >> $GITHUB_ENV
+
+        # Output to console
+        echo "✅ Subnet allocated: ID $ENCLAVE_SUBNET_ID"
+        echo " - BMC Network: $ENCLAVE_BMC_NETWORK"
+        echo " - Cluster Network: $ENCLAVE_CLUSTER_NETWORK"
+
+        # Output to summary
+        echo "" >> $GITHUB_STEP_SUMMARY
+        echo "✅ Subnet allocated: **ID $ENCLAVE_SUBNET_ID**" >> $GITHUB_STEP_SUMMARY
+        echo "- BMC Network: \`$ENCLAVE_BMC_NETWORK\`" >> $GITHUB_STEP_SUMMARY
+        echo "- Cluster Network: \`$ENCLAVE_CLUSTER_NETWORK\`" >> $GITHUB_STEP_SUMMARY
+        echo "" >> $GITHUB_STEP_SUMMARY
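The action above delegates to `scripts/allocate_subnet.sh`, whose body is not part of this excerpt. As a rough illustration of what "atomic subnet allocation with file locking" could mean, the sketch below serializes allocation with `flock(1)` from util-linux. Everything in it is an assumption: the function name, the state and lock file names, and the output variables `SUBNET_ID`, `SUBNET_CIDR` and `BMC_EMULATOR_PORT`. The commit message states only the 100.64.X.0/24 range, the ID range 2-254 and the port rule 8000 + subnet_id; it does not say which of the real networks uses that range.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the project's allocate_subnet.sh.
# Usage: allocate_subnet <state-dir>
allocate_subnet() {
    local state_dir="$1"
    local state_file="${state_dir}/subnets.allocated"   # assumed name
    local lock_file="${state_dir}/subnets.lock"         # assumed name
    local id=2

    mkdir -p "$state_dir" || return 1
    # The subshell holds an exclusive lock on fd 9 for the whole
    # read-check-write sequence, so two jobs cannot pick the same ID.
    # The lock is released when the subshell exits and fd 9 closes.
    (
        flock -x -w 30 9 || exit 1      # wait up to 30 s for the lock
        touch "$state_file"
        while [ "$id" -le 254 ]; do
            if ! grep -qx "$id" "$state_file"; then
                echo "$id" >> "$state_file"
                echo "SUBNET_ID=${id}"
                echo "SUBNET_CIDR=100.64.${id}.0/24"
                echo "BMC_EMULATOR_PORT=$((8000 + id))"
                exit 0
            fi
            id=$((id + 1))
        done
        exit 1                          # every ID in 2-254 is taken
    ) 9>"$lock_file"
}
```

Called twice against the same state directory, the function hands out ID 2 and then ID 3. A real implementation would also need a release path so that IDs return to the pool when a cluster is torn down.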

.github/actions/preflight-checks/action.yml

Lines changed: 11 additions & 64 deletions
@@ -27,79 +27,26 @@ runs:
   using: composite
   steps:
     - name: Run preflight checks
-      shell: bash {0}
+      shell: bash
       run: |
-        set +e # Don't exit on first error, collect all failures
-        FAILED=0
-
-        echo "## ${{ inputs.title }}" | tee -a $GITHUB_STEP_SUMMARY
-        echo "" | tee -a $GITHUB_STEP_SUMMARY
-
-        # Check required environment variables
-        echo "### Environment Variables" | tee -a $GITHUB_STEP_SUMMARY
-
-        if [ -z "$DEV_SCRIPTS_PATH" ]; then
-          echo "❌ DEV_SCRIPTS_PATH not set" | tee -a $GITHUB_STEP_SUMMARY
-          FAILED=1
-        else
-          echo "✅ DEV_SCRIPTS_PATH: $DEV_SCRIPTS_PATH" | tee -a $GITHUB_STEP_SUMMARY
-        fi
-
-        if [ -z "$WORKING_DIR" ]; then
-          echo "❌ WORKING_DIR not set" | tee -a $GITHUB_STEP_SUMMARY
-          FAILED=1
-        else
-          echo "✅ WORKING_DIR: $WORKING_DIR" | tee -a $GITHUB_STEP_SUMMARY
-        fi
+        # Build arguments for the script based on inputs
+        ARGS="--title '${{ inputs.title }}'"
 
         if [ "${{ inputs.check-pull-secret }}" = "true" ]; then
-          if [ -z "$PULL_SECRET" ]; then
-            echo "❌ PULL_SECRET not set" | tee -a $GITHUB_STEP_SUMMARY
-            FAILED=1
-          else
-            echo "✅ PULL_SECRET: configured" | tee -a $GITHUB_STEP_SUMMARY
-          fi
+          ARGS="$ARGS --check-pull-secret"
         fi
 
-        if [ -n "${{ inputs.deployment-mode }}" ]; then
-          echo "✅ Deployment mode: ${{ inputs.deployment-mode }}" | tee -a $GITHUB_STEP_SUMMARY
-        fi
-
-        # Check system resources if requested
         if [ "${{ inputs.check-system-resources }}" = "true" ]; then
-          echo "" | tee -a $GITHUB_STEP_SUMMARY
-          echo "### System Resources" | tee -a $GITHUB_STEP_SUMMARY
-          TOTAL_RAM=$(free -g | awk '/^Mem:/{print $2}')
-          echo "✅ Total RAM: ${TOTAL_RAM}GB" | tee -a $GITHUB_STEP_SUMMARY
-
-          if [ -n "$WORKING_DIR" ]; then
-            AVAILABLE_DISK=$(df -h $WORKING_DIR 2>/dev/null | awk 'NR==2{print $4}')
-            if [ -n "$AVAILABLE_DISK" ]; then
-              echo "✅ Available disk space: $AVAILABLE_DISK" | tee -a $GITHUB_STEP_SUMMARY
-            fi
-          fi
+          ARGS="$ARGS --check-system-resources"
         fi
 
-        # Check libvirt access if requested
         if [ "${{ inputs.check-libvirt }}" = "true" ]; then
-          echo "" | tee -a $GITHUB_STEP_SUMMARY
-          echo "### Libvirt Access" | tee -a $GITHUB_STEP_SUMMARY
-          if sudo virsh list --all > /dev/null 2>&1; then
-            echo "✅ Libvirt access verified" | tee -a $GITHUB_STEP_SUMMARY
-          else
-            echo "❌ Cannot access libvirt" | tee -a $GITHUB_STEP_SUMMARY
-            FAILED=1
-          fi
+          ARGS="$ARGS --check-libvirt"
         fi
 
-        # Final status
-        if [ $FAILED -eq 0 ]; then
-          echo "" | tee -a $GITHUB_STEP_SUMMARY
-          echo "✅ All pre-flight checks passed" | tee -a $GITHUB_STEP_SUMMARY
-        else
-          echo "" | tee -a $GITHUB_STEP_SUMMARY
-          echo "❌ Pre-flight checks failed" | tee -a $GITHUB_STEP_SUMMARY
-          echo "" | tee -a $GITHUB_STEP_SUMMARY
-          echo "**Action Required**: Configure repository variables and secrets in Settings → Secrets and variables → Actions" | tee -a $GITHUB_STEP_SUMMARY
-          exit 1
+        if [ -n "${{ inputs.deployment-mode }}" ]; then
+          ARGS="$ARGS --deployment-mode '${{ inputs.deployment-mode }}'"
         fi
+
+        # Call the preflight checks script
+        eval "./scripts/preflight_checks.sh $ARGS"
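The new step builds a single string and runs it through `eval`, so a title containing a single quote would end the quoting early. One possible alternative, sketched here under assumptions, collects the flags in a bash array and passes them without `eval`. The function name `build_preflight_args` and the `INPUT_*` environment variable names are hypothetical; in the real action the values would have to be passed in through an `env:` block. Only the flag names are taken from the diff, and how `preflight_checks.sh` parses them is not shown there.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: build the argument list as an array so values with
# spaces or quotes reach the script intact, without eval.
build_preflight_args() {
    # INPUT_* names are assumptions standing in for the action inputs.
    PREFLIGHT_ARGS=(--title "${INPUT_TITLE:-}")

    if [ "${INPUT_CHECK_PULL_SECRET:-false}" = "true" ]; then
        PREFLIGHT_ARGS+=(--check-pull-secret)
    fi
    if [ "${INPUT_CHECK_SYSTEM_RESOURCES:-false}" = "true" ]; then
        PREFLIGHT_ARGS+=(--check-system-resources)
    fi
    if [ "${INPUT_CHECK_LIBVIRT:-false}" = "true" ]; then
        PREFLIGHT_ARGS+=(--check-libvirt)
    fi
    if [ -n "${INPUT_DEPLOYMENT_MODE:-}" ]; then
        PREFLIGHT_ARGS+=(--deployment-mode "$INPUT_DEPLOYMENT_MODE")
    fi
}

# Intended call site (left commented out so the sketch has no side effects):
#   build_preflight_args
#   ./scripts/preflight_checks.sh "${PREFLIGHT_ARGS[@]}"
```

Quoting the expansion as `"${PREFLIGHT_ARGS[@]}"` keeps each element as one argument, which is the property the string-plus-`eval` form has to emulate with nested quotes.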

.github/actions/setup-infrastructure/action.yml

Lines changed: 0 additions & 49 deletions
This file was deleted.
