This guide explains how to use the GitHub Actions CI/CD workflows for Enclave Lab.
Enclave Lab uses GitHub Actions for automated testing and validation with four main workflows:
- PR Validation - Fast code quality checks on every PR
- Infrastructure Verification - Test infrastructure setup
- E2E Connected Mode - Full cluster deployment testing
- Cleanup - Infrastructure maintenance
Purpose: Fast code quality checks that run on every pull request
Trigger: Automatic on PR open, update, or reopen
Duration: ~5-10 minutes
Runs on: GitHub-hosted runners (no CI machine needed)
- ✅ Shell script syntax (shellcheck)
- ✅ YAML formatting (yamllint)
- ✅ Ansible playbook syntax (ansible-lint)
- ✅ Makefile syntax
This workflow runs automatically on every PR. No manual action needed.
To test locally before pushing:
make validate
- Go to your PR on GitHub
- Click "Checks" tab
- View "Code Quality Checks" results
- If failed, click to see detailed error messages
Common fixes:
# Fix shell script issues
shellcheck scripts/*.sh
# Fix YAML formatting
yamllint .
# Fix Ansible issues
ansible-lint playbooks/
# Fix all at once
make validate
Purpose: Test infrastructure creation without full cluster deployment
Trigger:
- Manual dispatch (Actions tab)
- PR with test-infra label
Duration: ~60-75 minutes
Runs on: Self-hosted runner (CI machine)
- ✅ Pre-flight checks (resources, permissions)
- ✅ Create infrastructure (make environment)
- ✅ Provision Landing Zone (make provision-landing-zone)
- ✅ Install Enclave Lab in connected mode (make install-enclave)
- ✅ Verify installation
- ✅ Collect artifacts (logs, environment metadata)
- ✅ Optional cleanup
Option 1: Manual Dispatch
- Go to Actions tab on GitHub
- Select "Infrastructure Verification"
- Click "Run workflow"
- Choose cleanup strategy:
  - on_failure (recommended): Clean up only if test fails
  - always: Clean up after every run
  - never: Keep infrastructure for debugging
Option 2: PR Label
- Add label test-infra to your PR
- Workflow runs automatically
- Remove label to prevent re-runs
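The label steps above can also be done from a terminal. The following is a minimal sketch, assuming the GitHub CLI (gh) is installed and authenticated for this repository; the function names are hypothetical helpers, not part of Enclave Lab.

```shell
# Sketch: manage CI trigger labels with the GitHub CLI.
# Assumes gh is installed and authenticated; function names are illustrative.

# Add a trigger label (for example test-infra) to a PR.
add_ci_label() {
  local pr_number="$1" label="$2"
  gh pr edit "$pr_number" --add-label "$label"
}

# Remove the label again to prevent re-runs.
remove_ci_label() {
  local pr_number="$1" label="$2"
  gh pr edit "$pr_number" --remove-label "$label"
}
```

For example, add_ci_label 123 test-infra would label PR 123 (a placeholder number), and remove_ci_label 123 test-infra would remove the label afterwards.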
- on_failure (recommended): Keeps successful infrastructure for reuse, cleans failures
- always: Always cleans up, ensures fresh state
- never: Manual cleanup required, good for debugging
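The three strategies above reduce to one decision: given the strategy and the test result, should cleanup run? The sketch below shows that logic as a shell function. It is an illustration only, not the code the workflow uses; the function name and the exit-code convention are assumptions.

```shell
# Sketch: decide whether to clean up after a test run.
# Arguments: <strategy> <test_exit_code>
# Returns 0 (clean up), 1 (keep infrastructure), or 2 (unknown strategy).
should_cleanup() {
  local strategy="$1" test_exit_code="$2"
  case "$strategy" in
    always)     return 0 ;;
    never)      return 1 ;;
    on_failure) [ "$test_exit_code" -ne 0 ] ;;  # clean up only after a failure
    *)          echo "unknown cleanup strategy: $strategy" >&2; return 2 ;;
  esac
}
```

A caller could then write: if should_cleanup on_failure "$status"; then make clean; fi, where status holds the exit code of the test step.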
- Go to Actions tab
- Click on the workflow run
- View step-by-step progress
- Download artifacts (logs, environment.json)
- environment.json - Infrastructure metadata
- vm-status.txt - Virtual machine status
- network-status.txt - Network configuration
- deployment.log - Enclave Lab installation log
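Artifacts can also be fetched without the web UI. This is a minimal sketch, assuming the GitHub CLI (gh) is installed and authenticated; the function name and the default ./ci-artifacts directory are hypothetical. It downloads every artifact of a run rather than guessing individual artifact names.

```shell
# Sketch: download all artifacts of one workflow run into a local directory.
# run_id is the numeric ID from the run URL or from `gh run list`.
fetch_run_artifacts() {
  local run_id="$1" dest="${2:-./ci-artifacts}"
  gh run download "$run_id" --dir "$dest"
}
```

Calling fetch_run_artifacts followed by a run ID places the files under ./ci-artifacts; pass a second argument to choose another directory.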
Purpose: Full end-to-end cluster deployment testing
Trigger:
- Manual dispatch (Actions tab)
- PR with test-e2e label
- Weekly schedule (Sunday 2 AM UTC)
Duration: ~90-120 minutes
Runs on: Self-hosted runner (CI machine)
- ✅ Pre-flight checks
- ✅ Create/reuse infrastructure
- ✅ Provision Landing Zone
- ✅ Install Enclave Lab (connected mode)
- ✅ Deploy OpenShift cluster (make deploy-cluster)
- ✅ Verify cluster health
- ✅ Collect artifacts (kubeconfig, logs)
- ✅ Optional cleanup
Option 1: Manual Dispatch
- Go to Actions tab
- Select "E2E Connected Mode"
- Click "Run workflow"
- Configure options:
  - cleanup_strategy: on_failure / always / never
  - reuse_infrastructure: true / false
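The same dispatch can be sketched from the command line with the GitHub CLI. This assumes gh is installed and authenticated, that the workflow's display name is exactly "E2E Connected Mode", and that its inputs are named as listed above; the function name and its defaults are hypothetical.

```shell
# Sketch: trigger the E2E workflow with explicit inputs.
# Arguments: [cleanup_strategy] [reuse_infrastructure]
run_e2e() {
  local cleanup_strategy="${1:-on_failure}" reuse_infrastructure="${2:-true}"
  gh workflow run "E2E Connected Mode" \
    -f cleanup_strategy="$cleanup_strategy" \
    -f reuse_infrastructure="$reuse_infrastructure"
}
```

For example, run_e2e never false would request a fresh deployment that is kept for debugging.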
Option 2: PR Label
- Add label test-e2e to your PR
- Workflow runs automatically
Option 3: Scheduled
Runs automatically every Sunday at 2 AM UTC for regression testing.
Enabled (recommended):
- Reuses existing VMs if available
- Faster runs
- Good for iterative testing
Disabled:
- Creates fresh infrastructure every time
- Slower but ensures clean state
- Good for validating from scratch
The workflow checks:
- ✅ Nodes are ready
- ✅ Cluster operators are available
- ✅ No degraded operators
- ✅ Kubeconfig is accessible
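To repeat these checks by hand, something like the following sketch may help. It is not the workflow's actual verification step; it uses standard oc commands, assumes KUBECONFIG already points at the cluster, and the function name and default timeout are arbitrary choices.

```shell
# Sketch: approximate the four health checks with oc.
# Argument: [timeout], for example 300s.
verify_cluster() {
  local timeout="${1:-300s}"
  # Kubeconfig is accessible: an authenticated call fails if it is not.
  oc whoami >/dev/null || return 1
  # Nodes are ready.
  oc wait nodes --all --for=condition=Ready --timeout="$timeout" || return 1
  # Cluster operators are available.
  oc wait clusteroperators --all --for=condition=Available=True --timeout="$timeout" || return 1
  # No degraded operators.
  oc wait clusteroperators --all --for=condition=Degraded=False --timeout="$timeout" || return 1
}
```

The function returns non-zero at the first check that fails, so it can gate later steps in a script.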
- environment.json - Infrastructure metadata
- vm-status.txt - Virtual machine status
- network-status.txt - Network configuration
- deployment.log - Deployment logs
- kubeconfig - Cluster access configuration
- config/global.yaml - Enclave Lab configuration
After successful E2E run:
- Download kubeconfig artifact
- Use it to access the cluster:
export KUBECONFIG=./kubeconfig
oc get nodes
oc get co
Purpose: Infrastructure cleanup and maintenance
Trigger:
- Manual dispatch (Actions tab)
- Weekly schedule (Sunday 4 AM UTC)
Duration: ~5-15 minutes
Runs on: Self-hosted runner (CI machine)
Standard (default):
- Runs make clean
- Removes Enclave test infrastructure
- Quick and safe
Deep:
- Standard cleanup
- Force destroy remaining VMs
- Remove networks
- Clean working directory
Full:
- Deep cleanup
- Stop sushy-tools containers
- Remove libvirt pools
- Clean dangling interfaces
- Nuclear option for stuck infrastructure
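Each level includes everything from the level below it. The sketch below only prints the steps per level to make that layering explicit; it does not perform any cleanup, and the function name is hypothetical.

```shell
# Sketch: list the steps of a cleanup level (standard, deep, or full).
# Each level first prints the steps of the level below it.
cleanup_steps() {
  case "$1" in
    standard)
      echo "make clean" ;;
    deep)
      cleanup_steps standard
      echo "force destroy remaining VMs"
      echo "remove networks"
      echo "clean working directory" ;;
    full)
      cleanup_steps deep
      echo "stop sushy-tools containers"
      echo "remove libvirt pools"
      echo "clean dangling interfaces" ;;
    *)
      echo "unknown cleanup level: $1" >&2
      return 1 ;;
  esac
}
```

Running cleanup_steps full lists all seven steps, starting with make clean.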
Manual Cleanup:
- Go to Actions tab
- Select "Cleanup Infrastructure"
- Click "Run workflow"
- Choose cleanup level
Scheduled Cleanup:
Runs automatically every Sunday at 4 AM UTC with standard cleanup.
- Standard: Regular cleanup after tests
- Deep: Infrastructure is stuck or behaving oddly
- Full: Complete reset needed, things are really broken
- Create PR
- PR Validation runs automatically
- If changes are significant, add test-infra label
- If testing cluster deployment, add test-e2e label
- Check workflow logs in Actions tab
- Download artifacts
- Look for error messages in logs
- If infrastructure is stuck, run cleanup workflow
E2E workflow runs automatically every Sunday at 2 AM UTC. Review results Monday morning.
Cleanup workflow runs automatically every Sunday at 4 AM UTC. Prevents resource accumulation.
✅ DO:
- Run make validate locally before pushing
- Use test-infra label for infrastructure changes
- Use test-e2e label sparingly (long running)
- Check workflow results before asking for review
❌ DON'T:
- Push without running validation
- Add test-e2e to every PR (expensive)
- Leave failed workflows without investigation
- Ignore cleanup failures
✅ DO:
- Check that PR Validation passed
- Request test-infra for infrastructure changes
- Request test-e2e for significant changes
- Review workflow artifacts if tests fail
See CI_TROUBLESHOOTING.md for a detailed troubleshooting guide.
Workflow stuck in queue:
- Check if another workflow is running
- Workflows queue on shared CI machine
Workflow failed at pre-flight:
- Check GitHub secrets are configured
- Verify runner is online (Settings → Actions → Runners)
Infrastructure creation failed:
- Check CI machine has enough resources
- Run cleanup workflow (deep level)
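One quick way to check resources on the CI machine is a free-disk test. This is a minimal sketch using POSIX df output; the function name is hypothetical, and any threshold you pass is your own assumption, since this guide does not state required sizes.

```shell
# Sketch: succeed if the filesystem holding <path> has at least <min_gb> GiB free.
has_free_disk_gb() {
  local path="$1" min_gb="$2" avail_kb
  # df -Pk prints one header line, then available space in KiB in column 4.
  avail_kb=$(df -Pk "$path" | awk 'NR==2 {print $4}')
  [ "$((avail_kb / 1024 / 1024))" -ge "$min_gb" ]
}
```

For example, has_free_disk_gb /opt/dev-scripts 100 || echo "low disk space" would warn when less than 100 GiB (a placeholder value) is free.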
Can't access cluster:
- Download kubeconfig artifact
- Check deployment.log for errors
- SSH to Landing Zone for debugging
Settings → Actions → Runners
- Runner should show "Idle" when not running workflows
- If "Offline", check runner service on CI machine
Actions tab → Select workflow → View runs
- See success/failure trends
- Download artifacts from past runs
- Review timing data
Monitor CI machine:
# Disk space
df -h /opt/dev-scripts
# RAM usage
free -g
# Running VMs
sudo virsh list --all
# Active networks
sudo virsh net-list --all
- CI Runner Setup - Set up the self-hosted runner
- CI Troubleshooting - Fix common issues
- Deployment Guide - Full deployment documentation