@yanivagman
Collaborator

Add automated VM testing infrastructure for running Tracee integration and e2e tests in isolated Vagrant VMs using QEMU/KVM.

Key features:

  • Automated VM lifecycle management (start, test, cleanup)
  • Selective test execution (--unit, --integration, --e2e-inst, --e2e-net)
  • Automatic VM conflict detection and resolution
  • Log collection and VM preservation on failure
  • Provisioning optimization: first run 10-15min, subsequent runs ~2-3min

Scripts:

  • tests/run-vm-tests.sh: Host-side orchestration
  • tests/run-tests-in-vm.sh: In-VM test execution

Documentation:

  • docs/contributing/vm-testing.md: Comprehensive guide
  • docs/contributing/setup-development-machine-with-vagrant.md: Updated
  • tests/README.md: Quick reference
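
The selective-execution flags listed above could be parsed roughly as follows. This is a minimal sketch, not the code from `tests/run-vm-tests.sh`: the function name `parse_test_flags` and the `RUN_*` variables are hypothetical, and the rule that passing no flags selects every suite is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: map the PR's flags onto per-suite toggles.
# Function and variable names are illustrative, not taken from the PR.

parse_test_flags() {
    RUN_UNIT=0 RUN_INTEGRATION=0 RUN_E2E_INST=0 RUN_E2E_NET=0
    local arg
    for arg in "$@"; do
        case "${arg}" in
            --unit)        RUN_UNIT=1 ;;
            --integration) RUN_INTEGRATION=1 ;;
            --e2e-inst)    RUN_E2E_INST=1 ;;
            --e2e-net)     RUN_E2E_NET=1 ;;
            *)
                echo "unknown option: ${arg}" >&2
                return 2
                ;;
        esac
    done
    # Assumption: with no selection flags, every suite runs.
    if (( RUN_UNIT + RUN_INTEGRATION + RUN_E2E_INST + RUN_E2E_NET == 0 )); then
        RUN_UNIT=1 RUN_INTEGRATION=1 RUN_E2E_INST=1 RUN_E2E_NET=1
    fi
}

parse_test_flags --integration --e2e-net
echo "unit=${RUN_UNIT} integration=${RUN_INTEGRATION} e2e-inst=${RUN_E2E_INST} e2e-net=${RUN_E2E_NET}"
```

Rejecting unknown options with a non-zero status keeps a mistyped flag from silently running the full suite.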

@yanivagman yanivagman requested a review from geyslan November 23, 2025 11:28
@yanivagman yanivagman self-assigned this Nov 23, 2025
Copilot AI review requested due to automatic review settings November 23, 2025 11:28
Copilot AI left a comment

Pull request overview

This PR introduces an automated VM testing infrastructure for running Tracee integration and e2e tests in isolated Vagrant VMs, addressing the need for safe test execution that won't impact developers' host systems.

Key changes:

  • Automated VM lifecycle management with conflict detection and automatic cleanup
  • Selective test execution supporting --unit, --integration, --e2e-inst, and --e2e-net flags
  • Enhanced provisioning efficiency with run: "once" directives reducing subsequent runs from 10-15min to ~2-3min

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| tests/run-vm-tests.sh | Host-side orchestration script handling VM lifecycle, test execution, and cleanup |
| tests/run-tests-in-vm.sh | In-VM test executor with selective test suite support and log collection |
| tests/README.md | Quick reference for test types and VM testing approach |
| docs/contributing/vm-testing.md | Comprehensive guide covering setup, usage, troubleshooting, and best practices |
| docs/contributing/setup-development-machine-with-vagrant.md | Updated with VM testing section and improved QEMU setup instructions |
| Vagrantfile | Added run: "once" provisioning optimization and improved 9p mount options |

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
Copilot AI Nov 23, 2025

[nitpick] The YELLOW color code uses bold (1;33) while other colors don't. For consistency, consider using '\033[0;33m' to match the pattern of RED and GREEN, or add comments explaining why YELLOW needs to be bold.

Suggested change
- YELLOW='\033[1;33m'
+ YELLOW='\033[0;33m'
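
For context, color variables like these are usually consumed by small logging helpers. The sketch below is an assumption about how such helpers might look: the PR excerpts only show calls to `info` and `warn`, so the `NC` reset variable, the `error` function, and the `[INFO]`/`[WARN]`/`[ERROR]` prefixes are illustrative. It uses the non-bold yellow from the suggestion.

```bash
#!/usr/bin/env bash
# Illustrative logging helpers. Only the color variable names and the
# info/warn function names appear in the PR; the rest is assumed.
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[0;33m'   # non-bold, matching RED and GREEN
NC='\033[0m'          # reset sequence (hypothetical name)

# %b makes printf interpret the backslash escapes stored in the variables.
info()  { printf '%b[INFO]%b %s\n'  "${GREEN}"  "${NC}" "$*"; }
warn()  { printf '%b[WARN]%b %s\n'  "${YELLOW}" "${NC}" "$*" >&2; }
error() { printf '%b[ERROR]%b %s\n' "${RED}"    "${NC}" "$*" >&2; }

info "Starting VM..."
warn "Another Tracee VM is running."
```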

Copilot uses AI. Check for mistakes.
Comment on lines +177 to +179
if vagrant status 2>/dev/null | grep -E "tracee-(dev|test)-vm.*running" | grep -v "${VM_NAME}" > /dev/null; then
warn "Another Tracee VM is running. Stopping it first..."
vagrant halt
Copilot AI Nov 23, 2025

The `grep -v "${VM_NAME}"` filter means this branch is entered only when a Tracee VM other than ${VM_NAME} ('tracee-test-vm') is running. However, the bare 'vagrant halt' that follows stops every VM in the environment, including the test VM if it is also running. The halt should target only the conflicting VMs by name.

Suggested change
- if vagrant status 2>/dev/null | grep -E "tracee-(dev|test)-vm.*running" | grep -v "${VM_NAME}" > /dev/null; then
-     warn "Another Tracee VM is running. Stopping it first..."
-     vagrant halt
+ running_vms=$(vagrant status 2>/dev/null | grep -E "tracee-(dev|test)-vm.*running" | awk '{print $1}')
+ other_running_vms=$(echo "$running_vms" | grep -v "^${VM_NAME}$" || true)
+ if [[ -n "$other_running_vms" ]]; then
+     warn "Another Tracee VM is running. Stopping it first..."
+     vagrant halt $other_running_vms
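
To make the intended behavior easier to check, the filtering step can be isolated in a function that reads `vagrant status` output on stdin. This is a sketch, not the PR's code: `other_running_vms` is a hypothetical function name, and the sample status lines only approximate Vagrant's output format.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print running Tracee VMs other than the one we want.
# Reads `vagrant status`-style text on stdin so it can be tested offline.
other_running_vms() {
    local keep="$1"
    awk -v keep="${keep}" '
        $1 ~ /^tracee-(dev|test)-vm$/ && $2 == "running" && $1 != keep { print $1 }
    '
}

# Sample input standing in for `vagrant status` (format is an approximation).
sample_status='tracee-dev-vm             running (qemu)
tracee-test-vm            running (qemu)'

others="$(printf '%s\n' "${sample_status}" | other_running_vms "tracee-test-vm")"
if [[ -n "${others}" ]]; then
    echo "Would halt: ${others}"   # a real script would run: vagrant halt ${others}
fi
```

Matching on the first two fields, rather than on the whole line, avoids accidental hits when one VM name is a substring of another.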

Comment on lines +163 to +177
info "Running go test (without coverage to avoid 9p permission issues)..."
# Run the same test command as the Makefile but without -coverprofile
GOOS=linux CC=clang GOARCH=amd64 GOFIPS140=off \
CGO_CFLAGS="-I/vagrant/dist/libbpf/include" \
CGO_LDFLAGS="-L/vagrant/dist/libbpf/obj -lbpf" \
go test \
-tags core,ebpf,lsmsupport \
-ldflags="-s=false -w=false -extldflags \"-lelf -lz\" -X main.version=\"$(cat /vagrant/VERSION)\"" \
-shuffle on \
-timeout 20m \
-race \
-v \
-p 1 \
-count=1 \
./tests/integration/... 2>&1 | tee "${integration_log}" | tee -a "${LOG_FILE}"
Copilot AI Nov 23, 2025

This duplicates the integration test command from the Makefile. If the Makefile's test-integration target changes, this needs manual synchronization. Consider either: 1) invoking the Makefile target with coverage disabled via an environment variable, or 2) extracting this command to a shared script that both can use.

Suggested change
- info "Running go test (without coverage to avoid 9p permission issues)..."
- # Run the same test command as the Makefile but without -coverprofile
- GOOS=linux CC=clang GOARCH=amd64 GOFIPS140=off \
- CGO_CFLAGS="-I/vagrant/dist/libbpf/include" \
- CGO_LDFLAGS="-L/vagrant/dist/libbpf/obj -lbpf" \
- go test \
- -tags core,ebpf,lsmsupport \
- -ldflags="-s=false -w=false -extldflags \"-lelf -lz\" -X main.version=\"$(cat /vagrant/VERSION)\"" \
- -shuffle on \
- -timeout 20m \
- -race \
- -v \
- -p 1 \
- -count=1 \
- ./tests/integration/... 2>&1 | tee "${integration_log}" | tee -a "${LOG_FILE}"
+ info "Running integration tests via Makefile (coverage disabled for VM)..."
+ # Use the Makefile's test-integration target with COVERAGE=0 to avoid coverage on 9p mounts
+ COVERAGE=0 make test-integration 2>&1 | tee "${integration_log}" | tee -a "${LOG_FILE}"
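
Option 2 could look roughly like the sketch below: a helper that both the Makefile and the in-VM script call to obtain the `go test` arguments, with coverage toggled by a variable. The function name `integration_test_args`, the `COVERAGE` variable, and the `coverage.txt` path are assumptions (the Makefile is not shown here), and the flag list is abbreviated from the command above. The `COVERAGE=0 make test-integration` suggestion likewise only works if the Makefile honors such a variable.

```bash
#!/usr/bin/env bash
# Hypothetical shared helper: emit `go test` arguments, one per line.
# COVERAGE and coverage.txt are assumed names, not taken from the Makefile.
integration_test_args() {
    local args=(
        -tags core,ebpf,lsmsupport
        -shuffle on
        -timeout 20m
        -race -v -p 1 -count=1
    )
    # Skip the coverage profile when COVERAGE=0 (for example on 9p mounts).
    if [[ "${COVERAGE:-1}" != "0" ]]; then
        args+=( -coverprofile=coverage.txt )
    fi
    args+=( ./tests/integration/... )
    printf '%s\n' "${args[@]}"
}

# Print the command that would run inside the VM, without executing it.
echo "go test $(COVERAGE=0 integration_test_args | tr '\n' ' ')"
```

Keeping the argument list in one place means a change to tags or timeouts reaches both callers without manual synchronization.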

**Solution**:
- Increase CPU/RAM allocation:
```bash
VM_PROC=8 VM_MEM=16 vagrant up
Copilot AI Nov 23, 2025

The environment variable name should be VM_CPUS, not VM_PROC. The script uses VM_CPUS throughout (line 29 of run-vm-tests.sh and line 313 in this same file).

Suggested change
- VM_PROC=8 VM_MEM=16 vagrant up
+ VM_CPUS=8 VM_MEM=16 vagrant up

./tests/run-vm-tests.sh

# Custom resources
VM_PROC=8 VM_MEM=16 ./tests/run-vm-tests.sh
Copilot AI Nov 23, 2025

The environment variable name should be VM_CPUS, not VM_PROC. This is inconsistent with the script's actual variable name used in run-vm-tests.sh.

Suggested change
- VM_PROC=8 VM_MEM=16 ./tests/run-vm-tests.sh
+ VM_CPUS=8 VM_MEM=16 ./tests/run-vm-tests.sh
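
If the script reads `VM_CPUS` as the comment states, a value passed under a different name would simply be ignored, so validating the resolved values and echoing them makes a mismatch easier to spot. The sketch below is an assumption about how that might look: the defaults (4 CPUs, 8 for memory) and the names `require_positive_int` and `resolve_vm_resources` are hypothetical, not taken from the PR.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: read VM sizing from the environment with defaults.
# The default values below are placeholders, not the PR's values.
require_positive_int() {
    local name="$1" value="$2"
    if [[ ! "${value}" =~ ^[1-9][0-9]*$ ]]; then
        echo "${name} must be a positive integer, got '${value}'" >&2
        return 1
    fi
}

resolve_vm_resources() {
    VM_CPUS="${VM_CPUS:-4}"
    VM_MEM="${VM_MEM:-8}"
    require_positive_int VM_CPUS "${VM_CPUS}" || return 1
    require_positive_int VM_MEM  "${VM_MEM}"  || return 1
    # Echo the effective values so an ignored variable is visible in the log.
    echo "cpus=${VM_CPUS} mem=${VM_MEM}"
}

resolve_vm_resources
```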

- **macOS**: HVF (Hypervisor Framework) acceleration on Intel/Apple Silicon
- **Fallback**: TCG software emulation (slower but works on all systems)
- [vagrant-qemu plugin]: Install with `vagrant plugin install vagrant-qemu`
- [Vagrant] (2.2+)
Copilot AI Nov 23, 2025

The markdown link reference [Vagrant] is used but not defined. Either define the link reference at the bottom of the document (e.g., '[Vagrant]: https://www.vagrantup.com/downloads') or use an inline link format.

echo "Mounting 9p filesystem directly to ${SYNCED_FOLDER}"
mount -t 9p -o trans=virtio,version=9p2000.L,dfltuid=1000,dfltgid=1000,uname=vagrant,access=any shared ${SYNCED_FOLDER} || echo "Note: 9p mount may already be active"
# Use mmap and cache=loose for better performance and permissions
# access=client allows root to write files
Copilot AI Nov 23, 2025

[nitpick] The mount command now uses 'access=client' (changed from 'access=any'). While the comment on line 417 mentions this allows root to write files, it would be clearer to add a comment explaining why 'access=client' is preferred over 'access=any' in this context, especially for future maintainers.

Suggested change
- # access=client allows root to write files
+ # access=client ensures that file ownership and permissions on the mounted 9p filesystem
+ # match the client (VM) user, allowing root to write files and avoiding permission issues
+ # that can occur with access=any. This is important for development workflows where root
+ # or other users in the VM need to modify files in the shared folder.
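
Separately from the option choice, the `|| echo "Note: 9p mount may already be active"` fallback in the mount line shown above also hides genuine mount failures. One alternative is to check first whether the target is already mounted. The sketch below is an assumption, not part of the PR: `is_mounted` is a hypothetical helper that scans a `/proc/mounts`-style file, taking the file path as an optional second argument so it can be exercised without root. The `/vagrant` path and the fake table contents are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a mount point appears in a mounts table.
# Usage: is_mounted <mountpoint> [mounts_file]   (defaults to /proc/mounts)
is_mounted() {
    local target="$1" table="${2:-/proc/mounts}"
    # Field 2 of each /proc/mounts line is the mount point.
    awk -v t="${target}" '$2 == t { found = 1 } END { exit !found }' "${table}"
}

# Demo against a fake table; "shared" mirrors the tag in the mount command.
fake_table="$(mktemp)"
printf 'shared /vagrant 9p rw,relatime 0 0\n' > "${fake_table}"

if is_mounted /vagrant "${fake_table}"; then
    echo "already mounted, skipping"
else
    echo "would run: mount -t 9p ... shared /vagrant"
fi
rm -f "${fake_table}"
```

With a guard like this, the mount command itself can be left to fail loudly when something is actually wrong.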

@codecov
codecov bot commented Nov 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 30.31%. Comparing base (034cbab) to head (806b0bf).
⚠️ Report is 38 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5074      +/-   ##
==========================================
+ Coverage   29.72%   30.31%   +0.58%     
==========================================
  Files         234      232       -2     
  Lines       26186    25924     -262     
==========================================
+ Hits         7785     7860      +75     
+ Misses      17864    17526     -338     
- Partials      537      538       +1     
Flag Coverage Δ
unit 30.31% <ø> (+0.58%) ⬆️
see 12 files with indirect coverage changes
@yanivagman yanivagman marked this pull request as draft November 24, 2025 07:51
@geyslan
Member

geyslan commented Nov 24, 2025

I'm taking this over.

@geyslan geyslan assigned geyslan and unassigned yanivagman Nov 24, 2025
2 participants