tests: Add automated VM testing environment with QEMU #5074

yanivagman · 2025-11-23T11:28:33Z

Add automated VM testing infrastructure for running Tracee integration and e2e tests in isolated Vagrant VMs using QEMU/KVM.

Key features:

- Automated VM lifecycle management (start, test, cleanup)
- Selective test execution (--unit, --integration, --e2e-inst, --e2e-net)
- Automatic VM conflict detection and resolution
- Log collection and VM preservation on failure
- Provisioning optimization: first run 10-15min, subsequent runs ~2-3min

Scripts:

- tests/run-vm-tests.sh: Host-side orchestration
- tests/run-tests-in-vm.sh: In-VM test execution

Documentation:

- docs/contributing/vm-testing.md: Comprehensive guide
- docs/contributing/setup-development-machine-with-vagrant.md: Updated
- tests/README.md: Quick reference
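
The selective test execution flags listed under "Key features" could be parsed roughly as follows. This is a minimal sketch, not the code in tests/run-vm-tests.sh: the function name `parse_suites`, the `RUN_*` variables, and the "no flags means run everything" default are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of selective suite flags. Only the flag names come
# from the PR description; everything else is illustrative.

parse_suites() {
    RUN_UNIT=0 RUN_INTEGRATION=0 RUN_E2E_INST=0 RUN_E2E_NET=0
    local arg
    for arg in "$@"; do
        case "${arg}" in
            --unit)        RUN_UNIT=1 ;;
            --integration) RUN_INTEGRATION=1 ;;
            --e2e-inst)    RUN_E2E_INST=1 ;;
            --e2e-net)     RUN_E2E_NET=1 ;;
            *) echo "unknown option: ${arg}" >&2; return 2 ;;
        esac
    done
    # Assumption: when no suite is selected, every suite runs.
    if (( RUN_UNIT + RUN_INTEGRATION + RUN_E2E_INST + RUN_E2E_NET == 0 )); then
        RUN_UNIT=1 RUN_INTEGRATION=1 RUN_E2E_INST=1 RUN_E2E_NET=1
    fi
}
```

Calling `parse_suites --integration --e2e-net` would leave only `RUN_INTEGRATION` and `RUN_E2E_NET` set to 1, so the caller can skip the other suites.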


Copilot

Pull request overview

This PR introduces an automated VM testing infrastructure for running Tracee integration and e2e tests in isolated Vagrant VMs, addressing the need for safe test execution that won't impact developers' host systems.

Key changes:

Automated VM lifecycle management with conflict detection and automatic cleanup
Selective test execution supporting --unit, --integration, --e2e-inst, and --e2e-net flags
Enhanced provisioning efficiency with run: "once" directives reducing subsequent runs from 10-15min to ~2-3min

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 7 comments.


File	Description
tests/run-vm-tests.sh	Host-side orchestration script handling VM lifecycle, test execution, and cleanup
tests/run-tests-in-vm.sh	In-VM test executor with selective test suite support and log collection
tests/README.md	Quick reference for test types and VM testing approach
docs/contributing/vm-testing.md	Comprehensive guide covering setup, usage, troubleshooting, and best practices
docs/contributing/setup-development-machine-with-vagrant.md	Updated with VM testing section and improved QEMU setup instructions
Vagrantfile	Added run: "once" provisioning optimization and improved 9p mount options

Copilot · 2025-11-23T11:29:53Z

tests/run-vm-tests.sh

+# Colors for output
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'


[nitpick] The YELLOW color code uses bold (1;33) while other colors don't. For consistency, consider using '\033[0;33m' to match the pattern of RED and GREEN, or add comments explaining why YELLOW needs to be bold.

Suggested change

-YELLOW='\033[1;33m'
+YELLOW='\033[0;33m'
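
For context, color variables of this kind are usually consumed by small logging helpers. The sketch below shows the three colors with a matching non-bold attribute; the `NC` reset variable and the bodies of `info`, `warn`, and `error` are assumptions, since the PR excerpt shows only the variable definitions and calls to `info` and `warn`.

```bash
# Illustrative sketch: ANSI colors that all use the non-bold attribute (0;).
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[0;33m'
NC='\033[0m'   # assumed name for the "reset attributes" sequence

# %b makes printf expand the backslash escapes stored in the variables.
info()  { printf '%b[INFO]%b %s\n'  "${GREEN}"  "${NC}" "$*"; }
warn()  { printf '%b[WARN]%b %s\n'  "${YELLOW}" "${NC}" "$*"; }
error() { printf '%b[ERROR]%b %s\n' "${RED}"    "${NC}" "$*" >&2; }
```

With `1;33` the tag would render bold (often bright) yellow on most terminals, while `0;33` renders plain yellow, which is the only difference the nitpick is about.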

Copilot · 2025-11-23T11:29:53Z

tests/run-vm-tests.sh

+    if vagrant status 2>/dev/null | grep -E "tracee-(dev|test)-vm.*running" | grep -v "${VM_NAME}" > /dev/null; then
+        warn "Another Tracee VM is running. Stopping it first..."
+        vagrant halt


The grep -v filter excludes ${VM_NAME} ('tracee-test-vm'), so this condition is true only when a different Tracee VM is running. The bare 'vagrant halt' that follows, however, stops every VM in the Vagrant environment, including the test VM if it is also running. The logic should pass only the conflicting VM names to 'vagrant halt' and skip the halt entirely when no conflicting VM exists.

Suggested change

-    if vagrant status 2>/dev/null | grep -E "tracee-(dev|test)-vm.*running" | grep -v "${VM_NAME}" > /dev/null; then
-        warn "Another Tracee VM is running. Stopping it first..."
-        vagrant halt
+    running_vms=$(vagrant status 2>/dev/null | grep -E "tracee-(dev|test)-vm.*running" | awk '{print $1}')
+    other_running_vms=$(echo "$running_vms" | grep -v "^${VM_NAME}$" || true)
+    if [[ -n "$other_running_vms" ]]; then
+        warn "Another Tracee VM is running. Stopping it first..."
+        vagrant halt $other_running_vms
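
The same filtering can be expressed as one awk pass, which also makes it testable without a running Vagrant environment. This is a sketch under assumptions: `other_running_vms` is a hypothetical helper name, and it assumes `vagrant status` prints one machine per line as a name followed by a state word (for example `tracee-dev-vm  running (qemu)`).

```bash
# Hypothetical helper: read `vagrant status` output on stdin and print the
# names of running Tracee VMs other than the one named in $1.
other_running_vms() {
    local keep="$1"
    awk -v keep="${keep}" '
        $1 ~ /^tracee-(dev|test)-vm$/ && $2 == "running" && $1 != keep { print $1 }
    '
}

# Usage sketch (VM_NAME and warn come from the surrounding script):
#   others="$(vagrant status 2>/dev/null | other_running_vms "${VM_NAME}")"
#   if [[ -n "${others}" ]]; then
#       warn "Another Tracee VM is running. Stopping it first..."
#       # Word splitting is intentional: one argument per VM name.
#       vagrant halt ${others}
#   fi
```

Matching on the exact first field avoids the substring problem of `grep -v "${VM_NAME}"`, which would also drop any line that merely contains the test VM's name.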

Copilot · 2025-11-23T11:29:54Z

tests/run-tests-in-vm.sh

+    info "Running go test (without coverage to avoid 9p permission issues)..."
+    # Run the same test command as the Makefile but without -coverprofile
+    GOOS=linux CC=clang GOARCH=amd64 GOFIPS140=off \
+        CGO_CFLAGS="-I/vagrant/dist/libbpf/include" \
+        CGO_LDFLAGS="-L/vagrant/dist/libbpf/obj -lbpf" \
+        go test \
+        -tags core,ebpf,lsmsupport \
+        -ldflags="-s=false -w=false -extldflags \"-lelf -lz\" -X main.version=\"$(cat /vagrant/VERSION)\"" \
+        -shuffle on \
+        -timeout 20m \
+        -race \
+        -v \
+        -p 1 \
+        -count=1 \
+        ./tests/integration/... 2>&1 | tee "${integration_log}" | tee -a "${LOG_FILE}"


This duplicates the integration test command from the Makefile. If the Makefile's test-integration target changes, this needs manual synchronization. Consider either: 1) invoking the Makefile target with coverage disabled via an environment variable, or 2) extracting this command to a shared script that both can use.

Suggested change

-    info "Running go test (without coverage to avoid 9p permission issues)..."
-    # Run the same test command as the Makefile but without -coverprofile
-    GOOS=linux CC=clang GOARCH=amd64 GOFIPS140=off \
-        CGO_CFLAGS="-I/vagrant/dist/libbpf/include" \
-        CGO_LDFLAGS="-L/vagrant/dist/libbpf/obj -lbpf" \
-        go test \
-        -tags core,ebpf,lsmsupport \
-        -ldflags="-s=false -w=false -extldflags \"-lelf -lz\" -X main.version=\"$(cat /vagrant/VERSION)\"" \
-        -shuffle on \
-        -timeout 20m \
-        -race \
-        -v \
-        -p 1 \
-        -count=1 \
-        ./tests/integration/... 2>&1 | tee "${integration_log}" | tee -a "${LOG_FILE}"
+    info "Running integration tests via Makefile (coverage disabled for VM)..."
+    # Use the Makefile's test-integration target with COVERAGE=0 to avoid coverage on 9p mounts
+    COVERAGE=0 make test-integration 2>&1 | tee "${integration_log}" | tee -a "${LOG_FILE}"
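
The second option from the comment, a shared script used by both the Makefile and the VM runner, might look like the sketch below. Everything here is hypothetical: the helper name `integration_test_args`, the `COVERAGE` switch, and the `coverage.txt` file name are assumptions, and the environment variables and `-ldflags` from the original command are omitted for brevity.

```bash
# Hypothetical shared helper: print the go test arguments, one per line,
# adding -coverprofile only when COVERAGE is not "0".
integration_test_args() {
    local args=(
        -tags core,ebpf,lsmsupport
        -shuffle on
        -timeout 20m
        -race -v -p 1 -count=1
    )
    if [[ "${COVERAGE:-1}" != "0" ]]; then
        args+=(-coverprofile=coverage.txt)   # assumed output file name
    fi
    args+=(./tests/integration/...)
    printf '%s\n' "${args[@]}"
}
```

A caller could collect the output with `mapfile -t args < <(COVERAGE=0 integration_test_args)` and then run `go test "${args[@]}"`, so the flag list lives in one place and only the coverage decision differs between host and VM.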

Copilot · 2025-11-23T11:29:54Z

docs/contributing/vm-testing.md

+**Solution**:
+- Increase CPU/RAM allocation:
+  ```bash
+  VM_PROC=8 VM_MEM=16 vagrant up


The environment variable name should be VM_CPUS, not VM_PROC. The script uses VM_CPUS throughout (line 29 of run-vm-tests.sh and line 313 in this same file).

Suggested change

-  VM_PROC=8 VM_MEM=16 vagrant up
+  VM_CPUS=8 VM_MEM=16 vagrant up

Copilot · 2025-11-23T11:29:54Z

docs/contributing/vm-testing.md

+./tests/run-vm-tests.sh
+
+# Custom resources
+VM_PROC=8 VM_MEM=16 ./tests/run-vm-tests.sh


The environment variable name should be VM_CPUS, not VM_PROC. This is inconsistent with the script's actual variable name used in run-vm-tests.sh.

Suggested change

-VM_PROC=8 VM_MEM=16 ./tests/run-vm-tests.sh
+VM_CPUS=8 VM_MEM=16 ./tests/run-vm-tests.sh
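
A quick way to check this kind of naming mismatch is to list the `VM_*` names each file actually mentions and compare them. The helper below is a hypothetical sketch (the name `list_vm_vars` is not part of the PR), and it makes no claim about which variable name the scripts or the Vagrantfile really read.

```bash
# Hypothetical helper: for each file given, print every distinct VM_*
# variable name it mentions, prefixed with the file name.
list_vm_vars() {
    local file name
    for file in "$@"; do
        # `|| true` keeps a file with no matches from counting as a failure.
        { grep -oE 'VM_[A-Z_]+' "${file}" || true; } | sort -u |
            while read -r name; do
                printf '%s: %s\n' "${file}" "${name}"
            done
    done
}
```

Running it as `list_vm_vars docs/contributing/vm-testing.md tests/run-vm-tests.sh Vagrantfile` would show at a glance whether the documentation and the scripts agree on the variable names.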

Copilot · 2025-11-23T11:29:55Z

docs/contributing/setup-development-machine-with-vagrant.md

-  - **macOS**: HVF (Hypervisor Framework) acceleration on Intel/Apple Silicon
-  - **Fallback**: TCG software emulation (slower but works on all systems)
- [vagrant-qemu plugin]: Install with `vagrant plugin install vagrant-qemu`
+- [Vagrant] (2.2+)


The markdown link reference [Vagrant] is used but not defined. Either define the link reference at the bottom of the document (e.g., '[Vagrant]: https://www.vagrantup.com/downloads') or use an inline link format.

Copilot · 2025-11-23T11:29:55Z

Vagrantfile

        echo "Mounting 9p filesystem directly to ${SYNCED_FOLDER}"
-        mount -t 9p -o trans=virtio,version=9p2000.L,dfltuid=1000,dfltgid=1000,uname=vagrant,access=any shared ${SYNCED_FOLDER} || echo "Note: 9p mount may already be active"
+        # Use mmap and cache=loose for better performance and permissions
+        # access=client allows root to write files


[nitpick] The mount command now uses 'access=client' (changed from 'access=any'). While the comment on line 417 mentions this allows root to write files, it would be clearer to add a comment explaining why 'access=client' is preferred over 'access=any' in this context, especially for future maintainers.

Suggested change

-        # access=client allows root to write files
+        # access=client ensures that file ownership and permissions on the mounted 9p filesystem
+        # match the client (VM) user, allowing root to write files and avoiding permission issues
+        # that can occur with access=any. This is important for development workflows where root
+        # or other users in the VM need to modify files in the shared folder.
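
To confirm which access mode is in effect inside the VM, the mount table can be inspected. The helper below is a sketch with assumptions: `ninep_access_mode` is a hypothetical name, and it assumes the kernel lists `access=` among the 9p mount options in /proc/mounts, which may not hold on every kernel version.

```bash
# Hypothetical helper: read /proc/mounts-style lines on stdin and print the
# access= mode of the 9p filesystem mounted at the path given in $1.
ninep_access_mode() {
    local mountpoint="$1"
    awk -v mp="${mountpoint}" '
        $2 == mp && $3 == "9p" {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^access=/) {
                    sub(/^access=/, "", opts[i])
                    print opts[i]
                }
        }
    '
}
```

Inside the VM it could be run as `ninep_access_mode /vagrant < /proc/mounts`; an empty result means either that no 9p filesystem is mounted at that path or that the option is not reported.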

codecov · 2025-11-23T11:37:08Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 30.31%. Comparing base (034cbab) to head (806b0bf).
⚠️ Report is 38 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5074      +/-   ##
==========================================
+ Coverage   29.72%   30.31%   +0.58%     
==========================================
  Files         234      232       -2     
  Lines       26186    25924     -262     
==========================================
+ Hits         7785     7860      +75     
+ Misses      17864    17526     -338     
- Partials      537      538       +1

Flag	Coverage Δ
unit	`30.31% <ø> (+0.58%)`	⬆️
see 12 files with indirect coverage changes


geyslan · 2025-11-24T14:16:39Z

I'm taking this over.

yanivagman requested a review from geyslan November 23, 2025 11:28

yanivagman self-assigned this Nov 23, 2025

Copilot AI review requested due to automatic review settings November 23, 2025 11:28

github-actions bot added kind/documentation area/testing labels Nov 23, 2025

Copilot AI reviewed Nov 23, 2025


yanivagman marked this pull request as draft November 24, 2025 07:51

geyslan assigned geyslan and unassigned yanivagman Nov 24, 2025
