
setup qemu virtiofs based caching and enable it for maven builds #2304

Merged
toabctl merged 3 commits into chainguard-dev:main from toabctl:caching
Jan 21, 2026

Conversation

@toabctl
Member

@toabctl toabctl commented Jan 19, 2026

Caching significantly speeds up builds when iterating locally on a package. Caching is not yet wired up for the different ecosystems, but Maven is enabled with this PR and shows very good results.
If the approach taken here is welcome, I'm happy to implement caching for more ecosystems.

I think (but I'm not sure; I don't have enough experience with our build system yet) this doesn't affect the builds done in CI and for production, because we 1) don't set the env variable and 2) use ephemeral machines, so there's no cache anyway. Are those assumptions correct?

Functional Changes

  • This change can build all of Wolfi without errors (describe results in notes)

Notes:

I've not done this (yet). I'm happy to do it if there's agreement that this PR is a good approach and would, in general, be welcome to be merged.

SCA Changes

  • Examining several representative APKs show no regression / the desired effect (details in notes)

Notes:

I built apicurio-registry (a Java + Maven project with 1.1 GB of deps). The build took:

  • no caching: ca. 17 min
  • with caching: ca. 5 min

@toabctl
Member Author

toabctl commented Jan 19, 2026

@dannf I've seen you working on #2268. Maybe you want to have a look here, too?

@toabctl toabctl changed the title from "Caching" to "setup qemu virtiofs based caching and enable it for maven builds" on Jan 19, 2026
@89luca89
Contributor

Doing some performance analysis:

time MELANGE="/tmp/melange/melange" MELANGE_EXTRA_OPTS="--log-level=debug" QEMU_VIRTIOFS_PATH=/tmp/ QEMU_USE_VIRTIOFS=1 make package/yq-fips

  • Without the patch: 0m21.217s

  • With the patch 1st run: 0m25.897s

  • With the patch 2nd run: 0m25.433s

At least on small-ish packages, the virtiofs performance impact is actually detrimental.

Can you point out some bigger packages to test the performance impact?
Keep in mind that virtiofs is still a somewhat slower approach than raw I/O, so it is expected to be detrimental unless we are talking about gigabytes of cached data.

@toabctl
Member Author

toabctl commented Jan 20, 2026

  • Try QEMU_USE_VIRTIOFS=1 make package/apicurio-registry without and with a prefilled cache. That makes a big difference because Maven caching is enabled within this PR.
  • I think without caching, virtiofs is slower. I've seen that too; that's why it's opt-in via QEMU_USE_VIRTIOFS=1.
  • With QEMU_USE_VIRTIOFS=1 and caching enabled for the required ecosystem (only gomod and Maven caches are enabled within melange, but that should be changed imo), it makes a difference, especially when you iterate on a package and have to rebuild over and over again.
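
For illustration, a minimal sketch of that comparison (assuming the cache directory starts out empty; the package and flags are taken from the suggestion above):

# First build populates the Maven cache, the second one reuses it.
time QEMU_USE_VIRTIOFS=1 make package/apicurio-registry   # cold cache
time QEMU_USE_VIRTIOFS=1 make package/apicurio-registry   # warm cache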

@89luca89
Contributor

  • Try QEMU_USE_VIRTIOFS=1 make package/apicurio-registry without and with a prefilled cache. That makes a big difference because Maven caching is enabled within this PR.
  • I think without caching, virtiofs is slower. I've seen that too; that's why it's opt-in via QEMU_USE_VIRTIOFS=1.
  • With QEMU_USE_VIRTIOFS=1 and caching enabled for the required ecosystem (only gomod and Maven caches are enabled within melange, but that should be changed imo), it makes a difference, especially when you iterate on a package and have to rebuild over and over again.

Tested with this package and indeed it improves by A LOT.
So +1 from me.
I think this is mostly for local development; I'd say we merge, document this, and keep it opt-in for now. Probably most builds are not going to benefit, but when they do, one can opt in. wdyt?

@smoser
Contributor

smoser commented Jan 20, 2026

I asked Claude if he could help me make performance better on this. He said:

Virtiofsd Performance Optimization Summary

What Was Done

Applied Optimizations

Modified the virtiofsd configuration in pkg/container/qemu_runner.go:1803-1812 to improve cache directory performance with the following changes:

args := []string{
    "--socket-path=" + cfg.VirtiofsdSocketPath,
    fmt.Sprintf("--thread-pool-size=%d", runtime.NumCPU()*2),  // NEW: Parallel I/O
    "-o", "source=" + cfg.CacheDir,
    "-o", "cache=always",      // CHANGED: from "auto" to "always"
    "-o", "sandbox=namespace",
    "-o", "xattr",
    "-o", "no_posix_lock",     // NEW: Reduce locking overhead
    "-o", "writeback",
}

Three optimizations applied:

  1. Thread Pool Size: Added --thread-pool-size=<NumCPU*2> for better parallel I/O handling
  2. Cache Mode: Changed from cache=auto to cache=always for maximum caching in single-VM scenarios
  3. POSIX Locks: Added no_posix_lock to reduce file locking overhead

Benchmark Results

Tested with yq-4.50.1 package build using both clean and warm cache scenarios:

Configuration  Cache State  Real Time  Improvement
Baseline       Clean        36.779s    -
Optimized      Clean        35.407s    -1.372s (-3.73%)
Baseline       Warm         33.440s    -
Optimized      Warm         33.709s    +0.269s (+0.80%)

Key Findings

  1. Clean Cache (CI/CD scenario): 3.73% faster - significant improvement

    • Thread pool enables parallel I/O during initial cache population
    • Aggressive caching benefits first-time file operations
    • Reduced locking overhead helps during downloads
  2. Warm Cache (local development): ~0.8% slower - negligible difference

    • Likely within measurement variance
    • Thread pool overhead slightly outweighs benefits when files already cached
    • Essentially equivalent performance

Recommendation

Keep these optimizations because:

  • Clean cache builds (primary CI/CD use case) show real ~4% improvement
  • Warm cache performance impact is negligible
  • Optimizations are safe for melange's isolated VM architecture where each build has exclusive cache access
  • First-time/CI builds are more critical to optimize than repeated local builds

Other Optimization Options to Explore

1. DAX (Direct Access) Mode

What it does: Memory-maps guest file access directly to host memory, eliminating copy operations.

Implementation:

// In virtiofsd args
"-o", "announce_submounts",

// In QEMU device configuration (qemu_runner.go around line 814)
"-device", fmt.Sprintf("vhost-user-fs-pci,queue-size=1024,chardev=char_cache,tag=melange_cache,cache-size=%dM", cfg.Memory/2),

Expected impact: 5-15% improvement for I/O-heavy workloads
Risk: Requires more guest memory, complex memory management
When to try: If the current ~4% improvement isn't sufficient

2. Larger Queue Size

What it does: Increases concurrent I/O request handling capacity.

Implementation:

"-device", "vhost-user-fs-pci,queue-size=2048,chardev=char_cache,tag=melange_cache"
// Or even: queue-size=4096

Expected impact: 2-5% improvement for highly parallel builds
Risk: Minimal, increased memory usage
When to try: For packages with many small file operations

3. File Descriptor Limits

What it does: Increases max open files for virtiofsd process.

Implementation:

"-o", "rlimit-nofile=1048576",

Expected impact: Prevents bottlenecks with many simultaneous file operations
Risk: Minimal
When to try: If seeing "too many open files" errors or building packages with massive dependency trees

4. Adjust Guest Mount Options

What it does: Reduces unnecessary permission checks on guest side.

Implementation (in guest mount command around line 1000):

mount -t virtiofs -o default_permissions melange_cache /mount/var/cache/melange

Expected impact: 1-3% improvement from reduced permission overhead
Risk: Minimal security impact in isolated VM
When to try: Easy to test alongside other optimizations

5. Multiple Cache Directories with Separate virtiofsd Instances

What it does: Split cache into multiple mount points (e.g., go modules, apk cache) with dedicated virtiofsd instances.

Implementation:

  • Launch multiple virtiofsd processes with different sockets
  • Mount different cache subdirectories separately
  • Each gets its own thread pool and cache settings

Expected impact: 10-20% improvement for builds with diverse cache patterns
Risk: Complex implementation, increased resource usage
When to try: If profiling shows cache directory as primary bottleneck
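
A rough sketch of that split, assuming a virtiofsd binary on PATH, the legacy -o option syntax used elsewhere in this thread, and purely illustrative socket and cache paths:

# One virtiofsd instance per cache subtree, each with its own socket and thread pool.
virtiofsd --socket-path=/tmp/vfsd-gomod.sock --thread-pool-size=8 \
  -o source=/local-tmp/melange-cache/gomod -o cache=always -o sandbox=namespace &
virtiofsd --socket-path=/tmp/vfsd-maven.sock --thread-pool-size=8 \
  -o source=/local-tmp/melange-cache/m2repository -o cache=always -o sandbox=namespace &
# QEMU would then need one vhost-user-fs-pci device per socket (each with its own tag),
# and the guest would mount each tag on a separate directory.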

6. Tune Writeback Cache Behavior

What it does: Adjust how aggressively writes are cached.

Implementation:

"-o", "writeback",
"-o", "flock",          // Add if POSIX file locking is actually needed
"-o", "posix_acl",      // Add if ACLs are used

Expected impact: Depends on workload write patterns
Risk: Currently using writeback already, minimal additional gains
When to try: If write-heavy workloads show bottlenecks

7. Different Sandbox Mode

What it does: Use chroot sandbox instead of namespace for potentially less overhead.

Implementation:

"-o", "sandbox=chroot",  // Instead of "namespace"

Expected impact: 1-2% improvement from reduced namespace overhead
Risk: Requires root/CAP_SYS_CHROOT, less isolation
When to try: If namespace overhead is measurable and security trade-off acceptable

8. Disable Extended Attributes (xattr)

What it does: Skip xattr support if not needed by melange builds.

Implementation:

// Remove: "-o", "xattr",

Expected impact: 1-2% improvement from skipping xattr operations
Risk: May break builds that rely on extended attributes
When to try: After confirming no builds use xattrs

9. Alternative: io_uring Backend

What it does: Use newer Linux async I/O interface for better performance.

Implementation:
Requires virtiofsd compiled with io_uring support (check version/build flags).

Expected impact: 10-20% improvement with modern kernels
Risk: Requires virtiofsd 1.7.0+, kernel 5.10+
When to try: If available in target environments

10. Cache Prewarming

What it does: Pre-populate cache with common dependencies before build starts.

Implementation:

  • Maintain a "golden cache" with frequently used modules
  • Copy/mount base cache before build
  • Let virtiofs provide incremental updates

Expected impact: Dramatic improvement for repeated builds (30-50%)
Risk: Complexity in cache management, staleness issues
When to try: For CI environments with predictable build patterns
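
A minimal sketch of that flow, assuming rsync is available, a hypothetical golden-cache location, and the same host cache path used in the benchmark commands below:

# Seed the build cache from a maintained "golden" cache, then build with virtiofs enabled.
rsync -a /srv/melange-golden-cache/ /local-tmp/melange-cache/
QEMU_USE_VIRTIOFS=1 make package/apicurio-registry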


Benchmarking Methodology for Future Tests

When testing additional optimizations:

  1. Always test both clean and warm cache scenarios
  2. Run each test at least 2-3 times to account for variance
  3. Clear cache between tests: mv /local-tmp/melange-cache /local-tmp/melange-cache.backup && mkdir -p /local-tmp/melange-cache
  4. Test with multiple package types:
    • Small packages (like yq): ~30-40s builds
    • Medium packages: 2-5 minute builds
    • Large packages (gcc, chromium): 10+ minute builds
  5. Monitor system metrics: CPU, memory, I/O wait with iostat, vmstat, top

Testing Commands

# Clean cache test
mv /local-tmp/melange-cache /local-tmp/melange-cache.backup && \
mkdir -p /local-tmp/melange-cache && \
time ( export PATH="$PWD:$PATH" && export QEMU_USE_VIRTIOFS=1 && \
  cd ~/src/stereo/os && make package/yq )

# Warm cache test
time ( export PATH="$PWD:$PATH" && export QEMU_USE_VIRTIOFS=1 && \
  cd ~/src/stereo/os && make package/yq )

Profiling Recommendations

To identify the next best optimization target:

  1. Use virtiofsd debug logging: Add --log-level=debug to see I/O patterns
  2. Profile with perf: perf record -g -p <virtiofsd-pid> during build
  3. Check QEMU virtio stats: Monitor queue utilization and wait times
  4. Guest-side profiling: Use iotop, blktrace inside VM to identify bottlenecks
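
For example, a bounded perf capture of the running virtiofsd process could look like this (assuming perf and pgrep are available on the host):

# Record 60 seconds of call-graph samples from virtiofsd while a build is running.
perf record -g -p "$(pgrep -f virtiofsd | head -n1)" -- sleep 60
perf report --stdio | head -n 40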

Questions to Answer Before Further Optimization

  1. What's the bottleneck? CPU, I/O wait, network (for git clones)?
  2. Which phase is slowest? Cache population, compilation, or linking?
  3. What's the file access pattern? Many small files vs few large files?
  4. Is cache persistence working? Verify files remain after build with virtiofs

Use these insights to prioritize which of the above optimizations to try next.

@toabctl
Member Author

toabctl commented Jan 21, 2026

I think this is mostly for local development; I'd say we merge, document this, and keep it opt-in for now. Probably most builds are not going to benefit, but when they do, one can opt in. wdyt?

Yes. And I think if we add more caching (e.g. Python's uv, npm, ...), local build iterations will be faster in many cases.
The PR does already document it, so I would be happy if this gets merged as is.
@smoser do you want me to apply the "Applied Optimizations" that Claude suggested before merging?

@89luca89
Contributor

Tested @smoser's changes; not all of them are possible, but these work:

diff --git a/pkg/container/qemu_runner.go b/pkg/container/qemu_runner.go
index 2f8d5081..fea58b74 100644
--- a/pkg/container/qemu_runner.go
+++ b/pkg/container/qemu_runner.go
@@ -804,7 +804,8 @@ func createMicroVM(ctx context.Context, cfg *Config) error {
 		if cfg.VirtiofsEnabled {
 			log.Info("qemu: using virtiofs for cache directory (read-write)")
 			// Chardev for socket communication
-			baseargs = append(baseargs, "-chardev",
+			baseargs = append(baseargs,
+				"-chardev",
 				fmt.Sprintf("socket,id=char_cache,path=%s", cfg.VirtiofsdSocketPath))
 			// vhost-user-fs-pci device
 			baseargs = append(baseargs, "-device",
@@ -1802,11 +1803,13 @@ func startVirtiofsd(ctx context.Context, cfg *Config) (*exec.Cmd, error) {
 
 	args := []string{
 		"--socket-path=" + cfg.VirtiofsdSocketPath,
+		fmt.Sprintf("--thread-pool-size=%d", runtime.NumCPU()*2), // NEW: Parallel I/O
 		"-o", "source=" + cfg.CacheDir,
-		"-o", "cache=auto", // Balance coherency and performance
+		"-o", "cache=always", // Balance coherency and performance
 		"-o", "sandbox=namespace", // Use namespace sandbox (works without root)
 		"-o", "xattr", // Enable xattr support
 		"-o", "writeback", // Enable writeback caching for better write performance
+		"-o", "no_posix_lock",
 	}
 
 	log.Debugf("starting virtiofsd: %s %v", virtiofsdPath, args)

Got a small bump in performance, but for small packages (in the <1m range) it's still not worth it.

@toabctl can you suggest some medium-size (5-10m build time) packages that we could test?

@toabctl
Copy link
Member Author

toabctl commented Jan 21, 2026

@toabctl can you suggest some medium-size (5-10m build time) packages that we could test?

Not really. My use case was apicurio-registry. Sorry.

@toabctl
Copy link
Member Author

toabctl commented Jan 21, 2026

Related to this one: #2305

toabctl and others added 3 commits January 21, 2026 15:50
Add optional virtiofs support for the cache directory mount, enabled via
QEMU_USE_VIRTIOFS=1 environment variable. When enabled and virtiofsd is
available, this provides a read-write cache mount.

This is useful when locally iterating over package builds.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Configure Maven local repository to use /var/cache/melange/m2repository,
which is persisted when a cache-dir is provided. This allows Maven builds
to reuse downloaded dependencies across builds, which is very useful
because some Java projects have a lot of dependencies to
download (e.g. apicurio-registry needs more than 1 GB of deps).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
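
In effect this points Maven's local repository at the persisted cache directory, roughly equivalent to the following invocation (the exact wiring in the maven pipeline may differ):

# Standard Maven property for relocating the local repository.
mvn package -Dmaven.repo.local=/var/cache/melange/m2repository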
Add a "Cache Persistence by Runner" section explaining that cache write
behavior differs by runner: Docker and Bubblewrap use read-write bind
mounts (writes persist), while QEMU uses read-only 9p with overlay by
default (writes discarded). Update virtiofs section to clarify it enables
cache persistence for QEMU.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Contributor

@89luca89 89luca89 left a comment

lgtm now thanks

@toabctl toabctl enabled auto-merge (rebase) January 21, 2026 14:58
@toabctl toabctl merged commit 6ae3b60 into chainguard-dev:main Jan 21, 2026
57 checks passed
@toabctl toabctl deleted the caching branch January 21, 2026 15:07