Skip to content

test: fix API timing false positives in test_population_latency#5942

Merged
Manciukic merged 1 commit into
firecracker-microvm:mainfrom
Manciukic:fix-population-latency-api-timing
Jun 10, 2026
Merged

test: fix API timing false positives in test_population_latency#5942
Manciukic merged 1 commit into
firecracker-microvm:mainfrom
Manciukic:fix-population-latency-api-timing

Conversation

@Manciukic

Copy link
Copy Markdown
Contributor

Problem

test_population_latency fails on host kernel 6.18 (and previously on 6.1) with:

AssertionError: Get /machine-config API call exceeded maximum duration: 884 ms

The test uses the fault_all uffd handler which eagerly faults in all guest memory after snapshot restore. During this window, any API call (like Get /machine-config) competes with the faulting activity, causing response times to exceed the MAX_API_CALL_DURATION_MS threshold.

This was previously worked around by blanket-disabling time_api_requests on host kernel 6.1 (commit 03eb60c), but the same issue appeared on 6.18 after it was added to CI.

Fix

  • Revert the kernel-version-based exclusion (!= "6.1") back to unconditional True — re-enables the check on 6.1 for all other tests.
  • Disable time_api_requests only for restored VMs in test_population_latency, where slow API responses during uffd faulting are expected by design.

Why this is correct

  • _validate_api_response_times() is called at VM teardown (kill()), which happens after yield returns from build_n_from_snapshot
  • Setting microvm.time_api_requests = False inside the loop body (before any test logic) ensures the flag is already set when kill() eventually checks it
  • No memory_monitor is attached (monitor_memory=False), so Get /machine-config is not called during restore itself

Testing

  • ./tools/devtool checkstyle passes (20 passed, 4 skipped)

@codecov

codecov Bot commented Jun 9, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.00%. Comparing base (d300fd2) to head (036b8fe).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5942   +/-   ##
=======================================
  Coverage   83.00%   83.00%           
=======================================
  Files         277      277           
  Lines       30106    30106           
=======================================
  Hits        24989    24989           
  Misses       5117     5117           
Flag Coverage Δ
5.10-m5n.metal 83.31% <ø> (+<0.01%) ⬆️
5.10-m6a.metal 82.65% <ø> (+<0.01%) ⬆️
5.10-m6g.metal 79.94% <ø> (ø)
5.10-m6i.metal 83.30% <ø> (-0.01%) ⬇️
5.10-m7a.metal-48xl 82.64% <ø> (+<0.01%) ⬆️
5.10-m7g.metal 79.94% <ø> (ø)
5.10-m7i.metal-24xl 83.27% <ø> (-0.01%) ⬇️
5.10-m7i.metal-48xl 83.28% <ø> (+<0.01%) ⬆️
5.10-m8g.metal-24xl 79.94% <ø> (ø)
5.10-m8g.metal-48xl 79.94% <ø> (+<0.01%) ⬆️
5.10-m8i.metal-48xl 83.28% <ø> (+<0.01%) ⬆️
5.10-m8i.metal-96xl 83.28% <ø> (ø)
6.1-m5n.metal 83.33% <ø> (+<0.01%) ⬆️
6.1-m6a.metal 82.68% <ø> (+<0.01%) ⬆️
6.1-m6g.metal 79.94% <ø> (-0.01%) ⬇️
6.1-m6i.metal 83.33% <ø> (+<0.01%) ⬆️
6.1-m7a.metal-48xl 82.66% <ø> (ø)
6.1-m7g.metal 79.93% <ø> (-0.01%) ⬇️
6.1-m7i.metal-24xl 83.34% <ø> (ø)
6.1-m7i.metal-48xl 83.34% <ø> (ø)
6.1-m8g.metal-24xl 79.93% <ø> (-0.01%) ⬇️
6.1-m8g.metal-48xl 79.94% <ø> (ø)
6.1-m8i.metal-48xl 83.34% <ø> (+<0.01%) ⬆️
6.1-m8i.metal-96xl 83.34% <ø> (ø)
6.18-m5n.metal 83.33% <ø> (+<0.01%) ⬆️
6.18-m6a.metal 82.67% <ø> (-0.01%) ⬇️
6.18-m6g.metal 79.94% <ø> (ø)
6.18-m6i.metal 83.33% <ø> (+<0.01%) ⬆️
6.18-m7a.metal-48xl 82.66% <ø> (ø)
6.18-m7g.metal 79.94% <ø> (ø)
6.18-m7i.metal-24xl 83.34% <ø> (+<0.01%) ⬆️
6.18-m7i.metal-48xl 83.34% <ø> (ø)
6.18-m8g.metal-24xl 79.94% <ø> (ø)
6.18-m8g.metal-48xl 79.94% <ø> (ø)
6.18-m8i.metal-48xl 83.34% <ø> (ø)
6.18-m8i.metal-96xl 83.35% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Manciukic

Copy link
Copy Markdown
Contributor Author

Perf pipeline build to verify the fix to test_population_latency: https://buildkite.com/firecracker/performance-a-b-tests/builds/929/canvas

@Manciukic Manciukic marked this pull request as ready for review June 9, 2026 14:07
@Manciukic Manciukic added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Jun 9, 2026
@Manciukic Manciukic enabled auto-merge (rebase) June 9, 2026 14:07
marco-marangoni
marco-marangoni previously approved these changes Jun 9, 2026
@ilstam

ilstam commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Get /machine-config after uffd-based snapshot restore is slow while pages are being demand-faulted

Discussed offline, it's strange that Get /machine-config causes page faults, but perhaps we can at least improve the commit message.

@Manciukic Manciukic force-pushed the fix-population-latency-api-timing branch from 2f4a35d to 7883ea8 Compare June 9, 2026 14:48
Revert the kernel-version-based exclusion for API response time
validation (added in 03eb60c) and instead disable the check only
where it's actually expected to be slow: in test_population_latency's
restored VMs.

The population latency test restores a snapshot with the fault_all uffd
handler and immediately issues API calls while all guest memory is being
faulted in. On host kernels 6.1 and 6.18, Get /machine-config takes
~800-900ms in this window — the exact reason is unclear (the handler
reads Firecracker's own struct, not guest memory), but it consistently
exceeds MAX_API_CALL_DURATION_MS.

Rather than blanket-disabling the timing check per kernel version, scope
the exclusion to the one test where it's known to be unreliable.

Signed-off-by: Riccardo Mancini <mancio@amazon.com>
@Manciukic Manciukic merged commit 9700741 into firecracker-microvm:main Jun 10, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status: Awaiting review Indicates that a pull request is ready to be reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants