feat: enable THP for guest memory#6003
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6003 +/- ##
=======================================
Coverage 83.11% 83.11%
=======================================
Files 277 277
Lines 30207 30234 +27
=======================================
+ Hits 25106 25129 +23
- Misses 5101 5105 +4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
| let divisor = match self { | ||
| // Any integer memory size expressed in MiB will be a multiple of 4096KiB. | ||
| HugePageConfig::None => 1, | ||
| // Note: THP technically supports memory not 2MB aligned, however that would mean |
There was a problem hiding this comment.
The wording is slightly confusing because here it talks about alignment (if it's misaligned there can also be 4KiB pages at the head too), but then it talks about enforcing size, not alignment.
There was a problem hiding this comment.
Fair. I have memory alignment in a separate branch, but I decided to keep it in another PR since this is already changing a lot.
I guess this makes sense only after memory is aligned, so maybe I can set it at 1MB for now, and change this together with the guest mem alignment change.
Alternatively, I can update the comment, since there are benefit on 2MB multiple even without explicit alignment.
What do you think?
| // Any integer memory size expressed in MiB will be a multiple of 4096KiB. | ||
| HugePageConfig::None => 1, | ||
| // Note: THP technically supports memory not 2MB aligned, however that would mean | ||
| // some pages at the tail would be forced to be 4k size. To avoid performance/fragmentation surprises, |
There was a problem hiding this comment.
Is everybody "aligned" (no pun intended) with this? I'm not necessarily against this, but equally I don't think it's a big problem having a few 4K pages at the end of the memory region (especially since THP is not a guarantee anyway). And we already know that internal customers often use non 2MiB multiples.
There was a problem hiding this comment.
And we already know that internal customers often use non 2MiB multiples
This IMHO is the main reason to make it an hard failure. A property of good software is that it minimizes surprises.
The kernel already does some THP optimizations if the memory is a multiple of 2MB (e.g. it aligns it automatically), so I want to avoid surface for potentially hard to troubleshoot performance issues.
| return int(stdout.strip()) | ||
|
|
||
|
|
||
| @pytest.mark.parametrize( |
There was a problem hiding this comment.
Shall we check that the host has THP enabled?
There was a problem hiding this comment.
If it's not, the test will crash (like it should because it means we can't really evaluate THP)
| anon_huge_kb = get_anon_huge_pages_kb(vm.firecracker_pid) | ||
|
|
||
| if huge_pages == HugePagesConfig.TRANSPARENT: | ||
| # With THP enabled, the kernel should have promoted some pages (let's say 100 MB out of 128MB) |
There was a problem hiding this comment.
Can this be fragile? Also, the check below is for 64MiB, not 100MiB (nit: let's use MiB rather than MB).
There was a problem hiding this comment.
It is, but at the same time I think we have to have a test like this to kind of prove THP is doing "something".
|
Have we concluded the performance analysis on this? |
There are some data on https://buildkite.com/firecracker/mamarang-nested-a-b/builds/110/list (red = performance changed = good), however some tests also failed because of 5.10 false positives (see #6007). I'm running the tests again on https://buildkite.com/firecracker/mamarang-nested-a-b/builds/114/list The gains on nested, with 6.1 Baremetal also shows significant bootime improvements with 6.18 Edit: guest kernel version had no impact sorry. Boot improved only on 6.18, because previous versions don't align memory automatically, but that's for another PR |
This commit adds THP for the guest memory, with a new value for the huge_pages option. Signed-off-by: Marco Marangoni <mamarang@amazon.com>
Changes
This commit adds THP for the guest memory, with a new value for the huge_pages option.
Benchmarks show an improvement of boot time between 5% and 40% for metal host, and 20% to 75% on virtual host (nested virt). See https://buildkite.com/firecracker/mamarang-nested-a-b/builds/110/list?sid=019f04e3-fe78-417d-a83c-dcb857c7f402&tab=output for detailed results.
Note: this is only the first step to make full use of THP. Additional benefit can be gained by aligning memory to 2MB, and by working on the UFFD protocol.
Design choices:
madvisewill just emit a warningmadviseis called even for memfd, even though it is ignored (/sys/kernel/mm/transparent_hugepage/shmem_enabledisneverin most distros, see https://www.kernel.org/doc/html/latest/admin-guide/mm/transhuge.html#shmem-internal-tmpfs)Reason
Reduce page fault frequency, and reduce TLB usage.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.PR Checklist
tools/devtool checkbuild --allto verify that the PR passesbuild checks on all supported architectures.
tools/devtool checkstyleto verify that the PR passes theautomated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md.Runbook for Firecracker API changes.
integration tests.
TODO.rust-vmm.