
Commit 004d935

feat: enable THP for guest memory
This commit adds THP support for guest memory, selected through a new `Transparent` value for the `huge_pages` option.

Signed-off-by: Marco Marangoni <mamarang@amazon.com>
1 parent e04e55f commit 004d935

17 files changed

Lines changed: 257 additions & 54 deletions

File tree

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -10,6 +10,12 @@ and this project adheres to
1010

1111
### Added
1212

13+
- [#6003](https://github.com/firecracker-microvm/firecracker/pull/6003): Added a
14+
new option `Transparent` for the `huge_pages` setting. If set, Firecracker
15+
will use transparent huge pages for the guest memory via
16+
`madvise(MADV_HUGEPAGE)`. Guest memory must be a multiple of 2MB when using
17+
this option.
18+
1319
### Changed
1420

1521
### Deprecated

docs/hugepages.md

Lines changed: 45 additions & 20 deletions
@@ -1,14 +1,48 @@
11
# Backing Guest Memory by Huge Pages
22

3-
Firecracker supports backing the guest memory of a VM by 2MB hugetlbfs pages.
4-
This can be enabled by setting the `huge_pages` field of `PUT` or `PATCH`
5-
requests to the `/machine-config` endpoint to `2M`.
6-
7-
Backing guest memory by huge pages can bring performance improvements for
8-
specific workloads, due to less TLB contention and less overhead during
9-
virtual->physical address resolution. It can also help reduce the number of
10-
KVM_EXITS required to rebuild extended page tables post snapshot restore, as
11-
well as improve boot times (by up to 50% as measured by Firecracker's
3+
Firecracker supports three modes for the `huge_pages` field of `PUT` or `PATCH`
4+
requests to the `/machine-config` endpoint:
5+
6+
- `None` (default): Uses regular 4K pages with no huge page behavior.
7+
- `Transparent`: Uses `madvise(MADV_HUGEPAGE)` to request transparent huge pages
8+
for guest memory. Guest memory size must be a multiple of 2MB.
9+
- `2M`: Backs guest memory by 2MB hugetlbfs pages.
10+
11+
## Transparent Huge Pages (THP)
12+
13+
Setting `huge_pages` to `Transparent` enables transparent huge pages for guest
14+
memory via `madvise(MADV_HUGEPAGE)`. This allows the kernel to opportunistically
15+
back guest memory with huge pages without requiring a pre-allocated hugetlbfs
16+
pool.
17+
18+
Note that while traditional THP uses PMD-sized pages (2MB on x86_64), the actual
19+
THP size depends on the CPU architecture. Modern kernels also support
20+
"multi-size THP" (mTHP), which can allocate pages in various power-of-2 sizes
21+
between the base page size and PMD size (e.g. 16K, 32K, 64K). Firecracker
22+
requires guest memory size to be a multiple of 2MB regardless of the THP size
23+
used by the host kernel.
24+
25+
Limitations:
26+
27+
- When vhost-user-blk devices are in use, guest memory is memfd-backed (shared
28+
memory). THP for shared/shmem memory is controlled separately from anonymous
29+
memory via `/sys/kernel/mm/transparent_hugepage/shmem_enabled` and may not be
30+
enabled by default. Refer to the
31+
[Linux Documentation on shmem THP][thp_shmem_docs] for details on how to
32+
configure it.
33+
- THP does not integrate with UFFD; no transparent huge pages will be allocated
34+
during userfault-handling while resuming from a snapshot.
35+
36+
Please refer to the [Linux Documentation][thp_docs] for more information.
37+
38+
## Hugetlbfs (2M)
39+
40+
Setting `huge_pages` to `2M` backs guest memory by 2MB hugetlbfs pages. This can
41+
bring performance improvements for specific workloads, due to less TLB
42+
contention and less overhead during virtual->physical address resolution. It can
43+
also help reduce the number of KVM_EXITS required to rebuild extended page
44+
tables post snapshot restore, as well as improve boot times (by up to 50% as
45+
measured by Firecracker's
1246
[boot time performance tests](../tests/integration_tests/performance/test_boottime.py)).
1347

1448
Using hugetlbfs requires the host running Firecracker to have a pre-allocated
@@ -19,7 +53,7 @@ not try to reserve sufficient hugetlbfs pages at the time of the `mmap` call,
1953
trying to claim them from the pool on-demand. For details on how to manage this
2054
pool, please refer to the [Linux Documentation][hugetlbfs_docs].
2155

22-
## Huge Pages and Snapshotting
56+
### Huge Pages and Snapshotting
2357

2458
Restoring a Firecracker snapshot of a microVM backed by huge pages will also use
2559
huge pages to back the restored guest. There is no option to flip between
@@ -43,15 +77,6 @@ the device is unable to reclaim the hugepage backing of the guest and drop RSS.
4377
However, the balloon can still be inflated and used to restrict memory usage in
4478
the guest.
4579

46-
## FAQ
47-
48-
### Why does Firecracker not offer a transparent huge pages (THP) setting?
49-
50-
Firecracker's guest memory can be memfd based. Linux (as of 6.1) does not offer
51-
a way to dynamically enable THP for such memory regions. Additionally, UFFD does
52-
not integrate with THP (no transparent huge pages will be allocated during
53-
userfaulting). Please refer to the [Linux Documentation][thp_docs] for more
54-
information.
55-
5680
[hugetlbfs_docs]: https://docs.kernel.org/admin-guide/mm/hugetlbpage.html
5781
[thp_docs]: https://www.kernel.org/doc/html/next/admin-guide/mm/transhuge.html#hugepages-in-tmpfs-shmem
82+
[thp_shmem_docs]: https://www.kernel.org/doc/html/latest/admin-guide/mm/transhuge.html#shmem-internal-tmpfs
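
As an illustration of the three `huge_pages` values documented in `docs/hugepages.md` above, here is a minimal sketch that maps the API strings (`None`, `Transparent`, `2M`) to a mode and builds a `/machine-config` request body. The type `HugePagesMode` and the helper `machine_config_body` are hypothetical names introduced for this example; they are not Firecracker's own types, and the JSON is assembled by hand rather than with the serde derive the real code uses.

```rust
use std::str::FromStr;

/// Illustrative stand-in for the three `huge_pages` modes.
/// Hypothetical type for this sketch, not Firecracker's `HugePageConfig`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HugePagesMode {
    None,
    Transparent,
    Hugetlbfs2M,
}

impl HugePagesMode {
    /// The string used for this mode in the `/machine-config` request body.
    pub fn as_api_str(self) -> &'static str {
        match self {
            HugePagesMode::None => "None",
            HugePagesMode::Transparent => "Transparent",
            HugePagesMode::Hugetlbfs2M => "2M",
        }
    }
}

impl FromStr for HugePagesMode {
    type Err = String;

    /// Parses one of the three documented API strings; anything else is an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "None" => Ok(HugePagesMode::None),
            "Transparent" => Ok(HugePagesMode::Transparent),
            "2M" => Ok(HugePagesMode::Hugetlbfs2M),
            other => Err(format!("unknown huge_pages value: {}", other)),
        }
    }
}

/// Hypothetical helper: builds a JSON body for `PUT /machine-config`.
pub fn machine_config_body(vcpu_count: u8, mem_size_mib: usize, mode: HugePagesMode) -> String {
    format!(
        r#"{{"vcpu_count": {}, "mem_size_mib": {}, "huge_pages": "{}"}}"#,
        vcpu_count,
        mem_size_mib,
        mode.as_api_str()
    )
}
```

For example, `machine_config_body(2, 1024, HugePagesMode::Transparent)` yields a body that requests THP for a 1024 MiB guest, which satisfies the multiple-of-2MB requirement.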

src/firecracker/src/api_server/request/machine_configuration.rs

Lines changed: 1 addition & 0 deletions
@@ -104,6 +104,7 @@ mod tests {
104104

105105
let huge_pages_cases = [
106106
("None", HugePageConfig::None),
107+
("Transparent", HugePageConfig::Transparent),
107108
("2M", HugePageConfig::Hugetlbfs2M),
108109
];
109110

src/firecracker/swagger/firecracker.yaml

Lines changed: 6 additions & 1 deletion
@@ -1442,8 +1442,13 @@ definitions:
14421442
type: string
14431443
enum:
14441444
- None
1445+
- Transparent
14451446
- 2M
1446-
description: Which huge pages configuration (if any) should be used to back guest memory.
1447+
default: None
1448+
description: >-
1449+
Which huge pages configuration should be used to back guest memory.
1450+
"None" uses regular 4K pages. "Transparent" enables THP via
1451+
madvise(MADV_HUGEPAGE). "2M" uses explicit hugetlbfs 2MB pages.
14471452
14481453
MemoryBackend:
14491454
type: object

src/vmm/src/devices/virtio/vhost_user.rs

Lines changed: 1 addition & 0 deletions
@@ -487,6 +487,7 @@ pub(crate) mod tests {
487487
libc::MAP_PRIVATE,
488488
Some(file),
489489
false,
490+
libc::MADV_HUGEPAGE,
490491
)
491492
.unwrap()
492493
.into_iter()

src/vmm/src/persist.rs

Lines changed: 10 additions & 3 deletions
@@ -449,8 +449,13 @@ pub fn restore_from_snapshot(
449449
.into());
450450
}
451451
(
452-
guest_memory_from_file(mem_backend_path, mem_state, track_dirty_pages)
453-
.map_err(RestoreFromSnapshotGuestMemoryError::File)?,
452+
guest_memory_from_file(
453+
mem_backend_path,
454+
mem_state,
455+
track_dirty_pages,
456+
vm_resources.machine_config.huge_pages,
457+
)
458+
.map_err(RestoreFromSnapshotGuestMemoryError::File)?,
454459
None,
455460
)
456461
}
@@ -512,9 +517,11 @@ fn guest_memory_from_file(
512517
mem_file_path: &Path,
513518
mem_state: &GuestMemoryState,
514519
track_dirty_pages: bool,
520+
huge_pages: HugePageConfig,
515521
) -> Result<Vec<GuestRegionMmap>, GuestMemoryFromFileError> {
516522
let mem_file = File::open(mem_file_path)?;
517-
let guest_mem = memory::snapshot_file(mem_file, mem_state.regions(), track_dirty_pages)?;
523+
let guest_mem =
524+
memory::snapshot_file(mem_file, mem_state.regions(), track_dirty_pages, huge_pages)?;
518525
Ok(guest_mem)
519526
}
520527

src/vmm/src/resources.rs

Lines changed: 21 additions & 0 deletions
@@ -580,6 +580,7 @@ mod tests {
580580
use crate::vmm_config::RateLimiterConfig;
581581
use crate::vmm_config::boot_source::{BootConfig, BootSource, BootSourceConfig};
582582
use crate::vmm_config::drive::{BlockBuilder, BlockDeviceConfig};
583+
use crate::vmm_config::machine_config::HugePageConfig::{Hugetlbfs2M, Transparent};
583584
use crate::vmm_config::machine_config::{HugePageConfig, MachineConfig, MachineConfigError};
584585
use crate::vmm_config::net::{NetBuilder, NetworkInterfaceConfig};
585586
use crate::vmm_config::vsock::tests::default_config;
@@ -1476,6 +1477,26 @@ mod tests {
14761477
Err(MachineConfigError::InvalidMemorySize)
14771478
);
14781479

1480+
// Odd memory size - not supported by THP/Hugetlbfs
1481+
aux_vm_config.mem_size_mib = Some(1025);
1482+
aux_vm_config.huge_pages = Some(Transparent);
1483+
assert_eq!(
1484+
vm_resources.update_machine_config(&aux_vm_config),
1485+
Err(MachineConfigError::InvalidMemorySize)
1486+
);
1487+
aux_vm_config.huge_pages = Some(Hugetlbfs2M);
1488+
assert_eq!(
1489+
vm_resources.update_machine_config(&aux_vm_config),
1490+
Err(MachineConfigError::InvalidMemorySize)
1491+
);
1492+
// Odd size supported by HugePageConfig::None
1493+
aux_vm_config.huge_pages = Some(HugePageConfig::None);
1494+
vm_resources.update_machine_config(&aux_vm_config).unwrap();
1495+
assert_eq!(
1496+
MachineConfigUpdate::from(vm_resources.machine_config.clone()),
1497+
aux_vm_config
1498+
);
1499+
14791500
// Incompatible mem_size_mib with balloon size.
14801501
vm_resources.machine_config.mem_size_mib = 128;
14811502
vm_resources

src/vmm/src/vmm_config/machine_config.rs

Lines changed: 19 additions & 4 deletions
@@ -34,9 +34,11 @@ pub enum MachineConfigError {
3434
/// Describes the possible (huge)page configurations for a microVM's memory.
3535
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq, Serialize, Deserialize)]
3636
pub enum HugePageConfig {
37-
/// Do not use hugepages, e.g. back guest memory by 4K
37+
/// Back guest memory by 4K pages, no hugepage behavior
3838
#[default]
3939
None,
40+
/// Use madvise(MADV_HUGEPAGE) for transparent huge pages
41+
Transparent,
4042
/// Back guest memory by 2MB hugetlbfs pages
4143
#[serde(rename = "2M")]
4244
Hugetlbfs2M,
@@ -49,6 +51,10 @@ impl HugePageConfig {
4951
let divisor = match self {
5052
// Any integer memory size expressed in MiB will be a multiple of 4KiB.
5153
HugePageConfig::None => 1,
54+
// Note: THP technically supports memory sizes that are not a multiple of 2MB; however, that disables some
55+
// optimizations done by the kernel (e.g. automatic memory alignment on Linux 6.4+).
56+
// To avoid performance/fragmentation surprises, requiring a multiple of 2MB is the safer choice.
57+
HugePageConfig::Transparent => 2,
5258
HugePageConfig::Hugetlbfs2M => 2,
5359
};
5460

@@ -59,11 +65,20 @@ impl HugePageConfig {
5965
/// create a mapping backed by huge pages as described by this [`HugePageConfig`].
6066
pub fn mmap_flags(&self) -> libc::c_int {
6167
match self {
62-
HugePageConfig::None => 0,
68+
HugePageConfig::None | HugePageConfig::Transparent => 0,
6369
HugePageConfig::Hugetlbfs2M => libc::MAP_HUGETLB | libc::MAP_HUGE_2MB,
6470
}
6571
}
6672

73+
/// Returns the flags required to pass to [libc::madvise], after allocating anonymous guest memory.
74+
/// Note: returning [libc::MADV_NORMAL] might skip the call to `madvise` entirely.
75+
pub fn madvise_flags(&self) -> libc::c_int {
76+
match self {
77+
HugePageConfig::Transparent => libc::MADV_HUGEPAGE,
78+
HugePageConfig::None | HugePageConfig::Hugetlbfs2M => libc::MADV_NORMAL,
79+
}
80+
}
81+
6782
/// Returns `true` iff this [`HugePageConfig`] describes a hugetlbfs-based configuration.
6883
pub fn is_hugetlbfs(&self) -> bool {
6984
matches!(self, HugePageConfig::Hugetlbfs2M)
@@ -72,7 +87,7 @@ impl HugePageConfig {
7287
/// Gets the page size in bytes of this [`HugePageConfig`].
7388
pub fn page_size(&self) -> usize {
7489
match self {
75-
HugePageConfig::None => 4096,
90+
HugePageConfig::None | HugePageConfig::Transparent => 4096,
7691
HugePageConfig::Hugetlbfs2M => 2 * 1024 * 1024,
7792
}
7893
}
@@ -81,7 +96,7 @@ impl HugePageConfig {
8196
impl From<HugePageConfig> for Option<memfd::HugetlbSize> {
8297
fn from(value: HugePageConfig) -> Self {
8398
match value {
84-
HugePageConfig::None => None,
99+
HugePageConfig::None | HugePageConfig::Transparent => None,
85100
HugePageConfig::Hugetlbfs2M => Some(memfd::HugetlbSize::Huge2MB),
86101
}
87102
}
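
The behaviour this diff gives each mode (size divisor, `madvise` advice, reported page size) can be summarised in a standalone sketch. It is a simplified stand-in under stated assumptions, not the code from `machine_config.rs`: the `Mode` type and its method names are hypothetical, `round_up_to_even_mib` is an extra helper not present in the commit, and the advice constants are written out with their Linux values instead of coming from the `libc` crate.

```rust
/// Linux values of the `madvise` advice constants (assumed here so the
/// sketch does not depend on the `libc` crate).
pub const MADV_NORMAL: i32 = 0;
pub const MADV_HUGEPAGE: i32 = 14;

/// Hypothetical stand-in for the three huge page configurations.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    None,
    Transparent,
    Hugetlbfs2M,
}

impl Mode {
    /// Checks the size rule from the diff: both huge page modes need a
    /// guest memory size (in MiB) that is a multiple of 2.
    pub fn is_valid_mem_size(self, mem_size_mib: usize) -> bool {
        let divisor = match self {
            Mode::None => 1,
            Mode::Transparent | Mode::Hugetlbfs2M => 2,
        };
        mem_size_mib % divisor == 0
    }

    /// Advice to pass to `madvise` after mapping guest memory.
    /// Only the transparent mode asks the kernel for huge pages.
    pub fn madvise_advice(self) -> i32 {
        match self {
            Mode::Transparent => MADV_HUGEPAGE,
            Mode::None | Mode::Hugetlbfs2M => MADV_NORMAL,
        }
    }

    /// Page size in bytes reported for the mode. THP keeps the 4K base
    /// page size because the kernel promotes pages opportunistically.
    pub fn page_size(self) -> usize {
        match self {
            Mode::None | Mode::Transparent => 4096,
            Mode::Hugetlbfs2M => 2 * 1024 * 1024,
        }
    }
}

/// Hypothetical helper (not in the commit): rounds a size in MiB up to the
/// next multiple of 2 so it passes the check for either huge page mode.
pub fn round_up_to_even_mib(mem_size_mib: usize) -> usize {
    mem_size_mib + (mem_size_mib % 2)
}
```

This mirrors the new test in `resources.rs`: a 1025 MiB guest is rejected for `Transparent` and `2M` but accepted for `None`.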
