
Commit d614bc0

[host/{lib,mem/{layout,mgr,ptr}},docs/paging] changed BASE_ADDRESS to be 0x0.
We had an arbitrary unmapped section of memory from 0x0 to 0x200_000 (2MB). This was reported to be a point of friction when embedding custom guests.

Changes:
- removed custom write for page 0.
- fixed get_address macro to work properly when guest_offset == 0.
- removed test that no longer makes sense w/ starting mem at 0x0.
- cleaned docs/comments referring to mapped memory starting at 0x200_000.
- fixed some typos.

Signed-off-by: danbugs <[email protected]>
1 parent 8eafa0d commit d614bc0


5 files changed

+163
-105
lines changed


Diff for: docs/paging-development-notes.md

+103-34
@@ -1,80 +1,149 @@
11
# Paging in Hyperlight
22

3-
Hyperlight uses paging, which means the all addresses inside a Hyperlight VM are treated as virtual addresses by the processor. Specifically, Hyperlight uses (ordinary) 4-level paging. 4-level paging is used because we set the following control registers on logical cores inside a VM: `CR0.PG = 1, CR4.PAE = 1, IA32_EFER.LME = 1, and CR4.LA57 = 0`. A Hyperlight VM is limited to 1GB of addressable memory, see below for more details. These control register settings have the following effects:
3+
Hyperlight uses paging, which means that all addresses inside a Hyperlight VM are
4+
treated as virtual addresses by the processor. Specifically, Hyperlight uses
5+
(ordinary) 4-level paging. 4-level paging is used because we set the following
6+
control registers on logical cores inside a VM: `CR0.PG = 1, CR4.PAE = 1,
7+
IA32_EFER.LME = 1, and CR4.LA57 = 0`. A Hyperlight VM is limited to 1GB of addressable memory,
8+
see below for more details. These control register settings have the following
9+
effects:
410

511
- `CR0.PG = 1`: Enables paging
6-
- `CR4.PAE = 1`: Enables Physical Address Extension (PAE) mode (this is required for 4-level paging)
12+
- `CR4.PAE = 1`: Enables Physical Address Extension (PAE) mode (this is required for
13+
4-level paging)
714
- `IA32_EFER.LME = 1`: Enables Long Mode (64-bit mode)
815
- `CR4.LA57 = 0`: Makes sure 5-level paging is disabled
916

1017
## Host-to-Guest memory mapping
1118

12-
Into each Hyperlight VM, memory from the host is mapped into the VM as physical memory. The physical memory inside the VM starts at address `0x200_000` and extends linearly to however much memory was mapped into the VM (depends on various parameters).
19+
Into each Hyperlight VM, memory from the host is mapped into the VM as physical
20+
memory. The physical memory inside the VM starts at address `0x0` and extends
21+
linearly to however much memory was mapped into the VM (depends on various
22+
parameters).
1323

1424
## Page table setup
1525

16-
The following page table structs are set up in memory before running a Hyperlight VM (See [Access Flags](#access-flags) for details on access flags that are also set on each entry)
26+
The following page table structs are set up in memory before running a Hyperlight VM
27+
(See [Access Flags](#access-flags) for details on access flags that are also set on each entry)
1728

1829
### PML4 (Page Map Level 4) Table
1930

20-
The PML4 table is located at physical address specified in CR3. In Hyperlight we set `CR3=0x200_000`, which means the PML4 table is located at physical address `0x200_000`. The PML4 table comprises 512 64-bit entries.
31+
The PML4 table is located at physical address specified in CR3. In Hyperlight we set
32+
`CR3=0x0`, which means the PML4 table is located at physical address `0x0`. The PML4
33+
table comprises 512 64-bit entries.
2134

22-
In Hyperlight, we only initialize the first entry (at address `0x200_000`), with value `0x201_000`, implying that we only have a single PDPT.
35+
In Hyperlight, we only initialize the first entry (at address `0x0`), with value
36+
`0x1_000`, implying that we only have a single PDPT.
2337

2438
### PDPT (Page-directory-pointer Table)
2539

26-
The first and only PDPT is located at physical address `0x201_000`. The PDPT comprises 512 64-bit entries. In Hyperlight, we only initialize the first entry of the PDPT (at address `0x201_000`), with the value `0x202_000`, implying that we only have a single PD.
40+
The first and only PDPT is located at physical address `0x1_000`. The PDPT comprises
41+
512 64-bit entries. In Hyperlight, we only initialize the first entry of the PDPT
42+
(at address `0x1_000`), with the value `0x2_000`, implying that we only have a
43+
single PD.
2744

2845
### PD (Page Directory)
2946

30-
The first and only PD is located at physical address `0x202_000`. The PD comprises 512 64-bit entries, each entry `i` is set to the value `(i * 0x1000) + 0x203_000`. Thus, the first entry is `0x203_000`, the second entry is `0x204_000` and so on.
47+
The first and only PD is located at physical address `0x2_000`. The PD comprises 512
48+
64-bit entries, each entry `i` is set to the value `(i * 0x1000) + 0x3_000`. Thus,
49+
the first entry is `0x3_000`, the second entry is `0x4_000` and so on.
3150

3251
### PT (Page Table)
3352

34-
The page tables start at physical address `0x203_000`. Each page table has 512 64-bit entries. Each entry is set to the value `p << 21|i << 12` where `p` is the page table number and `i` is the index of the entry in the page table. Thus, the first entry of the first page table is `0x000_000`, the second entry is `0x000_000 + 0x1000`, and so on. The first entry of the second page table is `0x200_000 + 0x1000`, the second entry is `0x200_000 + 0x2000`, and so on. Enough page tables are created to cover the size of memory mapped into the VM.
53+
The page tables start at physical address `0x3_000`. Each page table has 512 64-bit
54+
entries. Each entry is set to the value `p << 21|i << 12` where `p` is the page
55+
table number and `i` is the index of the entry in the page table. Thus, the first
56+
entry of the first page table is `0x000_000`, the second entry is `0x000_000 +
57+
0x1000`, and so on. The first entry of the second page table is `0x200_000`,
58+
the second entry is `0x200_000 + 0x1000`, and so on. Enough page tables are
59+
created to cover the size of memory mapped into the VM.
3560

3661
## Address Translation
3762

38-
Given a 64-bit virtual address X, the corresponding physical address is obtained as follows:
63+
Given a 64-bit virtual address X, the corresponding physical address is obtained as
64+
follows:
3965

40-
1. PML4 table's physical address is located using CR3 (CR3 is `0x200_000`).
66+
1. PML4 table's physical address is located using CR3 (CR3 is `0x0`).
4167
2. Bits 47:39 of X are used to index into PML4, giving us the address of the PDPT.
4268
3. Bits 38:30 of X are used to index into PDPT, giving us the address of the PD.
4369
4. Bits 29:21 of X are used to index into PD, giving us the address of the PT.
4470
5. Bits 20:12 of X are used to index into PT, giving us a base address of a 4K page.
4571
6. Bits 11:0 of X are treated as an offset.
4672
7. The final physical address is the base address + the offset.
4773

48-
However, because we have only one PDPT4E and only one PDPT4E, bits 47:30 must always be zero. Each PDE points to a PT, and because each PTE with index `p,i` (where p is the page table number of i is the entry within that page) has value `p << 21|i << 12`, the base address received in step 5 above is always just bits 29:12 of X itself. **As bits 11:0 are an offset this means that translating a virtual address to a physical address is essentially a NO-OP**.
74+
However, because we have only one PML4E and only one PDPTE, bits 47:30 must always
75+
be zero. Each PDE points to a PT, and because each PTE with index `p,i` (where p is
76+
the page table number and i is the entry within that table) has value `p << 21|i <<
77+
12`, the base address received in step 5 above is always just bits 29:12 of X
78+
itself. **As bits 11:0 are an offset this means that translating a virtual address
79+
to a physical address is essentially a NO-OP**.
4980

50-
A diagram to describe how a linear (virtual) address is translated to physical address inside a Hyperlight VM:
81+
A diagram to describe how a linear (virtual) address is translated to physical
82+
address inside a Hyperlight VM:
5183

5284
![A diagram to describe how a linear (virtual) address is translated to physical](assets/linear-address-translation.png)
5385

54-
Diagram is taken from "The Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A: System Programming Guide"
86+
Diagram is taken from "The Intel® 64 and IA-32 Architectures Software Developer’s
87+
Manual, Volume 3A: System Programming Guide"
5588

5689
### Limitations
5790

58-
Since we only have 1 PML4E and only 1 PDPTE, bits 47:30 of a linear address must be zero. Thus, we have only 30 bits (bit 29:0) to work with, giving us access to (1 << 30) bytes of memory (1GB).
91+
Since we only have 1 PML4E and only 1 PDPTE, bits 47:30 of a linear address must be
92+
zero. Thus, we have only 30 bits (bit 29:0) to work with, giving us access to (1 <<
93+
30) bytes of memory (1GB).
5994

6095
## Access Flags
6196

62-
In addition to providing addresses, page table entries also contain access flags that describe how memory can be accessed, and whether it is present or not. The following access flags are set on each entry:
63-
64-
PML4E, PDPTE, and PD Entries have the present flag set to 1, and the rest of the flags are not set.
65-
66-
PTE Entries all have the present flag set to 1, apart from those for the address range `0x000_000` to `0x1FF_000` which have the present flag set to 0 as we do not map memory below physical address `0x200_000`.
67-
68-
In addition, the following flags are set according to the type of memory being mapped:
69-
70-
For `Host Function Definitions` and `Host Exception Data` the NX flag is set to 1 meaning that the memory is not executable in the guest and is not accessible to guest code (ring 3) and is also read only even in ring 0.
71-
72-
For `Input/Output Data`, `Page Table Data`, `PEB`, `PanicContext` and `GuestErrorData` the NX flag is set to 1 meaning that the memory is not executable in the guest and the RW flag is set to 1 meaning that the memory is read/write in ring 0, this means that this data is not accessible to guest code unless accessed via the Hyperlight Guest API (which will be in ring 0).
73-
74-
For `Code` the NX flag is not set meaning that the memory is executable in the guest and the RW flag is set to 1 meaning the data is read/write, as the user/supervisor flag is set then the memory is also read/write accessible to user code. (The code section contains both code and data, so it is marked as read/write. In a future update we will parse the layout of the code and set the access flags accordingly).
75-
76-
For `Stack` the NX flag is set to 1 meaning that the memory is not executable in the guest, the RW flag is set to 1 meaning the data is read/write, as the user/supervisor flag is set then the memory is also read/write accessible to user code.
77-
78-
For `Heap` the RW flag is set to 1 meaning the data is read/write, as the user/supervisor flag is set then the memory is also read/write accessible to user code. The NX flag is not set if the feature `executable_heap` is enabled, otherwise the NX flag is set to 1 meaning that the memory is not executable in the guest. The `executable_heap` feature is disabled by default. It is required to allow data in the heap to be executable to when guests dynamically load or generate code, e.g. `hyperlight-wasm` supports loading of AOT compiled WebAssembly modules, these are loaded dynamically by the Wasm runtime and end up in the heap, therefore for this scenario the `executable_heap` feature must be enabled. In a future update we will implement a mechanism to allow the guest to request memory to be executable at runtime via the Hyperlight Guest API.
79-
80-
For `Guard Pages` the NX flag is set to 1 meaning that the memory is not executable in the guest. The RW flag is set to 1 meaning the data is read/write, as the user/supervisor flag is set then the memory is also read/write accessible to user code. **Note that neither of these flags should really be set as the purpose of the guard pages is to cause a fault if accessed, however, as we deal with this fault in the host not in the guest we need to make the memory accessible to the guest, in a future update we will implement exception and interrupt handling in the guest and then change these flags.**
97+
In addition to providing addresses, page table entries also contain access flags
98+
that describe how memory can be accessed, and whether it is present or not. The
99+
following access flags are set on each entry:
100+
101+
PML4E, PDPTE, and PD Entries have the present flag set to 1, and the rest of the
102+
flags are not set.
103+
104+
PTE Entries all have the present flag set to 1.
105+
106+
In addition, the following flags are set according to the type of memory being
107+
mapped:
108+
109+
For `Host Function Definitions` and `Host Exception Data` the NX flag is set to 1
110+
meaning that the memory is not executable in the guest and is not accessible to
111+
guest code (ring 3) and is also read only even in ring 0.
112+
113+
For `Input/Output Data`, `Page Table Data`, `PEB`, `PanicContext` and
114+
`GuestErrorData` the NX flag is set to 1 meaning that the memory is not executable
115+
in the guest and the RW flag is set to 1 meaning that the memory is read/write in
116+
ring 0, this means that this data is not accessible to guest code unless accessed
117+
via the Hyperlight Guest API (which will be in ring 0).
118+
119+
For `Code` the NX flag is not set meaning that the memory is executable in the guest
120+
and the RW flag is set to 1 meaning the data is read/write, as the user/supervisor
121+
flag is set then the memory is also read/write accessible to user code. (The code
122+
section contains both code and data, so it is marked as read/write. In a future
123+
update we will parse the layout of the code and set the access flags accordingly).
124+
125+
For `Stack` the NX flag is set to 1 meaning that the memory is not executable in the
126+
guest, the RW flag is set to 1 meaning the data is read/write, as the
127+
user/supervisor flag is set then the memory is also read/write accessible to user
128+
code.
129+
130+
For `Heap` the RW flag is set to 1 meaning the data is read/write, as the
131+
user/supervisor flag is set then the memory is also read/write accessible to user
132+
code. The NX flag is not set if the feature `executable_heap` is enabled, otherwise
133+
the NX flag is set to 1 meaning that the memory is not executable in the guest. The
134+
`executable_heap` feature is disabled by default. It is required to allow data in
135+
the heap to be executable when guests dynamically load or generate code, e.g.
136+
`hyperlight-wasm` supports loading of AOT compiled WebAssembly modules, these are
137+
loaded dynamically by the Wasm runtime and end up in the heap, therefore for this
138+
scenario the `executable_heap` feature must be enabled. In a future update we will
139+
implement a mechanism to allow the guest to request memory to be executable at
140+
runtime via the Hyperlight Guest API.
141+
142+
For `Guard Pages` the NX flag is set to 1 meaning that the memory is not executable
143+
in the guest. The RW flag is set to 1 meaning the data is read/write, as the
144+
user/supervisor flag is set then the memory is also read/write accessible to user
145+
code. **Note that neither of these flags should really be set as the purpose of the
146+
guard pages is to cause a fault if accessed, however, as we deal with this fault in
147+
the host not in the guest we need to make the memory accessible to the guest, in a
148+
future update we will implement exception and interrupt handling in the guest and
149+
then change these flags.**
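
The page table setup and the seven translation steps described in the notes above can be sketched in Rust as follows. This is a minimal illustration, not Hyperlight's implementation: the `GuestMem` type, the `build_tables` and `translate` functions, and every constant name are assumptions made for this example. Guest physical memory is modelled as a flat vector of 64-bit words, only the present flag (bit 0) is set on entries, and only as many PD entries are filled in as there are page tables. The table addresses (`0x0`, `0x1_000`, `0x2_000`, `0x3_000`) and the PTE formula `p << 21 | i << 12` come from the notes.

```rust
/// Bit 0 of every paging-structure entry: the entry is present.
const PRESENT: u64 = 1;
const ENTRIES: u64 = 512;
const PAGE_SIZE: u64 = 0x1000;
// Physical addresses of the tables, as given in the notes (memory starts at 0x0).
const PML4_ADDR: u64 = 0x0;
const PDPT_ADDR: u64 = 0x1000;
const PD_ADDR: u64 = 0x2000;
const PT_BASE: u64 = 0x3000;
/// Mask selecting the physical-address bits (51:12) of an entry.
const ADDR_MASK: u64 = 0x000f_ffff_ffff_f000;

/// Hypothetical model of guest physical memory as 64-bit words.
struct GuestMem {
    words: Vec<u64>,
}

impl GuestMem {
    fn read(&self, paddr: u64) -> u64 {
        self.words[(paddr / 8) as usize]
    }
    fn write(&mut self, paddr: u64, value: u64) {
        self.words[(paddr / 8) as usize] = value;
    }
}

/// Builds one PML4, one PDPT, one PD and `num_pts` page tables.
fn build_tables(num_pts: u64) -> GuestMem {
    let total_bytes = PT_BASE + num_pts * PAGE_SIZE;
    let mut mem = GuestMem { words: vec![0; (total_bytes / 8) as usize] };
    mem.write(PML4_ADDR, PDPT_ADDR | PRESENT); // single PML4E
    mem.write(PDPT_ADDR, PD_ADDR | PRESENT); // single PDPTE
    for p in 0..num_pts {
        // PD entry p points at page table p.
        mem.write(PD_ADDR + p * 8, (PT_BASE + p * PAGE_SIZE) | PRESENT);
        for i in 0..ENTRIES {
            // PTE (p, i) maps the 4K page at p << 21 | i << 12.
            let pte = ((p << 21) | (i << 12)) | PRESENT;
            mem.write(PT_BASE + p * PAGE_SIZE + i * 8, pte);
        }
    }
    mem
}

/// Walks PML4 -> PDPT -> PD -> PT and adds the 12-bit page offset.
/// Returns `None` if any entry on the way is not present.
fn translate(mem: &GuestMem, vaddr: u64) -> Option<u64> {
    let index = |shift: u32| (vaddr >> shift) & 0x1ff;
    let mut table = PML4_ADDR; // step 1: CR3 is 0x0
    for shift in [39, 30, 21, 12] {
        let entry = mem.read(table + index(shift) * 8);
        if (entry & PRESENT) == 0 {
            return None;
        }
        table = entry & ADDR_MASK;
    }
    Some(table + (vaddr & 0xfff))
}
```

With two page tables built, `translate` returns the input address unchanged for any address below 4MB, which is the identity mapping the notes describe. Addresses that need a third page table, or that have any of bits 47:30 set, hit a non-present entry and yield `None`.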

Diff for: src/hyperlight_host/src/lib.rs

+11-11
@@ -16,7 +16,7 @@ limitations under the License.
1616

1717
use std::sync::Once;
1818

19-
/// This crate contains an SDK that is used to execute specially-
19+
/// This crate contains an SDK that is used to execute specially
2020
/// compiled binaries within a very lightweight hypervisor environment.
2121
use log::info;
2222
/// The `built` crate is used to generate a `built.rs` file that contains
@@ -49,19 +49,19 @@ pub mod hypervisor;
4949
/// - `GuestHeap`
5050
/// - `GuestStack`
5151
///
52-
/// the start of the guest memory contains the page tables and is always located at the Virtual Address 0x00200000 when
53-
/// running in a Hypervisor:
52+
/// the start of the guest memory contains the page tables and is always located at the Virtual
53+
/// Address 0x0 when running in a Hypervisor:
5454
///
5555
/// Virtual Address
5656
///
57-
/// 0x200000 PML4
58-
/// 0x201000 PDPT
59-
/// 0x202000 PD
60-
/// 0x203000 The guest PE code (When the code has been loaded using LoadLibrary to debug the guest this will not be
61-
/// present and code length will be zero;
57+
/// 0x0_000 PML4
58+
/// 0x1_000 PDPT
59+
/// 0x2_000 PD
60+
/// 0x3_000 The guest PE code (when the code has been loaded using LoadLibrary to debug the guest
61+
/// this will not be present and code length will be zero)
6262
///
63-
/// The pointer passed to the Entrypoint in the Guest application is the 0x200000 + size of page table + size of code,
64-
/// at this address structs below are laid out in this order
63+
/// The pointer passed to the Entrypoint in the Guest application is the size of page
64+
/// table + size of code, at this address structs below are laid out in this order
6565
#[deny(dead_code, missing_docs, unused_mut)]
6666
pub mod mem;
6767
/// Metric definitions and helpers
@@ -108,7 +108,7 @@ pub use sandbox::UninitializedSandbox;
108108
pub use crate::func::call_ctx::MultiUseGuestCallContext;
109109

110110
/// The universal `Result` type used throughout the Hyperlight codebase.
111-
pub type Result<T> = core::result::Result<T, error::HyperlightError>;
111+
pub type Result<T> = core::result::Result<T, HyperlightError>;
112112

113113
// Logs an error then returns with it , more or less equivalent to the bail! macro in anyhow
114114
// but for HyperlightError instead of anyhow::Error
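
The effect of the change on the layout described in the `lib.rs` doc comment can be sketched as follows. Only `BASE_ADDRESS` is a name taken from the commit title. The offset constants and the `guest_address` and `entrypoint_arg` helpers are hypothetical names for this illustration and are not the `get_address` macro mentioned in the commit message. The offsets `0x0_000`, `0x1_000`, `0x2_000` and `0x3_000` are those listed in the doc comment.

```rust
/// Start of mapped guest memory; this commit changes it from 0x200_000 to 0x0.
const BASE_ADDRESS: u64 = 0x0;

// Offsets from the start of guest memory, as listed in the doc comment.
const PML4_OFFSET: u64 = 0x0_000;
const PDPT_OFFSET: u64 = 0x1_000;
const PD_OFFSET: u64 = 0x2_000;
const AFTER_PD_OFFSET: u64 = 0x3_000;

/// Hypothetical helper: guest address of a given offset into guest memory.
fn guest_address(offset: u64) -> u64 {
    BASE_ADDRESS + offset
}

/// Hypothetical helper: the pointer handed to the guest entrypoint, i.e.
/// base + size of page tables + size of code, per the doc comment.
fn entrypoint_arg(page_table_size: u64, code_size: u64) -> u64 {
    BASE_ADDRESS + page_table_size + code_size
}
```

With the base at zero, an offset and its guest address are the same number, so offset zero is now a valid address rather than a sentinel. That is one plausible reason code that treated a zero offset specially would have needed adjusting.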
