Use the 4 remaining MPU regions (4–7) to protect user data pages, heap, and mmap areas on a per-process basis. Currently only the stack page (Region 2) is switched per process; data/heap/mmap regions have no MPU coverage.
The current MPU layout uses 4 of 8 available regions:
| Region | Purpose | Scope |
|---|---|---|
| 0 | Kernel data | Static (priv-only) |
| 1 | Flash XIP | Static (RO all) |
| 2 | Process stack | Per-process (RW all, XN) |
| 3 | Peripherals | Static (priv-only) |
User data pages (user_pages[]), heap (brk), and mmap regions are not
covered. On RP2350 with TrustZone, the NS MPU grants all of NS SRAM as RW to
user mode — so process A can read/write process B's data if it knows the
address. There is no per-process data isolation.
Assign MPU regions 4–7 as per-process data regions, reprogrammed on every context switch alongside Region 2 (stack). Each region covers one contiguous SRAM range with User RW + XN access.
| Region | Purpose |
|---|---|
| 4 | Data region 0 (primary: data segment + heap) |
| 5 | Data region 1 (overflow / mmap) |
| 6 | Data region 2 (mmap) |
| 7 | Data region 3 (mmap) |
Unused data regions are disabled (RLAR.EN = 0 on ARMv8-M, RASR.ENABLE = 0 on ARMv6-M).
Add a per-process MPU region descriptor array to `pcb_t`:

```c
#define MPU_DATA_REGIONS 4

typedef struct {
    uint32_t base;   /* region base address (32-byte aligned) */
    uint32_t size;   /* region size in bytes (0 = disabled) */
} mpu_data_region_t;

/* In pcb_t: */
mpu_data_region_t mpu_data[MPU_DATA_REGIONS];
```

The key insight is that fewer, larger contiguous regions are better than many small ones, because we only have 4 data regions. The allocator should try to keep all of a process's memory in as few contiguous blocks as possible.
ELF loader (`elf_load`):
- `page_alloc_contiguous(data_pages)` already allocates the data segment as one contiguous block → occupies 1 MPU data region (region 4).
- Stack page is separate (Region 2, unchanged).
- Set `mpu_data[0] = {base, data_pages * PAGE_SIZE}`.
`sys_brk` (heap growth):
- `page_alloc_at()` already extends contiguously after `user_pages[0]`. On success, just update `mpu_data[0].size` to cover the expanded range (data segment + heap together in one region).
- If `page_alloc_at()` fails (gap in physical pages), the brk expansion fails as it does today — no change needed.
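The brk path reduces to a single size update once `page_alloc_at()` has succeeded. A minimal sketch of that update, using a hypothetical `mpu_brk_grow()` helper and a standalone copy of the proposed descriptor type (not existing kernel code):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Standalone copy of the descriptor proposed for pcb_t. */
typedef struct {
    uint32_t base;   /* region base address */
    uint32_t size;   /* region size in bytes (0 = disabled) */
} mpu_data_region_t;

/* Hypothetical helper: after page_alloc_at() has extended the data
 * block by new_pages contiguous pages, grow region 0 so that the data
 * segment and heap stay covered by one MPU data region. */
static void mpu_brk_grow(mpu_data_region_t *r0, uint32_t new_pages)
{
    r0->size += new_pages * PAGE_SIZE;
}
```

Because the heap is guaranteed contiguous with the data segment, no new slot is ever consumed by brk growth.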
`sys_mmap2` (anonymous mappings):
- After allocating pages, find or create an `mpu_data[]` slot:
  - Check if the new allocation is contiguous with an existing data region — if so, extend that region's size (merge).
  - Otherwise, find a free `mpu_data[]` slot (size == 0) and assign it.
  - If no free slot is available, fail the mmap with `-ENOMEM`. The caller (typically musl) will handle the failure gracefully.
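The merge-or-assign step above can be sketched as follows. This is an illustrative fragment, not the kernel's actual code: `mpu_mmap_assign()` is a hypothetical helper, and only the simple "new block directly follows an existing region" merge is shown:

```c
#include <assert.h>
#include <stdint.h>

#define MPU_DATA_REGIONS 4

/* Standalone copy of the descriptor proposed for pcb_t (size 0 = free slot). */
typedef struct {
    uint32_t base;
    uint32_t size;
} mpu_data_region_t;

/* Find-or-merge for sys_mmap2: returns the slot index used,
 * or -1 when no slot is free (caller then returns -ENOMEM). */
static int mpu_mmap_assign(mpu_data_region_t d[], uint32_t base, uint32_t size)
{
    /* 1. Merge: the new allocation directly follows an in-use region. */
    for (int i = 0; i < MPU_DATA_REGIONS; i++) {
        if (d[i].size != 0 && d[i].base + d[i].size == base) {
            d[i].size += size;
            return i;
        }
    }
    /* 2. Otherwise take the first free slot. */
    for (int i = 0; i < MPU_DATA_REGIONS; i++) {
        if (d[i].size == 0) {
            d[i].base = base;
            d[i].size = size;
            return i;
        }
    }
    return -1;   /* no slot available: fail the mmap */
}
```

Checking the merge case first is what keeps region usage low: sequential mmaps from a pool allocator tend to be adjacent and collapse into one region.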
`sys_munmap`:
- If the unmapped range is at the end of a data region, shrink the region.
- If it splits a region in the middle, we cannot represent the hole with one region — either leave the region covering the full range (wastes protection granularity but is safe), or split into two regions if a free slot exists.
- If the entire region is freed, disable it (size = 0).
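The three munmap cases can be sketched in one adjustment routine. Again a hypothetical helper (`mpu_munmap_adjust()`) over a standalone copy of the proposed descriptor, assuming the caller has already identified which region `i` contains the unmapped range:

```c
#include <assert.h>
#include <stdint.h>

#define MPU_DATA_REGIONS 4

typedef struct {
    uint32_t base;
    uint32_t size;   /* 0 = disabled / free slot */
} mpu_data_region_t;

/* Adjust region i for an unmap of [base, base+size). Returns 0 on
 * success; -1 when a mid-region hole cannot be represented (no free
 * slot), in which case the region is left covering the full range —
 * coarser protection, but safe. */
static int mpu_munmap_adjust(mpu_data_region_t d[], int i,
                             uint32_t base, uint32_t size)
{
    uint32_t end = d[i].base + d[i].size;

    if (base == d[i].base && size == d[i].size) {
        d[i].size = 0;              /* whole region freed: disable it */
        return 0;
    }
    if (base + size == end) {
        d[i].size -= size;          /* range at the end: shrink */
        return 0;
    }
    /* Hole in the middle: split into two regions if a slot is free. */
    for (int j = 0; j < MPU_DATA_REGIONS; j++) {
        if (d[j].size == 0) {
            d[j].base = base + size;            /* upper half */
            d[j].size = end - (base + size);
            d[i].size = base - d[i].base;       /* lower half */
            return 0;
        }
    }
    return -1;   /* leave d[i] unchanged: full-range coverage */
}
```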
`vfork`:
- Child shares parent's `mpu_data[]` (copied by value, same as `user_pages[]`).
- On exec, child gets its own `mpu_data[]` from the new ELF load.
`mpu_switch()` currently programs Region 2 only. Extend it to also program regions 4–7 from `next->mpu_data[0..3]`:
```c
void mpu_switch(pcb_t *next) {
    if (!mpu_present) return;

    /* Region 2: stack (unchanged) */
    ...

    /* Regions 4–7: per-process data */
    for (int i = 0; i < MPU_DATA_REGIONS; i++) {
        uint32_t reg = 4u + i;
        if (next->mpu_data[i].size == 0) {
            /* Disable region */
            MPU_RNR = reg;
            MPU_RLAR = 0;   /* ARMv8-M; RASR = 0 for ARMv6-M */
        } else {
            uint32_t base = next->mpu_data[i].base;
            uint32_t size = next->mpu_data[i].size;
            /* Program as User RW, XN, WB (same attributes as stack) */
            mpu_set_region(reg,
                base | RBAR_SH(SH_NONE) | RBAR_AP(AP8_RW_ALL) | RBAR_XN,
                RLAR_LIMIT(base, size) | RLAR_ATTR(MAIR_IDX_WB) | RLAR_EN);
        }
    }
    arch_dsb_isb();
}
```

The current NS MPU Region 1 grants all of NS SRAM (512 KB) as RW to user mode. With per-process data regions in the Secure MPU, the question is whether the NS MPU should also be tightened:
- Option A (simple): Keep the NS MPU as-is. The Secure MPU regions 4–7 already restrict which SRAM the current process can access. NS MPU is a second layer — broader is fine since the Secure MPU is the tighter gate. However, Secure code accessing NS aliases in syscalls still goes through the NS MPU, so it must remain permissive for kernel access.
- Option B (strict): Reprogram NS MPU data regions per-process too, using the NS alias addresses. This doubles the region programming cost on each switch. Only needed if we want defense-in-depth against speculative access or Secure-side bugs.
Recommendation: Option A — the Secure MPU is the enforcement layer. The NS MPU remains a broad "NS SRAM is accessible" grant, and per-process isolation comes from Secure MPU regions 4–7.
ARMv6-M MPU requires power-of-2 region sizes (minimum 256 bytes). A data segment of e.g. 12 KB cannot be covered by a single region — it must be rounded up to 16 KB, potentially exposing adjacent memory. This is acceptable:
- The exposed area is within the process's own page pool allocation range.
- Other processes' pages are not within the rounded-up region because `page_alloc_contiguous()` allocates from the pool sequentially.
Sub-region disable (SRD) bits can be used to mask out the rounded-up tail, giving 1/8th granularity within a power-of-2 region. For a 16 KB region, each sub-region is 2 KB — good enough for most cases.
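The rounding and SRD computation can be sketched as below. `mpu_v6_region_size()` and `mpu_v6_srd_mask()` are hypothetical helper names for illustration; the SRD encoding itself (8 equal sub-regions, bit n set = sub-region n disabled, valid for regions of 256 bytes and up) follows the ARMv6-M MPU:

```c
#include <assert.h>
#include <stdint.h>

/* Round len up to the next power of two (256-byte ARMv6-M minimum). */
static uint32_t mpu_v6_region_size(uint32_t len)
{
    uint32_t size = 256u;
    while (size < len)
        size <<= 1;
    return size;
}

/* SRD mask: disable every 1/8th sub-region that starts at or past len,
 * masking off the rounded-up tail of the region. */
static uint8_t mpu_v6_srd_mask(uint32_t len, uint32_t region_size)
{
    uint32_t sub = region_size / 8u;
    uint8_t srd = 0;
    for (int n = 0; n < 8; n++)
        if ((uint32_t)n * sub >= len)
            srd |= (uint8_t)(1u << n);
    return srd;
}
```

For the 12 KB example: the region rounds up to 16 KB, sub-regions are 2 KB each, and the top two sub-regions (12–16 KB) are disabled, so nothing beyond the data segment is exposed.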
The critical design choice: if a memory allocation cannot be represented within the 4 available data regions, the allocation fails.
| Operation | Failure condition | Result |
|---|---|---|
| `exec` (ELF load) | Data segment > 1 contiguous block | Cannot happen (uses `page_alloc_contiguous`) |
| `sys_brk` | `page_alloc_at()` fails to extend contiguously | Returns unchanged break (existing behaviour) |
| `sys_mmap2` | No free `mpu_data[]` slot and cannot merge | Returns `-ENOMEM` |
| `sys_mmap2` | Contiguous page allocation fails | Returns `-ENOMEM` (existing behaviour) |
This is reasonable because:
- Most processes use 1 data region (data+heap contiguous) + 0–2 mmap regions.
- musl's malloc uses brk for small allocations and mmap for large ones; with 4 data regions, this works well.
- If a process truly needs >4 discontiguous data areas, it has outgrown what a 4 KB-page, 8-region MPU microcontroller can offer.
- Add `mpu_data[4]` to `pcb_t`.
- Extend `mpu_switch()` to program regions 4–7.
- No functional change yet (all `mpu_data[].size == 0` → regions disabled).

- After `page_alloc_contiguous()`, set `mpu_data[0]` to cover the data segment.
- Test: process can access its data; accessing another process's data faults.

- On heap growth, update `mpu_data[0].size` to cover data + heap.
- Test: malloc/free in user programs works; heap accessible.

- Allocate/free `mpu_data[]` slots on mmap/munmap.
- Fail mmap if no slot is available.
- Test: mmap-heavy programs (musl large alloc) work within limits.

- Verify Option A is sufficient with tests.
- Optionally tighten NS MPU if a security audit requires it.
| Aspect | Current | After |
|---|---|---|
| `mpu_switch()` register writes | 2–3 (Region 2 only) | 10–15 (Region 2 + 4–7) |
| Context switch overhead | ~20 cycles for MPU | ~60–80 cycles for MPU |
| PCB size increase | 0 | +32 bytes (4 × {base, size}) |
| Code size increase | 0 | ~100–200 bytes |
The additional ~40–60 cycles per context switch is negligible compared to the full PendSV handler (~200+ cycles) and the 10 ms time slice (1.33M cycles at 133 MHz).