Note: File paths in this document may be outdated after the source tree reorganization. See Source Tree Structure for the current layout.
Porting PPAP to the Raspberry Pi Zero (BCM2835, ARM1176JZF-S). This is a major architectural leap: the ARM1176 has a full MMU, transforming PPAP from a micro OS into a modern Unix-like OS capable of proper virtual memory, fork() with copy-on-write, demand paging, and process isolation.
Initial target is QEMU (raspi0 / versatilepb), then real Pi Zero hardware.
Produce a bootable PPAP system on the Raspberry Pi Zero that:
- Boots from an SD card FAT32 partition via the GPU bootloader chain.
- Provides a console on PL011 UART (same IP as RP2040), mirrored to HDMI framebuffer when available.
- Runs with full virtual memory: per-process page tables, real fork() with COW, demand paging, and proper mmap().
- Passes the PPAP test suite (runtests) on QEMU raspi0.
- Runs on real Pi Zero hardware.
- HDMI framebuffer console via GPU mailbox.
- USB keyboard input via DWC2 OTG controller.
- EMMC/SD driver (SDHCI) for fast block I/O.
- File page cache (dramatically faster I/O with 512 MB RAM).
- Dynamic linking (full mmap + GOT/PLT).
- Wi-Fi networking on Pi Zero W (via the on-board CYW43438, attached over SDIO).
- Multi-user support with MMU-enforced process isolation.
- VideoCore IV GPU programming (3D, video decode).
- Camera interface (CSI).
- DSI display interface.
- Pi Zero 2 W (BCM2710, Cortex-A53, ARMv8-A — separate port).
- 64-bit mode (ARM1176 is 32-bit only).
| Aspect | Cortex-M0+ (RP2040) | ARM1176JZF-S (Pi Zero) |
|---|---|---|
| Architecture | ARMv6-M | ARMv6Z (full ARM) |
| ISA | Thumb-1 only (16-bit) | ARM + Thumb (no Thumb-2) |
| Word size | 32-bit | 32-bit |
| Clock | 133 MHz | 1 GHz |
| RAM | 264 KB SRAM | 512 MB SDRAM |
| Flash/Storage | 2-16 MB QSPI (XIP) | SD card (no XIP) |
| MMU | None (4-region MPU) | Full MMU (ARMv6 page tables) |
| FPU | None | VFPv2 (single+double precision) |
| Caches | 16 KB XIP cache | 16 KB I-cache, 16 KB D-cache, L2 cache |
| Exception model | Cortex-M NVIC (handler/thread) | ARM mode exceptions (7 modes) |
| Privilege | Handler/Thread + MSP/PSP | 7 processor modes (USR/SVC/IRQ/FIQ/ABT/UND/SYS) |
| Syscall | SVC instruction | SWI instruction (same encoding, different name) |
| Context switch | PendSV (deferred, lowest priority) | Software-triggered via IRQ or SWI return path |
| Interrupt controller | NVIC (nested, prioritized) | BCM2835 custom IC (no nesting by default) |
| Timer | SysTick (24-bit) | ARM Timer + System Timer (64-bit) |
| Cores | Dual Cortex-M0+ | Single ARM1176 + VideoCore IV GPU |
| Boot | ROM → boot2 → stage1 → kernel (XIP) | GPU → bootcode.bin → start.elf → kernel.img @ 0x8000 |
| Endianness | Little-endian | Little-endian (configurable, but LE in practice) |
| Address space | 32-bit flat, single space | 32-bit virtual, per-process page tables |
| PIC convention | r9 = GOT base | Standard ARM PIC (GOT via PC-relative) |
| ELF flags | EF_ARM_EABI_VER5, Thumb | EF_ARM_EABI_VER5, ARM/Thumb interwork |
This is the defining feature of the Pi Zero port. The MMU enables capabilities that were impossible on the RP2040:
Each process gets its own virtual address space via per-process page tables. User processes all see the same virtual addresses (e.g., text at 0x00010000, stack at 0x7FFF0000) but map to different physical pages.
Proposed virtual address layout (user/kernel split at 0xC0000000):
| Virtual address | Size | Contents |
|---|---|---|
| 0x00000000 - 0x00000FFF | 4 KB | Null page (unmapped, catches NULL derefs) |
| 0x00010000 - 0x0FFFFFFF | ~256 MB | User text + data + heap (grows up) |
| 0x70000000 - 0x7FFFFFFF | 256 MB | User stack (grows down) + mmap region |
| 0xC0000000 - 0xCFFFFFFF | 256 MB | Kernel direct-map (phys 0 → virt 0xC0000000) |
| 0xD0000000 - 0xDFFFFFFF | 256 MB | Kernel heap, page tables, etc. |
| 0xF0000000 - 0xFFFFFFFF | 256 MB | Peripheral MMIO (0x20000000 phys) |
The kernel is mapped into every process's address space (high addresses), so syscalls don't require a page table switch — only a mode change (USR → SVC). This is the classic Linux 3G/1G split approach.
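The layout above can be captured in a handful of constants. The following is a minimal sketch, assuming physical address 0 is direct-mapped at 0xC0000000 as in the table; the macro and function names are illustrative and are not taken from the PPAP source.

```c
#include <stdint.h>
#include <stdbool.h>

/* Boundaries from the proposed layout table (names are hypothetical). */
#define USER_TEXT_BASE     0x00010000u
#define KERNEL_SPLIT       0xC0000000u  /* user/kernel boundary */
#define KERNEL_DIRECT_BASE 0xC0000000u  /* phys 0 is mapped here */

/* Translate between physical addresses and the kernel direct map. */
static inline uint32_t phys_to_virt(uint32_t pa) { return pa + KERNEL_DIRECT_BASE; }
static inline uint32_t virt_to_phys(uint32_t va) { return va - KERNEL_DIRECT_BASE; }

/* True if [addr, addr + len) lies entirely below the kernel split.
 * Written to avoid overflow in addr + len. */
static inline bool user_range_ok(uint32_t addr, uint32_t len)
{
    return addr < KERNEL_SPLIT && len <= KERNEL_SPLIT - addr;
}
```

A check like user_range_ok() would typically guard every syscall that dereferences a user pointer, since the kernel mapping is present in every address space.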
The RP2040 port uses vfork() because fork() without an MMU would require
physically copying the entire address space. With the MMU:
- fork() duplicates the page table, marking all pages read-only
- Both parent and child share the same physical pages
- On first write, a data abort (page fault) fires
- The fault handler allocates a new physical page, copies the data, updates the faulting process's page table entry, and resumes
- Only modified pages are ever copied — huge memory savings
This is standard COW fork, the same mechanism used by Linux, BSD, etc.
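The COW steps can be modelled on a host machine without an MMU. The sketch below is a simulation under stated assumptions, not the PPAP implementation: struct pte, refcnt, frame_alloc(), cow_share() and cow_fault() are hypothetical names, and a per-frame reference count stands in for whatever bookkeeping the real kernel uses.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u
#define NPAGES    8u                      /* tiny physical pool for the model */

static uint8_t  phys[NPAGES][PAGE_SIZE];  /* simulated physical pages */
static uint32_t refcnt[NPAGES];           /* number of mappings per frame */

struct pte { uint32_t frame; int present; int writable; };

/* Hypothetical allocator: first frame with no references, or -1. */
static int frame_alloc(void)
{
    for (uint32_t i = 0; i < NPAGES; i++)
        if (refcnt[i] == 0) { refcnt[i] = 1; return (int)i; }
    return -1;
}

/* fork(): the child shares the frame; both mappings become read-only. */
static void cow_share(struct pte *parent, struct pte *child)
{
    parent->writable = 0;
    *child = *parent;
    refcnt[parent->frame]++;
}

/* Write fault on a read-only COW mapping. Returns 0 on success. */
static int cow_fault(struct pte *pte)
{
    if (refcnt[pte->frame] == 1) {        /* last sharer: re-enable writes */
        pte->writable = 1;
        return 0;
    }
    int f = frame_alloc();
    if (f < 0)
        return -1;                        /* out of memory */
    memcpy(phys[f], phys[pte->frame], PAGE_SIZE);
    refcnt[pte->frame]--;
    pte->frame = (uint32_t)f;
    pte->writable = 1;
    return 0;
}
```

On real hardware the PTE update in cow_fault() would also need a TLB invalidate for the faulting address before the process resumes.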
With 512 MB RAM, PPAP can support many more processes. But more importantly, the MMU enables demand paging: pages can be loaded from storage (SD card) on first access rather than at exec() time.
- Page fault on unmapped page → check if it belongs to a file mapping
- Load the page from SD, map it, resume the process
- Under memory pressure, evict clean pages (just unmap) or dirty pages (write back to swap, then unmap)
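The first step of that path, deciding what a fault on an unmapped page means, could look roughly like this. It is a sketch only: struct vma, enum fault_action and classify_fault() are hypothetical, and a linear scan over a small region array is assumed for simplicity.

```c
#include <stdint.h>
#include <stddef.h>

enum vma_kind { VMA_ANON, VMA_FILE };

/* Hypothetical per-process region descriptor (not from the PPAP source). */
struct vma {
    uint32_t start, end;      /* [start, end), page-aligned */
    enum vma_kind kind;
    uint32_t file_off;        /* file offset of 'start' for VMA_FILE */
};

enum fault_action { FAULT_SEGV, FAULT_ZERO_FILL, FAULT_READ_FILE };

/* Decide how to satisfy a translation fault at 'addr'.
 * On FAULT_READ_FILE, *off receives the page-aligned file offset to load. */
static enum fault_action
classify_fault(const struct vma *v, size_t n, uint32_t addr, uint32_t *off)
{
    for (size_t i = 0; i < n; i++) {
        if (addr < v[i].start || addr >= v[i].end)
            continue;
        if (v[i].kind == VMA_ANON)
            return FAULT_ZERO_FILL;   /* back with a fresh zeroed page */
        *off = v[i].file_off + ((addr - v[i].start) & ~0xFFFu);
        return FAULT_READ_FILE;       /* load one page from storage */
    }
    return FAULT_SEGV;                /* address belongs to no mapping */
}
```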
With virtual memory, mmap() becomes a real operation:
- MAP_ANONYMOUS: allocate virtual pages, back with physical on demand
- MAP_PRIVATE file mapping: map file pages COW
- MAP_SHARED file mapping: share physical pages between processes
- PROT_READ | PROT_WRITE | PROT_EXEC: per-page permissions via PTE bits
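As an illustration of the last point, protection flags might translate to PTE permission bits as sketched below. The VM_PROT_* names, struct pte_perm and prot_to_perm() are hypothetical. The sketch also uses the ARMv6 encoding AP = 0b010 (SVC R/W, USR R/O) for read-only user pages, and assumes that any readable or executable page is readable from user mode, since ARMv6 has no execute-only permission.

```c
#include <stdint.h>

/* Hypothetical kernel-internal protection flags. */
#define VM_PROT_READ  0x1
#define VM_PROT_WRITE 0x2
#define VM_PROT_EXEC  0x4

struct pte_perm { uint8_t ap; uint8_t xn; };   /* AP[2:0] and XN */

/* Map protection flags to ARMv6 AP[2:0] and XN for a user page. */
static struct pte_perm prot_to_perm(int prot)
{
    struct pte_perm p;
    if (prot & VM_PROT_WRITE)
        p.ap = 0x3;                      /* 0b011: SVC R/W, USR R/W */
    else if (prot & (VM_PROT_READ | VM_PROT_EXEC))
        p.ap = 0x2;                      /* 0b010: SVC R/W, USR R/O */
    else
        p.ap = 0x1;                      /* 0b001: kernel-only (no user access) */
    p.xn = (prot & VM_PROT_EXEC) ? 0 : 1;
    return p;
}
```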
Every process has its own page table. A user process cannot access another process's memory or kernel memory — the MMU enforces this in hardware. This is a fundamental security improvement over the MPU-based RP2040 port.
Two-level page tables:
Level 1 (L1) — Translation Table:
- 4096 entries × 4 bytes = 16 KB per process
- Each entry covers 1 MB of virtual address space
- Entry types: Fault (unmapped), Section (1 MB mapping), Page Table (pointer to L2)
Level 2 (L2) — Page Table:
- 256 entries × 4 bytes = 1 KB per L2 table
- Each entry covers 4 KB (small page) or 64 KB (large page)
- Entry types: Fault, Large Page (64 KB), Small Page (4 KB), Extended Small Page
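With 4096 L1 entries of 1 MB each and 256 small-page L2 entries of 4 KB each, a virtual address splits into three fields. A short sketch of the index arithmetic (the helper names are illustrative):

```c
#include <stdint.h>

#define L1_ENTRIES 4096u   /* 4096 x 4 bytes = 16 KB */
#define L2_ENTRIES 256u    /* 256 x 4 bytes = 1 KB */

/* VA[31:20] selects the L1 entry (1 MB granularity). */
static inline uint32_t l1_index(uint32_t va)    { return va >> 20; }
/* VA[19:12] selects the L2 entry (4 KB small page). */
static inline uint32_t l2_index(uint32_t va)    { return (va >> 12) & 0xFFu; }
/* VA[11:0] is the byte offset within the page. */
static inline uint32_t page_offset(uint32_t va) { return va & 0xFFFu; }
```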
Small page (4 KB) descriptor bits (ARMv6 format, subpages disabled):

```
[31:12] Physical page base address
[11]    nG (not global: entry is tagged with the current ASID)
[10]    S (shared)
[9]     AP[2] (access permission extension, APX)
[8:6]   TEX[2:0] (memory type)
[5:4]   AP[1:0] (access permissions)
[3]     C (cacheable)
[2]     B (bufferable)
[1]     1 (small page indicator)
[0]     XN (execute never)
```

Access permissions (AP[2:0]):
- 0b001: SVC R/W, USR no access (kernel-only)
- 0b011: SVC R/W, USR R/W
- 0b101: SVC R/O, USR no access
- 0b110: SVC R/O, USR R/O (read-only, used for COW; 0b111 is reserved in ARMv6)
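The bit positions listed above can be wrapped in a few macros. This is a sketch with hypothetical names (PTE_AP, small_page_desc(), pte_get_ap()); it only covers the fields named in the descriptor layout and splits the 3-bit AP value across bits [5:4] and bit [9].

```c
#include <stdint.h>

#define PTE_XN      (1u << 0)   /* execute never */
#define PTE_SMALL   (1u << 1)   /* small page indicator */
#define PTE_B       (1u << 2)   /* bufferable */
#define PTE_C       (1u << 3)   /* cacheable */
#define PTE_TEX(t)  (((t) & 0x7u) << 6)
/* AP[1:0] goes to bits [5:4], AP[2] to bit [9]. */
#define PTE_AP(ap)  ((((ap) & 0x3u) << 4) | ((((ap) >> 2) & 0x1u) << 9))

/* Build a small-page descriptor from a physical address, AP[2:0], and flags. */
static inline uint32_t small_page_desc(uint32_t pa, uint32_t ap, uint32_t flags)
{
    return (pa & 0xFFFFF000u) | PTE_AP(ap) | flags | PTE_SMALL;
}

/* Recover AP[2:0] from a descriptor. */
static inline uint32_t pte_get_ap(uint32_t pte)
{
    return ((pte >> 4) & 0x3u) | (((pte >> 9) & 0x1u) << 2);
}
```

For example, a cacheable, bufferable, non-executable user read/write page at physical 0x12345000 would be small_page_desc(0x12345000, 0x3, PTE_C | PTE_B | PTE_XN).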
The ARM1176 has separate instruction and data MicroTLBs backed by a unified main TLB. On context switch:
- Write TTBR0 (Translation Table Base Register 0) with new process's L1 base
- Flush TLB (or use ASID to avoid full flush — ARM1176 supports 256 ASIDs)
ASID (Address Space Identifier):
- Each TLB entry is tagged with an 8-bit ASID
- CONTEXTIDR register holds the current ASID
- On context switch: update CONTEXTIDR + TTBR0, no TLB flush needed (unless ASIDs are exhausted)
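One common way to handle ASID exhaustion is a generation counter: when the 8-bit space wraps, bump the generation, flush the TLB once, and let each process pick up a fresh ASID the next time it is scheduled. The sketch below assumes that scheme; it is not described in the text above, and struct asid_state, struct asid_ctx and asid_get() are hypothetical names.

```c
#include <stdint.h>

#define ASID_COUNT 256u

struct asid_state {
    uint32_t generation;   /* bumped each time the 8-bit space wraps */
    uint32_t next;         /* next ASID to hand out; 0 is kept unused */
};

/* Per-process ASID bookkeeping (would live in the PCB). */
struct asid_ctx { uint32_t generation; uint8_t asid; };

/* Ensure 'c' holds an ASID valid in the current generation.
 * Returns 1 if the caller must flush the whole TLB before switching. */
static int asid_get(struct asid_state *s, struct asid_ctx *c)
{
    if (c->generation == s->generation && c->asid != 0)
        return 0;                       /* still valid, no flush */
    int flush = 0;
    if (s->next >= ASID_COUNT) {        /* exhausted: start a new generation */
        s->generation++;
        s->next = 1;
        flush = 1;
    }
    c->asid = (uint8_t)s->next++;
    c->generation = s->generation;
    return flush;
}
```

The state would start as generation 1, next 1, with every new process context zero-initialized so its first switch allocates an ASID.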
ARMv6 supports 16 domains. Each L1 entry belongs to a domain. Domain access is checked before page permissions:
- No access: any access faults
- Client: page permission bits checked
- Manager: no permission check (full access)
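Simplest approach: use 2 domains, both configured as client so that permission bits are always enforced:
- Domain 0 = client (user pages: permission bits enforced)
- Domain 1 = client (kernel pages: AP = 0b001 keeps them kernel-only)

Manager mode disables permission checks for every access to the domain, including accesses made from USR mode, so leaving kernel pages in a manager domain while user code runs would expose them to user processes.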
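The Domain Access Control Register packs sixteen 2-bit fields, one per domain. A small sketch of building a DACR value (the enum and dacr_set() are illustrative names; writing the result to CP15 is left out):

```c
#include <stdint.h>

/* Architectural 2-bit encodings for each DACR field. */
enum domain_mode { DOMAIN_NO_ACCESS = 0, DOMAIN_CLIENT = 1, DOMAIN_MANAGER = 3 };

/* Domain n occupies DACR bits [2n+1:2n]. Returns the updated value. */
static inline uint32_t dacr_set(uint32_t dacr, unsigned domain, enum domain_mode m)
{
    unsigned shift = (domain & 0xFu) * 2u;
    return (dacr & ~(0x3u << shift)) | ((uint32_t)m << shift);
}
```

With domains 0 and 1 both set to client and every other domain left at no access, the resulting value is 0x5.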
The Pi Zero uses the same ARM instruction set family as the RP2040 but a completely different processor profile:
| Aspect | Cortex-M0+ (M-profile) | ARM1176 (A-profile) |
|---|---|---|
| Exception entry | Auto-push {r0-r3,r12,lr,pc,xpsr} | Save CPSR→SPSR, LR=return addr, mode switch |
| Exception return | bx lr with EXC_RETURN | movs pc, lr or subs pc, lr, #4 |
| IRQ disable | cpsid i / cpsie i | Same (also available: msr cpsr_c, ...) |
| Syscall | svc #N → SVCall vector | swi #N → SWI vector at 0x08 |
| Stack pointer | MSP/PSP via CONTROL reg | r13 banked per mode (USP, SSP, etc.) |
| Vector table | Pointers at VTOR address | Branch instructions at 0x0 or 0xFFFF0000 |
| Context switch | PendSV (deferred exception) | Manual in IRQ handler or SWI return |
ARM1176 places instructions (not handler pointers) at fixed addresses. Here each slot loads its handler address into pc from the literal pool:

```asm
    .section .vectors, "ax"
_vectors:
    ldr pc, =reset_handler       @ 0x00: Reset
    ldr pc, =undefined_handler   @ 0x04: Undefined instruction
    ldr pc, =swi_handler         @ 0x08: SWI (syscall)
    ldr pc, =prefetch_handler    @ 0x0C: Prefetch abort
    ldr pc, =data_abort_handler  @ 0x10: Data abort (page fault!)
    nop                          @ 0x14: Reserved
    ldr pc, =irq_handler         @ 0x18: IRQ
    ldr pc, =fiq_handler         @ 0x1C: FIQ
```

Key difference: Data Abort (0x10) is the page fault handler. This is the heart of the virtual memory system: every COW fault, demand page load, and access violation flows through here.
The ARM1176 has 7 modes, each with banked r13 (SP) and r14 (LR):
| Mode | Purpose | Banked registers |
|---|---|---|
| USR (User) | User processes | r13_usr, r14_usr |
| SVC (Supervisor) | Kernel, SWI handler | r13_svc, r14_svc, SPSR_svc |
| IRQ | Interrupt handler | r13_irq, r14_irq, SPSR_irq |
| FIQ | Fast interrupt | r8_fiq-r12_fiq, r13_fiq, r14_fiq, SPSR_fiq |
| ABT (Abort) | Data/prefetch abort | r13_abt, r14_abt, SPSR_abt |
| UND (Undefined) | Undefined instruction | r13_und, r14_und, SPSR_und |
| SYS (System) | Privileged, shares USR regs | Same as USR (r13_usr, r14_usr) |
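Each mode in the table corresponds to a value of CPSR[4:0]. The constants below are the architectural encodings; the macro names and cpsr_is_user() are illustrative. Combined with the I and F mask bits they give the immediates used when switching modes with msr cpsr_c.

```c
#include <stdint.h>

/* CPSR[4:0] mode encodings. */
#define MODE_USR 0x10u
#define MODE_FIQ 0x11u
#define MODE_IRQ 0x12u
#define MODE_SVC 0x13u
#define MODE_ABT 0x17u
#define MODE_UND 0x1Bu
#define MODE_SYS 0x1Fu

#define CPSR_MODE_MASK 0x1Fu
#define CPSR_F         0x40u   /* FIQ disable */
#define CPSR_I         0x80u   /* IRQ disable */

/* True if a saved PSR says the exception was taken from user mode. */
static inline int cpsr_is_user(uint32_t spsr)
{
    return (spsr & CPSR_MODE_MASK) == MODE_USR;
}
```

A handler can apply cpsr_is_user() to the saved SPSR to tell a user-mode fault (deliver a signal) from a kernel-mode one (likely a kernel bug).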
Each mode needs its own stack. At boot, set up:

```asm
    msr cpsr_c, #0xD2        @ IRQ mode, IRQs and FIQs disabled
    ldr sp, =irq_stack_top
    msr cpsr_c, #0xD7        @ ABT mode
    ldr sp, =abt_stack_top
    msr cpsr_c, #0xDB        @ UND mode
    ldr sp, =und_stack_top
    msr cpsr_c, #0xD3        @ SVC mode
    ldr sp, =svc_stack_top
```

No PendSV on ARM1176. Context switch options:
Recommended: switch on SWI return and IRQ return.
Timer IRQ handler (preemption):

```asm
irq_handler:
    sub lr, lr, #4           @ adjust return address
    srsdb sp!, #0x13         @ save {LR_irq, SPSR_irq} to SVC stack
    cps #0x13                @ switch to SVC mode
    push {r0-r12, lr}        @ save r0-r12 + lr_svc on SVC stack
    bl timer_handler         @ C handler: acknowledge timer, call sched_tick
    @ check if context switch needed
    bl sched_should_switch
    cmp r0, #0
    bne do_context_switch
    pop {r0-r12, lr}
    rfeia sp!                @ return from exception (restore CPSR + PC)
do_context_switch:
    @ save remaining state to current PCB
    @ load next PCB's page table (TTBR0), ASID, stack, registers
    @ restore and rfeia
```

SWI handler (syscall entry):

```asm
swi_handler:
    srsdb sp!, #0x13         @ save {LR_svc, SPSR_svc} to SVC stack
    push {r0-r12, lr}
    mov r0, r7               @ syscall number (ARM EABI: same as Cortex-M)
    mov r1, sp               @ pointer to saved registers (args in r0-r5)
    bl syscall_dispatch
    str r0, [sp, #0]         @ store return value in saved r0
    pop {r0-r12, lr}
    rfeia sp!                @ return to user mode
```

The syscall ABI is identical to the RP2040 port: r7=syscall number, r0-r5=arguments, return in r0. musl libc uses the same convention on all ARM Linux targets.
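On the C side, the handler receives the syscall number and a pointer to the saved registers. The sketch below shows one plausible shape for that entry point. It is not PPAP's existing dispatcher: struct trap_frame, the table, sys_add_demo() and the ENOSYS value are all stand-ins chosen for illustration.

```c
#include <stdint.h>

#define NR_SYSCALLS_DEMO 4u
#define DEMO_ENOSYS      38      /* assumed errno value for "no such syscall" */

/* Mirrors "push {r0-r12, lr}": r0 sits at the lowest address. */
struct trap_frame { uint32_t r[13]; uint32_t lr; };

typedef int32_t (*syscall_fn)(uint32_t, uint32_t, uint32_t,
                              uint32_t, uint32_t, uint32_t);

/* Placeholder syscall used only to exercise the dispatcher. */
static int32_t sys_add_demo(uint32_t a, uint32_t b, uint32_t c,
                            uint32_t d, uint32_t e, uint32_t f)
{
    (void)c; (void)d; (void)e; (void)f;
    return (int32_t)(a + b);
}

static const syscall_fn syscall_table[NR_SYSCALLS_DEMO] = {
    [1] = sys_add_demo,
};

/* Called from the SWI path with r0 = syscall number, r1 = saved registers. */
int32_t syscall_dispatch(uint32_t nr, struct trap_frame *tf)
{
    if (nr >= NR_SYSCALLS_DEMO || !syscall_table[nr])
        return -DEMO_ENOSYS;
    return syscall_table[nr](tf->r[0], tf->r[1], tf->r[2],
                             tf->r[3], tf->r[4], tf->r[5]);
}
```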
This is new — the RP2040 has no equivalent:
```c
void data_abort_handler(uint32_t fault_addr, uint32_t fsr, uint32_t pc) {
    /* Fault status is DFSR[10] and DFSR[3:0]; DFSR[11] is set for writes. */
    uint32_t fault_type = ((fsr >> 6) & 0x10) | (fsr & 0xF);
    if (fault_type == 0x5 || fault_type == 0x7) {
        /* Translation fault: section (0x5) or page (0x7) not mapped */
        /* → demand paging: load page from file/swap, map it, resume */
    } else if (fault_type == 0xD || fault_type == 0xF) {
        /* Permission fault: section (0xD) or page (0xF), e.g. write to read-only page */
        /* → COW: copy page, make writable, resume */
    } else {
        /* Unexpected fault → SIGSEGV to process */
    }
}
```

The BCM2835 boot is GPU-first:
- GPU starts executing from its on-chip boot ROM
- GPU reads bootcode.bin from SD card FAT32 partition
- bootcode.bin loads start.elf (GPU firmware)
- start.elf reads config.txt for configuration
- start.elf loads kernel.img to physical address 0x8000
- GPU releases ARM core with PC = 0x8000
PPAP provides kernel.img (our kernel binary). The GPU bootloader files
(bootcode.bin, start.elf, fixup.dat) come from the Raspberry Pi
firmware repository.
SD Card (FAT32)
├── bootcode.bin # GPU stage 1 (from RPi firmware)
├── start.elf # GPU firmware (from RPi firmware)
├── fixup.dat # GPU memory split config (from RPi firmware)
├── config.txt # Boot configuration (arm_freq, kernel name, etc.)
├── kernel.img # PPAP kernel (our binary, loaded to 0x8000)
├── romfs.img # romfs image (loaded by kernel at boot)
├── ppap_usr.img # UFS image → mounted at /usr
├── ppap_home.img # UFS image → mounted at /home
├── ppap_var.img # UFS image → mounted at /var
└── cmdline.txt # Kernel command line (optional)
1. _start (0x8000): set up exception mode stacks, jump to reset_handler
2. reset_handler: copy .data, zero .bss, set up initial page table
3. Enable MMU: identity-map first 16 MB + kernel high map at 0xC0000000
4. Jump to virtual address (0xC000xxxx) — now running in virtual memory
5. kmain():
a. target_early_init() — UART console (PL011 at 0x20201000)
b. Page allocator init (thousands of pages from 512 MB)
c. MMU: set up kernel page tables for all physical memory
d. Mount romfs (loaded from SD card into RAM, or read on demand)
e. target_late_init() — SD card, interrupts, timer
f. Mount remaining filesystems (VFAT on SD, UFS via loopback)
g. execve("/sbin/init") as PID 1
All BCM2835 peripherals are at physical 0x20000000 (bus address 0x7E000000). The BCM2835 datasheet uses bus addresses; subtract 0x5E000000 to get physical addresses.
| Peripheral | Physical address | Purpose |
|---|---|---|
| System Timer | 0x20003000 | 64-bit free-running counter + 4 compare channels |
| Interrupt Controller | 0x2000B200 | Custom BCM2835 IC (not GIC) |
| ARM Timer | 0x2000B400 | SP804-derived timer (not fully SP804-compatible) |
| GPU Mailbox | 0x2000B880 | ARM↔GPU communication (framebuffer, clock, etc.) |
| UART0 (PL011) | 0x20201000 | Full UART (same IP as RP2040!) |
| UART1 (mini) | 0x20215000 | Mini UART (simpler, often used as default) |
| SPI0 | 0x20204000 | SPI master |
| I2C0/1 | 0x20205000/0x20804000 | I2C/BSC |
| GPIO | 0x20200000 | 54 GPIO pins, alt functions |
| EMMC (SD) | 0x20300000 | SD card controller (SDHCI-compatible) |
| USB | 0x20980000 | DWC2 OTG USB controller |
| DMA | 0x20007000 | 16 DMA channels |
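The bus-to-physical conversion, plus the peripheral window proposed in the virtual layout (0xF0000000), can be expressed as two helpers. This is a sketch; the names are illustrative and the virtual base is the one proposed earlier in this document, not a fixed hardware property.

```c
#include <stdint.h>

#define BCM2835_PERI_BUS   0x7E000000u  /* addresses used in the datasheet */
#define BCM2835_PERI_PHYS  0x20000000u  /* what the ARM core sees */
#define PERI_VIRT_BASE     0xF0000000u  /* proposed kernel MMIO window */

/* Datasheet (bus) address to ARM physical address. */
static inline uint32_t bus_to_phys(uint32_t bus)
{
    return bus - BCM2835_PERI_BUS + BCM2835_PERI_PHYS;
}

/* ARM physical peripheral address to kernel virtual address. */
static inline uint32_t peri_phys_to_virt(uint32_t pa)
{
    return pa - BCM2835_PERI_PHYS + PERI_VIRT_BASE;
}
```

For example, the PL011 appears in the datasheet at 0x7E201000, which is physical 0x20201000 and, under this layout, virtual 0xF0201000 once the MMU is on.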
The BCM2835 has a custom interrupt controller (NOT ARM GIC):
- 3 pending registers: IRQ_basic_pending, IRQ_pending_1, IRQ_pending_2
- 3 enable registers + 3 disable registers
- 72 interrupt sources total (GPU + ARM)
- No priority levels, no nesting by default
- IRQ and FIQ supported (FIQ can be routed to one source)
Key IRQ sources:
- System Timer match 1, 3 (channels 0, 2 reserved by GPU)
- UART0
- EMMC (SD card)
- USB
- ARM Timer
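Since there are no priority levels, the IRQ handler has to scan the pending registers itself. The sketch below takes the three register values as arguments so it can be exercised off-target. The flat numbering (0 to 63 for the two GPU pending registers, 64 and up for the ARM-side bits of the basic pending register) and the function names are assumptions made for this example.

```c
#include <stdint.h>

/* GPU-side IRQ numbers from the BCM2835 peripheral documentation. */
#define IRQ_SYSTIMER_1  1
#define IRQ_SYSTIMER_3  3
#define IRQ_USB         9
#define IRQ_UART0       57
#define IRQ_EMMC        62
/* Hypothetical numbering for basic-pending bits 0..7 (ARM-side sources). */
#define IRQ_ARM_BASE    64
#define IRQ_ARM_TIMER   (IRQ_ARM_BASE + 0)

/* Index of the lowest set bit; v must be nonzero. */
static inline int lowest_bit(uint32_t v)
{
    int n = 0;
    while (!(v & 1u)) { v >>= 1; n++; }
    return n;
}

/* Return one pending IRQ number, or -1 if nothing is pending. */
static int irq_next_pending(uint32_t basic, uint32_t pend1, uint32_t pend2)
{
    if (basic & 0xFFu)
        return IRQ_ARM_BASE + lowest_bit(basic & 0xFFu);
    if (pend1)
        return lowest_bit(pend1);
    if (pend2)
        return 32 + lowest_bit(pend2);
    return -1;
}
```

The scan order here is the only "priority" in the system; a driver that needs low latency (the timer tick, for instance) should be checked first.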
The BCM2835's PL011 UART (0x20201000) is the same PrimeCell IP used in the
RP2040. The existing uart.c driver can be adapted with minimal changes:
- Different base address (0x20201000 vs 0x40034000)
- Different clock source (configured via GPU mailbox, typically 48 MHz)
- GPIO alt function setup via BCM2835 GPIO registers (not RP2040 IO_BANK0)
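Because the UART clock differs from the RP2040's, the baud-rate divisor has to be recomputed. The PL011 takes a 16-bit integer part (IBRD) and a 6-bit fractional part (FBRD) of clk / (16 × baud). A sketch of that calculation, with hypothetical names:

```c
#include <stdint.h>

struct pl011_baud { uint32_t ibrd; uint32_t fbrd; };

/* Compute PL011 divisor registers for a given UART clock and baud rate.
 * divisor * 64 = 4 * clk / baud, rounded to nearest. */
static struct pl011_baud pl011_divisor(uint32_t uartclk_hz, uint32_t baud)
{
    uint32_t div64 = (uint32_t)(((uint64_t)uartclk_hz * 4u + baud / 2u) / baud);
    struct pl011_baud d = { div64 >> 6, div64 & 0x3Fu };
    return d;
}
```

At 48 MHz and 115200 baud this yields IBRD = 26 and FBRD = 3. The actual clock should be queried rather than assumed, since it depends on firmware configuration.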
The BCM2835 has an SDHCI-compatible EMMC controller at 0x20300000. This is much faster than SPI-mode SD (used on PicoCalc):
- 4-bit parallel data bus
- DMA capable
- SDHCI standard register interface
- Supports SD, SDHC, SDXC
| Physical address | Size | Contents |
|---|---|---|
| 0x00000000 | 0x8000 | ARM vector table + GPU reserved |
| 0x00008000 | ~1 MB | Kernel .text, .rodata, .data, .bss |
| 0x00108000 | ~2 MB | Kernel page tables (L1 + L2 pool) |
| 0x00308000 | ~4 MB | romfs image (loaded from SD at boot) |
| 0x00708000 | ~440 MB | Page pool (user processes, file cache, etc.) |
| 0x1C000000 | 64 MB | GPU memory (configured via config.txt) |
| 0x20000000 | 16 MB | Peripheral MMIO |
GPU memory split is configurable via gpu_mem=64 in config.txt (default).
ARM gets 512 - 64 = 448 MB.
With 448 MB of RAM and 4 KB pages, the page pool has ~114,000 pages. The RP2040's 51-page free-stack is insufficient. Options:
- Bitmap allocator: 114,000 bits = ~14 KB bitmap. O(n) worst case but simple and cache-friendly.
- Buddy allocator: standard Linux approach. O(log n) alloc/free, supports multi-page allocations efficiently. More complex.
- Free list with zones: simple linked list, but add zones for DMA vs normal memory.
Recommendation: Start with bitmap (simplest), upgrade to buddy if multi-page allocation patterns demand it.
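A minimal sketch of the bitmap option follows. It assumes one bit per 4 KB page over the full 448 MB ARM share and adds a search hint to shorten the common-case scan; the names (page_bitmap, page_alloc(), page_free()) are illustrative and the real src/kernel/mm/page.c interface may differ.

```c
#include <stdint.h>

#define PAGE_COUNT 114688u                 /* (512 MB - 64 MB) / 4 KB */
#define WORDS      (PAGE_COUNT / 32u)      /* 3584 words = 14 KB */

static uint32_t page_bitmap[WORDS];        /* bit set = page in use */
static uint32_t search_hint;               /* word index to start scanning from */

/* Returns a page frame number, or -1 if no page is free. */
static int32_t page_alloc(void)
{
    for (uint32_t n = 0; n < WORDS; n++) {
        uint32_t w = (search_hint + n) % WORDS;
        if (page_bitmap[w] == 0xFFFFFFFFu)
            continue;                      /* all 32 pages in this word are taken */
        for (uint32_t b = 0; b < 32u; b++) {
            if (!(page_bitmap[w] & (1u << b))) {
                page_bitmap[w] |= 1u << b;
                search_hint = w;
                return (int32_t)(w * 32u + b);
            }
        }
    }
    return -1;
}

/* Mark a page free and pull the hint back if it lies before it. */
static void page_free(uint32_t pfn)
{
    page_bitmap[pfn / 32u] &= ~(1u << (pfn % 32u));
    if (pfn / 32u < search_hint)
        search_hint = pfn / 32u;
}
```

Pages occupied by the kernel image, page tables, and romfs would be marked as in use at init time before the first allocation.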
Each process needs a 16 KB L1 table + L2 tables on demand (1 KB each). A process mapping 16 MB of virtual space needs ~16 L2 tables = 16 KB. Total per process: ~32 KB. With 64 processes: ~2 MB. Easily fits.
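The arithmetic can be checked with a small helper. This sketch assumes the mapped range is contiguous and 1 MB aligned, so one L2 table is needed per megabyte; pt_bytes() is an illustrative name.

```c
#include <stdint.h>

#define L1_TABLE_BYTES (4096u * 4u)   /* 16 KB */
#define L2_TABLE_BYTES (256u * 4u)    /* 1 KB */
#define L2_SPAN_BYTES  (1u << 20)     /* each L2 table covers 1 MB */

/* Page-table bytes for one process mapping 'mapped_bytes' of address space. */
static uint32_t pt_bytes(uint32_t mapped_bytes)
{
    uint32_t l2_tables = (mapped_bytes + L2_SPAN_BYTES - 1u) / L2_SPAN_BYTES;
    return L1_TABLE_BYTES + l2_tables * L2_TABLE_BYTES;
}
```

A sparse layout (text low, stack high) needs at least one extra L2 table per disjoint region, so real figures will be slightly higher than this estimate.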
| Subsystem | RP2040 | Pi Zero | Impact |
|---|---|---|---|
| Memory management | 51-page free-stack, MPU | Bitmap/buddy allocator, MMU page tables | Major rewrite |
| Process model | vfork only, 8 max | Real fork + COW, 64+ max | Major rewrite |
| exec | XIP from flash, GOT reloc | Load to RAM, standard ELF reloc | Moderate rewrite |
| Context switch | PendSV, PSP/MSP swap | IRQ/SWI return path, TTBR0 swap | Full rewrite (asm) |
| Syscall entry | SVC → NVIC SVCall vector | SWI → vector at 0x08 | Full rewrite (asm) |
| Signal delivery | Modify PSP exception frame | Modify user stack directly | Moderate rewrite |
| mmap | Anonymous only, page_alloc | Full mmap (file, anon, shared, COW) | Major new code |
| Scheduler | Same | Mostly same (C code portable) | Minor changes |
| Spinlock | RP2040 SIO hardware | IRQ disable only (single core) | Simplify |
| Subsystem | Notes |
|---|---|
| VFS layer | Unchanged |
| All filesystem drivers | Unchanged (romfs, VFAT, UFS, devfs, procfs, tmpfs) |
| File descriptor layer | Unchanged |
| TTY subsystem | Unchanged (different UART driver, same TTY discipline) |
| Pipe | Unchanged |
| Syscall dispatch (C) | Unchanged (same syscall numbers, same C dispatcher) |
| Signal handling (C) | Unchanged (delivery mechanism needs arch hooks) |
| Block device layer | Unchanged (different SD driver, same blkdev API) |
| klog | Unchanged |
| Endianness | Same (both little-endian) |
| Subsystem | Description |
|---|---|
| MMU driver | Page table create/destroy, map/unmap, TLB flush, ASID management |
| Page fault handler | Data abort → COW, demand paging, SIGSEGV delivery |
| Physical page allocator | Bitmap or buddy for ~114,000 pages |
| Kernel virtual memory | vmalloc, ioremap for peripheral mapping |
| File page cache | Cache file data in RAM pages (huge performance win with 512 MB) |
| EMMC/SD driver | SDHCI-compatible, replaces SPI-mode SD driver |
| GPU mailbox driver | ARM↔VideoCore communication (clock config, framebuffer) |
| Framebuffer | HDMI output via GPU mailbox (allocate FB, set resolution) |
| USB (future) | DWC2 OTG for keyboard, storage, networking |
| Target | Board | Description |
|---|---|---|
| ppap_qemu_a6 | QEMU raspi0 or versatilepb | Emulated ARM1176 for testing |
| ppap_pizero | Raspberry Pi Zero / Zero W | Real hardware target |
QEMU's raspi0 machine models the BCM2835 SoC, though not every peripheral is emulated:

```sh
qemu-system-arm -M raspi0 -kernel kernel.img -serial stdio -dtb bcm2835-rpi-zero.dtb
```

Alternatively, versatilepb with -cpu arm1176 is simpler (standard PL011 UART, PL190 VIC) for early bringup, then switch to raspi0 for BCM2835 peripheral testing.
arm-none-eabi-gcc # same toolchain as RP2040 (different -mcpu flag)
-mcpu=arm1176jzf-s # target the specific core
-mfpu=vfp # enable VFP
-mfloat-abi=hard # hardware floating point
-marm # generate ARM (not Thumb) code for kernel
User-space can use Thumb for code density (the ARM1176 does not implement Thumb-2), but the kernel should use ARM mode for exception handlers (ARM1176 enters exceptions in ARM mode).
src/arch/arm_a/
boot.S — Exception vector table (branch instructions at 0x0)
start.S — Mode stack setup, early MMU init, jump to kmain
switch.S — Context switch: save/restore r4-r14, update TTBR0+ASID
trap.S — SWI handler (syscall entry/return)
abort.S — Data abort handler (page fault entry)
cpu.h — CP15 register access, TLB flush, cache ops
mmu.c — Page table create/destroy/map/unmap, ASID management
src/target/pizero/
CMakeLists.txt — Build rules (512 MB RAM, arm1176, kernel at 0x8000)
pizero.ld — Linker script (kernel virtual base at 0xC0000000)
target_pizero.c — Target hooks: early_init (PL011), late_init (SD, timer)
drivers/
uart_pl011.c — PL011 UART driver (adapted from RP2040 uart.c)
timer_bcm.c — BCM2835 System Timer (100 Hz tick via compare channel)
irq_bcm.c — BCM2835 interrupt controller driver
emmc_bcm.c — EMMC/SD card driver (SDHCI-compatible)
mailbox_bcm.c — GPU mailbox driver (clock config, framebuffer)
fb_bcm.c — HDMI framebuffer console (via mailbox)
Changes to existing files:
| File | Change |
|---|---|
| src/arch/arch.h | Add arm_a architecture detection |
| src/kernel/proc/proc.h | Extend PCB with page table pointer (TTBR0), ASID |
| src/kernel/proc/sched.c | Add TTBR0 + ASID update to context switch path |
| src/kernel/mm/page.c | Bitmap allocator for ~114,000 pages (replaces free-stack) |
| src/kernel/mm/vm.c | New: virtual memory manager (map/unmap/fault/COW) |
| src/kernel/syscall/sys_proc.c | Real fork() with COW page table duplication |
| src/kernel/syscall/sys_mem.c | Full mmap() / munmap() / brk() |
| scripts/build.sh | Add pizero and qemu_a6 targets |
| scripts/run.sh | Add qemu_a6 target (qemu-system-arm -M raspi0) |
```cmake
# src/target/pizero/CMakeLists.txt
# Toolchain: arm-none-eabi-gcc (same as RP2040, different -mcpu).
# Compilers are set before project() so that CMake picks them up.
set(CMAKE_C_COMPILER arm-none-eabi-gcc)
set(CMAKE_ASM_COMPILER arm-none-eabi-gcc)
set(CMAKE_OBJCOPY arm-none-eabi-objcopy)
project(ppap_pizero C ASM)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mcpu=arm1176jzf-s -marm -mfpu=vfp -mfloat-abi=hard")
set(CMAKE_ASM_FLAGS "${CMAKE_ASM_FLAGS} -mcpu=arm1176jzf-s -marm")
set(PPAP_KERNEL_VIRT_BASE 0xC0000000)
set(PPAP_KERNEL_PHYS_BASE 0x00008000)
set(PPAP_RAM_SIZE 0x20000000)      # 512 MB
set(PPAP_GPU_MEM 0x04000000)       # 64 MB reserved for GPU
set(PPAP_PAGE_COUNT_MAX 114688)    # (512MB - 64MB) / 4KB
```

QEMU test target:
```cmake
# src/target/qemu_a6/CMakeLists.txt
# Same as pizero but:
# - Uses versatilepb machine for initial bringup (standard PL011 + PL190 VIC)
# - Switches to raspi0 machine for BCM2835 peripheral testing
set(PPAP_QEMU_MACHINE "raspi0")
set(PPAP_QEMU_FLAGS "-serial stdio -dtb bcm2835-rpi-zero.dtb")
```

```sh
# Run tests on QEMU raspi0
./scripts/run.sh --test qemu_a6
```

The test runner boots PPAP on QEMU raspi0, runs runtests, and captures output from the PL011 serial port. Same pattern as qemu_arm and qemu_m68k.
| Category | Tests | Notes |
|---|---|---|
| Core (exec, vfork, pipe, signal) | Same as ARM/m68k | Should pass unchanged |
| Fork (COW) | New: test_fork | Verify COW page fault, parent/child isolation |
| mmap | New: test_mmap | Anonymous + file-backed mappings |
| Page fault | New: test_pagefault | Verify SIGSEGV on invalid access |
| Memory isolation | New: test_isolation | Process A cannot read process B's memory |
| Large memory | New: test_largemem | Allocate >1 MB per process |
| Trace / pdb | Same as ARM/m68k | Verify ptrace works with MMU |
- All existing tests pass (exec, vfork, pipe, signal, trace, pdb).
- New VM tests verify COW, demand paging, and memory isolation.
- test_pdb works with the new architecture (ARM register set unchanged).
Same as the m68k port — refactor ARM code into src/arch/arm_m/ (M-profile)
before adding src/arch/arm_a/ (A-profile). This phase is shared with the
m68k porting effort.
- Write src/arch/arm_a/boot.S: exception vectors (branch table at 0x0)
- Write src/arch/arm_a/start.S: mode stacks, MMU early init
- Write src/arch/arm_a/cpu.h: CP15 registers, TTBR, DACR, DFSR/IFSR
- Identity-map first 16 MB + kernel high map, enable MMU
- Jump to virtual kernel, init PL011 UART
- Boot to kmain() with UART output: "PiPAPo booting... (armv6)"
- Implement L1/L2 page table create/destroy/map/unmap
- Physical page allocator (bitmap, 512 MB)
- Kernel page table: direct-map all physical RAM at 0xC0000000
- Per-process page tables: allocate on proc_alloc(), free on proc_free()
- Context switch: save/restore r4-r14, update TTBR0 + ASID
- Data abort handler: dispatch to COW or demand page or SIGSEGV
- Real fork() with COW (mark PTEs read-only, fault on write)
- exec(): load ELF to user virtual addresses (no XIP, full RAM load)
- _exit(): free all process pages and page tables
- waitpid(): same as RP2040 (pure C, unchanged)
- brk() / mmap(): allocate virtual pages, back with physical on fault
- Preemptive scheduling via ARM Timer IRQ
- Port musl libc for armv6 (musl supports armv6 out of the box)
- Build busybox for armv6 (static, musl)
- Interactive shell on QEMU
- Framebuffer console via GPU mailbox (HDMI output)
- EMMC/SD driver (SDHCI) — replaces SPI-mode SD
- GPU mailbox driver — clock configuration, framebuffer allocation
- HDMI framebuffer console
- USB driver (DWC2 OTG) — keyboard input (stretch goal)
- Boot from SD card on real Pi Zero hardware
| Feature | RP2040 (micro OS) | Pi Zero (modern OS) |
|---|---|---|
| Processes | 8 max, vfork only | 64+, real fork + COW |
| Memory per process | 128 KB max (4 KB stack + data pages) | Up to ~400 MB virtual |
| Memory protection | MPU (4 regions, coarse) | MMU (per-page, full isolation) |
| Address space | Flat physical | Per-process virtual |
| Page faults | HardFault → kill process | Data abort → COW / demand page |
| File cache | None (direct SD I/O) | Page cache in RAM (huge speedup) |
| Code execution | XIP from flash (fast, zero RAM) | Load to RAM (512 MB makes this fine) |
| Swap | Manual page-out to SD | Standard page-out via MMU |
| Dynamic linking | Not supported | Possible (full mmap + GOT/PLT) |
| Multi-user | Single-user (no isolation) | Full multi-user (MMU-enforced) |
| Networking | None | Possible via USB (Wi-Fi on Zero W) |
| Display | PicoCalc LCD (320×320) | HDMI (up to 1080p via GPU) |
| Risk | Impact | Mitigation |
|---|---|---|
| MMU complexity | Longest development phase | Incremental: section-mapped first, then 4 KB pages |
| Page table bugs | Kernel crashes, data corruption | Heavy use of QEMU + GDB; test COW with fork-heavy workloads |
| BCM2835 undocumented quirks | Stalled bringup | Reference Linux, Circle OS, and RPi bare-metal projects |
| USB driver complexity | No keyboard on real hardware | Use UART console first; USB is a Phase F stretch goal |
| GPU mailbox protocol | No HDMI output | Reference raspberrypi/firmware wiki; UART console as fallback |
| Scope creep (full Linux clone) | Never finishes | Keep PPAP philosophy: POSIX subset, busybox, correctness first |
- QEMU machine: raspi0 vs versatilepb -cpu arm1176? The former is more realistic but has BCM2835-specific peripherals; the latter has standard ARM peripherals (PL190 VIC, SP804 timer) that are easier to start with.
- Kernel/user split: 3G/1G (0xC0000000) is standard Linux, but with 512 MB physical RAM, a 2G/2G split wastes less virtual space. 3G/1G is simpler and battle-tested.
- Page table allocator: where do L2 page tables come from? Slab allocator for 1 KB L2 tables, or just allocate full 4 KB pages and pack 4 L2 tables per page?
- File page cache: implement as part of the VFS vnode (each vnode holds a radix tree of cached pages)? Or a global page cache indexed by (device, block)?
- Pi Zero 2 W: uses BCM2710 (quad-core Cortex-A53, ARMv8-A, 64-bit). Different enough to be a separate port. Focus on Pi Zero (v1) first for the ARMv6 MMU milestone.
- Shared arch code with RP2040: both are ARM, but M-profile vs A-profile are so different that sharing assembly is impractical. The C-level syscall dispatcher, VFS, and filesystem code are shared. The arch split should be arm_m (Cortex-M) vs arm_a (ARM11/A-class).
- Cross-architecture emulation: Pi Zero ARMv6 binaries can run on other PPAP targets via ecpu-armv6 (see docs/ecpu/overview.md §4.5). Conversely, ARM Thumb and m68k binaries can run on Pi Zero PPAP via their respective eCPU emulators.
A (arch abstraction — split arm_m / arm_a)
└─→ B (QEMU bringup — boot, UART, exceptions)
└─→ C (MMU — page tables, TLB, fault handler)
└─→ D (process model — fork COW, exec, mmap)
└─→ E (user space — musl, busybox, shell)
└─→ F (Pi Zero hardware — SD, HDMI, USB)
Phases A and B can proceed in parallel with ongoing RP2040/X68000 work. Phase C (MMU) is the critical milestone — everything after it builds on virtual memory.
- docs/kernel/overview.md — PPAP kernel architecture
- docs/user/trace.md — Trace and debug subsystem
- docs/kernel/syscall.md — System call reference
- docs/proposals/pico2_port.md — Pico 2 port (RP2350, same ARM family, no MMU)
- docs/proposals/x68k_port.md — X68000 port (m68k, analogous porting effort)