Introduce support for tight bounds on the kernel stack. by qwattash · Pull Request #2573 · CTSRD-CHERI/cheribsd

qwattash · 2026-02-26T15:01:35Z

The implementation differs slightly depending on the architecture.

On RISC-V, the kstack allocation holds the pcb, kernframe and trapframe structures.
These patches set tight bounds for the pcb, the kernframe and the remainder of the kernel stack.
The trapframe is left within the kernel stack bounds, because the trap handlers currently locate it through ordinary arithmetic on the kernel stack pointer.
The sscratchc register is now used to hold a pointer to struct kernframe instead of the full kstack capability.
The kernframe is expanded to hold a bounded pointer to the kernel stack region and a scratch pointer.
The scratch pointer is used in the trap handler to swap register contents with constrained use of CPU registers.
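The first-revision RISC-V entry sequence can be modeled in plain C: swap the user csp with the kernframe pointer held in sscratchc, spill ctp into the kernframe scratch slot, then load the bounded kernel stack pointer from the kernframe. This mirrors the exception.S hunk quoted in the review further down. It is a minimal sketch only: registers are plain `uintptr_t` values rather than capabilities, and `struct kernframe_sketch`, `kf_kstack`, `kf_scratch`, `struct cpu_sketch` and `trap_entry_sketch()` are hypothetical names, not the fields or code the patch adds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of the expanded kernframe (first revision). */
struct kernframe_sketch {
	uintptr_t kf_kstack;	/* bounded pointer to the kernel stack region */
	uintptr_t kf_scratch;	/* spill slot for one user register */
};

/* Hypothetical register file; real registers hold capabilities. */
struct cpu_sketch {
	uintptr_t sscratchc;	/* points at the kernframe while in user mode */
	uintptr_t csp;
	uintptr_t ctp;
};

/* Trap entry from user mode, following the steps in the description. */
static void
trap_entry_sketch(struct cpu_sketch *cpu)
{
	struct kernframe_sketch *kf;
	uintptr_t tmp;

	/* Swap user csp with the kernframe pointer (a CSR swap in asm). */
	tmp = cpu->csp;
	cpu->csp = cpu->sscratchc;
	cpu->sscratchc = tmp;

	/* Free up ctp by stashing the user value in the scratch slot. */
	kf = (struct kernframe_sketch *)cpu->csp;
	kf->kf_scratch = cpu->ctp;
	cpu->ctp = cpu->csp;

	/* Switch to the bounded kernel stack recorded in the kernframe. */
	cpu->csp = kf->kf_kstack;
}
```

After the sequence, the user csp sits in sscratchc and the user ctp in the scratch slot, so both can later be copied into the trapframe.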

Edit:

Changed approach. Do not disrupt sscratchc or the existing stashed stack pointer. The following applies to all architectures:
1. The stashed/banked stack pointer covers the whole kstack except for the struct pcb. Representability is guaranteed.
2. struct pcb has separate bounds that never overlap with the stack pointer.
3. td_kstack is considered the root capability for all kernel stack sub-allocations.
4. Upon entering the trap handler from userland, the trap handler accesses td_frame (and kernframe on RISC-V) through the csp pointer. It then shrinks csp to exclude the td_frame (and kernframe) regions. Again, representability is guaranteed by the construction of td_frame.
5. Upon exiting the trap handler to userland, the trap handler restores the csp bounds to include td_frame (and kernframe) before loading registers and returning to userland.
Implemented for Morello.
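The five rules above can be exercised with a toy bounds model: the root td_kstack, a pcb-only capability, a stashed stack pointer covering everything below the pcb, and a trap entry/exit pair that carves out td_frame and later widens csp again. This is a sketch under stated assumptions, not the patch: `struct toycap` and `cap_setbounds()` are a hypothetical stand-in for CHERI capabilities (no representability rounding is modeled), the sizes are made up, kernframe is omitted, and the pcb is assumed to sit at the top of the kstack with td_frame directly below it. Because bounds can only shrink, the exit path here re-derives csp from td_kstack rather than widening the shrunk csp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy capability model for illustration only; not the CHERI C API. */
struct toycap {
	uintptr_t base;
	size_t len;
	uintptr_t addr;
	bool tag;
};

/* Hypothetical sizes; the real ones come from the kernel. */
#define KSTACK_SIZE	16384u
#define PCB_SIZE	1024u
#define FRAME_SIZE	512u

static bool
cap_covers(struct toycap c, uintptr_t va, size_t len)
{
	return (c.tag && va >= c.base && va + len <= c.base + c.len);
}

/* Set bounds to [addr, addr + len); growing past c clears the tag. */
static struct toycap
cap_setbounds(struct toycap c, uintptr_t addr, size_t len)
{
	struct toycap r = { addr, len, addr, cap_covers(c, addr, len) };
	return (r);
}

struct toythread {
	struct toycap td_kstack;	/* rule 3: root capability */
	struct toycap td_pcb;		/* rule 2: pcb-only bounds */
	struct toycap stashed_sp;	/* rule 1: kstack minus pcb */
	struct toycap csp;		/* live stack pointer */
	struct toycap td_frame;
};

static void
thread_setup(struct toythread *td, uintptr_t kva)
{
	uintptr_t pcb_va = kva + KSTACK_SIZE - PCB_SIZE;

	td->td_kstack = (struct toycap){ kva, KSTACK_SIZE, kva, true };
	td->td_pcb = cap_setbounds(td->td_kstack, pcb_va, PCB_SIZE);
	td->stashed_sp = cap_setbounds(td->td_kstack, kva,
	    KSTACK_SIZE - PCB_SIZE);
	td->stashed_sp.addr = pcb_va;	/* stack grows down below the pcb */
}

/* Rule 4: carve td_frame off the top, then shrink csp to exclude it. */
static void
trap_enter_user(struct toythread *td)
{
	uintptr_t frame_va = td->stashed_sp.addr - FRAME_SIZE;

	td->td_frame = cap_setbounds(td->stashed_sp, frame_va, FRAME_SIZE);
	td->csp = cap_setbounds(td->stashed_sp, td->stashed_sp.base,
	    frame_va - td->stashed_sp.base);
	td->csp.addr = frame_va;
}

/* Rule 5: restore csp bounds to include td_frame again. */
static void
trap_exit_user(struct toythread *td)
{
	td->csp = cap_setbounds(td->td_kstack, td->td_kstack.base,
	    KSTACK_SIZE - PCB_SIZE);
	td->csp.addr = td->td_kstack.base + KSTACK_SIZE - PCB_SIZE;
}
```

Trying to widen the shrunk csp directly, for example `cap_setbounds(td.csp, kva, KSTACK_SIZE - PCB_SIZE)`, yields an untagged result in this model, which is why some broader capability has to be kept around for the exit path.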

Note that this is all gated by the CHERI_BOUNDED_KSTACK option, because I'd like to be able to measure the difference for dissertation purposes. Also, this is an intermediate step towards using local/global for kernel capability flow enforcement, which is currently WIP.

jrtc27 · 2026-02-26T15:04:17Z

Hm, I'm not immediately convinced the kernframe stuff is worth it

qwattash · 2026-02-26T15:12:43Z

> Hm, I'm not immediately convinced the kernframe stuff is worth it

I asked myself a similar question, hence the kernel option to enable/disable it.

However, this is partially necessary for my local/global patches, where I'd like td_frame to not have STORE_LOCAL_CAP permission.
The current patch still does not separate td_frame, but should make it easier to split out. The alternative is to do some bounds-setting in the trap handler and avoid some of this trouble.
I'm not entirely sure whether this is the best way to handle this, but I'd like to avoid installing a kernel stack with full bounds in some form or another.
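The local/global goal can be illustrated with a toy permission check: a capability without the GLOBAL permission (a "local" capability) may only be stored through an authorizing capability that carries a store-local permission. If td_frame lacks that permission while kernel stack pointers are local, the kernel stack pointer cannot be spilled into td_frame. This is a minimal sketch of that CHERI rule as I understand it, not CheriBSD code; `PERM_*`, `struct toycap`, `cap_andperm()` and `cap_store_allowed()` are hypothetical names, and the real permission encodings differ between CHERI-RISC-V and Morello.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy permission bits; names are illustrative, not the CHERI macros. */
#define PERM_GLOBAL		(1u << 0)
#define PERM_STORE		(1u << 1)
#define PERM_STORE_CAP		(1u << 2)
#define PERM_STORE_LOCAL	(1u << 3)

struct toycap {
	uint32_t perms;
	bool tag;
};

/* Like an and-permissions operation: permissions can only be removed. */
static struct toycap
cap_andperm(struct toycap c, uint32_t mask)
{
	c.perms &= mask;
	return (c);
}

/*
 * A capability store needs STORE and STORE_CAP on the authority; storing a
 * tagged local (non-GLOBAL) value additionally needs STORE_LOCAL.
 */
static bool
cap_store_allowed(struct toycap auth, struct toycap val)
{
	const uint32_t need = PERM_STORE | PERM_STORE_CAP;

	if (!auth.tag || (auth.perms & need) != need)
		return (false);
	if (val.tag && (val.perms & PERM_GLOBAL) == 0 &&
	    (auth.perms & PERM_STORE_LOCAL) == 0)
		return (false);
	return (true);
}
```

In this model, deriving td_frame with `cap_andperm(kstack, ~PERM_STORE_LOCAL)` still lets global user capabilities be saved into it while rejecting local kernel stack capabilities.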

sys/riscv/include/pcb.h


This enables tight bounds on the kernel stack sub-allocations. In particular, the pcb, the kernframe and the actual kernel stack are now fully separated. The td_kstack capability retains full bounds. This modifies the trap handlers to stash a pointer to the kernframe structure in sscratchc instead of the kernel stack pointer. The kernel stack and the pcpu capabilities are recovered from the kernframe structure, without assuming that out-of-bounds access is possible.

This simplifies the management of sscratch by leaving it unchanged. In user mode, sscratch still holds the kstack capability, with bounds covering everything except the pcb, and the trap handler can use it to access the kernframe and trapframe. Before entering the C exception handler, the trap handler sets narrower bounds on the trapframe and on the stack pointer installed in csp.


This enables tight bounds on the kernel stack sub-allocations. In particular, the pcb and the actual kernel stack are now fully separated. The td_kstack capability retains full bounds. The exception handlers are modified to re-derive the trapframe and kernel stack capabilities from the root td_kstack capability.

bsdjhb

Did you consider the upstream approach used for struct pcb on amd64, where it is now just part of struct mdthread as the md_pcb field? That could also be upstreamed to FreeBSD, which might reduce our diff and simplify this change a bit, since you would no longer have to worry about the pcb.
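To make the suggestion concrete: if the pcb is embedded in the thread's machine-dependent data, td_pcb no longer points into the kstack, so the kstack bounds only have to account for the stack and trapframe. This is a sketch with heavily simplified stand-in structs; apart from the `md_pcb`, `td_md`, `td_pcb` and `td_kstack` names mentioned in this thread, the fields and the `cpu_thread_alloc_sketch()` and `kstack_contains()` helpers are hypothetical and do not reproduce the real amd64 or RISC-V definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; field sets are illustrative only. */
struct pcb {
	uint64_t pcb_ra;
	uint64_t pcb_sp;
};

struct mdthread {
	int md_spinlock_count;
	struct pcb md_pcb;	/* pcb lives in the thread, not on the kstack */
};

struct thread {
	struct mdthread td_md;
	struct pcb *td_pcb;
	char *td_kstack;
	size_t td_kstack_size;
};

/* Hypothetical counterpart of cpu_thread_alloc() under this layout. */
static void
cpu_thread_alloc_sketch(struct thread *td, char *kstack, size_t size)
{
	td->td_kstack = kstack;
	td->td_kstack_size = size;
	td->td_pcb = &td->td_md.md_pcb;
}

static bool
kstack_contains(const struct thread *td, const void *p)
{
	uintptr_t va = (uintptr_t)p, lo = (uintptr_t)td->td_kstack;

	return (va >= lo && va < lo + td->td_kstack_size);
}
```

With this layout there is no pcb sub-allocation left to bound inside the kstack.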

bsdjhb · 2026-03-25T13:21:10Z

sys/riscv/riscv/exception.S

+	 * user stack pointer while we keep kernelframe in sscratchc.
+	 */
+.if \mode == 0
+	/* Stash user ctp in kframe stash and place kframe ptr in ctp */


Why ctp rather than ct0 similar to the block of code above for hybrid kernels when deriving csp from ddc?

bsdjhb · 2026-03-25T13:23:58Z

sys/riscv/riscv/exception.S

+	/* Stash user ctp in kframe stash and place kframe ptr in ctp */
+	csc	ctp, (KF_SCRATCH)(csp)
+	cmove	ctp, csp
+	/* Fetch real kstack from kframe */


Normal style(9) is to put a blank line before comments. We omit it when there is already an effective break in the flow due to C preprocessor or assembly macro conditionals.

bsdjhb · 2026-03-25T13:27:47Z

sys/arm64/arm64/trap.c

 	KASSERT((uintptr_t)get_pcpu() >= VM_MIN_KERNEL_ADDRESS,
 	    ("Invalid pcpu address from userland: %p (tpidr 0x%lx)",
 	     get_pcpu(), READ_SPECIALREG(tpidr_el1)));
+#ifdef CHERI_BOUNDED_KSTACK


Is this commit separate so that reviewers can evaluate both approaches? If so, if you end up choosing this version, please squash this down into the previous commit and revise the log message to reflect the end result. I'm not sure it's worth having the in-between stage in the history as-is.

bsdjhb · 2026-03-25T13:33:25Z

sys/riscv/riscv/exception.S

 .macro load_registers mode
+#ifdef CHERI_BOUNDED_KSTACK
+.if \mode == 0
+	/*


Did you consider just saving the full kstack cap you need in a new field in td_md that you can reload here so you don't have to do all this computation on each syscall exit, only when creating a new thread?
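The suggested alternative trades a little per-thread storage for less work on the hot path: derive the stack capability once at thread creation, store it in td_md, and simply reload it on syscall exit. This is a minimal sketch using a toy bounds model; `struct toycap`, `cap_setbounds()`, the `md_kstack_sp` field and the `*_sketch()` functions are hypothetical, and the sizes in the usage are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy capability model for illustration only; not the CHERI C API. */
struct toycap {
	uintptr_t base;
	size_t len;
	uintptr_t addr;
	bool tag;
};

/* Set bounds to [addr, addr + len); growing past c clears the tag. */
static struct toycap
cap_setbounds(struct toycap c, uintptr_t addr, size_t len)
{
	bool ok = c.tag && addr >= c.base && addr + len <= c.base + c.len;
	struct toycap r = { addr, len, addr, ok };

	return (r);
}

struct mdthread_sketch {
	struct toycap md_kstack_sp;	/* hypothetical cached stack capability */
};

struct thread_sketch {
	struct toycap td_kstack;	/* root capability, full bounds */
	struct mdthread_sketch td_md;
	struct toycap csp;		/* live stack pointer */
};

/* Thread creation: derive the stack capability (kstack minus pcb) once. */
static void
thread_alloc_sketch(struct thread_sketch *td, uintptr_t kva,
    size_t kstack_size, size_t pcb_size)
{
	td->td_kstack = (struct toycap){ kva, kstack_size, kva, true };
	td->td_md.md_kstack_sp = cap_setbounds(td->td_kstack, kva,
	    kstack_size - pcb_size);
	td->td_md.md_kstack_sp.addr = kva + kstack_size - pcb_size;
}

/* Syscall exit: one reload instead of recomputing bounds each time. */
static void
syscall_exit_sketch(struct thread_sketch *td)
{
	td->csp = td->td_md.md_kstack_sp;
}
```

The cost is one extra capability per thread; the benefit is that the exit path no longer needs any bounds arithmetic.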

bsdjhb · 2026-03-25T13:36:16Z

sys/riscv/riscv/exception.S

 #include <machine/trap.h>
 #include <machine/riscvreg.h>

 .macro save_registers mode


I would add a comment here, above the start of the macro, stating that in the CHERI_BOUNDED_KSTACK case it intentionally returns a bounded pointer to the created trapframe in cs0 for use by callers, instead of documenting that in the callers.

bsdjhb · 2026-03-25T13:37:27Z

sys/riscv/riscv/vm_machdep.c

 #if __has_feature(capabilities)
 	p2->p_md.md_sigcode = td1->td_proc->p_md.md_sigcode;
 #endif
+#ifdef __CHERI_PURE_CAPABILITY__


Should this be part of the earlier commit that cleared the permissions? (And can we add a similar assert on Morello?)
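Pairing the permission strip with its assertion keeps the invariant next to the code that establishes it, and an architecture-neutral check of this shape could back both the RISC-V assertion and the requested Morello one. A minimal sketch with a toy permission model; `PERM_*`, `struct toycap`, `kstack_cap_from_kva()` and `kstack_cap_valid()` are hypothetical names, not the CheriBSD helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy permission bits; names are illustrative, not the CHERI macros. */
#define PERM_LOAD	(1u << 0)
#define PERM_STORE	(1u << 1)
#define PERM_EXECUTE	(1u << 2)

struct toycap {
	uint32_t perms;
	bool tag;
};

static struct toycap
cap_andperm(struct toycap c, uint32_t mask)
{
	c.perms &= mask;	/* permissions can only be removed */
	return (c);
}

/* Allocation side: drop execute when deriving the kstack capability. */
static struct toycap
kstack_cap_from_kva(struct toycap kva_cap)
{
	return (cap_andperm(kva_cap, ~PERM_EXECUTE));
}

/* Check side: what a KASSERT on the kstack capability would verify. */
static bool
kstack_cap_valid(struct toycap kstack)
{
	const uint32_t rw = PERM_LOAD | PERM_STORE;

	return (kstack.tag && (kstack.perms & PERM_EXECUTE) == 0 &&
	    (kstack.perms & rw) == rw);
}
```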

qwattash force-pushed the bounded-kstack branch 2 times, most recently from 862de4d to cb57508, March 6, 2026 18:44

qwattash marked this pull request as ready for review March 6, 2026 18:50

qwattash requested review from brooksdavis, bsdjhb and jrtc27 March 6, 2026 18:50

qwattash commented Mar 16, 2026


sys/riscv/include/pcb.h (outdated, resolved)

qwattash added 10 commits March 23, 2026 15:43

cheri: Introduce CHERI_BOUNDED_KSTACK option.

ee3d67b

This is used to enable tight bounds on data structures that share the kernel stack allocation, for example, struct pcb.

vm: Strip executable permission bit from kernel stack capabilities.

b254ec0

vm: Set bounds on thread0 pcb.

c4fc0e6

cheri riscv: Add kstack bounds assertions in do_trap_user.

6921a40

This is perhaps not the optimal place for these assertions, however these should hold every time we enter the kernel and check that the thread kstack context is in a consistent state.

cheri riscv: Assert no executable permission on kernel stack capability.

40a3aa5

cheri riscv: Enable CHERI_BOUNDED_KSTACK by default for purecap kernels.

fb4fee4

cheri morello: Enable CHERI_BOUNDED_KSTACK by default on morello.

5cf5537
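The "kstack bounds assertions in do_trap_user" commit checks that the thread's kstack context is consistent on every kernel entry. The invariants implied by the description can be written as one predicate: every sub-capability lies within td_kstack, and the stack pointer overlaps neither the pcb nor td_frame. This is a sketch using a toy bounds model; `struct toycap`, `struct thread_sketch`, `cap_within()`, `cap_overlaps()` and `kstack_consistent()` are hypothetical, and in the kernel the stack pointer would come from the current csp rather than a struct field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy capability model for illustration only; not the CHERI C API. */
struct toycap {
	uintptr_t base;
	size_t len;
	bool tag;
};

struct thread_sketch {
	struct toycap td_kstack;	/* root: whole kstack allocation */
	struct toycap td_pcb;
	struct toycap td_frame;
	struct toycap csp;		/* stack pointer seen by the C handler */
};

static bool
cap_within(struct toycap inner, struct toycap outer)
{
	return (inner.tag && outer.tag && inner.base >= outer.base &&
	    inner.base + inner.len <= outer.base + outer.len);
}

static bool
cap_overlaps(struct toycap a, struct toycap b)
{
	return (a.base < b.base + b.len && b.base < a.base + a.len);
}

/* Invariants a do_trap_user()-style KASSERT could check on entry. */
static bool
kstack_consistent(const struct thread_sketch *td)
{
	return (cap_within(td->td_pcb, td->td_kstack) &&
	    cap_within(td->td_frame, td->td_kstack) &&
	    cap_within(td->csp, td->td_kstack) &&
	    !cap_overlaps(td->csp, td->td_pcb) &&
	    !cap_overlaps(td->csp, td->td_frame));
}
```

A csp that still covers td_frame, for example because the entry path forgot to shrink it, fails the last check.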

qwattash force-pushed the bounded-kstack branch from cb57508 to 5cf5537, March 23, 2026 15:43

bsdjhb reviewed Mar 25, 2026

qwattash wants to merge 10 commits into dev from bounded-kstack

3 participants