-
Notifications
You must be signed in to change notification settings - Fork 62
Per-page capability-dirty tracking, take N+1 #405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Various stylistic comments. There's also inconsistency about |
sys/mips/mips/trap.c
Outdated
|
||
if (try_emulate_capdirty(trapframe, NULL, | ||
&p->p_vmspace->vm_pmap) == 0) { | ||
if (!usermode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We know we're in a T_USER case, and there's no fallthrough (nor any gotos) to get us here otherwise, so this should always be true. You're also not passing kernel_pmap
anyway for !usermode
, and maybe this means you can get rid of the if (kpmap)
check? Though maybe you can still trigger CP2 TLB faults for kernel VAs from usermode...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
I'm not sure whether we can get a C2E from user PCs on kernel VAs (I'd hope the segment checks fire first), but it seems like a reasonable defense in depth to only pass the kpmap in when coming from a kernel PC (in the prior call to try_emulate_capdirty), as there's certainly no reason to use it in the user PC case.
3912449
to
8ecc0aa
Compare
There appears to be some bug in my understanding, still, as when I try to make this slightly lazier (setting |
sys/vm/vm_page.h
Outdated
* that it may begin tracking capability presence. (This will be more relevant | ||
* once the notion of Capability Load Generations land; at the moment, | ||
* VPO_FASTCAPDIRTY is expected to reflect the VM object-level permission; | ||
* see OBJ_NOSTORETAGS.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment makes it seem like FASTCAPDIRTY corresponds to MCC (the page is permitted to contain capabilities and in fact is known to maybe contain capabilities), but the code seems to be using FASTCAPDIRTY like ASC (the page is permitted to store capabilities) - e.g. in init_pte_prot.
sys/mips/mips/pmap.c
Outdated
* the target page. | ||
*/ | ||
if ((src->a.flags & PGA_CAPDIRTY) || (src->oflags & VPO_FASTCAPDIRTY)) | ||
vm_page_capdirty(dst); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also want to copy FASTCAPDIRTY if it is set in src?
sys/vm/vm_page.h
Outdated
* cleared. Setting this bit happens without the page lock held, but clearing | ||
* it requires synchronization. Specifically, this bit may be cleared only if | ||
* the page is xbusy and the clearer subsequently measures all tag bits on the | ||
* page. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of this comment makes it seem like CAPDIRTY corresponds to an actual dirty bit (tracks cap stores since the last time it was cleared), but the part about requiring measurement of all tag bits on the page makes it seem like it corresponds to MCC (the page is permitted to contain capabilities and in fact is known to maybe contain capabilities). The code seems to be using CAPDIRTY like an actual dirty bit - e.g. init_pte_prot and pmap_is_capdirty. I think it would be useful to introduce another flag that corresponds to MCC.
sys/vm/vm_mmap.c
Outdated
((m->a.flags & PGA_CAPDIRTY) != 0 || | ||
(m->flags & VPO_FASTCAPDIRTY) != 0 || | ||
pmap_is_capdirty(m) || | ||
(m->a.flags & PGA_CAPDIRTY) != 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to confirm that pmap_is_capdirty() changes the value of m->a.flags because the condition is checked twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pmap_is_capdirty
takes the pmap
lock and so is an opportunity for contention, even if it doesn't explicitly change the flag. This was much more important when mincore
was going to be how we did efficient page skipping in userland, so I've just dropped this particular bit of complexity for this PR.
sys/vm/pmap.h
Outdated
@@ -108,7 +108,6 @@ extern vm_offset_t kernel_vm_end; | |||
#define PMAP_ENTER_WIRED 0x00000200 | |||
#if __has_feature(capabilities) | |||
#define PMAP_ENTER_NOLOADTAGS 0x00000400 | |||
#define PMAP_ENTER_NOSTORETAGS 0x00000800 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this flag still potentially useful for setting PTE_CRO and the vm page flag for tracking that permission?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTE_CRO is going to be taken from VPO_CAPSTORE
(what was FASTCAPDIRTY
) or the object's NOSTORETAGS
flag; the pmap
can look up, since it has the vm_page_t
rather than the vm
needing to pull out duplicate information to pass down.
Converting to draft at least until upstream takes and we merge https://reviews.freebsd.org/D25523. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clearly I need to start work on the VM_PROT_CAP_*
changes.
sys/riscv/riscv/pmap.c
Outdated
KASSERT(!(msrca.flags & PGA_CAPSTORE) | ||
|| (mdsta.flags & PGA_CAPSTORE), | ||
("pmap_copy_page_internal CAPSTORE src & !dst")); | ||
KASSERT(!(msrca.flags & PGA_CAPDIRTY) |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
sys/riscv/riscv/pmap.c
Outdated
@@ -4204,7 +4275,7 @@ pmap_clear_modify(vm_page_t m) | |||
va += VM_PAGE_TO_PHYS(m) - PTE_TO_PHYS(oldl2); | |||
l3 = pmap_l2_to_l3(l2, va); | |||
pmap_clear_bits(l3, PTE_D | PTE_W); | |||
vm_page_dirty(m); | |||
pmap_page_dirty(oldl2, m); |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to close all the open comments I think are resolved. I think a few of the ones still open are probably stale.
sys/vm/vm_page.h
Outdated
@@ -433,6 +433,21 @@ extern struct mtx_padalign pa_lock[]; | |||
* PGA_SWAP_FREE is used to defer freeing swap space to the pageout daemon | |||
* when the context that dirties the page does not have the object write lock | |||
* held. | |||
* | |||
* PGA_CAPSTORE right now reflects the object's capability store permission. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, this comment might have been the source of my earlier confusion about PGA_CAPSTORE and the swap case with cap_store_on_alloc, etc. I think the comment should talk more about the intended purpose and probably not claim that is identical to OBJ_HASCAP.
PGA_CAPSTORE is a bit weird, as it feels wrong to have VM_PROT_WRITE_CAP asserted yet not want to set PTE_CW. Would it be better to have the VM strip out VM_PROT_WRITE_CAP for !PGA_CAPSTORE and hide it as an implementation detail? Similarly can we reuse flags to remove some (all?) of the PGA_DIRTY uses in pmap other than calling into VM functions to dirty? There are very few queries of PGA_* in the pmap code and I don't like adding more. |
If ftype is VM_PROT_WRITE | VM_PROT_WRITE_CAP but the PTE has PTE_W clear, we'll accidentally skip over the "goto done" and end up returning 1 as if we did something. For the same ftype, even if PTE_W is set, we won't assert PTE_D. Fix by treating VM_PROT_WRITE as the flag it is rather than a whole value.
* If we're allocating a page to store capabilities to it, assume we're about to do that in the near future and let hardware mark it CAPDIRTY, rather than requiring another trap to mark it CAPSTORE first. * Do the same thing if we're populating a page to store capabilities to it. * If we're on the soft_fast path, we might be upgrading a page from not storing capabilities to storing capabilities, so mark it CAPSTORE here, too.
The swap pager might directly set tags on physical pages, bypassing the MMU. Go ahead and mark the vm_page as CAPDIRTY (and CAPSTORE) in such cases.
While here, move EXCP_STORE_AMO_CAP_PAGE_FAULT traps to the ordinary data_abort() path. This builds directly on VM_PROT_WRITE_CAP, unlike in the MIPS analogue, for which the equivalent of EXCP_STORE_AMO_CAP_PAGE_FAULT is a C2E trap, not a TLB trap.
Now that we're tracking capdirty, it might be informative to show userland. Once upon a time, we thought this might be part of the revoker implementation, but now we think it's just diagnostics.
When first mapping a page with abstract capability store permission, should we map it CAPSTORE? If we think that it's likely to experience actual capability stores soon, yes; if not, then these cap-devoid pages will create extra work for the (eventual) revoker. Add sysctl so we can experiment; default to minimizing traps (vm.capstore_on_alloc=1).
Co-authored-by: Brett Gutstein <[email protected]>
Obviated by #937, or take "N + 2". |
Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Kay Pedersen <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15050 Closes CTSRD-CHERI#405 Closes #13349
A better stab at capability dirty tracking atop
master
's VM bits.Among other things, this switches from the revocation-based names for the MI bits and uses a more natural (or, well, VM-themed) scheme. While here, it obviates the
PMAP_ENTER_NOSTORETAGS
flag in favor of per-page MI bits (VPO_FASTCAPDIRTY
).This seems to boot both purecap and hybrid and run
cheriabitest -u -a
but I'd really enjoy some more eyes on it.