Skip to content

Per-page capability-dirty tracking, take N+1 #405

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 14 commits into from
Closed

Conversation

nwf
Copy link
Member

@nwf nwf commented Mar 23, 2020

A better stab at capability dirty tracking atop master's VM bits.

Among other things, this switches from the revocation-based names for the MI bits and uses a more natural (or, well, VM-themed) scheme. While here, it obviates the PMAP_ENTER_NOSTORETAGS flag in favor of per-page MI bits (VPO_FASTCAPDIRTY).

This seems to boot both purecap and hybrid and run cheriabitest -u -a but I'd really enjoy some more eyes on it.

@jrtc27
Copy link
Member

jrtc27 commented Mar 23, 2020

Various stylistic comments. There's also inconsistency about !pte_test vs pte_test == 0; upstream seems to like the former (and is what you copied for your pmap_emulate_capdirty), but everything else you've added is using the latter, for better or worse; we should probably be consistent and do the latter.


if (try_emulate_capdirty(trapframe, NULL,
&p->p_vmspace->vm_pmap) == 0) {
if (!usermode) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We know we're in a T_USER case, and there's no fallthrough (nor any gotos) to get us here otherwise, so this should always be true. You're also not passing kernel_pmap anyway for !usermode, and maybe this means you can get rid of the if (kpmap) check? Though maybe you can still trigger CP2 TLB faults for kernel VAs from usermode...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

I'm not sure whether we can get a C2E from user PCs on kernel VAs (I'd hope the segment checks fire first), but it seems like a reasonable defense in depth to only pass the kpmap in when coming from a kernel PC (in the prior call to try_emulate_capdirty), as there's certainly no reason to use it in the user PC case.

@nwf nwf force-pushed the capdirty branch 2 times, most recently from 3912449 to 8ecc0aa Compare March 23, 2020 22:56
@nwf nwf requested a review from jrtc27 March 23, 2020 23:19
@nwf
Copy link
Member Author

nwf commented Mar 24, 2020

There appears to be some bug in my understanding, still, as when I try to make this slightly lazier (setting VPO_FASTCAPDIRTY only after a C2E fault), I get some very strange errors that look like they're triggered by something in this series. Still debugging, expect revised submission later this week. (Gotta update the Cornucopia paper first.)

sys/vm/vm_page.h Outdated
* that it may begin tracking capability presence. (This will be more relevant
* once the notion of Capability Load Generations land; at the moment,
* VPO_FASTCAPDIRTY is expected to reflect the VM object-level permission;
* see OBJ_NOSTORETAGS.)
Copy link
Contributor

@brettferdosi brettferdosi Mar 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment makes it seem like FASTCAPDIRTY corresponds to MCC (the page is permitted to contain capabilities and in fact is known to maybe contain capabilities), but the code seems to be using FASTCAPDIRTY like ASC (the page is permitted to store capabilities) - e.g. in init_pte_prot.

* the target page.
*/
if ((src->a.flags & PGA_CAPDIRTY) || (src->oflags & VPO_FASTCAPDIRTY))
vm_page_capdirty(dst);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also want to copy FASTCAPDIRTY if it is set in src?

sys/vm/vm_page.h Outdated
* cleared. Setting this bit happens without the page lock held, but clearing
* it requires synchronization. Specifically, this bit may be cleared only if
* the page is xbusy and the clearer subsequently measures all tag bits on the
* page.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of this comment makes it seem like CAPDIRTY corresponds to an actual dirty bit (tracks cap stores since the last time it was cleared), but the part about requiring measurement of all tag bits on the page makes it seem like it corresponds to MCC (the page is permitted to contain capabilities and in fact is known to maybe contain capabilities). The code seems to be using CAPDIRTY like an actual dirty bit - e.g. init_pte_prot and pmap_is_capdirty. I think it would be useful to introduce another flag that corresponds to MCC.

sys/vm/vm_mmap.c Outdated
((m->a.flags & PGA_CAPDIRTY) != 0 ||
(m->flags & VPO_FASTCAPDIRTY) != 0 ||
pmap_is_capdirty(m) ||
(m->a.flags & PGA_CAPDIRTY) != 0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to confirm that pmap_is_capdirty() changes the value of m->a.flags because the condition is checked twice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pmap_is_capdirty takes the pmap lock and so is an opportunity for contention, even if it doesn't explicitly change the flag. This was much more important when mincore was going to be how we did efficient page skipping in userland, so I've just dropped this particular bit of complexity for this PR.

sys/vm/pmap.h Outdated
@@ -108,7 +108,6 @@ extern vm_offset_t kernel_vm_end;
#define PMAP_ENTER_WIRED 0x00000200
#if __has_feature(capabilities)
#define PMAP_ENTER_NOLOADTAGS 0x00000400
#define PMAP_ENTER_NOSTORETAGS 0x00000800
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this flag still potentially useful for setting PTE_CRO and the vm page flag for tracking that permission?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTE_CRO is going to be taken from VPO_CAPSTORE (what was FASTCAPDIRTY) or the object's NOSTORETAGS flag; the pmap can look up, since it has the vm_page_t rather than the vm needing to pull out duplicate information to pass down.

@nwf nwf requested a review from brettferdosi June 25, 2020 02:19
@nwf nwf changed the base branch from master to dev June 25, 2020 02:21
@nwf nwf marked this pull request as draft June 30, 2020 20:28
@nwf
Copy link
Member Author

nwf commented Jun 30, 2020

Converting to draft at least until upstream takes and we merge https://reviews.freebsd.org/D25523.

Copy link
Collaborator

@bsdjhb bsdjhb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clearly I need to start work on the VM_PROT_CAP_* changes.

KASSERT(!(msrca.flags & PGA_CAPSTORE)
|| (mdsta.flags & PGA_CAPSTORE),
("pmap_copy_page_internal CAPSTORE src & !dst"));
KASSERT(!(msrca.flags & PGA_CAPDIRTY)

This comment was marked as resolved.

@@ -4204,7 +4275,7 @@ pmap_clear_modify(vm_page_t m)
va += VM_PAGE_TO_PHYS(m) - PTE_TO_PHYS(oldl2);
l3 = pmap_l2_to_l3(l2, va);
pmap_clear_bits(l3, PTE_D | PTE_W);
vm_page_dirty(m);
pmap_page_dirty(oldl2, m);

This comment was marked as resolved.

Copy link
Collaborator

@bsdjhb bsdjhb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to close all the open comments I think are resolved. I think a few of the ones still open are probably stale.

sys/vm/vm_page.h Outdated
@@ -433,6 +433,21 @@ extern struct mtx_padalign pa_lock[];
* PGA_SWAP_FREE is used to defer freeing swap space to the pageout daemon
* when the context that dirties the page does not have the object write lock
* held.
*
* PGA_CAPSTORE right now reflects the object's capability store permission.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, this comment might have been the source of my earlier confusion about PGA_CAPSTORE and the swap case with cap_store_on_alloc, etc. I think the comment should talk more about the intended purpose and probably not claim that is identical to OBJ_HASCAP.

@jrtc27
Copy link
Member

jrtc27 commented Feb 19, 2021

PGA_CAPSTORE is a bit weird, as it feels wrong to have VM_PROT_WRITE_CAP asserted yet not want to set PTE_CW. Would it be better to have the VM strip out VM_PROT_WRITE_CAP for !PGA_CAPSTORE and hide it as an implementation detail? Similarly can we reuse flags to remove some (all?) of the PGA_DIRTY uses in pmap other than calling into VM functions to dirty? There are very few queries of PGA_* in the pmap code and I don't like adding more.

@nwf nwf mentioned this pull request Feb 20, 2021
nwf-msr and others added 14 commits March 25, 2021 22:27
If ftype is VM_PROT_WRITE | VM_PROT_WRITE_CAP but the PTE has PTE_W clear, we'll
accidentally skip over the "goto done" and end up returning 1 as if we did
something.  For the same ftype, even if PTE_W is set, we won't assert PTE_D.
Fix by treating VM_PROT_WRITE as the flag it is rather than a whole value.
* If we're allocating a page to store capabilities to it, assume we're about to
  do that in the near future and let hardware mark it CAPDIRTY, rather than
  requiring another trap to mark it CAPSTORE first.

* Do the same thing if we're populating a page to store capabilities to it.

* If we're on the soft_fast path, we might be upgrading a page from not storing
  capabilities to storing capabilities, so mark it CAPSTORE here, too.
The swap pager might directly set tags on physical pages, bypassing the MMU.
Go ahead and mark the vm_page as CAPDIRTY (and CAPSTORE) in such cases.
While here, move EXCP_STORE_AMO_CAP_PAGE_FAULT traps to the ordinary
data_abort() path.  This builds directly on VM_PROT_WRITE_CAP, unlike in the
MIPS analogue, for which the equivalent of EXCP_STORE_AMO_CAP_PAGE_FAULT is a
C2E trap, not a TLB trap.
Now that we're tracking capdirty, it might be informative to show
userland.  Once upon a time, we thought this might be part of the
revoker implementation, but now we think it's just diagnostics.
When first mapping a page with abstract capability store permission, should we
map it CAPSTORE?  If we think that it's likely to experience actual capability
stores soon, yes; if not, then these cap-devoid pages will create extra work for
the (eventual) revoker.  Add sysctl so we can experiment; default to minimizing
traps (vm.capstore_on_alloc=1).
@nwf
Copy link
Member Author

nwf commented Apr 23, 2021

Obviated by #937, or take "N + 2".

@nwf nwf closed this Apr 23, 2021
@nwf nwf deleted the capdirty branch June 17, 2021 10:37
markjdb pushed a commit to markjdb/cheribsd that referenced this pull request Nov 7, 2023
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
Closes CTSRD-CHERI#405
Closes #13349
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants