Commit dcc6501
hansendc authored and Kernel Patches Daemon committed
mm: Add RCU-based VMA lookup helper that waits for writers

== Background ==

There are basically two parallel ways to look up a VMA: the traditional
way, which is protected by mmap_lock, and the per-VMA lock way, which is
based on RCU and refcounts.

== Problem ==

The mmap_lock one is more straightforward to use, but it has a big
disadvantage: it cannot be mixed with page faults, since those can take
mmap_lock for read and deadlock. For example:

	mmap_read_lock(mm);
	// Another thread does mmap_write_lock().
	// New mmap_lock readers are blocked.
	vma = vma_lookup(mm, address);
	// This deadlocks on mmap_read_lock() if it faults:
	copy_from_user(address);
	mmap_read_unlock(mm);

The RCU one can be mixed with faults, but it is not available in all
configs, so all RCU users need to be able to fall back to the
traditional way.

== Solution ==

Add a variant of the RCU-based lookup that waits for writers. This is
basically the same as the existing RCU-based lookup, but it also takes
mmap_lock for read and waits for writers to finish before returning the
VMA.

This has some advantages:

 1. Callers do not need to have a fallback path for when they collide
    with writers.
 2. It can be used in contexts where page faults can happen because it
    can take the mmap_lock for read but never *holds* it.
 3. Its fast path does not require taking mmap_lock for read.

Basically, when applied correctly, this approach results in faster *and*
simpler code.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: linux-mm@kvack.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Todd Kjos <tkjos@android.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Ahern <dsahern@kernel.org>
Cc: netdev@vger.kernel.org

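
The caller-side pattern that advantages 1 and 2 describe can be sketched roughly as follows. This is a user-space model, not kernel code: every `toy_` name is a hypothetical stand-in (`toy_vma_start_read_unlocked()` for the new helper, `toy_vma_end_read()` for `vma_end_read()`), the toy lookup does no real locking, and the array read stands in for a `copy_from_user()` that may fault.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are not the kernel's types. */
struct toy_vma {
	unsigned long start, end;	/* covers [start, end) */
	int readers;			/* read-side reference count */
};

struct toy_mm {
	struct toy_vma *vmas;
	size_t nr_vmas;
	const unsigned char *memory;	/* toy address space, indexed by address */
};

/* Stand-in for the new helper: returns a read-locked VMA or NULL, never "retry". */
static struct toy_vma *toy_vma_start_read_unlocked(struct toy_mm *mm,
						   unsigned long address)
{
	for (size_t i = 0; i < mm->nr_vmas; i++) {
		struct toy_vma *vma = &mm->vmas[i];

		if (address >= vma->start && address < vma->end) {
			vma->readers++;
			return vma;
		}
	}
	return NULL;
}

/* Stand-in for vma_end_read(): drop the read-side reference. */
static void toy_vma_end_read(struct toy_vma *vma)
{
	assert(vma->readers > 0);
	vma->readers--;
}

/* Caller: one lookup, no fallback path, no mmap_lock held while copying. */
static int toy_peek_byte(struct toy_mm *mm, unsigned long address,
			 unsigned char *out)
{
	struct toy_vma *vma = toy_vma_start_read_unlocked(mm, address);

	if (!vma)
		return -EFAULT;

	/* Kernel code would call copy_from_user() here, which may fault. */
	*out = mm->memory[address];

	toy_vma_end_read(vma);
	return 0;
}
```

The point of the shape is that the caller handles only two outcomes, a locked VMA or no VMA, and releases only the per-VMA lock afterwards.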
1 parent 354fd7e commit dcc6501

2 files changed

Lines changed: 30 additions & 0 deletions

include/linux/mmap_lock.h

Lines changed: 3 additions & 0 deletions
@@ -257,6 +257,9 @@ static inline bool vma_start_read_locked(struct vm_area_struct *vma)
 	return vma_start_read_locked_nested(vma, 0);
 }
 
+struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
+					       unsigned long address);
+
 static inline void vma_end_read(struct vm_area_struct *vma)
 {
 	vma_refcount_put(vma);

mm/mmap_lock.c

Lines changed: 27 additions & 0 deletions
@@ -338,6 +338,33 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	return NULL;
 }
 
+/*
+ * Find the VMA covering 'address' and lock it for reading. Waits for writers to
+ * finish if the VMA is being modified. Returns NULL if there is no VMA covering
+ * 'address'.
+ *
+ * Use only in code paths where no mmap_lock and no VMA lock is held.
+ *
+ * The fast path does not take mmap_lock.
+ */
+struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
+					       unsigned long address)
+{
+	struct vm_area_struct *vma;
+
+	/* Fast path: return stable VMA covering 'address': */
+	vma = lock_vma_under_rcu(mm, address);
+	if (vma)
+		return vma;
+
+	/* Slow path: preclude VMA writers by getting mmap read lock. */
+	guard(rwsem_read)(&mm->mmap_lock);
+	if (!vma_start_read_locked(vma))
+		return NULL;
+
+	return vma;
+}
+
 static struct vm_area_struct *lock_next_vma_under_mmap_lock(struct mm_struct *mm,
 							    struct vma_iterator *vmi,
 							    unsigned long from_addr)
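
In the hunk above, the slow path is reached only when lock_vma_under_rcu() returned NULL, so 'vma' is still NULL when it is passed to vma_start_read_locked(). The following single-threaded user-space model sketches the fast-path/slow-path flow under the assumption that the intent is to look the VMA up again once mmap_lock is held. All `toy_` names are hypothetical, the `writer_collision` flag is a test hook standing in for a fast-path collision with a writer, and a counter replaces the real read lock, which would block until writers finish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model; none of these are kernel types. */
struct toy_vma {
	unsigned long start, end;	/* covers [start, end) */
	int readers;			/* read-side reference count */
};

struct toy_mm {
	struct toy_vma *vmas;
	size_t nr_vmas;
	bool writer_collision;	/* hook: next fast-path attempt collides with a writer */
	int mmap_read_locks;	/* how often the slow path took the mmap lock */
};

static struct toy_vma *toy_lookup(struct toy_mm *mm, unsigned long address)
{
	for (size_t i = 0; i < mm->nr_vmas; i++)
		if (address >= mm->vmas[i].start && address < mm->vmas[i].end)
			return &mm->vmas[i];
	return NULL;
}

/* Models lock_vma_under_rcu(): NULL on a miss or on a writer collision. */
static struct toy_vma *toy_lock_fast(struct toy_mm *mm, unsigned long address)
{
	struct toy_vma *vma;

	if (mm->writer_collision) {
		mm->writer_collision = false;	/* the writer finishes afterwards */
		return NULL;
	}
	vma = toy_lookup(mm, address);
	if (vma)
		vma->readers++;
	return vma;
}

static struct toy_vma *toy_start_read_unlocked(struct toy_mm *mm,
					       unsigned long address)
{
	struct toy_vma *vma = toy_lock_fast(mm, address);

	if (vma)
		return vma;

	/* Slow path: a real read lock would block here until writers finish. */
	mm->mmap_read_locks++;

	/* Assumed step: look the VMA up again, since the fast path left it NULL. */
	vma = toy_lookup(mm, address);
	if (vma)
		vma->readers++;

	/* The read lock would be dropped here; only the VMA stays locked. */
	return vma;
}

static void toy_end_read(struct toy_vma *vma)
{
	assert(vma->readers > 0);
	vma->readers--;
}
```

In this model a lookup miss also goes through the slow path, which matches the hunk: lock_vma_under_rcu() returns NULL both when no VMA covers the address and when it collides with a writer.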
