Re: [PATCH] mm/mmap_lock: Reset maple state on lock_vma_under_rcu() retry

14 Nov 2025

      * Lorenzo Stoakes lorenzo.stoakes@oracle.com [251114 06:51]:
...
On Thu, Nov 13, 2025 at 12:28:58PM -0500, Liam R. Howlett wrote:
...

Lorenzo Stoakes lorenzo.stoakes@oracle.com [251113 05:45]:

...
On Thu, Nov 13, 2025 at 12:04:19AM +0000, Matthew Wilcox wrote:
...
On Wed, Nov 12, 2025 at 03:06:38PM +0000, Lorenzo Stoakes wrote:
...
...
Any time the rcu read lock is dropped, the maple state must be
invalidated.  Resetting the address and state to MA_START is the safest
course of action, which will result in the next operation starting from
the top of the tree.
Since we all missed it I do wonder if we need some super clear comment
saying 'hey if you drop + re-acquire RCU lock you MUST revalidate mas state
by doing 'blah'.
I mean, this really isn't an RCU thing.  This is also bad:
spin_lock(a);
   p = *q;
   spin_unlock(a);
   spin_lock(a);
   b = *p;
p could have been freed while you didn't hold lock a.  Detecting this
kind of thing needs compiler assistence (ie Rust) to let you know that
you don't have the right to do that any more.
Right but in your example the use of the pointers is _realy clear_. In the
mas situation, the pointers are embedded in the helper struct, there's a
state machine, etc. so it's harder to catch this.
We could modify the above example to use a helper struct and the same
problem would arise...
I disagree.
It's a helper struct with a state machine, manipulated by API functions. In fact
it turns out we _can_ recover this state even after dropping/reacquiring the
lock by calling the appropriate API functions to do so.
You manipulate this state via mas_xxx() commands, and in fact we resolve the
lock issue by issuing the correct one.
The state is never recovered.. it's re-established entirely.
What is happening is, we are walking a tree data structure and keeping
tack of where we are by keeping a pointer to the node.  This node
remains stable as long as the rcu or write lock is held for the tree.
If you are not unlocking, you could see how keeping the node for
prev/next operations would be much faster.. it's just one pointer away.
When you drop whatever lock you are holding, that node may disappear,
which is what happened in this bug.
When you mas_reset() or mas_set() or mas_set_range(), then you are
setting the node in the maple state to MA_START.  Any operation you call
from then on will start over (get the root node and re-walk the tree).
So, calling the reset removes any potentially stale pointers but does
not restore any state.
mas_set()/mas_set_range() sets the index and last to what you are
looking for, which is part of the state.  The operations will set the
index/last to the range it finds on a search.  In the vma case, this
isn't very useful since we have vm_start/vm_end.
The state is re-established once you call the api to find something
again.
This is, imo, very close to having a vma in a helper struct, then
calling a function that drops the mmap lock, reacquires the lock, and
continues to use the vma.  The only way to restore the vma helper struct
to a safe state is to do the vma lookup again and replace the
(potentially) stale vma pointer.
If, say, for some reason, during copy_vma() we needed to drop the lock
after vma_merge_new_range().  We'd have to restore vmg->target to
whatever it was pointed to by vmg->start.. but vmg->start might not be
right if vmg->target was set to the previous vma.  We'd have to set
vmg->target = vma_lookup(vmg->orig_start) or such, then re-evaluate the
merging scenario.
I don't really see a difference in mas->node being invalid after a
locking dance vs vmg->target being invalid if there was a locking dance.
I also think it's fair to say that vma_merge_new_range() is an api that
copy_vma() is using.
I do see that hiding it in an API could be missed, but the API exists
because the mas struct is used in a lot of places that are in and around
locking like this.
I'll add to the documentation, but I suspect it won't really help.
...
Thanks,
Liam

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH] mm/mmap_lock: Reset maple state on lock_vma_under_rcu() retry