* Lorenzo Stoakes lorenzo.stoakes@oracle.com [251112 10:06]:
+cc Paul for hare-brained idea
On Tue, Nov 11, 2025 at 04:56:05PM -0500, Liam R. Howlett wrote:
The retry in lock_vma_under_rcu() drops the rcu read lock before reacquiring the lock and trying again. This may cause a use-after-free if the maple node the maple state was using was freed.
The maple state is protected by the rcu read lock. When the lock is dropped, the state cannot be reused as it tracks pointers to objects that may be freed during the time where the lock was not held.
Any time the rcu read lock is dropped, the maple state must be invalidated. Resetting the address and state to MA_START is the safest course of action, which will result in the next operation starting from the top of the tree.
Since we all missed it I do wonder if we need some super clear comment saying 'hey if you drop + re-acquire RCU lock you MUST revalidate mas state by doing 'blah'.
I think one source of confusion for me with maple tree operations is - what to do if we are in a position where some kind of reset is needed?
So even if I'd realised 'aha we need to reset this' it wouldn't be obvious to me that we ought to set to the address.
I guess a mas_reset() would keep mas->index, last where they where which also wouldn't be right would it?
mas->index and mas->last are updated to the values of the entry you found. So if you ran a mas_find(), the operation will continue until the limit is hit, or if you did a next/prev the address would be lost.
This is why I say mas_set() is safer, because it will ensure we return to the same situation we started from, regardless of the operation.
In which case a mas_reset() is _also_ not a valid operation if invoked after dropping/reacquiring the RCU lock right?
In this case we did a mas_walk(), so a mas_reset() would be fine here.
Prior to commit 0b16f8bed19c ("mm: change vma_start_read() to drop RCU lock on failure"), the rcu read lock was dropped and NULL was returned, so the retry would not have happened. However, now that the read lock is dropped regardless of the return, we may use a freed maple tree node cached in the maple state on retry.
Cc: Suren Baghdasaryan surenb@google.com Cc: stable@vger.kernel.org Fixes: 0b16f8bed19c ("mm: change vma_start_read() to drop RCU lock on failure") Reported-by: syzbot+131f9eb2b5807573275c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=131f9eb2b5807573275c Signed-off-by: Liam R. Howlett Liam.Howlett@oracle.com
The reasoning seems sensible & LGTM, so:
Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
mm/mmap_lock.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c index 39f341caf32c0..f2532af6208c0 100644 --- a/mm/mmap_lock.c +++ b/mm/mmap_lock.c @@ -257,6 +257,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, if (PTR_ERR(vma) == -EAGAIN) { count_vm_vma_lock_event(VMA_LOCK_MISS); /* The area was replaced with another one */
mas_set(&mas, address);I wonder if we could detect that the RCU lock was released (+ reacquired) in mas_walk() in a debug mode, like CONFIG_VM_DEBUG_MAPLE_TREE?
Not sure if that's feasible, maybe Paul can comment? :)
I think Vlastimil made a similar kind of comment possibly off-list.
Would there be much overhead if we just did this:
retry: rcu_read_lock(); mas_set(&mas, address); vma = mas_walk(&mas);
The retry path will be super rare, and I think the compiler should be smart enough to not assign index, last twice and this would protect us.
This is what existed before the 0b16f8bed19c change, which was introduced to try and avoid exactly these issues.
I think there's no real way to avoid the complications of an rcu data structure. We've tried to make the interface as simple as possible, and in doing so, have hidden the implementation details of what happens in the 'state' - which is where all these troubles arise.
I can add more documentation around the locking and maple state, hopefully people will find it useful and not just exist to go out of date.
Then we could have some function like:
mas_walk_from(&mas, address);
That did this.
Or, since we _literally_ only use mas for this one walk, have a simple function like:
/** * ... * Performs a single walk of a maple tree to the specified address, * returning a pointer to an entry if found, or NULL if not. * ... */ static void *mt_walk(struct maple_tree *mt, unsigned long address) { MA_STATE(mas, mt, address, adderss);
lockdep_assert_in_rcu_read_lock(); return mas_walk(&mas);}
That literally just does the walk as a one-off?
You have described mtree_load(). The mas_ interfaces are designed for people to handle the locks themselves. The mtree_ interface handles the locking for people.
I don't think we are using the simple interface because we are using the rcu read lock for the vma as well.
If you want to use the simple mtree_load() interface here, I think we'll need two rcu_read_lock()/unlock() calls (nesting is supported fwiu). I don't think we want to nest the locks in this call path though.
...
Thanks, Liam