On Jan 17, 2021, at 1:16 AM, Yu Zhao <yuzhao@google.com> wrote:
On Sat, Jan 16, 2021 at 11:32:22PM -0800, Nadav Amit wrote:
On Jan 16, 2021, at 8:41 PM, Yu Zhao <yuzhao@google.com> wrote:
On Tue, Jan 12, 2021 at 09:43:38PM +0000, Will Deacon wrote:
On Tue, Jan 12, 2021 at 12:38:34PM -0800, Nadav Amit wrote:
On Jan 12, 2021, at 11:56 AM, Yu Zhao <yuzhao@google.com> wrote:
On Tue, Jan 12, 2021 at 11:15:43AM -0800, Nadav Amit wrote:

I will send an RFC soon for per-table deferred TLB flush tracking. The basic idea is to save a generation in the page-struct that tracks when a deferred PTE change took place, and to track whenever a TLB flush completed. In addition, other users - such as mprotect - would use the tlb_gather interface.

Unfortunately, due to limited space in the page-struct this would only be possible for 64-bit (and my implementation is only for x86-64).
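
[ For readers skimming the thread: a minimal sketch of the scheme being
described, under the stated 64-bit constraint. All names here (pt_sketch,
deferred_tlb_gen, defer_pte_flush(), pte_flush_pending()) are illustrative
only, not from any posted patch. ]

/* 64-bit only: the page-struct has a spare word to hold the generation. */
struct pt_sketch {
	u64 deferred_tlb_gen;	/* mm-wide tlb_gen when a PTE change was deferred */
};

/* Record that a PTE change in this page table has not been flushed yet. */
static void defer_pte_flush(struct pt_sketch *pt, u64 cur_gen)
{
	pt->deferred_tlb_gen = cur_gen;
}

/*
 * A flush is still owed for this table if the last completed flush
 * generation has not caught up with the deferred one.
 */
static bool pte_flush_pending(struct pt_sketch *pt, u64 completed_gen)
{
	return pt->deferred_tlb_gen > completed_gen;
}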
I don't want to discourage you, but I don't think this would end up well. PPC doesn't necessarily follow the one-page-struct-per-table rule, and I've run into problems with this before while trying to do something similar.
Discourage, discourage. Better now than later.
It will be relatively easy to extend the scheme to be per-VMA instead of per-table for architectures that prefer it this way. It does require TLB-generation tracking though, which Andy only implemented for x86, so I will focus on x86-64 right now.
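
[ For reference, the x86 TLB-generation tracking referred to here,
condensed from arch/x86/include/asm/tlbflush.h: ]

static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
{
	/*
	 * Bump the mm-wide generation. atomic64_inc_return() is a full
	 * memory barrier; each CPU keeps a per-CPU copy of the generation
	 * and compares it against this value to tell whether its TLB is
	 * stale for this mm.
	 */
	return atomic64_inc_return(&mm->context.tlb_gen);
}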
Can you remind me of what we're missing on arm64 in this area, please? I'm happy to help get this up and running once you have something I can build on.
I noticed arm/arm64 don't support ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH. Would it be something worth pursuing? Arm has been using mm_cpumask, so it might not be too difficult I guess?
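
[ The arch hooks behind ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH are small;
a sketch of what an arm64 port would need to provide, mirroring the x86
shape. The struct matches x86's arch/x86/include/asm/tlbbatch.h; the
flush body and flush_tlb_batched_cpus() are only a guess. ]

struct arch_tlbflush_unmap_batch {
	/* CPUs that may hold stale entries for the unmapped pages. */
	struct cpumask cpumask;
};

static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
					struct mm_struct *mm)
{
	/* Accumulate the CPUs this mm has run on; flushed in one go later. */
	cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
}

void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
{
	/* One flush covering everything batched since the last one. */
	flush_tlb_batched_cpus(&batch->cpumask);	/* illustrative helper */
	cpumask_clear(&batch->cpumask);
}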
[ +Mel Gorman who implemented ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH ]
IIUC, there are at least two bugs in the x86 implementation.
First, there is a missing memory barrier in tlbbatch_add_mm() between inc_mm_tlb_gen() and the read of mm_cpumask().
In arch_tlbbatch_add_mm()? inc_mm_tlb_gen() has builtin barrier as its comment says -- atomic update ops that return values are also full memory barriers.
Yes, you are correct.
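
[ The function in question, condensed from the x86 headers; the full
barrier lives inside inc_mm_tlb_gen(): ]

static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
					struct mm_struct *mm)
{
	/*
	 * inc_mm_tlb_gen() is atomic64_inc_return() underneath, and an
	 * atomic RMW op that returns a value is a full memory barrier,
	 * so the generation bump is ordered before the cpumask read
	 * below -- no extra smp_mb() needed.
	 */
	inc_mm_tlb_gen(mm);
	cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
}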
Second, try_to_unmap_flush() clears flush_required after flushing. Another thread can call set_tlb_ubc_flush_pending() after the flush and before flush_required is cleared, and the indication that a TLB flush is pending can be lost.
This isn't a problem either because flush_required is per thread.
Sorry, I meant mm->tlb_flush_batched. It is not per-thread. flush_tlb_batched_pending() clears it after the flush, so an indication that set_tlb_ubc_flush_pending() sets in between can be lost.
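
[ To make the lost-update window concrete, condensed from mm/rmap.c of
that era; (A) and (B) mark the racing accesses: ]

static void set_tlb_ubc_flush_pending(struct mm_struct *mm, bool writable)
{
	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;

	arch_tlbbatch_add_mm(&tlb_ubc->arch, mm);
	tlb_ubc->flush_required = true;

	/* (A) note that this mm has a batched, not-yet-flushed unmap */
	mm->tlb_flush_batched = true;

	if (writable)
		tlb_ubc->writable = true;
}

void flush_tlb_batched_pending(struct mm_struct *mm)
{
	if (mm->tlb_flush_batched) {
		flush_tlb_mm(mm);

		/*
		 * (B) If another thread runs (A) right here -- after the
		 * flush above, before the store below -- its pending
		 * indication is wiped out without ever being flushed.
		 */
		mm->tlb_flush_batched = false;
	}
}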
I am working on addressing these issues among others, but, as you already saw, I am a bit slow.
On a different but related topic: another thing I noticed is that Arm does not batch TLB flushes across VMAs. Since Arm does not have its own tlb_end_vma(), it uses the default tlb_end_vma(), which flushes each VMA separately. Peter Zijlstra's comment says that there are advantages in flushing each VMA separately, but I am not sure whether it is better or intentional (especially since x86 does not do so).
I am trying to remove the arch-specific tlb_end_vma() and have a config option to control this behavior.
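
[ The default being discussed, condensed from include/asm-generic/tlb.h;
the comment is the one by Peter Zijlstra referred to above: ]

#ifndef tlb_end_vma
static inline void tlb_end_vma(struct mmu_gather *tlb,
			       struct vm_area_struct *vma)
{
	if (tlb->fullmm)
		return;

	/*
	 * Do a TLB flush and reset the range at VMA boundaries; this
	 * avoids the ranges growing with the unused space between
	 * consecutive VMAs, but also the mmu_gather::vma_* flags from
	 * tlb_start_vma() rely on this.
	 */
	tlb_flush_mmu_tlbonly(tlb);
}
#endif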
One thing worth noting is that not all arm/arm64 hardware versions support range flushes (see system_supports_tlb_range()). But IIUC what you are trying to do, this isn't a problem.
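
[ For reference, the capability check splits the arm64 flush path roughly
as below. Only system_supports_tlb_range() is real here; tlbi_range_sketch()
and tlbi_page_sketch() stand in for the actual TLBI instruction sequences: ]

static void flush_range_sketch(unsigned long start, unsigned long end)
{
	if (system_supports_tlb_range()) {
		/* ARMv8.4-TLBI: one ranged invalidation covers it all. */
		tlbi_range_sketch(start, end);
	} else {
		/* Older cores: invalidate one page at a time. */
		unsigned long addr;

		for (addr = start; addr < end; addr += PAGE_SIZE)
			tlbi_page_sketch(addr);
	}
}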
I just wanted to get rid of the arch-specific tlb_start_vma() in order to clean up the code (I am doing the VMA-deferred tracking first, as you asked). While I was doing that, I noticed that Arm does per-VMA TLB flushes. I do not know whether this is intentional, but it seems rather easy to get this behavior unintentionally.