Re: [PATCH 6.6] fork: defer linking file vma until vma is fully initialized

16 Jul 2024

On Mon, 15 Jul 2024 18:06:25 -0700
Axel Rasmussen <axelrasmussen@google.com> wrote:
...
On Mon, Jul 15, 2024 at 3:21 PM Alex Williamson
<alex.williamson@redhat.com> wrote:
...
On Mon, 15 Jul 2024 13:35:41 -0700
Axel Rasmussen <axelrasmussen@google.com> wrote:
...
I tried out Sasha's suggestion. Note that *just* taking
aac6db75a9 ("vfio/pci: Use unmap_mapping_range()") is not sufficient, we also
need b7c5e64fec ("vfio: Create vfio_fs_type with inode per device").
But, the good news is both of those apply more or less cleanly to 6.6. And, at
least under a very basic test which exercises VFIO memory mapping, things seem
to work properly with that change.

I would agree with Leah that these seem a bit big to be stable fixes. But, I'm
encouraged by the fact that Sasha suggested taking them. If there are no big
objections (Alex? :) ) I can send the backport patches this week.

If you were to take those, I think you'd also want:

d71a989cf5d9 ("vfio/pci: Insert full vma on mmap'd MMIO fault")

which helps avoid a potential regression in VM startup latency vs
faulting each page of the VMA.  Ideally we'd have had huge_fault
working for pfnmaps before this conversion to avoid the latter commit.

I'm a bit confused by the lineage here though, 35e351780fa9 ("fork:
defer linking file vma until vma is fully initialized") entered v6.9
whereas these vfio changes all came in v6.10, so why does the v6.6
backport end up with dependencies on these newer commits?  Is there
something that needs to be fixed in v6.9-stable as well?

Right, I believe 35e351780fa9 introduced a bug for VFIO by calling
vm_ops->open() *before* copy_page_range(). So I think this bug affects
not just 6.6 (to which 35e351780fa9 was stable backported) but also
6.9 as you say.
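To make the ordering problem concrete, here is a toy user-space model in C. It is only a sketch and not kernel code: every toy_* name is hypothetical, and it assumes, as described in this thread, that the VFIO open() handler zaps the whole range and that zapping a not-fully-populated range is what trips the warning.

```c
/* Toy user-space model of the fork ordering, NOT kernel code.
 * All toy_* names are hypothetical and exist only for illustration. */
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 4

struct toy_vma {
    bool pte_present[TOY_NR_PAGES]; /* stand-in for the child's page tables */
    int warnings;                   /* stand-in for WARN_ON_ONCE() hits */
};

/* Stand-in for copy_page_range(): populate every entry in the child. */
static void toy_copy_page_range(struct toy_vma *child)
{
    for (size_t i = 0; i < TOY_NR_PAGES; i++)
        child->pte_present[i] = true;
}

/* Stand-in for a vm_ops->open() that zaps the whole VMA.
 * Assumption: zapping a range with missing entries triggers a warning. */
static void toy_open_zap(struct toy_vma *child)
{
    bool hole = false;

    for (size_t i = 0; i < TOY_NR_PAGES; i++) {
        if (!child->pte_present[i])
            hole = true;
        child->pte_present[i] = false;
    }
    if (hole)
        child->warnings++;
}

enum toy_order { TOY_COPY_THEN_OPEN, TOY_OPEN_THEN_COPY };

/* Model the fork path for one VMA and return the number of warnings. */
static int toy_fork(enum toy_order order)
{
    struct toy_vma child = { .warnings = 0 }; /* entries start unpopulated */

    if (order == TOY_COPY_THEN_OPEN) {
        toy_copy_page_range(&child);
        toy_open_zap(&child);
    } else {
        toy_open_zap(&child);
        toy_copy_page_range(&child);
    }
    return child.warnings;
}
```

In this model, toy_fork(TOY_COPY_THEN_OPEN) reports no warnings, while toy_fork(TOY_OPEN_THEN_COPY) reports one because open() zaps before anything has been copied.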
The reason to bring up all these newer commits is, it's unclear how to
fix the bug. :) We thought we had a simple solution to just reorder
when vm_ops->open() is called, but Miaohe pointed out elsewhere in
this thread an issue with doing that.

Assuming the reordering is unworkable, the only other idea I have for
fixing the bug without the larger refactor is:

1. Mark VFIO VMAs VM_WIPEONFORK so we don't copy_page_range after
   vm_ops->open() is called
2. Remove the WARN_ON_ONCE(1) in get_pat_info() so when VFIO zaps a
   not-fully-populated range (expected if we never copy_page_range!) we
   don't get a warning

There are downsides to this fix. It's kind of abusing VM_WIPEONFORK
for a new purpose. It's removing a warning which may catch other
legitimate problems. And it's diverging stable kernels from upstream
as Sasha points out.

Just backporting the refactors fixes (well, totally avoids) the bug,
and it doesn't require special hackery only for stable kernels.

Yes, I'd agree that we want to stay as close as possible to the current
upstream solution, even if we got there pretty haphazardly.  Therefore
it sounds like we should queue the following for v6.9-stable:

d71a989cf5d9 ("vfio/pci: Insert full vma on mmap'd MMIO fault")
aac6db75a9fc ("vfio/pci: Use unmap_mapping_range()")
b7c5e64fecfa ("vfio: Create vfio_fs_type with inode per device")

And then anywhere that 35e351780fa9 ("fork: defer linking file vma
until vma is fully initialized") gets backported, those will also need
to follow.

Did anyone report an issue with 35e351780fa9 and vfio on v6.9 or the
previous v6.6 backport to use as a test case or do we just know it's an
issue from inspection?  The revert only notes an xfstest issue.  Thanks,

Alex
