[Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap

23 Nov 2022

On Wed, 23 Nov 2022 at 14:28, Jason Gunthorpe jgg@ziepe.ca wrote:
...
On Wed, Nov 23, 2022 at 02:12:25PM +0100, Christian König wrote:
...
Am 23.11.22 um 13:53 schrieb Jason Gunthorpe:
...
On Wed, Nov 23, 2022 at 01:49:41PM +0100, Christian König wrote:
...
Am 23.11.22 um 13:46 schrieb Jason Gunthorpe:
...
On Wed, Nov 23, 2022 at 11:06:55AM +0100, Daniel Vetter wrote:
...
> Maybe a GFP flag to set the page reference count to zero or something
> like this?
Hm yeah that might work. I'm not sure what it will all break though?
And we'd need to make sure that underflowing the page refcount dies in
a backtrace.
Mucking with the refcount like this to protect against crazy out of
tree drives seems horrible..
Well not only out of tree drivers. The intree KVM got that horrible
wrong as well, those where the latest guys complaining about it.
kvm was taking refs on special PTEs? That seems really unlikely?
Well then look at this code here:
commit add6a0cd1c5ba51b201e1361b05a5df817083618
Author: Paolo Bonzini pbonzini@redhat.com
Date:   Tue Jun 7 17:51:18 2016 +0200
KVM: MMU: try to fix up page faults before giving up

The vGPU folks would like to trap the first access to a BAR by setting
vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault

handler
    then can use remap_pfn_range to place some non-reserved pages in the
VMA.
This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn
and fixup_user_fault together help supporting it.  The patch also

supports
    VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to
    reference counting.
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Neo Jia <cjia@nvidia.com>
Reported-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

This patch is known to be broken in so many ways. It also has a major
security hole that it ignores the PTE flags making the page
RO. Ignoring the special bit is somehow not surprising :(
This probably doesn't work, but is the general idea of what KVM needs
to do:
Oh dear, when I dug around in there I entirely missed that
kvm_try_get_pfn exists, and it's very broken indeed. kvm really needs
to grow a proper mmu notifier.
Another thing I'm wondering right now, the follow_pte();
fixup_user_fault(); follow_pte(); approach does not make any
guarantees of actually being right. If you're sufficiently unlucky you
might race against an immediate pte invalidate between the fixup and
the 2nd follow_pte(). But you can also not loop, because that would
fail to catch permanent faults.
I think the iommu fault drivers have a similar pattern.
What am I missing here? Or is that also just broken. gup works around
this with the slow path that takes the mmap sem and walking the vma
tree, follow_pte/fixup_user_fautl users dont. Maybe mmu notifier based
restarting would help with this too, if done properly.
-Daniel
...

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1376a47fedeedb..4161241fc3228c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2598,6 +2598,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
                        return r;
        }

  /*


   * Special PTEs are never convertible into a struct page, even if the


   * driver that owns them might have put a PFN with a struct page into


   * the PFNMAP. If the arch doesn't support special then we cannot


   * safely process these pages.


   */



+#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL

  if (pte_special(*ptep))


          return -EINVAL;



+#else

  return -EINVAL;



+#endif

  if (write_fault && !pte_write(*ptep)) {
          pfn = KVM_PFN_ERR_RO_FAULT;
          goto out;




Jason
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

    

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

[Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap