On Wed, Nov 23, 2022 at 03:34:54PM +0100, Daniel Vetter wrote:
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 1376a47fedeedb..4161241fc3228c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2598,6 +2598,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma, return r; }
/*
* Special PTEs are never convertible into a struct page, even if the
* driver that owns them might have put a PFN with a struct page into
* the PFNMAP. If the arch doesn't support special then we cannot
* safely process these pages.
*/
+#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
if (pte_special(*ptep))
return -EINVAL;
On second thought this wont work, because it completely defeats the point of why this code here exists. remap_pfn_range() (which is what the various dma_mmap functions and the ioremap functions are built on top of too) sets VM_PFNMAP too, so this check would even catch the static mappings.
The problem with the way this code is designed is how it allows returning the pfn without taking any reference based on things like !pfn_valid or page_reserved. This allows it to then conditionally put back the reference based on the same reasoning. It is impossible to thread pte special into that since it is a PTE flag, not a property of the PFN.
I don't entirely understand why it needs the page reference at all, even if it is available - so I can't guess why it is OK to ignore the page reference in other cases, or why it is OK to be racy..
Eg hmm_range_fault() does not obtain page references and implements a very similar algorithm to kvm.
Plus these static mappings aren't all that static either, e.g. pci access also can revoke bar mappings nowadays.
And there are already mmu notifiers to handle that, AFAIK.
Jason