On Wed, 23 Nov 2022 at 16:04, Jason Gunthorpe jgg@ziepe.ca wrote:
On Wed, Nov 23, 2022 at 03:28:27PM +0100, Daniel Vetter wrote:
This patch is known to be broken in so many ways. It also has a major security hole that it ignores the PTE flags making the page RO. Ignoring the special bit is somehow not surprising :(
This probably doesn't work, but is the general idea of what KVM needs to do:
Oh dear, when I dug around in there I entirely missed that kvm_try_get_pfn exists, and it's very broken indeed. kvm really needs to grow a proper mmu notifier.
Another thing I'm wondering right now, the follow_pte(); fixup_user_fault(); follow_pte(); approach does not make any guarantees of actually being right. If you're sufficiently unlucky you might race against an immediate pte invalidate between the fixup and the 2nd follow_pte(). But you can also not loop, because that would fail to catch permanent faults.
Yes, it is pretty broken.
kvm already has support for mmu notifiers and uses it for other stuff. I can't remember what exactly this code path was for, IIRC Paolo talked about having a big rework/fix for it when we last talked about the missing write protect. I also vauagely recall he had some explanation why this might be safe.
I think the iommu fault drivers have a similar pattern.
Where? It shouldn't
The common code for SVA just calls handle_mm_fault() and restarts the PRI. Since the page table is physically shared there is no issue with a stale copy.
What am I missing here? Or is that also just broken. gup works around this with the slow path that takes the mmap sem and walking the vma tree, follow_pte/fixup_user_fautl users dont.
follow_pte() is just fundamentally broken, things must not use it.
Maybe mmu notifier based restarting would help with this too, if done properly.
That is called hmm_range_fault()
Ah right I mixed that up on a quick grep, thanks for pointing me in the right direction. Worries appeased. -Daniel