On Tue, Oct 22, 2024, Matthew Wilcox wrote:
> On Tue, Oct 22, 2024 at 08:39:34AM -0700, Sean Christopherson wrote:
> > > Or maybe set VM_SPECIAL in kvm_vcpu_mmap()? I am not sure tbh, but this doesn't seem right.
> >
> > Agreed. VM_DONTEXPAND is the only VM_SPECIAL flag that is remotely appropriate, but setting VM_DONTEXPAND could theoretically break userspace, and other than preventing mlock(), there is no reason why the VMA can't be expanded. I doubt any userspace VMM is actually remapping and expanding a vCPU mapping, but trying to fudge around this outside of core mm/ feels kludgy and has the potential to turn into a game of whack-a-mole.
>
> Actually, VM_PFNMAP is probably ideal. We're not really mapping pages here (I mean, they are pages, but they're not filesystem pages or anonymous pages ... there's no rmap to them). We're mapping blobs of memory whose refcount is controlled by the vma that maps them. We don't particularly want to be able to splice() this memory, or do RDMA to it. We probably do want gdb to be able to read it (... yes?)

More than likely, yes. And we probably want the pages to show up in core dumps, and be gup()-able. I think that's the underlying problem with KVM's pages. In many cases, we want them to show up as vm_normal_page() pages. But for a few things, e.g. mlock(), it's nonsensical because they aren't entirely normal, just mostly normal.

> which might be a complication with a PFNMAP VMA.
>
> We've given a lot of flexibility to device drivers about how they implement mmap() and I think that's now getting in the way of some important improvements. I want to see a simpler way of providing the same functionality, and I'm not quite there yet.