On 17.10.24 19:16, Jason Gunthorpe wrote:
> On Thu, Oct 17, 2024 at 07:11:46PM +0200, David Hildenbrand wrote:
>> On 17.10.24 18:47, Jason Gunthorpe wrote:
>>> On Thu, Oct 17, 2024 at 10:58:29AM -0400, Peter Xu wrote:
>>>> My question was more towards whether gmemfd could still expose the possibility to be used in VA forms to other modules that may not support fd+offsets yet.
>>> I keep hearing they don't want to support page pinning on a guestmemfd mapping, so VA based paths could not work.
>> For shared pages it absolutely must work. That's what I keep hearing :)
> Oh that's confusing. I assume non-longterm pins are desired on shared pages, though??
For I/O from user space to a driver on shared pages, GUP is often required (e.g., O_DIRECT). IIRC this was raised in a session at LPC (someone brought up a use case that involved vhost-user and friends).
Of course, for the guest_memfd use cases where we also want to remove shared pages from the directmap, it's not possible, but let's put that aside (I recall there was a brief discussion at LPC about that: it's tricky for shared memory precisely because of I/O).
Longterm pins would have to be used with care. They are under user-space control, and user space must be aware of the implications. For example, registering shared pages as fixed buffers with liburing is possible, but before a conversion to private is requested, user space must unregister these buffers.
(In VFIO terms, a prior unmap operation would be required.)
Of course, a conversion to private will not work as long as the pages are pinned, and this is under user-space control.
If the guest attempts to perform such a conversion while pages are still pinned, user space will likely be notified (we touched on that today in the upstream call) that something is blocking the conversion of that page. User space then has to fix that up and retry.
It's not expected to matter much in practice, but it can be triggered, so there must be a way to handle it: if a guest triggers a shared->private conversion while I/O is still in flight on the page, something is messed up, and the conversion will be delayed until the I/O is done and the page can be converted.
There are still quite a few things to be clarified, but this is my understanding so far.