On 14/11/2025 15:18, Kalyazin, Nikita wrote:
On systems that support shared guest memory, write() is useful, for example, for populating the initial image. Although the same can be achieved by mapping the memory into userspace and memcpying into it, write() is the more performant option because it does not need to set up user page tables and does not take a page fault for every page as memcpy would. Note that memcpy cannot be accelerated via MADV_POPULATE_WRITE, because MADV_POPULATE_WRITE relies on GUP and is not supported by guest_memfd.
Populating 512MiB of guest_memfd on an x86 machine:
- via memcpy: 436 ms
- via write: 202 ms (-54%)
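For illustration, the two population paths compared above can be sketched roughly as follows. This is a minimal userspace sketch, not code from the series: populate_via_memcpy() and populate_via_write() are hypothetical helper names, the fd is assumed to be any mmap-able and writable file descriptor, and the use of pwrite() with an explicit offset (rather than plain write()) is an assumption. Creating the guest_memfd itself is not shown.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Path 1 (hypothetical helper): map the file into userspace and memcpy
 * into the mapping. User page tables are set up and each page touched
 * for the first time takes a page fault.
 * offset must be page-aligned, which is an mmap() requirement.
 */
static int populate_via_memcpy(int fd, const void *src, size_t len, off_t offset)
{
	void *dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);

	if (dst == MAP_FAILED)
		return -1;
	memcpy(dst, src, len);
	return munmap(dst, len);
}

/*
 * Path 2 (hypothetical helper): hand the buffer to the kernel with
 * pwrite(). No userspace mapping of the file is created.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int populate_via_write(int fd, const void *src, size_t len, off_t offset)
{
	const uint8_t *p = src;

	while (len > 0) {
		ssize_t n = pwrite(fd, p, len, offset);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		if (n == 0) {
			errno = EIO;	/* no progress, give up */
			return -1;
		}
		/* Short write: advance and continue with the remainder. */
		p += n;
		offset += n;
		len -= (size_t)n;
	}
	return 0;
}
```

Both helpers leave the same bytes in the file; they differ only in whether the copy goes through a userspace mapping.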
Only PAGE_ALIGNED offset and len are allowed. Non-aligned writes are technically possible, but once in-place conversion support is implemented [1], the restriction will make handling of mixed shared/private huge pages simpler. write() will only be allowed to populate shared pages.
When direct map removal is implemented [2]:
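A caller may want to validate a range before issuing the write. The check below is only a userspace sketch of the alignment rule described above: range_is_page_aligned() is a hypothetical helper, not part of the series, and the kernel performs its own check regardless. How a zero-length range is treated by the kernel is not stated here; this sketch simply reports it as aligned.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper: true if both offset and len are multiples of
 * page_size. The caller supplies page_size, for example from
 * sysconf(_SC_PAGESIZE).
 */
static bool range_is_page_aligned(off_t offset, size_t len, size_t page_size)
{
	if (page_size == 0 || offset < 0)
		return false;
	return (offset % (off_t)page_size == 0) && (len % page_size == 0);
}
```

With a 4096-byte page size, offset 8192 with len 8192 passes, while offset 4096 with len 100 does not.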
- write() will not be allowed to access pages that have already been removed from direct map
- on completion, write() will remove the populated pages from direct map
While it is technically possible to implement read() syscall on systems with shared guest memory, it is not supported as there is currently no use case for it.
[1] https://lore.kernel.org/kvm/cover.1760731772.git.ackerleytng@google.com
[2] https://lore.kernel.org/kvm/20250924151101.2225820-1-patrick.roy@campus.lmu....
I failed to include links to previous versions:
v7:
 - Sean: add GUEST_MEMFD_FLAG_WRITE and documentation for it
 - Ackerley: only allow PAGE_ALIGNED offset and len
 - Sean/Ackerley: formatting fixes

v6:
 - https://lore.kernel.org/kvm/20251020161352.69257-1-kalyazin@amazon.com
 - Make write support conditional on mmap support instead of relying on
   the up-to-date flag to decide whether writing to a page is allowed
 - James: Remove dependencies on folio_test_large
 - James: Remove page alignment restriction
 - James: Formatting fixes

v5:
 - https://lore.kernel.org/kvm/20250902111951.58315-1-kalyazin@amazon.com
 - Replace the call to the unexported filemap_remove_folio with zeroing
   the bytes that could not be copied
 - Fix checkpatch findings

v4:
 - https://lore.kernel.org/kvm/20250828153049.3922-1-kalyazin@amazon.com
 - Switch from implementing the write callback to write_iter
 - Remove conditional compilation

v3:
 - https://lore.kernel.org/kvm/20250303130838.28812-1-kalyazin@amazon.com
 - David/Mike D: Only compile support for the write syscall if
   CONFIG_KVM_GMEM_SHARED_MEM (now gone) is enabled.

v2:
 - https://lore.kernel.org/kvm/20241129123929.64790-1-kalyazin@amazon.com
 - Switch from an ioctl to the write syscall to implement population

v1:
 - https://lore.kernel.org/kvm/20241024095429.54052-1-kalyazin@amazon.com
Nikita Kalyazin (2):
  KVM: guest_memfd: add generic population via write
  KVM: selftests: update guest_memfd write tests
 Documentation/virt/kvm/api.rst               |  2 +
 include/linux/kvm_host.h                     |  2 +-
 include/uapi/linux/kvm.h                     |  1 +
 .../testing/selftests/kvm/guest_memfd_test.c | 58 +++++++++++++++++--
 virt/kvm/guest_memfd.c                       | 52 +++++++++++++++++
 5 files changed, 108 insertions(+), 7 deletions(-)
base-commit: 8a4821412cf2c1429fffa07c012dd150f2edf78c
2.50.1