This series adds the base support to preserve a VFIO device file across a Live Update. "Base support" means that this allows userspace to safetly preserve a VFIO device file with LIVEUPDATE_SESSION_PRESERVE_FD and retrieve a preserved VFIO device file with LIVEUPDATE_SESSION_RETRIEVE_FD, but the device itself is not preserved in a fully running state across Live Update.
This series unblocks 2 parallel but related streams of work:
- iommufd preservation across Live Update. This work spans iommufd, the IOMMU subsystem, and IOMMU drivers [1]
- Preservation of VFIO device state across Live Update (config space, BAR addresses, power state, SR-IOV state, etc.). This work spans both VFIO and the core PCI subsystem.
While we need all of the above to fully preserve a VFIO device across a Live Update without disrupting the workload on the device, this series aims to be functional and safe enough to merge as the first incremental step toward that goal.
Areas for Discussion --------------------
BDF Stability across Live Update
The PCI support for tracking preserved devices across a Live Update to prevent auto-probing relies on PCI segment numbers and BDFs remaining stable. For now I have disallowed VFs, as the BDFs assigned to VFs can vary depending on how the kernel chooses to allocate bus numbers. For non-VFs I am wondering if there is any more needed to ensure BDF stability across Live Update.
While we would like to support many different systems and configurations in due time (including preserving VFs), I'd like to keep this first serses constrained to simple use-cases.
FLB Locking
I don't see a way to properly synchronize pci_flb_finish() with pci_liveupdate_incoming_is_preserved() since the incoming FLB mutex is dropped by liveupdate_flb_get_incoming() when it returns the pointer to the object, and taking pci_flb_incoming_lock in pci_flb_finish() could result in a deadlock due to reversing the lock ordering.
FLB Retrieving
The first patch of this series includes a fix to prevent an FLB from being retrieved again it is finished. I am wondering if this is the right approach or if subsystems are expected to stop calling liveupdate_flb_get_incoming() after an FLB is finished.
Testing -------
The patches at the end of this series provide comprehensive selftests for the new code added by this series. The selftests have been validated in both a VM environment using a virtio-net PCIe device, and in a baremetal environment on an Intel EMR server with an Intel DSA device.
Here is an example of how to run the new selftests:
vfio_pci_liveupdate_uapi_test:
$ tools/testing/selftests/vfio/scripts/setup.sh 0000:00:04.0 $ tools/testing/selftests/vfio/vfio_pci_liveupdate_uapi_test 0000:00:04.0 $ tools/testing/selftests/vfio/scripts/cleanup.sh
vfio_pci_liveupdate_kexec_test:
$ tools/testing/selftests/vfio/scripts/setup.sh 0000:00:04.0 $ tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test --stage 1 0000:00:04.0 $ kexec [...] # NOTE: distro-dependent
$ tools/testing/selftests/vfio/scripts/setup.sh 0000:00:04.0 $ tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test --stage 2 0000:00:04.0 $ tools/testing/selftests/vfio/scripts/cleanup.sh
Dependencies ------------
This series was constructed on top of several in-flight series and on top of mm-nonmm-unstable [2].
+-- This series | +-- [PATCH v2 00/18] vfio: selftests: Support for multi-device tests | https://lore.kernel.org/kvm/20251112192232.442761-1-dmatlack@google.com/ | +-- [PATCH v3 0/4] vfio: selftests: update DMA mapping tests to use queried IOVA ranges | https://lore.kernel.org/kvm/20251111-iova-ranges-v3-0-7960244642c5@fb.com/ | +-- [PATCH v8 0/2] Live Update: File-Lifecycle-Bound (FLB) State | https://lore.kernel.org/linux-mm/20251125225006.3722394-1-pasha.tatashin@sol... | +-- [PATCH v8 00/18] Live Update Orchestrator | https://lore.kernel.org/linux-mm/20251125165850.3389713-1-pasha.tatashin@sol... |
To simplify checking out the code, this series can be found on GitHub:
https://github.com/dmatlack/linux/tree/liveupdate/vfio/cdev/v1
Changelog ---------
v1: - Rebase series on top of LUOv8 and VFIO selftests improvements - Drop commits to preserve config space fields across Live Update. These changes require changes to the PCI layer. For exmaple, preserving rbars could lead to an inconsistent device state until device BARs addresses are preserved across Live Update. - Drop commits to preserve Bus Master Enable on the device. There's no reason to preserve this until iommufd preservation is fully working. Furthermore, preserving Bus Master Enable could lead to memory corruption when the device if the device is bound to the default identity-map domain after Live Update. - Drop commits to preserve saved PCI state. This work is not needed until we are ready to preserve the device's config space, and requires more thought to make the PCI state data layout ABI-friendly. - Add support to skip auto-probing devices that are preserved by VFIO to avoid them getting bound to a different driver by the next kernel. - Restrict device preservation further (no VFs, no intel-graphics). - Various refactoring and small edits to improve readability and eliminate code duplication.
rfc: https://lore.kernel.org/kvm/20251018000713.677779-1-vipinsh@google.com/
Cc: Saeed Mahameed saeedm@nvidia.com Cc: Adithya Jayachandran ajayachandra@nvidia.com Cc: Jason Gunthorpe jgg@nvidia.com Cc: Parav Pandit parav@nvidia.com Cc: Leon Romanovsky leonro@nvidia.com Cc: William Tu witu@nvidia.com Cc: Jacob Pan jacob.pan@linux.microsoft.com Cc: Lukas Wunner lukas@wunner.de Cc: Pasha Tatashin pasha.tatashin@soleen.com Cc: Mike Rapoport rppt@kernel.org Cc: Pratyush Yadav pratyush@kernel.org Cc: Samiullah Khawaja skhawaja@google.com Cc: Chris Li chrisl@kernel.org Cc: Josh Hilke jrhilke@google.com Cc: David Rientjes rientjes@google.com
[1] https://lore.kernel.org/linux-iommu/20250928190624.3735830-1-skhawaja@google... [2] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/log/?h=mm-nonmm-...
David Matlack (12): liveupdate: luo_flb: Prevent retrieve() after finish() PCI: Add API to track PCI devices preserved across Live Update PCI: Require driver_override for incoming Live Update preserved devices vfio/pci: Notify PCI subsystem about devices preserved across Live Update vfio: Enforce preserved devices are retrieved via LIVEUPDATE_SESSION_RETRIEVE_FD vfio/pci: Store Live Update state in struct vfio_pci_core_device vfio: selftests: Add Makefile support for TEST_GEN_PROGS_EXTENDED vfio: selftests: Add vfio_pci_liveupdate_uapi_test vfio: selftests: Expose iommu_modes to tests vfio: selftests: Expose low-level helper routines for setting up struct vfio_pci_device vfio: selftests: Verify that opening VFIO device fails during Live Update vfio: selftests: Add continuous DMA to vfio_pci_liveupdate_kexec_test
Vipin Sharma (9): vfio/pci: Register a file handler with Live Update Orchestrator vfio/pci: Preserve vfio-pci device files across Live Update vfio/pci: Retrieve preserved device files after Live Update vfio/pci: Skip reset of preserved device after Live Update selftests/liveupdate: Move luo_test_utils.* into a reusable library selftests/liveupdate: Add helpers to preserve/retrieve FDs vfio: selftests: Build liveupdate library in VFIO selftests vfio: selftests: Initialize vfio_pci_device using a VFIO cdev FD vfio: selftests: Add vfio_pci_liveupdate_kexec_test
MAINTAINERS | 1 + drivers/pci/Makefile | 1 + drivers/pci/liveupdate.c | 248 ++++++++++++++++ drivers/pci/pci-driver.c | 12 +- drivers/vfio/device_cdev.c | 25 +- drivers/vfio/group.c | 9 + drivers/vfio/pci/Makefile | 1 + drivers/vfio/pci/vfio_pci.c | 11 +- drivers/vfio/pci/vfio_pci_core.c | 23 +- drivers/vfio/pci/vfio_pci_liveupdate.c | 278 ++++++++++++++++++ drivers/vfio/pci/vfio_pci_priv.h | 16 + drivers/vfio/vfio.h | 13 - drivers/vfio/vfio_main.c | 22 +- include/linux/kho/abi/pci.h | 53 ++++ include/linux/kho/abi/vfio_pci.h | 45 +++ include/linux/liveupdate.h | 3 + include/linux/pci.h | 38 +++ include/linux/vfio.h | 51 ++++ include/linux/vfio_pci_core.h | 7 + kernel/liveupdate/luo_flb.c | 4 + tools/testing/selftests/liveupdate/.gitignore | 1 + tools/testing/selftests/liveupdate/Makefile | 14 +- .../include/libliveupdate.h} | 11 +- .../selftests/liveupdate/lib/libliveupdate.mk | 20 ++ .../{luo_test_utils.c => lib/liveupdate.c} | 43 ++- .../selftests/liveupdate/luo_kexec_simple.c | 2 +- .../selftests/liveupdate/luo_multi_session.c | 2 +- tools/testing/selftests/vfio/Makefile | 23 +- .../vfio/lib/include/libvfio/iommu.h | 2 + .../lib/include/libvfio/vfio_pci_device.h | 8 + tools/testing/selftests/vfio/lib/iommu.c | 4 +- .../selftests/vfio/lib/vfio_pci_device.c | 60 +++- .../vfio/vfio_pci_liveupdate_kexec_test.c | 255 ++++++++++++++++ .../vfio/vfio_pci_liveupdate_uapi_test.c | 93 ++++++ 34 files changed, 1313 insertions(+), 86 deletions(-) create mode 100644 drivers/pci/liveupdate.c create mode 100644 drivers/vfio/pci/vfio_pci_liveupdate.c create mode 100644 include/linux/kho/abi/pci.h create mode 100644 include/linux/kho/abi/vfio_pci.h rename tools/testing/selftests/liveupdate/{luo_test_utils.h => lib/include/libliveupdate.h} (80%) create mode 100644 tools/testing/selftests/liveupdate/lib/libliveupdate.mk rename tools/testing/selftests/liveupdate/{luo_test_utils.c => lib/liveupdate.c} (89%) create mode 100644 tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c create mode 100644 tools/testing/selftests/vfio/vfio_pci_liveupdate_uapi_test.c