This series adds the base support to preserve a VFIO device file across a Live Update. "Base support" means that this allows userspace to safetly preserve a VFIO device file with LIVEUPDATE_SESSION_PRESERVE_FD and retrieve a preserved VFIO device file with LIVEUPDATE_SESSION_RETRIEVE_FD, but the device itself is not preserved in a fully running state across Live Update.
This series unblocks 2 parallel but related streams of work:
- iommufd preservation across Live Update. This work spans iommufd, the IOMMU subsystem, and IOMMU drivers [1]
- Preservation of VFIO device state across Live Update (config space, BAR addresses, power state, SR-IOV state, etc.). This work spans both VFIO and the core PCI subsystem.
While we need all of the above to fully preserve a VFIO device across a Live Update without disrupting the workload on the device, this series aims to be functional and safe enough to merge as the first incremental step toward that goal.
Areas for Discussion --------------------
BDF Stability across Live Update
The PCI support for tracking preserved devices across a Live Update to prevent auto-probing relies on PCI segment numbers and BDFs remaining stable. For now I have disallowed VFs, as the BDFs assigned to VFs can vary depending on how the kernel chooses to allocate bus numbers. For non-VFs I am wondering if there is any more needed to ensure BDF stability across Live Update.
While we would like to support many different systems and configurations in due time (including preserving VFs), I'd like to keep this first serses constrained to simple use-cases.
FLB Locking
I don't see a way to properly synchronize pci_flb_finish() with pci_liveupdate_incoming_is_preserved() since the incoming FLB mutex is dropped by liveupdate_flb_get_incoming() when it returns the pointer to the object, and taking pci_flb_incoming_lock in pci_flb_finish() could result in a deadlock due to reversing the lock ordering.
FLB Retrieving
The first patch of this series includes a fix to prevent an FLB from being retrieved again it is finished. I am wondering if this is the right approach or if subsystems are expected to stop calling liveupdate_flb_get_incoming() after an FLB is finished.
Testing -------
The patches at the end of this series provide comprehensive selftests for the new code added by this series. The selftests have been validated in both a VM environment using a virtio-net PCIe device, and in a baremetal environment on an Intel EMR server with an Intel DSA device.
Here is an example of how to run the new selftests:
vfio_pci_liveupdate_uapi_test:
$ tools/testing/selftests/vfio/scripts/setup.sh 0000:00:04.0 $ tools/testing/selftests/vfio/vfio_pci_liveupdate_uapi_test 0000:00:04.0 $ tools/testing/selftests/vfio/scripts/cleanup.sh
vfio_pci_liveupdate_kexec_test:
$ tools/testing/selftests/vfio/scripts/setup.sh 0000:00:04.0 $ tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test --stage 1 0000:00:04.0 $ kexec [...] # NOTE: distro-dependent
$ tools/testing/selftests/vfio/scripts/setup.sh 0000:00:04.0 $ tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test --stage 2 0000:00:04.0 $ tools/testing/selftests/vfio/scripts/cleanup.sh
Dependencies ------------
This series was constructed on top of several in-flight series and on top of mm-nonmm-unstable [2].
+-- This series | +-- [PATCH v2 00/18] vfio: selftests: Support for multi-device tests | https://lore.kernel.org/kvm/20251112192232.442761-1-dmatlack@google.com/ | +-- [PATCH v3 0/4] vfio: selftests: update DMA mapping tests to use queried IOVA ranges | https://lore.kernel.org/kvm/20251111-iova-ranges-v3-0-7960244642c5@fb.com/ | +-- [PATCH v8 0/2] Live Update: File-Lifecycle-Bound (FLB) State | https://lore.kernel.org/linux-mm/20251125225006.3722394-1-pasha.tatashin@sol... | +-- [PATCH v8 00/18] Live Update Orchestrator | https://lore.kernel.org/linux-mm/20251125165850.3389713-1-pasha.tatashin@sol... |
To simplify checking out the code, this series can be found on GitHub:
https://github.com/dmatlack/linux/tree/liveupdate/vfio/cdev/v1
Changelog ---------
v1: - Rebase series on top of LUOv8 and VFIO selftests improvements - Drop commits to preserve config space fields across Live Update. These changes require changes to the PCI layer. For exmaple, preserving rbars could lead to an inconsistent device state until device BARs addresses are preserved across Live Update. - Drop commits to preserve Bus Master Enable on the device. There's no reason to preserve this until iommufd preservation is fully working. Furthermore, preserving Bus Master Enable could lead to memory corruption when the device if the device is bound to the default identity-map domain after Live Update. - Drop commits to preserve saved PCI state. This work is not needed until we are ready to preserve the device's config space, and requires more thought to make the PCI state data layout ABI-friendly. - Add support to skip auto-probing devices that are preserved by VFIO to avoid them getting bound to a different driver by the next kernel. - Restrict device preservation further (no VFs, no intel-graphics). - Various refactoring and small edits to improve readability and eliminate code duplication.
rfc: https://lore.kernel.org/kvm/20251018000713.677779-1-vipinsh@google.com/
Cc: Saeed Mahameed saeedm@nvidia.com Cc: Adithya Jayachandran ajayachandra@nvidia.com Cc: Jason Gunthorpe jgg@nvidia.com Cc: Parav Pandit parav@nvidia.com Cc: Leon Romanovsky leonro@nvidia.com Cc: William Tu witu@nvidia.com Cc: Jacob Pan jacob.pan@linux.microsoft.com Cc: Lukas Wunner lukas@wunner.de Cc: Pasha Tatashin pasha.tatashin@soleen.com Cc: Mike Rapoport rppt@kernel.org Cc: Pratyush Yadav pratyush@kernel.org Cc: Samiullah Khawaja skhawaja@google.com Cc: Chris Li chrisl@kernel.org Cc: Josh Hilke jrhilke@google.com Cc: David Rientjes rientjes@google.com
[1] https://lore.kernel.org/linux-iommu/20250928190624.3735830-1-skhawaja@google... [2] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/log/?h=mm-nonmm-...
David Matlack (12): liveupdate: luo_flb: Prevent retrieve() after finish() PCI: Add API to track PCI devices preserved across Live Update PCI: Require driver_override for incoming Live Update preserved devices vfio/pci: Notify PCI subsystem about devices preserved across Live Update vfio: Enforce preserved devices are retrieved via LIVEUPDATE_SESSION_RETRIEVE_FD vfio/pci: Store Live Update state in struct vfio_pci_core_device vfio: selftests: Add Makefile support for TEST_GEN_PROGS_EXTENDED vfio: selftests: Add vfio_pci_liveupdate_uapi_test vfio: selftests: Expose iommu_modes to tests vfio: selftests: Expose low-level helper routines for setting up struct vfio_pci_device vfio: selftests: Verify that opening VFIO device fails during Live Update vfio: selftests: Add continuous DMA to vfio_pci_liveupdate_kexec_test
Vipin Sharma (9): vfio/pci: Register a file handler with Live Update Orchestrator vfio/pci: Preserve vfio-pci device files across Live Update vfio/pci: Retrieve preserved device files after Live Update vfio/pci: Skip reset of preserved device after Live Update selftests/liveupdate: Move luo_test_utils.* into a reusable library selftests/liveupdate: Add helpers to preserve/retrieve FDs vfio: selftests: Build liveupdate library in VFIO selftests vfio: selftests: Initialize vfio_pci_device using a VFIO cdev FD vfio: selftests: Add vfio_pci_liveupdate_kexec_test
MAINTAINERS | 1 + drivers/pci/Makefile | 1 + drivers/pci/liveupdate.c | 248 ++++++++++++++++ drivers/pci/pci-driver.c | 12 +- drivers/vfio/device_cdev.c | 25 +- drivers/vfio/group.c | 9 + drivers/vfio/pci/Makefile | 1 + drivers/vfio/pci/vfio_pci.c | 11 +- drivers/vfio/pci/vfio_pci_core.c | 23 +- drivers/vfio/pci/vfio_pci_liveupdate.c | 278 ++++++++++++++++++ drivers/vfio/pci/vfio_pci_priv.h | 16 + drivers/vfio/vfio.h | 13 - drivers/vfio/vfio_main.c | 22 +- include/linux/kho/abi/pci.h | 53 ++++ include/linux/kho/abi/vfio_pci.h | 45 +++ include/linux/liveupdate.h | 3 + include/linux/pci.h | 38 +++ include/linux/vfio.h | 51 ++++ include/linux/vfio_pci_core.h | 7 + kernel/liveupdate/luo_flb.c | 4 + tools/testing/selftests/liveupdate/.gitignore | 1 + tools/testing/selftests/liveupdate/Makefile | 14 +- .../include/libliveupdate.h} | 11 +- .../selftests/liveupdate/lib/libliveupdate.mk | 20 ++ .../{luo_test_utils.c => lib/liveupdate.c} | 43 ++- .../selftests/liveupdate/luo_kexec_simple.c | 2 +- .../selftests/liveupdate/luo_multi_session.c | 2 +- tools/testing/selftests/vfio/Makefile | 23 +- .../vfio/lib/include/libvfio/iommu.h | 2 + .../lib/include/libvfio/vfio_pci_device.h | 8 + tools/testing/selftests/vfio/lib/iommu.c | 4 +- .../selftests/vfio/lib/vfio_pci_device.c | 60 +++- .../vfio/vfio_pci_liveupdate_kexec_test.c | 255 ++++++++++++++++ .../vfio/vfio_pci_liveupdate_uapi_test.c | 93 ++++++ 34 files changed, 1313 insertions(+), 86 deletions(-) create mode 100644 drivers/pci/liveupdate.c create mode 100644 drivers/vfio/pci/vfio_pci_liveupdate.c create mode 100644 include/linux/kho/abi/pci.h create mode 100644 include/linux/kho/abi/vfio_pci.h rename tools/testing/selftests/liveupdate/{luo_test_utils.h => lib/include/libliveupdate.h} (80%) create mode 100644 tools/testing/selftests/liveupdate/lib/libliveupdate.mk rename tools/testing/selftests/liveupdate/{luo_test_utils.c => lib/liveupdate.c} (89%) create mode 100644 tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c create mode 100644 tools/testing/selftests/vfio/vfio_pci_liveupdate_uapi_test.c
Prevent an incoming FLB from being retrieved after its finish() callback has been run. An incoming FLB should only retrieved once, and once its finish() callback has run the data has been freed and cannot be (and should not be) retrieved again.
Fixes: e8c57e582167 ("liveupdate: luo_flb: Introduce File-Lifecycle-Bound global state") Signed-off-by: David Matlack dmatlack@google.com --- include/linux/liveupdate.h | 3 +++ kernel/liveupdate/luo_flb.c | 4 ++++ 2 files changed, 7 insertions(+)
diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h index ed81e7b31a9f..b913d63eab5f 100644 --- a/include/linux/liveupdate.h +++ b/include/linux/liveupdate.h @@ -165,12 +165,15 @@ struct liveupdate_flb_ops { * @obj: The live kernel object returned by .preserve() or .retrieve(). * @lock: A mutex that protects all fields within this structure, providing * the synchronization service for the FLB's ops. + * @finished: True once the FLB's finish() callback has run, to prevent an FLB + * from being retrieve()'d again after finish() (incoming only). */ struct luo_flb_private_state { long count; u64 data; void *obj; struct mutex lock; + bool finished; };
/* diff --git a/kernel/liveupdate/luo_flb.c b/kernel/liveupdate/luo_flb.c index e80ac5b575ec..072796fb75cb 100644 --- a/kernel/liveupdate/luo_flb.c +++ b/kernel/liveupdate/luo_flb.c @@ -155,6 +155,9 @@ static int luo_flb_retrieve_one(struct liveupdate_flb *flb)
guard(mutex)(&private->incoming.lock);
+ if (private->incoming.finished) + return -ENODATA; + if (private->incoming.obj) return 0;
@@ -213,6 +216,7 @@ static void luo_flb_file_finish_one(struct liveupdate_flb *flb)
private->incoming.data = 0; private->incoming.obj = NULL; + private->incoming.finished = true; } } }
Add an API to enable the PCI subsystem to track all devices that are preserved across a Live Update, including both incoming devices (passed from the previous kernel) and outgoing devices (passed to the next kernel).
Use PCI segment number and BDF to keep track of devices across Live Update. This means the kernel must keep both identifiers constant across a Live Update for any preserved device. VFs are not supported for now, since that requires preserving SR-IOV state on the device to ensure the same number of VFs appear after kexec and with the same BDFs.
Drivers that preserve devices across Live Update can now register their struct liveupdate_file_handler with the PCI subsystem so that the PCI subsystem can allocate and manage File-Lifecycle-Bound (FLB) global data to track the list of incoming and outgoing preserved devices.
pci_liveupdate_register_fh(driver_fh) pci_liveupdate_unregister_fh(driver_fh)
Drivers can notify the PCI subsystem whenever a device is preserved and unpreserved with the following APIs:
pci_liveupdate_outgoing_preserve(pci_dev) pci_liveupdate_outgoing_unpreserve(pci_dev)
After a Live Update, the PCI subsystem can fetch its FLB global data from the previous kernel from the Live Update Orchestrator (LUO) to determine which devices are preserved. This API is also made available for drivers to use to check if a device was preserved before userspace retrieves the file for it.
pci_liveupdate_incoming_is_preserved(pci_dev)
Once a driver has finished restoring an incoming preserved device, it can notify the PCI subsystem with the following call:
pci_liveupdate_incoming_finish(pci_dev)
This will be used in subsequent commits by the vfio-pci driver to preserve VFIO devices across Live Update.
Signed-off-by: David Matlack dmatlack@google.com --- drivers/pci/Makefile | 1 + drivers/pci/liveupdate.c | 248 ++++++++++++++++++++++++++++++++++++ include/linux/kho/abi/pci.h | 53 ++++++++ include/linux/pci.h | 38 ++++++ 4 files changed, 340 insertions(+) create mode 100644 drivers/pci/liveupdate.c create mode 100644 include/linux/kho/abi/pci.h
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile index 67647f1880fb..0cb43e10e71d 100644 --- a/drivers/pci/Makefile +++ b/drivers/pci/Makefile @@ -16,6 +16,7 @@ obj-$(CONFIG_PROC_FS) += proc.o obj-$(CONFIG_SYSFS) += pci-sysfs.o slot.o obj-$(CONFIG_ACPI) += pci-acpi.o obj-$(CONFIG_GENERIC_PCI_IOMAP) += iomap.o +obj-$(CONFIG_LIVEUPDATE) += liveupdate.o endif
obj-$(CONFIG_OF) += of.o diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c new file mode 100644 index 000000000000..f9bb97f3bada --- /dev/null +++ b/drivers/pci/liveupdate.c @@ -0,0 +1,248 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Copyright (c) 2025, Google LLC. + * David Matlack dmatlack@google.com + */ + +#include <linux/bsearch.h> +#include <linux/io.h> +#include <linux/kexec_handover.h> +#include <linux/kho/abi/pci.h> +#include <linux/liveupdate.h> +#include <linux/mutex.h> +#include <linux/mm.h> +#include <linux/pci.h> +#include <linux/sort.h> + +static DEFINE_MUTEX(pci_flb_outgoing_lock); +static DEFINE_MUTEX(pci_flb_incoming_lock); + +static int pci_flb_preserve(struct liveupdate_flb_op_args *args) +{ + struct pci_dev *dev = NULL; + struct folio *folio; + unsigned int order; + int nr_devices = 0; + int ret; + + /* + * Calculate the maximum number of devices based on what's present + * on the system currently (including VFs) to size the folio holding + * struct pci_ser. This is not perfect given devices could be + * hotplugged, but it's also unlikely that all devices in the system are + * going to be preserved anyway. + */ + for_each_pci_dev(dev) { + if (dev->is_virtfn) + continue; + + nr_devices += 1 + pci_sriov_get_totalvfs(dev); + } + + order = get_order(offsetof(struct pci_ser, devices[nr_devices + 1])); + + folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, order); + if (!folio) + return -ENOMEM; + + ret = kho_preserve_folio(folio); + if (ret) { + folio_put(folio); + return ret; + } + + args->obj = folio_address(folio); + args->data = virt_to_phys(args->obj); + + return 0; +} + +static void pci_flb_unpreserve(struct liveupdate_flb_op_args *args) +{ + struct pci_ser *ser = args->obj; + struct folio *folio = virt_to_folio(ser); + + WARN_ON_ONCE(ser->nr_devices); + kho_unpreserve_folio(folio); + folio_put(folio); +} + +static int pci_flb_retrieve(struct liveupdate_flb_op_args *args) +{ + struct folio *folio; + + folio = kho_restore_folio(args->data); + if (!folio) + panic("Unable to restore preserved FLB data from KHO (0x%llx)\n", args->data); + + args->obj = folio_address(folio); + return 0; +} + +static void pci_flb_finish(struct liveupdate_flb_op_args *args) +{ + struct pci_ser *ser = args->obj; + + /* + * Sanity check that all devices have been finished via + * pci_liveupdate_incoming_finish(). + */ + WARN_ON_ONCE(ser->nr_devices); + folio_put(virt_to_folio(ser)); +} + +static struct liveupdate_flb_ops pci_liveupdate_flb_ops = { + .preserve = pci_flb_preserve, + .unpreserve = pci_flb_unpreserve, + .retrieve = pci_flb_retrieve, + .finish = pci_flb_finish, + .owner = THIS_MODULE, +}; + +static struct liveupdate_flb pci_liveupdate_flb = { + .ops = &pci_liveupdate_flb_ops, + .compatible = PCI_LUO_FLB_COMPATIBLE, +}; + +#define INIT_PCI_DEV_SER(_dev) { \ + .domain = pci_domain_nr((_dev)->bus), \ + .bdf = pci_dev_id(_dev), \ +} + +static int pci_dev_ser_cmp(const void *__a, const void *__b) +{ + const struct pci_dev_ser *a = __a, *b = __b; + + return cmp_int(a->domain << 16 | a->bdf, b->domain << 16 | b->bdf); +} + +static struct pci_dev_ser *pci_ser_find(struct pci_ser *ser, struct pci_dev *dev) +{ + const struct pci_dev_ser key = INIT_PCI_DEV_SER(dev); + + return bsearch(&key, ser->devices, ser->nr_devices, + sizeof(key), pci_dev_ser_cmp); +} + +static int pci_ser_delete(struct pci_ser *ser, struct pci_dev *dev) +{ + struct pci_dev_ser *dev_ser; + int i; + + dev_ser = pci_ser_find(ser, dev); + if (!dev_ser) + return -ENOENT; + + for (i = dev_ser - ser->devices; i < ser->nr_devices - 1; i++) + ser->devices[i] = ser->devices[i + 1]; + + ser->nr_devices--; + return 0; +} + +static int max_nr_devices(struct pci_ser *ser) +{ + u64 size; + + size = folio_size(virt_to_folio(ser)); + size -= offsetof(struct pci_ser, devices); + + return size / sizeof(struct pci_dev_ser); +} + +int pci_liveupdate_outgoing_preserve(struct pci_dev *dev) +{ + struct pci_dev_ser new = INIT_PCI_DEV_SER(dev); + struct pci_ser *ser; + int i, ret; + + /* VFs are not supported yet due to BDF instability across kexec */ + if (dev->is_virtfn) + return -EINVAL; + + guard(mutex)(&pci_flb_outgoing_lock); + + ret = liveupdate_flb_get_outgoing(&pci_liveupdate_flb, (void **)&ser); + if (ret) + return ret; + + if (ser->nr_devices == max_nr_devices(ser)) + return -E2BIG; + + for (i = ser->nr_devices; i > 0; i--) { + struct pci_dev_ser *prev = &ser->devices[i - 1]; + int cmp = pci_dev_ser_cmp(&new, prev); + + /* This device is already preserved. */ + if (cmp == 0) + return 0; + + if (cmp > 0) + break; + + ser->devices[i] = *prev; + } + + ser->devices[i] = new; + ser->nr_devices++; + return 0; +} +EXPORT_SYMBOL_GPL(pci_liveupdate_outgoing_preserve); + +void pci_liveupdate_outgoing_unpreserve(struct pci_dev *dev) +{ + struct pci_ser *ser; + int ret; + + guard(mutex)(&pci_flb_outgoing_lock); + + ret = liveupdate_flb_get_outgoing(&pci_liveupdate_flb, (void **)&ser); + if (WARN_ON_ONCE(ret)) + return; + + WARN_ON_ONCE(pci_ser_delete(ser, dev)); +} +EXPORT_SYMBOL_GPL(pci_liveupdate_outgoing_unpreserve); + +bool pci_liveupdate_incoming_is_preserved(struct pci_dev *dev) +{ + struct pci_ser *ser; + int ret; + + guard(mutex)(&pci_flb_incoming_lock); + + ret = liveupdate_flb_get_incoming(&pci_liveupdate_flb, (void **)&ser); + if (ret) + return false; + + return pci_ser_find(ser, dev); +} +EXPORT_SYMBOL_GPL(pci_liveupdate_incoming_is_preserved); + +void pci_liveupdate_incoming_finish(struct pci_dev *dev) +{ + struct pci_ser *ser; + int ret; + + guard(mutex)(&pci_flb_incoming_lock); + + ret = liveupdate_flb_get_incoming(&pci_liveupdate_flb, (void **)&ser); + if (WARN_ON_ONCE(ret)) + return; + + WARN_ON_ONCE(pci_ser_delete(ser, dev)); +} +EXPORT_SYMBOL_GPL(pci_liveupdate_incoming_finish); + +int pci_liveupdate_register_fh(struct liveupdate_file_handler *fh) +{ + return liveupdate_register_flb(fh, &pci_liveupdate_flb); +} +EXPORT_SYMBOL_GPL(pci_liveupdate_register_fh); + +int pci_liveupdate_unregister_fh(struct liveupdate_file_handler *fh) +{ + return liveupdate_unregister_flb(fh, &pci_liveupdate_flb); +} +EXPORT_SYMBOL_GPL(pci_liveupdate_unregister_fh); diff --git a/include/linux/kho/abi/pci.h b/include/linux/kho/abi/pci.h new file mode 100644 index 000000000000..53744b6f191a --- /dev/null +++ b/include/linux/kho/abi/pci.h @@ -0,0 +1,53 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +/* + * Copyright (c) 2025, Google LLC. + * David Matlack dmatlack@google.com + */ + +#ifndef _LINUX_KHO_ABI_PCI_H +#define _LINUX_KHO_ABI_PCI_H + +#include <linux/compiler.h> +#include <linux/types.h> + +/** + * DOC: PCI File-Lifecycle Bound (FLB) Live Update ABI + * + * This header defines the ABI for preserving core PCI state across kexec using + * Live Update File-Lifecycle Bound (FLB) data. + * + * This interface is a contract. Any modification to any of the serialization + * structs defined here constitutes a breaking change. Such changes require + * incrementing the version number in the PCI_LUO_FLB_COMPATIBLE string. + */ + +#define PCI_LUO_FLB_COMPATIBLE "pci-v1" + +/** + * struct pci_dev_ser - Serialized state about a single PCI device. + * + * @domain: The device's PCI domain number (segment). + * @bdf: The device's PCI bus, device, and function number. + */ +struct pci_dev_ser { + u16 domain; + u16 bdf; +} __packed; + +/** + * struct pci_ser - PCI Subsystem Live Update State + * + * This struct tracks state about all devices that are being preserved across + * a Live Update for the next kernel. + * + * @nr_devices: The number of devices that were preserved. + * @devices: Flexible array of pci_dev_ser structs for each device. Guaranteed + * to be sorted ascending by domain and bdf. + */ +struct pci_ser { + u64 nr_devices; + struct pci_dev_ser devices[]; +} __packed; + +#endif /* _LINUX_KHO_ABI_PCI_H */ diff --git a/include/linux/pci.h b/include/linux/pci.h index d1fdf81fbe1e..6a3c2d7e5b82 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -40,6 +40,7 @@ #include <linux/resource_ext.h> #include <linux/msi_api.h> #include <uapi/linux/pci.h> +#include <linux/liveupdate.h>
#include <linux/pci_ids.h>
@@ -2795,4 +2796,41 @@ void pci_uevent_ers(struct pci_dev *pdev, enum pci_ers_result err_type); WARN_ONCE(condition, "%s %s: " fmt, \ dev_driver_string(&(pdev)->dev), pci_name(pdev), ##arg)
+#ifdef CONFIG_LIVEUPDATE +int pci_liveupdate_outgoing_preserve(struct pci_dev *dev); +void pci_liveupdate_outgoing_unpreserve(struct pci_dev *dev); +bool pci_liveupdate_incoming_is_preserved(struct pci_dev *dev); +void pci_liveupdate_incoming_finish(struct pci_dev *dev); +int pci_liveupdate_register_fh(struct liveupdate_file_handler *fh); +int pci_liveupdate_unregister_fh(struct liveupdate_file_handler *fh); +#else /* !CONFIG_LIVEUPDATE */ +static inline int pci_liveupdate_outgoing_preserve(struct pci_dev *dev) +{ + return -EOPNOTSUPP; +} + +static inline void pci_liveupdate_outgoing_unpreserve(struct pci_dev *dev) +{ +} + +static inline bool pci_liveupdate_incoming_is_preserved(struct pci_dev *dev) +{ + return false; +} + +static inline void pci_liveupdate_incoming_finish(struct pci_dev *dev) +{ +} + +static inline int pci_liveupdate_register_fh(struct liveupdate_file_handler *fh) +{ + return -EOPNOTSUPP; +} + +static inline int pci_liveupdate_unregister_fh(struct liveupdate_file_handler *fh) +{ + return -EOPNOTSUPP; +} +#endif /* !CONFIG_LIVEUPDATE */ + #endif /* LINUX_PCI_H */
From: Vipin Sharma vipinsh@google.com
Register a live update file handler for vfio-pci device files. Add stub implementations of all required callbacks so that registration does not fail (i.e. to avoid breaking git-bisect).
This file handler will be extended in subsequent commits to enable a device bound to vfio-pci to run without interruption while the host is going through a kexec Live Update.
Signed-off-by: Vipin Sharma vipinsh@google.com Co-Developed-by: David Matlack dmatlack@google.com Signed-off-by: David Matlack dmatlack@google.com --- MAINTAINERS | 1 + drivers/vfio/pci/Makefile | 1 + drivers/vfio/pci/vfio_pci.c | 9 +++- drivers/vfio/pci/vfio_pci_liveupdate.c | 69 ++++++++++++++++++++++++++ drivers/vfio/pci/vfio_pci_priv.h | 14 ++++++ include/linux/kho/abi/vfio_pci.h | 28 +++++++++++ 6 files changed, 121 insertions(+), 1 deletion(-) create mode 100644 drivers/vfio/pci/vfio_pci_liveupdate.c create mode 100644 include/linux/kho/abi/vfio_pci.h
diff --git a/MAINTAINERS b/MAINTAINERS index 2722f98d0ed7..ff50977277c4 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -26933,6 +26933,7 @@ F: Documentation/ABI/testing/debugfs-vfio F: Documentation/ABI/testing/sysfs-devices-vfio-dev F: Documentation/driver-api/vfio.rst F: drivers/vfio/ +F: include/linux/kho/abi/vfio_pci.h F: include/linux/vfio.h F: include/linux/vfio_pci_core.h F: include/uapi/linux/vfio.h diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile index cf00c0a7e55c..929df22c079b 100644 --- a/drivers/vfio/pci/Makefile +++ b/drivers/vfio/pci/Makefile @@ -2,6 +2,7 @@
vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o vfio-pci-core-$(CONFIG_VFIO_PCI_ZDEV_KVM) += vfio_pci_zdev.o +vfio-pci-core-$(CONFIG_LIVEUPDATE) += vfio_pci_liveupdate.o obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
vfio-pci-y := vfio_pci.o diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index ac10f14417f2..c2fe34a830d8 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -252,6 +252,10 @@ static int __init vfio_pci_init(void) int ret; bool is_disable_vga = true;
+ ret = vfio_pci_liveupdate_init(); + if (ret) + return ret; + #ifdef CONFIG_VFIO_PCI_VGA is_disable_vga = disable_vga; #endif @@ -260,8 +264,10 @@ static int __init vfio_pci_init(void)
/* Register and scan for devices */ ret = pci_register_driver(&vfio_pci_driver); - if (ret) + if (ret) { + vfio_pci_liveupdate_cleanup(); return ret; + }
vfio_pci_fill_ids();
@@ -275,6 +281,7 @@ module_init(vfio_pci_init); static void __exit vfio_pci_cleanup(void) { pci_unregister_driver(&vfio_pci_driver); + vfio_pci_liveupdate_cleanup(); } module_exit(vfio_pci_cleanup);
diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c new file mode 100644 index 000000000000..b84e63c0357b --- /dev/null +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c @@ -0,0 +1,69 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Copyright (c) 2025, Google LLC. + * Vipin Sharma vipinsh@google.com + * David Matlack dmatlack@google.com + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/kho/abi/vfio_pci.h> +#include <linux/liveupdate.h> +#include <linux/errno.h> + +#include "vfio_pci_priv.h" + +static bool vfio_pci_liveupdate_can_preserve(struct liveupdate_file_handler *handler, + struct file *file) +{ + return false; +} + +static int vfio_pci_liveupdate_preserve(struct liveupdate_file_op_args *args) +{ + return -EOPNOTSUPP; +} + +static void vfio_pci_liveupdate_unpreserve(struct liveupdate_file_op_args *args) +{ +} + +static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args *args) +{ + return -EOPNOTSUPP; +} + +static void vfio_pci_liveupdate_finish(struct liveupdate_file_op_args *args) +{ +} + +static const struct liveupdate_file_ops vfio_pci_liveupdate_file_ops = { + .can_preserve = vfio_pci_liveupdate_can_preserve, + .preserve = vfio_pci_liveupdate_preserve, + .unpreserve = vfio_pci_liveupdate_unpreserve, + .retrieve = vfio_pci_liveupdate_retrieve, + .finish = vfio_pci_liveupdate_finish, + .owner = THIS_MODULE, +}; + +static struct liveupdate_file_handler vfio_pci_liveupdate_fh = { + .ops = &vfio_pci_liveupdate_file_ops, + .compatible = VFIO_PCI_LUO_FH_COMPATIBLE, +}; + +int __init vfio_pci_liveupdate_init(void) +{ + if (!liveupdate_enabled()) + return 0; + + return liveupdate_register_file_handler(&vfio_pci_liveupdate_fh); +} + +void vfio_pci_liveupdate_cleanup(void) +{ + if (!liveupdate_enabled()) + return; + + liveupdate_unregister_file_handler(&vfio_pci_liveupdate_fh); +} diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h index a9972eacb293..b9f7c4e2b4df 100644 --- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -107,4 +107,18 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev) return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA; }
+#ifdef CONFIG_LIVEUPDATE +int __init vfio_pci_liveupdate_init(void); +void vfio_pci_liveupdate_cleanup(void); +#else +static inline int vfio_pci_liveupdate_init(void) +{ + return 0; +} + +static inline void vfio_pci_liveupdate_cleanup(void) +{ +} +#endif /* CONFIG_LIVEUPDATE */ + #endif diff --git a/include/linux/kho/abi/vfio_pci.h b/include/linux/kho/abi/vfio_pci.h new file mode 100644 index 000000000000..37a845eed972 --- /dev/null +++ b/include/linux/kho/abi/vfio_pci.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +/* + * Copyright (c) 2025, Google LLC. + * Vipin Sharma vipinsh@google.com + * David Matlack dmatlack@google.com + */ + +#ifndef _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H +#define _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H + +/** + * DOC: VFIO PCI Live Update ABI + * + * This header defines the ABI for preserving the state of a VFIO PCI device + * files across a kexec reboot using LUO. + * + * Device metadata is serialized into memory which is then handed to the next + * kernel via KHO. + * + * This interface is a contract. Any modification to any of the serialization + * structs defined here constitutes a breaking change. Such changes require + * incrementing the version number in the VFIO_PCI_LUO_FH_COMPATIBLE string. + */ + +#define VFIO_PCI_LUO_FH_COMPATIBLE "vfio-pci-v1" + +#endif /* _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H */
From: Vipin Sharma vipinsh@google.com
Implement the live update file handler callbacks to preserve a vfio-pci device across a Live Update. Subsequent commits will enable userspace to then retrieve this file after the Live Update.
Live Update support is scoped only to cdev files (i.e. not VFIO_GROUP_GET_DEVICE_FD files).
State about each device is serialized into a new ABI struct vfio_pci_core_device_ser. The contents of this struct are preserved across the Live Update to the next kernel using a combination of Kexec-Handover (KHO) to preserve the page(s) holding the struct and the Live Update Orchestrator (LUO) to preserve the physical address of the struct.
For now the only contents of struct vfio_pci_core_device_ser the device's PCI segment number and BDF, so that the device can be uniquely identified after the Live Update.
Require that userspace disables interrupts on the device prior to freeze() so that the device does not send any interrupts until new interrupt handlers have been set up by the next kernel.
Reset the device and restore its state in the freeze() callback. This ensures the device can be received by the next kernel in a consistent state. Eventually this will be dropped and the device can be preserved across in a running state, but that requires further work in VFIO and the core PCI layer.
Note that LUO holds a reference to this file when it is preserved. So VFIO is guaranteed that vfio_df_device_last_close() will not be called on this device no matter what userspace does.
Signed-off-by: Vipin Sharma vipinsh@google.com Co-Developed-by: David Matlack dmatlack@google.com Signed-off-by: David Matlack dmatlack@google.com --- drivers/vfio/pci/vfio_pci.c | 2 +- drivers/vfio/pci/vfio_pci_liveupdate.c | 100 ++++++++++++++++++++++++- drivers/vfio/pci/vfio_pci_priv.h | 2 + drivers/vfio/vfio.h | 13 ---- drivers/vfio/vfio_main.c | 9 --- include/linux/kho/abi/vfio_pci.h | 15 ++++ include/linux/vfio.h | 28 +++++++ 7 files changed, 144 insertions(+), 25 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index c2fe34a830d8..281c69c086d3 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -125,7 +125,7 @@ static int vfio_pci_open_device(struct vfio_device *core_vdev) return 0; }
-static const struct vfio_device_ops vfio_pci_ops = { +const struct vfio_device_ops vfio_pci_ops = { .name = "vfio-pci", .init = vfio_pci_core_init_dev, .release = vfio_pci_core_release_dev, diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c index b84e63c0357b..a0147dee8c0f 100644 --- a/drivers/vfio/pci/vfio_pci_liveupdate.c +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c @@ -8,25 +8,120 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/kexec_handover.h> #include <linux/kho/abi/vfio_pci.h> #include <linux/liveupdate.h> #include <linux/errno.h> +#include <linux/vfio.h>
#include "vfio_pci_priv.h"
static bool vfio_pci_liveupdate_can_preserve(struct liveupdate_file_handler *handler, struct file *file) { - return false; + struct vfio_device_file *df = to_vfio_device_file(file); + + if (!df) + return false; + + /* Live Update support is limited to cdev files. */ + if (df->group) + return false; + + return df->device->ops == &vfio_pci_ops; }
static int vfio_pci_liveupdate_preserve(struct liveupdate_file_op_args *args) { - return -EOPNOTSUPP; + struct vfio_device *device = vfio_device_from_file(args->file); + struct vfio_pci_core_device_ser *ser; + struct vfio_pci_core_device *vdev; + struct pci_dev *pdev; + struct folio *folio; + int err; + + vdev = container_of(device, struct vfio_pci_core_device, vdev); + pdev = vdev->pdev; + + if (IS_ENABLED(CONFIG_VFIO_PCI_ZDEV_KVM)) + return -EINVAL; + + if (vfio_pci_is_intel_display(pdev)) + return -EINVAL; + + folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, get_order(sizeof(*ser))); + if (!folio) + return -ENOMEM; + + ser = folio_address(folio); + + ser->bdf = pci_dev_id(pdev); + ser->domain = pci_domain_nr(pdev->bus); + + err = kho_preserve_folio(folio); + if (err) + goto error; + + args->serialized_data = virt_to_phys(ser); + return 0; + +error: + folio_put(folio); + return err; }
static void vfio_pci_liveupdate_unpreserve(struct liveupdate_file_op_args *args) { + struct vfio_pci_core_device_ser *ser = phys_to_virt(args->serialized_data); + struct folio *folio = virt_to_folio(ser); + + kho_unpreserve_folio(folio); + folio_put(folio); +} + +static int vfio_pci_liveupdate_freeze(struct liveupdate_file_op_args *args) +{ + struct vfio_device *device = vfio_device_from_file(args->file); + struct vfio_pci_core_device *vdev; + struct pci_dev *pdev; + int ret; + + vdev = container_of(device, struct vfio_pci_core_device, vdev); + pdev = vdev->pdev; + + guard(mutex)(&device->dev_set->lock); + + /* + * Userspace must disable interrupts on the device prior to freeze so + * that the device does not send any interrupts until new interrupt + * handlers have been established by the next kernel. + */ + if (vdev->irq_type != VFIO_PCI_NUM_IRQS) { + pci_err(pdev, "Freeze failed! Interrupts are still enabled.\n"); + return -EINVAL; + } + + pci_dev_lock(pdev); + + ret = pci_load_and_free_saved_state(pdev, &vdev->pci_saved_state); + if (ret) + goto out; + + /* + * Reset the device and restore it back to its original state before + * handing it to the next kernel. + * + * Eventually both of these should be dropped and the device should be + * kept running with its current state across the Live Update. + */ + if (vdev->reset_works) + ret = __pci_reset_function_locked(pdev); + + pci_restore_state(pdev); + +out: + pci_dev_unlock(pdev); + return ret; }
static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args *args) @@ -42,6 +137,7 @@ static const struct liveupdate_file_ops vfio_pci_liveupdate_file_ops = { .can_preserve = vfio_pci_liveupdate_can_preserve, .preserve = vfio_pci_liveupdate_preserve, .unpreserve = vfio_pci_liveupdate_unpreserve, + .freeze = vfio_pci_liveupdate_freeze, .retrieve = vfio_pci_liveupdate_retrieve, .finish = vfio_pci_liveupdate_finish, .owner = THIS_MODULE, diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h index b9f7c4e2b4df..7f189e5e6c0a 100644 --- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -11,6 +11,8 @@ /* Cap maximum number of ioeventfds per device (arbitrary) */ #define VFIO_PCI_IOEVENTFD_MAX 1000
+extern const struct vfio_device_ops vfio_pci_ops; + struct vfio_pci_ioeventfd { struct list_head next; struct vfio_pci_core_device *vdev; diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h index 50128da18bca..6b89edbbf174 100644 --- a/drivers/vfio/vfio.h +++ b/drivers/vfio/vfio.h @@ -16,17 +16,6 @@ struct iommufd_ctx; struct iommu_group; struct vfio_container;
-struct vfio_device_file { - struct vfio_device *device; - struct vfio_group *group; - - u8 access_granted; - u32 devid; /* only valid when iommufd is valid */ - spinlock_t kvm_ref_lock; /* protect kvm field */ - struct kvm *kvm; - struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */ -}; - void vfio_device_put_registration(struct vfio_device *device); bool vfio_device_try_get_registration(struct vfio_device *device); int vfio_df_open(struct vfio_device_file *df); @@ -34,8 +23,6 @@ void vfio_df_close(struct vfio_device_file *df); struct vfio_device_file * vfio_allocate_device_file(struct vfio_device *device);
-extern const struct file_operations vfio_device_fops; - #ifdef CONFIG_VFIO_NOIOMMU extern bool vfio_noiommu __read_mostly; #else diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 38c8e9350a60..9182dc46d73f 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -1386,15 +1386,6 @@ const struct file_operations vfio_device_fops = { #endif };
-static struct vfio_device *vfio_device_from_file(struct file *file) -{ - struct vfio_device_file *df = file->private_data; - - if (file->f_op != &vfio_device_fops) - return NULL; - return df->device; -} - /** * vfio_file_is_valid - True if the file is valid vfio file * @file: VFIO group file or VFIO device file diff --git a/include/linux/kho/abi/vfio_pci.h b/include/linux/kho/abi/vfio_pci.h index 37a845eed972..9bf58a2f3820 100644 --- a/include/linux/kho/abi/vfio_pci.h +++ b/include/linux/kho/abi/vfio_pci.h @@ -9,6 +9,9 @@ #ifndef _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H #define _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H
+#include <linux/compiler.h> +#include <linux/types.h> + /** * DOC: VFIO PCI Live Update ABI * @@ -25,4 +28,16 @@
#define VFIO_PCI_LUO_FH_COMPATIBLE "vfio-pci-v1"
+/** + * struct vfio_pci_core_device_ser - Serialized state of a single VFIO PCI + * device. + * + * @bdf: The device's PCI bus, device, and function number. + * @domain: The device's PCI domain number (segment). + */ +struct vfio_pci_core_device_ser { + u16 bdf; + u16 domain; +} __packed; + #endif /* _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H */ diff --git a/include/linux/vfio.h b/include/linux/vfio.h index eb563f538dee..f09da3bdf786 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -80,6 +80,34 @@ struct vfio_device { #endif };
+struct vfio_device_file { + struct vfio_device *device; + struct vfio_group *group; + + u8 access_granted; + u32 devid; /* only valid when iommufd is valid */ + spinlock_t kvm_ref_lock; /* protect kvm field */ + struct kvm *kvm; + struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */ +}; + +extern const struct file_operations vfio_device_fops; + +static inline struct vfio_device_file *to_vfio_device_file(struct file *file) +{ + if (file->f_op != &vfio_device_fops) + return NULL; + + return file->private_data; +} + +static inline struct vfio_device *vfio_device_from_file(struct file *file) +{ + struct vfio_device_file *df = to_vfio_device_file(file); + + return df ? df->device : NULL; +} + /** * struct vfio_device_ops - VFIO bus driver device callbacks *
Enforce that files for incoming (preserved by previous kernel) VFIO devices are retrieved via LIVEUPDATE_SESSION_RETRIEVE_FD rather than by opening the corresponding VFIO character device or via VFIO_GROUP_GET_DEVICE_FD.
Both of these methods would result in VFIO initializing the device without access to the preserved state of the device passed by the previous kernel.
Signed-off-by: David Matlack dmatlack@google.com --- drivers/vfio/device_cdev.c | 4 ++++ drivers/vfio/group.c | 9 +++++++++ include/linux/vfio.h | 11 +++++++++++ 3 files changed, 24 insertions(+)
diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c index 0a6e972f322b..f1e54dcc04e7 100644 --- a/drivers/vfio/device_cdev.c +++ b/drivers/vfio/device_cdev.c @@ -57,6 +57,10 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep) struct vfio_device *device = container_of(inode->i_cdev, struct vfio_device, cdev);
+ /* Device file must be retrieved via LIVEUPDATE_SESSION_RETRIEVE_FD */ + if (vfio_liveupdate_incoming_is_preserved(device)) + return -EBUSY; + return __vfio_device_fops_cdev_open(device, filep); }
diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c index c376a6279de0..93dbbf37ec4c 100644 --- a/drivers/vfio/group.c +++ b/drivers/vfio/group.c @@ -313,6 +313,15 @@ static int vfio_group_ioctl_get_device_fd(struct vfio_group *group, if (IS_ERR(device)) return PTR_ERR(device);
+ /* + * This device was preserved across a Live Update. Accessing it via + * VFIO_GROUP_GET_DEVICE_FD is not allowed. + */ + if (vfio_liveupdate_incoming_is_preserved(device)) { + ret = -EBUSY; + goto err_put_device; + } + fdno = get_unused_fd_flags(O_CLOEXEC); if (fdno < 0) { ret = fdno; diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 4e400a7219ea..040c2b2ee074 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -16,6 +16,7 @@ #include <linux/cdev.h> #include <uapi/linux/vfio.h> #include <linux/iova_bitmap.h> +#include <linux/pci.h>
struct kvm; struct iommufd_ctx; @@ -425,4 +426,14 @@ static inline int __vfio_device_fops_cdev_open(struct vfio_device *device,
struct vfio_device *vfio_find_device(const void *data, device_match_t match);
+static inline bool vfio_liveupdate_incoming_is_preserved(struct vfio_device *device) +{ + struct device *d = device->dev; + + if (dev_is_pci(d)) + return pci_liveupdate_incoming_is_preserved(to_pci_dev(d)); + + return false; +} + #endif /* VFIO_H */
From: Vipin Sharma vipinsh@google.com
Enable userspace to retrieve preserved VFIO device files from VFIO after a Live Update by implementing the retrieve() and finish() file handler callbacks.
Use an anonymous inode when creating the file, since the retrieved device file is not opened through any particular cdev inode, and the cdev inode does not matter in practice.
For now the retrieved file is functionally equivalent a opening the corresponding VFIO cdev file. Subsequent commits will leverage the preserved state associated with the retrieved file to preserve bits of the device across Live Update.
Signed-off-by: Vipin Sharma vipinsh@google.com Co-Developed-by: David Matlack dmatlack@google.com Signed-off-by: David Matlack dmatlack@google.com --- drivers/vfio/device_cdev.c | 21 +++++--- drivers/vfio/pci/vfio_pci_liveupdate.c | 75 +++++++++++++++++++++++++- drivers/vfio/vfio_main.c | 13 +++++ include/linux/vfio.h | 12 +++++ 4 files changed, 113 insertions(+), 8 deletions(-)
diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c index 480cac3a0c27..0a6e972f322b 100644 --- a/drivers/vfio/device_cdev.c +++ b/drivers/vfio/device_cdev.c @@ -16,14 +16,8 @@ void vfio_init_device_cdev(struct vfio_device *device) device->cdev.owner = THIS_MODULE; }
-/* - * device access via the fd opened by this function is blocked until - * .open_device() is called successfully during BIND_IOMMUFD. - */ -int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep) +int __vfio_device_fops_cdev_open(struct vfio_device *device, struct file *filep) { - struct vfio_device *device = container_of(inode->i_cdev, - struct vfio_device, cdev); struct vfio_device_file *df; int ret;
@@ -52,6 +46,19 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep) vfio_device_put_registration(device); return ret; } +EXPORT_SYMBOL_GPL(__vfio_device_fops_cdev_open); + +/* + * device access via the fd opened by this function is blocked until + * .open_device() is called successfully during BIND_IOMMUFD. + */ +int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep) +{ + struct vfio_device *device = container_of(inode->i_cdev, + struct vfio_device, cdev); + + return __vfio_device_fops_cdev_open(device, filep); +}
static void vfio_df_get_kvm_safe(struct vfio_device_file *df) { diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c index a0147dee8c0f..b7451007fca4 100644 --- a/drivers/vfio/pci/vfio_pci_liveupdate.c +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c @@ -8,6 +8,8 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/anon_inodes.h> +#include <linux/file.h> #include <linux/kexec_handover.h> #include <linux/kho/abi/vfio_pci.h> #include <linux/liveupdate.h> @@ -124,13 +126,83 @@ static int vfio_pci_liveupdate_freeze(struct liveupdate_file_op_args *args) return ret; }
+static int match_device(struct device *dev, const void *arg) +{ + struct vfio_device *device = container_of(dev, struct vfio_device, device); + const struct vfio_pci_core_device_ser *ser = arg; + struct vfio_pci_core_device *vdev; + struct pci_dev *pdev; + + vdev = container_of(device, struct vfio_pci_core_device, vdev); + pdev = vdev->pdev; + + return ser->bdf == pci_dev_id(pdev) && ser->domain == pci_domain_nr(pdev->bus); +} + static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args *args) { - return -EOPNOTSUPP; + struct vfio_pci_core_device_ser *ser; + struct vfio_device *device; + struct folio *folio; + struct file *file; + int ret; + + folio = kho_restore_folio(args->serialized_data); + if (!folio) + return -ENOENT; + + ser = folio_address(folio); + + device = vfio_find_device(ser, match_device); + if (!device) + return -ENODEV; + + /* + * During a Live Update userspace retrieves preserved VFIO cdev files by + * issuing an ioctl on /dev/liveupdate rather than by opening VFIO + * character devices. + * + * To handle that scenario, this routine simulates opening the VFIO + * character device for userspace with an anonymous inode. The returned + * file has the same properties as a cdev file (e.g. operations are + * blocked until BIND_IOMMUFD is called), aside from the inode + * association. + */ + file = anon_inode_getfile_fmode("[vfio-device-liveupdate]", + &vfio_device_fops, NULL, + O_RDWR, FMODE_PREAD | FMODE_PWRITE); + + if (IS_ERR(file)) { + ret = PTR_ERR(file); + goto out; + } + + ret = __vfio_device_fops_cdev_open(device, file); + if (ret) { + fput(file); + goto out; + } + + args->file = file; + +out: + /* Drop the reference from vfio_find_device() */ + put_device(&device->device); + + return ret; +} + +static bool vfio_pci_liveupdate_can_finish(struct liveupdate_file_op_args *args) +{ + return args->retrieved; }
static void vfio_pci_liveupdate_finish(struct liveupdate_file_op_args *args) { + struct folio *folio; + + folio = virt_to_folio(phys_to_virt(args->serialized_data)); + folio_put(folio); }
static const struct liveupdate_file_ops vfio_pci_liveupdate_file_ops = { @@ -139,6 +211,7 @@ static const struct liveupdate_file_ops vfio_pci_liveupdate_file_ops = { .unpreserve = vfio_pci_liveupdate_unpreserve, .freeze = vfio_pci_liveupdate_freeze, .retrieve = vfio_pci_liveupdate_retrieve, + .can_finish = vfio_pci_liveupdate_can_finish, .finish = vfio_pci_liveupdate_finish, .owner = THIS_MODULE, }; diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 9182dc46d73f..c5b5eb509474 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -13,6 +13,7 @@ #include <linux/cdev.h> #include <linux/compat.h> #include <linux/device.h> +#include <linux/device/class.h> #include <linux/fs.h> #include <linux/idr.h> #include <linux/iommu.h> @@ -1706,6 +1707,18 @@ int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova, void *data, } EXPORT_SYMBOL(vfio_dma_rw);
+struct vfio_device *vfio_find_device(const void *data, device_match_t match) +{ + struct device *device; + + device = class_find_device(vfio.device_class, NULL, data, match); + if (!device) + return NULL; + + return container_of(device, struct vfio_device, device); +} +EXPORT_SYMBOL_GPL(vfio_find_device); + /* * Module/class support */ diff --git a/include/linux/vfio.h b/include/linux/vfio.h index f09da3bdf786..4e400a7219ea 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -413,4 +413,16 @@ int vfio_virqfd_enable(void *opaque, int (*handler)(void *, void *), void vfio_virqfd_disable(struct virqfd **pvirqfd); void vfio_virqfd_flush_thread(struct virqfd **pvirqfd);
+#if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV) +int __vfio_device_fops_cdev_open(struct vfio_device *device, struct file *filep); +#else +static inline int __vfio_device_fops_cdev_open(struct vfio_device *device, + struct file *filep) +{ + return -EOPNOTSUPP; +} +#endif /* IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV) */ + +struct vfio_device *vfio_find_device(const void *data, device_match_t match); + #endif /* VFIO_H */
Require driver_override to be set to bind an incoming Live Update preserved device to a driver. Auto-probing could lead to the device being bound to a different driver than what was bound to the device prior to Live Update.
Delegate binding preserved devices to the right driver to userspace by requiring driver_override to be set on the device.
This restriction is relaxed once a driver calls pci_liveupdate_incoming_finish().
Signed-off-by: David Matlack dmatlack@google.com --- drivers/pci/pci-driver.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 302d61783f6c..294ce92331a8 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -420,18 +420,26 @@ static int __pci_device_probe(struct pci_driver *drv, struct pci_dev *pci_dev) }
#ifdef CONFIG_PCI_IOV -static inline bool pci_device_can_probe(struct pci_dev *pdev) +static inline bool pci_iov_device_can_probe(struct pci_dev *pdev) { return (!pdev->is_virtfn || pdev->physfn->sriov->drivers_autoprobe || pdev->driver_override); } #else -static inline bool pci_device_can_probe(struct pci_dev *pdev) +static inline bool pci_iov_device_can_probe(struct pci_dev *pdev) { return true; } #endif
+static inline bool pci_device_can_probe(struct pci_dev *pdev) +{ + if (pci_liveupdate_incoming_is_preserved(pdev)) + return pdev->driver_override; + + return pci_iov_device_can_probe(pdev); +} + static int pci_device_probe(struct device *dev) { int error;
Stash a pointer to a device's Live Updated state in struct vfio_pci_core_device. This will enable subsequent commits to use the preserved state when enabling the device.
To enable VFIO to safely access this pointer during device enablement, require that the device is fully enabled before returning true from can_finish().
Signed-off-by: David Matlack dmatlack@google.com --- drivers/vfio/pci/vfio_pci_core.c | 1 + drivers/vfio/pci/vfio_pci_liveupdate.c | 20 +++++++++++++++++++- include/linux/vfio_pci_core.h | 6 ++++++ 3 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c index 7dcf5439dedc..b09fe0993e04 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -536,6 +536,7 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev) if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev)) vdev->has_vga = true;
+ vdev->liveupdate_state = NULL;
return 0;
diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c index 7669c65bde17..0fb29bd3ae3b 100644 --- a/drivers/vfio/pci/vfio_pci_liveupdate.c +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c @@ -145,6 +145,7 @@ static int match_device(struct device *dev, const void *arg) static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args *args) { struct vfio_pci_core_device_ser *ser; + struct vfio_pci_core_device *vdev; struct vfio_device *device; struct folio *folio; struct file *file; @@ -186,6 +187,9 @@ static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args *args) goto out; }
+ vdev = container_of(device, struct vfio_pci_core_device, vdev); + vdev->liveupdate_state = ser; + args->file = file;
out: @@ -197,7 +201,21 @@ static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args *args)
static bool vfio_pci_liveupdate_can_finish(struct liveupdate_file_op_args *args) { - return args->retrieved; + struct vfio_pci_core_device *vdev; + struct vfio_device *device; + + if (!args->retrieved) + return false; + + device = vfio_device_from_file(args->file); + vdev = container_of(device, struct vfio_pci_core_device, vdev); + + /* + * Ensure VFIO is done using vdev->liveupdate_state, which means its + * safe for vfio_pci_liveupdate_finish() to free it. + */ + guard(mutex)(&device->dev_set->lock); + return !vdev->liveupdate_state; }
static void vfio_pci_liveupdate_finish(struct liveupdate_file_op_args *args) diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index f541044e42a2..56ff6452562d 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -94,6 +94,12 @@ struct vfio_pci_core_device { struct vfio_pci_core_device *sriov_pf_core_dev; struct notifier_block nb; struct rw_semaphore memory_lock; + + /* + * State passed by the previous kernel during a Live Update. Only + * safe to access when first opening the device. + */ + struct vfio_pci_core_device_ser *liveupdate_state; };
/* Will be exported for vfio pci drivers usage */
Notify the PCI subsystem about devices vfio-pci is preserving across Live Update by registering the vfio-pci liveupdate file handler with the PCI subsystem's FLB handler.
Notably this will ensure that devices preserved through vfio-pci do not go through auto-probing in the next kernel (i.e. preventing them from being bound to a different driver).
This also enables VFIO to detect that a device was preserved before userspace first retrieves the file from it, which will be used in subsequent commits.
Signed-off-by: David Matlack dmatlack@google.com --- drivers/vfio/pci/vfio_pci_liveupdate.c | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c index b7451007fca4..7669c65bde17 100644 --- a/drivers/vfio/pci/vfio_pci_liveupdate.c +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c @@ -64,6 +64,7 @@ static int vfio_pci_liveupdate_preserve(struct liveupdate_file_op_args *args) if (err) goto error;
+ pci_liveupdate_outgoing_preserve(pdev); args->serialized_data = virt_to_phys(ser); return 0;
@@ -75,8 +76,10 @@ static int vfio_pci_liveupdate_preserve(struct liveupdate_file_op_args *args) static void vfio_pci_liveupdate_unpreserve(struct liveupdate_file_op_args *args) { struct vfio_pci_core_device_ser *ser = phys_to_virt(args->serialized_data); + struct vfio_device *device = vfio_device_from_file(args->file); struct folio *folio = virt_to_folio(ser);
+ pci_liveupdate_outgoing_unpreserve(to_pci_dev(device->dev)); kho_unpreserve_folio(folio); folio_put(folio); } @@ -199,8 +202,11 @@ static bool vfio_pci_liveupdate_can_finish(struct liveupdate_file_op_args *args)
static void vfio_pci_liveupdate_finish(struct liveupdate_file_op_args *args) { + struct vfio_device *device = vfio_device_from_file(args->file); struct folio *folio;
+ pci_liveupdate_incoming_finish(to_pci_dev(device->dev)); + folio = virt_to_folio(phys_to_virt(args->serialized_data)); folio_put(folio); } @@ -223,10 +229,24 @@ static struct liveupdate_file_handler vfio_pci_liveupdate_fh = {
int __init vfio_pci_liveupdate_init(void) { + int ret; + if (!liveupdate_enabled()) return 0;
- return liveupdate_register_file_handler(&vfio_pci_liveupdate_fh); + ret = liveupdate_register_file_handler(&vfio_pci_liveupdate_fh); + if (ret) + return ret; + + ret = pci_liveupdate_register_fh(&vfio_pci_liveupdate_fh); + if (ret) + goto error; + + return 0; + +error: + liveupdate_unregister_file_handler(&vfio_pci_liveupdate_fh); + return ret; }
void vfio_pci_liveupdate_cleanup(void) @@ -234,5 +254,6 @@ void vfio_pci_liveupdate_cleanup(void) if (!liveupdate_enabled()) return;
+ WARN_ON_ONCE(pci_liveupdate_unregister_fh(&vfio_pci_liveupdate_fh)); liveupdate_unregister_file_handler(&vfio_pci_liveupdate_fh); }
From: Vipin Sharma vipinsh@google.com
Use the given VFIO cdev FD to initialize vfio_pci_device in VFIO selftests. Add the assertion to make sure that passed cdev FD is not used with legacy VFIO APIs. If VFIO cdev FD is provided then do not open the device instead use the FD for any interaction with the device.
This API will allow to write selftests where VFIO device FD is preserved using liveupdate and retrieved later using liveupdate ioctl after kexec.
Signed-off-by: Vipin Sharma vipinsh@google.com Co-Developed-by: David Matlack dmatlack@google.com Signed-off-by: David Matlack dmatlack@google.com --- .../lib/include/libvfio/vfio_pci_device.h | 3 ++ .../selftests/vfio/lib/vfio_pci_device.c | 33 ++++++++++++++----- 2 files changed, 27 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h index 2858885a89bb..896dfde88118 100644 --- a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h +++ b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h @@ -38,6 +38,9 @@ struct vfio_pci_device { #define dev_info(_dev, _fmt, ...) printf("%s: " _fmt, (_dev)->bdf, ##__VA_ARGS__) #define dev_err(_dev, _fmt, ...) fprintf(stderr, "%s: " _fmt, (_dev)->bdf, ##__VA_ARGS__)
+struct vfio_pci_device *__vfio_pci_device_init(const char *bdf, + struct iommu *iommu, + int device_fd); struct vfio_pci_device *vfio_pci_device_init(const char *bdf, struct iommu *iommu); void vfio_pci_device_cleanup(struct vfio_pci_device *device);
diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c index 468ee1c61b0c..e9423dc3864a 100644 --- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c +++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c @@ -319,19 +319,27 @@ static void vfio_device_attach_iommufd_pt(int device_fd, u32 pt_id) ioctl_assert(device_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &args); }
-static void vfio_pci_iommufd_setup(struct vfio_pci_device *device, const char *bdf) +static void vfio_pci_iommufd_setup(struct vfio_pci_device *device, + const char *bdf, int device_fd) { - const char *cdev_path = vfio_pci_get_cdev_path(bdf); + const char *cdev_path;
- device->fd = open(cdev_path, O_RDWR); - VFIO_ASSERT_GE(device->fd, 0); - free((void *)cdev_path); + if (device_fd >= 0) { + device->fd = device_fd; + } else { + cdev_path = vfio_pci_get_cdev_path(bdf); + device->fd = open(cdev_path, O_RDWR); + VFIO_ASSERT_GE(device->fd, 0); + free((void *)cdev_path); + }
vfio_device_bind_iommufd(device->fd, device->iommu->iommufd); vfio_device_attach_iommufd_pt(device->fd, device->iommu->ioas_id); }
-struct vfio_pci_device *vfio_pci_device_init(const char *bdf, struct iommu *iommu) +struct vfio_pci_device *__vfio_pci_device_init(const char *bdf, + struct iommu *iommu, + int device_fd) { struct vfio_pci_device *device;
@@ -341,10 +349,12 @@ struct vfio_pci_device *vfio_pci_device_init(const char *bdf, struct iommu *iomm device->bdf = bdf; device->iommu = iommu;
- if (device->iommu->mode->container_path) + if (device->iommu->mode->container_path) { + VFIO_ASSERT_EQ(device_fd, -1); vfio_pci_container_setup(device, bdf); - else - vfio_pci_iommufd_setup(device, bdf); + } else { + vfio_pci_iommufd_setup(device, bdf, device_fd); + }
vfio_pci_device_setup(device); vfio_pci_driver_probe(device); @@ -352,6 +362,11 @@ struct vfio_pci_device *vfio_pci_device_init(const char *bdf, struct iommu *iomm return device; }
+struct vfio_pci_device *vfio_pci_device_init(const char *bdf, struct iommu *iommu) +{ + return __vfio_pci_device_init(bdf, iommu, /*device_fd=*/-1); +} + void vfio_pci_device_cleanup(struct vfio_pci_device *device) { int i;
Expose the list of iommu_modes to enable tests that want to iterate through all possible iommu modes.
Signed-off-by: David Matlack dmatlack@google.com --- tools/testing/selftests/vfio/lib/include/libvfio/iommu.h | 2 ++ tools/testing/selftests/vfio/lib/iommu.c | 4 +++- 2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/vfio/lib/include/libvfio/iommu.h b/tools/testing/selftests/vfio/lib/include/libvfio/iommu.h index 5c9b9dc6d993..a03ff2281f11 100644 --- a/tools/testing/selftests/vfio/lib/include/libvfio/iommu.h +++ b/tools/testing/selftests/vfio/lib/include/libvfio/iommu.h @@ -15,6 +15,8 @@ struct iommu_mode { unsigned long iommu_type; };
+extern const struct iommu_mode iommu_modes[]; +extern const int nr_iommu_modes; extern const char *default_iommu_mode;
struct dma_region { diff --git a/tools/testing/selftests/vfio/lib/iommu.c b/tools/testing/selftests/vfio/lib/iommu.c index 8079d43523f3..bf54f95ec833 100644 --- a/tools/testing/selftests/vfio/lib/iommu.c +++ b/tools/testing/selftests/vfio/lib/iommu.c @@ -24,7 +24,7 @@ const char *default_iommu_mode = "iommufd";
/* Reminder: Keep in sync with FIXTURE_VARIANT_ADD_ALL_IOMMU_MODES(). */ -static const struct iommu_mode iommu_modes[] = { +const struct iommu_mode iommu_modes[] = { { .name = "vfio_type1_iommu", .container_path = "/dev/vfio/vfio", @@ -50,6 +50,8 @@ static const struct iommu_mode iommu_modes[] = { }, };
+const int nr_iommu_modes = ARRAY_SIZE(iommu_modes); + static const struct iommu_mode *lookup_iommu_mode(const char *iommu_mode) { int i;
From: Vipin Sharma vipinsh@google.com
Do not reset the device when a Live Update preserved vfio-pci device is retrieved and first enabled. vfio_pci_liveupdate_freeze() guarantees the device is reset prior to Live Update, so there's no reason to reset it again after Live Update.
Since VFIO normally uses the initial reset to detect if the device supports function resets, pass that from the previous kernel via struct vfio_pci_core_dev_ser.
Signed-off-by: Vipin Sharma vipinsh@google.com Signed-off-by: David Matlack dmatlack@google.com --- drivers/vfio/pci/vfio_pci_core.c | 22 +++++++++++++++++----- drivers/vfio/pci/vfio_pci_liveupdate.c | 1 + include/linux/kho/abi/vfio_pci.h | 2 ++ include/linux/vfio_pci_core.h | 1 + 4 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c index b09fe0993e04..c3b30f08a788 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -482,12 +482,24 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev) if (ret) goto out_power;
- /* If reset fails because of the device lock, fail this path entirely */ - ret = pci_try_reset_function(pdev); - if (ret == -EAGAIN) - goto out_disable_device; + if (vdev->liveupdate_state) { + /* + * This device was handed off by vfio-pci from a previous kernel + * via Live Update, so it does not need to be reset. + */ + vdev->reset_works = vdev->liveupdate_state->reset_works; + } else { + /* + * If reset fails because of the device lock, fail this path + * entirely. + */ + ret = pci_try_reset_function(pdev); + if (ret == -EAGAIN) + goto out_disable_device; + + vdev->reset_works = !ret; + }
- vdev->reset_works = !ret; pci_save_state(pdev); vdev->pci_saved_state = pci_store_saved_state(pdev); if (!vdev->pci_saved_state) diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c index 0fb29bd3ae3b..bcaf9de8a823 100644 --- a/drivers/vfio/pci/vfio_pci_liveupdate.c +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c @@ -59,6 +59,7 @@ static int vfio_pci_liveupdate_preserve(struct liveupdate_file_op_args *args)
ser->bdf = pci_dev_id(pdev); ser->domain = pci_domain_nr(pdev->bus); + ser->reset_works = vdev->reset_works;
err = kho_preserve_folio(folio); if (err) diff --git a/include/linux/kho/abi/vfio_pci.h b/include/linux/kho/abi/vfio_pci.h index 9bf58a2f3820..6c3d3c6dfc09 100644 --- a/include/linux/kho/abi/vfio_pci.h +++ b/include/linux/kho/abi/vfio_pci.h @@ -34,10 +34,12 @@ * * @bdf: The device's PCI bus, device, and function number. * @domain: The device's PCI domain number (segment). + * @reset_works: Non-zero if the device supports function resets. */ struct vfio_pci_core_device_ser { u16 bdf; u16 domain; + u8 reset_works; } __packed;
#endif /* _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H */ diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index 56ff6452562d..3421721a1615 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -15,6 +15,7 @@ #include <linux/types.h> #include <linux/uuid.h> #include <linux/notifier.h> +#include <linux/kho/abi/vfio_pci.h>
#ifndef VFIO_PCI_CORE_H #define VFIO_PCI_CORE_H
From: Vipin Sharma vipinsh@google.com
Import and build liveupdate selftest library in VFIO selftests.
It allows to use liveupdate ioctls in VFIO selftests
Signed-off-by: Vipin Sharma vipinsh@google.com Signed-off-by: David Matlack dmatlack@google.com --- tools/testing/selftests/vfio/Makefile | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftests/vfio/Makefile index 8bb0b1e2d3a3..8e6e2cc2d8fd 100644 --- a/tools/testing/selftests/vfio/Makefile +++ b/tools/testing/selftests/vfio/Makefile @@ -11,6 +11,7 @@ TEST_PROGS_EXTENDED := scripts/run.sh TEST_PROGS_EXTENDED := scripts/setup.sh include ../lib.mk include lib/libvfio.mk +include ../liveupdate/lib/libliveupdate.mk
CFLAGS += -I$(top_srcdir)/tools/include CFLAGS += -MD @@ -18,11 +19,15 @@ CFLAGS += $(EXTRA_CFLAGS)
LDFLAGS += -pthread
-$(TEST_GEN_PROGS): %: %.o $(LIBVFIO_O) - $(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $< $(LIBVFIO_O) $(LDLIBS) -o $@ +LIBS_O := $(LIBVFIO_O) +LIBS_O += $(LIBLIVEUPDATE_O) + +$(TEST_GEN_PROGS): %: %.o $(LIBS_O) + $(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) $< $(LIBS_O) $(LDLIBS) -o $@
TEST_GEN_PROGS_O = $(patsubst %, %.o, $(TEST_GEN_PROGS)) -TEST_DEP_FILES = $(patsubst %.o, %.d, $(TEST_GEN_PROGS_O) $(LIBVFIO_O)) +TEST_DEP_FILES := $(patsubst %.o, %.d, $(TEST_GEN_PROGS_O)) +TEST_DEP_FILES += $(patsubst %.o, %.d, $(LIBS_O)) -include $(TEST_DEP_FILES)
EXTRA_CLEAN += $(TEST_GEN_PROGS_O) $(TEST_DEP_FILES)
From: Vipin Sharma vipinsh@google.com
Move luo_test_utils.[ch] into a lib/ directory and pull the rules to build them out into a separate make script. This will enable these utilities to be also built by and used within other selftests (such as VFIO) in subsequent commits.
No functional change intended.
Signed-off-by: Vipin Sharma vipinsh@google.com Co-Developed-by: David Matlack dmatlack@google.com Signed-off-by: David Matlack dmatlack@google.com --- tools/testing/selftests/liveupdate/.gitignore | 1 + tools/testing/selftests/liveupdate/Makefile | 14 ++++--------- .../include/libliveupdate.h} | 8 ++++---- .../selftests/liveupdate/lib/libliveupdate.mk | 20 +++++++++++++++++++ .../{luo_test_utils.c => lib/liveupdate.c} | 2 +- .../selftests/liveupdate/luo_kexec_simple.c | 2 +- .../selftests/liveupdate/luo_multi_session.c | 2 +- 7 files changed, 32 insertions(+), 17 deletions(-) rename tools/testing/selftests/liveupdate/{luo_test_utils.h => lib/include/libliveupdate.h} (87%) create mode 100644 tools/testing/selftests/liveupdate/lib/libliveupdate.mk rename tools/testing/selftests/liveupdate/{luo_test_utils.c => lib/liveupdate.c} (99%)
diff --git a/tools/testing/selftests/liveupdate/.gitignore b/tools/testing/selftests/liveupdate/.gitignore index 661827083ab6..18a0c7036cf3 100644 --- a/tools/testing/selftests/liveupdate/.gitignore +++ b/tools/testing/selftests/liveupdate/.gitignore @@ -3,6 +3,7 @@ !/**/ !*.c !*.h +!*.mk !*.sh !.gitignore !config diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile index 080754787ede..a060cc21f27f 100644 --- a/tools/testing/selftests/liveupdate/Makefile +++ b/tools/testing/selftests/liveupdate/Makefile @@ -1,7 +1,5 @@ # SPDX-License-Identifier: GPL-2.0-only
-LIB_C += luo_test_utils.c - TEST_GEN_PROGS += liveupdate
TEST_GEN_PROGS_EXTENDED += luo_kexec_simple @@ -10,25 +8,21 @@ TEST_GEN_PROGS_EXTENDED += luo_multi_session TEST_FILES += do_kexec.sh
include ../lib.mk +include lib/libliveupdate.mk
CFLAGS += $(KHDR_INCLUDES) CFLAGS += -Wall -O2 -Wno-unused-function CFLAGS += -MD
-LIB_O := $(patsubst %.c, $(OUTPUT)/%.o, $(LIB_C)) TEST_O := $(patsubst %, %.o, $(TEST_GEN_PROGS)) TEST_O += $(patsubst %, %.o, $(TEST_GEN_PROGS_EXTENDED))
-TEST_DEP_FILES := $(patsubst %.o, %.d, $(LIB_O)) +TEST_DEP_FILES := $(patsubst %.o, %.d, $(LIBLIVEUPDATE_O)) TEST_DEP_FILES += $(patsubst %.o, %.d, $(TEST_O)) -include $(TEST_DEP_FILES)
-$(LIB_O): $(OUTPUT)/%.o: %.c - $(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@ - -$(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED): $(OUTPUT)/%: %.o $(LIB_O) - $(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) $< $(LIB_O) $(LDLIBS) -o $@ +$(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED): $(OUTPUT)/%: %.o $(LIBLIVEUPDATE_O) + $(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) $< $(LIBLIVEUPDATE_O) $(LDLIBS) -o $@
-EXTRA_CLEAN += $(LIB_O) EXTRA_CLEAN += $(TEST_O) EXTRA_CLEAN += $(TEST_DEP_FILES) diff --git a/tools/testing/selftests/liveupdate/luo_test_utils.h b/tools/testing/selftests/liveupdate/lib/include/libliveupdate.h similarity index 87% rename from tools/testing/selftests/liveupdate/luo_test_utils.h rename to tools/testing/selftests/liveupdate/lib/include/libliveupdate.h index 90099bf49577..4390a2737930 100644 --- a/tools/testing/selftests/liveupdate/luo_test_utils.h +++ b/tools/testing/selftests/liveupdate/lib/include/libliveupdate.h @@ -7,13 +7,13 @@ * Utility functions for LUO kselftests. */
-#ifndef LUO_TEST_UTILS_H -#define LUO_TEST_UTILS_H +#ifndef SELFTESTS_LIVEUPDATE_LIB_LIVEUPDATE_H +#define SELFTESTS_LIVEUPDATE_LIB_LIVEUPDATE_H
#include <errno.h> #include <string.h> #include <linux/liveupdate.h> -#include "../kselftest.h" +#include "../../../kselftest.h"
#define LUO_DEVICE "/dev/liveupdate"
@@ -41,4 +41,4 @@ typedef void (*luo_test_stage2_fn)(int luo_fd, int state_session_fd); int luo_test(int argc, char *argv[], const char *state_session_name, luo_test_stage1_fn stage1, luo_test_stage2_fn stage2);
-#endif /* LUO_TEST_UTILS_H */ +#endif /* SELFTESTS_LIVEUPDATE_LIB_LIVEUPDATE_H */ diff --git a/tools/testing/selftests/liveupdate/lib/libliveupdate.mk b/tools/testing/selftests/liveupdate/lib/libliveupdate.mk new file mode 100644 index 000000000000..fffd95b085b6 --- /dev/null +++ b/tools/testing/selftests/liveupdate/lib/libliveupdate.mk @@ -0,0 +1,20 @@ +include $(top_srcdir)/scripts/subarch.include +ARCH ?= $(SUBARCH) + +LIBLIVEUPDATE_SRCDIR := $(selfdir)/liveupdate/lib + +LIBLIVEUPDATE_C := liveupdate.c + +LIBLIVEUPDATE_OUTPUT := $(OUTPUT)/libliveupdate + +LIBLIVEUPDATE_O := $(patsubst %.c, $(LIBLIVEUPDATE_OUTPUT)/%.o, $(LIBLIVEUPDATE_C)) + +LIBLIVEUPDATE_O_DIRS := $(shell dirname $(LIBLIVEUPDATE_O) | uniq) +$(shell mkdir -p $(LIBLIVEUPDATE_O_DIRS)) + +CFLAGS += -I$(LIBLIVEUPDATE_SRCDIR)/include + +$(LIBLIVEUPDATE_O): $(LIBLIVEUPDATE_OUTPUT)/%.o : $(LIBLIVEUPDATE_SRCDIR)/%.c + $(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@ + +EXTRA_CLEAN += $(LIBLIVEUPDATE_OUTPUT) diff --git a/tools/testing/selftests/liveupdate/luo_test_utils.c b/tools/testing/selftests/liveupdate/lib/liveupdate.c similarity index 99% rename from tools/testing/selftests/liveupdate/luo_test_utils.c rename to tools/testing/selftests/liveupdate/lib/liveupdate.c index 3c8721c505df..60121873f685 100644 --- a/tools/testing/selftests/liveupdate/luo_test_utils.c +++ b/tools/testing/selftests/liveupdate/lib/liveupdate.c @@ -21,7 +21,7 @@ #include <errno.h> #include <stdarg.h>
-#include "luo_test_utils.h" +#include <libliveupdate.h>
int luo_open_device(void) { diff --git a/tools/testing/selftests/liveupdate/luo_kexec_simple.c b/tools/testing/selftests/liveupdate/luo_kexec_simple.c index d7ac1f3dc4cb..786ac93b9ae3 100644 --- a/tools/testing/selftests/liveupdate/luo_kexec_simple.c +++ b/tools/testing/selftests/liveupdate/luo_kexec_simple.c @@ -8,7 +8,7 @@ * across a single kexec reboot. */
-#include "luo_test_utils.h" +#include <libliveupdate.h>
#define TEST_SESSION_NAME "test-session" #define TEST_MEMFD_TOKEN 0x1A diff --git a/tools/testing/selftests/liveupdate/luo_multi_session.c b/tools/testing/selftests/liveupdate/luo_multi_session.c index 0ee2d795beef..aac24a5f5ce3 100644 --- a/tools/testing/selftests/liveupdate/luo_multi_session.c +++ b/tools/testing/selftests/liveupdate/luo_multi_session.c @@ -9,7 +9,7 @@ * files. */
-#include "luo_test_utils.h" +#include <libliveupdate.h>
#define SESSION_EMPTY_1 "multi-test-empty-1" #define SESSION_EMPTY_2 "multi-test-empty-2"
Add Makefile support for TEST_GEN_PROGS_EXTENDED targets. These tests are not run by default.
TEST_GEN_PROGS_EXTENDED will be used for Live Update selftests in subsequent commits. These selftests must be run manually because they require the user/runner to perform additional actions, such as kexec, during the test.
Signed-off-by: David Matlack dmatlack@google.com --- tools/testing/selftests/vfio/Makefile | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftests/vfio/Makefile index 8e6e2cc2d8fd..424649049d94 100644 --- a/tools/testing/selftests/vfio/Makefile +++ b/tools/testing/selftests/vfio/Makefile @@ -22,12 +22,15 @@ LDFLAGS += -pthread LIBS_O := $(LIBVFIO_O) LIBS_O += $(LIBLIVEUPDATE_O)
-$(TEST_GEN_PROGS): %: %.o $(LIBS_O) +$(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED): %: %.o $(LIBS_O) $(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) $< $(LIBS_O) $(LDLIBS) -o $@
-TEST_GEN_PROGS_O = $(patsubst %, %.o, $(TEST_GEN_PROGS)) -TEST_DEP_FILES := $(patsubst %.o, %.d, $(TEST_GEN_PROGS_O)) +TESTS_O := $(patsubst %, %.o, $(TEST_GEN_PROGS)) +TESTS_O += $(patsubst %, %.o, $(TEST_GEN_PROGS_EXTENDED)) + +TEST_DEP_FILES := $(patsubst %.o, %.d, $(TESTS_O)) TEST_DEP_FILES += $(patsubst %.o, %.d, $(LIBS_O)) -include $(TEST_DEP_FILES)
-EXTRA_CLEAN += $(TEST_GEN_PROGS_O) $(TEST_DEP_FILES) +EXTRA_CLEAN += $(TESTS_O) +EXTRA_CLEAN += $(TEST_DEP_FILES)
From: Vipin Sharma vipinsh@google.com
Add helper functions to preserve and retrieve file descriptors from an LUO session. These will be used be used in subsequent commits to preserve FDs other than memfd.
No functional change intended.
Signed-off-by: Vipin Sharma vipinsh@google.com Co-Developed-by: David Matlack dmatlack@google.com Signed-off-by: David Matlack dmatlack@google.com --- .../liveupdate/lib/include/libliveupdate.h | 3 ++ .../selftests/liveupdate/lib/liveupdate.c | 41 +++++++++++++++---- 2 files changed, 35 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/liveupdate/lib/include/libliveupdate.h b/tools/testing/selftests/liveupdate/lib/include/libliveupdate.h index 4390a2737930..4c93d043d2b3 100644 --- a/tools/testing/selftests/liveupdate/lib/include/libliveupdate.h +++ b/tools/testing/selftests/liveupdate/lib/include/libliveupdate.h @@ -26,6 +26,9 @@ int luo_create_session(int luo_fd, const char *name); int luo_retrieve_session(int luo_fd, const char *name); int luo_session_finish(int session_fd);
+int luo_session_preserve_fd(int session_fd, int fd, int token); +int luo_session_retrieve_fd(int session_fd, int token); + int create_and_preserve_memfd(int session_fd, int token, const char *data); int restore_and_verify_memfd(int session_fd, int token, const char *expected_data);
diff --git a/tools/testing/selftests/liveupdate/lib/liveupdate.c b/tools/testing/selftests/liveupdate/lib/liveupdate.c index 60121873f685..9bf4f16ca0a4 100644 --- a/tools/testing/selftests/liveupdate/lib/liveupdate.c +++ b/tools/testing/selftests/liveupdate/lib/liveupdate.c @@ -54,9 +54,35 @@ int luo_retrieve_session(int luo_fd, const char *name) return arg.fd; }
+int luo_session_preserve_fd(int session_fd, int fd, int token) +{ + struct liveupdate_session_preserve_fd arg = { + .size = sizeof(arg), + .fd = fd, + .token = token, + }; + + if (ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, &arg)) + return -errno; + + return 0; +} + +int luo_session_retrieve_fd(int session_fd, int token) +{ + struct liveupdate_session_retrieve_fd arg = { + .size = sizeof(arg), + .token = token, + }; + + if (ioctl(session_fd, LIVEUPDATE_SESSION_RETRIEVE_FD, &arg)) + return -errno; + + return arg.fd; +} + int create_and_preserve_memfd(int session_fd, int token, const char *data) { - struct liveupdate_session_preserve_fd arg = { .size = sizeof(arg) }; long page_size = sysconf(_SC_PAGE_SIZE); void *map = MAP_FAILED; int mfd = -1, ret = -1; @@ -75,9 +101,8 @@ int create_and_preserve_memfd(int session_fd, int token, const char *data) snprintf(map, page_size, "%s", data); munmap(map, page_size);
- arg.fd = mfd; - arg.token = token; - if (ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, &arg) < 0) + ret = luo_session_preserve_fd(session_fd, mfd, token); + if (ret) goto out;
ret = 0; @@ -92,15 +117,13 @@ int create_and_preserve_memfd(int session_fd, int token, const char *data) int restore_and_verify_memfd(int session_fd, int token, const char *expected_data) { - struct liveupdate_session_retrieve_fd arg = { .size = sizeof(arg) }; long page_size = sysconf(_SC_PAGE_SIZE); void *map = MAP_FAILED; int mfd = -1, ret = -1;
- arg.token = token; - if (ioctl(session_fd, LIVEUPDATE_SESSION_RETRIEVE_FD, &arg) < 0) - return -errno; - mfd = arg.fd; + mfd = luo_session_retrieve_fd(session_fd, token); + if (mfd < 0) + return mfd;
map = mmap(NULL, page_size, PROT_READ, MAP_SHARED, mfd, 0); if (map == MAP_FAILED)
From: Vipin Sharma vipinsh@google.com
Add a selftest to exercise preserving a vfio-pci device across a Live Update. For now the test is extremely simple and just verifies that the device file can be preserved and retrieved. In the future this test will be extended to verify more parts about device preservation as they are implemented.
This test is added to TEST_GEN_PROGS_EXTENDED since it must be run manually along with a kexec.
To run this test manually:
$ tools/testing/selftests/vfio/scripts/setup.sh 0000:00:04.0 $ tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test --stage 1 0000:00:04.0
$ kexec ... # NOTE: Exact method will be distro-dependent
$ tools/testing/selftests/vfio/scripts/setup.sh 0000:00:04.0 $ tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test --stage 2 0000:00:04.0
The second call to setup.sh is necessary because preserved devices are not bound to a driver after Live Update. Such devices must be manually bound by userspace after Live Update via driver_override.
This test is considered passing if all commands exit with 0.
Signed-off-by: Vipin Sharma vipinsh@google.com Co-Developed-by: David Matlack dmatlack@google.com Signed-off-by: David Matlack dmatlack@google.com --- tools/testing/selftests/vfio/Makefile | 4 + .../vfio/vfio_pci_liveupdate_kexec_test.c | 82 +++++++++++++++++++ 2 files changed, 86 insertions(+) create mode 100644 tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c
diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftests/vfio/Makefile index 9b2a4f10c558..177aa5cd4376 100644 --- a/tools/testing/selftests/vfio/Makefile +++ b/tools/testing/selftests/vfio/Makefile @@ -6,6 +6,10 @@ TEST_GEN_PROGS += vfio_pci_device_init_perf_test TEST_GEN_PROGS += vfio_pci_driver_test TEST_GEN_PROGS += vfio_pci_liveupdate_uapi_test
+# This test must be run manually since it requires the user/automation to +# perform a kexec during the test. +TEST_GEN_PROGS_EXTENDED += vfio_pci_liveupdate_kexec_test + TEST_PROGS_EXTENDED := scripts/cleanup.sh TEST_PROGS_EXTENDED := scripts/lib.sh TEST_PROGS_EXTENDED := scripts/run.sh diff --git a/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c b/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c new file mode 100644 index 000000000000..95644c3bd2d3 --- /dev/null +++ b/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c @@ -0,0 +1,82 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include <libliveupdate.h> +#include <libvfio.h> + +static const char *device_bdf; + +#define STATE_SESSION "567e1b8d-daf1-4a4a-ac13-35d6d275f646" +#define DEVICE_SESSION "751ad95f-1621-4d68-8b5a-f9c2e460425f" + +enum { + STATE_TOKEN, + DEVICE_TOKEN, +}; + +static void before_kexec(int luo_fd) +{ + struct vfio_pci_device *device; + struct iommu *iommu; + int session_fd; + int ret; + + iommu = iommu_init("iommufd"); + device = vfio_pci_device_init(device_bdf, iommu); + + create_state_file(luo_fd, STATE_SESSION, STATE_TOKEN, /*next_stage=*/2); + + session_fd = luo_create_session(luo_fd, DEVICE_SESSION); + VFIO_ASSERT_GE(session_fd, 0); + + printf("Preserving device in session\n"); + ret = luo_session_preserve_fd(session_fd, device->fd, DEVICE_TOKEN); + VFIO_ASSERT_EQ(ret, 0); + + close(luo_fd); + daemonize_and_wait(); +} + +static void after_kexec(int luo_fd, int state_session_fd) +{ + struct vfio_pci_device *device; + struct iommu *iommu; + int session_fd; + int device_fd; + int stage; + + restore_and_read_stage(state_session_fd, STATE_TOKEN, &stage); + VFIO_ASSERT_EQ(stage, 2); + + session_fd = luo_retrieve_session(luo_fd, DEVICE_SESSION); + VFIO_ASSERT_GE(session_fd, 0); + + printf("Finishing the session before retrieving the device (should fail)\n"); + VFIO_ASSERT_NE(luo_session_finish(session_fd), 0); + + printf("Retrieving the device FD from LUO\n"); + device_fd = luo_session_retrieve_fd(session_fd, DEVICE_TOKEN); + VFIO_ASSERT_GE(device_fd, 0); + + printf("Binding the device to an iommufd and setting it up\n"); + iommu = iommu_init("iommufd"); + + /* + * This will invoke various ioctls on device_fd such as + * VFIO_DEVICE_GET_INFO. So this is a decent sanity test + * that LUO actually handed us back a valid VFIO device + * file and not something else. + */ + device = __vfio_pci_device_init(device_bdf, iommu, device_fd); + + printf("Finishing the session\n"); + VFIO_ASSERT_EQ(luo_session_finish(session_fd), 0); + + vfio_pci_device_cleanup(device); + iommu_cleanup(iommu); +} + +int main(int argc, char *argv[]) +{ + device_bdf = vfio_selftests_get_bdf(&argc, argv); + return luo_test(argc, argv, STATE_SESSION, before_kexec, after_kexec); +}
Add a selftest to exercise preserving a various VFIO files through /dev/liveupdate. Ensure that VFIO cdev device files can be preserved and everything else (group-based device files, group files, and container files) all fail.
Signed-off-by: David Matlack dmatlack@google.com --- tools/testing/selftests/vfio/Makefile | 1 + .../vfio/vfio_pci_liveupdate_uapi_test.c | 93 +++++++++++++++++++ 2 files changed, 94 insertions(+) create mode 100644 tools/testing/selftests/vfio/vfio_pci_liveupdate_uapi_test.c
diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftests/vfio/Makefile index 424649049d94..9b2a4f10c558 100644 --- a/tools/testing/selftests/vfio/Makefile +++ b/tools/testing/selftests/vfio/Makefile @@ -4,6 +4,7 @@ TEST_GEN_PROGS += vfio_iommufd_setup_test TEST_GEN_PROGS += vfio_pci_device_test TEST_GEN_PROGS += vfio_pci_device_init_perf_test TEST_GEN_PROGS += vfio_pci_driver_test +TEST_GEN_PROGS += vfio_pci_liveupdate_uapi_test
TEST_PROGS_EXTENDED := scripts/cleanup.sh TEST_PROGS_EXTENDED := scripts/lib.sh diff --git a/tools/testing/selftests/vfio/vfio_pci_liveupdate_uapi_test.c b/tools/testing/selftests/vfio/vfio_pci_liveupdate_uapi_test.c new file mode 100644 index 000000000000..3b4276b2532c --- /dev/null +++ b/tools/testing/selftests/vfio/vfio_pci_liveupdate_uapi_test.c @@ -0,0 +1,93 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include <libliveupdate.h> +#include <libvfio.h> +#include <kselftest_harness.h> + +static const char *device_bdf; + +FIXTURE(vfio_pci_liveupdate_uapi_test) { + int luo_fd; + int session_fd; + struct iommu *iommu; + struct vfio_pci_device *device; +}; + +FIXTURE_VARIANT(vfio_pci_liveupdate_uapi_test) { + const char *iommu_mode; +}; + +#define FIXTURE_VARIANT_ADD_IOMMU_MODE(_iommu_mode) \ +FIXTURE_VARIANT_ADD(vfio_pci_liveupdate_uapi_test, _iommu_mode) { \ + .iommu_mode = #_iommu_mode, \ +} + +FIXTURE_VARIANT_ADD_ALL_IOMMU_MODES(); +#undef FIXTURE_VARIANT_ADD_IOMMU_MODE + +FIXTURE_SETUP(vfio_pci_liveupdate_uapi_test) +{ + self->luo_fd = luo_open_device(); + ASSERT_GE(self->luo_fd, 0); + + self->session_fd = luo_create_session(self->luo_fd, "session"); + ASSERT_GE(self->session_fd, 0); + + self->iommu = iommu_init(variant->iommu_mode); + self->device = vfio_pci_device_init(device_bdf, self->iommu); +} + +FIXTURE_TEARDOWN(vfio_pci_liveupdate_uapi_test) +{ + vfio_pci_device_cleanup(self->device); + iommu_cleanup(self->iommu); + close(self->session_fd); + close(self->luo_fd); +} + +TEST_F(vfio_pci_liveupdate_uapi_test, preserve_device) +{ + int ret; + + ret = luo_session_preserve_fd(self->session_fd, self->device->fd, 0); + + /* Preservation should only be supported for VFIO cdev files. */ + ASSERT_EQ(ret, self->iommu->iommufd ? 0 : -ENOENT); +} + +TEST_F(vfio_pci_liveupdate_uapi_test, preserve_group_fails) +{ + int ret; + + if (self->iommu->iommufd) + return; + + ret = luo_session_preserve_fd(self->session_fd, self->device->group_fd, 0); + ASSERT_EQ(ret, -ENOENT); +} + +TEST_F(vfio_pci_liveupdate_uapi_test, preserve_container_fails) +{ + int ret; + + if (self->iommu->iommufd) + return; + + ret = luo_session_preserve_fd(self->session_fd, self->iommu->container_fd, 0); + ASSERT_EQ(ret, -ENOENT); +} + +int main(int argc, char *argv[]) +{ + int fd; + + fd = luo_open_device(); + if (fd < 0) { + printf("open(%s) failed: %s, skipping\n", LUO_DEVICE, strerror(errno)); + return KSFT_SKIP; + } + close(fd); + + device_bdf = vfio_selftests_get_bdf(&argc, argv); + return test_harness_run(argc, argv); +}
Expose a few low-level helper routings for setting up vfio_pci_device structs. These routines will be used in a subsequent commit to assert that VFIO_GROUP_GET_DEVICE_FD fails under certain conditions.
Signed-off-by: David Matlack dmatlack@google.com --- .../lib/include/libvfio/vfio_pci_device.h | 5 +++ .../selftests/vfio/lib/vfio_pci_device.c | 33 +++++++++++++------ 2 files changed, 28 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h index 896dfde88118..2389c7698335 100644 --- a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h +++ b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h @@ -125,4 +125,9 @@ static inline bool vfio_pci_device_match(struct vfio_pci_device *device,
const char *vfio_pci_get_cdev_path(const char *bdf);
+/* Low-level routines for setting up a struct vfio_pci_device */ +struct vfio_pci_device *vfio_pci_device_alloc(const char *bdf, struct iommu *iommu); +void vfio_pci_group_setup(struct vfio_pci_device *device); +void vfio_pci_iommu_setup(struct vfio_pci_device *device); + #endif /* SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_VFIO_PCI_DEVICE_H */ diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c index e9423dc3864a..c1a3886dee30 100644 --- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c +++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c @@ -199,7 +199,7 @@ static unsigned int vfio_pci_get_group_from_dev(const char *bdf) return group; }
-static void vfio_pci_group_setup(struct vfio_pci_device *device, const char *bdf) +void vfio_pci_group_setup(struct vfio_pci_device *device) { struct vfio_group_status group_status = { .argsz = sizeof(group_status), @@ -207,7 +207,7 @@ static void vfio_pci_group_setup(struct vfio_pci_device *device, const char *bdf char group_path[32]; int group;
- group = vfio_pci_get_group_from_dev(bdf); + group = vfio_pci_get_group_from_dev(device->bdf); snprintf(group_path, sizeof(group_path), "/dev/vfio/%d", group);
device->group_fd = open(group_path, O_RDWR); @@ -219,14 +219,12 @@ static void vfio_pci_group_setup(struct vfio_pci_device *device, const char *bdf ioctl_assert(device->group_fd, VFIO_GROUP_SET_CONTAINER, &device->iommu->container_fd); }
-static void vfio_pci_container_setup(struct vfio_pci_device *device, const char *bdf) +void vfio_pci_iommu_setup(struct vfio_pci_device *device) { struct iommu *iommu = device->iommu; unsigned long iommu_type = iommu->mode->iommu_type; int ret;
- vfio_pci_group_setup(device, bdf); - ret = ioctl(iommu->container_fd, VFIO_CHECK_EXTENSION, iommu_type); VFIO_ASSERT_GT(ret, 0, "VFIO IOMMU type %lu not supported\n", iommu_type);
@@ -236,8 +234,14 @@ static void vfio_pci_container_setup(struct vfio_pci_device *device, const char * because the IOMMU type is already set. */ (void)ioctl(iommu->container_fd, VFIO_SET_IOMMU, (void *)iommu_type); +}
- device->fd = ioctl(device->group_fd, VFIO_GROUP_GET_DEVICE_FD, bdf); +static void vfio_pci_container_setup(struct vfio_pci_device *device) +{ + vfio_pci_group_setup(device); + vfio_pci_iommu_setup(device); + + device->fd = ioctl(device->group_fd, VFIO_GROUP_GET_DEVICE_FD, device->bdf); VFIO_ASSERT_GE(device->fd, 0); }
@@ -337,9 +341,7 @@ static void vfio_pci_iommufd_setup(struct vfio_pci_device *device, vfio_device_attach_iommufd_pt(device->fd, device->iommu->ioas_id); }
-struct vfio_pci_device *__vfio_pci_device_init(const char *bdf, - struct iommu *iommu, - int device_fd) +struct vfio_pci_device *vfio_pci_device_alloc(const char *bdf, struct iommu *iommu) { struct vfio_pci_device *device;
@@ -349,9 +351,20 @@ struct vfio_pci_device *__vfio_pci_device_init(const char *bdf, device->bdf = bdf; device->iommu = iommu;
+ return device; +} + +struct vfio_pci_device *__vfio_pci_device_init(const char *bdf, + struct iommu *iommu, + int device_fd) +{ + struct vfio_pci_device *device; + + device = vfio_pci_device_alloc(bdf, iommu); + if (device->iommu->mode->container_path) { VFIO_ASSERT_EQ(device_fd, -1); - vfio_pci_container_setup(device, bdf); + vfio_pci_container_setup(device); } else { vfio_pci_iommufd_setup(device, bdf, device_fd); }
Verify that opening a VFIO device through its cdev file and via VFIO_GROUP_GET_DEVICE_FD both fail with -EBUSY if the device was preserved across a Live Update. When a device file is preserve across a Live Update, the file must be retrieved from /dev/liveupdate, not from VFIO directly.
Signed-off-by: David Matlack dmatlack@google.com --- .../vfio/vfio_pci_liveupdate_kexec_test.c | 38 +++++++++++++++++++ 1 file changed, 38 insertions(+)
diff --git a/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c b/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c index 95644c3bd2d3..925c5fc30d56 100644 --- a/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c +++ b/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c @@ -36,6 +36,42 @@ static void before_kexec(int luo_fd) daemonize_and_wait(); }
+static void check_open_vfio_device_fails(void) +{ + const char *cdev_path = vfio_pci_get_cdev_path(device_bdf); + struct vfio_pci_device *device; + struct iommu *iommu; + int ret, i; + + printf("Checking open(%s) fails\n", cdev_path); + ret = open(cdev_path, O_RDWR); + VFIO_ASSERT_EQ(ret, -1); + VFIO_ASSERT_EQ(errno, EBUSY); + free((void *)cdev_path); + + for (i = 0; i < nr_iommu_modes; i++) { + if (!iommu_modes[i].container_path) + continue; + + iommu = iommu_init(iommu_modes[i].name); + + device = vfio_pci_device_alloc(device_bdf, iommu); + vfio_pci_group_setup(device); + vfio_pci_iommu_setup(device); + + printf("Checking ioctl(group_fd, VFIO_GROUP_GET_DEVICE_FD, "%s") fails (%s)\n", + device_bdf, iommu_modes[i].name); + + ret = ioctl(device->group_fd, VFIO_GROUP_GET_DEVICE_FD, device->bdf); + VFIO_ASSERT_EQ(ret, -1); + VFIO_ASSERT_EQ(errno, EBUSY); + + close(device->group_fd); + free(device); + iommu_cleanup(iommu); + } +} + static void after_kexec(int luo_fd, int state_session_fd) { struct vfio_pci_device *device; @@ -44,6 +80,8 @@ static void after_kexec(int luo_fd, int state_session_fd) int device_fd; int stage;
+ check_open_vfio_device_fails(); + restore_and_read_stage(state_session_fd, STATE_TOKEN, &stage); VFIO_ASSERT_EQ(stage, 2);
Add a long-running DMA memcpy operation to vfio_pci_liveupdate_kexec_test so that the device attempts to perform DMAs continuously during the Live Update.
At this point iommufd preservation is not supported and bus mastering is not kept enabled on the device during across the kexec, so most of these DMAs will be dropped. However this test ensures that the current device preservation support does not lead to system instability or crashes if the device is active. And once iommufd and bus mastering are preserved, this test can be relaxed to check that the DMA operations completed successfully.
Signed-off-by: David Matlack dmatlack@google.com --- .../vfio/vfio_pci_liveupdate_kexec_test.c | 135 ++++++++++++++++++ 1 file changed, 135 insertions(+)
diff --git a/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c b/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c index 925c5fc30d56..3af50110a65d 100644 --- a/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c +++ b/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c @@ -1,8 +1,16 @@ // SPDX-License-Identifier: GPL-2.0-only
+#include <linux/sizes.h> +#include <sys/mman.h> + #include <libliveupdate.h> #include <libvfio.h>
+#define MEMCPY_SIZE SZ_1G +#define DRIVER_SIZE SZ_1M +#define MEMFD_SIZE (MEMCPY_SIZE + DRIVER_SIZE) + +static struct dma_region memcpy_region; static const char *device_bdf;
#define STATE_SESSION "567e1b8d-daf1-4a4a-ac13-35d6d275f646" @@ -11,8 +19,100 @@ static const char *device_bdf; enum { STATE_TOKEN, DEVICE_TOKEN, + MEMFD_TOKEN, };
+static void dma_memcpy_one(struct vfio_pci_device *device) +{ + void *src = memcpy_region.vaddr, *dst; + u64 size; + + size = min_t(u64, memcpy_region.size / 2, device->driver.max_memcpy_size); + dst = src + size; + + memset(src, 1, size); + memset(dst, 0, size); + + printf("Kicking off 1 DMA memcpy operations of size 0x%lx...\n", size); + vfio_pci_driver_memcpy(device, + to_iova(device, src), + to_iova(device, dst), + size); + + VFIO_ASSERT_EQ(memcmp(src, dst, size), 0); +} + +static void dma_memcpy_start(struct vfio_pci_device *device) +{ + void *src = memcpy_region.vaddr, *dst; + u64 count, size; + + size = min_t(u64, memcpy_region.size / 2, device->driver.max_memcpy_size); + dst = src + size; + + /* + * Rough Math: If we assume the device will perform memcpy at a rate of + * 30GB/s then 7200GB of transfers will run for about 4 minutes. + */ + count = (u64)7200 * SZ_1G / size; + count = min_t(u64, count, device->driver.max_memcpy_count); + + memset(src, 1, size / 2); + memset(dst, 0, size / 2); + + printf("Kicking off %lu DMA memcpy operations of size 0x%lx...\n", count, size); + vfio_pci_driver_memcpy_start(device, + to_iova(device, src), + to_iova(device, dst), + size, count); +} + +static void dma_memfd_map(struct vfio_pci_device *device, int fd) +{ + void *vaddr; + + vaddr = mmap(NULL, MEMFD_SIZE, PROT_WRITE, MAP_SHARED, fd, 0); + VFIO_ASSERT_NE(vaddr, MAP_FAILED); + + memcpy_region.iova = SZ_4G; + memcpy_region.size = MEMCPY_SIZE; + memcpy_region.vaddr = vaddr; + iommu_map(device->iommu, &memcpy_region); + + device->driver.region.iova = memcpy_region.iova + memcpy_region.size; + device->driver.region.size = DRIVER_SIZE; + device->driver.region.vaddr = vaddr + memcpy_region.size; + iommu_map(device->iommu, &device->driver.region); +} + +static void dma_memfd_setup(struct vfio_pci_device *device, int session_fd) +{ + int fd, ret; + + fd = memfd_create("dma-buffer", 0); + VFIO_ASSERT_GE(fd, 0); + + ret = fallocate(fd, 0, 0, MEMFD_SIZE); + VFIO_ASSERT_EQ(ret, 0); + + printf("Preserving memfd of size 0x%x in session\n", MEMFD_SIZE); + ret = luo_session_preserve_fd(session_fd, fd, MEMFD_TOKEN); + VFIO_ASSERT_EQ(ret, 0); + + dma_memfd_map(device, fd); +} + +static void dma_memfd_restore(struct vfio_pci_device *device, int session_fd) +{ + int fd; + + printf("Retrieving memfd from LUO\n"); + fd = luo_session_retrieve_fd(session_fd, MEMFD_TOKEN); + VFIO_ASSERT_GE(fd, 0); + + dma_memfd_map(device, fd); +} + static void before_kexec(int luo_fd) { struct vfio_pci_device *device; @@ -32,6 +132,27 @@ static void before_kexec(int luo_fd) ret = luo_session_preserve_fd(session_fd, device->fd, DEVICE_TOKEN); VFIO_ASSERT_EQ(ret, 0);
+ dma_memfd_setup(device, session_fd); + + /* + * If the device has a selftests driver, kick off a long-running DMA + * operation to exercise the device trying to DMA during a Live Update. + * Since iommufd preservation is not supported yet, these DMAs should be + * dropped. So this is just looking to verify that the system does not + * fall over and crash as a result of a busy device being preserved. + */ + if (device->driver.ops) { + vfio_pci_driver_init(device); + dma_memcpy_start(device); + + /* + * Disable interrupts on the device or freeze() will fail. + * Unfortunately there isn't a way to easily have a test for + * that here since the check happens during shutdown. + */ + vfio_pci_msix_disable(device); + } + close(luo_fd); daemonize_and_wait(); } @@ -106,9 +227,23 @@ static void after_kexec(int luo_fd, int state_session_fd) */ device = __vfio_pci_device_init(device_bdf, iommu, device_fd);
+ dma_memfd_restore(device, session_fd); + printf("Finishing the session\n"); VFIO_ASSERT_EQ(luo_session_finish(session_fd), 0);
+ /* + * Once iommufd preservation is supported and the device is kept fully + * running across the Live Update, this should wait for the long- + * running DMA memcpy operation kicked off in before_kexec() to + * complete. But for now we expect the device to be reset so just + * trigger a single memcpy to make sure it's still functional. + */ + if (device->driver.ops) { + vfio_pci_driver_init(device); + dma_memcpy_one(device); + } + vfio_pci_device_cleanup(device); iommu_cleanup(iommu); }
linux-kselftest-mirror@lists.linaro.org