Hello,
This series adds live update support to the VFIO PCI subsystem on top of the Live Update Orchestrator (LUO) [1].
This series can also be found on GitHub:
https://github.com/shvipin/linux vfio/liveupdate/rfc-v1
The goal of live update in the VFIO subsystem is to preserve VFIO PCI devices while the host kernel goes through a live update. A preserved device continues to work, can keep performing DMA, and is not reset while the host is rebooted via kexec.
This series registers VFIO with LUO, implements the LUO callbacks, skips the DMA clear and the device reset, and preserves and restores a device's virtual config across the live update. A selftest added towards the end of the series, vfio_pci_liveupdate_test, sets certain properties on a VFIO PCI device, performs a live update, and then validates that those properties are still the same on the device.
The overall flow for a VFIO device going through a live update looks something like this:
1. Userspace passes a VFIO cdev FD along with a token to LUO for preservation.
2. LUO passes the FD to the VFIO subsystem to verify that the FD can be preserved. If so, it increases the refcount on the FD.
3. Eventually, userspace tells LUO to prepare for live update, which results in LUO calling the prepare() callback of each of its registered file handlers with the FD it should prepare.
4. The VFIO subsystem saves certain device properties which would otherwise be lost or hard to recover.
5. VFIO saves the needed data to KHO and provides LUO with the physical address of the data preserved by KHO.
6. Userspace sends the FREEZE event to freeze the system. LUO forwards it to each of its registered subsystems.
7. VFIO disables the interrupts configured on the device during the freeze call.
8. Userspace performs kexec.
9. During kexec reboot, all PCI devices generally get their Bus Master Enable bit cleared. In the live update case, preserved VFIO devices are skipped.
10. During boot, the usual device enumeration happens and LUO also initializes itself.
11. Userspace uses the same token value (step 1) and asks LUO to return the VFIO FD corresponding to the token.
12. LUO asks VFIO to return the VFIO cdev FD corresponding to the token. It passes the physical address which VFIO gave it in step 5.
13. VFIO restores the KHO data and reads the BDF value it saved. It iterates through all of the VFIO devices in its VFIO cdev class and finds the device with that BDF.
14. VFIO creates an anonymous inode and file corresponding to the VFIO PCI device and returns it to LUO, and LUO returns it to userspace.
15. The FD returned to userspace now works exactly as if userspace had opened the VFIO device from /dev/vfio/devices/*.
16. Userspace makes the usual bind iommufd and attach page table calls.
17. During bind, when the VFIO device is internally opened for the first time:
    - VFIO skips Bus Master Disable.
    - VFIO skips device reset.
    - Instead of initializing vconfig from scratch, VFIO uses the vconfig stored in KHO, and does the same for a few other fields.
This is what the current series implements and validates through the selftest.
Other things are not implemented yet, and some of them depend on other subsystems. For example:
1. Once a device has been prepared, VFIO should not allow userspace to change its state, for example by changing PCI config values or resetting the device.
2. Device IOVA is not preserved in this series. That work is done separately in the IOMMUFD live update preservation series [2].
3. During PCI device enumeration, the PCI subsystem writes to PCI config space and attaches the device to its original driver if present. This is being addressed in the PCI preservation series [3].
4. Enabling the PCI device, which is done in the VFIO subsystem, should be handled in the PCI subsystem. Currently, this patch series doesn't change that behavior.
5. If live update gets canceled, the interrupts which were disabled in freeze need to be reconfigured.
6. In finish, if a device is not restored, how do we know whether the KHO folio has been restored or not?
7. The VFIO cdev is restored in an anonymous file system. This should instead be done on devtmpfs.
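As an illustration of the first open item, one possible shape for such a guard is sketched below. This is purely hypothetical and not part of the series: struct sketch_dev, its prepared flag, sketch_cfg_write() and the choice of -EBUSY are all assumptions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CFG_SIZE 256

/* Hypothetical device state, for illustration only. */
struct sketch_dev {
	bool prepared;			/* set once prepare() has run */
	uint8_t cfg[SKETCH_CFG_SIZE];	/* stand-in for config space */
};

/* Reject config writes once the device has been prepared. */
int sketch_cfg_write(struct sketch_dev *d, size_t off, uint8_t val)
{
	if (off >= SKETCH_CFG_SIZE)
		return -EINVAL;
	if (d->prepared)
		return -EBUSY;	/* state stays frozen until finish or cancel */

	d->cfg[off] = val;
	return 0;
}
```

The same check would have to cover every path that can change preserved state (config writes, resets, interrupt setup), which is why it is listed as future work rather than a one-line change.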
For reviewers, the patches in this series are grouped as follows:

Patches 1-4
-----------
Feel free to ignore these if you are only interested in VFIO.
These are only for the live update selftests. I had to make some changes on top of the LUO v4 series to create a library out of them which can be used in other selftests (vfio), and to fix some build issues.

Patches 5-9
-----------
Add basic live update support in VFIO.
They register with LUO, save the device BDF in KHO during prepare, and return the VFIO cdev FD during restore.
They don't save or skip anything else.

Patches 10-17
-------------
Add support for skipping certain operations and preserving certain data needed to restore a device.

Patches 18-21
-------------
- Integrate the VFIO selftests with the live update selftest library.
- Add a basic vfio_pci_liveupdate_test which validates that the Bus Master Enable bit is preserved and that the virtual config is restored properly.
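For readers unfamiliar with the bit the selftest checks: Bus Master Enable is bit 2 of the 16-bit PCI command register at config offset 0x04. A minimal sketch of the check follows. The macro values come from the PCI specification and mirror the kernel's pci_regs.h; the helper name bus_master_enabled() is an assumption and is not taken from the selftest.

```c
#include <stdbool.h>
#include <stdint.h>

/* Redefined here so the sketch builds standalone. */
#define PCI_COMMAND		0x04	/* 16-bit command register */
#define PCI_COMMAND_MASTER	0x4	/* Bus Master Enable */

/*
 * Hypothetical helper: cfg points at a copy of the device's config
 * space (at least 6 bytes). Config space is little-endian.
 */
bool bus_master_enabled(const uint8_t *cfg)
{
	uint16_t cmd = (uint16_t)(cfg[PCI_COMMAND] |
				  (cfg[PCI_COMMAND + 1] << 8));

	return (cmd & PCI_COMMAND_MASTER) != 0;
}
```

A test along these lines would read the command register before the kexec and again after restore, and expect the bit to be set both times.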

Testing
-------
I have tested this on QEMU with a test PCI device and also on bare metal with an Intel DSA device. Make sure the IDXD driver is not built into your kernel if you are testing with an Intel DSA device. Basically, whichever device you use, it should not get auto-bound to any other driver.
Important config options which should be enabled to test this series:
- CONFIG_KEXEC_FILE
- CONFIG_LIVEUPDATE
- CONFIG_KEXEC_HANDOVER
Besides these, the usual VFIO, VFIO_PCI, IOMMU and other dependencies need to be enabled.
To build the test, provide KHDR_INCLUDES to your make command if your headers are out-of-tree:
KHDR_INCLUDES="-isystem ../../../../build/usr/include" make
vfio_pci_liveupdate_test needs to be executed manually, and it needs to be run twice: once before the live update and once after it.
./run.sh -d 0000:00:04.0 vfio_pci_liveupdate_test

Next Steps
----------
1. Looking forward to feedback on this series.
   - What other things should we save?
   - Which things should not be saved?
   - Any missing locks or incorrect locking in the series.
   - Any optimizations.
2. Integration with the IOMMUFD and PCI series for a complete workflow where a device continues DMA while undergoing a live update.
I will be going on paternity leave soon, so my responses will be intermittent. David Matlack (dmatlack@google.com) has graciously offered to work on this series and continue upstream engagement on this feature until I am back. Thank you, David!
[1] LUO-v4: https://lore.kernel.org/linux-mm/20250929010321.3462457-1-pasha.tatashin@sol...
[2] IOMMUFD: https://lore.kernel.org/linux-iommu/20250928190624.3735830-1-skhawaja@google...
[3] PCI: https://lore.kernel.org/linux-pci/20250916-luo-pci-v2-0-c494053c3c08@kernel....

Vipin Sharma (21):
  selftests/liveupdate: Build tests from the selftests/liveupdate directory
  selftests/liveupdate: Create library of core live update ioctls
  selftests/liveupdate: Move do_kexec.sh script to liveupdate/lib
  selftests/liveupdate: Move LUO ioctls calls to liveupdate library
  vfio/pci: Register VFIO live update file handler to Live Update Orchestrator
  vfio/pci: Accept live update preservation request for VFIO cdev
  vfio/pci: Store VFIO PCI device preservation data in KHO for live update
  vfio/pci: Retrieve preserved VFIO device for Live Update Orchestrator
  vfio/pci: Add Live Update finish callback implementation
  PCI: Add option to skip Bus Master Enable reset during kexec
  vfio/pci: Skip clearing bus master on live update device during kexec
  vfio/pci: Skip clearing bus master on live update restored device
  vfio/pci: Preserve VFIO PCI config space through live update
  vfio/pci: Skip device reset on live update restored device.
  PCI: Make PCI saved state and capability structs public
  vfio/pci: Save and restore the PCI state of the VFIO device
  vfio/pci: Disable interrupts before going live update kexec
  vfio: selftests: Build liveupdate library in VFIO selftests
  vfio: selftests: Initialize vfio_pci_device using a VFIO cdev FD
  vfio: selftests: Add VFIO live update test
  vfio: selftests: Validate vconfig preservation of VFIO PCI device during live update

 drivers/pci/pci-driver.c                      |   6 +-
 drivers/pci/pci.c                             |   5 -
 drivers/pci/pci.h                             |   7 -
 drivers/vfio/pci/Makefile                     |   1 +
 drivers/vfio/pci/vfio_pci_config.c            |  17 +
 drivers/vfio/pci/vfio_pci_core.c              |  31 +-
 drivers/vfio/pci/vfio_pci_liveupdate.c        | 461 ++++++++++++++++++
 drivers/vfio/pci/vfio_pci_priv.h              |  17 +
 drivers/vfio/vfio_main.c                      |  20 +-
 include/linux/pci.h                           |  15 +
 include/linux/vfio.h                          |   8 +
 include/linux/vfio_pci_core.h                 |   1 +
 tools/testing/selftests/liveupdate/.gitignore |   7 +-
 tools/testing/selftests/liveupdate/Makefile   |  31 +-
 .../liveupdate/{ => lib}/do_kexec.sh          |   0
 .../liveupdate/lib/include/liveupdate_util.h  |  27 +
 .../selftests/liveupdate/lib/libliveupdate.mk |  18 +
 .../liveupdate/lib/liveupdate_util.c          | 106 ++++
 .../selftests/liveupdate/luo_multi_file.c     |   2 -
 .../selftests/liveupdate/luo_multi_kexec.c    |   2 -
 .../selftests/liveupdate/luo_multi_session.c  |   2 -
 .../selftests/liveupdate/luo_test_utils.c     |  73 +--
 .../selftests/liveupdate/luo_test_utils.h     |  10 +-
 .../selftests/liveupdate/luo_unreclaimed.c    |   1 -
 tools/testing/selftests/vfio/Makefile         |  15 +-
 .../selftests/vfio/lib/include/vfio_util.h    |   1 +
 .../selftests/vfio/lib/vfio_pci_device.c      |  33 +-
 .../selftests/vfio/vfio_pci_liveupdate_test.c | 116 +++++
 28 files changed, 900 insertions(+), 133 deletions(-)
 create mode 100644 drivers/vfio/pci/vfio_pci_liveupdate.c
 rename tools/testing/selftests/liveupdate/{ => lib}/do_kexec.sh (100%)
 create mode 100644 tools/testing/selftests/liveupdate/lib/include/liveupdate_util.h
 create mode 100644 tools/testing/selftests/liveupdate/lib/libliveupdate.mk
 create mode 100644 tools/testing/selftests/liveupdate/lib/liveupdate_util.c
 create mode 100644 tools/testing/selftests/vfio/vfio_pci_liveupdate_test.c

base-commit: e48be01cadc981362646dc3a87d57316421590a5
Build selftests from liveupdate directory
Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 tools/testing/selftests/liveupdate/.gitignore |  7 ++++--
 tools/testing/selftests/liveupdate/Makefile   | 25 ++++++++++---------
 2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/tools/testing/selftests/liveupdate/.gitignore b/tools/testing/selftests/liveupdate/.gitignore
index de7ca45d3892..da3a50a32aeb 100644
--- a/tools/testing/selftests/liveupdate/.gitignore
+++ b/tools/testing/selftests/liveupdate/.gitignore
@@ -1,2 +1,5 @@
-/liveupdate
-/luo_multi_kexec
+liveupdate
+luo_multi_kexec
+luo_multi_file
+luo_multi_session
+luo_unreclaimed
diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
index 25a6dec790bb..fbcacbd1b798 100644
--- a/tools/testing/selftests/liveupdate/Makefile
+++ b/tools/testing/selftests/liveupdate/Makefile
@@ -1,10 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
-KHDR_INCLUDES ?= -I../../../usr/include
-CFLAGS += -Wall -O2 -Wno-unused-function
-CFLAGS += $(KHDR_INCLUDES)
-LDFLAGS += -static
-
 # --- Test Configuration (Edit this section when adding new tests) ---
 LUO_SHARED_SRCS := luo_test_utils.c
 LUO_SHARED_HDRS += luo_test_utils.h
@@ -25,6 +20,12 @@ TEST_GEN_PROGS := $(LUO_MAIN_TESTS)
 
 liveupdate_SOURCES := liveupdate.c $(LUO_SHARED_SRCS)
 
+include ../lib.mk
+
+CFLAGS += -Wall -O2 -Wno-unused-function
+CFLAGS += $(KHDR_INCLUDES)
+LDFLAGS += -static
+
 $(OUTPUT)/liveupdate: $(liveupdate_SOURCES) $(LUO_SHARED_HDRS)
 	$(call msg,LINK,,$@)
 	$(Q)$(LINK.c) $^ $(LDLIBS) -o $@
@@ -33,16 +34,16 @@ $(OUTPUT)/liveupdate: $(liveupdate_SOURCES) $(LUO_SHARED_HDRS)
 $(foreach test,$(LUO_MANUAL_TESTS), \
 	$(eval $(test)_SOURCES := $(test).c $(LUO_SHARED_SRCS)))
 
+define BUILD_RULE_TEMPLATE
+$(OUTPUT)/$(1): $($(1)_SOURCES) $(LUO_SHARED_HDRS)
+	$(call msg,LINK,,$$@)
+	$(Q)$(LINK.c) $$^ $(LDLIBS) -o $$@
+	$(Q)chmod +x $$@
+endef
 # This loop automatically generates an explicit build rule for each manual test.
 # It includes dependencies on the shared headers and makes the output
 # executable.
 # Note the use of '$$' to escape automatic variables for the 'eval' command.
 $(foreach test,$(LUO_MANUAL_TESTS), \
-	$(eval $(OUTPUT)/$(test): $($(test)_SOURCES) $(LUO_SHARED_HDRS) \
-		$(call msg,LINK,,$$@) ; \
-		$(Q)$(LINK.c) $$^ $(LDLIBS) -o $$@ ; \
-		$(Q)chmod +x $$@ \
-	) \
+	$(eval $(call BUILD_RULE_TEMPLATE,$(test))) \
 )
-
-include ../lib.mk

Create a liveupdate library (lib/libliveupdate.mk, lib/liveupdate_util.c) of core live update APIs which can be shared outside of the liveupdate selftests, for example with the VFIO selftests.
A shared library avoids the need for VFIO to define its own APIs to interact with the liveupdate ioctls.
No functional changes intended; this patch only moves a few functions to the library without changing their code.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 tools/testing/selftests/liveupdate/Makefile   |  6 +-
 .../liveupdate/lib/include/liveupdate_util.h  | 23 +++++++
 .../selftests/liveupdate/lib/libliveupdate.mk | 17 +++++
 .../liveupdate/lib/liveupdate_util.c          | 68 +++++++++++++++++++
 .../selftests/liveupdate/luo_test_utils.c     | 55 +--------------
 .../selftests/liveupdate/luo_test_utils.h     | 10 +--
 6 files changed, 114 insertions(+), 65 deletions(-)
 create mode 100644 tools/testing/selftests/liveupdate/lib/include/liveupdate_util.h
 create mode 100644 tools/testing/selftests/liveupdate/lib/libliveupdate.mk
 create mode 100644 tools/testing/selftests/liveupdate/lib/liveupdate_util.c

diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
index fbcacbd1b798..79d1c525f03c 100644
--- a/tools/testing/selftests/liveupdate/Makefile
+++ b/tools/testing/selftests/liveupdate/Makefile
@@ -26,7 +26,9 @@ CFLAGS += -Wall -O2 -Wno-unused-function
 CFLAGS += $(KHDR_INCLUDES)
 LDFLAGS += -static
 
-$(OUTPUT)/liveupdate: $(liveupdate_SOURCES) $(LUO_SHARED_HDRS)
+include lib/libliveupdate.mk
+
+$(OUTPUT)/liveupdate: $(liveupdate_SOURCES) $(LUO_SHARED_HDRS) $(LIBLIVEUPDATE_O)
 	$(call msg,LINK,,$@)
 	$(Q)$(LINK.c) $^ $(LDLIBS) -o $@
 
@@ -35,7 +37,7 @@ $(foreach test,$(LUO_MANUAL_TESTS), \
 	$(eval $(test)_SOURCES := $(test).c $(LUO_SHARED_SRCS)))
 
 define BUILD_RULE_TEMPLATE
-$(OUTPUT)/$(1): $($(1)_SOURCES) $(LUO_SHARED_HDRS)
+$(OUTPUT)/$(1): $($(1)_SOURCES) $(LUO_SHARED_HDRS) $(LIBLIVEUPDATE_O)
 	$(call msg,LINK,,$$@)
 	$(Q)$(LINK.c) $$^ $(LDLIBS) -o $$@
 	$(Q)chmod +x $$@
diff --git a/tools/testing/selftests/liveupdate/lib/include/liveupdate_util.h b/tools/testing/selftests/liveupdate/lib/include/liveupdate_util.h
new file mode 100644
index 000000000000..f938ce60edb7
--- /dev/null
+++ b/tools/testing/selftests/liveupdate/lib/include/liveupdate_util.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+#ifndef SELFTESTS_LIVEUPDATE_LIB_LIVEUPDATE_UTIL_H
+#define SELFTESTS_LIVEUPDATE_LIB_LIVEUPDATE_UTIL_H
+
+#include <linux/liveupdate.h>
+
+#define LUO_DEVICE "/dev/liveupdate"
+
+int luo_open_device(void);
+int luo_create_session(int luo_fd, const char *name);
+int luo_retrieve_session(int luo_fd, const char *name);
+
+int luo_set_session_event(int session_fd, enum liveupdate_event event);
+int luo_set_global_event(int luo_fd, enum liveupdate_event event);
+int luo_get_global_state(int luo_fd, enum liveupdate_state *state);
+
+#endif /* SELFTESTS_LIVEUPDATE_LIB_LIVEUPDATE_UTIL_H */
diff --git a/tools/testing/selftests/liveupdate/lib/libliveupdate.mk b/tools/testing/selftests/liveupdate/lib/libliveupdate.mk
new file mode 100644
index 000000000000..b3fc2580a7cf
--- /dev/null
+++ b/tools/testing/selftests/liveupdate/lib/libliveupdate.mk
@@ -0,0 +1,17 @@
+LIBLIVEUPDATE_SRCDIR := $(selfdir)/liveupdate/lib
+
+LIBLIVEUPDATE_C := liveupdate_util.c
+
+LIBLIVEUPDATE_OUTPUT := $(OUTPUT)/libliveupdate
+
+LIBLIVEUPDATE_O := $(patsubst %.c, $(LIBLIVEUPDATE_OUTPUT)/%.o, $(LIBLIVEUPDATE_C))
+
+LIBLIVEUPDATE_O_DIRS := $(shell dirname $(LIBLIVEUPDATE_O) | uniq)
+$(shell mkdir -p $(LIBLIVEUPDATE_O_DIRS))
+
+CFLAGS += -I$(LIBLIVEUPDATE_SRCDIR)/include
+
+$(LIBLIVEUPDATE_O): $(LIBLIVEUPDATE_OUTPUT)/%.o : $(LIBLIVEUPDATE_SRCDIR)/%.c
+	$(CC) $(CFLAGS) $(CPPFLAGS) -c $< -o $@
+
+EXTRA_CLEAN += $(LIBLIVEUPDATE_OUTPUT)
\ No newline at end of file
diff --git a/tools/testing/selftests/liveupdate/lib/liveupdate_util.c b/tools/testing/selftests/liveupdate/lib/liveupdate_util.c
new file mode 100644
index 000000000000..1e6fd9dd8fb9
--- /dev/null
+++ b/tools/testing/selftests/liveupdate/lib/liveupdate_util.c
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+#define _GNU_SOURCE
+
+#include <liveupdate_util.h>
+#include <linux/liveupdate.h>
+#include <errno.h>
+#include <stdio.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+
+int luo_open_device(void)
+{
+	return open(LUO_DEVICE, O_RDWR);
+}
+
+int luo_create_session(int luo_fd, const char *name)
+{
+	struct liveupdate_ioctl_create_session arg = { .size = sizeof(arg) };
+
+	snprintf((char *)arg.name, LIVEUPDATE_SESSION_NAME_LENGTH, "%.*s",
+		 LIVEUPDATE_SESSION_NAME_LENGTH - 1, name);
+	if (ioctl(luo_fd, LIVEUPDATE_IOCTL_CREATE_SESSION, &arg) < 0)
+		return -errno;
+	return arg.fd;
+}
+
+int luo_retrieve_session(int luo_fd, const char *name)
+{
+	struct liveupdate_ioctl_retrieve_session arg = { .size = sizeof(arg) };
+
+	snprintf((char *)arg.name, LIVEUPDATE_SESSION_NAME_LENGTH, "%.*s",
+		 LIVEUPDATE_SESSION_NAME_LENGTH - 1, name);
+	if (ioctl(luo_fd, LIVEUPDATE_IOCTL_RETRIEVE_SESSION, &arg) < 0)
+		return -errno;
+	return arg.fd;
+}
+
+int luo_set_session_event(int session_fd, enum liveupdate_event event)
+{
+	struct liveupdate_session_set_event arg = { .size = sizeof(arg) };
+
+	arg.event = event;
+	return ioctl(session_fd, LIVEUPDATE_SESSION_SET_EVENT, &arg);
+}
+
+int luo_set_global_event(int luo_fd, enum liveupdate_event event)
+{
+	struct liveupdate_ioctl_set_event arg = { .size = sizeof(arg) };
+
+	arg.event = event;
+	return ioctl(luo_fd, LIVEUPDATE_IOCTL_SET_EVENT, &arg);
+}
+
+int luo_get_global_state(int luo_fd, enum liveupdate_state *state)
+{
+	struct liveupdate_ioctl_get_state arg = { .size = sizeof(arg) };
+
+	if (ioctl(luo_fd, LIVEUPDATE_IOCTL_GET_STATE, &arg) < 0)
+		return -errno;
+	*state = arg.state;
+	return 0;
+}
diff --git a/tools/testing/selftests/liveupdate/luo_test_utils.c b/tools/testing/selftests/liveupdate/luo_test_utils.c
index c0840e6e66fd..0f5bc7260ccc 100644
--- a/tools/testing/selftests/liveupdate/luo_test_utils.c
+++ b/tools/testing/selftests/liveupdate/luo_test_utils.c
@@ -17,39 +17,12 @@
 #include <sys/mman.h>
 #include <errno.h>
 #include <stdarg.h>
-
+#include <liveupdate_util.h>
 #include "luo_test_utils.h"
 #include "../kselftest.h"
 
 /* The fail_exit function is now a macro in the header. */
 
-int luo_open_device(void)
-{
-	return open(LUO_DEVICE, O_RDWR);
-}
-
-int luo_create_session(int luo_fd, const char *name)
-{
-	struct liveupdate_ioctl_create_session arg = { .size = sizeof(arg) };
-
-	snprintf((char *)arg.name, LIVEUPDATE_SESSION_NAME_LENGTH, "%.*s",
-		 LIVEUPDATE_SESSION_NAME_LENGTH - 1, name);
-	if (ioctl(luo_fd, LIVEUPDATE_IOCTL_CREATE_SESSION, &arg) < 0)
-		return -errno;
-	return arg.fd;
-}
-
-int luo_retrieve_session(int luo_fd, const char *name)
-{
-	struct liveupdate_ioctl_retrieve_session arg = { .size = sizeof(arg) };
-
-	snprintf((char *)arg.name, LIVEUPDATE_SESSION_NAME_LENGTH, "%.*s",
-		 LIVEUPDATE_SESSION_NAME_LENGTH - 1, name);
-	if (ioctl(luo_fd, LIVEUPDATE_IOCTL_RETRIEVE_SESSION, &arg) < 0)
-		return -errno;
-	return arg.fd;
-}
-
 int create_and_preserve_memfd(int session_fd, int token, const char *data)
 {
 	struct liveupdate_session_preserve_fd arg = { .size = sizeof(arg) };
@@ -119,32 +92,6 @@ int restore_and_verify_memfd(int session_fd, int token,
 	return ret;
 }
 
-int luo_set_session_event(int session_fd, enum liveupdate_event event)
-{
-	struct liveupdate_session_set_event arg = { .size = sizeof(arg) };
-
-	arg.event = event;
-	return ioctl(session_fd, LIVEUPDATE_SESSION_SET_EVENT, &arg);
-}
-
-int luo_set_global_event(int luo_fd, enum liveupdate_event event)
-{
-	struct liveupdate_ioctl_set_event arg = { .size = sizeof(arg) };
-
-	arg.event = event;
-	return ioctl(luo_fd, LIVEUPDATE_IOCTL_SET_EVENT, &arg);
-}
-
-int luo_get_global_state(int luo_fd, enum liveupdate_state *state)
-{
-	struct liveupdate_ioctl_get_state arg = { .size = sizeof(arg) };
-
-	if (ioctl(luo_fd, LIVEUPDATE_IOCTL_GET_STATE, &arg) < 0)
-		return -errno;
-	*state = arg.state;
-	return 0;
-}
-
 void create_state_file(int luo_fd, int next_stage)
 {
 	char buf[32];
diff --git a/tools/testing/selftests/liveupdate/luo_test_utils.h b/tools/testing/selftests/liveupdate/luo_test_utils.h
index e30cfcb0a596..4d371b528a01 100644
--- a/tools/testing/selftests/liveupdate/luo_test_utils.h
+++ b/tools/testing/selftests/liveupdate/luo_test_utils.h
@@ -11,9 +11,9 @@
 #include <errno.h>
 #include <string.h>
 #include <linux/liveupdate.h>
+#include <liveupdate_util.h>
 #include "../kselftest.h"
 
-#define LUO_DEVICE "/dev/liveupdate"
 #define STATE_SESSION_NAME "state_session"
 #define STATE_MEMFD_TOKEN 999
 
@@ -30,19 +30,11 @@ struct session_info {
 	ksft_exit_fail_msg("[%s] " fmt " (errno: %s)\n", \
 			   __func__, ##__VA_ARGS__, strerror(errno))
 
-int luo_open_device(void);
-
-int luo_create_session(int luo_fd, const char *name);
-int luo_retrieve_session(int luo_fd, const char *name);
 
 int create_and_preserve_memfd(int session_fd, int token, const char *data);
 int restore_and_verify_memfd(int session_fd, int token, const char *expected_data);
 int verify_session_and_get_fd(int luo_fd, struct session_info *s);
 
-int luo_set_session_event(int session_fd, enum liveupdate_event event);
-int luo_set_global_event(int luo_fd, enum liveupdate_event event);
-int luo_get_global_state(int luo_fd, enum liveupdate_state *state);
-
 void create_state_file(int luo_fd, int next_stage);
 int restore_and_read_state(int luo_fd, int *stage);
 void update_state_file(int session_fd, int next_stage);

Move do_kexec.sh to the lib directory in the liveupdate selftest directory. Add code to libliveupdate.mk to copy the script to the generated libliveupdate directory during the build.
The script allows liveupdate library users to initiate kexec for liveupdate test flows.
This patch also adds a luo_session_preserve_fd() helper to the library.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 tools/testing/selftests/liveupdate/Makefile           |  2 --
 .../selftests/liveupdate/{ => lib}/do_kexec.sh        |  0
 .../liveupdate/lib/include/liveupdate_util.h          |  2 ++
 .../testing/selftests/liveupdate/lib/libliveupdate.mk |  1 +
 .../selftests/liveupdate/lib/liveupdate_util.c        | 11 +++++++++++
 tools/testing/selftests/liveupdate/luo_multi_file.c   |  2 --
 tools/testing/selftests/liveupdate/luo_multi_kexec.c  |  2 --
 .../testing/selftests/liveupdate/luo_multi_session.c  |  2 --
 tools/testing/selftests/liveupdate/luo_unreclaimed.c  |  1 -
 9 files changed, 14 insertions(+), 9 deletions(-)
 rename tools/testing/selftests/liveupdate/{ => lib}/do_kexec.sh (100%)

diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
index 79d1c525f03c..f203fd681afe 100644
--- a/tools/testing/selftests/liveupdate/Makefile
+++ b/tools/testing/selftests/liveupdate/Makefile
@@ -9,8 +9,6 @@ LUO_MANUAL_TESTS += luo_multi_kexec
 LUO_MANUAL_TESTS += luo_multi_session
 LUO_MANUAL_TESTS += luo_unreclaimed
 
-TEST_FILES += do_kexec.sh
-
 LUO_MAIN_TESTS += liveupdate
 
 # --- Automatic Rule Generation (Do not edit below) ---
diff --git a/tools/testing/selftests/liveupdate/do_kexec.sh b/tools/testing/selftests/liveupdate/lib/do_kexec.sh
similarity index 100%
rename from tools/testing/selftests/liveupdate/do_kexec.sh
rename to tools/testing/selftests/liveupdate/lib/do_kexec.sh
diff --git a/tools/testing/selftests/liveupdate/lib/include/liveupdate_util.h b/tools/testing/selftests/liveupdate/lib/include/liveupdate_util.h
index f938ce60edb7..6ee9e124a1a4 100644
--- a/tools/testing/selftests/liveupdate/lib/include/liveupdate_util.h
+++ b/tools/testing/selftests/liveupdate/lib/include/liveupdate_util.h
@@ -11,10 +11,12 @@
 #include <linux/liveupdate.h>
 
 #define LUO_DEVICE "/dev/liveupdate"
+#define KEXEC_SCRIPT "libliveupdate/do_kexec.sh"
 
 int luo_open_device(void);
 int luo_create_session(int luo_fd, const char *name);
 int luo_retrieve_session(int luo_fd, const char *name);
+int luo_session_preserve_fd(int session_fd, int fd, int token);
 
 int luo_set_session_event(int session_fd, enum liveupdate_event event);
 int luo_set_global_event(int luo_fd, enum liveupdate_event event);
diff --git a/tools/testing/selftests/liveupdate/lib/libliveupdate.mk b/tools/testing/selftests/liveupdate/lib/libliveupdate.mk
index b3fc2580a7cf..ddb9b1a4363b 100644
--- a/tools/testing/selftests/liveupdate/lib/libliveupdate.mk
+++ b/tools/testing/selftests/liveupdate/lib/libliveupdate.mk
@@ -8,6 +8,7 @@ LIBLIVEUPDATE_O := $(patsubst %.c, $(LIBLIVEUPDATE_OUTPUT)/%.o, $(LIBLIVEUPDATE_
 
 LIBLIVEUPDATE_O_DIRS := $(shell dirname $(LIBLIVEUPDATE_O) | uniq)
 $(shell mkdir -p $(LIBLIVEUPDATE_O_DIRS))
+$(shell cp -n $(LIBLIVEUPDATE_SRCDIR)/do_kexec.sh $(LIBLIVEUPDATE_OUTPUT))
 
 CFLAGS += -I$(LIBLIVEUPDATE_SRCDIR)/include
 
diff --git a/tools/testing/selftests/liveupdate/lib/liveupdate_util.c b/tools/testing/selftests/liveupdate/lib/liveupdate_util.c
index 1e6fd9dd8fb9..26fd6a7763a2 100644
--- a/tools/testing/selftests/liveupdate/lib/liveupdate_util.c
+++ b/tools/testing/selftests/liveupdate/lib/liveupdate_util.c
@@ -30,6 +30,17 @@ int luo_create_session(int luo_fd, const char *name)
 	return arg.fd;
 }
 
+int luo_session_preserve_fd(int session_fd, int fd, int token)
+{
+	struct liveupdate_session_preserve_fd arg = {
+		.size = sizeof(arg),
+		.fd = fd,
+		.token = token
+	};
+
+	return ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, &arg) < 0;
+}
+
 int luo_retrieve_session(int luo_fd, const char *name)
 {
 	struct liveupdate_ioctl_retrieve_session arg = { .size = sizeof(arg) };
diff --git a/tools/testing/selftests/liveupdate/luo_multi_file.c b/tools/testing/selftests/liveupdate/luo_multi_file.c
index ae38fe8aba4c..1a4f95046c75 100644
--- a/tools/testing/selftests/liveupdate/luo_multi_file.c
+++ b/tools/testing/selftests/liveupdate/luo_multi_file.c
@@ -7,8 +7,6 @@
 
 #include "luo_test_utils.h"
 
-#define KEXEC_SCRIPT "./do_kexec.sh"
-
 #define SESSION_NAME "multi_file_session"
 #define TOKEN_A 101
 #define TOKEN_B 102
diff --git a/tools/testing/selftests/liveupdate/luo_multi_kexec.c b/tools/testing/selftests/liveupdate/luo_multi_kexec.c
index 1f350990ee67..5cfecbc6d269 100644
--- a/tools/testing/selftests/liveupdate/luo_multi_kexec.c
+++ b/tools/testing/selftests/liveupdate/luo_multi_kexec.c
@@ -7,8 +7,6 @@
 
 #include "luo_test_utils.h"
 
-#define KEXEC_SCRIPT "./do_kexec.sh"
-
 #define NUM_SESSIONS 3
 
 /* Helper to set up one session and all its files */
diff --git a/tools/testing/selftests/liveupdate/luo_multi_session.c b/tools/testing/selftests/liveupdate/luo_multi_session.c
index 9ea96d7b997f..389d4b559cb3 100644
--- a/tools/testing/selftests/liveupdate/luo_multi_session.c
+++ b/tools/testing/selftests/liveupdate/luo_multi_session.c
@@ -8,8 +8,6 @@
 #include "luo_test_utils.h"
 #include "../kselftest.h"
 
-#define KEXEC_SCRIPT "./do_kexec.sh"
-
 #define NUM_SESSIONS 5
 #define FILES_PER_SESSION 5
 
diff --git a/tools/testing/selftests/liveupdate/luo_unreclaimed.c b/tools/testing/selftests/liveupdate/luo_unreclaimed.c
index c3921b21b97b..b31bb354bfc3 100644
--- a/tools/testing/selftests/liveupdate/luo_unreclaimed.c
+++ b/tools/testing/selftests/liveupdate/luo_unreclaimed.c
@@ -8,7 +8,6 @@
 #include "luo_test_utils.h"
 #include "../kselftest.h"
 
-#define KEXEC_SCRIPT "./do_kexec.sh"
 
 #define SESSION_NAME "unreclaimed_session"
 #define TOKEN_A 100

Move the liveupdate ioctl calls to the liveupdate library.
This provides a single place for LUO ioctl interactions and lets other selftests use them.
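One detail worth spelling out is the return convention of the library helpers: they return a negative errno value on failure instead of a bare -1 or a boolean, so callers can propagate the reason for the failure. A minimal standalone sketch of that convention follows, using open(2) in place of the LUO ioctls so that it builds without the LUO headers. The helper name open_or_neg_errno() is hypothetical and not part of the library.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the series: returns an FD >= 0 on
 * success or a negative errno value on failure.
 */
int open_or_neg_errno(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -errno;	/* read errno before any other call can change it */
	return fd;
}
```

A caller can then write `if (ret < 0) return ret;` and still report the original error, which a helper returning the result of `ioctl(...) < 0` cannot offer.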
Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 .../liveupdate/lib/include/liveupdate_util.h  |  2 ++
 .../liveupdate/lib/liveupdate_util.c          | 29 ++++++++++++++++++-
 .../selftests/liveupdate/luo_test_utils.c     | 18 ++++--------
 3 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/tools/testing/selftests/liveupdate/lib/include/liveupdate_util.h b/tools/testing/selftests/liveupdate/lib/include/liveupdate_util.h
index 6ee9e124a1a4..a5cb034f7692 100644
--- a/tools/testing/selftests/liveupdate/lib/include/liveupdate_util.h
+++ b/tools/testing/selftests/liveupdate/lib/include/liveupdate_util.h
@@ -17,6 +17,8 @@ int luo_open_device(void);
 int luo_create_session(int luo_fd, const char *name);
 int luo_retrieve_session(int luo_fd, const char *name);
 int luo_session_preserve_fd(int session_fd, int fd, int token);
+int luo_session_unpreserve_fd(int session_fd, int token);
+int luo_session_restore_fd(int session_fd, int token);
 
 int luo_set_session_event(int session_fd, enum liveupdate_event event);
 int luo_set_global_event(int luo_fd, enum liveupdate_event event);
diff --git a/tools/testing/selftests/liveupdate/lib/liveupdate_util.c b/tools/testing/selftests/liveupdate/lib/liveupdate_util.c
index 26fd6a7763a2..96c6c1b65043 100644
--- a/tools/testing/selftests/liveupdate/lib/liveupdate_util.c
+++ b/tools/testing/selftests/liveupdate/lib/liveupdate_util.c
@@ -38,7 +38,34 @@ int luo_session_preserve_fd(int session_fd, int fd, int token)
 		.token = token
 	};
 
-	return ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, &arg) < 0;
+	if (ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, &arg) < 0)
+		return -errno;
+	return 0;
+}
+
+int luo_session_unpreserve_fd(int session_fd, int token)
+{
+	struct liveupdate_session_unpreserve_fd arg = {
+		.size = sizeof(arg),
+		.token = token
+	};
+
+	if (ioctl(session_fd, LIVEUPDATE_SESSION_UNPRESERVE_FD, &arg) < 0)
+		return -errno;
+	return 0;
+}
+
+int luo_session_restore_fd(int session_fd, int token)
+{
+	struct liveupdate_session_restore_fd arg = {
+		.size = sizeof(arg),
+		.token = token
+	};
+
+	if (ioctl(session_fd, LIVEUPDATE_SESSION_RESTORE_FD, &arg) < 0)
+		return -errno;
+	return arg.fd;
+
 }
 
 int luo_retrieve_session(int luo_fd, const char *name)
diff --git a/tools/testing/selftests/liveupdate/luo_test_utils.c b/tools/testing/selftests/liveupdate/luo_test_utils.c
index 0f5bc7260ccc..b1f7b5c79c07 100644
--- a/tools/testing/selftests/liveupdate/luo_test_utils.c
+++ b/tools/testing/selftests/liveupdate/luo_test_utils.c
@@ -12,7 +12,6 @@
 #include <string.h>
 #include <fcntl.h>
 #include <unistd.h>
-#include <sys/ioctl.h>
 #include <sys/syscall.h>
 #include <sys/mman.h>
 #include <errno.h>
@@ -25,7 +24,6 @@
 
 int create_and_preserve_memfd(int session_fd, int token, const char *data)
 {
-	struct liveupdate_session_preserve_fd arg = { .size = sizeof(arg) };
 	long page_size = sysconf(_SC_PAGE_SIZE);
 	void *map = MAP_FAILED;
 	int mfd = -1, ret = -1;
@@ -44,9 +42,7 @@ int create_and_preserve_memfd(int session_fd, int token, const char *data)
 	snprintf(map, page_size, "%s", data);
 	munmap(map, page_size);
 
-	arg.fd = mfd;
-	arg.token = token;
-	if (ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, &arg) < 0)
+	if (luo_session_preserve_fd(session_fd, mfd, token))
 		goto out;
 
 	ret = 0; /* Success */
@@ -61,15 +57,13 @@ int create_and_preserve_memfd(int session_fd, int token, const char *data)
 int restore_and_verify_memfd(int session_fd, int token,
 			     const char *expected_data)
 {
-	struct liveupdate_session_restore_fd arg = { .size = sizeof(arg) };
 	long page_size = sysconf(_SC_PAGE_SIZE);
 	void *map = MAP_FAILED;
 	int mfd = -1, ret = -1;
 
-	arg.token = token;
-	if (ioctl(session_fd, LIVEUPDATE_SESSION_RESTORE_FD, &arg) < 0)
-		return -errno;
-	mfd = arg.fd;
+	mfd = luo_session_restore_fd(session_fd, token);
+	if (mfd < 0)
+		return mfd;
 
 	map = mmap(NULL, page_size, PROT_READ, MAP_SHARED, mfd, 0);
 	if (map == MAP_FAILED)
@@ -134,10 +128,8 @@ int restore_and_read_state(int luo_fd, int *stage)
 void update_state_file(int session_fd, int next_stage)
 {
 	char buf[32];
-	struct liveupdate_session_unpreserve_fd arg = { .size = sizeof(arg) };
 
-	arg.token = STATE_MEMFD_TOKEN;
-	if (ioctl(session_fd, LIVEUPDATE_SESSION_UNPRESERVE_FD, &arg) < 0)
+	if (luo_session_unpreserve_fd(session_fd, STATE_MEMFD_TOKEN))
 		fail_exit("unpreserve failed");
 
 	snprintf(buf, sizeof(buf), "%d", next_stage);

Register a VFIO live update file handler with the Live Update Orchestrator. Provide stub implementations of the handler callbacks.
Adding live update support to VFIO will enable a VFIO PCI device to keep working uninterrupted while the host kernel is updated through a kexec reboot.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 drivers/vfio/pci/Makefile              |  1 +
 drivers/vfio/pci/vfio_pci_core.c       |  1 +
 drivers/vfio/pci/vfio_pci_liveupdate.c | 44 ++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_priv.h       |  6 ++++
 4 files changed, 52 insertions(+)
 create mode 100644 drivers/vfio/pci/vfio_pci_liveupdate.c
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile index cf00c0a7e55c..929df22c079b 100644 --- a/drivers/vfio/pci/Makefile +++ b/drivers/vfio/pci/Makefile @@ -2,6 +2,7 @@
vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o vfio-pci-core-$(CONFIG_VFIO_PCI_ZDEV_KVM) += vfio_pci_zdev.o +vfio-pci-core-$(CONFIG_LIVEUPDATE) += vfio_pci_liveupdate.o obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
vfio-pci-y := vfio_pci.o diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c index 7dcf5439dedc..0894673a9262 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -2568,6 +2568,7 @@ static void vfio_pci_core_cleanup(void) static int __init vfio_pci_core_init(void) { /* Allocate shared config space permission data used by all devices */ + vfio_pci_liveupdate_init(); return vfio_pci_init_perm_bits(); }
diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c new file mode 100644 index 000000000000..088f7698a72c --- /dev/null +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c @@ -0,0 +1,44 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Liveupdate support for VFIO devices. + * + * Copyright (c) 2025, Google LLC. + * Vipin Sharma vipinsh@google.com + */ + +#include <linux/liveupdate.h> +#include <linux/errno.h> + +#include "vfio_pci_priv.h" + +static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_handler *handler, + u64 data, struct file **file) +{ + return -EOPNOTSUPP; +} + +static bool vfio_pci_liveupdate_can_preserve(struct liveupdate_file_handler *handler, + struct file *file) +{ + return -EOPNOTSUPP; +} + +static const struct liveupdate_file_ops vfio_pci_luo_fops = { + .retrieve = vfio_pci_liveupdate_retrieve, + .can_preserve = vfio_pci_liveupdate_can_preserve, + .owner = THIS_MODULE, +}; + +static struct liveupdate_file_handler vfio_pci_luo_handler = { + .ops = &vfio_pci_luo_fops, + .compatible = "vfio-v1", +}; + +void __init vfio_pci_liveupdate_init(void) +{ + int err = liveupdate_register_file_handler(&vfio_pci_luo_handler); + + if (err) + pr_err("VFIO PCI liveupdate file handler register failed, error %d.\n", err); +} diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h index a9972eacb293..7779fd744ff5 100644 --- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -107,4 +107,10 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev) return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA; }
+#ifdef CONFIG_LIVEUPDATE +void vfio_pci_liveupdate_init(void); +#else +static inline void vfio_pci_liveupdate_init(void) { } +#endif /* CONFIG_LIVEUPDATE */ + #endif
Return true from the can_preserve() callback of the live update file handler if VFIO can preserve the passed VFIO cdev file. Return -EOPNOTSUPP from the prepare() callback for now, to fail any attempt to preserve a VFIO cdev during live update.
The VFIO cdev opened check ensures that the file is actually a VFIO cdev file and not a VFIO device FD obtained from a VFIO group.
Returning true from can_preserve() tells the Live Update Orchestrator that VFIO can try to preserve the given file during live update. The actual preservation logic will be added in later patches, so for now the prepare() call fails.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 drivers/vfio/pci/vfio_pci_liveupdate.c | 16 +++++++++++++++-
 drivers/vfio/vfio_main.c               |  3 ++-
 include/linux/vfio.h                   |  2 ++
 3 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c index 088f7698a72c..2ce2c11cb51c 100644 --- a/drivers/vfio/pci/vfio_pci_liveupdate.c +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c @@ -8,10 +8,17 @@ */
#include <linux/liveupdate.h> +#include <linux/vfio.h> #include <linux/errno.h>
#include "vfio_pci_priv.h"
+static int vfio_pci_liveupdate_prepare(struct liveupdate_file_handler *handler, + struct file *file, u64 *data) +{ + return -EOPNOTSUPP; +} + static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_handler *handler, u64 data, struct file **file) { @@ -21,10 +28,17 @@ static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_handler *handler, static bool vfio_pci_liveupdate_can_preserve(struct liveupdate_file_handler *handler, struct file *file) { - return -EOPNOTSUPP; + struct vfio_device *device = vfio_device_from_file(file); + + if (!device) + return false; + + guard(mutex)(&device->dev_set->lock); + return vfio_device_cdev_opened(device); }
static const struct liveupdate_file_ops vfio_pci_luo_fops = { + .prepare = vfio_pci_liveupdate_prepare, .retrieve = vfio_pci_liveupdate_retrieve, .can_preserve = vfio_pci_liveupdate_can_preserve, .owner = THIS_MODULE, diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 38c8e9350a60..4cb47c1564f4 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -1386,7 +1386,7 @@ const struct file_operations vfio_device_fops = { #endif };
-static struct vfio_device *vfio_device_from_file(struct file *file) +struct vfio_device *vfio_device_from_file(struct file *file) { struct vfio_device_file *df = file->private_data;
@@ -1394,6 +1394,7 @@ static struct vfio_device *vfio_device_from_file(struct file *file) return NULL; return df->device; } +EXPORT_SYMBOL_GPL(vfio_device_from_file);
/** * vfio_file_is_valid - True if the file is valid vfio file diff --git a/include/linux/vfio.h b/include/linux/vfio.h index eb563f538dee..2443d24aa237 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -385,4 +385,6 @@ int vfio_virqfd_enable(void *opaque, int (*handler)(void *, void *), void vfio_virqfd_disable(struct virqfd **pvirqfd); void vfio_virqfd_flush_thread(struct virqfd **pvirqfd);
+struct vfio_device *vfio_device_from_file(struct file *file); + #endif /* VFIO_H */
Create a struct to serialize VFIO PCI data and preserve it using KHO. Provide the physical address of the folio to the Live Update Orchestrator (LUO) in the prepare() callback so that LUO can hand it back after kexec. Unpreserve and free the folio in the cancel() callback.
Store the PCI BDF value in the serialized data. The BDF value is unique for each device on a host and remains the same unless hardware or firmware is changed.
Preserving the BDF value allows VFIO to find the PCI device that LUO wants to restore in the retrieve() callback after kexec. In future patches, more meaningful data will be serialized to actually preserve the working state of the device.
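As a rough illustration of what ends up in the 16-bit `bdf` field: `pci_dev_id()` combines the bus number and devfn as `(bus << 8) | devfn`, where devfn holds a 5-bit slot and a 3-bit function. The userspace sketch below mirrors that layout. The helper names (`bdf_encode`, `bdf_bus`, `bdf_slot`, `bdf_func`) are made up for this example and are not kernel APIs. Note that this 16-bit value carries no PCI segment (domain) number.

```c
#include <assert.h>
#include <stdint.h>

/* Pack bus/slot/function the way PCI_DEVID(bus, devfn) lays them out. */
static inline uint16_t bdf_encode(uint8_t bus, uint8_t slot, uint8_t func)
{
	assert(slot < 32 && func < 8);	/* 5-bit slot, 3-bit function */
	return (uint16_t)((bus << 8) | (slot << 3) | func);
}

/* Unpack the individual fields again (hypothetical helpers). */
static inline uint8_t bdf_bus(uint16_t bdf)  { return (uint8_t)(bdf >> 8); }
static inline uint8_t bdf_slot(uint16_t bdf) { return (bdf >> 3) & 0x1f; }
static inline uint8_t bdf_func(uint16_t bdf) { return bdf & 0x07; }
```

For example, a device shown by lspci as 3b:00.1 would be stored as 0x3b01 under this encoding.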
Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 drivers/vfio/pci/vfio_pci_liveupdate.c | 54 +++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c index 2ce2c11cb51c..3eb4895ce475 100644 --- a/drivers/vfio/pci/vfio_pci_liveupdate.c +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c @@ -10,13 +10,64 @@ #include <linux/liveupdate.h> #include <linux/vfio.h> #include <linux/errno.h> +#include <linux/kexec_handover.h>
#include "vfio_pci_priv.h"
+struct vfio_pci_core_device_ser { + u16 bdf; +} __packed; + +static int vfio_pci_lu_serialize(struct vfio_pci_core_device *vdev, + struct vfio_pci_core_device_ser *ser) +{ + ser->bdf = pci_dev_id(vdev->pdev); + return 0; +} + static int vfio_pci_liveupdate_prepare(struct liveupdate_file_handler *handler, struct file *file, u64 *data) { - return -EOPNOTSUPP; + struct vfio_pci_core_device_ser *ser; + struct vfio_pci_core_device *vdev; + struct vfio_device *device; + struct folio *folio; + int err; + + device = vfio_device_from_file(file); + vdev = container_of(device, struct vfio_pci_core_device, vdev); + + folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, get_order(sizeof(*ser))); + if (!folio) + return -ENOMEM; + + ser = folio_address(folio); + + err = vfio_pci_lu_serialize(vdev, ser); + if (err) + goto err_free_folio; + + err = kho_preserve_folio(folio); + if (err) + goto err_free_folio; + + *data = virt_to_phys(ser); + + return 0; + +err_free_folio: + folio_put(folio); + return err; +} + +static void vfio_pci_liveupdate_cancel(struct liveupdate_file_handler *handler, + struct file *file, u64 data) +{ + struct vfio_pci_core_device_ser *ser = phys_to_virt(data); + struct folio *folio = virt_to_folio(ser); + + WARN_ON_ONCE(kho_unpreserve_folio(folio)); + folio_put(folio); }
static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_handler *handler, @@ -39,6 +90,7 @@ static bool vfio_pci_liveupdate_can_preserve(struct liveupdate_file_handler *han
static const struct liveupdate_file_ops vfio_pci_luo_fops = { .prepare = vfio_pci_liveupdate_prepare, + .cancel = vfio_pci_liveupdate_cancel, .retrieve = vfio_pci_liveupdate_retrieve, .can_preserve = vfio_pci_liveupdate_can_preserve, .owner = THIS_MODULE,
Retrieve the VFIO device in the retrieve() callback of the LUO file handler. Deserialize the KHO data and search the VFIO cdev class for the device matching the BDF. Export the needed functions from the core VFIO module.
Create an anonymous inode and file struct for the device. This is similar to how a VFIO group returns a VFIO device FD, and differs from the VFIO cdev path, where the cdev device is connected to an inode and file on devtmpfs.
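The lookup described above follows the usual "find with a match callback" pattern: walk the registered devices, return the first one the callback accepts, and hand it back with a reference held that the caller must drop. A minimal userspace sketch of that pattern follows. All `toy_*` names are hypothetical stand-ins, and a plain array replaces the kernel's device class.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_dev {
	uint16_t bdf;
	int refcount;
};

typedef int (*toy_match_t)(const struct toy_dev *dev, const void *data);

/* Match callback: compare the saved BDF against the device's BDF. */
static int toy_match_bdf(const struct toy_dev *dev, const void *data)
{
	return *(const uint16_t *)data == dev->bdf;
}

/*
 * Return the first device accepted by @match with a reference taken,
 * or NULL if nothing matches. The caller drops the reference when done.
 */
static struct toy_dev *toy_find_device(struct toy_dev *devs, size_t n,
				       const void *data, toy_match_t match)
{
	assert(match);

	for (size_t i = 0; i < n; i++) {
		if (match(&devs[i], data)) {
			devs[i].refcount++;
			return &devs[i];
		}
	}
	return NULL;
}
```

The reference taken by the lookup is why the retrieve() path in this patch ends with a put_device() on both the success and error paths.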
Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 drivers/vfio/pci/vfio_pci_liveupdate.c | 67 +++++++++++++++++++++++++-
 drivers/vfio/vfio_main.c               | 17 +++++++
 include/linux/vfio.h                   |  6 +++
 3 files changed, 89 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c index 3eb4895ce475..cb3ff097afbf 100644 --- a/drivers/vfio/pci/vfio_pci_liveupdate.c +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c @@ -10,7 +10,9 @@ #include <linux/liveupdate.h> #include <linux/vfio.h> #include <linux/errno.h> +#include <linux/anon_inodes.h> #include <linux/kexec_handover.h> +#include <linux/file.h>
#include "vfio_pci_priv.h"
@@ -70,10 +72,73 @@ static void vfio_pci_liveupdate_cancel(struct liveupdate_file_handler *handler, folio_put(folio); }
+static int match_bdf(struct device *device, const void *bdf) +{ + struct vfio_device *core_vdev = + container_of(device, struct vfio_device, device); + struct vfio_pci_core_device *vdev = + container_of(core_vdev, struct vfio_pci_core_device, vdev); + + return *(u16 *)bdf == pci_dev_id(vdev->pdev); +} + static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_handler *handler, u64 data, struct file **file) { - return -EOPNOTSUPP; + struct vfio_pci_core_device_ser *ser; + struct vfio_device_file *df; + struct vfio_device *device; + struct folio *folio; + struct file *filep; + int err; + + folio = kho_restore_folio(data); + if (!folio) + return -ENOENT; + + ser = folio_address(folio); + device = vfio_find_device_in_cdev_class(&ser->bdf, match_bdf); + if (!device) + return -ENODEV; + + df = vfio_allocate_device_file(device); + if (IS_ERR(df)) { + err = PTR_ERR(df); + goto err_vfio_device_file; + } + + filep = anon_inode_getfile_fmode("[vfio-cdev]", &vfio_device_fops, df, + O_RDWR, FMODE_PREAD | FMODE_PWRITE); + if (IS_ERR(filep)) { + err = PTR_ERR(filep); + goto err_anon_inode; + } + + /* Paired with the put in vfio_device_fops_release() */ + if (!vfio_device_try_get_registration(device)) { + err = -ENODEV; + goto err_get_registration; + } + + put_device(&device->device); + + /* + * Use the pseudo fs inode on the device to link all mmaps + * to the same address space, allowing us to unmap all vmas + * associated to this device using unmap_mapping_range(). + */ + filep->f_mapping = device->inode->i_mapping; + *file = filep; + + return 0; + +err_get_registration: + fput(filep); +err_anon_inode: + kfree(df); +err_vfio_device_file: + put_device(&device->device); + return err; }
static bool vfio_pci_liveupdate_can_preserve(struct liveupdate_file_handler *handler, diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 4cb47c1564f4..90ecb3544f79 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -13,6 +13,7 @@ #include <linux/cdev.h> #include <linux/compat.h> #include <linux/device.h> +#include <linux/device/class.h> #include <linux/fs.h> #include <linux/idr.h> #include <linux/iommu.h> @@ -177,6 +178,7 @@ bool vfio_device_try_get_registration(struct vfio_device *device) { return refcount_inc_not_zero(&device->refcount); } +EXPORT_SYMBOL_GPL(vfio_device_try_get_registration);
/* * VFIO driver API @@ -502,6 +504,7 @@ vfio_allocate_device_file(struct vfio_device *device)
return df; } +EXPORT_SYMBOL_GPL(vfio_allocate_device_file);
static int vfio_df_device_first_open(struct vfio_device_file *df) { @@ -1385,6 +1388,7 @@ const struct file_operations vfio_device_fops = { .show_fdinfo = vfio_device_show_fdinfo, #endif }; +EXPORT_SYMBOL_GPL(vfio_device_fops);
struct vfio_device *vfio_device_from_file(struct file *file) { @@ -1716,6 +1720,19 @@ int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova, void *data, } EXPORT_SYMBOL(vfio_dma_rw);
+struct vfio_device *vfio_find_device_in_cdev_class(const void *data, + device_match_t match) +{ + struct device *device = class_find_device(vfio.device_class, NULL, data, + match); + + if (!device) + return NULL; + + return container_of(device, struct vfio_device, device); +} +EXPORT_SYMBOL_GPL(vfio_find_device_in_cdev_class); + /* * Module/class support */ diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 2443d24aa237..f98802facb24 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -386,5 +386,11 @@ void vfio_virqfd_disable(struct virqfd **pvirqfd); void vfio_virqfd_flush_thread(struct virqfd **pvirqfd);
struct vfio_device *vfio_device_from_file(struct file *file); +struct vfio_device *vfio_find_device_in_cdev_class(const void *data, + device_match_t match); +bool vfio_device_try_get_registration(struct vfio_device *device); +struct vfio_device_file *vfio_allocate_device_file(struct vfio_device *device); + +extern const struct file_operations vfio_device_fops;
#endif /* VFIO_H */
Add a finish() callback implementation to the LUO file handler to free the restored folio. Reset the VFIO device if it is not reclaimed by userspace.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 drivers/vfio/pci/vfio_pci_liveupdate.c | 33 ++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c index cb3ff097afbf..8e0ee01127b3 100644 --- a/drivers/vfio/pci/vfio_pci_liveupdate.c +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c @@ -82,6 +82,38 @@ static int match_bdf(struct device *device, const void *bdf) return *(u16 *)bdf == pci_dev_id(vdev->pdev); }
+static void vfio_pci_liveupdate_finish(struct liveupdate_file_handler *handler, + struct file *file, u64 data, bool reclaimed) +{ + struct vfio_pci_core_device_ser *ser; + struct vfio_pci_core_device *vdev; + struct vfio_device *device; + struct folio *folio; + + if (reclaimed) { + folio = virt_to_folio(phys_to_virt(data)); + goto out_folio_put; + } else { + folio = kho_restore_folio(data); + } + + if (!folio) + return; + + ser = folio_address(folio); + + device = vfio_find_device_in_cdev_class(&ser->bdf, match_bdf); + if (!device) + goto out_folio_put; + + vdev = container_of(device, struct vfio_pci_core_device, vdev); + pci_try_reset_function(vdev->pdev); + put_device(&device->device); + +out_folio_put: + folio_put(folio); +} + static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_handler *handler, u64 data, struct file **file) { @@ -156,6 +188,7 @@ static bool vfio_pci_liveupdate_can_preserve(struct liveupdate_file_handler *han static const struct liveupdate_file_ops vfio_pci_luo_fops = { .prepare = vfio_pci_liveupdate_prepare, .cancel = vfio_pci_liveupdate_cancel, + .finish = vfio_pci_liveupdate_finish, .retrieve = vfio_pci_liveupdate_retrieve, .can_preserve = vfio_pci_liveupdate_can_preserve, .owner = THIS_MODULE,
Add the bit field 'skip_kexec_clear_master' to struct pci_dev{}. When it is set, skip clearing the Bus Master Enable bit on the PCI device during kexec reboot.
Devices preserved using live update might be performing a DMA transaction during kexec. Skipping clearing this bit allows a device to continue DMA while live update is in progress.
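To make the resulting shutdown condition concrete, here is a small userspace model of the check. It is only a sketch: the `toy_*` types and the function are invented for illustration, with the Command register reduced to a plain 16-bit field. Bus Master Enable is bit 2 (0x4) of the PCI Command register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bus Master Enable is bit 2 of the PCI Command register. */
#define TOY_PCI_COMMAND_MASTER 0x4u

/* Same ordering as the kernel's D0..D3cold and unknown power states. */
enum toy_pci_power {
	TOY_PCI_D0, TOY_PCI_D1, TOY_PCI_D2,
	TOY_PCI_D3HOT, TOY_PCI_D3COLD, TOY_PCI_UNKNOWN,
};

struct toy_pci_dev {
	uint16_t command;			/* modelled Command register */
	enum toy_pci_power current_state;
	bool skip_kexec_clear_master;
};

/* Clear Bus Master only on kexec, for reachable devices that did not opt out. */
static void toy_pci_device_shutdown(struct toy_pci_dev *dev, bool kexec_in_progress)
{
	assert(dev);

	if (kexec_in_progress && dev->current_state <= TOY_PCI_D3HOT &&
	    !dev->skip_kexec_clear_master)
		dev->command &= (uint16_t)~TOY_PCI_COMMAND_MASTER;
}
```

With this predicate, a preserved device keeps its Command register untouched across the kexec shutdown path, while every other device behaves exactly as before.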
Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 drivers/pci/pci-driver.c | 6 ++++--
 include/linux/pci.h      | 2 ++
 2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 302d61783f6c..6aab358dc27a 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -513,11 +513,13 @@ static void pci_device_shutdown(struct device *dev) /* * If this is a kexec reboot, turn off Bus Master bit on the * device to tell it to not continue to do DMA. Don't touch - * devices in D3cold or unknown states. + * devices in D3cold or unknown states. Don't clear the bit + * if device has explicitly asked to skip it. * If it is not a kexec reboot, firmware will hit the PCI * devices with big hammer and stop their DMA any way. */ - if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot)) + if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot) && + !pci_dev->skip_kexec_clear_master) pci_clear_master(pci_dev); }
diff --git a/include/linux/pci.h b/include/linux/pci.h index d1fdf81fbe1e..8ce2d4528193 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -400,6 +400,8 @@ struct pci_dev { decoding during BAR sizing */ unsigned int wakeup_prepared:1; unsigned int skip_bus_pm:1; /* Internal: Skip bus-level PM */ + unsigned int skip_kexec_clear_master:1; /* Don't clear the Bus Master + Enable bit on kexec reboot */ unsigned int ignore_hotplug:1; /* Ignore hotplug events */ unsigned int hotplug_user_indicators:1; /* SlotCtl indicators controlled exclusively by
Set skip_kexec_clear_master on live update prepare() so that the device participating in live update can continue to perform DMA during kexec phase.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 drivers/vfio/pci/vfio_pci_liveupdate.c | 6 ++++++
 1 file changed, 6 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c index 8e0ee01127b3..789b52665e35 100644 --- a/drivers/vfio/pci/vfio_pci_liveupdate.c +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c @@ -54,6 +54,7 @@ static int vfio_pci_liveupdate_prepare(struct liveupdate_file_handler *handler, goto err_free_folio;
*data = virt_to_phys(ser); + vdev->pdev->skip_kexec_clear_master = true;
return 0;
@@ -67,7 +68,12 @@ static void vfio_pci_liveupdate_cancel(struct liveupdate_file_handler *handler, { struct vfio_pci_core_device_ser *ser = phys_to_virt(data); struct folio *folio = virt_to_folio(ser); + struct vfio_pci_core_device *vdev; + struct vfio_device *device;
+ device = vfio_device_from_file(file); + vdev = container_of(device, struct vfio_pci_core_device, vdev); + vdev->pdev->skip_kexec_clear_master = false; WARN_ON_ONCE(kho_unpreserve_folio(folio)); folio_put(folio); }
On Fri, Oct 17, 2025 at 05:07:03PM -0700, Vipin Sharma wrote:
Set skip_kexec_clear_master on live update prepare() so that the device participating in live update can continue to perform DMA during kexec phase.
Instead of introducing the skip_kexec_clear_master flag, could you introduce a function to check whether a device participates in live update and call that in pci_device_shutdown()?
I think that would be cleaner. Otherwise someone reading the code has to chase down the meaning of skip_kexec_clear_master, i.e. search for places where the bit is set.
When the device is unbound from vfio-pci, don't you have to clear the skip_kexec_clear_master flag? I'm not seeing this in your patches but maybe I'm missing something. That problem would solve itself if you follow the suggestion above.
Thanks,
Lukas
On 2025-10-18 09:09:06, Lukas Wunner wrote:
On Fri, Oct 17, 2025 at 05:07:03PM -0700, Vipin Sharma wrote:
Set skip_kexec_clear_master on live update prepare() so that the device participating in live update can continue to perform DMA during kexec phase.
Instead of introducing the skip_kexec_clear_master flag, could you introduce a function to check whether a device participates in live update and call that in pci_device_shutdown()?
I think that would be cleaner. Otherwise someone reading the code has to chase down the meaning of skip_kexec_clear_master, i.e. search for places where the bit is set.
That is one way to do it. In our internal implementation we have an API which checks for the device participation in the live update, similar to what you have suggested.
The PCI series posted by Chris [1] provides a different way to learn whether a device participates in live update. There, pci_dev has a new struct which contains the participation information.
In this VFIO series, my intention is to make minimal changes to PCI or any other subsystem. I opted for a simple variable to check what device should do during kexec reboot.
My hunch is that we will end up needing some state information in the struct pci_dev{} which denotes device participation and whatever that ends up being, we can use that here.
[1] https://lore.kernel.org/linux-pci/20250916-luo-pci-v2-0-c494053c3c08@kernel....
When the device is unbound from vfio-pci, don't you have to clear the skip_kexec_clear_master flag? I'm not seeing this in your patches but maybe I'm missing something. That problem would solve itself if you follow the suggestion above.
The VFIO subsystem blocks removal from vfio-pci if there is still a reference to the device (references are increased/decreased when the device is opened/closed; see vfio_unregister_group_dev()). LUO also does an fget() on the VFIO FD, which means we will not get the close callback on the VFIO FD until that reference is dropped, in addition to userspace closing its own file.
So, prior to kexec, LUO will drop its reference only if a live update cancel happens, and that is when this patch series resets the flag.
Store the restored serialized data in struct vfio_pci_core_device{}. Skip clearing the bus master bit on the restored VFIO devices when opened for the first time after live update reboot.
In the live update finish, clean up the pointer to the restored KHO data. Warn if the device open count is 0, which indicates that userspace might not have opened and restored the device.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 drivers/vfio/pci/vfio_pci_core.c       |  8 ++++++--
 drivers/vfio/pci/vfio_pci_liveupdate.c | 19 ++++++++++++++-----
 include/linux/vfio_pci_core.h          |  1 +
 3 files changed, 21 insertions(+), 7 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c index 0894673a9262..29236b015242 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -475,8 +475,12 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev) return ret; }
- /* Don't allow our initial saved state to include busmaster */ - pci_clear_master(pdev); + /* + * Don't allow our initial saved state to include busmaster. However, if + * device is participating in liveupdate then don't change this bit. + */ + if (!vdev->liveupdate_restore) + pci_clear_master(pdev);
ret = pci_enable_device(pdev); if (ret) diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c index 789b52665e35..6cc94d9a0386 100644 --- a/drivers/vfio/pci/vfio_pci_liveupdate.c +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c @@ -96,12 +96,10 @@ static void vfio_pci_liveupdate_finish(struct liveupdate_file_handler *handler, struct vfio_device *device; struct folio *folio;
- if (reclaimed) { + if (reclaimed) folio = virt_to_folio(phys_to_virt(data)); - goto out_folio_put; - } else { + else folio = kho_restore_folio(data); - }
if (!folio) return; @@ -113,7 +111,14 @@ static void vfio_pci_liveupdate_finish(struct liveupdate_file_handler *handler, goto out_folio_put;
vdev = container_of(device, struct vfio_pci_core_device, vdev); - pci_try_reset_function(vdev->pdev); + if (reclaimed) { + guard(mutex)(&device->dev_set->lock); + if (!vfio_device_cdev_opened(device)) + pci_err(vdev->pdev, "Open count is 0, userspace might not have restored the device.\n"); + vdev->liveupdate_restore = NULL; + } else { + pci_try_reset_function(vdev->pdev); + } put_device(&device->device);
out_folio_put: @@ -124,6 +129,7 @@ static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_handler *handler, u64 data, struct file **file) { struct vfio_pci_core_device_ser *ser; + struct vfio_pci_core_device *vdev; struct vfio_device_file *df; struct vfio_device *device; struct folio *folio; @@ -167,6 +173,9 @@ static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_handler *handler, */ filep->f_mapping = device->inode->i_mapping; *file = filep; + vdev = container_of(device, struct vfio_pci_core_device, vdev); + guard(mutex)(&device->dev_set->lock); + vdev->liveupdate_restore = ser;
return 0;
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index f541044e42a2..8c3fe2db7eb3 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -94,6 +94,7 @@ struct vfio_pci_core_device { struct vfio_pci_core_device *sriov_pf_core_dev; struct notifier_block nb; struct rw_semaphore memory_lock; + void *liveupdate_restore; };
/* Will be exported for vfio pci drivers usage */
Save and restore the vconfig, pci_config_map, and rbar members of struct vfio_pci_core_device{} during live update. Use the maximum size of PCI config space, i.e. 4096 bytes, for storing vconfig and pci_config_map irrespective of the exact size. Also store the current config size from struct pci_dev{} to know how much actual data is present in vconfig and pci_config_map.
vconfig represents the virtual PCI config space used by VFIO to virtualize certain bits of the config space of the PCI device. It should be preserved because those virtualized bits cannot be recovered by reading the hardware.
pci_config_map is used to identify the starting point of a capability. It does not strictly need to be preserved and can be recreated after kexec, but saving it in KHO reduces the code change. Currently, pci_config_map is populated in the same code where vconfig gets initialized. If pci_config_map is not saved, a separate flow needs to be added just for populating pci_config_map.
rbar is used to restore BARs after a reset. This value needs to be preserved as reset will lose this information.
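The save/restore scheme above (fixed 4096-byte slots plus a recorded cfg_size that must match on restore) can be sketched in userspace as follows. This is an illustrative model, not the patch code: the `toy_cfg_*` names are invented, and the sketch leaves out `__packed`. For reference, the packed fields in this patch add up to 2 + 4 + 4096 + 4096 + 28 = 8226 bytes, so with 4 KiB pages the folio allocation should round up to order 2 (16 KiB).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define TOY_CFG_SPACE_EXP_SIZE 4096	/* maximum PCI config space size */

/* Serialized form: fixed-size slots, only the first cfg_size bytes are valid. */
struct toy_cfg_ser {
	uint16_t bdf;
	uint32_t cfg_size;
	uint8_t pci_config_map[TOY_CFG_SPACE_EXP_SIZE];
	uint8_t vconfig[TOY_CFG_SPACE_EXP_SIZE];
	uint32_t rbar[7];
};

/* Live device state: buffers sized to the device's actual config space. */
struct toy_cfg_vdev {
	uint32_t cfg_size;
	uint8_t *pci_config_map;
	uint8_t *vconfig;
	uint32_t rbar[7];
};

static void toy_cfg_serialize(const struct toy_cfg_vdev *v, struct toy_cfg_ser *s)
{
	assert(v->cfg_size <= TOY_CFG_SPACE_EXP_SIZE);

	s->cfg_size = v->cfg_size;
	memcpy(s->pci_config_map, v->pci_config_map, v->cfg_size);
	memcpy(s->vconfig, v->vconfig, v->cfg_size);
	memcpy(s->rbar, v->rbar, sizeof(v->rbar));
}

/* Refuse to restore if the config size changed across the update. */
static int toy_cfg_deserialize(struct toy_cfg_vdev *v, const struct toy_cfg_ser *s)
{
	if (s->cfg_size != v->cfg_size)
		return -EINVAL;

	memcpy(v->pci_config_map, s->pci_config_map, s->cfg_size);
	memcpy(v->vconfig, s->vconfig, s->cfg_size);
	memcpy(v->rbar, s->rbar, sizeof(v->rbar));
	return 0;
}
```

Checking cfg_size before copying keeps the restore from writing past buffers that the new kernel sized from its own view of the device.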
Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 drivers/vfio/pci/vfio_pci_config.c     | 17 ++++++++++++
 drivers/vfio/pci/vfio_pci_liveupdate.c | 38 ++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_priv.h       |  5 ++++
 3 files changed, 60 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c index 8f02f236b5b4..36a71fc3d526 100644 --- a/drivers/vfio/pci/vfio_pci_config.c +++ b/drivers/vfio/pci/vfio_pci_config.c @@ -1756,6 +1756,23 @@ int vfio_config_init(struct vfio_pci_core_device *vdev) vdev->pci_config_map = map; vdev->vconfig = vconfig;
+ if (vdev->liveupdate_restore) { + ret = vfio_pci_liveupdate_restore_config(vdev); + if (ret) + goto out; + /* + * Liveupdate might have started after userspace writes to BARs + * but before VFIO sanitizes them which happens when BARs are + * read next time. + * + * Assume BARs are dirty so that VFIO will sanitize them + * unconditionally next time and avoid giving userspace wrong + * value. + */ + vdev->bardirty = true; + return 0; + } + memset(map, PCI_CAP_ID_BASIC, PCI_STD_HEADER_SIZEOF); memset(map + PCI_STD_HEADER_SIZEOF, PCI_CAP_ID_INVALID, pdev->cfg_size - PCI_STD_HEADER_SIZEOF); diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c index 6cc94d9a0386..824dba2750fe 100644 --- a/drivers/vfio/pci/vfio_pci_liveupdate.c +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c @@ -18,12 +18,43 @@
struct vfio_pci_core_device_ser { u16 bdf; + u32 cfg_size; + u8 pci_config_map[PCI_CFG_SPACE_EXP_SIZE]; + u8 vconfig[PCI_CFG_SPACE_EXP_SIZE]; + u32 rbar[7]; } __packed;
+static int vfio_pci_liveupdate_deserialize_config(struct vfio_pci_core_device *vdev, + struct vfio_pci_core_device_ser *ser) +{ + struct pci_dev *pdev = vdev->pdev; + + if (WARN_ON_ONCE(pdev->cfg_size != ser->cfg_size)) { + dev_err(&pdev->dev, "Config size in serialized (%d) not matching the one pci_dev (%d)", + ser->cfg_size, pdev->cfg_size); + return -EINVAL; + } + + memcpy(vdev->pci_config_map, ser->pci_config_map, ser->cfg_size); + memcpy(vdev->vconfig, ser->vconfig, ser->cfg_size); + memcpy(vdev->rbar, ser->rbar, sizeof(vdev->rbar)); + return 0; +} + +static void vfio_pci_liveupdate_serialize_config(struct vfio_pci_core_device *vdev, + struct vfio_pci_core_device_ser *ser) +{ + ser->cfg_size = vdev->pdev->cfg_size; + memcpy(ser->pci_config_map, vdev->pci_config_map, ser->cfg_size); + memcpy(ser->vconfig, vdev->vconfig, ser->cfg_size); + memcpy(ser->rbar, vdev->rbar, sizeof(vdev->rbar)); +} + static int vfio_pci_lu_serialize(struct vfio_pci_core_device *vdev, struct vfio_pci_core_device_ser *ser) { ser->bdf = pci_dev_id(vdev->pdev); + vfio_pci_liveupdate_serialize_config(vdev, ser); return 0; }
@@ -221,3 +252,10 @@ void __init vfio_pci_liveupdate_init(void) if (err) pr_err("VFIO PCI liveupdate file handler register failed, error %d.\n", err); } + +int vfio_pci_liveupdate_restore_config(struct vfio_pci_core_device *vdev) +{ + struct vfio_pci_core_device_ser *ser = vdev->liveupdate_restore; + + return vfio_pci_liveupdate_deserialize_config(vdev, ser); +} diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h index 7779fd744ff5..0d5aca6c2471 100644 --- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -109,8 +109,13 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
#ifdef CONFIG_LIVEUPDATE void vfio_pci_liveupdate_init(void); +int vfio_pci_liveupdate_restore_config(struct vfio_pci_core_device *vdev); #else static inline void vfio_pci_liveupdate_init(void) { } +int vfio_pci_liveupdate_restore_config(struct vfio_pci_core_device *vdev) +{ + return -EINVAL; +} #endif /* CONFIG_LIVEUPDATE */
#endif
On 2025-10-17 17:07:05, Vipin Sharma wrote:
--- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -109,8 +109,13 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev) #ifdef CONFIG_LIVEUPDATE void vfio_pci_liveupdate_init(void); +int vfio_pci_liveupdate_restore_config(struct vfio_pci_core_device *vdev); #else static inline void vfio_pci_liveupdate_init(void) { } +int vfio_pci_liveupdate_restore_config(struct vfio_pci_core_device *vdev)
This should be static inline
+{
+ return -EINVAL;
+} #endif /* CONFIG_LIVEUPDATE */
#endif
2.51.0.858.gf9c4a03a3a-goog
Do not reset the device when a live update preserved VFIO PCI device is opened for the first time after kexec.
Save 'reset_works' to the device serialized state. If not saved then this value can only be restored by performing an actual reset, which is not desired during live update. If a device can be reset before live update then most likely it can be reset after live update unless some reset methods have been removed. In that case when actual reset is tried it will return an error.
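The enable-path decision described above can be modelled roughly as below: a restored device takes `reset_works` from the saved state and issues no reset, while a normal open derives it from an actual reset attempt. This is a userspace sketch under stated assumptions. The `toy_rst_*` names and the reset stubs are hypothetical, with the stubs standing in for the outcomes of pci_try_reset_function().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_rst_saved {
	bool reset_works;	/* value captured before the live update */
};

struct toy_rst_vdev {
	const struct toy_rst_saved *liveupdate_restore;	/* non-NULL after retrieve() */
	bool reset_works;
	int resets_issued;	/* counts calls into the reset stubs */
};

/* Stubs modelling a successful reset and one that hits lock contention. */
static int toy_reset_ok(struct toy_rst_vdev *v)   { v->resets_issued++; return 0; }
static int toy_reset_busy(struct toy_rst_vdev *v) { v->resets_issued++; return -EAGAIN; }

static int toy_core_enable(struct toy_rst_vdev *v,
			   int (*try_reset)(struct toy_rst_vdev *))
{
	int ret;

	assert(v && try_reset);

	/* Preserved device: reuse the saved answer, do not touch the hardware. */
	if (v->liveupdate_restore) {
		v->reset_works = v->liveupdate_restore->reset_works;
		return 0;
	}

	/* Normal open: -EAGAIN fails the path, anything else records the result. */
	ret = try_reset(v);
	if (ret == -EAGAIN)
		return ret;

	v->reset_works = !ret;
	return 0;
}
```

If a reset method has disappeared across the update, the saved `reset_works` may be stale, but as the commit message notes, the first real reset attempt would then return an error.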
Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 drivers/vfio/pci/vfio_pci_core.c       | 15 ++++++++++-----
 drivers/vfio/pci/vfio_pci_liveupdate.c |  9 +++++++++
 drivers/vfio/pci/vfio_pci_priv.h       |  2 ++
 3 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c index 29236b015242..186a669b68a4 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -486,12 +486,17 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev) if (ret) goto out_power;
- /* If reset fails because of the device lock, fail this path entirely */ - ret = pci_try_reset_function(pdev); - if (ret == -EAGAIN) - goto out_disable_device; + if (vdev->liveupdate_restore) { + vfio_pci_liveupdate_restore_device(vdev); + } else { + /* If reset fails because of the device lock, fail this path entirely */ + ret = pci_try_reset_function(pdev); + if (ret == -EAGAIN) + goto out_disable_device; + + vdev->reset_works = !ret; + }
- vdev->reset_works = !ret; pci_save_state(pdev); vdev->pci_saved_state = pci_store_saved_state(pdev); if (!vdev->pci_saved_state) diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c index 824dba2750fe..82ff9f178fdc 100644 --- a/drivers/vfio/pci/vfio_pci_liveupdate.c +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c @@ -22,6 +22,7 @@ struct vfio_pci_core_device_ser { u8 pci_config_map[PCI_CFG_SPACE_EXP_SIZE]; u8 vconfig[PCI_CFG_SPACE_EXP_SIZE]; u32 rbar[7]; + u8 reset_works; } __packed;
static int vfio_pci_liveupdate_deserialize_config(struct vfio_pci_core_device *vdev, @@ -55,6 +56,7 @@ static int vfio_pci_lu_serialize(struct vfio_pci_core_device *vdev, { ser->bdf = pci_dev_id(vdev->pdev); vfio_pci_liveupdate_serialize_config(vdev, ser); + ser->reset_works = vdev->reset_works; return 0; }
@@ -259,3 +261,10 @@ int vfio_pci_liveupdate_restore_config(struct vfio_pci_core_device *vdev)
return vfio_pci_liveupdate_deserialize_config(vdev, ser); } + +void vfio_pci_liveupdate_restore_device(struct vfio_pci_core_device *vdev) +{ + struct vfio_pci_core_device_ser *ser = vdev->liveupdate_restore; + + vdev->reset_works = ser->reset_works; +} diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h index 0d5aca6c2471..ee1c7c229020 100644 --- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -110,12 +110,14 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev) #ifdef CONFIG_LIVEUPDATE void vfio_pci_liveupdate_init(void); int vfio_pci_liveupdate_restore_config(struct vfio_pci_core_device *vdev); +void vfio_pci_liveupdate_restore_device(struct vfio_pci_core_device *vdev); #else static inline void vfio_pci_liveupdate_init(void) { } int vfio_pci_liveupdate_restore_config(struct vfio_pci_core_device *vdev) { return -EINVAL; } +void vfio_pci_liveupdate_restore_device(struct vfio_pci_core_device *vdev) { } #endif /* CONFIG_LIVEUPDATE */
#endif
Move struct pci_saved_state{} and struct pci_cap_saved_data{} to linux/pci.h so that they are available to code outside of the PCI core.
These structs will be used in subsequent commits to serialize and deserialize PCI state across Live Update.
Signed-off-by: Vipin Sharma vipinsh@google.com --- drivers/pci/pci.c | 5 ----- drivers/pci/pci.h | 7 ------- include/linux/pci.h | 13 +++++++++++++ 3 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index b14dd064006c..b68bf3e820ce 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1884,11 +1884,6 @@ void pci_restore_state(struct pci_dev *dev) } EXPORT_SYMBOL(pci_restore_state);
-struct pci_saved_state { - u32 config_space[16]; - struct pci_cap_saved_data cap[]; -}; - /** * pci_store_saved_state - Allocate and return an opaque struct containing * the device saved state. diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 09476a467cc0..973fcdf7898d 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -197,13 +197,6 @@ int pci_bridge_secondary_bus_reset(struct pci_dev *dev); int pci_bus_error_reset(struct pci_dev *dev); int __pci_reset_bus(struct pci_bus *bus);
-struct pci_cap_saved_data { - u16 cap_nr; - bool cap_extended; - unsigned int size; - u32 data[]; -}; - struct pci_cap_saved_state { struct hlist_node next; struct pci_cap_saved_data cap; diff --git a/include/linux/pci.h b/include/linux/pci.h index 8ce2d4528193..70c9b12c8c02 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1448,6 +1448,19 @@ void pci_disable_rom(struct pci_dev *pdev); void __iomem __must_check *pci_map_rom(struct pci_dev *pdev, size_t *size); void pci_unmap_rom(struct pci_dev *pdev, void __iomem *rom);
+ +struct pci_cap_saved_data { + u16 cap_nr; + bool cap_extended; + unsigned int size; + u32 data[]; +}; + +struct pci_saved_state { + u32 config_space[16]; + struct pci_cap_saved_data cap[]; +}; + /* Power management related routines */ int pci_save_state(struct pci_dev *dev); void pci_restore_state(struct pci_dev *dev);
On Fri, Oct 17, 2025 at 05:07:07PM -0700, Vipin Sharma wrote:
Move struct pci_saved_state{} and struct pci_cap_saved_data{} to linux/pci.h so that they are available to code outside of the PCI core.
These structs will be used in subsequent commits to serialize and deserialize PCI state across Live Update.
That's not sufficient as a justification to make these public in my view.
There are already pci_store_saved_state() and pci_load_saved_state() helpers to serialize PCI state. Why do you need anything more? (Honest question.)
Thanks,
Lukas
On 2025-10-18 09:17:33, Lukas Wunner wrote:
On Fri, Oct 17, 2025 at 05:07:07PM -0700, Vipin Sharma wrote:
Move struct pci_saved_state{} and struct pci_cap_saved_data{} to linux/pci.h so that they are available to code outside of the PCI core.
These structs will be used in subsequent commits to serialize and deserialize PCI state across Live Update.
That's not sufficient as a justification to make these public in my view.
There are already pci_store_saved_state() and pci_load_saved_state() helpers to serialize PCI state. Why do you need anything more? (Honest question.)
In the LUO ecosystem, we currently do not have a solid solution for proper serialization/deserialization of structs, along with versioning, between different kernel versions. This work is still being discussed.
Here, I created separate structs (exactly the same as the original ones) to have a little bit of control over what gets saved in the serialized state and correctly gets deserialized after kexec.
For example, if I use the existing structs instead of creating my own, then I cannot just do a blind memcpy() of the whole PCI state from before kexec into the PCI state after kexec. In the new kernel, the layout might have changed, e.g. through the addition or removal of a field.
With __packed in my version of the structs, I can build validation such as hardcoded member offsets. I can add a version number (not added in this series) to the struct for checking compatibility during serialization and deserialization. Overall, it provides some freedom in how data is passed to the next kernel without changing or modifying the PCI state structs.
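To make the "hardcoded offset" and "version number" ideas above concrete, here is a minimal userspace sketch. It is not code from this series: struct demo_dev_ser, DEMO_SER_VERSION and demo_dev_deserialize() are hypothetical names, and the field set is only loosely modelled on struct vfio_pci_core_device_ser{}. The point is that a packed layout can be pinned at build time, and a reader can reject a blob whose version it does not understand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SER_VERSION 1u	/* hypothetical; no version field exists in this series */

/* Hypothetical serialized layout; names and fields are illustrative only. */
struct demo_dev_ser {
	uint32_t version;
	uint16_t bdf;
	uint8_t  reset_works;
	uint64_t pci_saved_state_phys;
} __attribute__((packed));

/* Pin the layout: any accidental change breaks the build. */
_Static_assert(offsetof(struct demo_dev_ser, version) == 0, "version moved");
_Static_assert(offsetof(struct demo_dev_ser, bdf) == 4, "bdf moved");
_Static_assert(offsetof(struct demo_dev_ser, reset_works) == 6, "reset_works moved");
_Static_assert(offsetof(struct demo_dev_ser, pci_saved_state_phys) == 7,
	       "pci_saved_state_phys moved");
_Static_assert(sizeof(struct demo_dev_ser) == 15, "size changed");

/* Returns 0 on success, -1 if the blob is too short or has an unknown version. */
int demo_dev_deserialize(const void *blob, size_t len, struct demo_dev_ser *out)
{
	if (len < sizeof(*out))
		return -1;

	memcpy(out, blob, sizeof(*out));
	if (out->version != DEMO_SER_VERSION)
		return -1;

	return 0;
}
```

A new kernel that changes the layout would bump the version and either convert old blobs or refuse them, instead of silently misreading fields.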
On Sat, Oct 18, 2025 at 03:36:20PM -0700, Vipin Sharma wrote:
With __packed in my version of the structs, I can build validation such as hardcoded member offsets. I can add a version number (not added in this series) to the struct for checking compatibility during serialization and deserialization. Overall, it provides some freedom in how data is passed to the next kernel without changing or modifying the PCI state structs.
I keep saying this, and this series really strongly shows why: we need to have a dedicated header directory for LUO "ABI" structs. Putting this random struct in some random header and then declaring it part of the LUO ABI is really bad.
All the information in the abi headers needs to have detailed comments explaining what it is and so on so people can evaluate if it is suitable or not.
But, it is also not clear why pci serialization structs should leak out of the PCI layer.
The design of LUO was to allow each layer to contribute its own tags/etc to the serialization, so there is no reason to have VFIO piggyback on PCI structs or something.
Jason
On Sat, Oct 18, 2025 at 03:36:20PM -0700, Vipin Sharma wrote:
On 2025-10-18 09:17:33, Lukas Wunner wrote:
On Fri, Oct 17, 2025 at 05:07:07PM -0700, Vipin Sharma wrote:
Move struct pci_saved_state{} and struct pci_cap_saved_data{} to linux/pci.h so that they are available to code outside of the PCI core.
These structs will be used in subsequent commits to serialize and deserialize PCI state across Live Update.
That's not sufficient as a justification to make these public in my view.
There are already pci_store_saved_state() and pci_load_saved_state() helpers to serialize PCI state. Why do you need anything more? (Honest question.)
In the LUO ecosystem, we currently do not have a solid solution for proper serialization/deserialization of structs, along with versioning, between different kernel versions. This work is still being discussed.
Here, I created separate structs (exactly the same as the original ones) to have a little bit of control over what gets saved in the serialized state and correctly gets deserialized after kexec.
For example, if I use the existing structs instead of creating my own, then I cannot just do a blind memcpy() of the whole PCI state from before kexec into the PCI state after kexec. In the new kernel, the layout might have changed, e.g. through the addition or removal of a field.
The last time we changed those structs was in 2013 by fd0f7f73ca96. So changes are extremely rare.
What could change in theory is the layout of the individual capabilities (the data[] in struct pci_cap_saved_data). E.g. maybe we decide that we need to save an additional register. But that's also rare. Normally we add all the mutable registers when a new capability is supported and have no need to amend that afterwards.
So I think you're preparing for an eventuality that's very unlikely to happen. Question is whether that justifies the additional complexity and duplication. (Probably not.)
Note that struct pci_cap_saved_state was made private in 2021 by f0ab00174eb7. We try to prevent other subsystems or drivers fiddling with structures internal to the PCI core. For LUO to find acceptance, it needs to respect subsystems' desire to keep private what's private and it needs to be as non-intrusive as possible. If necessary, helpers needed by LUO (e.g. to determine the size of saved PCI state) should probably live in the PCI core and be #ifdef'ed to LUO being enabled.
Thanks,
Lukas
Save and restore the PCI state of the VFIO device which in the normal flow is recorded by VFIO when the device FD is opened for the first time and then reapplied to PCI device when the last opened device FD is closed.
Introduce "_ser" versions of struct pci_saved_state{} and struct pci_cap_saved_data{} to serialize the saved PCI state for liveupdate. Store the PCI state in a separate folio, because its size is not known at build time and therefore space for it cannot be reserved in struct vfio_pci_core_device_ser{}.
Signed-off-by: Vipin Sharma vipinsh@google.com --- drivers/vfio/pci/vfio_pci_core.c | 9 +- drivers/vfio/pci/vfio_pci_liveupdate.c | 176 ++++++++++++++++++++++++- drivers/vfio/pci/vfio_pci_priv.h | 8 +- 3 files changed, 187 insertions(+), 6 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c index 186a669b68a4..44ea3ac8da16 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -487,7 +487,9 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev) goto out_power;
if (vdev->liveupdate_restore) { - vfio_pci_liveupdate_restore_device(vdev); + ret = vfio_pci_liveupdate_restore_device(vdev); + if (ret) + goto out_disable_device; } else { /* If reset fails because of the device lock, fail this path entirely */ ret = pci_try_reset_function(pdev); @@ -495,10 +497,11 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev) goto out_disable_device;
vdev->reset_works = !ret; + + pci_save_state(pdev); + vdev->pci_saved_state = pci_store_saved_state(pdev); }
- pci_save_state(pdev); - vdev->pci_saved_state = pci_store_saved_state(pdev); if (!vdev->pci_saved_state) pci_dbg(pdev, "%s: Couldn't store saved state\n", __func__);
diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c index 82ff9f178fdc..caef023d007a 100644 --- a/drivers/vfio/pci/vfio_pci_liveupdate.c +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c @@ -13,9 +13,22 @@ #include <linux/anon_inodes.h> #include <linux/kexec_handover.h> #include <linux/file.h> +#include <linux/pci.h>
#include "vfio_pci_priv.h"
+struct pci_cap_saved_data_ser { + u16 cap_nr; + bool cap_extended; + unsigned int size; + u32 data[]; +} __packed; + +struct pci_saved_state_ser { + u32 config_space[16]; + struct pci_cap_saved_data_ser cap[]; +} __packed; + struct vfio_pci_core_device_ser { u16 bdf; u32 cfg_size; @@ -23,6 +36,7 @@ struct vfio_pci_core_device_ser { u8 vconfig[PCI_CFG_SPACE_EXP_SIZE]; u32 rbar[7]; u8 reset_works; + u64 pci_saved_state_phys; } __packed;
static int vfio_pci_liveupdate_deserialize_config(struct vfio_pci_core_device *vdev, @@ -51,12 +65,150 @@ static void vfio_pci_liveupdate_serialize_config(struct vfio_pci_core_device *vd memcpy(ser->rbar, vdev->rbar, sizeof(vdev->rbar)); }
+static size_t pci_saved_state_size(struct pci_saved_state *state) +{ + struct pci_cap_saved_data *cap; + size_t size; + + /* One empty cap to denote end. */ + size = sizeof(struct pci_saved_state) + sizeof(struct pci_cap_saved_data); + + cap = state->cap; + while (cap->size) { + size_t len = sizeof(struct pci_cap_saved_data) + cap->size; + + size += len; + cap = (struct pci_cap_saved_data *)((u8 *)cap + len); + } + + return size; +} + +static size_t pci_saved_state_size_from_ser(struct pci_saved_state_ser *state) +{ + struct pci_cap_saved_data_ser *cap; + size_t size; + + /* One empty cap to denote end. */ + size = sizeof(struct pci_saved_state) + sizeof(struct pci_cap_saved_data); + + cap = state->cap; + while (cap->size) { + size_t len = sizeof(struct pci_cap_saved_data) + cap->size; + + size += len; + cap = (struct pci_cap_saved_data_ser *)((u8 *)cap + len); + } + + return size; +} + +static void serialize_pci_cap_saved_data(struct pci_saved_state *state, + struct pci_saved_state_ser *state_ser) +{ + struct pci_cap_saved_data_ser *cap_ser = state_ser->cap; + struct pci_cap_saved_data *cap = state->cap; + + while (cap->size) { + cap_ser->cap_nr = cap->cap_nr; + cap_ser->cap_extended = cap->cap_extended; + cap_ser->size = cap->size; + memcpy(cap_ser->data, cap->data, cap_ser->size); + + cap = (void *)cap + sizeof(*cap) + cap->size; + cap_ser = (void *)cap_ser + sizeof(*cap_ser) + cap_ser->size; + } +} + +static void deserialize_pci_cap_saved_data(struct pci_saved_state *state, + struct pci_saved_state_ser *state_ser) +{ + struct pci_cap_saved_data_ser *cap_ser = state_ser->cap; + struct pci_cap_saved_data *cap = state->cap; + + while (cap_ser->size) { + cap->cap_nr = cap_ser->cap_nr; + cap->cap_extended = cap_ser->cap_extended; + cap->size = cap_ser->size; + memcpy(cap->data, cap_ser->data, cap_ser->size); + + cap = (void *)cap + sizeof(*cap) + cap->size; + cap_ser = (void *)cap_ser + sizeof(*cap_ser) + cap_ser->size; + } +} + +static int 
serialize_pci_saved_state(struct vfio_pci_core_device *vdev, + struct vfio_pci_core_device_ser *ser) +{ + struct pci_saved_state *state = vdev->pci_saved_state; + struct pci_saved_state_ser *state_ser; + struct folio *folio; + size_t size; + int ret; + + if (!state) + return 0; + + size = pci_saved_state_size(state); + + folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, get_order(size)); + if (!folio) + return -ENOMEM; + + state_ser = folio_address(folio); + + memcpy(state_ser->config_space, state->config_space, + sizeof(state_ser->config_space)); + + serialize_pci_cap_saved_data(state, state_ser); + + ret = kho_preserve_folio(folio); + if (ret) { + folio_put(folio); + return ret; + } + + ser->pci_saved_state_phys = virt_to_phys(state_ser); + + return 0; +} + +static int deserialize_pci_saved_state(struct vfio_pci_core_device *vdev, + struct vfio_pci_core_device_ser *ser) +{ + struct pci_saved_state_ser *state_ser; + struct pci_saved_state *state; + size_t size; + + if (!ser->pci_saved_state_phys) + return 0; + + state_ser = phys_to_virt(ser->pci_saved_state_phys); + size = pci_saved_state_size_from_ser(state_ser); + state = kzalloc(size, GFP_KERNEL); + if (!state) + return -ENOMEM; + + memcpy(state->config_space, state_ser->config_space, + sizeof(state_ser->config_space)); + + deserialize_pci_cap_saved_data(state, state_ser); + vdev->pci_saved_state = state; + return 0; +} + static int vfio_pci_lu_serialize(struct vfio_pci_core_device *vdev, struct vfio_pci_core_device_ser *ser) { + int err; + ser->bdf = pci_dev_id(vdev->pdev); vfio_pci_liveupdate_serialize_config(vdev, ser); ser->reset_works = vdev->reset_works; + err = serialize_pci_saved_state(vdev, ser); + if (err) + return err; + return 0; }
@@ -101,12 +253,18 @@ static void vfio_pci_liveupdate_cancel(struct liveupdate_file_handler *handler, { struct vfio_pci_core_device_ser *ser = phys_to_virt(data); struct folio *folio = virt_to_folio(ser); + struct folio *pci_saved_state_folio; struct vfio_pci_core_device *vdev; struct vfio_device *device;
device = vfio_device_from_file(file); vdev = container_of(device, struct vfio_pci_core_device, vdev); vdev->pdev->skip_kexec_clear_master = false; + if (ser->pci_saved_state_phys) { + pci_saved_state_folio = virt_to_folio(phys_to_virt(ser->pci_saved_state_phys)); + WARN_ON_ONCE(kho_unpreserve_folio(pci_saved_state_folio)); + folio_put(pci_saved_state_folio); + } WARN_ON_ONCE(kho_unpreserve_folio(folio)); folio_put(folio); } @@ -139,6 +297,9 @@ static void vfio_pci_liveupdate_finish(struct liveupdate_file_handler *handler,
ser = folio_address(folio);
+ if (!reclaimed && ser->pci_saved_state_phys) + kho_restore_folio(ser->pci_saved_state_phys); + device = vfio_find_device_in_cdev_class(&ser->bdf, match_bdf); if (!device) goto out_folio_put; @@ -155,6 +316,8 @@ static void vfio_pci_liveupdate_finish(struct liveupdate_file_handler *handler, put_device(&device->device);
out_folio_put: + if (ser->pci_saved_state_phys) + folio_put(virt_to_folio(phys_to_virt(ser->pci_saved_state_phys))); folio_put(folio); }
@@ -174,6 +337,11 @@ static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_handler *handler, return -ENOENT;
ser = folio_address(folio); + if (ser->pci_saved_state_phys) { + if (!kho_restore_folio(ser->pci_saved_state_phys)) + return -ENOENT; + } + device = vfio_find_device_in_cdev_class(&ser->bdf, match_bdf); if (!device) return -ENODEV; @@ -262,9 +430,15 @@ int vfio_pci_liveupdate_restore_config(struct vfio_pci_core_device *vdev) return vfio_pci_liveupdate_deserialize_config(vdev, ser); }
-void vfio_pci_liveupdate_restore_device(struct vfio_pci_core_device *vdev) +int vfio_pci_liveupdate_restore_device(struct vfio_pci_core_device *vdev) { struct vfio_pci_core_device_ser *ser = vdev->liveupdate_restore; + int err; + + err = deserialize_pci_saved_state(vdev, ser); + if (err) + return err;
vdev->reset_works = ser->reset_works; + return 0; } diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h index ee1c7c229020..9d692e4d0cf7 100644 --- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -110,14 +110,18 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev) #ifdef CONFIG_LIVEUPDATE void vfio_pci_liveupdate_init(void); int vfio_pci_liveupdate_restore_config(struct vfio_pci_core_device *vdev); -void vfio_pci_liveupdate_restore_device(struct vfio_pci_core_device *vdev); +int vfio_pci_liveupdate_restore_device(struct vfio_pci_core_device *vdev); #else static inline void vfio_pci_liveupdate_init(void) { } int vfio_pci_liveupdate_restore_config(struct vfio_pci_core_device *vdev) { return -EINVAL; } -void vfio_pci_liveupdate_restore_device(struct vfio_pci_core_device *vdev) { } +int vfio_pci_liveupdate_restore_device(struct vfio_pci_core_device *vdev) +{ + return -EOPNOTSUPP; +} + #endif /* CONFIG_LIVEUPDATE */
#endif
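The size helpers in the patch above walk a variable-length list of capability records that ends with a zero-size record. The following userspace sketch shows that walk on a packed byte buffer. It is an illustration under assumptions, not the series' code: struct cap_ser_hdr, struct cap_hdr and the three functions are hypothetical, fixed-width types stand in for bool and unsigned int, and the data bytes are taken to follow each header directly. The sketch keeps two strides apart: the cursor moves through the serialized buffer by the packed header size, while the space needed for the deserialized copy is counted with the natural (padded) header size, which is one byte larger on common ABIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packed record header; `size` bytes of data follow it. */
struct cap_ser_hdr {
	uint16_t cap_nr;
	uint8_t  cap_extended;
	uint32_t size;		/* 0 marks the terminating record */
} __attribute__((packed));

/* Same fields in natural layout, as the in-kernel struct would have them. */
struct cap_hdr {
	uint16_t cap_nr;
	bool cap_extended;
	unsigned int size;
};

/* Append one record at byte offset `off`; returns the offset after it. */
size_t cap_ser_append(uint8_t *buf, size_t off, uint16_t nr,
		      const void *data, uint32_t size)
{
	struct cap_ser_hdr hdr = { .cap_nr = nr, .cap_extended = 0, .size = size };

	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), data, size);
	return off + sizeof(hdr) + size;
}

/* Bytes occupied by the packed list, including the terminator record. */
size_t cap_ser_list_size(const uint8_t *buf)
{
	size_t off = 0;

	for (;;) {
		struct cap_ser_hdr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		off += sizeof(hdr);
		if (!hdr.size)
			return off;
		off += hdr.size;
	}
}

/* Bytes needed to hold the same list with natural-layout headers. */
size_t cap_list_unpacked_size(const uint8_t *buf)
{
	size_t off = 0, total = sizeof(struct cap_hdr);	/* terminator */

	for (;;) {
		struct cap_ser_hdr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (!hdr.size)
			return total;
		off += sizeof(hdr) + hdr.size;			/* packed stride */
		total += sizeof(struct cap_hdr) + hdr.size;	/* unpacked need */
	}
}
```

Using memcpy() for the header avoids unaligned loads, since records in a packed buffer can start at any byte offset.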
On Fri, Oct 17, 2025 at 05:07:08PM -0700, Vipin Sharma wrote:
Save and restore the PCI state of the VFIO device which in the normal flow is recorded by VFIO when the device FD is opened for the first time and then reapplied to PCI device when the last opened device FD is closed.
Introduce "_ser" versions of struct pci_saved_state{} and struct pci_cap_saved_data{} to serialize the saved PCI state for liveupdate. Store the PCI state in a separate folio, because its size is not known at build time and therefore space for it cannot be reserved in struct vfio_pci_core_device_ser{}.
Unfortunately this commit message is of the type "summarize the code changes without explaining the reason for these changes".
Comparing the pci_saved_state_ser and pci_cap_saved_data_ser structs which you're introducing here with the existing pci_saved_state and pci_cap_saved_data structs, the only difference seems to be that you're adding __packed to your new structs. Is that all? Is that the only reason why these structs need to be duplicated? Maybe it would make more sense to add __packed to the existing structs, though the gain seems minimal.
Thanks,
Lukas
On 2025-10-18 09:25:30, Lukas Wunner wrote:
On Fri, Oct 17, 2025 at 05:07:08PM -0700, Vipin Sharma wrote:
Save and restore the PCI state of the VFIO device which in the normal flow is recorded by VFIO when the device FD is opened for the first time and then reapplied to PCI device when the last opened device FD is closed.
Introduce "_ser" versions of struct pci_saved_state{} and struct pci_cap_saved_data{} to serialize the saved PCI state for liveupdate. Store the PCI state in a separate folio, because its size is not known at build time and therefore space for it cannot be reserved in struct vfio_pci_core_device_ser{}.
Unfortunately this commit message is of the type "summarize the code changes without explaining the reason for these changes".
Comparing the pci_saved_state_ser and pci_cap_saved_data_ser structs which you're introducing here with the existing pci_saved_state and pci_cap_saved_data structs, the only difference seems to be that you're adding __packed to your new structs. Is that all? Is that the only reason why these structs need to be duplicated? Maybe it would make more sense to add __packed to the existing structs, though the gain seems minimal.
It allows us (in the future) to build more validation and compatibility handling for struct layout changes across kernel versions. We can add more fields to the *_ser versions which can act as metadata to support deserialization.
I do agree that in the current form (with the assumption of no layout changes) we can get away with using the existing structs. I also think this should be taken care of by a PCI series instead of the VFIO series.
Let's see what others think. I am open to not adding these *_ser structs if we should wait for proper struct serialization support and work under the assumption that these won't change.
On 2025-10-17 17:07:08, Vipin Sharma wrote:
--- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -110,14 +110,18 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev) #ifdef CONFIG_LIVEUPDATE void vfio_pci_liveupdate_init(void); int vfio_pci_liveupdate_restore_config(struct vfio_pci_core_device *vdev); -void vfio_pci_liveupdate_restore_device(struct vfio_pci_core_device *vdev); +int vfio_pci_liveupdate_restore_device(struct vfio_pci_core_device *vdev); #else static inline void vfio_pci_liveupdate_init(void) { } int vfio_pci_liveupdate_restore_config(struct vfio_pci_core_device *vdev) { return -EINVAL; } -void vfio_pci_liveupdate_restore_device(struct vfio_pci_core_device *vdev) { } +int vfio_pci_liveupdate_restore_device(struct vfio_pci_core_device *vdev) +{
+ return -EOPNOTSUPP;
+}
This should also be static inline.
#endif /* CONFIG_LIVEUPDATE */
#endif
2.51.0.858.gf9c4a03a3a-goog
Disable the VFIO interrupts configured on the device during the live update freeze callback. Since there is no way to handle those interrupts during kexec, it is better to stop them and let userspace reconfigure them after kexec.
Signed-off-by: Vipin Sharma vipinsh@google.com --- drivers/vfio/pci/vfio_pci_liveupdate.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c index caef023d007a..5d786ace6bde 100644 --- a/drivers/vfio/pci/vfio_pci_liveupdate.c +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c @@ -248,6 +248,22 @@ static int vfio_pci_liveupdate_prepare(struct liveupdate_file_handler *handler, return err; }
+static int vfio_pci_liveupdate_freeze(struct liveupdate_file_handler *handler, + struct file *file, u64 *data) +{ + struct vfio_pci_core_device *vdev; + struct vfio_device *device; + + device = vfio_device_from_file(file); + vdev = container_of(device, struct vfio_pci_core_device, vdev); + + guard(mutex)(&vdev->igate); + if (vdev->irq_type == VFIO_PCI_NUM_IRQS) + return 0; + return vfio_pci_set_irqs_ioctl(vdev, VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER, + vdev->irq_type, 0, 0, NULL); +} + static void vfio_pci_liveupdate_cancel(struct liveupdate_file_handler *handler, struct file *file, u64 data) { @@ -403,6 +419,7 @@ static bool vfio_pci_liveupdate_can_preserve(struct liveupdate_file_handler *han
static const struct liveupdate_file_ops vfio_pci_luo_fops = { .prepare = vfio_pci_liveupdate_prepare, + .freeze = vfio_pci_liveupdate_freeze, .cancel = vfio_pci_liveupdate_cancel, .finish = vfio_pci_liveupdate_finish, .retrieve = vfio_pci_liveupdate_retrieve,
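The freeze callback above passes DATA_NONE | ACTION_TRIGGER with a count of 0 to vfio_pci_set_irqs_ioctl(), which is the same request userspace can make through VFIO_DEVICE_SET_IRQS to disable an interrupt index as a whole. As a rough userspace counterpart (a sketch only; vfio_irq_disable_fill() and vfio_irq_disable() are hypothetical helper names, and the header <linux/vfio.h> is assumed to be installed):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Fill a SET_IRQS request that disables every interrupt of one index. */
void vfio_irq_disable_fill(struct vfio_irq_set *set, uint32_t index)
{
	memset(set, 0, sizeof(*set));
	set->argsz = sizeof(*set);
	set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
	set->index = index;
	set->start = 0;
	set->count = 0;		/* count == 0 disables the whole index */
}

/* Hypothetical helper: returns the ioctl() result (-1 with errno on failure). */
int vfio_irq_disable(int device_fd, uint32_t index)
{
	struct vfio_irq_set set;

	vfio_irq_disable_fill(&set, index);
	return ioctl(device_fd, VFIO_DEVICE_SET_IRQS, &set);
}
```

After kexec, userspace would set up its eventfds again with a regular DATA_EVENTFD | ACTION_TRIGGER request on the restored FD, in line with the commit message's expectation that userspace reconfigures interrupts.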
Import and build liveupdate selftest library in VFIO selftests.
This allows VFIO selftests to use the liveupdate ioctls.
Signed-off-by: Vipin Sharma vipinsh@google.com --- tools/testing/selftests/vfio/Makefile | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftests/vfio/Makefile index 324ba0175a33..c7f271884cb4 100644 --- a/tools/testing/selftests/vfio/Makefile +++ b/tools/testing/selftests/vfio/Makefile @@ -6,16 +6,24 @@ TEST_GEN_PROGS += vfio_pci_driver_test TEST_PROGS_EXTENDED := run.sh include ../lib.mk include lib/libvfio.mk +include ../liveupdate/lib/libliveupdate.mk
CFLAGS += -I$(top_srcdir)/tools/include CFLAGS += -MD CFLAGS += $(EXTRA_CFLAGS)
-$(TEST_GEN_PROGS): %: %.o $(LIBVFIO_O) - $(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $< $(LIBVFIO_O) $(LDLIBS) -o $@ +LIBS_O := $(LIBVFIO_O) +LIBS_O += $(LIBLIVEUPDATE_O) + +TEST_GEN_ALL_PROGS := $(TEST_GEN_PROGS) +TEST_GEN_ALL_PROGS += $(TEST_GEN_PROGS_EXTENDED) + +$(TEST_GEN_ALL_PROGS): %: %.o $(LIBS_O) + $(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) $< $(LIBS_O) $(LDLIBS) -o $@
TEST_GEN_PROGS_O = $(patsubst %, %.o, $(TEST_GEN_PROGS)) -TEST_DEP_FILES = $(patsubst %.o, %.d, $(TEST_GEN_PROGS_O) $(LIBVFIO_O)) +TEST_GEN_PROGS_O += $(patsubst %, %.o, $(TEST_GEN_PROGS_EXTENDED)) +TEST_DEP_FILES = $(patsubst %.o, %.d, $(TEST_GEN_PROGS_O) $(LIBS_O)) -include $(TEST_DEP_FILES)
EXTRA_CLEAN += $(TEST_GEN_PROGS_O) $(TEST_DEP_FILES)
Use the given VFIO cdev FD to initialize vfio_pci_device in VFIO selftests. Add an assertion to make sure that a passed cdev FD is not used with the legacy VFIO APIs. If a VFIO cdev FD is provided, do not open the device; instead, use the FD for all interaction with the device.
This API allows writing selftests in which a VFIO device FD is preserved using liveupdate and retrieved later via a liveupdate ioctl after kexec.
Signed-off-by: Vipin Sharma vipinsh@google.com --- .../selftests/vfio/lib/include/vfio_util.h | 1 + .../selftests/vfio/lib/vfio_pci_device.c | 33 +++++++++++++++---- 2 files changed, 28 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/vfio/lib/include/vfio_util.h b/tools/testing/selftests/vfio/lib/include/vfio_util.h index ed31606e01b7..8ec60a62a0d1 100644 --- a/tools/testing/selftests/vfio/lib/include/vfio_util.h +++ b/tools/testing/selftests/vfio/lib/include/vfio_util.h @@ -203,6 +203,7 @@ const char *vfio_pci_get_cdev_path(const char *bdf); extern const char *default_iommu_mode;
struct vfio_pci_device *vfio_pci_device_init(const char *bdf, const char *iommu_mode); +struct vfio_pci_device *vfio_pci_device_init_fd(int vfio_cdev_fd); void vfio_pci_device_cleanup(struct vfio_pci_device *device); void vfio_pci_device_reset(struct vfio_pci_device *device);
diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c index 0921b2451ba5..cab9c74d2de8 100644 --- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c +++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c @@ -486,13 +486,18 @@ static void vfio_device_attach_iommufd_pt(int device_fd, u32 pt_id) ioctl_assert(device_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &args); }
-static void vfio_pci_iommufd_setup(struct vfio_pci_device *device, const char *bdf) +static void vfio_pci_iommufd_setup(struct vfio_pci_device *device, + const char *bdf, int vfio_cdev_fd) { - const char *cdev_path = vfio_pci_get_cdev_path(bdf);
- device->fd = open(cdev_path, O_RDWR); + if (vfio_cdev_fd > 0) { + device->fd = vfio_cdev_fd; + } else { + const char *cdev_path = vfio_pci_get_cdev_path(bdf); + device->fd = open(cdev_path, O_RDWR); + free((void *)cdev_path); + } VFIO_ASSERT_GE(device->fd, 0); - free((void *)cdev_path);
/* * Require device->iommufd to be >0 so that a simple non-0 check can be @@ -507,7 +512,9 @@ static void vfio_pci_iommufd_setup(struct vfio_pci_device *device, const char *b vfio_device_attach_iommufd_pt(device->fd, device->ioas_id); }
-struct vfio_pci_device *vfio_pci_device_init(const char *bdf, const char *iommu_mode) +struct vfio_pci_device *__vfio_pci_device_init(const char *bdf, + const char *iommu_mode, + int vfio_cdev_fd) { struct vfio_pci_device *device;
@@ -518,10 +525,13 @@ struct vfio_pci_device *vfio_pci_device_init(const char *bdf, const char *iommu_
device->iommu_mode = lookup_iommu_mode(iommu_mode);
+ VFIO_ASSERT_FALSE(device->iommu_mode->container_path != NULL && vfio_cdev_fd > 0, + "Provide either container path or VFIO cdev FD, not both.\n"); + if (device->iommu_mode->container_path) vfio_pci_container_setup(device, bdf); else - vfio_pci_iommufd_setup(device, bdf); + vfio_pci_iommufd_setup(device, bdf, vfio_cdev_fd);
vfio_pci_device_setup(device); vfio_pci_driver_probe(device); @@ -529,6 +539,17 @@ struct vfio_pci_device *vfio_pci_device_init(const char *bdf, const char *iommu_ return device; }
+struct vfio_pci_device *vfio_pci_device_init(const char *bdf, + const char *iommu_mode) +{ + return __vfio_pci_device_init(bdf, iommu_mode, -1); +} + +struct vfio_pci_device *vfio_pci_device_init_fd(int vfio_cdev_fd) +{ + return __vfio_pci_device_init(NULL, "iommufd", vfio_cdev_fd); +} + void vfio_pci_device_cleanup(struct vfio_pci_device *device) { int i;
Write a test to exercise VFIO live update support on the passed device BDF. Provide different behavior of the test based on host live update state (NORMAL or UPDATED).
When test is executed in NORMAL state, initialize a VFIO PCI device and enable its Bus Master Enable bit by writing to PCI command register. Create a live update session, and pass the VFIO device FD to it for preservation. Preserve the session and then send the global live update prepare event. If everything is fine up to this point, then reboot the kernel using kexec.
When test is executed in UPDATED state, retrieve the session from Live Update Orchestrator, restore the VFIO FD from the session. Use the restored FD to initialize vfio_pci_device in selftest. Move the host to NORMAL state and verify if the Bus Master Enable bit is still enabled on the VFIO device.
The test is not run automatically; it is only built, and the user runs it manually with the command:
./run.sh -d 0000:6a:01.0 ./vfio_pci_liveupdate_test
Signed-off-by: Vipin Sharma vipinsh@google.com --- tools/testing/selftests/vfio/Makefile | 1 + .../selftests/vfio/vfio_pci_liveupdate_test.c | 106 ++++++++++++++++++ 2 files changed, 107 insertions(+) create mode 100644 tools/testing/selftests/vfio/vfio_pci_liveupdate_test.c
diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftests/vfio/Makefile
index c7f271884cb4..949b7fcc091e 100644
--- a/tools/testing/selftests/vfio/Makefile
+++ b/tools/testing/selftests/vfio/Makefile
@@ -3,6 +3,7 @@ TEST_GEN_PROGS += vfio_dma_mapping_test
 TEST_GEN_PROGS += vfio_iommufd_setup_test
 TEST_GEN_PROGS += vfio_pci_device_test
 TEST_GEN_PROGS += vfio_pci_driver_test
+TEST_GEN_PROGS_EXTENDED += vfio_pci_liveupdate_test
 TEST_PROGS_EXTENDED := run.sh
 include ../lib.mk
 include lib/libvfio.mk
diff --git a/tools/testing/selftests/vfio/vfio_pci_liveupdate_test.c b/tools/testing/selftests/vfio/vfio_pci_liveupdate_test.c
new file mode 100644
index 000000000000..9fd0061348e0
--- /dev/null
+++ b/tools/testing/selftests/vfio/vfio_pci_liveupdate_test.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Vipin Sharma <vipinsh@google.com>
+ */
+
+#include <linux/liveupdate.h>
+#include <liveupdate_util.h>
+#include <vfio_util.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/ioctl.h>
+
+#define SESSION_NAME "multi_file_session"
+#define TOKEN 1234
+
+static void run_pre_kexec(int luo_fd, const char *bdf)
+{
+	struct vfio_pci_device *device;
+	int session_fd;
+	u16 command;
+
+	device = vfio_pci_device_init(bdf, "iommufd");
+
+	command = vfio_pci_config_readw(device, PCI_COMMAND);
+	VFIO_ASSERT_FALSE(command & PCI_COMMAND_MASTER);
+
+	vfio_pci_config_writew(device, PCI_COMMAND,
+			       command | PCI_COMMAND_MASTER);
+
+	session_fd = luo_create_session(luo_fd, SESSION_NAME);
+	VFIO_ASSERT_GE(session_fd, 0, "Failed to create session %s",
+		       SESSION_NAME);
+	VFIO_ASSERT_EQ(luo_session_preserve_fd(session_fd, device->fd, TOKEN),
+		       0, "Failed to preserve VFIO device");
+	VFIO_ASSERT_EQ(luo_set_global_event(luo_fd, LIVEUPDATE_PREPARE), 0,
+		       "Failed to set global PREPARE event");
+
+	VFIO_ASSERT_EQ(system(KEXEC_SCRIPT), 0, "kexec script failed");
+
+	sleep(10); /* Should not be reached */
+	vfio_pci_device_cleanup(device);
+	exit(EXIT_FAILURE);
+}
+
+static void run_post_kexec(int luo_fd, const char *bdf)
+{
+	int session_fd;
+	int vfio_fd;
+	struct vfio_pci_device *device;
+	u16 command;
+
+
+	session_fd = luo_retrieve_session(luo_fd, SESSION_NAME);
+	VFIO_ASSERT_GE(session_fd, 0, "Failed to retrieve session %s",
+		       SESSION_NAME);
+
+	vfio_fd = luo_session_restore_fd(session_fd, TOKEN);
+	if (vfio_fd < 0) {
+		printf("Failed to restore VFIO device, error %d", vfio_fd);
+		exit(1);
+	}
+
+	device = vfio_pci_device_init_fd(vfio_fd);
+
+	if (luo_set_global_event(luo_fd, LIVEUPDATE_FINISH) < 0) {
+		printf("Failed to set global FINISH event");
+		exit(1);
+	}
+
+	close(session_fd);
+
+	command = vfio_pci_config_readw(device, PCI_COMMAND);
+	VFIO_ASSERT_TRUE(command & PCI_COMMAND_MASTER);
+	vfio_pci_device_cleanup(device);
+}
+
+int main(int argc, char *argv[])
+{
+	enum liveupdate_state state;
+	const char *device_bdf;
+	int luo_fd;
+
+	device_bdf = vfio_selftests_get_bdf(&argc, argv);
+
+	luo_fd = luo_open_device();
+	VFIO_ASSERT_GE(luo_fd, 0, "Failed to open %s", LUO_DEVICE);
+	VFIO_ASSERT_EQ(luo_get_global_state(luo_fd, &state), 0, "Failed to get LUO state.");
+
+	switch (state) {
+	case LIVEUPDATE_STATE_NORMAL:
+		printf("Running pre-kexec actions.\n");
+		run_pre_kexec(luo_fd, device_bdf);
+		break;
+	case LIVEUPDATE_STATE_UPDATED:
+		printf("Running post-kexec actions.\n");
+		run_post_kexec(luo_fd, device_bdf);
+		break;
+	default:
+		printf("Test started in an unexpected state: %d", state);
+	}
+
+	close(luo_fd);
+}
Test preservation of a VFIO PCI device's virtual config (vconfig in struct vfio_pci_core_device{}) across a live update. Write an arbitrary known value to the PCI_INTERRUPT_LINE register, which is virtualized by VFIO, and verify that the same value is read back after kexec.
Certain bits in the config space are virtualized by VFIO, so writes to them don't go to the device's PCI config space; instead they are stored in memory. After live update, vconfig should hold the same value as before kexec, which means vconfig must be saved in KHO and later retrieved to restore the device.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 .../testing/selftests/vfio/vfio_pci_liveupdate_test.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
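Illustration only, not part of the patch: a minimal userspace sketch of why a virtualized byte needs explicit preservation. It assumes a much-simplified model in which virtualized writes land in an in-memory shadow instead of the device. struct model_dev, model_write8(), model_read8() and model_live_update() are hypothetical names and do not correspond to the real vfio-pci code or to the KHO API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE		256
#define PCI_INTERRUPT_LINE	0x3c	/* standard config space offset */

/* Hypothetical model; not the real struct vfio_pci_core_device. */
struct model_dev {
	uint8_t hw_config[CFG_SIZE];	/* stands in for the physical device */
	uint8_t vconfig[CFG_SIZE];	/* in-memory shadow of virtualized bytes */
	uint8_t virt[CFG_SIZE];		/* non-zero: byte is virtualized */
};

static void model_write8(struct model_dev *d, unsigned int off, uint8_t val)
{
	if (d->virt[off])
		d->vconfig[off] = val;	/* stays in memory, never reaches hardware */
	else
		d->hw_config[off] = val;
}

static uint8_t model_read8(const struct model_dev *d, unsigned int off)
{
	return d->virt[off] ? d->vconfig[off] : d->hw_config[off];
}

/*
 * Model a kexec: the next kernel starts with an empty shadow and only gets
 * the old contents back if they were saved (the role KHO plays in the series).
 * Returns what the next kernel reads from PCI_INTERRUPT_LINE.
 */
static uint8_t model_live_update(int preserve_vconfig)
{
	struct model_dev old = { 0 }, next = { 0 };
	uint8_t saved[CFG_SIZE];

	old.virt[PCI_INTERRUPT_LINE] = 1;
	model_write8(&old, PCI_INTERRUPT_LINE, 0x12);
	assert(model_read8(&old, PCI_INTERRUPT_LINE) == 0x12);
	assert(old.hw_config[PCI_INTERRUPT_LINE] == 0);	/* hardware untouched */

	memcpy(saved, old.vconfig, sizeof(saved));	/* "prepare" step */

	/* The device is not reset, so hardware state carries over. */
	memcpy(next.hw_config, old.hw_config, CFG_SIZE);
	next.virt[PCI_INTERRUPT_LINE] = 1;
	if (preserve_vconfig)
		memcpy(next.vconfig, saved, sizeof(saved));	/* "restore" step */

	return model_read8(&next, PCI_INTERRUPT_LINE);
}
```

In this model, model_live_update(1) reads back 0x12 while model_live_update(0) reads back 0, which is the difference the selftest is meant to catch.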
diff --git a/tools/testing/selftests/vfio/vfio_pci_liveupdate_test.c b/tools/testing/selftests/vfio/vfio_pci_liveupdate_test.c
index 9fd0061348e0..2d80fdcb1ef7 100644
--- a/tools/testing/selftests/vfio/vfio_pci_liveupdate_test.c
+++ b/tools/testing/selftests/vfio/vfio_pci_liveupdate_test.c
@@ -15,12 +15,14 @@
 
 #define SESSION_NAME "multi_file_session"
 #define TOKEN 1234
+#define RANDOM_DATA 0x12
 
 static void run_pre_kexec(int luo_fd, const char *bdf)
 {
 	struct vfio_pci_device *device;
 	int session_fd;
 	u16 command;
+	u8 data;
 
 	device = vfio_pci_device_init(bdf, "iommufd");
 
@@ -30,6 +32,10 @@ static void run_pre_kexec(int luo_fd, const char *bdf)
 	vfio_pci_config_writew(device, PCI_COMMAND,
 			       command | PCI_COMMAND_MASTER);
 
+	vfio_pci_config_writeb(device, PCI_INTERRUPT_LINE, RANDOM_DATA);
+	data = vfio_pci_config_readb(device, PCI_INTERRUPT_LINE);
+	VFIO_ASSERT_EQ(data, RANDOM_DATA);
+
 	session_fd = luo_create_session(luo_fd, SESSION_NAME);
 	VFIO_ASSERT_GE(session_fd, 0, "Failed to create session %s",
 		       SESSION_NAME);
@@ -51,6 +57,7 @@ static void run_post_kexec(int luo_fd, const char *bdf)
 	int vfio_fd;
 	struct vfio_pci_device *device;
 	u16 command;
+	u8 data;
 
 
 	session_fd = luo_retrieve_session(luo_fd, SESSION_NAME);
@@ -74,6 +81,9 @@ static void run_post_kexec(int luo_fd, const char *bdf)
 
 	command = vfio_pci_config_readw(device, PCI_COMMAND);
 	VFIO_ASSERT_TRUE(command & PCI_COMMAND_MASTER);
+
+	data = vfio_pci_config_readb(device, PCI_INTERRUPT_LINE);
+	VFIO_ASSERT_EQ(data, RANDOM_DATA);
 	vfio_pci_device_cleanup(device);
 }
On Fri, Oct 17, 2025 at 05:06:52PM -0700, Vipin Sharma wrote:
> - Integration with IOMMUFD and PCI series for complete workflow where a device continues a DMA while undergoing through live update.
It is a bit confusing: this series has PCI components, so how does it relate to the PCI series? Is this self-contained for at least limited PCI topologies?
Jason
On 2025-10-18 14:21:30, Jason Gunthorpe wrote:
> On Fri, Oct 17, 2025 at 05:06:52PM -0700, Vipin Sharma wrote:
> > - Integration with IOMMUFD and PCI series for complete workflow where a device continues a DMA while undergoing through live update.
> It is a bit confusing, this series has PCI components so how does it relate the PCI series? Is this self contained for at least limited PCI topologies?
This series has very minimal PCI support. For example, it is skipping DMA disable on the VFIO PCI device during kexec reboot and saving initial PCI state during first open (bind) of the device.
We do need proper PCI support. A few examples:
- Not disabling the DMA (Bus Master Enable) bit on bridges upstream of the leaf VFIO PCI device node.
- Not writing to PCI config space during device enumeration.
- Not autobinding devices to their default driver. My testing works on devices which don't have a driver built into the kernel, so there is no probing by other drivers.
- Support for PCI enable and disable calls.
I think these should be solved in the PCI series.
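To make the first item concrete, here is a minimal userspace sketch, assuming a toy device tree in which each node records the index of its upstream bridge. It models a shutdown pass that clears Bus Master Enable everywhere except on a preserved device and the bridges above it. struct model_pci_dev, on_preserved_path() and model_shutdown() are hypothetical names for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_COMMAND_MASTER 0x4	/* Bus Master Enable bit in PCI_COMMAND */

/* Hypothetical model of a PCI device node; not a kernel structure. */
struct model_pci_dev {
	int parent;		/* index of the upstream bridge, -1 at the root */
	bool preserved;		/* marked for preservation across live update */
	uint16_t command;	/* modelled PCI_COMMAND register */
};

/* True if devs[idx] is preserved or sits upstream of a preserved device. */
static bool on_preserved_path(const struct model_pci_dev *devs, size_t n,
			      size_t idx)
{
	for (size_t i = 0; i < n; i++) {
		if (!devs[i].preserved)
			continue;
		/* Walk from the preserved leaf towards the root. */
		for (int cur = (int)i; cur >= 0; cur = devs[cur].parent) {
			assert(cur < (int)n);
			if ((size_t)cur == idx)
				return true;
		}
	}
	return false;
}

/*
 * Model of the kexec shutdown pass: clear Bus Master Enable on every device
 * that is not needed to keep DMA flowing for a preserved device.
 */
static void model_shutdown(struct model_pci_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!on_preserved_path(devs, n, i))
			devs[i].command &= (uint16_t)~PCI_COMMAND_MASTER;
}
```

With a root port, one preserved leaf and one ordinary sibling below it, only the sibling loses Bus Master Enable in this model. The current series handles just the leaf, which is why it is limited to topologies where no upstream bridge gets its bit cleared.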
On Sat, Oct 18, 2025 at 03:53:09PM -0700, Vipin Sharma wrote:
> On 2025-10-18 14:21:30, Jason Gunthorpe wrote:
> > On Fri, Oct 17, 2025 at 05:06:52PM -0700, Vipin Sharma wrote:
> > > - Integration with IOMMUFD and PCI series for complete workflow where a device continues a DMA while undergoing through live update.
> > It is a bit confusing, this series has PCI components so how does it relate the PCI series? Is this self contained for at least limited PCI topologies?
> This series has very minimal PCI support. For example, it is skipping DMA disable on the VFIO PCI device during kexec reboot and saving initial PCI state during first open (bind) of the device.
> We do need proper PCI support, few examples:
> - Not disabling DMA bit on bridges upstream of the leaf VFIO PCI device node.
So limited to topology without bridges
> - Not writing to PCI config during device enumeration.
I think this should be included here
> - Not autobinding devices to their default driver. My testing works on devices which don't have driver built in the kernel so there is no probing by other drivers.
Good enough for now, easy to not build in such drivers.
> - PCI enable and disable calls support.
?? Shouldn't vfio restore skip calling pci enable? Seems like there should be some solution here.
Jason