From: Pranjal Shrivastava praan@google.com Sent: Wednesday, June 17, 2026 2:52 AM
On Tue, Jun 16, 2026 at 09:48:14AM +0000, Tian, Kevin wrote:
From: Pranjal Shrivastava praan@google.com Sent: Saturday, June 13, 2026 3:39 AM
On Wed, Jun 10, 2026 at 04:43:20PM +0100, Matt Evans wrote:
@@ -1264,7 +1265,7 @@ static int vfio_pci_ioctl_reset(struct
vfio_pci_core_device *vdev,
if (!vdev->reset_works) return -EINVAL;
- vfio_pci_zap_and_down_write_memory_lock(vdev);
down_write(&vdev->memory_lock);
/*
- This function can be invoked while the power state is non-D0. If
@@ -1277,10 +1278,11 @@ static int vfio_pci_ioctl_reset(struct
vfio_pci_core_device *vdev,
*/vfio_pci_set_power_state(vdev, PCI_D0);
- vfio_pci_dma_buf_move(vdev, true);
- vfio_pci_zap_revoke_bars(vdev);
I'm wondering if this change in behavior is correct? BEFORE this patch the sequence was:
- zap vma mappings
- Enter D0
After this patch the sequence becomes
- Take the lock
- Enter D0
- zap vma mappings
My worry is if user-space accesses a BAR *during* the transition to D0, it could crash since the mappings still exist during the transition?
not 'crash' as you also noted later with all Fs on read and dropped writes.
Ack, "crash" is definitely a strong word, I just meant that the user-space program isn't expecting to see all Fs today. Since today any access during reset is faulted, however with this all apps may have to lookout for all Fs during a read. Could this change cause existing apps to crash?
I expect there will be certain handshake between the resetting process and any subordinary processes using the exported dmabuf. The device state right after a resetting is not functional. Presumably the resetting process (as the userspace driver of the entire device) needs to re-initialize the device into a state allowing dmabuf to work correctly again. This window is much larger than above, within which I'm not sure what'd be reasonable expectations from those apps.
The old code is immune to it because it removed user-mappings first.
Following the discussion from v1 regarding the ordering of vfio_pci_dma_buf_move() and the D0 transition.. while it makes sense to perform the DMABUF revocation/move after the hardware is in D0.. I'm
not
too confident about moving zap after D0 :/
probably add a comment to remind that ordering requirement for dma
+1. That'd be helpful.
I mean, sure, the user would just see all Fs on a read and writes will be dropped silently until we are in D0.. but the behaviour before this change was that the user access will fault and hang on the memory_lock instead which ensures that the user observes a consistent dev state..
I see this more consistent from another angle.
Old code only removes/blocks cpu access but not for device. DMAs are allowed to this device while it's transitioning between D0/D3.
New code at least make this part consistent - both cpu/p2p are allowed in the transition window.
Ideally a sane userspace shouldn't rely on the content read back when it has initiated a reset in parallel. So this behavior change sounds ok?
I agree on the CPU / P2P consistency part. However, my concern is for a shared reset scenario where a reset triggered by one process (I guess it was vfio_assign_device_set?) can affect multiple devices in a dev_set that are owned by different, unrelated processes.
In the old code, these peer processes are protected because their BAR mappings are zapped immediately. Their MMIO threads simply stall in a page fault until the reset is complete.
I agree for a single-reset scenario, sane user-space should never access regions during a self-triggered reset.
Am I missing something?
Given the resetting impact is intrusive, IMHO handshake/coordination is also required between processes operating on devices in a same dev_set otherwise peer processes will break quickly even with the protection in the old code.
btw I don't remember all the detail but holds an impression there are restrictions on the caller owning all devices in a dev_set or they all belong to the same iommufd context...