From: Pranjal Shrivastava praan@google.com Sent: Saturday, June 13, 2026 3:39 AM
On Wed, Jun 10, 2026 at 04:43:20PM +0100, Matt Evans wrote:
@@ -1264,7 +1265,7 @@ static int vfio_pci_ioctl_reset(struct
vfio_pci_core_device *vdev,
if (!vdev->reset_works) return -EINVAL;
- vfio_pci_zap_and_down_write_memory_lock(vdev);
down_write(&vdev->memory_lock);
/*
- This function can be invoked while the power state is non-D0. If
@@ -1277,10 +1278,11 @@ static int vfio_pci_ioctl_reset(struct
vfio_pci_core_device *vdev,
*/vfio_pci_set_power_state(vdev, PCI_D0);
- vfio_pci_dma_buf_move(vdev, true);
- vfio_pci_zap_revoke_bars(vdev);
I'm wondering if this change in behavior is correct? BEFORE this patch the sequence was:
- zap vma mappings
- Enter D0
After this patch the sequence becomes
- Take the lock
- Enter D0
- zap vma mappings
My worry is if user-space accesses a BAR *during* the transition to D0, it could crash since the mappings still exist during the transition?
not 'crash' as you also noted later with all Fs on read and dropped writes.
The old code is immune to it because it removed user-mappings first.
Following the discussion from v1 regarding the ordering of vfio_pci_dma_buf_move() and the D0 transition.. while it makes sense to perform the DMABUF revocation/move after the hardware is in D0.. I'm not too confident about moving zap after D0 :/
probably add a comment to remind that ordering requirement for dma
I mean, sure, the user would just see all Fs on a read and writes will be dropped silently until we are in D0.. but the behaviour before this change was that the user access will fault and hang on the memory_lock instead which ensures that the user observes a consistent dev state..
I see this more consistent from another angle.
Old code only removes/blocks cpu access but not for device. DMAs are allowed to this device while it's transitioning between D0/D3.
New code at least make this part consistent - both cpu/p2p are allowed in the transition window.
Ideally a sane userspace shouldn't rely on the content read back when it has initiated a reset in parallel. So this behavior change sounds ok?