Changelog:
v5:
 * Documented the DMA-BUF expectations around DMA unmap.
 * Added wait support in VFIO for DMA unmap.
 * Reordered patches.
 * Improved commit messages to document even more.
v4: https://lore.kernel.org/all/20260121-dmabuf-revoke-v4-0-d311cbc8633d@nvidia....
 * Changed DMA_RESV_USAGE_KERNEL to DMA_RESV_USAGE_BOOKKEEP.
 * Made .invalidate_mapping() truly optional.
 * Added patch which renames dma_buf_move_notify() to be dma_buf_invalidate_mappings().
 * Restored dma_buf_attachment_is_dynamic() function.
v3: https://lore.kernel.org/all/20260120-dmabuf-revoke-v3-0-b7e0b07b8214@nvidia....
 * Used Jason's wordings for commits and cover letter.
 * Removed IOMMUFD patch.
 * Renamed dma_buf_attachment_is_revoke() to be dma_buf_attach_revocable().
 * Added patch to remove CONFIG_DMABUF_MOVE_NOTIFY.
 * Added Reviewed-by tags.
 * Called dma_resv_wait_timeout() after dma_buf_move_notify() in VFIO.
 * Added dma_buf_attach_revocable() check to VFIO DMABUF attach function.
 * Slightly changed commit messages.
v2: https://patch.msgid.link/20260118-dmabuf-revoke-v2-0-a03bb27c0875@nvidia.com
 * Changed series to document the revoke semantics instead of implementing it.
v1: https://patch.msgid.link/20260111-dmabuf-revoke-v1-0-fb4bcc8c259b@nvidia.com
-------------------------------------------------------------------------
This series is based on the latest VFIO fix, which will be sent to Linus very soon.
https://lore.kernel.org/all/20260121-vfio-add-pin-v1-1-4e04916b17f1@nvidia.c...
Thanks
-------------------------------------------------------------------------
This series documents a dma-buf “revoke” mechanism: it allows a dma-buf exporter to explicitly invalidate (“kill”) a shared buffer after it has been distributed to importers, so that further CPU and device access is prevented and importers reliably observe failure.
The change in this series is to properly document and use the existing core “revoked” state on the dma-buf object and the corresponding exporter-triggered revoke operation.
dma-buf has quietly allowed calling move_notify on pinned dma-bufs, even though legacy importers using dma_buf_attach() would simply ignore these calls.
The intention was that move_notify() would tell the importer to expedite its unmapping process, and once the importer is fully finished with DMA it would unmap the dma-buf, which finally signals that the importer is never going to touch the memory again. Importers that touch the memory past their unmap() call can trigger IOMMU errors, AER and beyond; however, read-and-discard access between move_notify() and unmap is allowed.
Thus, we can define the exporter's revoke sequence for pinned dma-buf as:
  dma_resv_lock(dmabuf->resv, NULL);
  // Prevent new mappings from being established
  priv->revoked = true;

  // Tell all importers to eventually unmap
  dma_buf_invalidate_mappings(dmabuf);

  // Wait for any in-progress fences on the old mapping
  dma_resv_wait_timeout(dmabuf->resv, DMA_RESV_USAGE_BOOKKEEP, false,
                        MAX_SCHEDULE_TIMEOUT);
  dma_resv_unlock(dmabuf->resv);

  // Wait for all importers to complete unmap
  wait_for_completion(&priv->unmapped_comp);
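For contrast with the exporter sequence above, a compliant pinned importer could look roughly like the sketch below. This is only an illustration: the my_importer structure, the work-queue plumbing and the quiesce step are hypothetical, and only the dma-buf calls (and the invalidate_mappings() name this series introduces) are real.

#include <linux/dma-buf.h>
#include <linux/dma-direction.h>
#include <linux/workqueue.h>

/* Hypothetical importer state; only the dma-buf API calls are real. */
struct my_importer {
        struct dma_buf_attachment *attach;
        struct sg_table *sgt;
        struct work_struct unmap_work;
};

static void my_importer_unmap_work(struct work_struct *work)
{
        struct my_importer *imp = container_of(work, struct my_importer,
                                               unmap_work);

        /* Device-specific quiescing of all DMA engines would go here. */

        /* Unmapping tells the exporter this importer is done with the memory. */
        dma_buf_unmap_attachment_unlocked(imp->attach, imp->sgt,
                                          DMA_BIDIRECTIONAL);
        imp->sgt = NULL;
}

static void my_importer_invalidate_mappings(struct dma_buf_attachment *attach)
{
        struct my_importer *imp = attach->importer_priv;

        /*
         * Called with the dma-buf reservation lock held; defer the unmap to
         * process context, but guarantee it completes in bounded time.
         */
        schedule_work(&imp->unmap_work);
}

static const struct dma_buf_attach_ops my_importer_attach_ops = {
        .allow_peer2peer = true,
        .invalidate_mappings = my_importer_invalidate_mappings,
};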
However, dma-buf also supports importers that don't do anything on move_notify(), and will not unmap the buffer in bounded time.
Since such importers would cause the above sequence to hang, a new mechanism is needed to detect incompatible importers.
Introduce dma_buf_attach_revocable(), which returns true if the above sequence is safe to use and will complete in kernel-only bounded time for this attachment.
Unfortunately dma_buf_attach_revocable() is going to fail for the popular RDMA pinned importer, which means we cannot introduce it to existing places using pinned move_notify() without potentially breaking existing userspace flows.
Existing exporters that only trigger this flow for RAS errors should not call dma_buf_attach_revocable() and will suffer an unbounded block on the final completion, hoping that userspace will notice the RAS event and clean things up. Without revoke support on the RDMA pinned importers, no other non-breaking option currently seems possible.
For new exporters, like VFIO and RDMA, that have userspace-triggered revoke events, the unbounded sleep would not be acceptable. They can call dma_buf_attach_revocable() and will not work with the RDMA pinned importer from day 0, preventing regressions.
In the process add documentation explaining the above details.
Thanks
Signed-off-by: Leon Romanovsky leonro@nvidia.com
---
Leon Romanovsky (8):
      dma-buf: Rename .move_notify() callback to a clearer identifier
      dma-buf: Rename dma_buf_move_notify() to dma_buf_invalidate_mappings()
      dma-buf: Always build with DMABUF_MOVE_NOTIFY
      vfio: Wait for dma-buf invalidation to complete
      dma-buf: Make .invalidate_mapping() truly optional
      dma-buf: Add dma_buf_attach_revocable()
      vfio: Permit VFIO to work with pinned importers
      iommufd: Add dma_buf_pin()
 drivers/dma-buf/Kconfig                     | 12 ----
 drivers/dma-buf/dma-buf.c                   | 69 +++++++++++++++++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 14 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  2 +-
 drivers/gpu/drm/amd/amdkfd/Kconfig          |  2 +-
 drivers/gpu/drm/virtio/virtgpu_prime.c      |  2 +-
 drivers/gpu/drm/xe/tests/xe_dma_buf.c       |  7 +--
 drivers/gpu/drm/xe/xe_bo.c                  |  2 +-
 drivers/gpu/drm/xe/xe_dma_buf.c             | 14 ++---
 drivers/infiniband/core/umem_dmabuf.c       | 13 -----
 drivers/infiniband/hw/mlx5/mr.c             |  2 +-
 drivers/iommu/iommufd/pages.c               | 11 +++-
 drivers/iommu/iommufd/selftest.c            |  2 +-
 drivers/vfio/pci/vfio_pci_dmabuf.c          | 90 +++++++++++++++++++++++------
 include/linux/dma-buf.h                     | 17 +++---
 15 files changed, 164 insertions(+), 95 deletions(-)
---
base-commit: 61ceaf236115f20f4fdd7cf60f883ada1063349a
change-id: 20251221-dmabuf-revoke-b90ef16e4236
Best regards,
--
Leon Romanovsky leonro@nvidia.com
From: Leon Romanovsky leonro@nvidia.com
Rename the .move_notify() callback to .invalidate_mappings() to make its purpose explicit and highlight that it is responsible for invalidating existing mappings.
Suggested-by: Christian König christian.koenig@amd.com
Reviewed-by: Christian König christian.koenig@amd.com
Signed-off-by: Leon Romanovsky leonro@nvidia.com
---
 drivers/dma-buf/dma-buf.c                   | 6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 4 ++--
 drivers/gpu/drm/virtio/virtgpu_prime.c      | 2 +-
 drivers/gpu/drm/xe/tests/xe_dma_buf.c       | 6 +++---
 drivers/gpu/drm/xe/xe_dma_buf.c             | 2 +-
 drivers/infiniband/core/umem_dmabuf.c       | 4 ++--
 drivers/infiniband/hw/mlx5/mr.c             | 2 +-
 drivers/iommu/iommufd/pages.c               | 2 +-
 include/linux/dma-buf.h                     | 6 +++---
 9 files changed, 17 insertions(+), 17 deletions(-)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index edaa9e4ee4ae..59cc647bf40e 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -948,7 +948,7 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev, if (WARN_ON(!dmabuf || !dev)) return ERR_PTR(-EINVAL);
- if (WARN_ON(importer_ops && !importer_ops->move_notify)) + if (WARN_ON(importer_ops && !importer_ops->invalidate_mappings)) return ERR_PTR(-EINVAL);
attach = kzalloc(sizeof(*attach), GFP_KERNEL); @@ -1055,7 +1055,7 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_pin, "DMA_BUF"); * * This unpins a buffer pinned by dma_buf_pin() and allows the exporter to move * any mapping of @attach again and inform the importer through - * &dma_buf_attach_ops.move_notify. + * &dma_buf_attach_ops.invalidate_mappings. */ void dma_buf_unpin(struct dma_buf_attachment *attach) { @@ -1262,7 +1262,7 @@ void dma_buf_move_notify(struct dma_buf *dmabuf)
list_for_each_entry(attach, &dmabuf->attachments, node) if (attach->importer_ops) - attach->importer_ops->move_notify(attach); + attach->importer_ops->invalidate_mappings(attach); } EXPORT_SYMBOL_NS_GPL(dma_buf_move_notify, "DMA_BUF");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c index c1461317eb29..cd4944ceb047 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c @@ -438,7 +438,7 @@ amdgpu_dma_buf_create_obj(struct drm_device *dev, struct dma_buf *dma_buf) }
/** - * amdgpu_dma_buf_move_notify - &attach.move_notify implementation + * amdgpu_dma_buf_move_notify - &attach.invalidate_mappings implementation * * @attach: the DMA-buf attachment * @@ -509,7 +509,7 @@ amdgpu_dma_buf_move_notify(struct dma_buf_attachment *attach)
static const struct dma_buf_attach_ops amdgpu_dma_buf_attach_ops = { .allow_peer2peer = true, - .move_notify = amdgpu_dma_buf_move_notify + .invalidate_mappings = amdgpu_dma_buf_move_notify };
/** diff --git a/drivers/gpu/drm/virtio/virtgpu_prime.c b/drivers/gpu/drm/virtio/virtgpu_prime.c index ce49282198cb..19c78dd2ca77 100644 --- a/drivers/gpu/drm/virtio/virtgpu_prime.c +++ b/drivers/gpu/drm/virtio/virtgpu_prime.c @@ -288,7 +288,7 @@ static void virtgpu_dma_buf_move_notify(struct dma_buf_attachment *attach)
static const struct dma_buf_attach_ops virtgpu_dma_buf_attach_ops = { .allow_peer2peer = true, - .move_notify = virtgpu_dma_buf_move_notify + .invalidate_mappings = virtgpu_dma_buf_move_notify };
struct drm_gem_object *virtgpu_gem_prime_import(struct drm_device *dev, diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c index 5df98de5ba3c..1f2cca5c2f81 100644 --- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c +++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c @@ -23,7 +23,7 @@ static bool p2p_enabled(struct dma_buf_test_params *params) static bool is_dynamic(struct dma_buf_test_params *params) { return IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY) && params->attach_ops && - params->attach_ops->move_notify; + params->attach_ops->invalidate_mappings; }
static void check_residency(struct kunit *test, struct xe_bo *exported, @@ -60,7 +60,7 @@ static void check_residency(struct kunit *test, struct xe_bo *exported,
/* * Evict exporter. Evicting the exported bo will - * evict also the imported bo through the move_notify() functionality if + * evict also the imported bo through the invalidate_mappings() functionality if * importer is on a different device. If they're on the same device, * the exporter and the importer should be the same bo. */ @@ -198,7 +198,7 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
static const struct dma_buf_attach_ops nop2p_attach_ops = { .allow_peer2peer = false, - .move_notify = xe_dma_buf_move_notify + .invalidate_mappings = xe_dma_buf_move_notify };
/* diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c index 7c74a31d4486..1b9cd043e517 100644 --- a/drivers/gpu/drm/xe/xe_dma_buf.c +++ b/drivers/gpu/drm/xe/xe_dma_buf.c @@ -287,7 +287,7 @@ static void xe_dma_buf_move_notify(struct dma_buf_attachment *attach)
static const struct dma_buf_attach_ops xe_dma_buf_attach_ops = { .allow_peer2peer = true, - .move_notify = xe_dma_buf_move_notify + .invalidate_mappings = xe_dma_buf_move_notify };
#if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST) diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c index 0ec2e4120cc9..d77a739cfe7a 100644 --- a/drivers/infiniband/core/umem_dmabuf.c +++ b/drivers/infiniband/core/umem_dmabuf.c @@ -129,7 +129,7 @@ ib_umem_dmabuf_get_with_dma_device(struct ib_device *device, if (check_add_overflow(offset, (unsigned long)size, &end)) return ret;
- if (unlikely(!ops || !ops->move_notify)) + if (unlikely(!ops || !ops->invalidate_mappings)) return ret;
dmabuf = dma_buf_get(fd); @@ -195,7 +195,7 @@ ib_umem_dmabuf_unsupported_move_notify(struct dma_buf_attachment *attach)
static struct dma_buf_attach_ops ib_umem_dmabuf_attach_pinned_ops = { .allow_peer2peer = true, - .move_notify = ib_umem_dmabuf_unsupported_move_notify, + .invalidate_mappings = ib_umem_dmabuf_unsupported_move_notify, };
struct ib_umem_dmabuf * diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c index 325fa04cbe8a..97099d3b1688 100644 --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -1620,7 +1620,7 @@ static void mlx5_ib_dmabuf_invalidate_cb(struct dma_buf_attachment *attach)
static struct dma_buf_attach_ops mlx5_ib_dmabuf_attach_ops = { .allow_peer2peer = 1, - .move_notify = mlx5_ib_dmabuf_invalidate_cb, + .invalidate_mappings = mlx5_ib_dmabuf_invalidate_cb, };
static struct ib_mr * diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c index dbe51ecb9a20..76f900fa1687 100644 --- a/drivers/iommu/iommufd/pages.c +++ b/drivers/iommu/iommufd/pages.c @@ -1451,7 +1451,7 @@ static void iopt_revoke_notify(struct dma_buf_attachment *attach)
static struct dma_buf_attach_ops iopt_dmabuf_attach_revoke_ops = { .allow_peer2peer = true, - .move_notify = iopt_revoke_notify, + .invalidate_mappings = iopt_revoke_notify, };
/* diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 0bc492090237..1b397635c793 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -407,7 +407,7 @@ struct dma_buf { * through the device. * * - Dynamic importers should set fences for any access that they can't - * disable immediately from their &dma_buf_attach_ops.move_notify + * disable immediately from their &dma_buf_attach_ops.invalidate_mappings * callback. * * IMPORTANT: @@ -458,7 +458,7 @@ struct dma_buf_attach_ops { bool allow_peer2peer;
/** - * @move_notify: [optional] notification that the DMA-buf is moving + * @invalidate_mappings: [optional] notification that the DMA-buf is moving * * If this callback is provided the framework can avoid pinning the * backing store while mappings exists. @@ -475,7 +475,7 @@ struct dma_buf_attach_ops { * New mappings can be created after this callback returns, and will * point to the new location of the DMA-buf. */ - void (*move_notify)(struct dma_buf_attachment *attach); + void (*invalidate_mappings)(struct dma_buf_attachment *attach); };
/**
From: Leon Romanovsky leonro@nvidia.com
Along with renaming the .move_notify() callback, rename the corresponding dma-buf core function. This makes the expected behavior clear to exporters calling this function.
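As a usage sketch (the exporter-private structure below is purely illustrative and not taken from any driver), the renamed function is still called with the dma-buf reservation lock held:

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>

/* Illustrative exporter-private state, not an existing driver. */
struct my_exporter {
        struct dma_buf *dmabuf;
        bool revoked;
};

static void my_exporter_revoke(struct my_exporter *exp)
{
        /* The reservation lock must be held around the invalidation call. */
        dma_resv_lock(exp->dmabuf->resv, NULL);
        exp->revoked = true;
        dma_buf_invalidate_mappings(exp->dmabuf);
        dma_resv_unlock(exp->dmabuf->resv);
}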
Signed-off-by: Leon Romanovsky leonro@nvidia.com
---
 drivers/dma-buf/dma-buf.c                  | 8 ++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
 drivers/gpu/drm/xe/xe_bo.c                 | 2 +-
 drivers/iommu/iommufd/selftest.c           | 2 +-
 drivers/vfio/pci/vfio_pci_dmabuf.c         | 4 ++--
 include/linux/dma-buf.h                    | 2 +-
 6 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 59cc647bf40e..e12db540c413 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -912,7 +912,7 @@ dma_buf_pin_on_map(struct dma_buf_attachment *attach) * 3. Exporters must hold the dma-buf reservation lock when calling these * functions: * - * - dma_buf_move_notify() + * - dma_buf_invalidate_mappings() */
/** @@ -1247,14 +1247,14 @@ void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach, EXPORT_SYMBOL_NS_GPL(dma_buf_unmap_attachment_unlocked, "DMA_BUF");
/** - * dma_buf_move_notify - notify attachments that DMA-buf is moving + * dma_buf_invalidate_mappings - notify attachments that DMA-buf is moving * * @dmabuf: [in] buffer which is moving * * Informs all attachments that they need to destroy and recreate all their * mappings. */ -void dma_buf_move_notify(struct dma_buf *dmabuf) +void dma_buf_invalidate_mappings(struct dma_buf *dmabuf) { struct dma_buf_attachment *attach;
@@ -1264,7 +1264,7 @@ void dma_buf_move_notify(struct dma_buf *dmabuf) if (attach->importer_ops) attach->importer_ops->invalidate_mappings(attach); } -EXPORT_SYMBOL_NS_GPL(dma_buf_move_notify, "DMA_BUF"); +EXPORT_SYMBOL_NS_GPL(dma_buf_invalidate_mappings, "DMA_BUF");
/** * DOC: cpu access diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index e08f58de4b17..f73dc99d1887 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -1270,7 +1270,7 @@ void amdgpu_bo_move_notify(struct ttm_buffer_object *bo,
if (abo->tbo.base.dma_buf && !drm_gem_is_imported(&abo->tbo.base) && old_mem && old_mem->mem_type != TTM_PL_SYSTEM) - dma_buf_move_notify(abo->tbo.base.dma_buf); + dma_buf_invalidate_mappings(abo->tbo.base.dma_buf);
/* move_notify is called before move happens */ trace_amdgpu_bo_move(abo, new_mem ? new_mem->mem_type : -1, diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index bf4ee976b680..7d02cd9a8501 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -819,7 +819,7 @@ static int xe_bo_move_notify(struct xe_bo *bo,
/* Don't call move_notify() for imported dma-bufs. */ if (ttm_bo->base.dma_buf && !ttm_bo->base.import_attach) - dma_buf_move_notify(ttm_bo->base.dma_buf); + dma_buf_invalidate_mappings(ttm_bo->base.dma_buf);
/* * TTM has already nuked the mmap for us (see ttm_bo_unmap_virtual), diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 550ff36dec3a..f60cbd5328cc 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -2081,7 +2081,7 @@ static int iommufd_test_dmabuf_revoke(struct iommufd_ucmd *ucmd, int fd, priv = dmabuf->priv; dma_resv_lock(dmabuf->resv, NULL); priv->revoked = revoked; - dma_buf_move_notify(dmabuf); + dma_buf_invalidate_mappings(dmabuf); dma_resv_unlock(dmabuf->resv);
err_put: diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c index 4be4a85005cb..d8ceafabef48 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -332,7 +332,7 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked) if (priv->revoked != revoked) { dma_resv_lock(priv->dmabuf->resv, NULL); priv->revoked = revoked; - dma_buf_move_notify(priv->dmabuf); + dma_buf_invalidate_mappings(priv->dmabuf); dma_resv_unlock(priv->dmabuf->resv); } fput(priv->dmabuf->file); @@ -353,7 +353,7 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev) list_del_init(&priv->dmabufs_elm); priv->vdev = NULL; priv->revoked = true; - dma_buf_move_notify(priv->dmabuf); + dma_buf_invalidate_mappings(priv->dmabuf); dma_resv_unlock(priv->dmabuf->resv); vfio_device_put_registration(&vdev->vdev); fput(priv->dmabuf->file); diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 1b397635c793..d5c3ce2b3aa4 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -600,7 +600,7 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *, enum dma_data_direction); void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *, enum dma_data_direction); -void dma_buf_move_notify(struct dma_buf *dma_buf); +void dma_buf_invalidate_mappings(struct dma_buf *dma_buf); int dma_buf_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction dir); int dma_buf_end_cpu_access(struct dma_buf *dma_buf,
From: Leon Romanovsky leonro@nvidia.com
DMABUF_MOVE_NOTIFY was introduced in 2018 and has been marked as experimental and disabled by default ever since. Six years later, all new importers implement this callback.
It is therefore reasonable to drop CONFIG_DMABUF_MOVE_NOTIFY and always build DMABUF with support for it enabled.
Suggested-by: Christian König christian.koenig@amd.com
Signed-off-by: Leon Romanovsky leonro@nvidia.com
---
 drivers/dma-buf/Kconfig                     | 12 ------------
 drivers/dma-buf/dma-buf.c                   |  3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 10 +++-------
 drivers/gpu/drm/amd/amdkfd/Kconfig          |  2 +-
 drivers/gpu/drm/xe/tests/xe_dma_buf.c       |  3 +--
 drivers/gpu/drm/xe/xe_dma_buf.c             | 12 ++++--------
 6 files changed, 10 insertions(+), 32 deletions(-)
diff --git a/drivers/dma-buf/Kconfig b/drivers/dma-buf/Kconfig index b46eb8a552d7..84d5e9b24e20 100644 --- a/drivers/dma-buf/Kconfig +++ b/drivers/dma-buf/Kconfig @@ -40,18 +40,6 @@ config UDMABUF A driver to let userspace turn memfd regions into dma-bufs. Qemu can use this to create host dmabufs for guest framebuffers.
-config DMABUF_MOVE_NOTIFY - bool "Move notify between drivers (EXPERIMENTAL)" - default n - depends on DMA_SHARED_BUFFER - help - Don't pin buffers if the dynamic DMA-buf interface is available on - both the exporter as well as the importer. This fixes a security - problem where userspace is able to pin unrestricted amounts of memory - through DMA-buf. - This is marked experimental because we don't yet have a consistent - execution context and memory management between drivers. - config DMABUF_DEBUG bool "DMA-BUF debug checks" depends on DMA_SHARED_BUFFER diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index e12db540c413..cd68c1c0bfd7 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -847,8 +847,7 @@ static bool dma_buf_pin_on_map(struct dma_buf_attachment *attach) { return attach->dmabuf->ops->pin && - (!dma_buf_attachment_is_dynamic(attach) || - !IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY)); + !dma_buf_attachment_is_dynamic(attach); }
/** diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c index cd4944ceb047..b7f85b8653cf 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c @@ -133,13 +133,9 @@ static int amdgpu_dma_buf_pin(struct dma_buf_attachment *attach) * notifiers are disabled, only allow pinning in VRAM when move * notiers are enabled. */ - if (!IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY)) { - domains &= ~AMDGPU_GEM_DOMAIN_VRAM; - } else { - list_for_each_entry(attach, &dmabuf->attachments, node) - if (!attach->peer2peer) - domains &= ~AMDGPU_GEM_DOMAIN_VRAM; - } + list_for_each_entry(attach, &dmabuf->attachments, node) + if (!attach->peer2peer) + domains &= ~AMDGPU_GEM_DOMAIN_VRAM;
if (domains & AMDGPU_GEM_DOMAIN_VRAM) bo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; diff --git a/drivers/gpu/drm/amd/amdkfd/Kconfig b/drivers/gpu/drm/amd/amdkfd/Kconfig index 16e12c9913f9..a5d7467c2f34 100644 --- a/drivers/gpu/drm/amd/amdkfd/Kconfig +++ b/drivers/gpu/drm/amd/amdkfd/Kconfig @@ -27,7 +27,7 @@ config HSA_AMD_SVM
config HSA_AMD_P2P bool "HSA kernel driver support for peer-to-peer for AMD GPU devices" - depends on HSA_AMD && PCI_P2PDMA && DMABUF_MOVE_NOTIFY + depends on HSA_AMD && PCI_P2PDMA help Enable peer-to-peer (P2P) communication between AMD GPUs over the PCIe bus. This can improve performance of multi-GPU compute diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c index 1f2cca5c2f81..c107687ef3c0 100644 --- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c +++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c @@ -22,8 +22,7 @@ static bool p2p_enabled(struct dma_buf_test_params *params)
static bool is_dynamic(struct dma_buf_test_params *params) { - return IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY) && params->attach_ops && - params->attach_ops->invalidate_mappings; + return params->attach_ops && params->attach_ops->invalidate_mappings; }
static void check_residency(struct kunit *test, struct xe_bo *exported, diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c index 1b9cd043e517..ea370cd373e9 100644 --- a/drivers/gpu/drm/xe/xe_dma_buf.c +++ b/drivers/gpu/drm/xe/xe_dma_buf.c @@ -56,14 +56,10 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach) bool allow_vram = true; int ret;
- if (!IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY)) { - allow_vram = false; - } else { - list_for_each_entry(attach, &dmabuf->attachments, node) { - if (!attach->peer2peer) { - allow_vram = false; - break; - } + list_for_each_entry(attach, &dmabuf->attachments, node) { + if (!attach->peer2peer) { + allow_vram = false; + break; } }
From: Leon Romanovsky leonro@nvidia.com
dma-buf invalidation is handled asynchronously by the hardware, so VFIO must wait until all affected objects have been fully invalidated.
In addition, the dma-buf exporter expects that all importers unmap any buffers they previously mapped.
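The shape of the wait added below is roughly the following sketch; the structure loosely mirrors the patch, with the re-arm path and error handling omitted:

#include <linux/completion.h>
#include <linux/container_of.h>
#include <linux/dma-buf.h>
#include <linux/dma-resv.h>
#include <linux/kref.h>
#include <linux/sched.h>

/* Minimal stand-in for the per-dmabuf state; field names follow the patch. */
struct revoke_state {
        struct dma_buf *dmabuf;
        struct kref kref;       /* one ref per outstanding mapping + one base ref */
        struct completion comp; /* fired once the last mapping is unmapped */
        bool revoked;
};

static void revoke_state_done(struct kref *kref)
{
        struct revoke_state *priv = container_of(kref, struct revoke_state, kref);

        complete(&priv->comp);
}

static void revoke_and_wait(struct revoke_state *priv)
{
        dma_resv_lock(priv->dmabuf->resv, NULL);
        priv->revoked = true;
        dma_buf_invalidate_mappings(priv->dmabuf);
        /* Wait out fences covering DMA that was already in flight. */
        dma_resv_wait_timeout(priv->dmabuf->resv, DMA_RESV_USAGE_BOOKKEEP,
                              false, MAX_SCHEDULE_TIMEOUT);
        dma_resv_unlock(priv->dmabuf->resv);
        /* Drop the base ref; the completion fires once every importer unmaps. */
        kref_put(&priv->kref, revoke_state_done);
        wait_for_completion(&priv->comp);
}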
Fixes: 5d74781ebc86 ("vfio/pci: Add dma-buf export support for MMIO regions")
Signed-off-by: Leon Romanovsky leonro@nvidia.com
---
 drivers/vfio/pci/vfio_pci_dmabuf.c | 71 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 68 insertions(+), 3 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c index d8ceafabef48..485515629fe4 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -17,6 +17,8 @@ struct vfio_pci_dma_buf { struct dma_buf_phys_vec *phys_vec; struct p2pdma_provider *provider; u32 nr_ranges; + struct kref kref; + struct completion comp; u8 revoked : 1; };
@@ -44,27 +46,46 @@ static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf, return 0; }
+static void vfio_pci_dma_buf_done(struct kref *kref) +{ + struct vfio_pci_dma_buf *priv = + container_of(kref, struct vfio_pci_dma_buf, kref); + + complete(&priv->comp); +} + static struct sg_table * vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment, enum dma_data_direction dir) { struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv; + struct sg_table *ret;
dma_resv_assert_held(priv->dmabuf->resv);
if (priv->revoked) return ERR_PTR(-ENODEV);
- return dma_buf_phys_vec_to_sgt(attachment, priv->provider, - priv->phys_vec, priv->nr_ranges, - priv->size, dir); + ret = dma_buf_phys_vec_to_sgt(attachment, priv->provider, + priv->phys_vec, priv->nr_ranges, + priv->size, dir); + if (IS_ERR(ret)) + return ret; + + kref_get(&priv->kref); + return ret; }
static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment, struct sg_table *sgt, enum dma_data_direction dir) { + struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv; + + dma_resv_assert_held(priv->dmabuf->resv); + dma_buf_free_sgt(attachment, sgt, dir); + kref_put(&priv->kref, vfio_pci_dma_buf_done); }
static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf) @@ -287,6 +308,9 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags, goto err_dev_put; }
+ kref_init(&priv->kref); + init_completion(&priv->comp); + /* dma_buf_put() now frees priv */ INIT_LIST_HEAD(&priv->dmabufs_elm); down_write(&vdev->memory_lock); @@ -326,6 +350,8 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked) lockdep_assert_held_write(&vdev->memory_lock);
list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) { + unsigned long wait; + if (!get_file_active(&priv->dmabuf->file)) continue;
@@ -333,7 +359,37 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked) dma_resv_lock(priv->dmabuf->resv, NULL); priv->revoked = revoked; dma_buf_invalidate_mappings(priv->dmabuf); + dma_resv_wait_timeout(priv->dmabuf->resv, + DMA_RESV_USAGE_BOOKKEEP, false, + MAX_SCHEDULE_TIMEOUT); dma_resv_unlock(priv->dmabuf->resv); + if (revoked) { + kref_put(&priv->kref, vfio_pci_dma_buf_done); + /* Let's wait till all DMA unmap are completed. */ + wait = wait_for_completion_timeout( + &priv->comp, secs_to_jiffies(1)); + /* + * If you see this WARN_ON, it means that + * importer didn't call unmap in response to + * dma_buf_invalidate_mappings() which is not + * allowed. + */ + WARN(!wait, + "Timed out waiting for DMABUF unmap, importer has a broken invalidate_mapping()"); + } else { + /* + * Kref is initialize again, because when revoke + * was performed the reference counter was decreased + * to zero to trigger completion. + */ + kref_init(&priv->kref); + /* + * There is no need to wait as no mapping was + * performed when the previous status was + * priv->revoked == true. + */ + reinit_completion(&priv->comp); + } } fput(priv->dmabuf->file); } @@ -346,6 +402,8 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
down_write(&vdev->memory_lock); list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) { + unsigned long wait; + if (!get_file_active(&priv->dmabuf->file)) continue;
@@ -354,7 +412,14 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev) priv->vdev = NULL; priv->revoked = true; dma_buf_invalidate_mappings(priv->dmabuf); + dma_resv_wait_timeout(priv->dmabuf->resv, + DMA_RESV_USAGE_BOOKKEEP, false, + MAX_SCHEDULE_TIMEOUT); dma_resv_unlock(priv->dmabuf->resv); + kref_put(&priv->kref, vfio_pci_dma_buf_done); + wait = wait_for_completion_timeout(&priv->comp, + secs_to_jiffies(1)); + WARN_ON(!wait); vfio_device_put_registration(&vdev->vdev); fput(priv->dmabuf->file); }
From: Leon Romanovsky leonro@nvidia.com
The .invalidate_mapping() callback is documented as optional, yet it effectively became mandatory whenever importer_ops were provided. This led to cases where RDMA non-ODP code had to supply an empty stub.
Relax the checks in the dma-buf core so the callback can be omitted, allowing RDMA code to drop the unnecessary function.
Removing the stub allows the next patch to tell that RDMA does not support .invalidate_mapping() by checking for a NULL op.
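For illustration, with the callback now genuinely optional a pinned importer can attach with ops that simply omit it. The names below are hypothetical:

#include <linux/dma-buf.h>

/* No .invalidate_mappings: this attachment cannot be revoked in bounded time. */
static const struct dma_buf_attach_ops my_pinned_attach_ops = {
        .allow_peer2peer = true,
};

static struct dma_buf_attachment *my_pinned_attach(struct dma_buf *dmabuf,
                                                   struct device *dev,
                                                   void *importer_priv)
{
        return dma_buf_dynamic_attach(dmabuf, dev, &my_pinned_attach_ops,
                                      importer_priv);
}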
Signed-off-by: Leon Romanovsky leonro@nvidia.com
---
 drivers/dma-buf/dma-buf.c             |  6 ++----
 drivers/infiniband/core/umem_dmabuf.c | 13 -------------
 2 files changed, 2 insertions(+), 17 deletions(-)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index cd68c1c0bfd7..1629312d364a 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -947,9 +947,6 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev, if (WARN_ON(!dmabuf || !dev)) return ERR_PTR(-EINVAL);
- if (WARN_ON(importer_ops && !importer_ops->invalidate_mappings)) - return ERR_PTR(-EINVAL); - attach = kzalloc(sizeof(*attach), GFP_KERNEL); if (!attach) return ERR_PTR(-ENOMEM); @@ -1260,7 +1257,8 @@ void dma_buf_invalidate_mappings(struct dma_buf *dmabuf) dma_resv_assert_held(dmabuf->resv);
list_for_each_entry(attach, &dmabuf->attachments, node) - if (attach->importer_ops) + if (attach->importer_ops && + attach->importer_ops->invalidate_mappings) attach->importer_ops->invalidate_mappings(attach); } EXPORT_SYMBOL_NS_GPL(dma_buf_invalidate_mappings, "DMA_BUF"); diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c index d77a739cfe7a..256e34c15e6b 100644 --- a/drivers/infiniband/core/umem_dmabuf.c +++ b/drivers/infiniband/core/umem_dmabuf.c @@ -129,9 +129,6 @@ ib_umem_dmabuf_get_with_dma_device(struct ib_device *device, if (check_add_overflow(offset, (unsigned long)size, &end)) return ret;
- if (unlikely(!ops || !ops->invalidate_mappings)) - return ret; - dmabuf = dma_buf_get(fd); if (IS_ERR(dmabuf)) return ERR_CAST(dmabuf); @@ -184,18 +181,8 @@ struct ib_umem_dmabuf *ib_umem_dmabuf_get(struct ib_device *device, } EXPORT_SYMBOL(ib_umem_dmabuf_get);
-static void -ib_umem_dmabuf_unsupported_move_notify(struct dma_buf_attachment *attach) -{ - struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv; - - ibdev_warn_ratelimited(umem_dmabuf->umem.ibdev, - "Invalidate callback should not be called when memory is pinned\n"); -} - static struct dma_buf_attach_ops ib_umem_dmabuf_attach_pinned_ops = { .allow_peer2peer = true, - .invalidate_mappings = ib_umem_dmabuf_unsupported_move_notify, };
struct ib_umem_dmabuf *
From: Leon Romanovsky leonro@nvidia.com
Some exporters need a flow to synchronously revoke access to the DMA-buf by importers. Once revoke is completed, importers are not permitted to touch the memory; otherwise they may get IOMMU faults, AERs, or worse.
DMA-buf today defines a revoke flow, for both pinned and dynamic importers, which is broadly:
  dma_resv_lock(dmabuf->resv, NULL);
  // Prevent new mappings from being established
  priv->revoked = true;

  // Tell all importers to eventually unmap
  dma_buf_invalidate_mappings(dmabuf);

  // Wait for any in-progress fences on the old mapping
  dma_resv_wait_timeout(dmabuf->resv, DMA_RESV_USAGE_BOOKKEEP, false,
                        MAX_SCHEDULE_TIMEOUT);
  dma_resv_unlock(dmabuf->resv);

  // Wait for all importers to complete unmap
  wait_for_completion(&priv->unmapped_comp);
This works well, and an importer that continues to access the DMA-buf after unmapping it is very buggy.
However, the final wait for unmap is effectively unbounded. Several importers do not support invalidate_mappings() at all and won't unmap until userspace triggers it.
This unbounded wait is not suitable for exporters like VFIO and RDMA that need to issue revoke as part of their normal operations.
Add dma_buf_attach_revocable() to allow exporters to determine the difference between importers that can complete the above in bounded time, and those that can't. It can be called inside the exporter's attach op to reject incompatible importers.
Document these details about how dma_buf_invalidate_mappings() works and what the required sequence is to achieve a full revocation.
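As an illustration of the intended exporter-side usage (the function name below is made up; the VFIO patch later in this series does the equivalent):

#include <linux/dma-buf.h>
#include <linux/errno.h>

static int my_exporter_attach(struct dma_buf *dmabuf,
                              struct dma_buf_attachment *attachment)
{
        /* Refuse importers that cannot guarantee bounded-time unmap on revoke. */
        if (!dma_buf_attach_revocable(attachment))
                return -EOPNOTSUPP;

        return 0;
}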
Signed-off-by: Leon Romanovsky leonro@nvidia.com
---
 drivers/dma-buf/dma-buf.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/dma-buf.h   |  9 +++------
 2 files changed, 50 insertions(+), 7 deletions(-)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 1629312d364a..f0e05227bda8 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -1242,13 +1242,59 @@ void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach, } EXPORT_SYMBOL_NS_GPL(dma_buf_unmap_attachment_unlocked, "DMA_BUF");
+/** + * dma_buf_attach_revocable - check if a DMA-buf importer implements + * revoke semantics. + * @attach: the DMA-buf attachment to check + * + * Returns true if the DMA-buf importer can support the revoke sequence + * explained in dma_buf_invalidate_mappings() within bounded time. Meaning the + * importer implements invalidate_mappings() and ensures that unmap is called as + * a result. + */ +bool dma_buf_attach_revocable(struct dma_buf_attachment *attach) +{ + return attach->importer_ops && + attach->importer_ops->invalidate_mappings; +} +EXPORT_SYMBOL_NS_GPL(dma_buf_attach_revocable, "DMA_BUF"); + /** * dma_buf_invalidate_mappings - notify attachments that DMA-buf is moving * * @dmabuf: [in] buffer which is moving * * Informs all attachments that they need to destroy and recreate all their - * mappings. + * mappings. If the attachment is dynamic then the dynamic importer is expected + * to invalidate any caches it has of the mapping result and perform a new + * mapping request before allowing HW to do any further DMA. + * + * If the attachment is pinned then this informs the pinned importer that the + * underlying mapping is no longer available. Pinned importers may take this as + * a permanent revocation and never establish new mappings so exporters + * should not trigger it lightly. + * + * Upon return importers may continue to access the DMA-buf memory. The caller + * must do two additional waits to ensure that the memory is no longer being + * accessed: + * 1) Until dma_resv_wait_timeout() retires fences the importer is allowed to + * fully access the memory. + * 2) Until the importer calls unmap it is allowed to speculatively + * read-and-discard the memory. It must not write to the memory. + * + * A caller wishing to use dma_buf_invalidate_mappings() to fully stop access to + * the DMA-buf must wait for both. Dynamic callers can often use just the first. + * + * All importers providing an invalidate_mappings() op must ensure that unmap is + * called within bounded time after the op. + * + * Pinned importers that do not support an invalidate_mappings() op will + * eventually perform unmap when they are done with the buffer, which may be an + * unbounded time from calling this function. dma_buf_attach_revocable() can be + * used to prevent such importers from attaching. + * + * Importers are free to request a new mapping in parallel as this function + * returns. */ void dma_buf_invalidate_mappings(struct dma_buf *dmabuf) { diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index d5c3ce2b3aa4..84a7ec8f5359 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -468,12 +468,8 @@ struct dma_buf_attach_ops { * called with this lock held as well. This makes sure that no mapping * is created concurrently with an ongoing move operation. * - * Mappings stay valid and are not directly affected by this callback. - * But the DMA-buf can now be in a different physical location, so all - * mappings should be destroyed and re-created as soon as possible. - * - * New mappings can be created after this callback returns, and will - * point to the new location of the DMA-buf. + * See the kdoc for dma_buf_invalidate_mappings() for details on the + * required behavior.
*/ void (*invalidate_mappings)(struct dma_buf_attachment *attach); }; @@ -601,6 +597,7 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *, void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *, enum dma_data_direction); void dma_buf_invalidate_mappings(struct dma_buf *dma_buf); +bool dma_buf_attach_revocable(struct dma_buf_attachment *attach); int dma_buf_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction dir); int dma_buf_end_cpu_access(struct dma_buf *dma_buf,
From: Leon Romanovsky leonro@nvidia.com
Until now VFIO has rejected pinned importers, largely to avoid being used with the RDMA pinned importer that cannot handle a move_notify() to revoke access.
Using dma_buf_attach_revocable() it can tell the difference between pinned importers that support the flow described in dma_buf_invalidate_mappings() and those that don't.
Thus permit compatible pinned importers.
This is one of two items IOMMUFD requires to remove its private interface to VFIO's dma-buf.
Signed-off-by: Leon Romanovsky leonro@nvidia.com
---
 drivers/vfio/pci/vfio_pci_dmabuf.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c index 485515629fe4..3c8dc56e2238 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -22,16 +22,6 @@ struct vfio_pci_dma_buf { u8 revoked : 1; };
-static int vfio_pci_dma_buf_pin(struct dma_buf_attachment *attachment) -{ - return -EOPNOTSUPP; -} - -static void vfio_pci_dma_buf_unpin(struct dma_buf_attachment *attachment) -{ - /* Do nothing */ -} - static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf, struct dma_buf_attachment *attachment) { @@ -43,6 +33,9 @@ static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf, if (priv->revoked) return -ENODEV;
+ if (!dma_buf_attach_revocable(attachment)) + return -EOPNOTSUPP; + return 0; }
@@ -107,8 +100,6 @@ static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf) }
static const struct dma_buf_ops vfio_pci_dmabuf_ops = { - .pin = vfio_pci_dma_buf_pin, - .unpin = vfio_pci_dma_buf_unpin, .attach = vfio_pci_dma_buf_attach, .map_dma_buf = vfio_pci_dma_buf_map, .unmap_dma_buf = vfio_pci_dma_buf_unmap,
From: Leon Romanovsky leonro@nvidia.com
IOMMUFD relies on a private protocol with VFIO, and this always operated in pinned mode.
Now that VFIO can support pinned importers update IOMMUFD to invoke the normal dma-buf flow to request pin.
This isn't enough to allow IOMMUFD to work with other exporters; it still needs a way to get the physical address list, which is another series.
IOMMUFD supports the defined revoke semantics. It immediately stops and fences access to the memory inside its invalidate_mappings() callback, and it currently doesn't use scatterlists so doesn't call map/unmap at all.
It is expected that a future revision can synchronously call unmap from the move_notify callback as well.
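For reference, the locking around dma_buf_pin()/dma_buf_unpin() that the change below follows looks roughly like this (helper names are illustrative):

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>

static int my_importer_pin(struct dma_buf_attachment *attach)
{
        int rc;

        /* dma_buf_pin() requires the reservation lock to be held. */
        dma_resv_lock(attach->dmabuf->resv, NULL);
        rc = dma_buf_pin(attach);
        dma_resv_unlock(attach->dmabuf->resv);
        return rc;
}

static void my_importer_unpin(struct dma_buf_attachment *attach)
{
        dma_resv_lock(attach->dmabuf->resv, NULL);
        dma_buf_unpin(attach);
        dma_resv_unlock(attach->dmabuf->resv);
}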
Acked-by: Christian König christian.koenig@amd.com
Signed-off-by: Leon Romanovsky leonro@nvidia.com
---
 drivers/iommu/iommufd/pages.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c index 76f900fa1687..a5eb2bc4ef48 100644 --- a/drivers/iommu/iommufd/pages.c +++ b/drivers/iommu/iommufd/pages.c @@ -1501,16 +1501,22 @@ static int iopt_map_dmabuf(struct iommufd_ctx *ictx, struct iopt_pages *pages, mutex_unlock(&pages->mutex); }
- rc = sym_vfio_pci_dma_buf_iommufd_map(attach, &pages->dmabuf.phys); + rc = dma_buf_pin(attach); if (rc) goto err_detach;
+ rc = sym_vfio_pci_dma_buf_iommufd_map(attach, &pages->dmabuf.phys); + if (rc) + goto err_unpin; + dma_resv_unlock(dmabuf->resv);
/* On success iopt_release_pages() will detach and put the dmabuf. */ pages->dmabuf.attach = attach; return 0;
+err_unpin: + dma_buf_unpin(attach); err_detach: dma_resv_unlock(dmabuf->resv); dma_buf_detach(dmabuf, attach); @@ -1656,6 +1662,7 @@ void iopt_release_pages(struct kref *kref) if (iopt_is_dmabuf(pages) && pages->dmabuf.attach) { struct dma_buf *dmabuf = pages->dmabuf.attach->dmabuf;
+ dma_buf_unpin(pages->dmabuf.attach); dma_buf_detach(dmabuf, pages->dmabuf.attach); dma_buf_put(dmabuf); WARN_ON(!list_empty(&pages->dmabuf.tracker));