This series implements a dma-buf “revoke” mechanism: to allow a dma-buf exporter to explicitly invalidate (“kill”) a shared buffer after it has been distributed to importers, so that further CPU and device access is prevented and importers reliably observe failure.
Today, dma-buf effectively provides “if you have the fd, you can keep using the memory indefinitely.” That assumption breaks down when an exporter must reclaim, reset, evict, or otherwise retire backing memory after it has been shared. Concrete cases include GPU reset and recovery where old allocations become unsafe to access, memory eviction/overcommit where backing storage must be withdrawn, and security or isolation situations where continued access must be prevented. While drivers can sometimes approximate this with exporter-specific fencing and policy, there is no core dma-buf state transition that communicates “this buffer is no longer valid; fail access” across all access paths.
The change in this series is to introduce a core “revoked” state on the dma-buf object and a corresponding exporter-triggered revoke operation. Once a dma-buf is revoked, new access paths are blocked so that attempts to DMA-map, vmap, or mmap the buffer fail in a consistent way.
In addition, the series aims to invalidate existing access as much as the kernel allows: device mappings are torn down where possible so devices and IOMMUs cannot continue DMA.
The semantics are intentionally simple: revoke is a one-way, permanent transition for the lifetime of that dma-buf instance.
From a compatibility perspective, users that never invoke revoke are unaffected, and exporters that adopt it gain a core-supported enforcement mechanism rather than relying on ad hoc driver behavior. The intent is to keep the interface minimal and avoid imposing policy; the series provides the mechanism to terminate access, with policy remaining in the exporter and higher-level components.
BTW, see this megathread [1] for additional context. Ironically, it was posted exactly one year ago.
[1] https://lore.kernel.org/all/20250107142719.179636-2-yilun.xu@linux.intel.com...
Thanks
Cc: linux-rdma@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: kvm@vger.kernel.org
Cc: iommu@lists.linux.dev
To: Jason Gunthorpe jgg@ziepe.ca
To: Leon Romanovsky leon@kernel.org
To: Sumit Semwal sumit.semwal@linaro.org
To: Christian König christian.koenig@amd.com
To: Alex Williamson alex@shazbot.org
To: Kevin Tian kevin.tian@intel.com
To: Joerg Roedel joro@8bytes.org
To: Will Deacon will@kernel.org
To: Robin Murphy robin.murphy@arm.com

Signed-off-by: Leon Romanovsky leonro@nvidia.com
---
Leon Romanovsky (4):
      dma-buf: Introduce revoke semantics
      vfio: Use dma-buf revoke semantics
      iommufd: Require DMABUF revoke semantics
      iommufd/selftest: Reuse dma-buf revoke semantics

 drivers/dma-buf/dma-buf.c          | 36 ++++++++++++++++++++++++++++++++----
 drivers/iommu/iommufd/pages.c      |  2 +-
 drivers/iommu/iommufd/selftest.c   | 12 ++++--------
 drivers/vfio/pci/vfio_pci_dmabuf.c | 27 ++++++---------------------
 include/linux/dma-buf.h            | 31 +++++++++++++++++++++++++++++++
 5 files changed, 74 insertions(+), 34 deletions(-)
---
base-commit: 9ace4753a5202b02191d54e9fdf7f9e3d02b85eb
change-id: 20251221-dmabuf-revoke-b90ef16e4236

Best regards,
-- 
Leon Romanovsky leonro@nvidia.com
From: Leon Romanovsky leonro@nvidia.com
Add a dma-buf revoke mechanism that allows an exporter to explicitly invalidate ("kill") a shared buffer after it has been handed out to importers. Once revoked, all further CPU and device access is blocked, and importers consistently observe failure.
This requires both importers and exporters to honor the revoke contract. For importers, this means no page faults are delivered after the buffer is invalidated. For exporters, the dma-buf core prevents attaching new importers and remapping existing ones once revocation has occurred.
The proposed mechanism allows binding importers that do not require revoke support, and they shall continue using the existing .move_notify() API. However, importers that cannot handle page faults to remap buffers will fail to bind to exporters that do not support revoke.
Signed-off-by: Leon Romanovsky leonro@nvidia.com
---
 drivers/dma-buf/dma-buf.c | 36 ++++++++++++++++++++++++++++++++----
 include/linux/dma-buf.h   | 31 +++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index edaa9e4ee4ae..4d31fba792ee 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -697,6 +697,9 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 	if (WARN_ON(!exp_info->ops->pin != !exp_info->ops->unpin))
 		return ERR_PTR(-EINVAL);
 
+	if (WARN_ON(exp_info->revoke_semantics && exp_info->ops->pin))
+		return ERR_PTR(-EINVAL);
+
 	if (!try_module_get(exp_info->owner))
 		return ERR_PTR(-ENOENT);
 
@@ -727,6 +730,7 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 	dmabuf->cb_in.poll = dmabuf->cb_out.poll = &dmabuf->poll;
 	dmabuf->cb_in.active = dmabuf->cb_out.active = 0;
 	INIT_LIST_HEAD(&dmabuf->attachments);
+	dmabuf->revoke_semantics = exp_info->revoke_semantics;
 
 	if (!resv) {
 		dmabuf->resv = (struct dma_resv *)&dmabuf[1];
@@ -948,8 +952,21 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
 	if (WARN_ON(!dmabuf || !dev))
 		return ERR_PTR(-EINVAL);
 
-	if (WARN_ON(importer_ops && !importer_ops->move_notify))
-		return ERR_PTR(-EINVAL);
+	if (dmabuf->invalidate)
+		return ERR_PTR(-ENODEV);
+
+	if (importer_ops) {
+		if (WARN_ON(!importer_ops->move_notify &&
+			    !importer_ops->revoke_notify))
+			return ERR_PTR(-EINVAL);
+
+		if (WARN_ON(importer_ops->move_notify &&
+			    importer_ops->revoke_notify))
+			return ERR_PTR(-EINVAL);
+
+		if (!dmabuf->revoke_semantics && importer_ops->revoke_notify)
+			return ERR_PTR(-EINVAL);
+	}
 
 	attach = kzalloc(sizeof(*attach), GFP_KERNEL);
 	if (!attach)
@@ -1102,6 +1119,9 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
 	if (WARN_ON(!attach || !attach->dmabuf))
 		return ERR_PTR(-EINVAL);
 
+	if (attach->dmabuf->invalidate)
+		return ERR_PTR(-ENODEV);
+
 	dma_resv_assert_held(attach->dmabuf->resv);
 
 	if (dma_buf_pin_on_map(attach)) {
@@ -1261,8 +1281,16 @@ void dma_buf_move_notify(struct dma_buf *dmabuf)
 	dma_resv_assert_held(dmabuf->resv);
 
 	list_for_each_entry(attach, &dmabuf->attachments, node)
-		if (attach->importer_ops)
-			attach->importer_ops->move_notify(attach);
+		if (attach->importer_ops) {
+			if (attach->importer_ops->move_notify)
+				attach->importer_ops->move_notify(attach);
+
+			if (attach->importer_ops->revoke_notify)
+				attach->importer_ops->revoke_notify(attach);
+		}
+
+	if (dmabuf->revoke_semantics)
+		dmabuf->invalidate = true;
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_move_notify, "DMA_BUF");
 
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 0bc492090237..e198ee490151 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -23,6 +23,7 @@
 #include <linux/dma-fence.h>
 #include <linux/wait.h>
 #include <linux/pci-p2pdma.h>
+#include <linux/dma-resv.h>
 
 struct device;
 struct dma_buf;
@@ -441,6 +442,15 @@ struct dma_buf {
 		struct dma_buf *dmabuf;
 	} *sysfs_entry;
 #endif
+	/**
+	 * @revoke_semantics:
+	 *
+	 * This exporter implements revoke semantics.
+	 */
+	bool revoke_semantics;
+
+	/** @invalidate: this buffer was revoked and invalidated */
+	bool invalidate;
 };
 
 /**
@@ -476,6 +486,18 @@ struct dma_buf_attach_ops {
 	 * point to the new location of the DMA-buf.
 	 */
 	void (*move_notify)(struct dma_buf_attachment *attach);
+
+	/**
+	 * @revoke_notify: [optional] notification that the DMA-buf is revoking
+	 *
+	 * If this callback is provided the importer will invalidate the mappings.
+	 *
+	 * This callback is called with the lock of the reservation object
+	 * associated with the dma_buf held.
+	 *
+	 * New mappings shouldn't be created after this callback returns.
+	 */
+	void (*revoke_notify)(struct dma_buf_attachment *attach);
 };
 
 /**
@@ -516,6 +538,7 @@ struct dma_buf_attachment {
  * @size: Size of the buffer - invariant over the lifetime of the buffer
  * @flags: mode flags for the file
  * @resv: reservation-object, NULL to allocate default one
+ * @revoke_semantics: support revoke semantics
  * @priv: Attach private data of allocator to this buffer
  *
  * This structure holds the information required to export the buffer. Used
@@ -528,6 +551,7 @@ struct dma_buf_export_info {
 	size_t size;
 	int flags;
 	struct dma_resv *resv;
+	bool revoke_semantics;
 	void *priv;
 };
 
@@ -620,4 +644,11 @@ int dma_buf_vmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map);
 void dma_buf_vunmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map);
 struct dma_buf *dma_buf_iter_begin(void);
 struct dma_buf *dma_buf_iter_next(struct dma_buf *dmbuf);
+
+static inline void dma_buf_mark_valid(struct dma_buf *dma_buf)
+{
+	dma_resv_assert_held(dma_buf->resv);
+
+	dma_buf->invalidate = false;
+}
 #endif /* __DMA_BUF_H__ */
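The checks this patch adds can be read as a small state model. The following is a userspace sketch only, not kernel code: the `model_*` names and both structs are hypothetical stand-ins, and locking, the attachment list and the `WARN_ON()` reporting are left out. It assumes the behaviour shown in the diff above: attach requires exactly one of the two callbacks, a revoke-only importer needs a revoke-capable exporter, and `dma_buf_move_notify()` on a revocable buffer sets the `invalidate` flag that attach and map then test.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel structures themselves. */
struct model_attach_ops {
	bool has_move_notify;
	bool has_revoke_notify;
};

struct model_dmabuf {
	bool revoke_semantics;	/* exporter opted in at export time */
	bool invalidate;	/* set by move_notify on a revocable buffer */
};

/* Mirrors the checks added to dma_buf_dynamic_attach(). */
static int model_attach(const struct model_dmabuf *buf,
			const struct model_attach_ops *ops)
{
	if (buf->invalidate)
		return -ENODEV;
	if (!ops)
		return 0;	/* importer without callbacks */
	if (ops->has_move_notify == ops->has_revoke_notify)
		return -EINVAL;	/* exactly one callback is required */
	if (!buf->revoke_semantics && ops->has_revoke_notify)
		return -EINVAL;	/* revoke importer needs a revoke exporter */
	return 0;
}

/* Mirrors the check added to dma_buf_map_attachment(). */
static int model_map(const struct model_dmabuf *buf)
{
	return buf->invalidate ? -ENODEV : 0;
}

/* Mirrors the tail of dma_buf_move_notify(). */
static void model_move_notify(struct model_dmabuf *buf)
{
	if (buf->revoke_semantics)
		buf->invalidate = true;
}

/* Mirrors dma_buf_mark_valid(). */
static void model_mark_valid(struct model_dmabuf *buf)
{
	buf->invalidate = false;
}
```

In this model a buffer exported without `revoke_semantics` is never marked invalid by `model_move_notify()`, so mapping keeps succeeding, while a revocable buffer rejects both attach and map with `-ENODEV` until `model_mark_valid()` is called.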
From: Leon Romanovsky leonro@nvidia.com
Remove the open-coded variant of revoke semantics and reuse the existing dma_buf_move_notify() and the newly introduced dma_buf_mark_valid() primitives.
Signed-off-by: Leon Romanovsky leonro@nvidia.com
---
 drivers/vfio/pci/vfio_pci_dmabuf.c | 27 ++++++---------------------
 1 file changed, 6 insertions(+), 21 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index d4d0f7d08c53..d953bd4cd118 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -17,20 +17,14 @@ struct vfio_pci_dma_buf {
 	struct dma_buf_phys_vec *phys_vec;
 	struct p2pdma_provider *provider;
 	u32 nr_ranges;
-	u8 revoked : 1;
 };
 
 static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
 				   struct dma_buf_attachment *attachment)
 {
-	struct vfio_pci_dma_buf *priv = dmabuf->priv;
-
 	if (!attachment->peer2peer)
 		return -EOPNOTSUPP;
 
-	if (priv->revoked)
-		return -ENODEV;
-
 	return 0;
 }
 
@@ -42,9 +36,6 @@ vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
 
 	dma_resv_assert_held(priv->dmabuf->resv);
 
-	if (priv->revoked)
-		return ERR_PTR(-ENODEV);
-
 	return dma_buf_phys_vec_to_sgt(attachment, priv->provider,
 				       priv->phys_vec, priv->nr_ranges,
 				       priv->size, dir);
@@ -90,8 +81,6 @@ static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
  *
  * If this function succeeds the following are true:
  *  - There is one physical range and it is pointing to MMIO
- *  - When move_notify is called it means revoke, not move, vfio_dma_buf_map
- *    will fail if it is currently revoked
  */
 int vfio_pci_dma_buf_iommufd_map(struct dma_buf_attachment *attachment,
 				 struct dma_buf_phys_vec *phys)
@@ -104,9 +93,6 @@ int vfio_pci_dma_buf_iommufd_map(struct dma_buf_attachment *attachment,
 		return -EOPNOTSUPP;
 
 	priv = attachment->dmabuf->priv;
-	if (priv->revoked)
-		return -ENODEV;
-
 	/* More than one range to iommufd will require proper DMABUF support */
 	if (priv->nr_ranges != 1)
 		return -EOPNOTSUPP;
@@ -268,6 +254,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 	exp_info.size = priv->size;
 	exp_info.flags = get_dma_buf.open_flags;
 	exp_info.priv = priv;
+	exp_info.revoke_semantics = true;
 
 	priv->dmabuf = dma_buf_export(&exp_info);
 	if (IS_ERR(priv->dmabuf)) {
@@ -279,7 +266,6 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 	INIT_LIST_HEAD(&priv->dmabufs_elm);
 	down_write(&vdev->memory_lock);
 	dma_resv_lock(priv->dmabuf->resv, NULL);
-	priv->revoked = !__vfio_pci_memory_enabled(vdev);
 	list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs);
 	dma_resv_unlock(priv->dmabuf->resv);
 	up_write(&vdev->memory_lock);
@@ -317,12 +303,12 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
 		if (!get_file_active(&priv->dmabuf->file))
 			continue;
 
-		if (priv->revoked != revoked) {
-			dma_resv_lock(priv->dmabuf->resv, NULL);
-			priv->revoked = revoked;
+		dma_resv_lock(priv->dmabuf->resv, NULL);
+		if (revoked)
 			dma_buf_move_notify(priv->dmabuf);
-			dma_resv_unlock(priv->dmabuf->resv);
-		}
+		else
+			dma_buf_mark_valid(priv->dmabuf);
+		dma_resv_unlock(priv->dmabuf->resv);
 		fput(priv->dmabuf->file);
 	}
 }
@@ -340,7 +326,6 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
 		dma_resv_lock(priv->dmabuf->resv, NULL);
 		list_del_init(&priv->dmabufs_elm);
 		priv->vdev = NULL;
-		priv->revoked = true;
 		dma_buf_move_notify(priv->dmabuf);
 		dma_resv_unlock(priv->dmabuf->resv);
 		vfio_device_put_registration(&vdev->vdev);
From: Leon Romanovsky leonro@nvidia.com
IOMMUFD does not support page fault handling, and after a call to .move_notify() all mappings become invalid. Ensure that the IOMMUFD DMABUF importer is bound to a revoke‑aware DMABUF exporter (for example, VFIO).
Signed-off-by: Leon Romanovsky leonro@nvidia.com
---
 drivers/iommu/iommufd/pages.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index dbe51ecb9a20..a233def71be0 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -1451,7 +1451,7 @@ static void iopt_revoke_notify(struct dma_buf_attachment *attach)
 
 static struct dma_buf_attach_ops iopt_dmabuf_attach_revoke_ops = {
 	.allow_peer2peer = true,
-	.move_notify = iopt_revoke_notify,
+	.revoke_notify = iopt_revoke_notify,
 };
 
 /*
From: Leon Romanovsky leonro@nvidia.com
Test iommufd_test_dmabuf_revoke() with dma-buf revoke primitives.
Signed-off-by: Leon Romanovsky leonro@nvidia.com
---
 drivers/iommu/iommufd/selftest.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 550ff36dec3a..523dfac44ff8 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -1958,7 +1958,6 @@ void iommufd_selftest_destroy(struct iommufd_object *obj)
 struct iommufd_test_dma_buf {
 	void *memory;
 	size_t length;
-	bool revoked;
 };
 
 static int iommufd_test_dma_buf_attach(struct dma_buf *dmabuf,
@@ -2011,9 +2010,6 @@ int iommufd_test_dma_buf_iommufd_map(struct dma_buf_attachment *attachment,
 	if (attachment->dmabuf->ops != &iommufd_test_dmabuf_ops)
 		return -EOPNOTSUPP;
 
-	if (priv->revoked)
-		return -ENODEV;
-
 	phys->paddr = virt_to_phys(priv->memory);
 	phys->len = priv->length;
 	return 0;
@@ -2065,7 +2061,6 @@ static int iommufd_test_dmabuf_get(struct iommufd_ucmd *ucmd,
 static int iommufd_test_dmabuf_revoke(struct iommufd_ucmd *ucmd, int fd,
 				      bool revoked)
 {
-	struct iommufd_test_dma_buf *priv;
 	struct dma_buf *dmabuf;
 	int rc = 0;
 
@@ -2078,10 +2073,11 @@ static int iommufd_test_dmabuf_revoke(struct iommufd_ucmd *ucmd, int fd,
 		goto err_put;
 	}
 
-	priv = dmabuf->priv;
 	dma_resv_lock(dmabuf->resv, NULL);
-	priv->revoked = revoked;
-	dma_buf_move_notify(dmabuf);
+	if (revoked)
+		dma_buf_move_notify(dmabuf);
+	else
+		dma_buf_mark_valid(dmabuf);
 	dma_resv_unlock(dmabuf->resv);
 
 err_put:
On 1/11/26 11:37, Leon Romanovsky wrote:
> This series implements a dma-buf “revoke” mechanism: to allow a dma-buf exporter to explicitly invalidate (“kill”) a shared buffer after it has been distributed to importers, so that further CPU and device access is prevented and importers reliably observe failure.
We already have that. This is what the move_notify is all about.
> Today, dma-buf effectively provides “if you have the fd, you can keep using the memory indefinitely.” That assumption breaks down when an exporter must reclaim, reset, evict, or otherwise retire backing memory after it has been shared. Concrete cases include GPU reset and recovery where old allocations become unsafe to access, memory eviction/overcommit where backing storage must be withdrawn, and security or isolation situations where continued access must be prevented. While drivers can sometimes approximate this with exporter-specific fencing and policy, there is no core dma-buf state transition that communicates “this buffer is no longer valid; fail access” across all access paths.
It's not correct that there is no DMA-buf handling for this use case.
> The change in this series is to introduce a core “revoked” state on the dma-buf object and a corresponding exporter-triggered revoke operation. Once a dma-buf is revoked, new access paths are blocked so that attempts to DMA-map, vmap, or mmap the buffer fail in a consistent way.
> In addition, the series aims to invalidate existing access as much as the kernel allows: device mappings are torn down where possible so devices and IOMMUs cannot continue DMA.
> The semantics are intentionally simple: revoke is a one-way, permanent transition for the lifetime of that dma-buf instance.
> From a compatibility perspective, users that never invoke revoke are unaffected, and exporters that adopt it gain a core-supported enforcement mechanism rather than relying on ad hoc driver behavior. The intent is to keep the interface minimal and avoid imposing policy; the series provides the mechanism to terminate access, with policy remaining in the exporter and higher-level components.
As far as I can see that patch set is completely superfluous.
The move_notify mechanism has been implemented exactly to cover this use case and is in use for a couple of years now.
What exactly is missing?
Regards, Christian.
On Mon, Jan 12, 2026 at 11:04:38AM +0100, Christian König wrote:
> What exactly is missing?
From what I can tell, the missing piece is what happens after .move_notify() is called. According to the documentation, the exporter remains valid, and the importer is expected to recreate all mappings.
include/linux/dma-buf.h:
	 * Mappings stay valid and are not directly affected by this callback.
	 * But the DMA-buf can now be in a different physical location, so all
	 * mappings should be destroyed and re-created as soon as possible.
	 *
	 * New mappings can be created after this callback returns, and will
	 * point to the new location of the DMA-buf.

A call to dma_buf_move_notify() does not prevent new attachments to that exporter, while "revoke" does. In the current code, the importer is not aware that the exporter no longer exists and will continue calling dma_buf_map_attachment().
In summary, the current implementation allows a single .attach() check but permits multiple .map_dma_buf() calls. With "revoke", we gain the ability to block any subsequent .map_dma_buf() operations.
Main use case is VFIO as exporter and IOMMUFD as importer.
Thanks
On 1/12/26 13:19, Leon Romanovsky wrote:
> From what I can tell, the missing piece is what happens after .move_notify() is called. According to the documentation, the exporter remains valid, and the importer is expected to recreate all mappings.
>
> include/linux/dma-buf.h:
> 	 * Mappings stay valid and are not directly affected by this callback.
> 	 * But the DMA-buf can now be in a different physical location, so all
> 	 * mappings should be destroyed and re-created as soon as possible.
> 	 *
> 	 * New mappings can be created after this callback returns, and will
> 	 * point to the new location of the DMA-buf.
>
> A call to dma_buf_move_notify() does not prevent new attachments to that exporter, while "revoke" does. In the current code, the importer is not aware that the exporter no longer exists and will continue calling dma_buf_map_attachment().
Yeah and that is perfectly intentional.
> In summary, the current implementation allows a single .attach() check but permits multiple .map_dma_buf() calls. With "revoke", we gain the ability to block any subsequent .map_dma_buf() operations.
Clear NAK to that plan. This is not something DMA-buf should need to deal with and as far as I can see is incompatible with the UAPI.
If a DMA-buf can no longer be attached or mapped then the relevant callbacks just need to return an error code.
Existing mappings can be invalidated with the move_notify callback and that functionality should be sufficient to prevent importers from accessing the backing store.
Existing attachments should stay around until the importer drops their usage.
In other words the exporter can't force an importer to drop their attachments, that would be a violation of the UAPI.
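The exporter-side approach described above can be sketched in plain C. This is an illustrative userspace model, not kernel code: `struct exp_priv` and the `exp_*` functions are hypothetical, loosely modelled on the `revoked` bit that the vfio patch in this series removes. It assumes that the exporter's own attach and map callbacks return an error once the buffer is gone, that move_notify makes importers drop their mappings, and that existing attachments are left in place.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical exporter-private state; not a kernel structure. */
struct exp_priv {
	bool revoked;		/* exporter decided the buffer is gone */
	int attachments;	/* importers that are still attached */
	int live_mappings;	/* mappings importers have not dropped yet */
};

/* Stand-in for the exporter's attach callback. */
static int exp_attach(struct exp_priv *p)
{
	if (p->revoked)
		return -ENODEV;
	p->attachments++;
	return 0;
}

/* Stand-in for the exporter's map callback. */
static int exp_map(struct exp_priv *p)
{
	if (p->revoked)
		return -ENODEV;
	p->live_mappings++;
	return 0;
}

/*
 * Revoke as the exporter sees it: flag the buffer, then notify importers.
 * Zeroing live_mappings stands in for importers unmapping in response to
 * move_notify. The attachment count is deliberately left untouched.
 */
static void exp_revoke(struct exp_priv *p)
{
	p->revoked = true;
	p->live_mappings = 0;
}
```

After `exp_revoke()` the model still counts the importer as attached, but every further `exp_attach()` or `exp_map()` call fails with `-ENODEV`.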
Regards, Christian.
On Mon, Jan 12, 2026 at 01:57:25PM +0100, Christian König wrote:
> Clear NAK to that plan. This is not something DMA-buf should need to deal with and as far as I can see is incompatible with the UAPI.
We had this discussion with Simona and you a while back and there was a pretty clear direction we needed to add a revoke to sit in between pin and move. I think Leon has not quite got the "dmabuf lingo" down right to explain this.
https://lore.kernel.org/dri-devel/Z4Z4NKqVG2Vbv98Q@phenom.ffwll.local/
  Since you mention pin here, I think that's another aspect of the revocable vs dynamic question. Dynamic buffers are expected to sometimes just move around for no reason, and importers must be able to cope.
  For revocable exporters/importers I'd expect that movement is not happening, meaning it's pinned until the single terminal revocation. And maybe I read the kvm stuff wrong, but it reads more like the latter to me when crawling through the pfn code.
The issue is that DMABUF only offers two attachment options today, pin and move. iommufd/kvm can implement pin, but not move because they don't support faulting.
vfio and others don't need move with faulting but they do need pin with a terminal, emergency, revocation.
The purpose of revoke is to add a new negotiated attachment mode between exporter and importer that behaves the same as pin up until the user does something catastrophic (like unbinding a driver); then a revoke invalidation is used to clean everything up safely.
You are right that the existing move_notify already meets this semantic, and today the VFIO exporter and the RDMA ODP importer even implement this. Upon VFIO revoke, move_notify() will invalidate and map() will fail. With RDMA ODP, the HW then fails all faults.
The problem revoke is designed to solve is that many importers have hardware that can either be DMA'ing or failing. There is no fault mechanism that can be used to implement the full "move around for no reason" semantics that are implied by move_notify.
Thus they can't implement move_notify!
Revoke allows this less capable HW to still be usable with exporters, so long as exporters promise only to issue an invalidation for a "single terminal revocation". Which does nicely match the needs of exporters which are primarily pin based.
IOW this is an enhancement to pin modes to add a terminal error case invalidation to pinned attachments.
It is not intended to be UAPI changing, and Leon is not trying to say that importers have to drop their attachment. The attachment just becomes permanently non-present.
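One way to read the three attachment modes discussed above is as a capability check on the importer. The sketch below is only an illustration of that reading: `enum attach_mode`, `struct importer_caps` and `importer_supports()` are hypothetical names, not dma-buf API. It assumes that pin needs nothing from the importer, that revoke needs the importer to be able to tear its mappings down once, and that move additionally needs a fault path to re-establish them.

```c
#include <stdbool.h>

/* Hypothetical names for the three modes discussed in this thread. */
enum attach_mode {
	MODE_PIN,	/* never invalidated while attached */
	MODE_REVOKE,	/* pinned until a single terminal invalidation */
	MODE_MOVE,	/* may be invalidated and remapped at any time */
};

/* Hypothetical importer capabilities. */
struct importer_caps {
	bool can_invalidate;	/* can tear down its mappings on request */
	bool can_fault;		/* can re-create mappings on demand */
};

/* Returns true if an importer with these capabilities can use the mode. */
static bool importer_supports(struct importer_caps caps, enum attach_mode mode)
{
	switch (mode) {
	case MODE_PIN:
		return true;
	case MODE_REVOKE:
		return caps.can_invalidate;
	case MODE_MOVE:
		return caps.can_invalidate && caps.can_fault;
	}
	return false;
}
```

Under these assumptions an importer without fault support (the thread's example is iommufd) qualifies for pin and revoke but not move, while an importer that can fault qualifies for all three.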
Jason
On Mon, Jan 12, 2026 at 10:14:40AM -0400, Jason Gunthorpe wrote:
On Mon, Jan 12, 2026 at 01:57:25PM +0100, Christian König wrote:
Clear NAK to that plan. This is not something DMA-buf should need to deal with and as far as I can see is incompatible with the UAPI.
We had this discussion with Simona and you a while back and there was a pretty clear direction we needed to add a revoke to sit in between pin and move. I think Leon has not quite got the "dmabuf lingo" down right to explain this.
https://lore.kernel.org/dri-devel/Z4Z4NKqVG2Vbv98Q@phenom.ffwll.local/
<...>
It is not intended to be UAPI changing, and Leon is not trying to say that importers have to drop their attachment. The attachment just becomes permanently non-present.
Leon also ensures that no UAPI semantic changes are introduced here; the existing interface is simply extended.
Thanks
Jason
On 1/12/26 15:14, Jason Gunthorpe wrote:
On Mon, Jan 12, 2026 at 01:57:25PM +0100, Christian König wrote:
Clear NAK to that plan. This is not something DMA-buf should need to deal with and as far as I can see is incompatible with the UAPI.
We had this discussion with Simona and you a while back and there was a pretty clear direction we needed to add a revoke to sit in between pin and move. I think Leon has not quite got the "dmabuf lingo" down right to explain this.
I was already wondering why this was clearly not what we have discussed before.
https://lore.kernel.org/dri-devel/Z4Z4NKqVG2Vbv98Q@phenom.ffwll.local/
Since you mention pin here, I think that's another aspect of the revocable vs dynamic question. Dynamic buffers are expected to sometimes just move around for no reason, and importers must be able to cope.
For revocable exporters/importers I'd expect that movement is not happening, meaning it's pinned until the single terminal revocation. And maybe I read the kvm stuff wrong, but it reads more like the latter to me when crawling through the pfn code.
The issue is that DMABUF only offers two attachment options today, pin and move. iommufd/kvm can implement pin, but not move because they don't support faulting.
vfio and others don't need move with faulting but they do need pin with a terminal, emergency, revocation.
Yeah, I know that this is confusing. But that use case is already supported and we just need to properly document things.
The move_notify callback can be called even after pin() in the case of PCIe hotplug for example.
We could potentially rename the callback to something like invalidate_mappings.
And yes, I know that we had a few issues with that because we didn't properly document things...
The purpose of revoke is to add a new negotiated attachment mode between exporter and importer that behaves the same as pin up until the user does something catastrophic (like unbind a driver), then a revoke invalidation is used to clean everything up safely.
With or without pin() you need to guarantee to the importer that the DMA address you gave out stays valid until the importer had a chance to free up its mappings.
It is intentionally done this way to properly support PCIe hot plug because even when a device is gone the address space can't be re-used until all importers stated that they stopped their DMA.
You are right that the existing move_notify already meets this semantic, and today the VFIO exporter and the RDMA ODP importer even implement this. Upon VFIO revoke, move_notify() will invalidate and map() will fail. The RDMA ODP HW then fails all faults.
The problem revoke is designed to solve is that many importers have hardware that can either be DMA'ing or failing. There is no fault mechanism that can be used to implement the full "move around for no reason" semantics that are implied by move_notify.
In this case just call dma_buf_pin(). We already support that approach for RDMA devices which can't do ODP.
Regards, Christian.
Thus they can't implement move_notify!
Revoke allows this less capable HW to still be usable with exporters, so long as exporters promise only to issue an invalidation for a "single terminal revocation". Which does nicely match the needs of exporters which are primarily pin based.
IOW this is an enhancement to pin modes to add a terminal error case invalidation to pinned attachments.
It is not intended to be UAPI changing, and Leon is not trying to say that importers have to drop their attachment. The attachment just becomes permanently non-present.
Jason
On Mon, Jan 12, 2026 at 03:56:32PM +0100, Christian König wrote:
The problem revoke is designed to solve is that many importers have hardware that can either be DMA'ing or failing. There is no fault mechanism that can be used to implement the full "move around for no reason" semantics that are implied by move_notify.
In this case just call dma_buf_pin(). We already support that approach for RDMA devices which can't do ODP.
That alone isn't good enough - the patch adding the non-ODP support also contained this:
static void
ib_umem_dmabuf_unsupported_move_notify(struct dma_buf_attachment *attach)
{
	struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;

	ibdev_warn_ratelimited(umem_dmabuf->umem.ibdev,
			       "Invalidate callback should not be called when memory is pinned\n");
}

static struct dma_buf_attach_ops ib_umem_dmabuf_attach_pinned_ops = {
	.allow_peer2peer = true,
	.move_notify = ib_umem_dmabuf_unsupported_move_notify,
};
So we can't just allow it to attach to exporters that are going to start calling move_notify while pinned.
Looking around I don't see anyone else doing something like this, and reading your remarks I think EFA guys got it wrong. So I'm wondering if this should not have been allowed. Unfortunately 5 years later I'm pretty sure it is being used in places where we don't have HW support to invalidate at all, and it is now well established uAPI that we can't just break.
Which is why we are coming to negotiation because at least the above isn't going to work if move_notify is called for revoke reasons, and we'd like to block attaching exporters that need revoke for the above.
So, would you be happier with this if we documented that move_notify can be called for pinned importers for revoke purposes and figure out something to mark the above as special so exporters can fail pin if they are going to call move_notify?
Then this series would transform into documentation, making VFIO accept pin and continue to call move_notify as it does right now, and some logic to reject the RDMA non-ODP importer.
Jason
On 1/12/26 16:35, Jason Gunthorpe wrote:
On Mon, Jan 12, 2026 at 03:56:32PM +0100, Christian König wrote:
The problem revoke is designed to solve is that many importers have hardware that can either be DMA'ing or failing. There is no fault mechanism that can be used to implement the full "move around for no reason" semantics that are implied by move_notify.
In this case just call dma_buf_pin(). We already support that approach for RDMA devices which can't do ODP.
That alone isn't good enough - the patch adding the non-ODP support also contained this:
static void
ib_umem_dmabuf_unsupported_move_notify(struct dma_buf_attachment *attach)
{
	struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;

	ibdev_warn_ratelimited(umem_dmabuf->umem.ibdev,
			       "Invalidate callback should not be called when memory is pinned\n");
}
Yeah, I know. That's what I meant we have to better document this.
static struct dma_buf_attach_ops ib_umem_dmabuf_attach_pinned_ops = {
	.allow_peer2peer = true,
	.move_notify = ib_umem_dmabuf_unsupported_move_notify,
};
So we can't just allow it to attach to exporters that are going to start calling move_notify while pinned.
The point is exporters are already doing this.
Looking around I don't see anyone else doing something like this, and reading your remarks I think EFA guys got it wrong. So I'm wondering if this should not have been allowed. Unfortunately 5 years later I'm pretty sure it is being used in places where we don't have HW support to invalidate at all, and it is now well established uAPI that we can't just break.
Which is why we are coming to negotiation because at least the above isn't going to work if move_notify is called for revoke reasons, and we'd like to block attaching exporters that need revoke for the above.
Ah, yes that makes sense. This is clearly a new requirement.
So basically PCIe hotplug was a rare event where we said we have some problems with non-ODP but we can live with that, but for this use case here it's more like a perfectly normal condition that userspace can trigger.
So the exporter wants to reject importers which can't handle a mapping invalidation while the BO is pinned, correct?
So, would you be happier with this if we documented that move_notify can be called for pinned importers for revoke purposes and figure out something to mark the above as special so exporters can fail pin if they are going to call move_notify?
That would work for me. I mean it is already current practice, we just never fully documented it.
Then this series would transform into documentation, making VFIO accept pin and continue to call move_notify as it does right now, and some logic to reject the RDMA non-ODP importer.
I think we just need to expose this with flags or similar from the importer side. As far as I know RDMA without ODP is currently the only one really needing this (except for cross device scanout, but that is special anyway).
Christian.
Jason
On Mon, Jan 12, 2026 at 05:12:36PM +0100, Christian König wrote:
static struct dma_buf_attach_ops ib_umem_dmabuf_attach_pinned_ops = {
	.allow_peer2peer = true,
	.move_notify = ib_umem_dmabuf_unsupported_move_notify,
};
So we can't just allow it to attach to exporters that are going to start calling move_notify while pinned.
The point is exporters are already doing this.
:( So obviously this doesn't work fully correctly..
Which is why we are coming to negotiation because at least the above isn't going to work if move_notify is called for revoke reasons, and we'd like to block attaching exporters that need revoke for the above.
Ah, yes that makes sense. This is clearly a new requirement.
So basically PCIe hotplug was a rare event where we said we have some problems with non-ODP but we can live with that, but for this use case here it's more like a perfectly normal condition that userspace can trigger.
Yes that seems to be exactly the case. I didn't know about the PCI RAS case until now :(
So the exporter wants to reject importers which can't handle a mapping invalidation while the BO is pinned, correct?
Yes. I think at a minimum exporters where it is a normal use case should block it so unpriv user space cannot trigger incorrect behavior / ignored invalidation. ie VFIO will trigger this based on unpriv user system calls.
I suppose we have to retain the PCI RAS misbehavior for now at least. It would probably be a uAPI regression to start blocking some of the existing ones.
It also seems we should invest in the RDMA side to minimize where this is used.
So, would you be happier with this if we documented that move_notify can be called for pinned importers for revoke purposes and figure out something to mark the above as special so exporters can fail pin if they are going to call move_notify?
That would work for me. I mean it is already current practice, we just never fully documented it.
OK
Then this series would transform into documentation, making VFIO accept pin and continue to call move_notify as it does right now, and some logic to reject the RDMA non-ODP importer.
I think we just need to expose this with flags or similar from the importer side. As far as I know RDMA without ODP is currently the only one really needing this (except for cross device scanout, but that is special anyway).
I did not see any other importers with an obvious broken move_notify, so I hope this is right. Even iommufd has a working move_notify (disruptive, but working).
How do you feel about an enum in the ops:
+enum dma_buf_move_notify_level {
+	/*
+	 * The importer can pause HW access while move_notify is running
+	 * and cleanly handle dynamic changes to the DMA mapping without
+	 * any disruption.
+	 */
+	DMA_BUF_MOVE_NOTIFY_FAULTING = 0,
+	/*
+	 * The importer can stop HW access and disruptively fail any
+	 * of its DMA activity. move_notify should only be called if the
+	 * exporter is experiencing an unusual error and can accept
+	 * that the importer will be disrupted.
+	 */
+	DMA_BUF_MOVE_NOTIFY_REVOKING,
+	/*
+	 * move_notify is not supported at all and must not be called. Do not
+	 * introduce new drivers using this, it has significant draw backs
+	 * around PCI error handling and other cases. It has the most limited
+	 * set of compatible importers.
+	 */
+	DMA_BUF_MOVE_NOTIFY_UNSUPPORTED,
+};
+
 /**
  * struct dma_buf_attach_ops - importer operations for an attachment
  *
@@ -457,6 +480,8 @@ struct dma_buf_attach_ops {
 	 */
 	bool allow_peer2peer;
 
+	enum dma_buf_move_notify_level move_notify_level;
+
 	/**
 	 * @move_notify: [optional] notification that the DMA-buf is moving
 	 *
Jason
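As a rough illustration of how an exporter might consume such a level at pin time, here is a userspace sketch. It copies the enum from the proposal above, which is a proposal and not merged API. Everything else is an assumption made for the example: `toy_exporter`, its `revokes_routinely` flag, `toy_exporter_pin()` and the `-EOPNOTSUPP` return value are hypothetical and appear nowhere in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the enum proposed in the thread; it is a proposal, not merged API. */
enum dma_buf_move_notify_level {
	DMA_BUF_MOVE_NOTIFY_FAULTING = 0,
	DMA_BUF_MOVE_NOTIFY_REVOKING,
	DMA_BUF_MOVE_NOTIFY_UNSUPPORTED,
};

/* Hypothetical exporter description, for illustration only. */
struct toy_exporter {
	/* Revocation is a normal condition, e.g. reachable from unprivileged user space. */
	bool revokes_routinely;
};

/*
 * Hypothetical pin-time check: an exporter that routinely calls move_notify
 * on pinned attachments refuses importers that cannot handle it. Exporters
 * that only invalidate on rare events keep accepting every importer, which
 * matches the "retain the existing behavior for now" point in the thread.
 */
static int toy_exporter_pin(const struct toy_exporter *exp,
			    enum dma_buf_move_notify_level level)
{
	assert(exp != NULL);
	if (exp->revokes_routinely && level == DMA_BUF_MOVE_NOTIFY_UNSUPPORTED)
		return -EOPNOTSUPP;  /* assumed errno, not taken from the thread */
	return 0;
}
```

With this shape a VFIO-like exporter would reject only the non-ODP RDMA style of importer, while importers declaring the faulting or revoking level, and all importers of exporters that revoke only on rare events, would continue to pin as they do today.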