Linaro-mm-sig

linaro-mm-sig@lists.linaro.org


Re: [PATCH v9 10/11] vfio/pci: Add dma-buf export support for MMIO regions

by Leon Romanovsky

On Thu, Nov 20, 2025 at 05:04:13PM -0700, Alex Williamson wrote:
> On Thu, 20 Nov 2025 11:28:29 +0200
> Leon Romanovsky <leon(a)kernel.org> wrote:
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index 142b84b3f225..51a3bcc26f8b 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> ...
> > @@ -2487,8 +2500,11 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> >
> >  err_undo:
> >      list_for_each_entry_from_reverse(vdev, &dev_set->device_list,
> > -                                     vdev.dev_set_list)
> > +                                     vdev.dev_set_list) {
> > +        if (__vfio_pci_memory_enabled(vdev))
> > +            vfio_pci_dma_buf_move(vdev, false);
> >          up_write(&vdev->memory_lock);
> > +    }
>
> I ran into a bug here. In the hot reset path we can have dev_sets
> where one or more devices are not opened by the user. The vconfig
> buffer for the device is established on open. However:
>
> bool __vfio_pci_memory_enabled(struct vfio_pci_core_device *vdev)
> {
>     struct pci_dev *pdev = vdev->pdev;
>     u16 cmd = le16_to_cpu(*(__le16 *)&vdev->vconfig[PCI_COMMAND]);
>     ...
>
> Leads to a NULL pointer dereference.
>
> I think the most straightforward fix is simply to test the open_count
> on the vfio_device, which is also protected by the dev_set->lock that
> we already hold here:
>
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -2501,7 +2501,7 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
>  err_undo:
>      list_for_each_entry_from_reverse(vdev, &dev_set->device_list,
>                                       vdev.dev_set_list) {
> -        if (__vfio_pci_memory_enabled(vdev))
> +        if (vdev->vdev.open_count && __vfio_pci_memory_enabled(vdev))
>              vfio_pci_dma_buf_move(vdev, false);
>          up_write(&vdev->memory_lock);
>      }
>
> Any other suggestions? This should be the only reset path with this
> nuance of affecting non-opened devices. Thanks,

It seems right to me.

Thanks

>
> Alex
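For readers following along, the failure mode and the call-site guard discussed above can be modelled in plain user-space C. This is only a sketch under stated assumptions: `struct model_dev`, `model_memory_enabled()` and `model_should_move()` are hypothetical stand-ins, not the kernel's types or functions, and the elided parts of `__vfio_pci_memory_enabled()` are not reproduced. The only facts taken from PCI itself are the command register offset (0x04) and the memory-space enable bit (0x2).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PCI_COMMAND        0x04 /* offset of the PCI command register */
#define MODEL_PCI_COMMAND_MEMORY 0x2  /* memory space enable bit */

/* Hypothetical stand-in for the device state; not the kernel struct. */
struct model_dev {
    unsigned int open_count; /* 0 until userspace opens the device */
    uint8_t *vconfig;        /* NULL until the device is opened */
};

/* Models the unguarded helper: reads vconfig unconditionally, so it
 * must never be called for a device that was not opened. */
static bool model_memory_enabled(const struct model_dev *dev)
{
    /* Little-endian 16-bit read of the command register. */
    uint16_t cmd = (uint16_t)(dev->vconfig[MODEL_PCI_COMMAND] |
                              (dev->vconfig[MODEL_PCI_COMMAND + 1] << 8));

    return (cmd & MODEL_PCI_COMMAND_MEMORY) != 0;
}

/* Models the proposed call-site guard: && short-circuits, so the
 * helper is never reached when open_count is zero. */
static bool model_should_move(const struct model_dev *dev)
{
    return dev->open_count && model_memory_enabled(dev);
}
```

With `open_count == 0` and `vconfig == NULL`, `model_should_move()` returns false without touching the buffer, which is the behaviour the one-line fix relies on.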

3 days, 7 hours

Re: [PATCH v2] dma-buf: system_heap: use larger contiguous mappings instead of per-page mmap

by Sumit Semwal

Hi Barry,

On Fri, 21 Nov 2025 at 06:54, Barry Song <21cnbao(a)gmail.com> wrote:
>
> Hi Sumit,
>
> >
> > Using the micro-benchmark below, we see that mmap becomes
> > 3.5X faster:
>
>
> Marcin pointed out to me off-tree that it is actually 35x faster,
> not 3.5x faster. Sorry for my poor math. I assume you can fix this
> when merging it?

Sure, I corrected this, and it is merged to drm-misc-next.

Thanks,
Sumit.

> >
> > W/ patch:
> >
> > ~ # ./a.out
> > mmap 512MB took 200266.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 198151.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 197069.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 196781.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 198102.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 195552.000 us, verify OK
> >
> > W/o patch:
> >
> > ~ # ./a.out
> > mmap 512MB took 6987470.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 6970739.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 6984383.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 6971311.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 6991680.000 us, verify OK
>
>
> Thanks
> Barry
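The 35x figure can be checked directly from the timings quoted in this message. The sketch below simply averages the two sets of samples and divides; the function and array names are illustrative and do not come from the original micro-benchmark, whose source is not shown here.

```c
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Timings in microseconds, copied from the quoted benchmark output. */
static const double with_patch_us[] = {
    200266.0, 198151.0, 197069.0, 196781.0, 198102.0, 195552.0,
};
static const double without_patch_us[] = {
    6987470.0, 6970739.0, 6984383.0, 6971311.0, 6991680.0,
};

/* Arithmetic mean of n samples. */
static double mean_us(const double *samples, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / (double)n;
}

/* Ratio of the unpatched mean to the patched mean. */
static double mmap_speedup(void)
{
    double fast = mean_us(with_patch_us, ARRAY_LEN(with_patch_us));
    double slow = mean_us(without_patch_us, ARRAY_LEN(without_patch_us));

    return slow / fast;
}
```

The means come out to roughly 197,654 us with the patch and 6,981,117 us without it, so `mmap_speedup()` lands a little above 35, consistent with the corrected number.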

3 days, 9 hours

Re: [PATCH v9 10/11] vfio/pci: Add dma-buf export support for MMIO regions

by Jason Gunthorpe

On Thu, Nov 20, 2025 at 05:04:13PM -0700, Alex Williamson wrote:
> @@ -2501,7 +2501,7 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
>  err_undo:
>      list_for_each_entry_from_reverse(vdev, &dev_set->device_list,
>                                       vdev.dev_set_list) {
> -        if (__vfio_pci_memory_enabled(vdev))
> +        if (vdev->vdev.open_count && __vfio_pci_memory_enabled(vdev))
>              vfio_pci_dma_buf_move(vdev, false);
>          up_write(&vdev->memory_lock);
>      }
>
> Any other suggestions? This should be the only reset path with this
> nuance of affecting non-opened devices. Thanks,

Seems reasonable, but should it be in __vfio_pci_memory_enabled() just
to be robust?

Jason
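The alternative raised here, putting the check inside the helper so that every caller is covered, might look roughly like the user-space model below. It is a sketch only: `struct sketch_dev` and `sketch_memory_enabled()` are hypothetical names, and treating a never-opened device as "memory not enabled" is an assumption about the desired semantics. The thread also does not establish whether every caller of the real helper holds the lock that protects `open_count`, which is one reason the call-site check may be preferred.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the device state; not the kernel struct. */
struct sketch_dev {
    unsigned int open_count; /* number of userspace opens */
    uint8_t *vconfig;        /* allocated on open, NULL otherwise */
};

/* Helper-side guard: bail out before touching vconfig, so callers
 * do not each need their own open_count test. */
static bool sketch_memory_enabled(const struct sketch_dev *dev)
{
    uint16_t cmd;

    if (!dev->open_count || !dev->vconfig)
        return false; /* assumption: unopened means "not enabled" */

    /* Little-endian read of the PCI command register at offset 0x04. */
    cmd = (uint16_t)(dev->vconfig[0x04] | (dev->vconfig[0x05] << 8));
    return (cmd & 0x2) != 0; /* 0x2 is the memory space enable bit */
}
```

The trade-off is the usual one: the helper-side check is harder to forget, while the call-site check keeps the helper's contract ("only call this on an opened device") explicit.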

3 days, 15 hours

Re: [PATCH] drm/xe: Fix memory leak when handling pagefault vma

by Thomas Hellström

On Thu, 2025-11-20 at 18:14 +0200, Mika Kuoppala wrote:
> When the pagefault handling code was moved to a new file, an extra
> drm_exec_init() was added to the VMA path. This call is unnecessary
> because xe_validation_ctx_init() already performs a drm_exec_init(),
> resulting in a memory leak reported by kmemleak.
>
> Remove the redundant drm_exec_init() from the VMA pagefault handling
> code.
>
> Fixes: fb544b844508 ("drm/xe: Implement xe_pagefault_queue_work")
> Cc: Matthew Brost <matthew.brost(a)intel.com>
> Cc: Stuart Summers <stuart.summers(a)intel.com>
> Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>
> Cc: "Thomas Hellström" <thomas.hellstrom(a)linux.intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
> Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
> Cc: "Christian König" <christian.koenig(a)amd.com>
> Cc: intel-xe(a)lists.freedesktop.org
> Cc: linux-media(a)vger.kernel.org
> Cc: dri-devel(a)lists.freedesktop.org
> Cc: linaro-mm-sig(a)lists.linaro.org
> Signed-off-by: Mika Kuoppala <mika.kuoppala(a)linux.intel.com>

Reviewed-by: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>

> ---
>  drivers/gpu/drm/xe/xe_pagefault.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
> index fe3e40145012..afb06598b6e1 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> @@ -102,7 +102,6 @@ static int xe_pagefault_handle_vma(struct xe_gt *gt, struct xe_vma *vma,
>
>      /* Lock VM and BOs dma-resv */
>      xe_validation_ctx_init(&ctx, &vm->xe->val, &exec, (struct xe_val_flags) {});
> -    drm_exec_init(&exec, 0, 0);
>      drm_exec_until_all_locked(&exec) {
>          err = xe_pagefault_begin(&exec, vma, tile->mem.vram,
>                                   needs_vram == 1);

3 days, 23 hours

Re: [PATCH 5/9] iommufd: Allow MMIO pages in a batch

by Jason Gunthorpe

On Thu, Nov 20, 2025 at 07:59:19AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg(a)nvidia.com>
> > Sent: Saturday, November 8, 2025 12:50 AM
> >
> > +enum batch_kind {
> > +    BATCH_CPU_MEMORY = 0,
> > +    BATCH_MMIO,
> > +};
>
> with 'CPU_MEMORY' (instead of plain 'MEMORY') implies future
> support of 'DEV_MEMORY'?

Maybe, but I don't have an immediate thought on this. CXL "MMIO" that
is cachable is a thing but we can also label it as CPU_MEMORY.

We might have something for CC shared/protected memory down the road.

Thanks,
Jason

4 days

Reasonable maximum signaling timeout for dma_fences

by Christian König

Hi everybody,

we have documented here https://www.kernel.org/doc/html/latest/driver-api/dma-buf.html#dma-fence-cr… that dma_fence objects must signal in a reasonable amount of time, but at the same time note that drivers might have a different idea of what reasonable means.

Recently I realized that this is actually not a good idea. Background is that the wall clock timeout means that for example the OOM killer might actually wait for this timeout to be able to terminate a process and reclaim the memory used. And this is just an example of how general kernel features might depend on that.

Some drivers and fence implementations used 10 seconds and that raised complaints from end users. So at least amdgpu recently switched to 2 seconds, which triggered an internal discussion about it.

This patch set here now adds a define to the dma_fence header which gives 2 seconds as a reasonable amount of time. SW-sync is modified to always taint the kernel (since it doesn't have a timeout), VGEM is switched over to the new define, and the scheduler gets a warning and taints the kernel if a driver uses a timeout longer than that.

I don't have much intention of actually committing the patches (maybe except the SW-sync one), but the question is whether 2 seconds is reasonable.

Regards,
Christian.
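The scheduler-side check described above amounts to comparing a driver's timeout against a single shared bound. A minimal sketch follows, with loud assumptions: the macro name `SKETCH_FENCE_MAX_TIMEOUT_MS` and the function `sketch_timeout_too_long()` are hypothetical, since the message does not give the name or unit of the actual define, and the warning and taint steps are left out entirely.

```c
#include <stdbool.h>

/* Hypothetical name and unit; the proposal only says "2 seconds". */
#define SKETCH_FENCE_MAX_TIMEOUT_MS 2000L

/* Returns true when a driver-supplied timeout exceeds the proposed
 * bound, which is the case where the scheduler would warn. */
static bool sketch_timeout_too_long(long timeout_ms)
{
    return timeout_ms > SKETCH_FENCE_MAX_TIMEOUT_MS;
}
```

Under this model, the 10 second timeouts mentioned above would be flagged, while a driver using exactly 2 seconds would not.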

4 days

Re: [PATCH net-next v6 0/6] Add AF_XDP zero copy support

by patchwork-bot+netdevbpf＠kernel.org

Hello:

This series was applied to netdev/net-next.git (main)
by Paolo Abeni <pabeni(a)redhat.com>:

On Tue, 18 Nov 2025 19:25:36 +0530 you wrote:
> This series adds AF_XDP zero copy support to icssg driver.
>
> Tests were performed on AM64x-EVM with xdpsock application [1].
>
> A clear improvement is seen in Transmit (txonly) and receive (rxdrop)
> for 64 byte packets. 1500 byte test seems to be limited by line
> rate (1G link) so no improvement seen there in packet rate
>
> [...]

Here is the summary with links:
  - [net-next,v6,1/6] net: ti: icssg-prueth: Add functions to create and destroy Rx/Tx queues
    https://git.kernel.org/netdev/net-next/c/41dde7f1d013
  - [net-next,v6,2/6] net: ti: icssg-prueth: Add XSK pool helpers
    https://git.kernel.org/netdev/net-next/c/7dfd7597911f
  - [net-next,v6,3/6] net: ti: icssg-prueth: Add AF_XDP zero copy for TX
    https://git.kernel.org/netdev/net-next/c/8756ef2eb078
  - [net-next,v6,4/6] net: ti: icssg-prueth: Make emac_run_xdp function independent of page
    https://git.kernel.org/netdev/net-next/c/121133163c9f
  - [net-next,v6,5/6] net: ti: icssg-prueth: Add AF_XDP zero copy for RX
    https://git.kernel.org/netdev/net-next/c/7a64bb388df3
  - [net-next,v6,6/6] net: ti: icssg-prueth: Enable zero copy in XDP features
    https://git.kernel.org/netdev/net-next/c/c6a1ec1870e6

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

4 days

Re: [PATCH 02/18] dma-buf: protected fence ops by RCU v3

by Christian König

On 11/18/25 17:03, Tvrtko Ursulin wrote:
>>>> @@ -448,13 +465,19 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>>>>  static inline bool
>>>>  dma_fence_is_signaled(struct dma_fence *fence)
>>>>  {
>>>> +    const struct dma_fence_ops *ops;
>>>> +
>>>>      if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>>>          return true;
>>>> -    if (fence->ops->signaled && fence->ops->signaled(fence)) {
>>>> +    rcu_read_lock();
>>>> +    ops = rcu_dereference(fence->ops);
>>>> +    if (ops->signaled && ops->signaled(fence)) {
>>>> +        rcu_read_unlock();
>>>
>>> With the unlocked version two threads could race and one could make the fence->lock go away just around here, before the dma_fence_signal below will take it. It seems it is only safe to rcu_read_unlock before signaling if using the embedded fence (later in the series). Can you think of a downside to holding the rcu read lock to after signaling? That would make it safe I think.
>>
>> Well it's good to talk about it but I think that it is not necessary to protect the lock in this particular case.
>>
>> See the RCU protection is only for the fence->ops pointer, but the lock can be taken way after the fence is already signaled.
>>
>> That's why I came up with the patch to move the lock into the fence in the first place.
>
> Right. And you think there is nothing to gain with the option of keeping the rcu_read_unlock() to after signalling? Ie. why not plug a potential race if we can for no negative effect.

I thought quite a bit about that, but at least offhand I can't come up with a reason why we should do this. The signaling path doesn't need the RCU read side lock as far as I can see.

Regards,
Christian.

>
> Regards,
>
> Tvrtko

4 days, 1 hour

[RFC PATCH 0/2] locking/ww_mutex, dma-buf/dma-resv: Improve detection of unheld locks

by Thomas Hellström

WW mutexes and dma-resv objects, which embed them, typically have a
number of locks belonging to the same lock class. However, code using
them typically wants to verify the locking on object granularity, not
lock-class granularity.

This series adds ww_mutex functions to facilitate that (patch 1)
and utilizes these functions in the dma-resv lock checks.

Thomas Hellström (2):
  kernel/locking/ww_mutex: Add per-lock lock-check helpers
  dma-buf/dma-resv: Improve the dma-resv lockdep checks

 include/linux/dma-resv.h |  7 +++++--
 include/linux/ww_mutex.h | 18 ++++++++++++++++++
 kernel/locking/mutex.c   | 10 ++++++++++
 3 files changed, 33 insertions(+), 2 deletions(-)

--
2.51.1

4 days, 2 hours

[PATCH v8 00/11] vfio/pci: Allow MMIO regions to be exported through dma-buf

by Leon Romanovsky

Changelog:
v8:
 * Fixed spelling errors in p2pdma documentation file.
 * Added vdev->pci_ops check for NULL in vfio_pci_core_feature_dma_buf().
 * Simplified the nvgrace_get_dmabuf_phys() function.
 * Added extra check in pcim_p2pdma_provider() to catch missing call to
   pcim_p2pdma_init().
v7: https://patch.msgid.link/20251106-dmabuf-vfio-v7-0-2503bf390699@nvidia.com
 * Dropped restore_revoke flag and added vfio_pci_dma_buf_move
   to reverse loop.
 * Fixed spelling errors in documentation patch.
 * Rebased on top of v6.18-rc3.
 * Added include to stddef.h to vfio.h, to keep uapi header file independent.
v6: https://patch.msgid.link/20251102-dmabuf-vfio-v6-0-d773cff0db9f@nvidia.com
 * Fixed wrong error check from pcim_p2pdma_init().
 * Documented pcim_p2pdma_provider() function.
 * Improved commit messages.
 * Added VFIO DMA-BUF selftest, not sent yet.
 * Added __counted_by(nr_ranges) annotation to struct vfio_device_feature_dma_buf.
 * Fixed error unwind when dma_buf_fd() fails.
 * Document latest changes to p2pmem.
 * Removed EXPORT_SYMBOL_GPL from pci_p2pdma_map_type.
 * Moved DMA mapping logic to DMA-BUF.
 * Removed types patch to avoid dependencies between subsystems.
 * Moved vfio_pci_dma_buf_move() in err_undo block.
 * Added nvgrace patch.
v5: https://lore.kernel.org/all/cover.1760368250.git.leon@kernel.org
 * Rebased on top of v6.18-rc1.
 * Added more validation logic to make sure that DMA-BUF length doesn't
   overflow in various scenarios.
 * Hide kernel config from the users.
 * Fixed type conversion issue. DMA ranges are exposed with u64 length,
   but DMA-BUF uses "unsigned int" as a length for SG entries.
 * Added check to prevent from VFIO drivers which reports BAR size
   different from PCI, do not use DMA-BUF functionality.
v4: https://lore.kernel.org/all/cover.1759070796.git.leon@kernel.org
 * Split pcim_p2pdma_provider() to two functions, one that initializes
   array of providers and another to return right provider pointer.
v3: https://lore.kernel.org/all/cover.1758804980.git.leon@kernel.org
 * Changed pcim_p2pdma_enable() to be pcim_p2pdma_provider().
 * Cache provider in vfio_pci_dma_buf struct instead of BAR index.
 * Removed misleading comment from pcim_p2pdma_provider().
 * Moved MMIO check to be in pcim_p2pdma_provider().
v2: https://lore.kernel.org/all/cover.1757589589.git.leon@kernel.org/
 * Added extra patch which adds new CONFIG, so next patches can reuse
   it.
 * Squashed "PCI/P2PDMA: Remove redundant bus_offset from map state"
   into the other patch.
 * Fixed revoke calls to be aligned with true->false semantics.
 * Extended p2pdma_providers to be per-BAR and not global to whole
   device.
 * Fixed possible race between dmabuf states and revoke.
 * Moved revoke to PCI BAR zap block.
v1: https://lore.kernel.org/all/cover.1754311439.git.leon@kernel.org
 * Changed commit messages.
 * Reused DMA_ATTR_MMIO attribute.
 * Returned support for multiple DMA ranges per-dMABUF.
v0: https://lore.kernel.org/all/cover.1753274085.git.leonro@nvidia.com

---------------------------------------------------------------------------
Based on "[PATCH v6 00/16] dma-mapping: migrate to physical address-based API"
https://lore.kernel.org/all/cover.1757423202.git.leonro@nvidia.com/ series.
---------------------------------------------------------------------------

This series extends the VFIO PCI subsystem to support exporting MMIO
regions from PCI device BARs as dma-buf objects, enabling safe sharing of
non-struct page memory with controlled lifetime management. This allows RDMA
and other subsystems to import dma-buf FDs and build them into memory regions
for PCI P2P operations.

The series supports a use case for SPDK where a NVMe device will be
owned by SPDK through VFIO but interacting with a RDMA device. The RDMA
device may directly access the NVMe CMB or directly manipulate the NVMe
device's doorbell using PCI P2P.

However, as a general mechanism, it can support many other scenarios with
VFIO. This dmabuf approach can be usable by iommufd as well for generic
and safe P2P mappings.

In addition to the SPDK use-case mentioned above, the capability added
in this patch series can also be useful when a buffer (located in device
memory such as VRAM) needs to be shared between any two dGPU devices or
instances (assuming one of them is bound to VFIO PCI) as long as they
are P2P DMA compatible.

The implementation provides a revocable attachment mechanism using dma-buf
move operations. MMIO regions are normally pinned as BARs don't change
physical addresses, but access is revoked when the VFIO device is closed
or a PCI reset is issued. This ensures kernel self-defense against
potentially hostile userspace.

The series includes significant refactoring of the PCI P2PDMA subsystem
to separate core P2P functionality from memory allocation features,
making it more modular and suitable for VFIO use cases that don't need
struct page support.

-----------------------------------------------------------------------
The series is based originally on
https://lore.kernel.org/all/20250307052248.405803-1-vivek.kasireddy@intel.c…
but heavily rewritten to be based on DMA physical API.
-----------------------------------------------------------------------
The WIP branch can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=…

Thanks

---
Jason Gunthorpe (2):
      PCI/P2PDMA: Document DMABUF model
      vfio/nvgrace: Support get_dmabuf_phys

Leon Romanovsky (7):
      PCI/P2PDMA: Separate the mmap() support from the core logic
      PCI/P2PDMA: Simplify bus address mapping API
      PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation
      PCI/P2PDMA: Provide an access to pci_p2pdma_map_type() function
      dma-buf: provide phys_vec to scatter-gather mapping routine
      vfio/pci: Enable peer-to-peer DMA transactions by default
      vfio/pci: Add dma-buf export support for MMIO regions

Vivek Kasireddy (2):
      vfio: Export vfio device get and put registration helpers
      vfio/pci: Share the core device pointer while invoking feature functions

 Documentation/driver-api/pci/p2pdma.rst |  95 +++++++---
 block/blk-mq-dma.c                      |   2 +-
 drivers/dma-buf/dma-buf.c               | 235 ++++++++++++++++++++++++
 drivers/iommu/dma-iommu.c               |   4 +-
 drivers/pci/p2pdma.c                    | 186 ++++++++++++++-----
 drivers/vfio/pci/Kconfig                |   3 +
 drivers/vfio/pci/Makefile               |   1 +
 drivers/vfio/pci/nvgrace-gpu/main.c     |  56 ++++++
 drivers/vfio/pci/vfio_pci.c             |   5 +
 drivers/vfio/pci/vfio_pci_config.c      |  22 ++-
 drivers/vfio/pci/vfio_pci_core.c        |  53 ++++--
 drivers/vfio/pci/vfio_pci_dmabuf.c      | 315 ++++++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_priv.h        |  23 +++
 drivers/vfio/vfio_main.c                |   2 +
 include/linux/dma-buf.h                 |  18 ++
 include/linux/pci-p2pdma.h              | 120 +++++++-----
 include/linux/vfio.h                    |   2 +
 include/linux/vfio_pci_core.h           |  42 +++++
 include/uapi/linux/vfio.h               |  28 +++
 kernel/dma/direct.c                     |   4 +-
 mm/hmm.c                                |   2 +-
 21 files changed, 1078 insertions(+), 140 deletions(-)
---
base-commit: dcb6fa37fd7bc9c3d2b066329b0d27dedf8becaa
change-id: 20251016-dmabuf-vfio-6cef732adf5a

Best regards,
--
Leon Romanovsky <leonro(a)nvidia.com>

4 days, 2 hours
