Hi all,
This series is based on previous RFCs/discussions:
Tech topic: https://lore.kernel.org/linux-iommu/20250918214425.2677057-1-amastro@fb.com/
RFCv1: https://lore.kernel.org/all/20260226202211.929005-1-mattev@meta.com/
RFCv2: https://lore.kernel.org/kvm/20260312184613.3710705-1-mattev@meta.com/
The background/rationale is covered in more detail in the RFC cover
letters. The TL;DR is:
The goal is to enable userspace driver designs that use VFIO to export
DMABUFs representing subsets of PCI device BARs, and "vend" those
buffers from a primary process to other subordinate processes by fd.
These processes then mmap() the buffers and their access to the device
is isolated to the exported ranges. This is an improvement on sharing
the VFIO device fd with subordinate processes, which would allow them
unfettered access to the device.
This is achieved in two parts: first, by enabling mmap() of
vfio-pci DMABUFs; second, a
new ioctl()-based revocation mechanism is added to allow the primary
process to forcibly revoke access to previously-shared BAR spans, even
if the subordinate processes haven't cleanly exited.
(The related topic of safe delegation of iommufd control to the
subordinate processes is not addressed here, and is follow-up work.)
As well as isolation and revocation, another advantage to accessing a
BAR through a VMA backed by a DMABUF is that it's straightforward to
create the buffer with access attributes, such as write-combining.
Notes on patches
================
Feedback on the RFCs requested that, instead of creating
DMABUF-specific vm_ops and .fault paths, we go the whole way and
migrate the existing VFIO PCI BAR mmap() to be backed by a DMABUF too,
resulting in a common vm_ops and fault handler for mmap()s of both the
VFIO device and explicitly-exported DMABUFs. This has been done for
vfio-pci, but not sub-drivers (nvgrace-gpu's special-case mappings are
unchanged).
vfio/pci: Fix vfio_pci_dma_buf_cleanup() double-put
A bug fix in a related area, whose context is a dependency for
later patches.
vfio/pci: Add a helper to look up PFNs for DMABUFs
vfio/pci: Add a helper to create a DMABUF for a BAR-map VMA
The first is for a DMABUF VMA fault handler to determine
arbitrary-sized PFNs from ranges in a DMABUF. Second, refactor
DMABUF export for use by the existing export feature and a new
helper that creates a DMABUF corresponding to a VFIO BAR mmap()
request.
vfio/pci: Convert BAR mmap() to use a DMABUF
The vfio-pci core mmap() creates a DMABUF with the helper, and the
vm_ops fault handler uses the other helper to resolve the fault.
Because this depends on DMABUF structs/code, CONFIG_VFIO_PCI_CORE
needs to depend on CONFIG_DMA_SHARED_BUFFER. CONFIG_VFIO_PCI_DMABUF
still conditionally enables the export
support code.
NOTE: The user mmap()s a device fd, but the resulting VMA's vm_file
becomes that of the DMABUF which takes ownership of the device and
puts it on release. This maintains the existing behaviour of a VMA
keeping the VFIO device open.
BAR zapping then happens via the existing vfio_pci_dma_buf_move()
path, which now needs to unmap PTEs in the DMABUF's address_space.
vfio/pci: Provide a user-facing name for BAR mappings
There was a request for decent debug naming in /proc/<pid>/maps
etc. comparable to the existing VFIO names: since the VMAs are
DMABUFs, they have a "dmabuf:" prefix and can't be 100% identical
to before. This is a user-visible change, but this patch at least
now gives us extra info on the BDF & BAR being mapped.
vfio/pci: Clean up BAR zap and revocation
In general (see NOTE!), vfio_pci_zap_bars() is now obsolete: it
unmaps PTEs in the VFIO device address_space, which is now unused.
This patch consolidates all calls (e.g. around reset) with the
neighbouring vfio_pci_dma_buf_move()s into new functions, to
revoke-zap/unrevoke.
NOTE: the nvgrace-gpu driver continues to use its own private
vm_ops, fault handler, etc. for its special memregions, and these
DO still add PTEs to the VFIO device address_space. So, a
temporary flag, vdev->bar_needs_zap, maintains the old behaviour
for this use. At least this patch's consolidation makes it easy
to remove the remaining zap when this need goes away.
A FIXME is added: if nvgrace-gpu is converted to DMABUFs, remove
the flag and final zap.
vfio/pci: Support mmap() of a VFIO DMABUF
Adds mmap() for a DMABUF fd exported from vfio-pci.
It was a goal to keep the VFIO device fd lifetime behaviour
unchanged with respect to the DMABUFs. An application can close
all device fds, and this will revoke/clean up all DMABUFs; no
mappings or other access can be performed afterwards. When enabling
mmap() of the DMABUFs, this means access through the VMA is also
revoked. This complicates the fault handler because whilst the
DMABUF exists, it has no guarantee that the corresponding VFIO
device is still alive. Adds synchronisation ensuring the vdev is
available before vdev->memory_lock is touched.
(I decided against the alternative of preventing cleanup by holding
the VFIO device open if any DMABUFs exist, because it's both a
change of behaviour and less clean overall.)
I've added a chonky comment in place, happy to clarify more if you
have ideas.
vfio/pci: Permanently revoke a DMABUF on request
By weight, this is mostly a rename of the 'revoked' flag to an
enum, 'status'. A buffer now has three states: usable, temporarily
revoked, and permanently revoked. A new VFIO device ioctl is added,
VFIO_DEVICE_PCI_DMABUF_REVOKE, which passes a DMABUF (exported from
that device) and permanently revokes it. Thus a userspace driver
can guarantee any downstream consumers of a shared fd are prevented
from accessing a BAR range, and that range can be reused.
The code doing revocation in vfio_pci_dma_buf_move() is moved,
unchanged, to a common function for use by _move() and the new
ioctl path.
Q: I can't think of a good reason to temporarily revoke/unrevoke
buffers from userspace, so didn't add a 'flags' field to the ioctl
struct. Easy to add if people think it's worthwhile for future
use.
vfio/pci: Add mmap() attributes to DMABUF feature
Reserves bits [31:28] in vfio_device_feature_dma_buf to allow a
(CPU) mapping attribute to be specified for an exported set of
ranges. The default is the current UC, and a new flag can specify
CPU access as WC.
Q: I've taken 4 bits; the intention is for this field to be a
scalar not a bitmap (i.e. mutually-exclusive access properties).
Perhaps 4 is a bit too many?
Testing
=======
(The [RFC ONLY] userspace test program, for QEMU edu-plus, has been
dropped, but can be found in the GitHub branch below.)
This code has been tested with DMABUFs mapping single/multiple
ranges, aliasing mmap()s, aliasing ranges across DMABUFs, vm_pgoff >
0, revocation, shutdown/cleanup scenarios, and hugepage mappings;
all appear to work correctly. I've also lightly tested WC mappings
(by observing that the resulting PTEs have the correct attributes).
No regressions observed in the VFIO selftests or in our internal
vfio-pci applications.
End
===
This is based on -next (next-20260414 but will merge earlier), as it
depends on Leon's series "vfio: Wait for dma-buf invalidation to
complete":
https://lore.kernel.org/linux-iommu/20260205-nocturnal-poetic-chamois-f566a…
These commits are on GitHub, along with "[RFC ONLY] selftests: vfio: Add
standalone vfio_dmabuf_mmap_test":
https://github.com/metamev/linux/compare/next-20260414...metamev:linux:dev/…
Thanks for reading,
Matt
================================================================================
Change log:
v1:
- Cleanup of the common DMABUF-aware VMA vm_ops fault handler and
export code.
- Fixed a lot of races, particularly faults racing with DMABUF
cleanup (if the VFIO device fds close, for example).
- Added nicer human-readable names for VFIO mmap() VMAs
RFCv2: Respin based on the feedback/suggestions:
https://lore.kernel.org/kvm/20260312184613.3710705-1-mattev@meta.com/
- Transform the existing VFIO BAR mmap path to also use DMABUFs
behind the scenes, and then simply share that code for
explicitly-mapped DMABUFs. Jason wanted to go that direction to
enable iommufd VFIO type 1 emulation to pick up a DMABUF for an IO
mapping.
- Revoke buffers using a VFIO device fd ioctl
RFCv1:
https://lore.kernel.org/all/20260226202211.929005-1-mattev@meta.com/
Matt Evans (9):
vfio/pci: Fix vfio_pci_dma_buf_cleanup() double-put
vfio/pci: Add a helper to look up PFNs for DMABUFs
vfio/pci: Add a helper to create a DMABUF for a BAR-map VMA
vfio/pci: Convert BAR mmap() to use a DMABUF
vfio/pci: Provide a user-facing name for BAR mappings
vfio/pci: Clean up BAR zap and revocation
vfio/pci: Support mmap() of a VFIO DMABUF
vfio/pci: Permanently revoke a DMABUF on request
vfio/pci: Add mmap() attributes to DMABUF feature
drivers/vfio/pci/Kconfig | 3 +-
drivers/vfio/pci/Makefile | 3 +-
drivers/vfio/pci/nvgrace-gpu/main.c | 5 +
drivers/vfio/pci/vfio_pci_config.c | 30 +-
drivers/vfio/pci/vfio_pci_core.c | 224 ++++++++++---
drivers/vfio/pci/vfio_pci_dmabuf.c | 500 +++++++++++++++++++++++-----
drivers/vfio/pci/vfio_pci_priv.h | 49 ++-
include/linux/vfio_pci_core.h | 1 +
include/uapi/linux/vfio.h | 42 ++-
9 files changed, 690 insertions(+), 167 deletions(-)
--
2.47.3
Hi Boris,
On Tue, May 05, 2026 at 05:20:48PM +0200, Boris Brezillon wrote:
> Hi Ketil,
>
> On Tue, 5 May 2026 16:05:07 +0200
> Ketil Johnsen <ketil.johnsen(a)arm.com> wrote:
>
> > From: John Stultz <jstultz(a)google.com>
> >
> > Add proper reference counting on the dma_heap structure. While
> > existing heaps are built-in, we may eventually have heaps loaded
> > from modules, and we'll need to be able to properly handle the
> > references to the heaps
>
> It's weird that this "heap as module" thing is mentioned here, but
> actual robustness to make this safe is not added in the commit or any
> of the following ones.
>
> >
> > Signed-off-by: John Stultz <jstultz(a)google.com>
> > Signed-off-by: T.J. Mercier <tjmercier(a)google.com>
> > Signed-off-by: Yong Wu <yong.wu(a)mediatek.com>
> > [Yong: Just add comment for "minor" and "refcount"]
> > Signed-off-by: Yunfei Dong <yunfei.dong(a)mediatek.com>
> > [Yunfei: Change reviewer's comments]
> > Signed-off-by: Florent Tomasin <florent.tomasin(a)arm.com>
> > [Florent: Rebase]
> > Signed-off-by: Ketil Johnsen <ketil.johnsen(a)arm.com>
> > [Ketil: Rebase]
> > ---
> > drivers/dma-buf/dma-heap.c | 29 +++++++++++++++++++++++++++++
> > include/linux/dma-heap.h | 2 ++
> > 2 files changed, 31 insertions(+)
> >
> > diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> > index ac5f8685a6494..9fd365ddbd517 100644
> > --- a/drivers/dma-buf/dma-heap.c
> > +++ b/drivers/dma-buf/dma-heap.c
> > @@ -12,6 +12,7 @@
> > #include <linux/dma-heap.h>
> > #include <linux/err.h>
> > #include <linux/export.h>
> > +#include <linux/kref.h>
> > #include <linux/list.h>
> > #include <linux/nospec.h>
> > #include <linux/syscalls.h>
> > @@ -31,6 +32,7 @@
> > * @heap_devt: heap device node
> > * @list: list head connecting to list of heaps
> > * @heap_cdev: heap char device
> > + * @refcount: reference counter for this heap device
> > *
> > * Represents a heap of memory from which buffers can be made.
> > */
> > @@ -41,6 +43,7 @@ struct dma_heap {
> > dev_t heap_devt;
> > struct list_head list;
> > struct cdev heap_cdev;
> > + struct kref refcount;
> > };
> >
> > static LIST_HEAD(heap_list);
> > @@ -248,6 +251,7 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
> > if (!heap)
> > return ERR_PTR(-ENOMEM);
> >
> > + kref_init(&heap->refcount);
> > heap->name = exp_info->name;
> > heap->ops = exp_info->ops;
> > heap->priv = exp_info->priv;
> > @@ -313,6 +317,31 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
> > }
> > EXPORT_SYMBOL_NS_GPL(dma_heap_add, "DMA_BUF_HEAP");
> >
> > +static void dma_heap_release(struct kref *ref)
> > +{
> > + struct dma_heap *heap = container_of(ref, struct dma_heap, refcount);
> > + unsigned int minor = MINOR(heap->heap_devt);
> > +
> > + mutex_lock(&heap_list_lock);
> > + list_del(&heap->list);
> > + mutex_unlock(&heap_list_lock);
> > +
> > + device_destroy(dma_heap_class, heap->heap_devt);
> > + cdev_del(&heap->heap_cdev);
> > + xa_erase(&dma_heap_minors, minor);
> > +
> > + kfree(heap);
>
> That's actually problematic, because cdev_del() doesn't guarantee that
> all opened FDs have been closed [1], it just guarantees that no new ones
> can materialize. In order to make that safe, we'd need a
>
> 1. kref_get_unless_zero() in dma_heap_open(), with proper locking around
> the xa_load() to protect against the heap removal that's happening
> here
> 2. a dma_heap_put() in a new dma_heap_close() implementation
> 3. a guarantee that heap implementations won't go away until the last
> ref is dropped, which means ops and all the data needed for this heap
> to satisfy ioctl()s (and more generally every passed at
> dma_heap_add() time) have to stay valid until the last ref is
> dropped. Alternatively, we could restrict this only to in-flight
> ioctl()s, and have the ops replaced by some dummy ops using RCU or a
> rwlock. But I guess live dmabufs allocated on this heap have to
> retain the heap and its implementation anyway.
>
> For record, #3 is already not satisfied by the current tee_heap
> implementation (tee_dma_heap objects can vanish before the dma_heap
> object is gone). The other implementations seem to be fine because they
> are statically linked, and they either have exp_info.priv set to NULL,
> or something that's never released.
That statement won't hold for long, see:
https://lore.kernel.org/r/20260427-dma-buf-heaps-as-modules-v5-0-b6f5678fee…
However, all upstream heaps can be loaded as modules, but not unloaded.
So once you get a reference to one, you can assume it will live forever.
That's why we didn't merge that patch before, even though it was discussed:
https://lore.kernel.org/all/CANDhNCqk9Uk4aXHhUsL4hR1GHNmWZnH3C9Np-A02wdi+J3…
Maxime
On Thu, Apr 30, 2026 at 9:15 PM Barry Song <baohua(a)kernel.org> wrote:
>
> On Wed, Apr 22, 2026 at 3:10 PM Christian König
> <christian.koenig(a)amd.com> wrote:
> >
> > On 4/7/26 13:29, Barry Song wrote:
> > > On Tue, Apr 7, 2026 at 3:58 PM Christian König <christian.koenig(a)amd.com> wrote:
> > >>
> > >> On 4/6/26 23:49, Barry Song (Xiaomi) wrote:
> > >>> From: Xueyuan Chen <Xueyuan.chen21(a)gmail.com>
> > >>>
> > >>> Replace the heavy for_each_sgtable_page() iterator in system_heap_do_vmap()
> > >>> with a more efficient nested loop approach.
> > >>>
> > >>> Instead of iterating page by page, we now iterate through the scatterlist
> > >>> entries via for_each_sgtable_sg(). Because pages within a single sg entry
> > >>> are physically contiguous, we can populate the page array in an
> > >>> inner loop using simple pointer math. This saves a lot of time.
> > >>>
> > >>> The WARN_ON check is also pulled out of the loop to save branch
> > >>> instructions.
> > >>>
> > >>> Performance results mapping a 2GB buffer on Radxa O6:
> > >>> - Before: ~1440000 ns
> > >>> - After: ~232000 ns
> > >>> (~84% reduction in iteration time, or ~6.2x faster)
> > >>
> > >> Well real question is why do you care about the vmap performance?
> > >>
> > >> That should basically only be used for fbdev emulation (except for VMGFX) and we absolutely don't care about performance there.
> > >
> > > I agree that in mainline, dma_buf_vmap is not used very often.
> > > Here’s what I was able to find:
> > >
> > > 1 1638 drivers/dma-buf/dma-buf.c <<dma_buf_vmap_unlocked>>
> > > ret = dma_buf_vmap(dmabuf, map);
> > > 2 376 drivers/gpu/drm/drm_gem_shmem_helper.c
> > > <<drm_gem_shmem_vmap_locked>>
> > > ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
> > > 3 85 drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> > > <<etnaviv_gem_prime_vmap_impl>>
> > > ret = dma_buf_vmap(etnaviv_obj->base.import_attach->dmabuf, &map);
> > > 4 433 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c <<map_external>>
> > > ret = dma_buf_vmap(bo->tbo.base.dma_buf, map);
> > > 5 88 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c <<vmw_gem_vmap>>
> > > ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
> > >
> > > However, in the Android ecosystem, system_heap and similar heaps
> > > are widely used across camera, NPU, and media drivers. Many of these
> > > drivers are not in mainline but do use vmap() in real code paths.
> >
> > Well out of tree drivers are not a justification to make an upstream changes.
> >
> > Apart from a handful of workarounds which need to CPU access as fallback DMA-buf vmap is only used to provide fb dev emulation.
> >
> > The vmap interface has already given us quite a headache in the first place and there are a couple of unresolved problems regarding synchronization and coherency.
> >
> > When a driver would be pushed upstream which makes so frequent use of the dma_buf_vmap function that it matters for the performance I think there would be push back on that and the driver developer would require a very good explanation why that is necessary.
> >
> > So for now I have to reject that patch.
>
> Well, it doesn’t seem to increase complexity, and the code is quite easy
> to understand.
I agree with this. This change introduces basically no downsides for
upstream, even if it primarily benefits a rare use case. Since
dma_buf_vmap is exported for driver use, why not enhance the
performance for all callers?
-T.J.
> It would be great if the community could be more welcoming
> to developers who are just getting involved, rather than discouraging them.
>
> Apparently, no one can control whether the source code of those kernel
> modules will be upstreamed except the vendors themselves, but products
> can still benefit from the common kernel.
>
> Best Regards
> Barry
On Tue, May 5, 2026 at 6:00 AM Julian Orth <ju.orth(a)gmail.com> wrote:
>
> On Tue, May 5, 2026 at 2:41 PM Christian König <christian.koenig(a)amd.com> wrote:
> >
> > Hi Julian,
> >
> > On 5/5/26 14:25, Julian Orth wrote:
> > > In ab4c3dcf9a71582503b4fb25aeab884c696cab25 ("dma-buf: Remove DMA-BUF
> > > sysfs stats") the /sys/kernel/dmabuf/buffer directory was removed.
> > >
> > > I've been using this interface, specifically the exporter_name file,
> > > to detect dmabufs created via udmabuf. Such dmabufs show "udmabuf" in
> > > exporter_name. I've been doing this for two reasons: 1) to detect that
> > > mmap on such buffers will be fast and 2) to detect that GPU access to
> > > such buffers will be slow.
> >
> > Crap, I really hoped that Android was the only user of that sysfs interface since that approach turned out to be quite broken.
> >
> > It's number one rule on Linux that we don't break userspace. So I hope that you don't insist on bringing that interface back, but if you do I will just revert the removal until we found a better solution.
>
> Bringing it back shouldn't be necessary.
>
> >
> > > With the removal of that file, that detection mechanism no longer works.
> > >
> > > I'm not particularly fond of that mechanism but it was the only one
> > > providing that functionality that I could find at the time. If there
> > > is another one, ideally an ioctl on the dmabuf, please let me know.
> >
> > The virtual fdinfo file you can find under /proc/$pid/fdinfo/$fd also contains the exporter name for the DMA-buf.
> >
> > You can find the full documentation here: https://docs.kernel.org/filesystems/proc.html#dma-buffer-files
> >
> > Is that sufficient?
>
> I think that is sufficient. I probably didn't use fdinfo initially
> because 1) it's a lot more work to parse and 2) I wasn't sure if it
> was intended to be machine-readable or if there could sometimes be
> newlines in the values and such.
>
> >
> > Additional to that the debugfs for DMA-buf also contains that information and I'm open to the suggestion with the IOCTL.
>
> My application runs as a regular user so it cannot access /sys/kernel/debug.
>
> Having an IOCTL would be ideal if it is not too much work. I'll fall
> back to fdinfo for now.
>
> Thanks, Julian
Phew, I'm glad fdinfo suits your needs.
Adding an ioctl would introduce new UAPI so I think we'd want to avoid
that unless absolutely necessary.
Thanks,
T.J.
> >
> > Regards,
> > Christian.
> >
> > >
> > > Shipping an entire BPF compiler in my application, which the original
> > > patch suggests as the replacement, is not an option when the removed
> > > alternative was simply reading a file.
> > >
> > > Thanks, Julian
> >
Hi Julian,
On 5/5/26 14:25, Julian Orth wrote:
> In ab4c3dcf9a71582503b4fb25aeab884c696cab25 ("dma-buf: Remove DMA-BUF
> sysfs stats") the /sys/kernel/dmabuf/buffer directory was removed.
>
> I've been using this interface, specifically the exporter_name file,
> to detect dmabufs created via udmabuf. Such dmabufs show "udmabuf" in
> exporter_name. I've been doing this for two reasons: 1) to detect that
> mmap on such buffers will be fast and 2) to detect that GPU access to
> such buffers will be slow.
Crap, I really hoped that Android was the only user of that sysfs interface since that approach turned out to be quite broken.
It's number one rule on Linux that we don't break userspace. So I hope that you don't insist on bringing that interface back, but if you do I will just revert the removal until we found a better solution.
> With the removal of that file, that detection mechanism no longer works.
>
> I'm not particularly fond of that mechanism but it was the only one
> providing that functionality that I could find at the time. If there
> is another one, ideally an ioctl on the dmabuf, please let me know.
The virtual fdinfo file you can find under /proc/$pid/fdinfo/$fd also contains the exporter name for the DMA-buf.
You can find the full documentation here: https://docs.kernel.org/filesystems/proc.html#dma-buffer-files
Is that sufficient?
Additional to that the debugfs for DMA-buf also contains that information and I'm open to the suggestion with the IOCTL.
Regards,
Christian.
>
> Shipping an entire BPF compiler in my application, which the original
> patch suggests as the replacement, is not an option when the removed
> alternative was simply reading a file.
>
> Thanks, Julian
Removing the signal-on-any feature allows us to simplify the
dma_fence_array code a lot and saves us from the need to install a
callback on all fences at the same time.
This results in less memory and CPU overhead.
v2: fix potential double locking pointed out by Tvrtko
Signed-off-by: Christian König <christian.koenig(a)amd.com>
---
drivers/dma-buf/dma-fence-array.c | 134 +++++++++++++-----------------
drivers/gpu/drm/xe/xe_vm.c | 2 +-
include/linux/dma-fence-array.h | 22 ++---
3 files changed, 66 insertions(+), 92 deletions(-)
diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
index 5e10e8df372f..8b94c6287482 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -42,97 +42,88 @@ static void dma_fence_array_clear_pending_error(struct dma_fence_array *array)
cmpxchg(&array->base.error, PENDING_ERROR, 0);
}
-static void irq_dma_fence_array_work(struct irq_work *wrk)
+static void dma_fence_array_cb_func(struct dma_fence *f,
+ struct dma_fence_cb *cb)
{
- struct dma_fence_array *array = container_of(wrk, typeof(*array), work);
-
- dma_fence_array_clear_pending_error(array);
+ struct dma_fence_array *array =
+ container_of(cb, struct dma_fence_array, callback);
- dma_fence_signal(&array->base);
- dma_fence_put(&array->base);
+ irq_work_queue(&array->work);
}
-static void dma_fence_array_cb_func(struct dma_fence *f,
- struct dma_fence_cb *cb)
+static bool dma_fence_array_try_add_cb(struct dma_fence_array *array)
{
- struct dma_fence_array_cb *array_cb =
- container_of(cb, struct dma_fence_array_cb, cb);
- struct dma_fence_array *array = array_cb->array;
+ while (array->num_pending) {
+ struct dma_fence *f = array->fences[array->num_pending - 1];
- dma_fence_array_set_pending_error(array, f->error);
+ if (!dma_fence_add_callback(f, &array->callback,
+ dma_fence_array_cb_func))
+ return true;
- if (atomic_dec_and_test(&array->num_pending))
- irq_work_queue(&array->work);
- else
+ dma_fence_array_set_pending_error(array, f->error);
+ --array->num_pending;
+ }
+ return false;
+}
+
+static void dma_fence_array_irq_work(struct irq_work *wrk)
+{
+ struct dma_fence_array *array = container_of(wrk, typeof(*array), work);
+
+ --array->num_pending;
+ if (!dma_fence_array_try_add_cb(array)) {
+ dma_fence_signal(&array->base);
dma_fence_put(&array->base);
+ }
}
static bool dma_fence_array_enable_signaling(struct dma_fence *fence)
{
struct dma_fence_array *array = to_dma_fence_array(fence);
- struct dma_fence_array_cb *cb = array->callbacks;
- unsigned i;
- for (i = 0; i < array->num_fences; ++i) {
- cb[i].array = array;
+ /*
+ * As we may report that the fence is signaled before all
+ * callbacks are complete, we need to take an additional
+ * reference count on the array so that we do not free it too
+ * early. The core fence handling will only hold the reference
+ * until we signal the array as complete (but that is now
+ * insufficient).
+ */
+ dma_fence_get(&array->base);
+ if (!dma_fence_array_try_add_cb(array)) {
/*
- * As we may report that the fence is signaled before all
- * callbacks are complete, we need to take an additional
- * reference count on the array so that we do not free it too
- * early. The core fence handling will only hold the reference
- * until we signal the array as complete (but that is now
- * insufficient).
+ * When all fences are already signaled we can drop the reference again
+ * and report to the caller that the array can be signaled as well.
*/
- dma_fence_get(&array->base);
- if (dma_fence_add_callback(array->fences[i], &cb[i].cb,
- dma_fence_array_cb_func)) {
- int error = array->fences[i]->error;
-
- dma_fence_array_set_pending_error(array, error);
- dma_fence_put(&array->base);
- if (atomic_dec_and_test(&array->num_pending)) {
- dma_fence_array_clear_pending_error(array);
- return false;
- }
- }
+ dma_fence_put(&array->base);
+ return false;
}
-
return true;
}
static bool dma_fence_array_signaled(struct dma_fence *fence)
{
struct dma_fence_array *array = to_dma_fence_array(fence);
- int num_pending;
+ int num_pending, error = 0;
unsigned int i;
/*
- * We need to read num_pending before checking the enable_signal bit
- * to avoid racing with the enable_signaling() implementation, which
- * might decrement the counter, and cause a partial check.
- * atomic_read_acquire() pairs with atomic_dec_and_test() in
- * dma_fence_array_enable_signaling()
- *
- * The !--num_pending check is here to account for the any_signaled case
- * if we race with enable_signaling(), that means the !num_pending check
- * in the is_signalling_enabled branch might be outdated (num_pending
- * might have been decremented), but that's fine. The user will get the
- * right value when testing again later.
+ * Reading num_pending without a memory barrier here is correct since
+ * that is only for optimization, it is perfectly acceptable to have a
+ * stale value for it. In all other cases num_pending is accessed by a
+ * single call chain.
*/
- num_pending = atomic_read_acquire(&array->num_pending);
- if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &array->base.flags)) {
- if (num_pending <= 0)
- goto signal;
- return false;
- }
+ num_pending = READ_ONCE(array->num_pending);
+ for (i = 0; i < num_pending; ++i) {
+ struct dma_fence *f = array->fences[i];
- for (i = 0; i < array->num_fences; ++i) {
- if (dma_fence_is_signaled(array->fences[i]) && !--num_pending)
- goto signal;
- }
- return false;
+ if (!dma_fence_is_signaled(f))
+ return false;
-signal:
+ if (!error)
+ error = f->error;
+ }
+ dma_fence_array_set_pending_error(array, error);
dma_fence_array_clear_pending_error(array);
return true;
}
@@ -171,15 +162,12 @@ EXPORT_SYMBOL(dma_fence_array_ops);
/**
* dma_fence_array_alloc - Allocate a custom fence array
- * @num_fences: [in] number of fences to add in the array
*
* Return dma fence array on success, NULL on failure
*/
-struct dma_fence_array *dma_fence_array_alloc(int num_fences)
+struct dma_fence_array *dma_fence_array_alloc(void)
{
- struct dma_fence_array *array;
-
- return kzalloc_flex(*array, callbacks, num_fences);
+ return kzalloc_obj(struct dma_fence_array);
}
EXPORT_SYMBOL(dma_fence_array_alloc);
@@ -203,10 +191,13 @@ void dma_fence_array_init(struct dma_fence_array *array,
WARN_ON(!num_fences || !fences);
array->num_fences = num_fences;
+ array->num_pending = num_fences;
+ array->fences = fences;
+ array->base.error = PENDING_ERROR;
dma_fence_init(&array->base, &dma_fence_array_ops, NULL, context,
seqno);
- init_irq_work(&array->work, irq_dma_fence_array_work);
+ init_irq_work(&array->work, dma_fence_array_irq_work);
/*
* dma_fence_array_enable_signaling() is invoked while holding
@@ -220,11 +211,6 @@ void dma_fence_array_init(struct dma_fence_array *array,
*/
lockdep_set_class(&array->base.inline_lock, &dma_fence_array_lock_key);
- atomic_set(&array->num_pending, num_fences);
- array->fences = fences;
-
- array->base.error = PENDING_ERROR;
-
/*
* dma_fence_array objects should never contain any other fence
* containers or otherwise we run into recursion and potential kernel
@@ -265,7 +251,7 @@ struct dma_fence_array *dma_fence_array_create(int num_fences,
{
struct dma_fence_array *array;
- array = dma_fence_array_alloc(num_fences);
+ array = dma_fence_array_alloc();
if (!array)
return NULL;
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 62a87a051be7..8f472911469d 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -3370,7 +3370,7 @@ static struct dma_fence *ops_execute(struct xe_vm *vm,
goto err_trace;
}
- cf = dma_fence_array_alloc(n_fence);
+ cf = dma_fence_array_alloc();
if (!cf) {
fence = ERR_PTR(-ENOMEM);
goto err_out;
diff --git a/include/linux/dma-fence-array.h b/include/linux/dma-fence-array.h
index 1b1d87579c38..3ee55c0e2fa4 100644
--- a/include/linux/dma-fence-array.h
+++ b/include/linux/dma-fence-array.h
@@ -15,16 +15,6 @@
#include <linux/dma-fence.h>
#include <linux/irq_work.h>
-/**
- * struct dma_fence_array_cb - callback helper for fence array
- * @cb: fence callback structure for signaling
- * @array: reference to the parent fence array object
- */
-struct dma_fence_array_cb {
- struct dma_fence_cb cb;
- struct dma_fence_array *array;
-};
-
/**
* struct dma_fence_array - fence to represent an array of fences
* @base: fence base class
@@ -33,18 +23,17 @@ struct dma_fence_array_cb {
* @num_pending: fences in the array still pending
* @fences: array of the fences
* @work: internal irq_work function
- * @callbacks: array of callback helpers
+ * @callback: callback structure for signaling
*/
struct dma_fence_array {
struct dma_fence base;
- unsigned num_fences;
- atomic_t num_pending;
+ unsigned int num_fences;
+ unsigned int num_pending;
struct dma_fence **fences;
struct irq_work work;
-
- struct dma_fence_array_cb callbacks[] __counted_by(num_fences);
+ struct dma_fence_cb callback;
};
/**
@@ -78,11 +67,10 @@ to_dma_fence_array(struct dma_fence *fence)
for (index = 0, fence = dma_fence_array_first(head); fence; \
++(index), fence = dma_fence_array_next(head, index))
-struct dma_fence_array *dma_fence_array_alloc(int num_fences);
+struct dma_fence_array *dma_fence_array_alloc(void);
void dma_fence_array_init(struct dma_fence_array *array,
int num_fences, struct dma_fence **fences,
u64 context, unsigned seqno);
-
struct dma_fence_array *dma_fence_array_create(int num_fences,
struct dma_fence **fences,
u64 context, unsigned seqno);
--
2.43.0