Linaro-mm-sig September 2025

linaro-mm-sig@lists.linaro.org

22 participants
40 discussions

[PATCH v3 00/10] vfio/pci: Allow MMIO regions to be exported through dma-buf

by Leon Romanovsky

Changelog:
v3:
 * Changed pcim_p2pdma_enable() to be pcim_p2pdma_provider().
 * Cache provider in vfio_pci_dma_buf struct instead of BAR index.
 * Removed misleading comment from pcim_p2pdma_provider().
 * Moved MMIO check to be in pcim_p2pdma_provider().
v2: https://lore.kernel.org/all/cover.1757589589.git.leon@kernel.org/
 * Added extra patch which adds new CONFIG, so next patches can reuse it.
 * Squashed "PCI/P2PDMA: Remove redundant bus_offset from map state"
   into the other patch.
 * Fixed revoke calls to be aligned with true->false semantics.
 * Extended p2pdma_providers to be per-BAR and not global to whole device.
 * Fixed possible race between dmabuf states and revoke.
 * Moved revoke to PCI BAR zap block.
v1: https://lore.kernel.org/all/cover.1754311439.git.leon@kernel.org
 * Changed commit messages.
 * Reused DMA_ATTR_MMIO attribute.
 * Returned support for multiple DMA ranges per-dMABUF.
v0: https://lore.kernel.org/all/cover.1753274085.git.leonro@nvidia.com

---------------------------------------------------------------------------
Based on "[PATCH v6 00/16] dma-mapping: migrate to physical address-based API"
https://lore.kernel.org/all/cover.1757423202.git.leonro@nvidia.com/ series.
---------------------------------------------------------------------------

This series extends the VFIO PCI subsystem to support exporting MMIO
regions from PCI device BARs as dma-buf objects, enabling safe sharing of
non-struct page memory with controlled lifetime management. This allows RDMA
and other subsystems to import dma-buf FDs and build them into memory regions
for PCI P2P operations.

The series supports a use case for SPDK where a NVMe device will be
owned by SPDK through VFIO but interacting with a RDMA device. The RDMA
device may directly access the NVMe CMB or directly manipulate the NVMe
device's doorbell using PCI P2P.

However, as a general mechanism, it can support many other scenarios with
VFIO. This dmabuf approach can be usable by iommufd as well for generic
and safe P2P mappings.

In addition to the SPDK use-case mentioned above, the capability added
in this patch series can also be useful when a buffer (located in device
memory such as VRAM) needs to be shared between any two dGPU devices or
instances (assuming one of them is bound to VFIO PCI) as long as they
are P2P DMA compatible.

The implementation provides a revocable attachment mechanism using dma-buf
move operations. MMIO regions are normally pinned as BARs don't change
physical addresses, but access is revoked when the VFIO device is closed
or a PCI reset is issued. This ensures kernel self-defense against
potentially hostile userspace.

The series includes significant refactoring of the PCI P2PDMA subsystem
to separate core P2P functionality from memory allocation features,
making it more modular and suitable for VFIO use cases that don't need
struct page support.

-----------------------------------------------------------------------
The series is based originally on
https://lore.kernel.org/all/20250307052248.405803-1-vivek.kasireddy@intel.c…
but heavily rewritten to be based on DMA physical API.
-----------------------------------------------------------------------
The WIP branch can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=…

Thanks

Leon Romanovsky (8):
  PCI/P2PDMA: Separate the mmap() support from the core logic
  PCI/P2PDMA: Simplify bus address mapping API
  PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation
  PCI/P2PDMA: Export pci_p2pdma_map_type() function
  types: move phys_vec definition to common header
  vfio/pci: Add dma-buf export config for MMIO regions
  vfio/pci: Enable peer-to-peer DMA transactions by default
  vfio/pci: Add dma-buf export support for MMIO regions

Vivek Kasireddy (2):
  vfio: Export vfio device get and put registration helpers
  vfio/pci: Share the core device pointer while invoking feature functions

 block/blk-mq-dma.c                 |   7 +-
 drivers/iommu/dma-iommu.c          |   4 +-
 drivers/pci/p2pdma.c               | 176 +++++++++----
 drivers/vfio/pci/Kconfig           |  20 ++
 drivers/vfio/pci/Makefile          |   2 +
 drivers/vfio/pci/vfio_pci_config.c |  22 +-
 drivers/vfio/pci/vfio_pci_core.c   |  58 +++--
 drivers/vfio/pci/vfio_pci_dmabuf.c | 394 +++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_priv.h   |  23 ++
 drivers/vfio/vfio_main.c           |   2 +
 include/linux/pci-p2pdma.h         | 115 +++++----
 include/linux/types.h              |   5 +
 include/linux/vfio.h               |   2 +
 include/linux/vfio_pci_core.h      |   4 +
 include/uapi/linux/vfio.h          |  25 ++
 kernel/dma/direct.c                |   4 +-
 mm/hmm.c                           |   2 +-
 17 files changed, 741 insertions(+), 124 deletions(-)
 create mode 100644 drivers/vfio/pci/vfio_pci_dmabuf.c

-- 
2.51.0

2 months

[PATCH v4] Documentation: dma-buf: heaps: Add naming guidelines

by Maxime Ripard

We've discussed a number of times of how some heap names are bad, but
not really what makes a good heap name.

Let's document what we expect the heap names to look like.

Reviewed-by: Andrew Davis <afd(a)ti.com>
Reviewed-by: Bagas Sanjaya <bagasdotme(a)gmail.com>
Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Changes in v4:
- Dropped *all* the cacheable mentions
- Link to v3: https://lore.kernel.org/r/20250717-dma-buf-heap-names-doc-v3-1-d2dbb4b95ef6…

Changes in v3:
- Grammar, spelling fixes
- Remove the cacheable / uncacheable name suggestion
- Link to v2: https://lore.kernel.org/r/20250616-dma-buf-heap-names-doc-v2-1-8ae43174cdbf…

Changes in v2:
- Added justifications for each requirement / suggestions
- Added a mention and example of buffer attributes
- Link to v1: https://lore.kernel.org/r/20250520-dma-buf-heap-names-doc-v1-1-ab31f74809ee…
---
 Documentation/userspace-api/dma-buf-heaps.rst | 35 +++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/Documentation/userspace-api/dma-buf-heaps.rst b/Documentation/userspace-api/dma-buf-heaps.rst
index 535f49047ce6450796bf4380c989e109355efc05..1ced2720f929432661182f1a3a88aa1ff80bd6af 100644
--- a/Documentation/userspace-api/dma-buf-heaps.rst
+++ b/Documentation/userspace-api/dma-buf-heaps.rst
@@ -21,5 +21,40 @@ following heaps:
    usually created either through the kernel commandline through the
    `cma` parameter, a memory region Device-Tree node with the
    `linux,cma-default` property set, or through the `CMA_SIZE_MBYTES` or
    `CMA_SIZE_PERCENTAGE` Kconfig options. Depending on the platform, it
    might be called ``reserved``, ``linux,cma``, or ``default-pool``.
+
+Naming Convention
+=================
+
+``dma-buf`` heaps name should meet a number of constraints:
+
+- The name must be stable, and must not change from one version to the other.
+  Userspace identifies heaps by their name, so if the names ever change, we
+  would be likely to introduce regressions.
+
+- The name must describe the memory region the heap will allocate from, and
+  must uniquely identify it in a given platform. Since userspace applications
+  use the heap name as the discriminant, it must be able to tell which heap it
+  wants to use reliably if there's multiple heaps.
+
+- The name must not mention implementation details, such as the allocator. The
+  heap driver will change over time, and implementation details when it was
+  introduced might not be relevant in the future.
+
+- The name should describe properties of the buffers that would be allocated.
+  Doing so will make heap identification easier for userspace. Such properties
+  are:
+
+  - ``contiguous`` for physically contiguous buffers;
+
+  - ``protected`` for encrypted buffers not accessible the OS;
+
+- The name may describe intended usage. Doing so will make heap identification
+  easier for userspace applications and users.
+
+For example, assuming a platform with a reserved memory region located
+at the RAM address 0x42000000, intended to allocate video framebuffers,
+physically contiguous, and backed by the CMA kernel allocator, good
+names would be ``memory@42000000-contiguous`` or ``video@42000000``, but
+``cma-video`` wouldn't.

---
base-commit: 038d61fd642278bab63ee8ef722c50d10ab01e8f
change-id: 20250520-dma-buf-heap-names-doc-31261aa0cfe6

Best regards,
-- 
Maxime Ripard <mripard(a)kernel.org>

2 months

[PATCH] dma-buf: fix reference count leak in dma_buf_poll_add_cb()

by Dan Carpenter

Call dma_fence_put(fence) if dma_fence_add_callback() fails.

Fixes: 6b51b02a3a0a ("dma-buf: fix and rework dma_buf_poll v7")
Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org>
---
From code review, not from testing. Please review carefully.

 drivers/dma-buf/dma-buf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 2bcf9ceca997..a14e1f50b090 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -301,9 +301,9 @@ static bool dma_buf_poll_add_cb(struct dma_resv *resv, bool write,
 				fence) {
 		dma_fence_get(fence);
 		r = dma_fence_add_callback(fence, &dcb->cb, dma_buf_poll_cb);
+		dma_fence_put(fence);
 		if (!r)
 			return true;
-		dma_fence_put(fence);
 	}
 
 	return false;
-- 
2.51.0

2 months

Re: [PATCH] dma-buf/sw-sync: Hide the feature by default

by Christian König

On 22.09.25 15:24, Janusz Krzysztofik wrote: > When multiple fences of an sw_sync timeline are signaled via > sw_sync_ioctl_inc(), we now disable interrupts and keep them disabled > while signaling all requested fences of the timeline in a loop. Since > user space may set up an arbitrary long timeline of fences with > arbitrarily expensive callbacks added to each fence, we may end up running > with interrupts disabled for too long, longer than NMI watchdog limit. > That potentially risky scenario has been demonstrated on Intel DRM CI > trybot[1], on a low end machine fi-pnv-d510, with one of new IGT subtests > that tried to reimplement wait_* test cases of a dma_fence_chain selftest > in user space. > > [141.993704] [IGT] syncobj_timeline: starting subtest stress-enable-all-signal-all-forward > [164.964389] watchdog: CPU3: Watchdog detected hard LOCKUP on cpu 3 > [164.964407] Modules linked in: snd_hda_codec_alc662 snd_hda_codec_realtek_lib snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore i915 prime_numbers ttm drm_buddy drm_display_helper cec rc_core i2c_algo_bit video wmi overlay at24 ppdev gpio_ich binfmt_misc nls_iso8859_1 coretemp i2c_i801 i2c_mux i2c_smbus r8169 lpc_ich realtek parport_pc parport nvme_fabrics dm_multipath fuse msr efi_pstore nfnetlink autofs4 > [164.964569] irq event stamp: 1002206 > [164.964575] hardirqs last enabled at (1002205): [<ffffffff82898ac7>] _raw_spin_unlock_irq+0x27/0x70 > [164.964599] hardirqs last disabled at (1002206): [<ffffffff8287d021>] sysvec_irq_work+0x11/0xc0 > [164.964616] softirqs last enabled at (1002138): [<ffffffff81341bc5>] fpu_clone+0xb5/0x270 > [164.964631] softirqs last disabled at (1002136): [<ffffffff81341b97>] fpu_clone+0x87/0x270 > [164.964650] CPU: 3 UID: 0 PID: 1515 Comm: syncobj_timelin Tainted: G U 6.17.0-rc6-Trybot_154715v1-gc1b827f32471+ #1 PREEMPT(voluntary) > [164.964662] Tainted: [U]=USER > [164.964665] Hardware name: /D510MO, 
BIOS MOPNV10J.86A.0311.2010.0802.2346 08/02/2010 > [164.964669] RIP: 0010:lock_release+0x13d/0x2a0 > [164.964680] Code: c2 01 48 8d 4d c8 44 89 f6 4c 89 ef e8 bc fc ff ff 0b 05 96 ca 42 06 0f 84 fc 00 00 00 b8 ff ff ff ff 65 0f c1 05 0b 71 a9 02 <83> f8 01 0f 85 2f 01 00 00 48 f7 45 c0 00 02 00 00 74 06 fb 0f 1f > [164.964686] RSP: 0018:ffffc90000170e70 EFLAGS: 00000057 > [164.964693] RAX: 0000000000000001 RBX: ffffffff83595520 RCX: 0000000000000000 > [164.964698] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > [164.964701] RBP: ffffc90000170eb0 R08: 0000000000000000 R09: 0000000000000000 > [164.964706] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8226a948 > [164.964710] R13: ffff88802423b340 R14: 0000000000000001 R15: ffff88802423c238 > [164.964714] FS: 0000729f4d972940(0000) GS:ffff8880f8e77000(0000) knlGS:0000000000000000 > [164.964720] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [164.964725] CR2: 0000729f4d92e720 CR3: 000000003afe4000 CR4: 00000000000006f0 > [164.964729] Call Trace: > [164.964734] <IRQ> > [164.964750] dma_fence_chain_get_prev+0x13d/0x240 > [164.964769] dma_fence_chain_walk+0xbd/0x200 > [164.964784] dma_fence_chain_enable_signaling+0xb2/0x280 > [164.964803] dma_fence_chain_irq_work+0x1b/0x80 > [164.964816] irq_work_single+0x75/0xa0 > [164.964834] irq_work_run_list+0x33/0x60 > [164.964846] irq_work_run+0x18/0x40 > [164.964856] __sysvec_irq_work+0x35/0x170 > [164.964868] sysvec_irq_work+0x9b/0xc0 > [164.964879] </IRQ> > [164.964882] <TASK> > [164.964890] asm_sysvec_irq_work+0x1b/0x20 > [164.964900] RIP: 0010:_raw_spin_unlock_irq+0x2d/0x70 > [164.964907] Code: 00 00 55 48 89 e5 53 48 89 fb 48 83 c7 18 48 8b 75 08 e8 06 63 bf fe 48 89 df e8 be 98 bf fe e8 59 ee d3 fe fb 0f 1f 44 00 00 <65> ff 0d 5c 85 68 01 74 14 48 8b 5d f8 c9 31 c0 31 d2 31 c9 31 f6 > [164.964913] RSP: 0018:ffffc9000070fca0 EFLAGS: 00000246 > [164.964919] RAX: 0000000000000000 RBX: ffff88800c2d8b10 RCX: 0000000000000000 > [164.964923] RDX: 
0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > [164.964927] RBP: ffffc9000070fca8 R08: 0000000000000000 R09: 0000000000000000 > [164.964931] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88800c2d8ac0 > [164.964934] R13: ffffc9000070fcc8 R14: ffff88800c2d8ac0 R15: 00000000ffffffff > [164.964967] sync_timeline_signal+0x153/0x2c0 > [164.964989] sw_sync_ioctl+0x98/0x580 > [164.965017] __x64_sys_ioctl+0xa2/0x100 > [164.965034] x64_sys_call+0x1226/0x2680 > [164.965046] do_syscall_64+0x93/0x980 > [164.965057] ? do_syscall_64+0x1b7/0x980 > [164.965070] ? lock_release+0xce/0x2a0 > [164.965082] ? __might_fault+0x53/0xb0 > [164.965096] ? __might_fault+0x89/0xb0 > [164.965104] ? __might_fault+0x53/0xb0 > [164.965116] ? _copy_to_user+0x53/0x70 > [164.965131] ? __x64_sys_rt_sigprocmask+0x8f/0xe0 > [164.965152] ? do_syscall_64+0x1b7/0x980 > [164.965169] entry_SYSCALL_64_after_hwframe+0x76/0x7e > [164.965176] RIP: 0033:0x729f4fb24ded > [164.965188] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00 > [164.965193] RSP: 002b:00007ffdc36220e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > [164.965200] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 0000729f4fb24ded > [164.965205] RDX: 00007ffdc3622174 RSI: 0000000040045701 RDI: 0000000000000007 > [164.965209] RBP: 00007ffdc3622130 R08: 0000000000000000 R09: 0000000000000000 > [164.965213] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdc3622174 > [164.965217] R13: 0000000040045701 R14: 0000000000000007 R15: 0000000000000003 > [164.965248] </TASK> > [166.952984] perf: interrupt took too long (11861 > 6217), lowering kernel.perf_event_max_sample_rate to 16000 > [166.953134] clocksource: Long readout interval, skipping watchdog check: cs_nsec: 13036276804 wd_nsec: 13036274445 > > As explained by Christian Köenig[2], "The purpose of the sw-sync is to > test 
what happens if drivers exposing dma-fences doesn't behave well. So
> being able to trigger the NMI watchdog for example is part of why that
> functionality exists in the first place. ... You can actually use the
> functionality to intentionally deadlock drivers and even the core memory
> management."
>
> Let the feature show up only if EXPERT is selected.
>
> [1] https://patchwork.freedesktop.org/series/154715/
> [2] https://patchwork.freedesktop.org/patch/675579/#comment_1239269
>
> Fixes: 35538d7822e86 ("dma-buf/sw_sync: de-stage SW_SYNC")
> Cc: Christian König <christian.koenig(a)amd.com>
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com>

Good idea, we have previously discussed tainting the kernel if sw_sync is used, but that is also clearly a step in the right direction.

Reviewed-by: Christian König <christian.koenig(a)amd.com>

Regards,
Christian.

> ---
>  drivers/dma-buf/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/Kconfig b/drivers/dma-buf/Kconfig
> index b46eb8a552d7b..e726948b64f67 100644
> --- a/drivers/dma-buf/Kconfig
> +++ b/drivers/dma-buf/Kconfig
> @@ -18,7 +18,7 @@ config SYNC_FILE
>  	  Documentation/driver-api/sync_file.rst.
>
>  config SW_SYNC
> -	bool "Sync File Validation Framework"
> +	bool "Sync File Validation Framework" if EXPERT
>  	default n
>  	depends on SYNC_FILE
>  	depends on DEBUG_FS

2 months

Re: [PATCH] dma-buf/sw-sync: Fix interrupts disabled excessively long

by Christian König

On 19.09.25 12:44, Janusz Krzysztofik wrote:
> When multiple fences of an sw_sync timeline are signaled via
> sw_sync_ioctl_inc(), we now disable interrupts and keep them disabled
> while signaling all requested fences of the timeline in a loop. Since
> user space may set up an arbitrary long timeline of fences with
> arbitrarily expensive callbacks added to each fence, we may end up running
> with interrupts disabled for too long, longer than NMI watchdog limit.
> That potentially risky scenario has been demonstrated on Intel DRM CI
> trybot[1], on a low end machine fi-pnv-d510, with one of new IGT subtests
> that try to reimplement wait_* test cases of a dma_fence_chain selftest in
> user space.

The purpose of the sw-sync is to test what happens if drivers exposing dma-fences doesn't behave well. So being able to trigger the NMI watchdog for example is part of why that functionality exists in the first place.

See this WARNING in the Kconfig file as well:

config SW_SYNC
	bool "Sync File Validation Framework"
	default n
	depends on SYNC_FILE
	depends on DEBUG_FS
	help
	  A sync object driver that uses a 32bit counter to coordinate
	  synchronization. Useful when there is no hardware primitive backing
	  the synchronization.

	  WARNING: improper use of this can result in deadlocking kernel
	  drivers from userspace. Intended for test and debug only.

You can actually use the functionality to intentionally deadlock drivers and even the core memory management.

Regards,
Christian.
> > [141.993704] [IGT] syncobj_timeline: starting subtest stress-enable-all-signal-all-forward > [164.964389] watchdog: CPU3: Watchdog detected hard LOCKUP on cpu 3 > [164.964407] Modules linked in: snd_hda_codec_alc662 snd_hda_codec_realtek_lib snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore i915 prime_numbers ttm drm_buddy drm_display_helper cec rc_core i2c_algo_bit video wmi overlay at24 ppdev gpio_ich binfmt_misc nls_iso8859_1 coretemp i2c_i801 i2c_mux i2c_smbus r8169 lpc_ich realtek parport_pc parport nvme_fabrics dm_multipath fuse msr efi_pstore nfnetlink autofs4 > [164.964569] irq event stamp: 1002206 > [164.964575] hardirqs last enabled at (1002205): [<ffffffff82898ac7>] _raw_spin_unlock_irq+0x27/0x70 > [164.964599] hardirqs last disabled at (1002206): [<ffffffff8287d021>] sysvec_irq_work+0x11/0xc0 > [164.964616] softirqs last enabled at (1002138): [<ffffffff81341bc5>] fpu_clone+0xb5/0x270 > [164.964631] softirqs last disabled at (1002136): [<ffffffff81341b97>] fpu_clone+0x87/0x270 > [164.964650] CPU: 3 UID: 0 PID: 1515 Comm: syncobj_timelin Tainted: G U 6.17.0-rc6-Trybot_154715v1-gc1b827f32471+ #1 PREEMPT(voluntary) > [164.964662] Tainted: [U]=USER > [164.964665] Hardware name: /D510MO, BIOS MOPNV10J.86A.0311.2010.0802.2346 08/02/2010 > [164.964669] RIP: 0010:lock_release+0x13d/0x2a0 > [164.964680] Code: c2 01 48 8d 4d c8 44 89 f6 4c 89 ef e8 bc fc ff ff 0b 05 96 ca 42 06 0f 84 fc 00 00 00 b8 ff ff ff ff 65 0f c1 05 0b 71 a9 02 <83> f8 01 0f 85 2f 01 00 00 48 f7 45 c0 00 02 00 00 74 06 fb 0f 1f > [164.964686] RSP: 0018:ffffc90000170e70 EFLAGS: 00000057 > [164.964693] RAX: 0000000000000001 RBX: ffffffff83595520 RCX: 0000000000000000 > [164.964698] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > [164.964701] RBP: ffffc90000170eb0 R08: 0000000000000000 R09: 0000000000000000 > [164.964706] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8226a948 > 
[164.964710] R13: ffff88802423b340 R14: 0000000000000001 R15: ffff88802423c238 > [164.964714] FS: 0000729f4d972940(0000) GS:ffff8880f8e77000(0000) knlGS:0000000000000000 > [164.964720] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [164.964725] CR2: 0000729f4d92e720 CR3: 000000003afe4000 CR4: 00000000000006f0 > [164.964729] Call Trace: > [164.964734] <IRQ> > [164.964750] dma_fence_chain_get_prev+0x13d/0x240 > [164.964769] dma_fence_chain_walk+0xbd/0x200 > [164.964784] dma_fence_chain_enable_signaling+0xb2/0x280 > [164.964803] dma_fence_chain_irq_work+0x1b/0x80 > [164.964816] irq_work_single+0x75/0xa0 > [164.964834] irq_work_run_list+0x33/0x60 > [164.964846] irq_work_run+0x18/0x40 > [164.964856] __sysvec_irq_work+0x35/0x170 > [164.964868] sysvec_irq_work+0x9b/0xc0 > [164.964879] </IRQ> > [164.964882] <TASK> > [164.964890] asm_sysvec_irq_work+0x1b/0x20 > [164.964900] RIP: 0010:_raw_spin_unlock_irq+0x2d/0x70 > [164.964907] Code: 00 00 55 48 89 e5 53 48 89 fb 48 83 c7 18 48 8b 75 08 e8 06 63 bf fe 48 89 df e8 be 98 bf fe e8 59 ee d3 fe fb 0f 1f 44 00 00 <65> ff 0d 5c 85 68 01 74 14 48 8b 5d f8 c9 31 c0 31 d2 31 c9 31 f6 > [164.964913] RSP: 0018:ffffc9000070fca0 EFLAGS: 00000246 > [164.964919] RAX: 0000000000000000 RBX: ffff88800c2d8b10 RCX: 0000000000000000 > [164.964923] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > [164.964927] RBP: ffffc9000070fca8 R08: 0000000000000000 R09: 0000000000000000 > [164.964931] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88800c2d8ac0 > [164.964934] R13: ffffc9000070fcc8 R14: ffff88800c2d8ac0 R15: 00000000ffffffff > [164.964967] sync_timeline_signal+0x153/0x2c0 > [164.964989] sw_sync_ioctl+0x98/0x580 > [164.965017] __x64_sys_ioctl+0xa2/0x100 > [164.965034] x64_sys_call+0x1226/0x2680 > [164.965046] do_syscall_64+0x93/0x980 > [164.965057] ? do_syscall_64+0x1b7/0x980 > [164.965070] ? lock_release+0xce/0x2a0 > [164.965082] ? __might_fault+0x53/0xb0 > [164.965096] ? __might_fault+0x89/0xb0 > [164.965104] ? 
__might_fault+0x53/0xb0 > [164.965116] ? _copy_to_user+0x53/0x70 > [164.965131] ? __x64_sys_rt_sigprocmask+0x8f/0xe0 > [164.965152] ? do_syscall_64+0x1b7/0x980 > [164.965169] entry_SYSCALL_64_after_hwframe+0x76/0x7e > [164.965176] RIP: 0033:0x729f4fb24ded > [164.965188] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00 > [164.965193] RSP: 002b:00007ffdc36220e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > [164.965200] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 0000729f4fb24ded > [164.965205] RDX: 00007ffdc3622174 RSI: 0000000040045701 RDI: 0000000000000007 > [164.965209] RBP: 00007ffdc3622130 R08: 0000000000000000 R09: 0000000000000000 > [164.965213] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdc3622174 > [164.965217] R13: 0000000040045701 R14: 0000000000000007 R15: 0000000000000003 > [164.965248] </TASK> > [166.952984] perf: interrupt took too long (11861 > 6217), lowering kernel.perf_event_max_sample_rate to 16000 > [166.953134] clocksource: Long readout interval, skipping watchdog check: cs_nsec: 13036276804 wd_nsec: 13036274445 > > Avoid potentially expensive signaling of each fence when removing it from > the timeline from inside the loop under protection of a common lock and > disabled interrupts, do that only after interrupts are re-enabled. Each > call to dma_fence_signal() will then disable and re-enable interrputs as > needed for processing of each signaled fence. 
>
> [1] https://patchwork.freedesktop.org/series/154715/
>
> Fixes: 0f0d8406fb9c3 ("android: convert sync to fence api, v6")
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com>
> ---
>  drivers/dma-buf/sw_sync.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> index 3c20f1d31cf54..638c2f756299a 100644
> --- a/drivers/dma-buf/sw_sync.c
> +++ b/drivers/dma-buf/sw_sync.c
> @@ -224,13 +224,12 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
>
>  		list_move_tail(&pt->link, &signalled);
>  		rb_erase(&pt->node, &obj->pt_tree);
> -
> -		dma_fence_signal_locked(&pt->base);
>  	}
>
>  	spin_unlock_irq(&obj->lock);
>
>  	list_for_each_entry_safe(pt, next, &signalled, link) {
> +		dma_fence_signal(&pt->base);
>  		list_del_init(&pt->link);
>  		dma_fence_put(&pt->base);
>  	}
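The quoted patch moves the per-fence signaling out of the locked loop: expired points are only unlinked while the timeline lock is held, and the potentially expensive work runs after the lock is dropped. The userspace sketch below models just that two-phase shape. It is not the sw_sync code. The struct names, the boolean "locked" flag standing in for spin_lock_irq(), and the simple value comparison (the kernel compares sequence numbers with wraparound handling) are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a sync point on a timeline. */
struct point {
	unsigned int value;	/* timeline value at which the point expires */
	bool signalled;
	struct point *next;
};

/* Hypothetical stand-in for the timeline itself. */
struct timeline {
	bool locked;			/* models spin_lock_irq() being held */
	unsigned int value;
	struct point *pending;		/* sorted by ascending value */
	unsigned int signalled_under_lock;
};

static void timeline_lock(struct timeline *tl)   { tl->locked = true; }
static void timeline_unlock(struct timeline *tl) { tl->locked = false; }

/* "Expensive" per-point work; counts calls made while the lock is held. */
static void point_signal(struct timeline *tl, struct point *pt)
{
	if (tl->locked)
		tl->signalled_under_lock++;
	pt->signalled = true;
}

/* Advance the timeline by inc; returns the number of points signalled. */
unsigned int timeline_advance(struct timeline *tl, unsigned int inc)
{
	struct point *expired = NULL, **tail = &expired, *pt;
	unsigned int n = 0;

	timeline_lock(tl);
	tl->value += inc;
	/* Phase 1: under the lock, only move expired points to a private list. */
	while ((pt = tl->pending) && pt->value <= tl->value) {
		tl->pending = pt->next;
		pt->next = NULL;
		*tail = pt;
		tail = &pt->next;
	}
	timeline_unlock(tl);

	/* Phase 2: the lock is dropped, so callbacks cannot extend its hold time. */
	while ((pt = expired)) {
		expired = pt->next;
		pt->next = NULL;
		point_signal(tl, pt);
		n++;
	}
	return n;
}
```

The point of the split is that the time spent with the lock held (and, in the kernel, with interrupts disabled) is bounded by list manipulation only, however slow the per-point work turns out to be. Whether that trade-off is wanted for sw_sync is exactly what the reply above questions.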

2 months, 1 week

Re: [PATCH v1 1/2] drm/amdgpu: make non-NULL out fence mandatory

by Christian König

On 16.09.25 13:58, Pierre-Eric Pelloux-Prayer wrote:
>
>
> On 16/09/2025 at 12:52, Christian König wrote:
>> On 16.09.25 11:46, Pierre-Eric Pelloux-Prayer wrote:
>>>
>>>
>>> On 16/09/2025 at 11:25, Christian König wrote:
>>>> On 16.09.25 09:08, Pierre-Eric Pelloux-Prayer wrote:
>>>>> amdgpu_ttm_copy_mem_to_mem has a single caller, make sure the out
>>>>> fence is non-NULL to simplify the code.
>>>>> Since none of the pointers should be NULL, we can enable
>>>>> __attribute__((nonnull))__.
>>>>>
>>>>> While at it make the function static since it's only used from
>>>>> amdgpuu_ttm.c.
>>>>>
>>>>> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com>
>>>>> ---
>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 17 ++++++++---------
>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  6 ------
>>>>>  2 files changed, 8 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> index 27ab4e754b2a..70b817b5578d 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> @@ -284,12 +284,13 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo,
>>>>>   * move and different for a BO to BO copy.
>>>>>   *
>>>>>   */
>>>>> -int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>>>> -                               const struct amdgpu_copy_mem *src,
>>>>> -                               const struct amdgpu_copy_mem *dst,
>>>>> -                               uint64_t size, bool tmz,
>>>>> -                               struct dma_resv *resv,
>>>>> -                               struct dma_fence **f)
>>>>> +__attribute__((nonnull))
>>>>
>>>> That looks fishy.
>>>>
>>>>> +static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>>>> +                                      const struct amdgpu_copy_mem *src,
>>>>> +                                      const struct amdgpu_copy_mem *dst,
>>>>> +                                      uint64_t size, bool tmz,
>>>>> +                                      struct dma_resv *resv,
>>>>> +                                      struct dma_fence **f)
>>>>
>>>> I'm not an expert for those, but looking at other examples that should be here and look something like:
>>>>
>>>> __attribute__((nonnull(7)))
>>>
>>> Both syntax are valid. The GCC docs says:
>>>
>>>    If no arg-index is given to the nonnull attribute, all pointer arguments are marked as non-null
>>
>> Never seen that before. Is that gcc specific or standardized?
>
> clang supports it:
>
> https://clang.llvm.org/docs/AttributeReference.html#id10
>
> And both syntaxes are already used in the drm subtree by i915.

Ok in that case Reviewed-by: Christian König <christian.koenig(a)amd.com>.

Regards,
Christian.

>
> Pierre-Eric
>
>>
>>>
>>>
>>>>
>>>> But I think for this case here it is also not a must have to have that.
>>>
>>> I can remove it if you prefer, but it doesn't hurt to have the compiler validate usage of the functions.
>>
>> Yeah it's clearly useful, but I'm worried that clang won't like it.
>>
>> Christian.
>>
>>>
>>> Pierre-Eric
>>>
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>  {
>>>>>  	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
>>>>>  	struct amdgpu_res_cursor src_mm, dst_mm;
>>>>> @@ -363,9 +364,7 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>>>>  	}
>>>>>  error:
>>>>>  	mutex_unlock(&adev->mman.gtt_window_lock);
>>>>> -	if (f)
>>>>> -		*f = dma_fence_get(fence);
>>>>> -	dma_fence_put(fence);
>>>>> +	*f = fence;
>>>>>  	return r;
>>>>>  }
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>>>> index bb17987f0447..07ae2853c77c 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>>>> @@ -170,12 +170,6 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
>>>>>  		       struct dma_resv *resv,
>>>>>  		       struct dma_fence **fence, bool direct_submit,
>>>>>  		       bool vm_needs_flush, uint32_t copy_flags);
>>>>> -int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>>>> -                               const struct amdgpu_copy_mem *src,
>>>>> -                               const struct amdgpu_copy_mem *dst,
>>>>> -                               uint64_t size, bool tmz,
>>>>> -                               struct dma_resv *resv,
>>>>> -                               struct dma_fence **f);
>>>>>  int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
>>>>>  			    struct dma_resv *resv,
>>>>>  			    struct dma_fence **fence);
>>
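To make the two spellings discussed in this thread concrete, here is a small standalone sketch. It is not taken from the amdgpu patch; the function names are invented for illustration. The first function uses the bare form, which per the GCC documentation quoted above marks every pointer parameter as non-null. The second uses the indexed form, where indices are 1-based, so nonnull(4) marks only the fourth parameter, in the same way that nonnull(7) in the thread would mark only the out-fence argument.

```c
#include <stddef.h>

/*
 * Bare form: all pointer parameters (dst, src, out_len) are marked
 * non-null. Copies at most n - 1 bytes of src into dst, always
 * NUL-terminates, and stores the copied length in *out_len.
 */
__attribute__((nonnull))
int copy_all_checked(char *dst, const char *src, size_t n, size_t *out_len)
{
	size_t i = 0;

	if (n == 0)
		return -1;
	while (i + 1 < n && src[i] != '\0') {
		dst[i] = src[i];
		i++;
	}
	dst[i] = '\0';
	*out_len = i;
	return 0;
}

/*
 * Indexed form: only parameter 4 (out_len) is marked non-null, so
 * dst and src may legitimately be NULL and are checked at run time.
 */
__attribute__((nonnull(4)))
int copy_out_checked(char *dst, const char *src, size_t n, size_t *out_len)
{
	*out_len = 0;
	if (!dst || !src)
		return -1;
	return copy_all_checked(dst, src, n, out_len);
}
```

With either form, passing a literal NULL for a marked parameter can be diagnosed at compile time (GCC and clang report it under -Wnonnull). One thing to keep in mind is that the compiler may also assume marked parameters are never NULL when optimizing, so a run-time NULL check on a marked parameter inside the function should not be relied upon.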

2 months, 1 week

Re: [PATCH v1 1/2] drm/amdgpu: make non-NULL out fence mandatory

by Christian König

On 16.09.25 11:46, Pierre-Eric Pelloux-Prayer wrote:
>
>
> On 16/09/2025 at 11:25, Christian König wrote:
>> On 16.09.25 09:08, Pierre-Eric Pelloux-Prayer wrote:
>>> amdgpu_ttm_copy_mem_to_mem has a single caller, make sure the out
>>> fence is non-NULL to simplify the code.
>>> Since none of the pointers should be NULL, we can enable
>>> __attribute__((nonnull))__.
>>>
>>> While at it make the function static since it's only used from
>>> amdgpuu_ttm.c.
>>>
>>> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 17 ++++++++---------
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  6 ------
>>>  2 files changed, 8 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> index 27ab4e754b2a..70b817b5578d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> @@ -284,12 +284,13 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo,
>>>   * move and different for a BO to BO copy.
>>>   *
>>>   */
>>> -int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>> -                               const struct amdgpu_copy_mem *src,
>>> -                               const struct amdgpu_copy_mem *dst,
>>> -                               uint64_t size, bool tmz,
>>> -                               struct dma_resv *resv,
>>> -                               struct dma_fence **f)
>>> +__attribute__((nonnull))
>>
>> That looks fishy.
>>
>>> +static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>> +                                      const struct amdgpu_copy_mem *src,
>>> +                                      const struct amdgpu_copy_mem *dst,
>>> +                                      uint64_t size, bool tmz,
>>> +                                      struct dma_resv *resv,
>>> +                                      struct dma_fence **f)
>>
>> I'm not an expert for those, but looking at other examples that should be here and look something like:
>>
>> __attribute__((nonnull(7)))
>
> Both syntax are valid. The GCC docs says:
>
>    If no arg-index is given to the nonnull attribute, all pointer arguments are marked as non-null

Never seen that before. Is that gcc specific or standardized?

>
>
>>
>> But I think for this case here it is also not a must have to have that.
>
> I can remove it if you prefer, but it doesn't hurt to have the compiler validate usage of the functions.

Yeah it's clearly useful, but I'm worried that clang won't like it.

Christian.

>
> Pierre-Eric
>
>
>>
>> Regards,
>> Christian.
>>
>>>  {
>>>  	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
>>>  	struct amdgpu_res_cursor src_mm, dst_mm;
>>> @@ -363,9 +364,7 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>>  	}
>>>  error:
>>>  	mutex_unlock(&adev->mman.gtt_window_lock);
>>> -	if (f)
>>> -		*f = dma_fence_get(fence);
>>> -	dma_fence_put(fence);
>>> +	*f = fence;
>>>  	return r;
>>>  }
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> index bb17987f0447..07ae2853c77c 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> @@ -170,12 +170,6 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
>>>  		       struct dma_resv *resv,
>>>  		       struct dma_fence **fence, bool direct_submit,
>>>  		       bool vm_needs_flush, uint32_t copy_flags);
>>> -int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>> -                               const struct amdgpu_copy_mem *src,
>>> -                               const struct amdgpu_copy_mem *dst,
>>> -                               uint64_t size, bool tmz,
>>> -                               struct dma_resv *resv,
>>> -                               struct dma_fence **f);
>>>  int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
>>>  			    struct dma_resv *resv,
>>>  			    struct dma_fence **fence);

2 months, 1 week

Re: [PATCH v1 1/2] drm/amdgpu: make non-NULL out fence mandatory

by Christian König

On 16.09.25 09:08, Pierre-Eric Pelloux-Prayer wrote:
> amdgpu_ttm_copy_mem_to_mem has a single caller, make sure the out
> fence is non-NULL to simplify the code.
> Since none of the pointers should be NULL, we can enable
> __attribute__((nonnull)).
>
> While at it make the function static since it's only used from
> amdgpu_ttm.c.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 17 ++++++++---------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  6 ------
>  2 files changed, 8 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 27ab4e754b2a..70b817b5578d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -284,12 +284,13 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo,
>   * move and different for a BO to BO copy.
>   *
>   */
> -int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
> -                               const struct amdgpu_copy_mem *src,
> -                               const struct amdgpu_copy_mem *dst,
> -                               uint64_t size, bool tmz,
> -                               struct dma_resv *resv,
> -                               struct dma_fence **f)
> +__attribute__((nonnull))

That looks fishy.

> +static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
> +                                      const struct amdgpu_copy_mem *src,
> +                                      const struct amdgpu_copy_mem *dst,
> +                                      uint64_t size, bool tmz,
> +                                      struct dma_resv *resv,
> +                                      struct dma_fence **f)

I'm not an expert for those, but looking at other examples that should be here and look something like:

__attribute__((nonnull(7)))

But I think for this case here it is also not a must have to have that.

Regards,
Christian.

>  {
>          struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
>          struct amdgpu_res_cursor src_mm, dst_mm;
> @@ -363,9 +364,7 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>          }
>  error:
>          mutex_unlock(&adev->mman.gtt_window_lock);
> -        if (f)
> -                *f = dma_fence_get(fence);
> -        dma_fence_put(fence);
> +        *f = fence;
>          return r;
>  }
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> index bb17987f0447..07ae2853c77c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> @@ -170,12 +170,6 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
>                         struct dma_resv *resv,
>                         struct dma_fence **fence, bool direct_submit,
>                         bool vm_needs_flush, uint32_t copy_flags);
> -int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
> -                               const struct amdgpu_copy_mem *src,
> -                               const struct amdgpu_copy_mem *dst,
> -                               uint64_t size, bool tmz,
> -                               struct dma_resv *resv,
> -                               struct dma_fence **f);
>  int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
>                              struct dma_resv *resv,
>                              struct dma_fence **fence);

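The last hunk of the quoted patch replaces a get-then-put pair with a plain assignment. A toy reference-count model may help show why the two leave the same count once the out pointer is guaranteed non-NULL. This is a sketch under assumptions: `struct toy_fence`, `toy_get()`, `toy_put()` and the two `toy_return_*()` helpers are hypothetical and only imitate the shape of `dma_fence_get()`/`dma_fence_put()`; they are not the kernel implementation.

```c
#include <stddef.h>

/* Hypothetical refcounted object; a toy model, not struct dma_fence. */
struct toy_fence {
	int refs;
};

/* Take one reference and hand the pointer back (NULL is tolerated). */
static struct toy_fence *toy_get(struct toy_fence *f)
{
	if (f)
		f->refs++;
	return f;
}

/* Drop one reference (NULL is tolerated). */
static void toy_put(struct toy_fence *f)
{
	if (f)
		f->refs--;
}

/*
 * Old pattern: the out pointer is optional. Give the caller a new
 * reference if it asked for one, then drop the local reference.
 */
static void toy_return_old(struct toy_fence *fence, struct toy_fence **out)
{
	if (out)
		*out = toy_get(fence);
	toy_put(fence);
}

/*
 * New pattern: the out pointer is mandatory, so the local reference
 * is simply handed over to the caller.
 */
static void toy_return_new(struct toy_fence *fence, struct toy_fence **out)
{
	*out = fence;
}
```

In this model, when `out` is non-NULL, both helpers leave the count unchanged and the caller holding exactly the reference the function started with. The old form only differs when `out` is NULL, which the patch rules out.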
2 months, 1 week

Re: [PATCH v12 00/11] Trusted Execution Environment (TEE) driver for Qualcomm TEE (QTEE)

by Jens Wiklander

Hi,

On Fri, Sep 12, 2025 at 6:07 AM Amirreza Zarrabi
<amirreza.zarrabi(a)oss.qualcomm.com> wrote:
>
> This patch series introduces a Trusted Execution Environment (TEE)
> driver for Qualcomm TEE (QTEE). QTEE enables Trusted Applications (TAs)
> and services to run securely. It uses an object-based interface, where
> each service is an object with sets of operations. Clients can invoke
> these operations on objects, which can generate results, including other
> objects. For example, an object can load a TA and return another object
> that represents the loaded TA, allowing access to its services.
>
[snip]

I'm OK with the TEE patches, Sumit and I have reviewed them.

There were some minor conflicts with other patches I have in the pipe
for this merge window, so this patchset is on top of what I have to
avoid merge conflicts.

However, the firmware patches are for code maintained by Björn.
Björn, how would you like to do this? Can I take them via my tree, or
what do you suggest?

It's urgent to get this patchset into linux-next if it's to make it
for the coming merge window. Ideally, I'd like to send my pull request
to arm-soc during this week.

Cheers,
Jens

>
> ---
> Amirreza Zarrabi (11):
>       firmware: qcom: tzmem: export shm_bridge create/delete
>       firmware: qcom: scm: add support for object invocation
>       tee: allow a driver to allocate a tee_device without a pool
>       tee: add close_context to TEE driver operation
>       tee: add TEE_IOCTL_PARAM_ATTR_TYPE_UBUF
>       tee: add TEE_IOCTL_PARAM_ATTR_TYPE_OBJREF
>       tee: increase TEE_MAX_ARG_SIZE to 4096
>       tee: add Qualcomm TEE driver
>       tee: qcom: add primordial object
>       tee: qcom: enable TEE_IOC_SHM_ALLOC ioctl
>       Documentation: tee: Add Qualcomm TEE driver
>
>  Documentation/tee/index.rst              |   1 +
>  Documentation/tee/qtee.rst               |  96 ++++
>  MAINTAINERS                              |   7 +
>  drivers/firmware/qcom/qcom_scm.c         | 119 ++++
>  drivers/firmware/qcom/qcom_scm.h         |   7 +
>  drivers/firmware/qcom/qcom_tzmem.c       |  63 ++-
>  drivers/tee/Kconfig                      |   1 +
>  drivers/tee/Makefile                     |   1 +
>  drivers/tee/qcomtee/Kconfig              |  12 +
>  drivers/tee/qcomtee/Makefile             |   9 +
>  drivers/tee/qcomtee/async.c              | 182 ++++++
>  drivers/tee/qcomtee/call.c               | 820 +++++++++++++++++++++++++++
>  drivers/tee/qcomtee/core.c               | 915 +++++++++++++++++++++++++++++++
>  drivers/tee/qcomtee/mem_obj.c            | 169 ++++++
>  drivers/tee/qcomtee/primordial_obj.c     | 113 ++++
>  drivers/tee/qcomtee/qcomtee.h            | 185 +++++++
>  drivers/tee/qcomtee/qcomtee_msg.h        | 304 ++++++++++
>  drivers/tee/qcomtee/qcomtee_object.h     | 316 +++++++++++
>  drivers/tee/qcomtee/shm.c                | 150 +++++
>  drivers/tee/qcomtee/user_obj.c           | 692 +++++++++++++++++++++++
>  drivers/tee/tee_core.c                   | 127 ++++-
>  drivers/tee/tee_private.h                |   6 -
>  include/linux/firmware/qcom/qcom_scm.h   |   6 +
>  include/linux/firmware/qcom/qcom_tzmem.h |  15 +
>  include/linux/tee_core.h                 |  54 +-
>  include/linux/tee_drv.h                  |  12 +
>  include/uapi/linux/tee.h                 |  56 +-
>  27 files changed, 4410 insertions(+), 28 deletions(-)
> ---
> base-commit: 8b8aefa5a5c7d4a65883e5653cf12f94c0b68dbf
> change-id: 20241202-qcom-tee-using-tee-ss-without-mem-obj-362c66340527
>
> Best regards,
> --
> Amirreza Zarrabi <amirreza.zarrabi(a)oss.qualcomm.com>
>

2 months, 1 week

[PATCH v4 0/3] Batch 2 of rust gem shmem work

by Lyude Paul

Now that we're getting close to reaching the finish line for upstreaming
the rust gem shmem bindings, we've got another batch of patches that
have been reviewed and can be safely pushed to drm-rust-next
independently of the rest of the series.

These patches of course apply against the drm-rust-next branch, and are
part of the gem shmem series, the latest version of which can be found
here:

https://patchwork.freedesktop.org/series/146465/

Lyude Paul (3):
  drm/gem/shmem: Extract drm_gem_shmem_init() from drm_gem_shmem_create()
  drm/gem/shmem: Extract drm_gem_shmem_release() from drm_gem_shmem_free()
  rust: Add dma_buf stub bindings

 drivers/gpu/drm/drm_gem_shmem_helper.c | 98 ++++++++++++++++++--------
 include/drm/drm_gem_shmem_helper.h     |  2 +
 rust/kernel/dma_buf.rs                 | 40 +++++++++++
 rust/kernel/lib.rs                     |  1 +
 4 files changed, 111 insertions(+), 30 deletions(-)
 create mode 100644 rust/kernel/dma_buf.rs

base-commit: cf4fd52e323604ccfa8390917593e1fb965653ee
--
2.51.0

2 months, 1 week
