On 4/14/26 12:55, popov.nkv(a)gmail.com wrote:
> From: Vladimir Popov <popov.nkv(a)gmail.com>
>
> If the vmw_execbuf_fence_commands() call in
> vmw_kms_helper_validation_finish() fails, it sets *p_fence = NULL. If
> ctx->bo_list is not empty, vmw_kms_helper_validation_finish() then
> passes the NULL fence down a chain of functions, ending in a NULL
> pointer dereference in dma_fence_is_array():
>
> vmw_kms_helper_validation_finish() // pass NULL fence
> vmw_validation_done()
> vmw_validation_bo_fence()
> ttm_eu_fence_buffer_objects() // pass NULL fence
> dma_resv_add_fence()
> dma_fence_is_container()
> dma_fence_is_array() // NULL deref
Good catch, but that is clearly not the right fix.
I'm not an expert on the vmwgfx code, but in case of an error vmw_validation_revert() should be called, and not vmw_kms_helper_validation_finish().
Regards,
Christian.
>
> Fix this by adding a NULL check in vmw_validation_bo_fence(): if the fence
> is NULL, fall back to ttm_eu_backoff_reservation() to safely release
> the buffer object reservations without attempting to add a NULL fence to
> dma_resv. This is safe because when fence is NULL, vmw_fallback_wait()
> has already been called inside vmw_execbuf_fence_commands() to synchronize
> the GPU.
>
> Found by Linux Verification Center (linuxtesting.org) with SVACE.
>
> Fixes: 038ecc503236 ("drm/vmwgfx: Add a validation module v2")
> Cc: stable(a)vger.kernel.org
> Signed-off-by: Vladimir Popov <popov.nkv(a)gmail.com>
> ---
> drivers/gpu/drm/vmwgfx/vmwgfx_validation.h | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
> index 353d837907d8..fc04555ca505 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
> @@ -127,16 +127,23 @@ vmw_validation_bo_reserve(struct vmw_validation_context *ctx,
> * vmw_validation_bo_fence - Unreserve and fence buffer objects registered
> * with a validation context
> * @ctx: The validation context
> + * @fence: Fence with which to fence all buffer objects taking part in the
> + * command submission.
> *
> * This function unreserves the buffer objects previously reserved using
> - * vmw_validation_bo_reserve, and fences them with a fence object.
> + * vmw_validation_bo_reserve, and fences them with a fence object if the
> + * given fence object is not NULL.
> */
> static inline void
> vmw_validation_bo_fence(struct vmw_validation_context *ctx,
> struct vmw_fence_obj *fence)
> {
> - ttm_eu_fence_buffer_objects(&ctx->ticket, &ctx->bo_list,
> - (void *) fence);
> + /* fence can be NULL if vmw_execbuf_fence_commands() fails */
> + if (fence)
> + ttm_eu_fence_buffer_objects(&ctx->ticket, &ctx->bo_list,
> + (void *)fence);
> + else
> + ttm_eu_backoff_reservation(&ctx->ticket, &ctx->bo_list);
> }
>
> /**
> --
> 2.43.0
>
On 4/9/26 14:38, Andi Shyti wrote:
> Hi Christian,
>
>> @@ -845,9 +845,8 @@ void dma_buf_put(struct dma_buf *dmabuf)
>> if (WARN_ON(!dmabuf || !dmabuf->file))
>> return;
>>
>> - fput(dmabuf->file);
>> -
>> DMA_BUF_TRACE(trace_dma_buf_put, dmabuf);
>> + fput(dmabuf->file);
>
> funny, I just found out I sent the exact same patch, just few
> minutes later :-) [*]
I liked your patch better since it has more accurate tags.
Just reviewed and pushed that one to drm-misc-fixes, should land upstream by the end of the week.
Regards,
Christian.
>
> Andi
>
>> }
>
> [*] https://lore.kernel.org/all/20260408123916.2604101-2-andi.shyti@kernel.org/
Since its introduction, DMA-buf has only supported using scatterlist for
the exporter and importer to exchange address information. This is not
sufficient for all use cases as dma_addr_t is a very specific and limited
type that should not be abused for things unrelated to the DMA API.
There are several motivations for addressing this now:
1) VFIO to IOMMUFD and KVM requires a physical address, not a dma_addr_t
   scatterlist; this cannot be represented in the scatterlist structure
2) xe vGPU requires the host driver to accept a DMABUF from VFIO of its
own VF and convert it into an internal VRAM address on the PF
3) We are starting to look at replacement datastructures for
scatterlist
4) Ideas around UALink/etc are suggesting not using the DMA API
None of these can sanely be achieved using scatterlist.
Introduce a new mechanism called "mapping types" which allows DMA-buf to
work with more map/unmap options than scatterlist. Each mapping type
encompasses a full set of functions and data unique to itself. The core
code provides a match-making system to select the best type offered by the
exporter and importer to be the active mapping type for the attachment.
Everything related to scatterlist is moved into a DMA-buf SGT mapping
type, and into the "dma_buf_sgt_*" namespace for clarity. Existing
exporters are moved over to explicitly declare SGT mapping types and
importers are adjusted to use the dma_buf_sgt_* named importer helpers.
Mapping types are designed to be extendable, a driver can declare its own
mapping type for its internal private interconnect and use that without
having to adjust the core code.
The new attachment sequence starts with the importing driver declaring
what mapping types it can accept:
struct dma_buf_mapping_match imp_match[] = {
DMA_BUF_IMAPPING_MY_DRIVER(dev, ...),
DMA_BUF_IMAPPING_SGT(dev, false),
};
attach = dma_buf_mapping_attach(dmabuf, imp_match, ...)
Most drivers will do this via a dma_buf_sgt_*attach() helper.
The exporting driver can then declare what mapping types it can supply:
int exporter_match_mapping(struct dma_buf_match_args *args)
{
struct dma_buf_mapping_match exp_match[] = {
DMA_BUF_EMAPPING_MY_DRIVER(my_ops, dev, ...),
DMA_BUF_EMAPPING_SGT(sgt_ops, dev, false),
DMA_BUF_EMAPPING_PAL(PAL_ops),
};
return dma_buf_match_mapping(args, exp_match, ...);
}
Most drivers will do this via a helper:
static const struct dma_buf_ops ops = {
DMA_BUF_SIMPLE_SGT_EXP_MATCH(map_func, unmap_func)
};
During dma_buf_mapping_attach() the core code will select a mutual match
between the importer and exporter and record it as the active match in
the attach->map_type.
Each mapping type has its own types/function calls for
mapping/unmapping, and storage in the attach->map_type for its
information. As such each mapping type can offer function signatures
and data that exactly matches its needs.
This series goes through a sequence of:
1) Introduce the basic mapping type framework and the main components of
the SGT mapping type
2) Automatically make all existing exporters and importers use core
generated SGT mapping types so every attachment has a SGT mapping type
3) Convert all exporter drivers to natively create a SGT mapping type
4) Move all dma_buf_* functions and types that are related to SGT into
dma_buf_sgt_*
5) Remove all the now-unused items that have been moved into SGT specific
structures.
6) Demonstrate adding a new Physical Address List alongside SGT.
Due to the high number of files touched I would expect this to be broken
into phases, but this shows the entire picture.
This is on github: https://github.com/jgunthorpe/linux/commits/dmabuf_map_type
It is a followup to the discussion here:
https://lore.kernel.org/dri-devel/20251027044712.1676175-1-vivek.kasireddy@…
Jason Gunthorpe (26):
dma-buf: Introduce DMA-buf mapping types
dma-buf: Add the SGT DMA mapping type
dma-buf: Add dma_buf_mapping_attach()
dma-buf: Route SGT related actions through attach->map_type
dma-buf: Allow single exporter drivers to avoid the match_mapping
function
drm: Check the SGT ops for drm_gem_map_dma_buf()
dma-buf: Convert all the simple exporters to use SGT mapping type
drm/vmwgfx: Use match_mapping instead of dummy calls
accel/habanalabs: Use the SGT mapping type
drm/xe/dma-buf: Use the SGT mapping type
drm/amdgpu: Use the SGT mapping type
vfio/pci: Change the DMA-buf exporter to use mapping_type
dma-buf: Update dma_buf_phys_vec_to_sgt() to use the SGT mapping type
iio: buffer: convert to use the SGT mapping type
functionfs: convert to use the SGT mapping type
dma-buf: Remove unused SGT stuff from the common structures
treewide: Rename dma_buf_map_attachment(_unlocked) to dma_buf_sgt_
treewide: Rename dma_buf_unmap_attachment(_unlocked) to dma_buf_sgt_*
treewide: Rename dma_buf_attach() to dma_buf_sgt_attach()
treewide: Rename dma_buf_dynamic_attach() to
dma_buf_sgt_dynamic_attach()
dma-buf: Add the Physical Address List DMA mapping type
vfio/pci: Add physical address list support to DMABUF
iommufd: Use the PAL mapping type instead of a vfio function
iommufd: Support DMA-bufs with multiple physical ranges
iommufd/selftest: Check multi-phys DMA-buf scenarios
dma-buf: Add kunit tests for mapping type
Documentation/gpu/todo.rst | 2 +-
drivers/accel/amdxdna/amdxdna_gem.c | 14 +-
drivers/accel/amdxdna/amdxdna_ubuf.c | 10 +-
drivers/accel/habanalabs/common/memory.c | 54 ++-
drivers/accel/ivpu/ivpu_gem.c | 10 +-
drivers/accel/ivpu/ivpu_gem_userptr.c | 11 +-
drivers/accel/qaic/qaic_data.c | 8 +-
drivers/dma-buf/Makefile | 1 +
drivers/dma-buf/dma-buf-mapping.c | 186 ++++++++-
drivers/dma-buf/dma-buf.c | 180 ++++++---
drivers/dma-buf/heaps/cma_heap.c | 12 +-
drivers/dma-buf/heaps/system_heap.c | 13 +-
drivers/dma-buf/st-dma-mapping.c | 373 ++++++++++++++++++
drivers/dma-buf/udmabuf.c | 8 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 98 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 +-
drivers/gpu/drm/armada/armada_gem.c | 33 +-
drivers/gpu/drm/drm_gem_shmem_helper.c | 2 +-
drivers/gpu/drm/drm_prime.c | 31 +-
drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 18 +-
drivers/gpu/drm/i915/gem/i915_gem_object.c | 2 +-
.../drm/i915/gem/selftests/i915_gem_dmabuf.c | 8 +-
.../gpu/drm/i915/gem/selftests/mock_dmabuf.c | 8 +-
drivers/gpu/drm/msm/msm_gem_prime.c | 7 +-
drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c | 11 +-
drivers/gpu/drm/tegra/gem.c | 33 +-
drivers/gpu/drm/virtio/virtgpu_prime.c | 23 +-
drivers/gpu/drm/vmwgfx/vmwgfx_prime.c | 32 +-
drivers/gpu/drm/xe/xe_bo.c | 18 +-
drivers/gpu/drm/xe/xe_dma_buf.c | 61 +--
drivers/iio/industrialio-buffer.c | 15 +-
drivers/infiniband/core/umem_dmabuf.c | 15 +-
drivers/iommu/iommufd/io_pagetable.h | 4 +-
drivers/iommu/iommufd/iommufd_private.h | 8 -
drivers/iommu/iommufd/iommufd_test.h | 7 +
drivers/iommu/iommufd/pages.c | 85 ++--
drivers/iommu/iommufd/selftest.c | 177 ++++++---
.../media/common/videobuf2/videobuf2-core.c | 2 +-
.../common/videobuf2/videobuf2-dma-contig.c | 26 +-
.../media/common/videobuf2/videobuf2-dma-sg.c | 21 +-
.../common/videobuf2/videobuf2-vmalloc.c | 13 +-
.../platform/nvidia/tegra-vde/dmabuf-cache.c | 9 +-
drivers/misc/fastrpc.c | 21 +-
drivers/tee/tee_heap.c | 13 +-
drivers/usb/gadget/function/f_fs.c | 11 +-
drivers/vfio/pci/vfio_pci_dmabuf.c | 79 ++--
drivers/xen/gntdev-dmabuf.c | 29 +-
include/linux/dma-buf-mapping.h | 297 ++++++++++++++
include/linux/dma-buf.h | 168 ++++----
io_uring/zcrx.c | 9 +-
net/core/devmem.c | 14 +-
samples/vfio-mdev/mbochs.c | 10 +-
sound/soc/fsl/fsl_asrc_m2m.c | 12 +-
tools/testing/selftests/iommu/iommufd.c | 43 ++
tools/testing/selftests/iommu/iommufd_utils.h | 17 +
55 files changed, 1764 insertions(+), 614 deletions(-)
create mode 100644 drivers/dma-buf/st-dma-mapping.c
base-commit: c63e5a50e1dd291cd95b04291b028fdcaba4c534
--
2.43.0
Hi Philipp,
On 4/13/26 03:47, Philipp Stanner wrote:
> On Sat, 2026-04-11 at 15:47 -0300, Maíra Canal wrote:
>> The kerneldoc comment on dma_fence_init() and dma_fence_init64() describe
>> the legacy reason to pass an external lock as a need to prevent multiple
>> fences "from signaling out of order". However, this wording is a bit
>> misleading: a shared spinlock does not (and cannot) prevent the signaler
>> from signaling out of order. Signaling order is the driver's responsibility
>> regardless of whether the lock is shared or per-fence.
>>
>> What a shared lock actually provides is serialization of signaling and
>> observation across fences in a given context, so that observers never
>> see a later fence signaled while an earlier one is not.
>>
>> Reword both comments to describe this more accurately.
>>
>> Signed-off-by: Maíra Canal <mcanal(a)igalia.com>
>> ---
>>
>> Hi,
>>
>> While reading the documentation, I found this particular paragraph quite
>> hard to understand. As I understand it, locks don't enforce order, only
>> serialization, but the paragraph seems to communicate the other way around.
>
> Yes, 100%.
> That's one of the reasons why Christian recently moved to using inline-
> locks.
>
>
>
>> Due to that, I had the impression that the current wording can be
>> misleading for driver developers.
>>
>> I'm proposing a new wording to better describe the use case of the
>> external lock based on my understanding, but it would be great to hear
>> the feedback and suggestions from more experienced developers who might
>> have more insight about these legacy use cases.
>>
>> Best regards,
>> - Maíra
>>
>>  drivers/dma-buf/dma-fence.c | 12 ++++++++----
>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>> index 1826ba73094c..bdc29d1c1b5c 100644
>> --- a/drivers/dma-buf/dma-fence.c
>> +++ b/drivers/dma-buf/dma-fence.c
>> @@ -1102,8 +1102,10 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>>  * to check which fence is later by simply using dma_fence_later().
>>  *
>> Â * It is strongly discouraged to provide an external lock because this couples
>> - * lock and fence life time. This is only allowed for legacy use cases when
>> - * multiple fences need to be prevented from signaling out of order.
>> + * lock and fence lifetime. This is only allowed for legacy use cases that need
>> + * a shared lock to serialize signaling and observation of fences within a
>> + * context, so that observers never see a later fence signaled while an earlier
>> + * one isn't.
>
> I would probably delete the explanation altogether and just state
> "allowed for legacy reasons". The legacy people know why it's allowed,
> and new users don't need to care. Same below of course.
Actually, I ended up in this part of the documentation as I'm thinking
about dropping the external lock in v3d driver. I was reading the docs
exactly to know if v3d is a legacy use case (spoiler: it isn't) and I
got confused by the current wording. So, I think there is value in
documenting the legacy use cases for the external lock.
Maybe, instead of "strongly discouraged", we could add a disclaimer
like: "New drivers MUST NOT use an external lock because...". What do
you think?
Best regards,
- Maíra
>
>
> Thx
> P.
>
>>  */
>>  void
>>  dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>> @@ -1129,8 +1131,10 @@ EXPORT_SYMBOL(dma_fence_init);
>>  * to check which fence is later by simply using dma_fence_later().
>>  *
>> Â * It is strongly discouraged to provide an external lock because this couples
>> - * lock and fence life time. This is only allowed for legacy use cases when
>> - * multiple fences need to be prevented from signaling out of order.
>> + * lock and fence lifetime. This is only allowed for legacy use cases that need
>> + * a shared lock to serialize signaling and observation of fences within a
>> + * context, so that observers never see a later fence signaled while an earlier
>> + * one isn't.
>>  */
>>  void
>>  dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops,
>
On 4/13/26 12:05, Tvrtko Ursulin wrote:
> The trace_dma_fence_signaled, trace_dma_fence_wait_end and
> trace_dma_fence_destroy tracepoints can all dereference a NULL
> fence->ops pointer after it has been reset on fence signalling.
>
> Let's use the safe string getters for most tracepoints to avoid this.
>
> But for the signalling tracepoint, we move it to before the fence->ops
> is reset and special case its definition in order to avoid losing the
> driver and timeline information.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
> Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3")
> Cc: Christian König <christian.koenig(a)amd.com>
> Cc: Philipp Stanner <phasta(a)kernel.org>
> Cc: Boris Brezillon <boris.brezillon(a)collabora.com>
> Cc: linux-media(a)vger.kernel.org
> Cc: linaro-mm-sig(a)lists.linaro.org
> ---
> drivers/dma-buf/dma-fence.c | 3 ++-
> include/trace/events/dma_fence.h | 29 +++++++++++++++++++++++++++--
> 2 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index a2aa82f4eedd..b3bfa6943a8e 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -363,6 +363,8 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
> &fence->flags)))
> return;
>
> + trace_dma_fence_signaled(fence);
> +
> /*
> * When neither a release nor a wait operation is specified set the ops
> * pointer to NULL to allow the fence structure to become independent
> @@ -377,7 +379,6 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>
> fence->timestamp = timestamp;
> set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
> - trace_dma_fence_signaled(fence);
>
> list_for_each_entry_safe(cur, tmp, &cb_list, node) {
> INIT_LIST_HEAD(&cur->node);
> diff --git a/include/trace/events/dma_fence.h b/include/trace/events/dma_fence.h
> index 3abba45c0601..220bf71446e8 100644
> --- a/include/trace/events/dma_fence.h
> +++ b/include/trace/events/dma_fence.h
> @@ -9,12 +9,37 @@
>
> struct dma_fence;
>
> +DECLARE_EVENT_CLASS(dma_fence,
> +
> + TP_PROTO(struct dma_fence *fence),
> +
> + TP_ARGS(fence),
> +
> + TP_STRUCT__entry(
> + __string(driver, dma_fence_driver_name(fence))
> + __string(timeline, dma_fence_timeline_name(fence))
That requires that we hold the RCU read side lock while doing the trace.
Not sure if that can be done inside the DECLARE_EVENT_CLASS() macro.
> + __field(unsigned int, context)
> + __field(unsigned int, seqno)
> + ),
> +
> + TP_fast_assign(
> + __assign_str(driver);
> + __assign_str(timeline);
> + __entry->context = fence->context;
> + __entry->seqno = fence->seqno;
> + ),
> +
> + TP_printk("driver=%s timeline=%s context=%u seqno=%u",
> + __get_str(driver), __get_str(timeline), __entry->context,
> + __entry->seqno)
> +);
> +
> /*
> * Safe only for call sites which are guaranteed to not race with fence
> * signaling, holding the fence->lock and having checked for not signaled, or the
> * signaling path itself.
> */
> -DECLARE_EVENT_CLASS(dma_fence,
> +DECLARE_EVENT_CLASS(dma_fence_ops,
>
> TP_PROTO(struct dma_fence *fence),
>
> @@ -67,7 +92,7 @@ DEFINE_EVENT(dma_fence, dma_fence_enable_signal,
> TP_ARGS(fence)
> );
>
> -DEFINE_EVENT(dma_fence, dma_fence_signaled,
> +DEFINE_EVENT(dma_fence_ops, dma_fence_signaled,
The signal trace event is actually unproblematic. The question is more what to do with the release event.
Regards,
Christian.
>
> TP_PROTO(struct dma_fence *fence),
>
On 4/10/26 17:37, Tvrtko Ursulin wrote:
>
> On 10/04/2026 09:58, Christian König wrote:
>> On 4/9/26 15:58, Tvrtko Ursulin wrote:
>>>
>>> On 31/03/2026 08:49, Boris Brezillon wrote:
>>>> On Mon, 30 Mar 2026 14:36:23 +0100
>>>> Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> wrote:
>>>>
>>>>> Move the signalling tracepoint to before fence->ops are reset otherwise
>>>>> tracepoint will dereference a null pointer.
>>>>
>>>> I suspect other trace points are impacted too
>>>> (trace_dma_fence_destroy() is, at the very least).
>>>
>>> Indeed. I wonder why that did not trigger for me, while the one I fix here was an insta-crash...
>>
>> You need to actually enable the tracepoints, and at least for the destroy one, usually nobody is interested in that.
>
> Right, but I was pretty sure I was enabling perf record -e 'dma_fence:*' when I hit this. Anyway, it doesn't matter, I could be misremembering.
>
>>>
>>> To fix trace_dma_fence_destroy I think we need a new tracepoint definition ie. move it away from the existing event class - make it just log the context and seqno.
>>>
>>> Anyone has a better idea?
>>
>> The idea of tracing without accessing fence->ops sounds valid to me.
>>
>> Alternatively we could call dma_fence_timeline_name() and dma_fence_driver_name() from the tracepoint as well, but that means the tracepoints now require a RCU read side lock.
>
> We could possibly use the helpers. I am not sure if the RCU annotation would have to be cast away to keep sparse happy, but more importantly, I think it would not be safe.
>
>   thread A                                  thread B
>
>   dma_fence_signal_timestamp_locked         dma_fence_timeline_name
>     ..                                        ops = rcu_dereference(fence->ops);
>                                               if (!dma_fence_test_signaled_flag(fence))
>     test_and_set_bit
>     ..
>     RCU_INIT_POINTER(fence->ops, NULL);
>                                               return (const char __rcu *)ops->get_driver_name(fence); // OOPS!
>
> Apologies for long line length, it did not fit otherwise.
>
> Looks like we missed this. Or it is me who is missing something?
See function dma_fence_driver_name() and dma_fence_timeline_name():
ops = rcu_dereference(fence->ops);
if (!dma_fence_test_signaled_flag(fence))
return (const char __rcu *)ops->get_driver_name(fence);
We first grab the ops pointer and then check if the fence is signaled or not. Since the other thread first sets the signaled flag and then NULLs the ops pointer, we should be safe here.
The only concern would be if test_bit() is not a memory barrier while set_bit() is, but that would be a bit surprising.
Alternatively, I would be fine with switching to testing ops for NULL instead of calling dma_fence_test_signaled_flag().
Regards,
Christian.
> Regards,
>
> Tvrtko
>
>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
>>>>> Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3")
>>>>> Cc: Christian König <christian.koenig(a)amd.com>
>>>>> Cc: Philipp Stanner <phasta(a)kernel.org>
>>>>> Cc: Boris Brezillon <boris.brezillon(a)collabora.com>
>>>>> Cc: linux-media(a)vger.kernel.org
>>>>> Cc: linaro-mm-sig(a)lists.linaro.org
>>>>> ---
>>>>>  drivers/dma-buf/dma-fence.c | 3 ++-
>>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>>>>> index 1826ba73094c..1c1eaecaf1b0 100644
>>>>> --- a/drivers/dma-buf/dma-fence.c
>>>>> +++ b/drivers/dma-buf/dma-fence.c
>>>>> @@ -363,6 +363,8 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>>>>  			       &fence->flags)))
>>>>>  		return;
>>>>> +	trace_dma_fence_signaled(fence);
>>>>> +
>>>>>  	/*
>>>>>  	 * When neither a release nor a wait operation is specified set the ops
>>>>>  	 * pointer to NULL to allow the fence structure to become independent
>>>>> @@ -377,7 +379,6 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>>>>  	fence->timestamp = timestamp;
>>>>>  	set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
>>>>> -	trace_dma_fence_signaled(fence);
>>>>>  	list_for_each_entry_safe(cur, tmp, &cb_list, node) {
>>>>>  		INIT_LIST_HEAD(&cur->node);
>>>>
>>>
>>
>