+Cc Danilo
On Thu, 2026-06-18 at 15:03 +0100, André Draszik wrote:
Since commit 541c8f2468b9 ("dma-buf: detach fence ops on signal v3"), I'm seeing the BUG_ON() triggering in drm_crtc's fence_to_crtc() via drm_crtc_fence_get_driver_name() regularly:
Call trace: panic+0x58/0x5c die+0x160/0x178 bug_brk_handler+0x70/0xa4 call_el1_break_hook+0x3c/0x1a0 do_el1_brk64+0x24/0x74 el1_brk64+0x34/0x54 el1h_64_sync_handler+0x80/0xfc el1h_64_sync+0x84/0x88 drm_crtc_fence_get_driver_name+0x60/0x68 (P) sync_file_get_name+0x184/0x45c sync_file_ioctl+0x404/0xf70 __arm64_sys_ioctl+0x124/0x1dc
This looks to be caused by a code flow similar to the following:
+++ snip +++ thread A thread B
ioctl(SYNC_IOC_FILE_INFO) sync_file_ioctl() sync_file_get_name() dma_fence_signal_timestamp_locked() dma_fence_driver_name() ops = rcu_dereference(fence->ops) if (!dma_fence_test_signaled_flag()) ops->get_driver_name(fence) i.e. drm_crtc_fence_get_driver_name() test_and_set_bit(SIGNALED) RCU_INIT_POINTER(fence->ops, NULL) drm_crtc_fence_get_driver_name() BUG_ON(rcu_access_pointer(fence->ops) != &drm_crtc_fence_ops)
Now this looks like a very similar problem that I have recently been concerned with:
https://lore.kernel.org/dri-devel/20260612104251.2264707-2-phasta@kernel.org...
https://lore.kernel.org/dri-devel/fa0dc9757bf8343516c4b156a2b70ec91b64ef8f.c...
I continue to believe because of bugs like this and the ones I have quoted in the threads above the robustness of the kernel could be greatly improved if we could get dma_fence fully synchronized with its lock.
That said:
+++ snap +++
I see two ways to resolve this: a) simply drop the BUG_ON(). It can not work anymore since above commit, as it is racy now. b) pass the original 'ops' pointer obtained in dma_fence_driver_name() to all callees.
This patch implements option a), as because:
- I don't see much benefit in passing the extra pointer just for this
BUG_ON() to work.
- Requiring the dma_fence_ops in those callbacks is an implementation
detail of the drm_crtc driver, and therefore upper layers shouldn't have to care about that.
- The existence of the BUG_ON() doesn't appear to be consistent with
implementations of ::get_driver_name() or ::get_timeline_name() in the majority of other DRM drivers in the first place. Those that do have a similar BUG_ON() (i915, xe) probably also need an update similar to this patch here but I'm not in a position to test those.
Note that the adjacent drm_crtc_fence_get_timeline_name() has the same problem and is fixed by this patch as well.
Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3") Signed-off-by: André Draszik andre.draszik@linaro.org
drivers/gpu/drm/drm_crtc.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c index 63ead8ba6756..31c8636e7467 100644 --- a/drivers/gpu/drm/drm_crtc.c +++ b/drivers/gpu/drm/drm_crtc.c @@ -73,6 +73,9 @@ * &drm_mode_config_funcs.atomic_check. */ +#define fence_to_crtc(f) container_of((f)->extern_lock, \
struct drm_crtc, fence_lock)
I agree that macros should be avoided if possible.
/** * drm_crtc_from_index - find the registered CRTC at an index * @dev: DRM device @@ -154,14 +157,6 @@ static void drm_crtc_crc_fini(struct drm_crtc *crtc) #endif } -static const struct dma_fence_ops drm_crtc_fence_ops;
-static struct drm_crtc *fence_to_crtc(struct dma_fence *fence) -{
- BUG_ON(rcu_access_pointer(fence->ops) != &drm_crtc_fence_ops);
+1
BUG_ON is more or less deprecated and should not be used anymore. There needs to be bombastic justification for shooting down the entire kernel.
P.
- return container_of(fence->extern_lock, struct drm_crtc, fence_lock);
-}
static const char *drm_crtc_fence_get_driver_name(struct dma_fence *fence) { struct drm_crtc *crtc = fence_to_crtc(fence);
base-commit: e2cae00c05d196491c318196792297f2dfbaa02c change-id: 20260618-linux-drm_crtc_fix2-23a7c354a412
Best regards,
linaro-mm-sig@lists.linaro.org