Linaro-mm-sig July 2022

linaro-mm-sig@lists.linaro.org

29 participants
30 discussions

[syzbot] inconsistent lock state in sync_info_debugfs_show

by syzbot

Hello, syzbot found the following issue on: HEAD commit: 1c52283265a4 Merge branch 'akpm' (patches from Andrew) git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=1224663fb00000 kernel config: https://syzkaller.appspot.com/x/.config?x=75bc179af0ff0457 dashboard link: https://syzkaller.appspot.com/bug?extid=007bfe0f3330f6e1e7d1 compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2 Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+007bfe0f3330f6e1e7d1(a)syzkaller.appspotmail.com ================================ WARNING: inconsistent lock state 5.16.0-syzkaller #0 Not tainted -------------------------------- inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. syz-executor.2/18360 [HC0[0]:SC0[0]:HE0:SE1] takes: ffffffff8c712cf8 (sync_timeline_list_lock){?...}-{2:2}, at: spin_lock_irq include/linux/spinlock.h:374 [inline] ffffffff8c712cf8 (sync_timeline_list_lock){?...}-{2:2}, at: sync_info_debugfs_show+0x2d/0x200 drivers/dma-buf/sync_debug.c:147 {IN-HARDIRQ-W} state was registered at: lock_acquire kernel/locking/lockdep.c:5639 [inline] lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x39/0x50 kernel/locking/spinlock.c:162 sync_timeline_debug_remove+0x25/0x190 drivers/dma-buf/sync_debug.c:31 sync_timeline_free drivers/dma-buf/sw_sync.c:104 [inline] kref_put include/linux/kref.h:65 [inline] sync_timeline_put drivers/dma-buf/sw_sync.c:116 [inline] timeline_fence_release+0x263/0x340 drivers/dma-buf/sw_sync.c:144 dma_fence_release+0x2ee/0x590 drivers/dma-buf/dma-fence.c:549 kref_put include/linux/kref.h:65 [inline] dma_fence_put include/linux/dma-fence.h:276 [inline] dma_fence_array_release+0x1e4/0x2b0 drivers/dma-buf/dma-fence-array.c:120 dma_fence_release+0x2ee/0x590 drivers/dma-buf/dma-fence.c:549 kref_put include/linux/kref.h:65 [inline] dma_fence_put include/linux/dma-fence.h:276 [inline] irq_dma_fence_array_work+0xa5/0xd0 drivers/dma-buf/dma-fence-array.c:52 irq_work_single+0x120/0x270 kernel/irq_work.c:211 irq_work_run_list+0x91/0xc0 kernel/irq_work.c:242 irq_work_run+0x54/0xd0 kernel/irq_work.c:251 __sysvec_irq_work+0x95/0x3d0 arch/x86/kernel/irq_work.c:22 sysvec_irq_work+0x8e/0xc0 arch/x86/kernel/irq_work.c:17 asm_sysvec_irq_work+0x12/0x20 arch/x86/include/asm/idtentry.h:664 __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:160 [inline] _raw_spin_unlock_irq+0x25/0x40 kernel/locking/spinlock.c:202 spin_unlock_irq include/linux/spinlock.h:399 [inline] sw_sync_debugfs_release+0x160/0x240 drivers/dma-buf/sw_sync.c:321 __fput+0x286/0x9f0 fs/file_table.c:311 task_work_run+0xdd/0x1a0 kernel/task_work.c:164 exit_task_work include/linux/task_work.h:32 [inline] do_exit+0xb29/0x2a30 kernel/exit.c:806 do_group_exit+0xd2/0x2f0 kernel/exit.c:935 get_signal+0x4b0/0x28c0 kernel/signal.c:2862 arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868 handle_signal_work kernel/entry/common.c:148 [inline] exit_to_user_mode_loop kernel/entry/common.c:172 [inline] exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300 ret_from_fork+0x15/0x30 arch/x86/entry/entry_64.S:288 irq event stamp: 124 hardirqs last enabled at (123): [<ffffffff894fd980>] __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:151 [inline] hardirqs last enabled at (123): [<ffffffff894fd980>] _raw_spin_unlock_irqrestore+0x50/0x70 kernel/locking/spinlock.c:194 hardirqs last disabled at (124): [<ffffffff894fd6e1>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:117 [inline] hardirqs last disabled at (124): [<ffffffff894fd6e1>] _raw_spin_lock_irq+0x41/0x50 kernel/locking/spinlock.c:170 softirqs last enabled at (116): [<ffffffff81465513>] invoke_softirq kernel/softirq.c:432 [inline] softirqs last enabled at (116): [<ffffffff81465513>] __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637 softirqs last disabled at (97): [<ffffffff81465513>] invoke_softirq kernel/softirq.c:432 [inline] softirqs last disabled at (97): [<ffffffff81465513>] __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(sync_timeline_list_lock); <Interrupt> lock(sync_timeline_list_lock); *** DEADLOCK *** 3 locks held by syz-executor.2/18360: #0: ffff88801e30c0f0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:1034 #1: ffff88807a26dd58 (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0xdf/0x1280 fs/seq_file.c:182 #2: ffffffff8c712cf8 (sync_timeline_list_lock){?...}-{2:2}, at: spin_lock_irq include/linux/spinlock.h:374 [inline] #2: ffffffff8c712cf8 (sync_timeline_list_lock){?...}-{2:2}, at: sync_info_debugfs_show+0x2d/0x200 drivers/dma-buf/sync_debug.c:147 stack backtrace: CPU: 0 PID: 18360 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_usage_bug kernel/locking/lockdep.c:203 [inline] valid_state kernel/locking/lockdep.c:3945 [inline] mark_lock_irq kernel/locking/lockdep.c:4148 [inline] mark_lock.cold+0x61/0x8e kernel/locking/lockdep.c:4605 mark_held_locks+0x9f/0xe0 kernel/locking/lockdep.c:4206 __trace_hardirqs_on_caller kernel/locking/lockdep.c:4224 [inline] lockdep_hardirqs_on_prepare kernel/locking/lockdep.c:4292 [inline] lockdep_hardirqs_on_prepare+0x135/0x400 kernel/locking/lockdep.c:4244 trace_hardirqs_on+0x5b/0x1c0 kernel/trace/trace_preemptirq.c:49 __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:159 [inline] _raw_spin_unlock_irq+0x1f/0x40 kernel/locking/spinlock.c:202 spin_unlock_irq include/linux/spinlock.h:399 [inline] sync_print_obj drivers/dma-buf/sync_debug.c:118 [inline] sync_info_debugfs_show+0xeb/0x200 drivers/dma-buf/sync_debug.c:153 seq_read_iter+0x4f5/0x1280 fs/seq_file.c:230 seq_read+0x3e8/0x5c0 fs/seq_file.c:162 vfs_read+0x1b5/0x600 fs/read_write.c:479 ksys_read+0x12d/0x250 fs/read_write.c:619 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f781fb4b059 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f781e4c0168 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 00007f781fc5df60 RCX: 00007f781fb4b059 RDX: 0000000000000008 RSI: 0000000020000140 RDI: 0000000000000003 RBP: 00007f781fba508d R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fff0ff6575f R14: 00007f781e4c0300 R15: 0000000000022000 </TASK> --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller(a)googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

2 years, 7 months

[PATCH v2 0/5] Move all drivers to a common dma-buf locking convention

by Dmitry Osipenko

Hello, This series moves all drivers to a dynamic dma-buf locking specification. From now on all dma-buf importers are made responsible for holding dma-buf's reservation lock around all operations performed over dma-bufs in accordance to the locking specification. This allows us to utilize reservation lock more broadly around kernel without fearing of a potential deadlocks. This patchset passes all i915 selftests. It was also tested using VirtIO, Panfrost, Lima and Tegra drivers. I tested cases of display+GPU, display+V4L and GPU+V4L dma-buf sharing, which covers majority of kernel drivers since rest of the drivers share same or similar code paths. Changelog: v2: - Changed locking specification to avoid problems with a cross-driver ww locking, like was suggested by Christian König. Now the attach/detach callbacks are invoked without the held lock and exporter should take the lock. - Added "locking convention" documentation that explains which dma-buf functions and callbacks are locked/unlocked for importers and exporters, which was requested by Christian König. - Added ack from Tomasz Figa to the V4L patches that he gave to v1. Dmitry Osipenko (5): dma-buf: Add _unlocked postfix to function names drm/gem: Take reservation lock for vmap/vunmap operations dma-buf: Move all dma-bufs to dynamic locking specification media: videobuf2: Stop using internal dma-buf lock dma-buf: Remove internal lock Documentation/driver-api/dma-buf.rst | 6 + drivers/dma-buf/dma-buf.c | 253 +++++++++++++----- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4 +- drivers/gpu/drm/armada/armada_gem.c | 14 +- drivers/gpu/drm/drm_client.c | 4 +- drivers/gpu/drm/drm_gem.c | 24 ++ drivers/gpu/drm/drm_gem_cma_helper.c | 6 +- drivers/gpu/drm/drm_gem_framebuffer_helper.c | 6 +- drivers/gpu/drm/drm_gem_shmem_helper.c | 6 +- drivers/gpu/drm/drm_prime.c | 12 +- drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c | 6 +- drivers/gpu/drm/exynos/exynos_drm_gem.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 14 +- .../drm/i915/gem/selftests/i915_gem_dmabuf.c | 20 +- drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c | 8 +- drivers/gpu/drm/qxl/qxl_object.c | 17 +- drivers/gpu/drm/qxl/qxl_prime.c | 4 +- drivers/gpu/drm/tegra/gem.c | 27 +- drivers/infiniband/core/umem_dmabuf.c | 11 +- .../common/videobuf2/videobuf2-dma-contig.c | 26 +- .../media/common/videobuf2/videobuf2-dma-sg.c | 23 +- .../common/videobuf2/videobuf2-vmalloc.c | 17 +- .../platform/nvidia/tegra-vde/dmabuf-cache.c | 12 +- drivers/misc/fastrpc.c | 12 +- drivers/xen/gntdev-dmabuf.c | 14 +- include/drm/drm_gem.h | 3 + include/linux/dma-buf.h | 71 ++--- 28 files changed, 372 insertions(+), 254 deletions(-) -- 2.36.1

2 years, 11 months

[PATCH] dma-buf: revert "return only unsignaled fences in dma_fence_unwrap_for_each v3"

by Christian König

This reverts commit 8f61973718485f3e89bc4f408f929048b7b47c83. It turned out that this is not correct. Especially the sync_file info IOCTL needs to see even signaled fences to correctly report back their status to userspace. Instead add the filter in the merge function again where it makes sense. Signed-off-by: Christian König <christian.koenig(a)amd.com> --- drivers/dma-buf/dma-fence-unwrap.c | 3 ++- include/linux/dma-fence-unwrap.h | 6 +----- 2 files changed, 3 insertions(+), 6 deletions(-) diff --git a/drivers/dma-buf/dma-fence-unwrap.c b/drivers/dma-buf/dma-fence-unwrap.c index 502a65ea6d44..7002bca792ff 100644 --- a/drivers/dma-buf/dma-fence-unwrap.c +++ b/drivers/dma-buf/dma-fence-unwrap.c @@ -72,7 +72,8 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences, count = 0; for (i = 0; i < num_fences; ++i) { dma_fence_unwrap_for_each(tmp, &iter[i], fences[i]) - ++count; + if (!dma_fence_is_signaled(tmp)) + ++count; } if (count == 0) diff --git a/include/linux/dma-fence-unwrap.h b/include/linux/dma-fence-unwrap.h index 390de1ee9d35..66b1e56fbb81 100644 --- a/include/linux/dma-fence-unwrap.h +++ b/include/linux/dma-fence-unwrap.h @@ -43,14 +43,10 @@ struct dma_fence *dma_fence_unwrap_next(struct dma_fence_unwrap *cursor); * Unwrap dma_fence_chain and dma_fence_array containers and deep dive into all * potential fences in them. If @head is just a normal fence only that one is * returned. - * - * Note that signalled fences are opportunistically filtered out, which - * means the iteration is potentially over no fence at all. */ #define dma_fence_unwrap_for_each(fence, cursor, head) \ for (fence = dma_fence_unwrap_first(head, cursor); fence; \ - fence = dma_fence_unwrap_next(cursor)) \ - if (!dma_fence_is_signaled(fence)) + fence = dma_fence_unwrap_next(cursor)) struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences, struct dma_fence **fences, -- 2.25.1

2 years, 11 months

[PATCH] dma-buf/dma_resv_usage: update explicit sync documentation

by Christian König

Make it clear that DMA_RESV_USAGE_BOOKMARK can be used for explicit synced user space submissions as well and document the rules around adding the same fence with different usages. Signed-off-by: Christian König <christian.koenig(a)amd.com> --- include/linux/dma-resv.h | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index c8ccbc94d5d2..264e27e56dff 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -62,6 +62,11 @@ struct dma_resv_list; * For example when asking for WRITE fences then the KERNEL fences are returned * as well. Similar when asked for READ fences then both WRITE and KERNEL * fences are returned as well. + * + * Already used fences can be promoted in the sense that a fence with + * DMA_RESV_USAGE_BOOKMARK could become DMA_RESV_USAGE_READ by adding it again + * with this usage. But fences can never be degraded in the sense that a fence + * with DMA_RESV_USAGE_WRITE could become DMA_RESV_USAGE_READ. */ enum dma_resv_usage { /** @@ -98,10 +103,15 @@ enum dma_resv_usage { * @DMA_RESV_USAGE_BOOKKEEP: No implicit sync. * * This should be used by submissions which don't want to participate in - * implicit synchronization. + * any implicit synchronization. + * + * The most common case are preemption fences, page table updates, TLB + * flushes as well as explicit synced user submissions. * - * The most common case are preemption fences as well as page table - * updates and their TLB flushes. + * Explicit synced user user submissions can be promoted to + * DMA_RESV_USAGE_READ or DMA_RESV_USAGE_WRITE as needed using + * dma_buf_import_sync_file() when implicit synchronization should + * become necessary after initial adding of the fence. */ DMA_RESV_USAGE_BOOKKEEP }; -- 2.25.1

2 years, 11 months

[PATCH] dma-buf/sync_file: use strscpy to replace strlcpy

by XueBing Chen

The strlcpy should not be used because it doesn't limit the source length. Preferred is strscpy. Signed-off-by: XueBing Chen <chenxuebing(a)jari.cn> --- drivers/dma-buf/sync_file.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c index 3ebec19a8e02..af57799c86ce 100644 --- a/drivers/dma-buf/sync_file.c +++ b/drivers/dma-buf/sync_file.c @@ -132,7 +132,7 @@ EXPORT_SYMBOL(sync_file_get_fence); char *sync_file_get_name(struct sync_file *sync_file, char *buf, int len) { if (sync_file->user_name[0]) { - strlcpy(buf, sync_file->user_name, len); + strscpy(buf, sync_file->user_name, len); } else { struct dma_fence *fence = sync_file->fence; @@ -172,7 +172,7 @@ static struct sync_file *sync_file_merge(const char *name, struct sync_file *a, return NULL; } sync_file->fence = fence; - strlcpy(sync_file->user_name, name, sizeof(sync_file->user_name)); + strscpy(sync_file->user_name, name, sizeof(sync_file->user_name)); return sync_file; } @@ -262,9 +262,9 @@ static long sync_file_ioctl_merge(struct sync_file *sync_file, static int sync_fill_fence_info(struct dma_fence *fence, struct sync_fence_info *info) { - strlcpy(info->obj_name, fence->ops->get_timeline_name(fence), + strscpy(info->obj_name, fence->ops->get_timeline_name(fence), sizeof(info->obj_name)); - strlcpy(info->driver_name, fence->ops->get_driver_name(fence), + strscpy(info->driver_name, fence->ops->get_driver_name(fence), sizeof(info->driver_name)); info->status = dma_fence_get_status(fence); -- 2.25.1

2 years, 11 months

DMA-buf and uncached system memory

by Christian König

Hi guys, we are currently working an Freesync and direct scan out from system memory on AMD APUs in A+A laptops. On problem we stumbled over is that our display hardware needs to scan out from uncached system memory and we currently don't have a way to communicate that through DMA-buf. For our specific use case at hand we are going to implement something driver specific, but the question is should we have something more generic for this? After all the system memory access pattern is a PCIe extension and as such something generic. Regards, Christian.

2 years, 11 months

[PATCH 0/1] [RFC] drm/fourcc: Add new unsigned R16_UINT/RG1616_UINT formats

by Dennis Tsiang

This patch is an early RFC to discuss the viable options and alternatives for inclusion of unsigned integer formats for the DRM API. This patch adds a new single component 16-bit and a two component 32-bit DRM fourcc’s that represent unsigned integer formats. The use case for needing UINT formats, in our case, would be to support using raw buffers for camera ISPs. For images imported with DRM fourcc + modifier combination, the GPU driver needs a way to determine the datatype of the format which currently the DRM API does not provide explicitly with a notable exception of the floating-point fourccs such as DRM_FORMAT_XRGB16161616F as an example. As the DRM fourccs do not currently define the interpretation of the data, should the information be made explicit in the DRM API similarly to how it is already done in Vulkan? The reason for introducing datatype to the DRM fourcc's is that the alternative, for any API (e.g., EGL) that lacks the format datatype information for fourcc/modifier combination for dma_buf interop would be to introduce explicit additional metadata/attributes that encode this information which then would be passed to the GPU driver but the drawback of this is that it would require extending multiple graphics APIs to support every single platform. By having the DRM API expose the datatype information for formats saves a lot of integration/verification work for all of the different graphics APIs and platforms as this information could be determined by the DRM triple alone for dma_buf interop. It would be good to gather some opinions on what others think about introducing datatypes to the DRM API. Any feedback and suggestions are highly appreciated. Dennis Tsiang (1): [RFC] drm/fourcc: Add new unsigned R16_UINT/RG1616_UINT formats include/uapi/drm/drm_fourcc.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) -- 2.36.1 IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

2 years, 11 months

[PATCH 0/3] dma-buf: map-info support

by Rob Clark

From: Rob Clark <robdclark(a)chromium.org> See 1/3 for motivation. Rob Clark (3): dma-buf: Add ioctl to query mmap info drm/prime: Wire up mmap_info support drm/msm/prime: Add mmap_info support drivers/dma-buf/dma-buf.c | 26 ++++++++++++++++++++++++++ drivers/gpu/drm/drm_prime.c | 12 ++++++++++++ drivers/gpu/drm/msm/msm_drv.c | 1 + drivers/gpu/drm/msm/msm_drv.h | 1 + drivers/gpu/drm/msm/msm_gem_prime.c | 11 +++++++++++ include/drm/drm_drv.h | 7 +++++++ include/linux/dma-buf.h | 7 +++++++ include/uapi/linux/dma-buf.h | 28 ++++++++++++++++++++++++++++ 8 files changed, 93 insertions(+) -- 2.36.1

2 years, 11 months

[PATCH] [Draft]: media: videobuf2-dma-heap: add a vendor defined memory runtine

by ayaka＠soulik.info

From: Randy Li <ayaka(a)soulik.info> This module is still at a early stage, I wrote this for showing what APIs we need here. Let me explain why we need such a module here. If you won't allocate buffers from a V4L2 M2M device, this module may not be very useful. I am sure the most of users won't know a device would require them allocate buffers from a DMA-Heap then import those buffers into a V4L2's queue. Then the question goes back to why DMA-Heap. From the Android's description, we know it is about the copyright's DRM. When we allocate a buffer in a DMA-Heap, it may register that buffer in the trusted execution environment so the firmware which is running or could only be acccesed from there could use that buffer later. The answer above leads to another thing which is not done in this version, the DMA mapping. Although in some platforms, a DMA-Heap responses a IOMMU device as well. For the genernal purpose, we would be better assuming the device mapping should be done for each device itself. The problem here we only know alloc_devs in those DMAbuf methods, which are DMA-heaps in my design, the device from the queue is not enough, a plane may requests another IOMMU device or table for mapping. Signed-off-by: Randy Li <ayaka(a)soulik.info> --- drivers/media/common/videobuf2/Kconfig | 6 + drivers/media/common/videobuf2/Makefile | 1 + .../common/videobuf2/videobuf2-dma-heap.c | 350 ++++++++++++++++++ include/media/videobuf2-dma-heap.h | 30 ++ 4 files changed, 387 insertions(+) create mode 100644 drivers/media/common/videobuf2/videobuf2-dma-heap.c create mode 100644 include/media/videobuf2-dma-heap.h diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig index d2223a12c95f..02235077f07e 100644 --- a/drivers/media/common/videobuf2/Kconfig +++ b/drivers/media/common/videobuf2/Kconfig @@ -30,3 +30,9 @@ config VIDEOBUF2_DMA_SG config VIDEOBUF2_DVB tristate select VIDEOBUF2_CORE + +config VIDEOBUF2_DMA_HEAP + tristate + select VIDEOBUF2_CORE + select VIDEOBUF2_MEMOPS + select DMABUF_HEAPS diff --git a/drivers/media/common/videobuf2/Makefile b/drivers/media/common/videobuf2/Makefile index a6fe3f304685..7fe65f93117f 100644 --- a/drivers/media/common/videobuf2/Makefile +++ b/drivers/media/common/videobuf2/Makefile @@ -10,6 +10,7 @@ endif # (e. g. LC_ALL=C sort Makefile) obj-$(CONFIG_VIDEOBUF2_CORE) += videobuf2-common.o obj-$(CONFIG_VIDEOBUF2_DMA_CONTIG) += videobuf2-dma-contig.o +obj-$(CONFIG_VIDEOBUF2_DMA_HEAP) += videobuf2-dma-heap.o obj-$(CONFIG_VIDEOBUF2_DMA_SG) += videobuf2-dma-sg.o obj-$(CONFIG_VIDEOBUF2_DVB) += videobuf2-dvb.o obj-$(CONFIG_VIDEOBUF2_MEMOPS) += videobuf2-memops.o diff --git a/drivers/media/common/videobuf2/videobuf2-dma-heap.c b/drivers/media/common/videobuf2/videobuf2-dma-heap.c new file mode 100644 index 000000000000..377b82ab8f5a --- /dev/null +++ b/drivers/media/common/videobuf2/videobuf2-dma-heap.c @@ -0,0 +1,350 @@ +/* + * Copyright (C) 2022 Randy Li <ayaka(a)soulik.info> + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/dma-buf.h> +#include <linux/dma-heap.h> +#include <linux/refcount.h> +#include <linux/scatterlist.h> +#include <linux/sched.h> +#include <linux/slab.h> +#include <linux/dma-mapping.h> + +#include <media/videobuf2-v4l2.h> +#include <media/videobuf2-memops.h> +#include <media/videobuf2-dma-heap.h> + +struct vb2_dmaheap_buf { + struct device *dev; + void *vaddr; + unsigned long size; + struct dma_buf *dmabuf; + dma_addr_t dma_addr; + unsigned long attrs; + enum dma_data_direction dma_dir; + struct sg_table *dma_sgt; + + /* MMAP related */ + struct vb2_vmarea_handler handler; + refcount_t refcount; + + /* DMABUF related */ + struct dma_buf_attachment *db_attach; +}; + +/*********************************************/ +/* callbacks for all buffers */ +/*********************************************/ + +void *vb2_dmaheap_cookie(struct vb2_buffer *vb, void *buf_priv) +{ + struct vb2_dmaheap_buf *buf = buf_priv; + + return &buf->dma_addr; +} + +static void *vb2_dmaheap_vaddr(struct vb2_buffer *vb, void *buf_priv) +{ + struct vb2_dmaheap_buf *buf = buf_priv; + struct iosys_map map; + + if (buf->vaddr) + return buf->vaddr; + + if (buf->db_attach) { + if (!dma_buf_vmap(buf->db_attach->dmabuf, &map)) + buf->vaddr = map.vaddr; + } + + return buf->vaddr; +} + +static unsigned int vb2_dmaheap_num_users(void *buf_priv) +{ + struct vb2_dmaheap_buf *buf = buf_priv; + + return refcount_read(&buf->refcount); +} + +static void vb2_dmaheap_prepare(void *buf_priv) +{ + struct vb2_dmaheap_buf *buf = buf_priv; + + /* TODO: DMABUF exporter will flush the cache for us */ + if (buf->db_attach) + return; + + dma_buf_end_cpu_access(buf->dmabuf, buf->dma_dir); +} + +static void vb2_dmaheap_finish(void *buf_priv) +{ + struct vb2_dmaheap_buf *buf = buf_priv; + + /* TODO: DMABUF exporter will flush the cache for us */ + if (buf->db_attach) + return; + + dma_buf_begin_cpu_access(buf->dmabuf, buf->dma_dir); +} + +/*********************************************/ +/* callbacks for MMAP buffers */ +/*********************************************/ + +void vb2_dmaheap_put(void *buf_priv) +{ + struct vb2_dmaheap_buf *buf = buf_priv; + + if (!refcount_dec_and_test(&buf->refcount)) + return; + + dma_buf_put(buf->dmabuf); + + put_device(buf->dev); + kfree(buf); +} + +static void *vb2_dmaheap_alloc(struct vb2_buffer *vb, + struct device *dev, + unsigned long size) +{ + struct vb2_queue *q = vb->vb2_queue; + struct dma_heap *heap; + struct vb2_dmaheap_buf *buf; + const char *heap_name; + int ret; + + if (WARN_ON(!dev)) + return ERR_PTR(-EINVAL); + + heap_name = dev_name(dev); + if (!heap_name) + return ERR_PTR(-EINVAL); + + heap = dma_heap_find(heap_name); + if (!heap) { + dev_err(dev, "is not a DMA-heap device\n"); + return ERR_PTR(-EINVAL); + } + + buf = kzalloc(sizeof *buf, GFP_KERNEL); + if (!buf) + return ERR_PTR(-ENOMEM); + + /* Prevent the device from being released while the buffer is used */ + buf->dev = get_device(dev); + buf->attrs = vb->vb2_queue->dma_attrs; + buf->dma_dir = vb->vb2_queue->dma_dir; + + /* TODO: heap flags */ + ret = dma_heap_buffer_alloc(heap, size, 0, 0); + if (ret < 0) { + dev_err(dev, "is not a DMA-heap device\n"); + put_device(buf->dev); + kfree(buf); + return ERR_PTR(ret); + } + buf->dmabuf = dma_buf_get(ret); + + /* FIXME */ + buf->dma_addr = 0; + + if ((q->dma_attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0) + buf->vaddr = buf->dmabuf; + + buf->handler.refcount = &buf->refcount; + buf->handler.put = vb2_dmaheap_put; + buf->handler.arg = buf; + + refcount_set(&buf->refcount, 1); + + return buf; +} + +static int vb2_dmaheap_mmap(void *buf_priv, struct vm_area_struct *vma) +{ + struct vb2_dmaheap_buf *buf = buf_priv; + int ret; + + if (!buf) { + printk(KERN_ERR "No buffer to map\n"); + return -EINVAL; + } + + vma->vm_flags &= ~VM_PFNMAP; + + ret = dma_buf_mmap(buf->dmabuf, vma, 0); + if (ret) { + pr_err("Remapping memory failed, error: %d\n", ret); + return ret; + } + vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP; + vma->vm_private_data = &buf->handler; + vma->vm_ops = &vb2_common_vm_ops; + + vma->vm_ops->open(vma); + + pr_debug("%s: mapped memid 0x%08lx at 0x%08lx, size %ld\n", + __func__, (unsigned long)buf->dma_addr, vma->vm_start, + buf->size); + + return 0; +} + +/*********************************************/ +/* DMABUF ops for exporters */ +/*********************************************/ + +static struct dma_buf *vb2_dmaheap_get_dmabuf(struct vb2_buffer *vb, + void *buf_priv, + unsigned long flags) +{ + struct vb2_dmaheap_buf *buf = buf_priv; + struct dma_buf *dbuf; + + dbuf = buf->dmabuf; + + return dbuf; +} + +/*********************************************/ +/* callbacks for DMABUF buffers */ +/*********************************************/ + +static int vb2_dmaheap_map_dmabuf(void *mem_priv) +{ + struct vb2_dmaheap_buf *buf = mem_priv; + struct sg_table *sgt; + + if (WARN_ON(!buf->db_attach)) { + pr_err("trying to pin a non attached buffer\n"); + return -EINVAL; + } + + if (WARN_ON(buf->dma_sgt)) { + pr_err("dmabuf buffer is already pinned\n"); + return 0; + } + + /* get the associated scatterlist for this buffer */ + sgt = dma_buf_map_attachment(buf->db_attach, buf->dma_dir); + if (IS_ERR(sgt)) { + pr_err("Error getting dmabuf scatterlist\n"); + return -EINVAL; + } + + buf->dma_addr = sg_dma_address(sgt->sgl); + buf->dma_sgt = sgt; + buf->vaddr = NULL; + + return 0; +} + +static void vb2_dmaheap_unmap_dmabuf(void *mem_priv) +{ + struct vb2_dmaheap_buf *buf = mem_priv; + struct sg_table *sgt = buf->dma_sgt; + struct iosys_map map = IOSYS_MAP_INIT_VADDR(buf->vaddr); + + if (WARN_ON(!buf->db_attach)) { + pr_err("trying to unpin a not attached buffer\n"); + return; + } + + if (WARN_ON(!sgt)) { + pr_err("dmabuf buffer is already unpinned\n"); + return; + } + + if (buf->vaddr) { + dma_buf_vunmap(buf->db_attach->dmabuf, &map); + buf->vaddr = NULL; + } + dma_buf_unmap_attachment(buf->db_attach, sgt, buf->dma_dir); + + buf->dma_addr = 0; + buf->dma_sgt = NULL; +} + +static void vb2_dmaheap_detach_dmabuf(void *mem_priv) +{ + struct vb2_dmaheap_buf *buf = mem_priv; + + /* if vb2 works correctly you should never detach mapped buffer */ + if (WARN_ON(buf->dma_addr)) + vb2_dmaheap_unmap_dmabuf(buf); + + /* detach this attachment */ + dma_buf_detach(buf->db_attach->dmabuf, buf->db_attach); + kfree(buf); +} + +static void *vb2_dmaheap_attach_dmabuf(struct vb2_buffer *vb, struct device *dev, + struct dma_buf *dbuf, unsigned long size) +{ + struct vb2_dmaheap_buf *buf; + struct dma_buf_attachment *dba; + + if (dbuf->size < size) + return ERR_PTR(-EFAULT); + + if (WARN_ON(!dev)) + return ERR_PTR(-EINVAL); + /* + * TODO: A better way to check whether the buffer is coming + * from this heap or this heap could accept this buffer + */ + if (strcmp(dbuf->exp_name, dev_name(dev))) + return ERR_PTR(-EINVAL); + + buf = kzalloc(sizeof(*buf), GFP_KERNEL); + if (!buf) + return ERR_PTR(-ENOMEM); + + buf->dev = dev; + /* create attachment for the dmabuf with the user device */ + dba = dma_buf_attach(dbuf, buf->dev); + if (IS_ERR(dba)) { + pr_err("failed to attach dmabuf\n"); + kfree(buf); + return dba; + } + + buf->dma_dir = vb->vb2_queue->dma_dir; + buf->size = size; + buf->db_attach = dba; + + return buf; +} + +const struct vb2_mem_ops vb2_dmaheap_memops = { + .alloc = vb2_dmaheap_alloc, + .put = vb2_dmaheap_put, + .get_dmabuf = vb2_dmaheap_get_dmabuf, + .cookie = vb2_dmaheap_cookie, + .vaddr = vb2_dmaheap_vaddr, + .prepare = vb2_dmaheap_prepare, + .finish = vb2_dmaheap_finish, + .map_dmabuf = vb2_dmaheap_map_dmabuf, + .unmap_dmabuf = vb2_dmaheap_unmap_dmabuf, + .attach_dmabuf = vb2_dmaheap_attach_dmabuf, + .detach_dmabuf = vb2_dmaheap_detach_dmabuf, + .num_users = vb2_dmaheap_num_users, + .mmap = vb2_dmaheap_mmap, +}; + +MODULE_DESCRIPTION("DMA-Heap memory handling routines for videobuf2"); +MODULE_AUTHOR("Randy Li <ayaka(a)soulik.info>"); +MODULE_LICENSE("GPL"); +MODULE_IMPORT_NS(DMA_BUF); diff --git a/include/media/videobuf2-dma-heap.h b/include/media/videobuf2-dma-heap.h new file mode 100644 index 000000000000..fa057f67d6e9 --- /dev/null +++ b/include/media/videobuf2-dma-heap.h @@ -0,0 +1,30 @@ +/* + * Copyright (C) 2022 Randy Li <ayaka(a)soulik.info> + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#ifndef _MEDIA_VIDEOBUF2_DMA_HEAP_H +#define _MEDIA_VIDEOBUF2_DMA_HEAP_H + +#include <media/videobuf2-v4l2.h> +#include <linux/dma-mapping.h> + +static inline dma_addr_t +vb2_dmaheap_plane_dma_addr(struct vb2_buffer *vb, unsigned int plane_no) +{ + dma_addr_t *addr = vb2_plane_cookie(vb, plane_no); + + return *addr; +} + +extern const struct vb2_mem_ops vb2_dmaheap_memops; +#endif -- 2.17.1

2 years, 11 months

[PATCH v3 0/6] drm/i915: reduce TLB performance regressions

by Mauro Carvalho Chehab

Doing TLB invalidation cause performance regressions, like: [424.370996] i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms! As reported at: https://gitlab.freedesktop.org/drm/intel/-/issues/6424 as this is an expensive operation. So, reduce the need of it by: - checking if the engine is awake; - checking if the engine is not wedged; - batching operations. Additionally, add a workaround for a known hardware issue on some GPUs. In order to double-check that this series won't be introducing any regressions, I used this new IGT test: https://patchwork.freedesktop.org/patch/495684/?series=106757&rev=1 Checking the results for 3 different patchsets, on Broadwell: 1) On the top of drm-tip (2022y-07m-14d-08h-35m-36) - e. g. with TLB invalidation and serialization patches: $ sudo build/tests/gem_exec_tlb|grep Subtest Subtest close-clear: SUCCESS (10.490s) Subtest madv-clear: SUCCESS (10.484s) Subtest u-unmap-clear: SUCCESS (10.527s) Subtest u-shrink-clear: SUCCESS (10.506s) Subtest close-dumb: SUCCESS (10.165s) Subtest madv-dumb: SUCCESS (10.177s) Subtest u-unmap-dumb: SUCCESS (10.172s) Subtest u-shrink-dumb: SUCCESS (10.172s) 2) With the new version of the batch TLB invalidation patches from this series: $ sudo build/tests/gem_exec_tlb|grep Subtest Subtest close-clear: SUCCESS (10.483s) Subtest madv-clear: SUCCESS (10.495s) Subtest u-unmap-clear: SUCCESS (10.545s) Subtest u-shrink-clear: SUCCESS (10.508s) Subtest close-dumb: SUCCESS (10.172s) Subtest madv-dumb: SUCCESS (10.169s) Subtest u-unmap-dumb: SUCCESS (10.174s) Subtest u-shrink-dumb: SUCCESS (10.176s) 3) Changing the TLB invalidation routine to do nothing[1]: $ sudo ~/freedesktop-igt/build/tests/gem_exec_tlb|grep Subtest (gem_exec_tlb:1958) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384: (gem_exec_tlb:1958) CRITICAL: Failed assertion: !sq (gem_exec_tlb:1958) CRITICAL: Found deadbeef in a new (clear) buffer after 3 tries! (gem_exec_tlb:1956) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384: (gem_exec_tlb:1956) CRITICAL: Failed assertion: !sq (gem_exec_tlb:1956) CRITICAL: Found deadbeef in a new (clear) buffer after 89 tries! (gem_exec_tlb:1957) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384: (gem_exec_tlb:1957) CRITICAL: Failed assertion: !sq (gem_exec_tlb:1957) CRITICAL: Found deadbeef in a new (clear) buffer after 256 tries! (gem_exec_tlb:1960) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384: (gem_exec_tlb:1960) CRITICAL: Failed assertion: !sq (gem_exec_tlb:1960) CRITICAL: Found deadbeef in a new (clear) buffer after 845 tries! (gem_exec_tlb:1961) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384: (gem_exec_tlb:1961) CRITICAL: Failed assertion: !sq (gem_exec_tlb:1961) CRITICAL: Found deadbeef in a new (clear) buffer after 1138 tries! (gem_exec_tlb:1954) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384: (gem_exec_tlb:1954) CRITICAL: Failed assertion: !sq (gem_exec_tlb:1954) CRITICAL: Found deadbeef in a new (clear) buffer after 1359 tries! (gem_exec_tlb:1955) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384: (gem_exec_tlb:1955) CRITICAL: Failed assertion: !sq (gem_exec_tlb:1955) CRITICAL: Found deadbeef in a new (clear) buffer after 1794 tries! (gem_exec_tlb:1959) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384: (gem_exec_tlb:1959) CRITICAL: Failed assertion: !sq (gem_exec_tlb:1959) CRITICAL: Found deadbeef in a new (clear) buffer after 2139 tries! Dynamic subtest smem0 failed. **** DEBUG **** (gem_exec_tlb:1944) DEBUG: 2M hole:200000 contains poison:6b6b6b6b (gem_exec_tlb:1944) DEBUG: Running writer for 200000 at 300000 on bcs0 (gem_exec_tlb:1944) DEBUG: Closing hole:200000 on rcs0, sample:deadbeef (gem_exec_tlb:1944) DEBUG: Rechecking hole:200000, sample:6b6b6b6b **** END **** Subtest close-clear: FAIL (10.434s) Subtest madv-clear: SUCCESS (10.479s) Subtest u-unmap-clear: SUCCESS (10.512s) In summary, the test does properly detect fail when TLB cache invalidation doesn't happen, as shown at result (3). It also shows that both current drm-tip and drm-tip with this series applied don't have TLB invalidation cache issues. [1] I applied this patch on the top of drm-tip: diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index 68c2b0d8f187..0aefcd7be5e9 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -930,0 +931,3 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt) + // HACK: don't do TLB invalidations!!! + return; + Regards, Mauro Chris Wilson (4): drm/i915/gt: Ignore TLB invalidations on idle engines drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations drm/i915/gt: Skip TLB invalidations once wedged drm/i915/gt: Batch TLB invalidations Mauro Carvalho Chehab (2): drm/i915/gt: document with_intel_gt_pm_if_awake() drm/i915/gt: describe the new tlb parameter at i915_vma_resource .../gpu/drm/i915/gem/i915_gem_object_types.h | 3 +- drivers/gpu/drm/i915/gem/i915_gem_pages.c | 25 +++--- drivers/gpu/drm/i915/gt/intel_gt.c | 77 +++++++++++++++---- drivers/gpu/drm/i915/gt/intel_gt.h | 12 ++- drivers/gpu/drm/i915/gt/intel_gt_pm.h | 11 +++ drivers/gpu/drm/i915/gt/intel_gt_types.h | 18 ++++- drivers/gpu/drm/i915/gt/intel_ppgtt.c | 8 +- drivers/gpu/drm/i915/i915_vma.c | 33 ++++++-- drivers/gpu/drm/i915/i915_vma.h | 1 + drivers/gpu/drm/i915/i915_vma_resource.c | 9 ++- drivers/gpu/drm/i915/i915_vma_resource.h | 6 +- 11 files changed, 163 insertions(+), 40 deletions(-) -- 2.36.1

2 years, 11 months

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Linaro-mm-sig July 2022