Linaro-mm-sig July 2025

linaro-mm-sig@lists.linaro.org

20 participants
66 discussions

Re: DMA-BUFs always uncached on arm64, causing poor camera performance on Librem 5

by Laurent Pinchart

On Thu, Jul 10, 2025 at 10:49:19AM +0200, Pavel Machek wrote: > Hi! > > > > memcpy() from normal memory is about 2msec/1MB. Unfortunately, for > > > DMA-BUFs it is 20msec/1MB, and that basically means I can't easily do > > > 760p video recording. Plus, copying full-resolution photo buffer takes > > > more than 200msec! > > > > > > There's possibility to do some processing on GPU, and its implemented here: > > > > > > https://gitlab.com/tui/tui/-/tree/master/icam?ref_type=heads > > > > > > but that hits the same problem in the end -- data is in DMA-BUF, > > > uncached, and takes way too long to copy out. > > > > > > And that's ... wrong. DMA ended seconds ago, complete cache flush > > > would be way cheaper than copying single frame out, and I still have > > > to deal with uncached frames. > > > > > > So I have two questions: > > > > > > 1) Is my analysis correct that, no matter how I get frame from v4l and > > > process it on GPU, I'll have to copy it from uncached memory in the > > > end? > > > > If you need to touch the buffers using the CPU then you are either > > stuck with uncached memory or you need to implement bracketed access to > > do the necessary cache maintenance. Be aware that completely flushing > > the cache is not really an option, as that would impact other > > workloads, so you have to flush the cache by walking the virtual > > address space of the buffer, which may take a significant amount of CPU > > time. > > What kind of "significant amount of CPU time" are we talking here? > Millisecond? It really depends on the platform, the type of cache, and the size of the buffer. I remember that back in the N900 days a selective cash clean of a large buffer for full resolution images took several dozens of milliseconds, possibly close to 100ms. We had to clean the whole D-cache to make it fast enough, but you can't always do that as Lucas mentioned. > Bracketed access is fine with me. > > Flushing a cache should be an option. I'm root, there's no other > significant workload, and copying out the buffer takes 200msec+. There > are lot of cache flushes that can be done in quarter a second! > > > However, if you are only going to use the buffer with the GPU I see no > > reason to touch it from the CPU side. Why would you even need to copy > > the content? After all dma-bufs are meant to enable zero-copy between > > DMA capable accelerators. You can simply import the V4L2 buffer into a > > GL texture using EGL_EXT_image_dma_buf_import. Using this path you > > don't need to bother with the cache at all, as the GPU will directly > > read the video buffers from RAM. > > Yes, so GPU will read video buffer from RAM, then debayer it, and then > what? Then I need to store a data into raw file, or use CPU to turn it > into JPEG file, or maybe run video encoder on it. That are all tasks > that are done on CPU... -- Regards, Laurent Pinchart

1 week, 3 days

Re: DMA-BUFs always uncached on arm64, causing poor camera performance on Librem 5

by Nicolas Dufresne

Hi Pavel, Le jeudi 10 juillet 2025 à 10:24 +0200, Pavel Machek a écrit : > Hi! > > It seems that DMA-BUFs are always uncached on arm64... which is a > problem. > > I'm trying to get useful camera support on Librem 5, and that includes > recording vidos (and taking photos). > > memcpy() from normal memory is about 2msec/1MB. Unfortunately, for > DMA-BUFs it is 20msec/1MB, and that basically means I can't easily do > 760p video recording. Plus, copying full-resolution photo buffer takes > more than 200msec! > > There's possibility to do some processing on GPU, and its implemented here: > > https://gitlab.com/tui/tui/-/tree/master/icam?ref_type=heads > > but that hits the same problem in the end -- data is in DMA-BUF, > uncached, and takes way too long to copy out. > > And that's ... wrong. DMA ended seconds ago, complete cache flush > would be way cheaper than copying single frame out, and I still have > to deal with uncached frames. > > So I have two questions: > > 1) Is my analysis correct that, no matter how I get frame from v4l and > process it on GPU, I'll have to copy it from uncached memory in the > end? > > 2) Does anyone have patches / ideas / roadmap how to solve that? It > makes GPU unusable for computing, and camera basically unusable for > video. If CPU access is strictly required for your use case, the way forward is to implement V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINT in the capture driver. Very little drivers enable that. Once your driver have that capability, you will be able to set V4L2_MEMORY_FLAG_NON_COHERENT while doing REQBUFS or CREATE_BUFS ioctl. That gives you allocation with CPU cache working, but you'll get the invalidation (or flush) overhead by default. When capture data have not been read by CPU, you can always queue it back with the V4L2_BUF_FLAG_NO_CACHE_INVALIDATE. But for your use case, it seems that you want the invalidation to take place, otherwise your software will endup reading old cache data instead of the next frame data. Please note that the integration in the DMABuf SYNC ioctl was missing for a while, so make sure you have recent enough kernel or get ready for backports. The feature itself was commonly used with CPU only access, notably on ChromeOS using libyuv. No DMABuf was involved initially. regards, Nicolas [0] https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/vidioc-reqbu… > > Best regards, > Pavel

1 week, 4 days

[PATCH v2] Documentation: dma-buf: heaps: Add naming guidelines

by Maxime Ripard

We've discussed a number of times of how some heap names are bad, but not really what makes a good heap name. Let's document what we expect the heap names to look like. Reviewed-by: Bagas Sanjaya <bagasdotme(a)gmail.com> Signed-off-by: Maxime Ripard <mripard(a)kernel.org> --- Changes in v2: - Added justifications for each requirement / suggestions - Added a mention and example of buffer attributes - Link to v1: https://lore.kernel.org/r/20250520-dma-buf-heap-names-doc-v1-1-ab31f74809ee… --- Documentation/userspace-api/dma-buf-heaps.rst | 38 +++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) diff --git a/Documentation/userspace-api/dma-buf-heaps.rst b/Documentation/userspace-api/dma-buf-heaps.rst index 535f49047ce6450796bf4380c989e109355efc05..835ad1c3a65bc07b6f41d387d85c57162909e859 100644 --- a/Documentation/userspace-api/dma-buf-heaps.rst +++ b/Documentation/userspace-api/dma-buf-heaps.rst @@ -21,5 +21,43 @@ following heaps: usually created either through the kernel commandline through the `cma` parameter, a memory region Device-Tree node with the `linux,cma-default` property set, or through the `CMA_SIZE_MBYTES` or `CMA_SIZE_PERCENTAGE` Kconfig options. Depending on the platform, it might be called ``reserved``, ``linux,cma``, or ``default-pool``. + +Naming Convention +================= + +``dma-buf`` heaps name should meet a number of constraints: + +- That name must be stable, and must not change from one version to the + other. Userspace identifies heaps by their name, so if the names ever + changes, we would be likely to introduce regressions. + +- That name must describe the memory region the heap will allocate from, + and must uniquely identify it in a given platform. Since userspace + applications use the heap name as the discriminant, it must be able to + tell which heap it wants to use reliably if there's multiple heaps. + +- That name must not mention implementation details, such as the + allocator. The heap driver will change over time, and implementation + details when it was introduced might not be relevant in the future. + +- The name should describe properties of the buffers that would be + allocated. Doing so will make heap identification easier for + userspace. Such properties are: + + - ``cacheable`` / ``uncacheable`` for buffers with CPU caches enabled + or disabled; + + - ``contiguous`` for physically contiguous buffers; + + - ``protected`` for encrypted buffers not accessible the OS; + +- The name may describe intended usage. Doing so will make heap + identification easier for userspace applications and users. + +For example, assuming a platform with a reserved memory region located +at the RAM address 0x42000000, intended to allocate video framebuffers, +physically contiguous, and backed by the CMA kernel allocator. Good +names would be ``memory@42000000-cacheable-contiguous`` or +``video@42000000``, but ``cma-video`` wouldn't. --- base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494 change-id: 20250520-dma-buf-heap-names-doc-31261aa0cfe6 Best regards, -- Maxime Ripard <mripard(a)kernel.org>

1 week, 4 days

Re: DMA-BUFs always uncached on arm64, causing poor camera performance on Librem 5

by Lucas Stach

Hi Pavel, Am Donnerstag, dem 10.07.2025 um 10:24 +0200 schrieb Pavel Machek: > Hi! > > It seems that DMA-BUFs are always uncached on arm64... which is a > problem. > > I'm trying to get useful camera support on Librem 5, and that includes > recording vidos (and taking photos). > > memcpy() from normal memory is about 2msec/1MB. Unfortunately, for > DMA-BUFs it is 20msec/1MB, and that basically means I can't easily do > 760p video recording. Plus, copying full-resolution photo buffer takes > more than 200msec! > > There's possibility to do some processing on GPU, and its implemented here: > > https://gitlab.com/tui/tui/-/tree/master/icam?ref_type=heads > > but that hits the same problem in the end -- data is in DMA-BUF, > uncached, and takes way too long to copy out. > > And that's ... wrong. DMA ended seconds ago, complete cache flush > would be way cheaper than copying single frame out, and I still have > to deal with uncached frames. > > So I have two questions: > > 1) Is my analysis correct that, no matter how I get frame from v4l and > process it on GPU, I'll have to copy it from uncached memory in the > end? If you need to touch the buffers using the CPU then you are either stuck with uncached memory or you need to implement bracketed access to do the necessary cache maintenance. Be aware that completely flushing the cache is not really an option, as that would impact other workloads, so you have to flush the cache by walking the virtual address space of the buffer, which may take a significant amount of CPU time. However, if you are only going to use the buffer with the GPU I see no reason to touch it from the CPU side. Why would you even need to copy the content? After all dma-bufs are meant to enable zero-copy between DMA capable accelerators. You can simply import the V4L2 buffer into a GL texture using EGL_EXT_image_dma_buf_import. Using this path you don't need to bother with the cache at all, as the GPU will directly read the video buffers from RAM. Regards, Lucas > > 2) Does anyone have patches / ideas / roadmap how to solve that? It > makes GPU unusable for computing, and camera basically unusable for > video. > > Best regards, > Pavel

1 week, 4 days

Re: [PATCH 2/3] drm: tiny: Add support for Mayqueen Pixpaper e-ink panel

by kernel test robot

Hi LiangCheng, kernel test robot noticed the following build warnings: [auto build test WARNING on d7b8f8e20813f0179d8ef519541a3527e7661d3a] url: https://github.com/intel-lab-lkp/linux/commits/LiangCheng-Wang/dt-bindings-… base: d7b8f8e20813f0179d8ef519541a3527e7661d3a patch link: https://lore.kernel.org/r/20250708-drm-v1-2-45055fdadc8a%40gmail.com patch subject: [PATCH 2/3] drm: tiny: Add support for Mayqueen Pixpaper e-ink panel config: sparc-randconfig-r112-20250709 (https://download.01.org/0day-ci/archive/20250709/202507092231.FtZkMync-lkp@…) compiler: sparc64-linux-gcc (GCC) 14.3.0 reproduce: (https://download.01.org/0day-ci/archive/20250709/202507092231.FtZkMync-lkp@…) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp(a)intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202507092231.FtZkMync-lkp@intel.com/ sparse warnings: (new ones prefixed by >>) >> drivers/gpu/drm/tiny/pixpaper.c:85:10: sparse: sparse: Initializer entry defined twice drivers/gpu/drm/tiny/pixpaper.c:86:9: sparse: also defined here drivers/gpu/drm/tiny/pixpaper.c:601:10: sparse: sparse: Initializer entry defined twice drivers/gpu/drm/tiny/pixpaper.c:606:10: sparse: also defined here vim +85 drivers/gpu/drm/tiny/pixpaper.c 80 81 static const struct drm_plane_funcs pixpaper_plane_funcs = { 82 .update_plane = drm_atomic_helper_update_plane, 83 .disable_plane = drm_atomic_helper_disable_plane, 84 .destroy = drm_plane_cleanup, > 85 .reset = drm_atomic_helper_plane_reset, 86 DRM_GEM_SHADOW_PLANE_FUNCS, 87 }; 88 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki

1 week, 5 days

[PATCH] dma_buf/sync_file: Enable signaling for fences when querying status

by Mikko Perttunen

From: Mikko Perttunen <mperttunen(a)nvidia.com> dma_fence_get_status is not guaranteed to return valid information on if the fence has been signaled or not if SW signaling has not been enabled for the fence. To ensure valid information is reported, enable SW signaling for fences before getting their status. Signed-off-by: Mikko Perttunen <mperttunen(a)nvidia.com> --- drivers/dma-buf/sync_file.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c index 747e377fb95417ddd506b528618a4288bea9d459..a6fd1d14dde155561b9fd2c07e6aa20dc9863a8d 100644 --- a/drivers/dma-buf/sync_file.c +++ b/drivers/dma-buf/sync_file.c @@ -271,6 +271,8 @@ static int sync_fill_fence_info(struct dma_fence *fence, const char __rcu *timeline; const char __rcu *driver; + dma_fence_enable_sw_signaling(fence); + rcu_read_lock(); driver = dma_fence_driver_name(fence); @@ -320,6 +322,7 @@ static long sync_file_ioctl_fence_info(struct sync_file *sync_file, * info->num_fences. */ if (!info.num_fences) { + dma_fence_enable_sw_signaling(sync_file->fence); info.status = dma_fence_get_status(sync_file->fence); goto no_fences; } else { --- base-commit: 58ba80c4740212c29a1cf9b48f588e60a7612209 change-id: 20250708-syncfile-enable-signaling-a993acff1860

1 week, 5 days

Re: [PATCH 3/3] dt-bindings: display: Add Mayqueen Pixpaper e-ink panel

by Rob Herring (Arm)

On Tue, 08 Jul 2025 18:06:46 +0800, LiangCheng Wang wrote: > The binding is for the Mayqueen Pixpaper e-ink display panel, > controlled via an SPI interface. > > Signed-off-by: LiangCheng Wang <zaq14760(a)gmail.com> > --- > .../bindings/display/mayqueen,pixpaper.yaml | 63 ++++++++++++++++++++++ > 1 file changed, 63 insertions(+) > This should be patch 2. Bindings come before users of them. Reviewed-by: Rob Herring (Arm) <robh(a)kernel.org>

1 week, 5 days

Re: [PATCH 1/3] dt-bindings: vendor-prefixes: Add Mayqueen name

by Rob Herring (Arm)

On Tue, 08 Jul 2025 18:06:44 +0800, LiangCheng Wang wrote: > From: Wig Cheng <onlywig(a)gmail.com> > > Mayqueen is a Taiwan-based company primarily focused on the development > of arm64 development boards and e-paper displays. > > Signed-off-by: Wig Cheng <onlywig(a)gmail.com> > --- > Documentation/devicetree/bindings/vendor-prefixes.yaml | 2 ++ > 1 file changed, 2 insertions(+) > Acked-by: Rob Herring (Arm) <robh(a)kernel.org>

1 week, 5 days

Re: [RFC 00/12] io_uring dmabuf read/write support

by Christoph Hellwig

On Mon, Jul 07, 2025 at 04:41:23PM +0100, Pavel Begunkov wrote: > > I mean a reference the actual dma_buf (probably indirect through the file > > * for it, but listen to the dma_buf experts for that and not me). > > My expectation is that io_uring would pass struct dma_buf to the io_uring isn't the only user. We've already had one other use case coming up for pre-load of media files in mobile very recently. It's also a really good interface for P2P transfers of any kind. > file during registration, so that it can do a bunch of work upfront, > but iterators will carry sth already pre-attached and pre dma mapped, > probably in a file specific format hiding details for multi-device > support, and possibly bundled with the dma-buf pointer if necessary. > (All modulo move notify which I need to look into first). I'd expect that the exported passed around the dma_buf, and something that has access to it then imports it to the file. This could be directly forwarded to the device for the initial scrope in your series where you only support it for block device files. Now we have two variants: 1) the file instance returns a cookie for the registration that the caller has to pass into every read/write 2) the file instance tracks said cookie itself and matches it on every read/write 1) sounds faster, 2) has more sanity checking and could prevent things from going wrong. (all this is based on my limited dma_buf understanding, corrections always welcome). > > > But maybe that's fine. It's 40B -> 48B, > > > > Alternatively we could the union point to a struct that has the dma buf > > pointer and a variable length array of dma_segs. Not sure if that would > > create a mess in the callers, though. > > Iteration helpers adjust the pointer, so either it needs to store > the pointer directly in iter or keep the current index. It could rely > solely on offsets, but that'll be a mess with nested loops (where the > inner one would walk some kind of sg table). Yeah. Maybe just keep is as a separate pointer growing the structure and see if anyone screams.

1 week, 6 days

Re: [PATCH v3] drm/framebuffer: Acquire internal references on GEM handles

by Thomas Zimmermann

Hi Am 07.07.25 um 18:14 schrieb Satadru Pramanik: > Applying this patch to 6.16-rc5 resolves the sleep issue regression > from 6.16-rc4 I was having on my MacBookPro11,3 (Mid-2014 15" > MacBookPro), which has the NVIDIA GK107M GPU enabled via the Nouveau > driver. Thanks for testing. I think the sleep regression was just a side effect of the broken reference counting. Best regards Thomas > > Many thanks, > > Satadru > > On Mon, Jul 7, 2025 at 9:33 AM Thomas Zimmermann <tzimmermann(a)suse.de> > wrote: > > Hi > > Am 07.07.25 um 15:21 schrieb Christian König: > > >> > >> +#define DRM_FRAMEBUFFER_HAS_HANDLE_REF(_i) BIT(0u + (_i)) > > Why the "0u + (_i)" here? An macro trick? > > You mean why not just BIT(_i)? internal_flags could possibly contain > additional flags. Just using BIT(_i) would make it look as if it's > only > for those handle refs. > > Best regards > Thomas > > > > > Regards, > > Christian. > > > >> + > >> /** > >> * struct drm_framebuffer - frame buffer object > >> * > >> @@ -188,6 +191,10 @@ struct drm_framebuffer { > >> * DRM_MODE_FB_MODIFIERS. > >> */ > >> int flags; > >> + /** > >> + * @internal_flags: Framebuffer flags like > DRM_FRAMEBUFFER_HAS_HANDLE_REF. > >> + */ > >> + unsigned int internal_flags; > >> /** > >> * @filp_head: Placed on &drm_file.fbs, protected by > &drm_file.fbs_lock. > >> */ > > -- > -- > Thomas Zimmermann > Graphics Driver Developer > SUSE Software Solutions Germany GmbH > Frankenstrasse 146, 90461 Nuernberg, Germany > GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman > HRB 36809 (AG Nuernberg) > -- -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Frankenstrasse 146, 90461 Nuernberg, Germany GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman HRB 36809 (AG Nuernberg)

1 week, 6 days

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Linaro-mm-sig July 2025