Hi Pavel,
On Thursday, 10 July 2025 at 10:24 +0200, Pavel Machek wrote:
Hi!
It seems that DMA-BUFs are always uncached on arm64... which is a problem.
I'm trying to get useful camera support on Librem 5, and that includes recording videos (and taking photos).
memcpy() from normal memory is about 2msec/1MB. Unfortunately, for DMA-BUFs it is 20msec/1MB, and that basically means I can't easily do 760p video recording. Plus, copying full-resolution photo buffer takes more than 200msec!
There's a possibility to do some processing on the GPU, and it's implemented here:
https://gitlab.com/tui/tui/-/tree/master/icam?ref_type=heads
but that hits the same problem in the end -- data is in DMA-BUF, uncached, and takes way too long to copy out.
And that's ... wrong. DMA ended seconds ago, complete cache flush would be way cheaper than copying single frame out, and I still have to deal with uncached frames.
So I have two questions:
- Is my analysis correct that, no matter how I get frame from v4l and
process it on GPU, I'll have to copy it from uncached memory in the end?
- Does anyone have patches / ideas / roadmap how to solve that? It
makes GPU unusable for computing, and camera basically unusable for video.
If CPU access is strictly required for your use case, the way forward is to implement V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS in the capture driver. Very few drivers enable that.
Once your driver has that capability, you will be able to set V4L2_MEMORY_FLAG_NON_COHERENT when calling the REQBUFS or CREATE_BUFS ioctl. That gives you an allocation with the CPU cache enabled, but you'll get the invalidation (or flush) overhead by default. When captured data has not been read by the CPU, you can always queue the buffer back with V4L2_BUF_FLAG_NO_CACHE_INVALIDATE. But for your use case, it seems that you want the invalidation to take place, otherwise your software will end up reading stale cached data instead of the next frame's data.
Please note that the integration with the DMABuf SYNC ioctl was missing for a while, so make sure you have a recent enough kernel or get ready for backports. The feature itself was commonly used with CPU-only access, notably on ChromeOS using libyuv. No DMABuf was involved initially.
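For the DMABuf side, CPU reads of an exported, mmap()ed buffer would be bracketed with DMA_BUF_IOCTL_SYNC, roughly as below. Again only a sketch: dmabuf_sync and copy_frame_from_dmabuf are illustrative names, and I am assuming the buffer was exported (for example with VIDIOC_EXPBUF) and mapped by the caller beforehand.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Hypothetical helper: issue one DMA_BUF_IOCTL_SYNC call, retrying if the
 * ioctl is interrupted. Returns 0 on success or -errno on failure. */
static int dmabuf_sync(int dmabuf_fd, __u64 flags)
{
    struct dma_buf_sync sync;
    int ret;

    memset(&sync, 0, sizeof(sync));
    sync.flags = flags;
    do {
        ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret < 0 && (errno == EINTR || errno == EAGAIN));

    return ret < 0 ? -errno : 0;
}

/* Hypothetical helper: copy one frame out of a mapped DMA-BUF. The
 * START/END pair lets the kernel do the cache maintenance needed for the
 * CPU to see what the device wrote. */
static int copy_frame_from_dmabuf(int dmabuf_fd, const void *mapped,
                                  void *dst, size_t len)
{
    int ret = dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ);
    if (ret < 0)
        return ret;

    memcpy(dst, mapped, len);

    return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
}
```

Whether the SYNC ioctl actually triggers the invalidation for a V4L2-exported buffer depends on the kernel version, which is the reason for the backport caveat above.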
regards,
Nicolas
[0] https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/vidioc-reqbuf...
Best regards, Pavel