On Wed, Jun 19, 2024 at 1:24 AM Nicolas Dufresne nicolas@ndufresne.ca wrote:
Le mardi 18 juin 2024 à 16:47 +0900, Tomasz Figa a écrit :
Hi TaoJiang,
On Tue, Jun 18, 2024 at 4:30 PM TaoJiang tao.jiang_2@nxp.com wrote:
From: Ming Qian ming.qian@nxp.com
When the memory type is VB2_MEMORY_DMABUF, the v4l2 device can't know whether the dma buffer is coherent or synchronized.
The videobuf2-core will skip cache syncs as it think the DMA exporter should take care of cache syncs
But in fact it's likely that the client doesn't synchronize the dma buf before qbuf() or after dqbuf(). and it's difficult to find this type of error directly.
I think it's helpful that videobuf2-core can call dma_buf_end_cpu_access() and dma_buf_begin_cpu_access() to handle the cache syncs.
Signed-off-by: Ming Qian ming.qian@nxp.com Signed-off-by: TaoJiang tao.jiang_2@nxp.com
.../media/common/videobuf2/videobuf2-core.c | 22 +++++++++++++++++++ 1 file changed, 22 insertions(+)
Sorry, that patch is incorrect. I believe you're misunderstanding the way DMA-buf buffers should be managed in the userspace. It's the userspace responsibility to call the DMA_BUF_IOCTL_SYNC ioctl [1] to signal start and end of CPU access to the kernel and imply necessary cache synchronization.
[1] https://docs.kernel.org/driver-api/dma-buf.html#dma-buffer-ioctls
So, really sorry, but it's a NAK.
This patch *could* make sense if it was inside UVC Driver as an example, as this driver can import dmabuf, to CPU memcpy, and does omits the required sync calls (unless that got added recently, I can easily have missed it).
Yeah, currently V4L2 drivers don't call the in-kernel dma_buf_{begin,end}_cpu_access() when they need to access the buffers from the CPU, while my quick grep [1] reveals that we have 68 files retrieving plane vaddr by calling vb2_plane_vaddr() (not necessarily a 100% guarantee of CPU access being done, but rather likely so).
I also repeated the same thing with VB2_DMABUF [2] and tried to attribute both lists to specific drivers (by retaining the path until the first - or _ [3]; which seemed to be relatively accurate), leading to the following drivers that claim support for DMABUF while also retrieving plane vaddr (without proper synchronization - no drivers currently call any begin/end CPU access):
i2c/video pci/bt8xx/bttv pci/cobalt/cobalt pci/cx18/cx18 pci/tw5864/tw5864 pci/tw686x/tw686x platform/allegro platform/amphion/vpu platform/chips platform/intel/pxa platform/marvell/mcam platform/mediatek/jpeg/mtk platform/mediatek/vcodec/decoder/mtk platform/mediatek/vcodec/encoder/mtk platform/nuvoton/npcm platform/nvidia/tegra platform/nxp/imx platform/renesas/rcar platform/renesas/vsp1/vsp1 platform/rockchip/rkisp1/rkisp1 platform/samsung/exynos4 platform/samsung/s5p platform/st/sti/delta/delta platform/st/sti/hva/hva platform/verisilicon/hantro usb/au0828/au0828 usb/cx231xx/cx231xx usb/dvb usb/em28xx/em28xx usb/gspca/gspca.c usb/hackrf/hackrf.c usb/stk1160/stk1160 usb/uvc/uvc
which means we potentially have ~30 drivers which likely don't handle imported DMABUFs correctly (there is still a chance that DMABUF is advertised for one queue, while vaddr is used for another).
I think we have two options: 1) add vb2_{begin/end}_cpu_access() helpers, carefully audit each driver and add calls to those 2) take a heavy gun approach and just call vb2_begin_cpu_access() whenever vb2_plane_vaddr() is called and then vb2_end_cpu_access() whenever vb2_buffer_done() is called (if begin was called before).
The latter has the disadvantage of drivers not having control over the timing of the cache sync, so could end up with less than optimal performance. Also there could be some more complex cases, where the driver needs to mix DMA and CPU accesses to the buffer, so the fixed sequence just wouldn't work for them. (But then they just wouldn't work today either.)
Hans, Marek, do you have any thoughts? (I'd personally just go with 2 and if any driver in the future needs something else, they could call begin/end CPU access manually.)
[1] git grep vb2_plane_vaddr | cut -d":" -f 1 | sort | uniq [2] git grep VB2_DMABUF | cut -d":" -f 1 | sort | uniq [3] by running [1] and [2] through | cut -d"-" -f 1 | cut -d"_" -f 1 | uniq
Best, Tomasz
But generally speaking, bracketing all driver with CPU access synchronization does not make sense indeed, so I second the rejection.
Nicolas
Best regards, Tomasz
diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c index 358f1fe42975..4734ff9cf3ce 100644 --- a/drivers/media/common/videobuf2/videobuf2-core.c +++ b/drivers/media/common/videobuf2/videobuf2-core.c @@ -340,6 +340,17 @@ static void __vb2_buf_mem_prepare(struct vb2_buffer *vb) vb->synced = 1; for (plane = 0; plane < vb->num_planes; ++plane) call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
if (vb->memory != VB2_MEMORY_DMABUF)
return;
for (plane = 0; plane < vb->num_planes; ++plane) {
struct dma_buf *dbuf = vb->planes[plane].dbuf;
if (!dbuf)
continue;
dma_buf_end_cpu_access(dbuf, vb->vb2_queue->dma_dir);
}
}
/* @@ -356,6 +367,17 @@ static void __vb2_buf_mem_finish(struct vb2_buffer *vb) vb->synced = 0; for (plane = 0; plane < vb->num_planes; ++plane) call_void_memop(vb, finish, vb->planes[plane].mem_priv);
if (vb->memory != VB2_MEMORY_DMABUF)
return;
for (plane = 0; plane < vb->num_planes; ++plane) {
struct dma_buf *dbuf = vb->planes[plane].dbuf;
if (!dbuf)
continue;
dma_buf_begin_cpu_access(dbuf, vb->vb2_queue->dma_dir);
}
}
/*
2.43.0-rc1