The bigger issue is the previous point about how to deal with cases where the CPU doesn't really need to get involved as an intermediary.
CPU fallback access to the buffer is the only legit case where we need a standardized API to userspace (since CPU access isn't already associated w/ some other kernel device file where some extra ioctl can be added).
The CPU case will still need to wait on an arbitrarily backed sync primitive. It shouldn't need to know if it's backed by the GPU, camera, or DSP.
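To make the "arbitrarily backed" point concrete, here is a minimal userspace sketch, assuming a hypothetical ops-table design: `struct sync_ops`, `struct sync_obj`, `cpu_wait()` and both backends are made-up names, not an existing kernel API. The CPU-side wait goes through a function pointer, so the same code handles a GPU-style sequence-number fence and a camera-style frame-done flag without knowing which one it has.

```c
#include <stdbool.h>

/* Hypothetical backend ops: each producer (GPU, camera, DSP...) says how to
 * test whether its work has completed. All names are illustrative only. */
struct sync_ops {
    const char *backend;
    bool (*is_signaled)(const void *priv);
};

/* Generic sync object: a waiter only sees the ops pointer and an opaque
 * private pointer, never the concrete backend type. */
struct sync_obj {
    const struct sync_ops *ops;
    const void *priv;
};

/* GPU-style backend: signaled once the hardware sequence number has
 * reached the value this fence was created for. */
struct gpu_fence { const unsigned *hw_seqno; unsigned target; };

static bool gpu_is_signaled(const void *priv)
{
    const struct gpu_fence *f = priv;
    return *f->hw_seqno >= f->target;
}

/* Camera-style backend: signaled once the frame-done flag is set. */
struct cam_fence { const bool *frame_done; };

static bool cam_is_signaled(const void *priv)
{
    const struct cam_fence *f = priv;
    return *f->frame_done;
}

static const struct sync_ops gpu_ops = { "gpu", gpu_is_signaled };
static const struct sync_ops cam_ops = { "camera", cam_is_signaled };

/* CPU-side wait: one code path for every backend. This sketch just polls
 * up to max_polls times; a real implementation would sleep on a wait queue. */
static bool cpu_wait(const struct sync_obj *s, int max_polls)
{
    for (int i = 0; i < max_polls; i++)
        if (s->ops->is_signaled(s->priv))
            return true;
    return false;
}
```

Whether such an object lives behind its own anon inode or hangs off the dma_buf doesn't change this part; the waiter only ever sees the generic interface.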
Right, this is the one place we definitely need something... some userspace code would just get passed a dmabuf file descriptor and want to mmap it and do something, without really knowing where it came from. I *guess* we'll have to add some ioctls to the dmabuf fd.
I personally favor having sync primitives have their own anon inode vs. strictly coupling them with dma_buf.
I think this is really the crux of the matter: do we associate sync objects with buffers or not? The approach ARM are suggesting _is_ to associate the sync objects with the buffer, by adding kds_resource* as a member of struct dma_buf. The main reason I want to do this is that it doesn't require changes to existing interfaces, specifically DRM/KMS & v4l2. These user/kernel interfaces already allow userspace to specify the handle of a buffer the driver should perform an operation on. What dma_buf has done is allow those driver-specific buffer handles to be exported from one driver and imported into another. While new ioctls have been added to the v4l2 & DRM interfaces for dma_buf, they have only been to allow the import & export of driver-specific buffer objects. Once imported as a driver-specific buffer object, existing ioctls are re-used to perform operations on those buffers (at least this is what PRIME does for DRM; I'm not so sure about v4l2). But my point is that no new "page flip to this dma_buf fd" ioctl has been added to KMS: you use the existing drm_mode_crtc_page_flip and specify an fb_id which has been imported from a dma_buf.
If we associate sync objects with buffers, none of those device specific ioctls which perform operations on buffer objects need to be modified. It's just that internally, those drivers use kds or something similar to make sure they don't tread on each other's toes.
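A rough model of the implicit approach, assuming a single-threaded userspace simulation: `struct shared_buf` stands in for struct dma_buf and `struct sync_resource` for something like the embedded kds_resource, but both, along with the acquire/release helpers, are hypothetical names and not the actual KDS API. The point is the signatures: `gpu_render()` and `display_flip()` take only the buffer, just as the existing ioctls take a buffer handle or fb_id, and all sync handling stays inside the drivers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-buffer sync resource, simplified to an "in use" flag
 * plus the name of the current owner. */
struct sync_resource {
    bool busy;
    const char *owner;
};

/* Stand-in for a shared buffer; only the embedded resource matters here. */
struct shared_buf {
    int id;
    struct sync_resource res;
};

enum op_status { OP_OK, OP_DEFERRED };

/* Driver-internal helpers: userspace never sees these. */
static bool res_try_acquire(struct sync_resource *r, const char *who)
{
    if (r->busy)
        return false;
    r->busy = true;
    r->owner = who;
    return true;
}

static void res_release(struct sync_resource *r)
{
    r->busy = false;
    r->owner = NULL;
}

/* "Exec buffer"-style entry point: takes only the buffer. On OP_OK the
 * GPU job now owns the buffer until gpu_job_complete() is called. */
static enum op_status gpu_render(struct shared_buf *b)
{
    return res_try_acquire(&b->res, "gpu") ? OP_OK : OP_DEFERRED;
}

/* Called from the GPU's completion path in this model. */
static void gpu_job_complete(struct shared_buf *b)
{
    res_release(&b->res);
}

/* Page-flip-style entry point: same buffer-only signature. The driver
 * reports a deferral while another device still owns the buffer. */
static enum op_status display_flip(struct shared_buf *b)
{
    if (!res_try_acquire(&b->res, "display"))
        return OP_DEFERRED;
    /* ...scan out the buffer here; this model releases immediately... */
    res_release(&b->res);
    return OP_OK;
}
```

In a real driver the deferred flip would be queued and kicked from the GPU completion path rather than reported back to the caller; the sketch only shows that the userspace-facing calls need no extra sync arguments.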
The alternative is to not associate sync objects with buffers and have them be distinct entities, exposed to userspace. This gives userspace more power and flexibility and might allow for use-cases which an implicit synchronization mechanism can't satisfy; I'd be curious to know any specifics here. However, every driver which needs to participate in the synchronization mechanism will need to have its interface with userspace modified to allow the sync objects to be passed to the drivers. This seemed like a lot of work to me, which is why I prefer the implicit approach. However, I don't actually know what work is needed and think it should be explored, i.e. how much work is it to add explicit sync object support to the DRM & v4l2 interfaces?
E.g. I believe DRM/GEM's job dispatch API is "in-order", in which case it might be easy to just add "wait for this fence" and "signal this fence" ioctls. It seems like vmwgfx already has something similar to this? Could this work instead of having to specify a list of sync objects to wait on and another list of sync objects to signal for every operation (exec buf/page flip)? What about for v4l2?
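For the explicit alternative, a sketch of the "in-order queue plus wait/signal commands" idea, assuming a hypothetical command format: `struct fence`, `CMD_WAIT`, `CMD_SIGNAL`, `CMD_JOB` and `queue_run()` are illustrative names, not taken from DRM, GEM or vmwgfx. Because commands retire strictly in submission order, a single WAIT gates every later command, so no per-operation wait/signal lists are needed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical explicit fence, exposed to userspace as its own object. */
struct fence { bool signaled; };

enum cmd_type { CMD_WAIT, CMD_SIGNAL, CMD_JOB };

struct cmd {
    enum cmd_type type;
    struct fence *fence; /* used by CMD_WAIT and CMD_SIGNAL */
    int *job_counter;    /* used by CMD_JOB: incremented when the job "runs" */
};

/* Fixed-size in-order queue: commands retire in submission order. */
struct queue {
    struct cmd cmds[16];
    size_t head, tail;
};

static bool queue_push(struct queue *q, struct cmd c)
{
    if (q->tail == sizeof q->cmds / sizeof q->cmds[0])
        return false; /* queue full */
    q->cmds[q->tail++] = c;
    return true;
}

/* Retire commands until the queue is empty or a WAIT hits an unsignaled
 * fence; everything after that WAIT stalls with it. Returns the number of
 * commands retired by this call. */
static size_t queue_run(struct queue *q)
{
    size_t retired = 0;
    while (q->head < q->tail) {
        struct cmd *c = &q->cmds[q->head];
        if (c->type == CMD_WAIT && !c->fence->signaled)
            break;
        if (c->type == CMD_SIGNAL)
            c->fence->signaled = true;
        if (c->type == CMD_JOB)
            (*c->job_counter)++;
        q->head++;
        retired++;
    }
    return retired;
}
```

For example, a GPU queue holding WAIT(camera fence), JOB, SIGNAL(render fence) retires nothing until a camera queue runs its own JOB followed by SIGNAL(camera fence), after which all three GPU commands retire in one call.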
I guess my other thought is that implicit vs explicit is not mutually exclusive, though I'd guess there'd be interesting deadlocks to have to debug if both were in use _at the same time_. :-)
Cheers,
Tom