Hi Rob,
Yes, sorry we've been a bit slack progressing KDS publicly. Your approach looks interesting and seems like it could enable both implicit and explicit synchronization. A good compromise.
From: Rob Clark <rob@ti.com>
> A dma-fence can be attached to a buffer which is being filled or consumed by hw, to allow userspace to pass the buffer without waiting to another device. For example, userspace can call page_flip ioctl to display the next frame of graphics after kicking the GPU but while the GPU is still rendering. The display device sharing the buffer with the GPU would attach a callback to get notified when the GPU's rendering-complete IRQ fires, to update the scan-out address of the display, without having to wake up userspace.
> A dma-fence is a transient, one-shot deal. It is allocated and attached to a dma-buf's list of fences. When the one that attached it is done with the pending operation, it can signal the fence, removing it from the dma-buf's list of fences:
> - dma_buf_attach_fence()
> - dma_fence_signal()
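To check my understanding of the flow, something like the below? (gpu_kick(), display_set_scanout_address() and the gpu_job/display structs are made-up driver details, and I'm guessing at dma_fence_create() / dma_fence_add_callback() style calls to go alongside the two above.)

/* Producer (GPU driver): publish a fence before kicking the hardware,
 * then signal it from the rendering-complete IRQ. */
static void gpu_submit(struct gpu_job *job, struct dma_buf *buf)
{
	job->fence = dma_fence_create();
	dma_buf_attach_fence(buf, job->fence);
	gpu_kick(job);
}

static irqreturn_t gpu_irq(int irq, void *data)
{
	struct gpu_job *job = data;

	/* Runs any attached callbacks and drops the fence from the
	 * dma-buf's list of fences. */
	dma_fence_signal(job->fence);
	return IRQ_HANDLED;
}

/* Consumer (display driver): flip to the new scan-out address once the
 * GPU is finished, without waking userspace. */
static void flip_cb(struct dma_fence *fence, void *data)
{
	struct display *disp = data;

	display_set_scanout_address(disp, disp->pending_scanout);
}

static void display_queue_flip(struct display *disp, struct dma_fence *fence)
{
	dma_fence_add_callback(fence, flip_cb, disp);
}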
It would be useful to have two lists of fences: those around writes to the buffer and those around reads. The idea being that if you only want to read from a buffer, you don't need to wait for fences around other read operations; you only need to wait for the "last" writer fence. If you do want to write to the buffer, however, you need to wait for all the read fences and the last writer fence.

The use-case is when EGL swap behaviour is EGL_BUFFER_PRESERVED. You have the display controller reading the buffer, with its fence defined to be signalled when it is no longer scanning out that buffer. It can only stop scanning out that buffer when it is given another buffer to scan out. If that next buffer must be rendered by copying the currently scanned-out buffer into it (one possible option for implementing EGL_BUFFER_PRESERVED), then you essentially deadlock if the scan-out job blocks the "render the next frame" job: that render is exactly what would let scan-out move on to a new buffer.
There are probably variations of this idea; perhaps you only need a flag to indicate whether a fence is around a read-only or read-write access?
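To make the two-list version concrete, here's a rough sketch of how a new job would compute its dependencies. All the names here (struct dma_buf_sync, add_dep(), the fence's 'node' member) are invented for illustration, not taken from your RFC:

struct dma_buf_sync {
	struct list_head read_fences;	/* fences of in-flight readers */
	struct dma_fence *last_write;	/* fence of the last writer, or NULL */
};

/*
 * Collect the fences a new job must wait on before touching the buffer:
 * a reader only orders against the last writer, while a writer orders
 * against the last writer and every in-flight reader.
 */
static void collect_deps(struct dma_buf_sync *sync, bool write,
			 struct list_head *deps)
{
	struct dma_fence *f;

	if (sync->last_write)
		add_dep(deps, sync->last_write);

	if (!write)
		return;

	list_for_each_entry(f, &sync->read_fences, node)
		add_dep(deps, f);
}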
> The intention is to provide a userspace interface (presumably via eventfd) later, to be used in conjunction with dma-buf's mmap support for sw access to buffers (or for userspace apps that would prefer to do their own synchronization).
From our experience with our own KDS, we've come up with an interesting approach to synchronizing userspace applications which have a buffer mmap'd. We wanted to avoid userspace being able to block jobs running on hardware, while still allowing userspace to participate. Our original idea was to have a lock/unlock ioctl interface on a dma_buf, but with a timeout whereby the application's lock would be broken if held for too long. That at least bounded how long userspace could block hardware from making progress, though it was pretty "harsh".
The approach we have now settled on is to instead only allow an application to wait for all jobs currently pending on a buffer. So there's no way userspace can prevent anything else from using a buffer, other than by not issuing jobs which will use that buffer. Also, the interface we settled on was to add a poll handler to dma_buf; that way userspace can select() on multiple dma_buf fds in one syscall. It can also choose whether it wants to wait for only the last writer fence, i.e. wait until it can read (POLLIN), or for all fences because it wants to write to the buffer (POLLOUT). We kinda like this, but it does restrict the utility a little. An idea worth considering anyway.
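To illustrate the userspace side (this assumes the poll semantics described above; I've used poll() rather than select() just for brevity, and fd_a/fd_b stand in for dma_buf fds handed out by the exporting driver):

#include <poll.h>
#include <stdio.h>

int wait_read_a_write_b(int fd_a, int fd_b)
{
	struct pollfd fds[2] = {
		{ .fd = fd_a, .events = POLLIN  },	/* last writer fence */
		{ .fd = fd_b, .events = POLLOUT },	/* all fences */
	};

	/* One syscall waits on both buffers. */
	if (poll(fds, 2, -1) < 0) {
		perror("poll");
		return -1;
	}

	if (fds[0].revents & POLLIN)
		printf("buffer A safe to read\n");
	if (fds[1].revents & POLLOUT)
		printf("buffer B safe to write\n");
	return 0;
}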
My other thought is around atomicity. Could this be extended to (safely) allow hardware devices which want to access multiple buffers simultaneously? I think it probably can, with some tweaks to the interface: an atomic function which does something like "give me all the fences for all these buffers, and add this fence to each instead/as-well-as".
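As a very rough sketch of what I mean (dma_bufs_attach_fence() and collect_fences() are names I've just made up, and I'm assuming a per-buffer lock; bufs[] would need to be sorted into a global lock order first so two concurrent multi-buffer jobs can't ABBA-deadlock each other):

static void dma_bufs_attach_fence(struct dma_buf **bufs, unsigned int n,
				  struct dma_fence *fence,
				  struct list_head *deps)
{
	unsigned int i;

	/* Take every buffer's lock (bufs[] already in global lock order). */
	for (i = 0; i < n; i++)
		mutex_lock(&bufs[i]->lock);

	/*
	 * With all buffers held, snapshot the fences the job must wait for
	 * and publish the new fence on each buffer, so no other job can
	 * slip in between buffers.
	 */
	for (i = 0; i < n; i++) {
		collect_fences(bufs[i], deps);
		dma_buf_attach_fence(bufs[i], fence);
	}

	/* Drop the locks in reverse order. */
	for (i = n; i-- > 0; )
		mutex_unlock(&bufs[i]->lock);
}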
Cheers,
Tom