On Thu, Jun 7, 2012 at 4:35 AM, Tom Cooksey <tom.cooksey@arm.com> wrote:
The alternative is to not associate sync objects with buffers and have them be distinct entities, exposed to userspace. This gives userspace more power and flexibility and might allow for use-cases which an implicit synchronization mechanism can't satisfy - I'd be curious to know any specifics here.
Time and time again we've had problems with implicit synchronization resulting in bugs where different drivers play by slightly different implicit rules. We're convinced the best way to attack this problem is to move as much of the command and control of synchronization as possible into a single piece of code (the compositor in our case.) To facilitate this we're going to be mandating this explicit approach in the K release of Android.
However, every driver which needs to participate in the synchronization mechanism will need to have its interface with userspace modified to allow the sync objects to be passed to the drivers. This seemed like a lot of work to me, which is why I prefer the implicit approach. However I don't actually know what work is needed and think it should be explored. I.e. How much work is it to add explicit sync object support to the DRM & v4l2 interfaces?
E.g. I believe DRM/GEM's job dispatch API is "in-order", in which case it might be easy to just add "wait for this fence" and "signal this fence" ioctls. Seems like vmwgfx already has something similar to this? Could this work instead of having to specify a list of sync objects to wait on and another list of sync objects to signal for every operation (exec buf/page flip)? What about for v4l2?
If I understand you right, a job submission with explicit sync would become 3 submissions:
1) submit wait for pre-req fence job
2) submit render job
3) submit signal ready fence job
Does DRM provide a way to ensure these 3 jobs are submitted atomically? I also expect GPU vendors would like to get clever about GPU to GPU fence dependencies. That could probably be handled entirely in the userspace GL driver.
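To make the three-submission sequence concrete, here is a rough userspace model in Python. It is only a sketch: `Fence`, `InOrderQueue` and the `submit_*` methods are names made up for illustration and do not correspond to any existing DRM ioctl. It assumes a strictly in-order queue with a single submitter, so the atomicity question above does not arise in it.

```python
from collections import deque


class Fence:
    """Toy one-shot fence: starts unsignaled and can be signaled."""

    def __init__(self, name):
        self.name = name
        self.signaled = False

    def signal(self):
        self.signaled = True


class InOrderQueue:
    """Hypothetical in-order job queue (illustration only, not a DRM API)."""

    def __init__(self):
        self.jobs = deque()
        self.log = []  # records what has executed, in order

    def submit_wait(self, fence):
        self.jobs.append(("wait", fence))

    def submit_render(self, label):
        self.jobs.append(("render", label))

    def submit_signal(self, fence):
        self.jobs.append(("signal", fence))

    def run(self):
        """Execute jobs from the head of the queue.

        Returns False if the head job is a wait on an unsignaled fence
        (everything behind it stays queued), True once the queue is empty.
        """
        while self.jobs:
            kind, arg = self.jobs[0]
            if kind == "wait":
                if not arg.signaled:
                    return False  # head-of-line blocked on the pre-req fence
                self.log.append(f"wait:{arg.name}")
            elif kind == "render":
                self.log.append(f"render:{arg}")
            else:  # "signal"
                arg.signal()
                self.log.append(f"signal:{arg.name}")
            self.jobs.popleft()
        return True
```

Submitting wait, render and signal in that order and calling `run()` before the pre-req fence has signaled leaves all three jobs queued; once the fence signals, a second `run()` executes them in order and signals the "ready" fence. With more than one submitter, another client's job could land between the three, which is where the atomicity question comes in.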
I guess my other thought is that implicit vs explicit is not mutually exclusive, though I'd guess there'd be interesting deadlocks to have to debug if both were in use _at the same time_. :-)
I think this is an approach worth investigating. I'd like a way to either opt out of implicit sync or have a way to check if a dma-buf has an attached fence and detach it. Actually, that could work really well. Consider:
* Each dma_buf has a single fence "slot"
* on submission
   * the driver will extract the fence from the dma_buf and queue a wait on it.
   * the driver will replace that fence with its own completion fence before the job submission ioctl returns.
* dma_buf will have two userspace ioctls:
   * DETACH: will return the fence as an FD to userspace and clear the fence slot in the dma_buf
   * ATTACH: takes a fence FD from userspace and attaches it to the dma_buf fence slot. Returns an error if the fence slot is non-empty.
In the Android case, we can do a detach after every submission and an attach right before.
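As a rough illustration of the fence-slot idea, here is a toy model in Python. Everything in it is hypothetical: `DmaBuf`, `Driver`, `attach` and `detach` are stand-ins for the proposed behaviour, not existing kernel structures or ioctls, and plain object references take the place of fence FDs.

```python
class Fence:
    """Toy fence; only its identity matters in this model."""

    def __init__(self, name):
        self.name = name


class DmaBuf:
    """Toy buffer with a single fence slot (not the kernel's dma_buf)."""

    def __init__(self):
        self.fence = None

    def detach(self):
        """DETACH: hand the fence (or None) back to the caller and clear the slot."""
        fence, self.fence = self.fence, None
        return fence

    def attach(self, fence):
        """ATTACH: put a fence into the slot; error if the slot is non-empty."""
        if self.fence is not None:
            # Stand-in for whatever error the real ioctl would return.
            raise RuntimeError("fence slot is not empty")
        self.fence = fence


class Driver:
    """Hypothetical driver following the submission rules listed above."""

    def __init__(self, name):
        self.name = name
        self.count = 0
        self.submitted = []  # (job, fence the job waits on)

    def submit(self, buf, job):
        wait_on = buf.fence                  # extract the fence and queue a wait on it
        self.count += 1
        completion = Fence(f"{self.name}-{self.count}")
        buf.fence = completion               # replace it with our completion fence
        self.submitted.append((job, wait_on))
        return completion
```

In this model the Android flow is: the compositor calls `detach()` after each submission, so it owns the fence and the slot is empty, and calls `attach()` right before the next submission, so the consuming driver waits on exactly the fence the compositor chose. A driver that never sees detach/attach calls keeps behaving implicitly.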
-Erik