On Sun, Sep 14, 2014 at 12:36:43PM +0200, Christian König wrote:
Yeah, right. Providing the fd to reassign to a fence would indeed reduce the create/close overhead.
But it would still be more overhead than, for example, a simple on-demand growing ring buffer which uses 64-bit sequence numbers in userspace to refer to a fence in the kernel.
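Roughly what that alternative could look like (just a sketch with made-up names, not an actual driver interface): userspace keeps an on-demand growing buffer of 64-bit sequence numbers, one per submission, and treats a fence as signaled once the kernel-reported "last completed" sequence number has passed it.

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct seq_ring {
	uint64_t *seqs;  /* sequence numbers of in-flight submissions */
	size_t    size;  /* current capacity, doubled when full */
	size_t    count; /* number of entries in use */
};

/* Remember the sequence number of a new submission, growing on demand. */
static int seq_ring_add(struct seq_ring *r, uint64_t seq)
{
	if (r->count == r->size) {
		size_t new_size = r->size ? r->size * 2 : 64;
		uint64_t *tmp = realloc(r->seqs, new_size * sizeof(*tmp));
		if (!tmp)
			return -1;
		r->seqs = tmp;
		r->size = new_size;
	}
	r->seqs[r->count++] = seq;
	return 0;
}

/* A fence is signaled once the last sequence number the kernel reports as
 * completed has reached it (64 bits, so wraparound is a non-issue). */
static bool seq_signaled(uint64_t seq, uint64_t last_completed)
{
	return last_completed >= seq;
}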
Apart from that, I'm pretty sure that when we do the syncing completely in userspace we'd need more fences open at the same time than there are fds available by default.
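For reference, the default per-process fd limit is typically 1024 (the RLIMIT_NOFILE soft limit), which is easy to exhaust if every fence is an fd. Something like this shows the limit and raises the soft limit up to the hard limit:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl))
		return 1;
	printf("fd limit: soft %llu, hard %llu\n",
	       (unsigned long long)rl.rlim_cur,
	       (unsigned long long)rl.rlim_max);

	rl.rlim_cur = rl.rlim_max;  /* raise soft limit to the hard limit */
	return setrlimit(RLIMIT_NOFILE, &rl) ? 1 : 0;
}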
If you do the syncing completely in userspace you don't need kernel fences at all. Kernel fences are only required if you sync with a different process (where the pure userspace syncing might not work out) or with different devices.
tbh I don't see any use-case at all where you'd need 10k such fences. That means your driver gets to deal with 2 kinds of fences, but so be it. Not using fds for cross-device or cross-process syncing imo just doesn't make sense, so that one pretty much will have to stick.
As long as our internal handle- or sequence-based fences are easily convertible to a fence fd, I don't really see a problem with that. I'm going to hack that approach into my prototype and then we can see how bad the code looks after all.
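The conversion could be as simple as one ioctl that takes the driver-internal sequence number and hands back a fence fd; the ioctl name and struct below are entirely made up, just to illustrate the shape of it:

#include <stdint.h>
#include <sys/ioctl.h>

struct xyz_seq_to_fd {
	uint64_t seq;  /* in: driver-internal sequence number */
	int32_t  fd;   /* out: fence fd usable for cross-process syncing */
	uint32_t pad;
};

#define DRM_IOCTL_XYZ_SEQ_TO_FD _IOWR('d', 0x40, struct xyz_seq_to_fd)

/* Convert an internal sequence-number fence into a fence fd (hypothetical). */
static int seq_to_fence_fd(int drm_fd, uint64_t seq)
{
	struct xyz_seq_to_fd args = { .seq = seq, .fd = -1 };

	if (ioctl(drm_fd, DRM_IOCTL_XYZ_SEQ_TO_FD, &args))
		return -1;
	return args.fd;
}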
My plan for i915 is to start out with fd fences only, and once we have some clarity on the exact requirements probably add some pure userspace-controlled fences for tightly coupled stuff. Those might be fully internal to the opencl userspace driver though and never get out of there, ever. -Daniel