On Fri, Sep 12, 2014 at 03:23:22PM +0200, Christian König wrote:
> Hello everyone,
>
> to allow concurrent buffer access by different engines beyond the multiple readers/single writer model that we currently use in radeon and other drivers, we need some kind of synchronization object exposed to userspace.
>
> My initial patch set for this used (or rather abused) zero-sized GEM buffers as fence handles. This obviously isn't the best way of doing it (too much overhead, rather ugly, etc.); Jerome commented on this accordingly.
>
> So what should a driver expose instead? Android sync points? Something else?
I think actually exposing the struct fence objects as an fd, using Android syncpts (or at least something compatible with them), is the way to go. Problem is that it's super-hard to get the Android guys out of hiding for this :(
Adding a bunch of people in the hopes that something sticks. -Daniel
On Fri, Sep 12, 2014 at 4:09 PM, Daniel Vetter daniel@ffwll.ch wrote:
> Adding a bunch of people in the hopes that something sticks.
More people. -Daniel
On Fri, Sep 12, 2014 at 04:43:44PM +0200, Daniel Vetter wrote:
> I think actually exposing the struct fence objects as a fd, using android syncpts (or at least something compatible to it) is the way to go.
Just to re-iterate: exposing such a thing while still using a command stream ioctl that does implicit synchronization is a waste, and you can only get the lowest common denominator, which is implicit synchronization. So I do not see the point of such an API if you are not also adding a new CS ioctl with an explicit contract that it does not do any kind of synchronization (it could be almost the exact same code, modulo not waiting for previous commands to complete).
Also, one thing that the Android sync point does not have, AFAICT, is a way to schedule synchronization as part of a CS ioctl, so the CPU never has to be involved for command streams that deal with only one GPU (assuming the driver and hw can do such a trick).
Cheers, Jérôme
Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Fri, Sep 12, 2014 at 10:50:49AM -0400, Jerome Glisse wrote:
> Just to re-iterate, exposing such thing while still using command stream ioctl that use implicit synchronization is a waste and you can only get the lowest common denominator which is implicit synchronization. [...]
I don't think we should categorically exclude this, since without some partial implicit/explicit world we'll never convert over to fences. Of course adding fences without any way to at least partially forgo the implicit syncing is pointless. But that might be some other user (e.g. a camera capture device) which needs explicit fences.
> Also one thing that the Android sync point does not have, AFAICT, is a way to schedule synchronization as part of a cs ioctl so cpu never have to be involve for cmd stream that deal only one gpu (assuming the driver and hw can do such trick).
You need to integrate the android stuff with your (new) CS ioctl, with an input parameter for the fence fd to wait on before executing the CS, and one that gets created to signal when it's all done.
Same goes for all the other places android wants sync objects, e.g. for synchronization before atomic flips and for signalling completion of the same. -Daniel
On Fri, Sep 12, 2014 at 10:50 AM, Jerome Glisse j.glisse@gmail.com wrote:
> Just to re-iterate, exposing such thing while still using command stream ioctl that use implicit synchronization is a waste and you can only get the lowest common denominator which is implicit synchronization. [...]
Our thinking was to allow explicit sync from a single process, but implicitly sync between processes.
Alex
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
On Fri, Sep 12, 2014 at 11:25:12AM -0400, Alex Deucher wrote:
> Our thinking was to allow explicit sync from a single process, but implicitly sync between processes.
This is a BIG NAK if you are using the same ioctl, as it would mean you are changing the userspace API, or at least userspace expectations. Adding a new CS flag might do the trick, but it should not be about inter-process or anything special; it's just implicit sync or no synchronization. Converting userspace is not that much of a big deal either; it can be broken into several steps, e.g. mesa using explicit synchronization all the time while the ddx uses implicit.
Cheers, Jérôme
On Fri, Sep 12, 2014 at 11:33 AM, Jerome Glisse j.glisse@gmail.com wrote:
> This is a BIG NAK if you are using the same ioctl as it would mean you are changing userspace API, well at least userspace expectation. Adding a new cs flag might do the trick but it should not be about inter-process, or any thing special, it's just implicit sync or no synchronization. [...]
Right, you'd have to explicitly ask for it to avoid breaking old userspace. My point was just that within a single process, it's quite easy to know exactly what you are doing and handle the synchronization yourself, while for inter-process there is an assumed implicit sync.
Alex