Tom, Is there more planned for KDS? It seems to be lacking some important features to be useful across many SoCs and graphics cards and features needed by Android. Here's some general feedback on those gaps.
There is no way to share information between a buffer provider and a buffer consumer. This is important for architectures such as Tegra which have several hardware blocks that share common hardware synchronization.
There's no userspace API. There are several reasons this is necessary. First, some userspace code (such as GL libs) might need to get at the private data of the sync primitive in order to generate command lists for a piece of hardware. Second is does not let userspace have control or even visibility into the graphics pipeline. The direction we are moving in Android is to put more control over synchronization into the compositor and move it out of being implemented "behind the scenes" by every vendor. Third, there's no way for a userspace process to wait on a sync primitive.
There's no debugging or timing information tracked with the sync primitives. During development on new platforms and new OS versions we often have cases where the graphics pipeline stops making forward progress because one of the pieces (GPU, display, camera, dsp, userspace) has, itself, stopped making forward progress. Finding the root cause of the often hard to reproduce cases is difficult when you have to instrument every single driver.
It's unclear how you would attach a dependency on a EGL fence to a dma_buf. Maybe this would be an EGL extension where you pass in the fence and the dma_buf.
At Android we've been working on our own approach to this problem. I'll post those patches for discussion.
Cheers, Erik
On Wed, Jun 6, 2012 at 7:39 AM, Erik Gilling konkers@android.com wrote:
Tom, Is there more planned for KDS? It seems to be lacking some important features to be useful across many SoCs and graphics cards and features needed by Android. Here's some general feedback on those gaps.
fwiw, features can always be added over time.. at this point I think the most important thing is to come up with something that doesn't paint us into a corner wrt. features that we'd like to add later (such as the hw sync'ing you mention below)
There is no way to share information between a buffer provider and a buffer consumer. This is important for architectures such as Tegra which have several hardware blocks that share common hardware synchronization.
yeah, this is a limit. I think probably a simpler fence type scheme might be easier to deal w/ for hw synchronization.
There's no userspace API. There are several reasons this is necessary. First, some userspace code (such as GL libs) might need to get at the private data of the sync primitive in order to generate command lists for a piece of hardware.
well, to be fair this doesn't need to be a userspace API.. or rather it doesn't need to be a standard API added to some new chrdev, since your GL lib is already got it's own interface to the kernel which it could query whatever it needs about a fence / syncobj / syncpoint / whatever. The bigger issue is the previous point about how to deal with cases where the CPU doesn't really need to get involved as an intermediary.
CPU fallback access to the buffer is the only legit case where we need a standardized API to userspace (since CPU access isn't already associated w/ some other kernel device file where some extra ioctl can be added)
BR, -R
Second is does not let userspace have control or even visibility into the graphics pipeline. The direction we are moving in Android is to put more control over synchronization into the compositor and move it out of being implemented "behind the scenes" by every vendor. Third, there's no way for a userspace process to wait on a sync primitive.
There's no debugging or timing information tracked with the sync primitives. During development on new platforms and new OS versions we often have cases where the graphics pipeline stops making forward progress because one of the pieces (GPU, display, camera, dsp, userspace) has, itself, stopped making forward progress. Finding the root cause of the often hard to reproduce cases is difficult when you have to instrument every single driver.
It's unclear how you would attach a dependency on a EGL fence to a dma_buf. Maybe this would be an EGL extension where you pass in the fence and the dma_buf.
At Android we've been working on our own approach to this problem. I'll post those patches for discussion.
Cheers, Erik
Linaro-mm-sig mailing list Linaro-mm-sig@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-mm-sig
On Wed, Jun 6, 2012 at 12:17 PM, Clark, Rob rob@ti.com wrote:
well, to be fair this doesn't need to be a userspace API.. or rather it doesn't need to be a standard API added to some new chrdev, since your GL lib is already got it's own interface to the kernel which it could query whatever it needs about a fence / syncobj / syncpoint / whatever.
If the sync primitives are backed by file descriptors (or attached to dma_bufs) you don't need an extra char dev either.
The bigger issue is the previous point about how to deal with cases where the CPU doesn't really need to get involved as an intermediary.
CPU fallback access to the buffer is the only legit case where we need a standardized API to userspace (since CPU access isn't already associated w/ some other kernel device file where some extra ioctl can be added)
The CPU case will still need to wait on an arbitrarily backed sync primitive. It shouldn't need to know if it's backed by the gpu, camera, or dsp.
Having a standard API is also really useful for debugging and performance tracing. Having timestamps of when each piece signals in the pipeline of getting a frame on the screen helps track down bottlenecks.
-Erik
On Thu, Jun 7, 2012 at 4:05 AM, Erik Gilling konkers@android.com wrote:
On Wed, Jun 6, 2012 at 12:17 PM, Clark, Rob rob@ti.com wrote:
well, to be fair this doesn't need to be a userspace API.. or rather it doesn't need to be a standard API added to some new chrdev, since your GL lib is already got it's own interface to the kernel which it could query whatever it needs about a fence / syncobj / syncpoint / whatever.
If the sync primitives are backed by file descriptors (or attached to dma_bufs) you don't need an extra char dev either.
Sure.. and just fwiw, I'm partially playing devil's advocate here (because whatever kernel<->user APIs get added, we have to maintain compatibility for a long time.. so best to start w/ fewer interfaces to userspace when possible, since it is always easier to add new interfaces than to take one away).
And btw, why not eventfd? (Ok, I looked at possibly an earlier version of syncpoints and only just briefly at what you sent to the list earlier today, so if that has changed pls disregard)
The bigger issue is the previous point about how to deal with cases where the CPU doesn't really need to get involved as an intermediary.
CPU fallback access to the buffer is the only legit case where we need a standardized API to userspace (since CPU access isn't already associated w/ some other kernel device file where some extra ioctl can be added)
The CPU case will still need to wait on an arbitrarily backed sync primitive. It shouldn't need to know if it's backed by the gpu, camera, or dsp.
Right, this is the one place we definitely need something.. some userspace code would just get passed a dmabuf file descriptor and want to mmap it and do something, without really knowing where it came from. I *guess* we'll have to add some ioctl's to the dmabuf fd. Although I suppose it doesn't have to be part of the first patchset, it would be easy enough to add later if we had to.
BR, -R
Having a standard API is also really useful for debugging and performance tracing. Having timestamps of when each piece signals in the pipeline of getting a frame on the screen helps track down bottlenecks.
-Erik
On Wed, Jun 6, 2012 at 2:14 PM, Clark, Rob rob@ti.com wrote:
If the sync primitives are backed by file descriptors (or attached to dma_bufs) you don't need an extra char dev either.
Sure.. and just fwiw, I'm partially playing devil's advocate here (because whatever kernel<->user APIs get added, we have to maintain compatibility for a long time.. so best to start w/ fewer interfaces to userspace when possible, since it is always easier to add new interfaces than to take one away).
That has to be balanced with having enough functionality to be useful in a real system or it won't get used.
And btw, why not eventfd? (Ok, I looked at possibly an earlier version of syncpoints and only just briefly at what you sent to the list earlier today, so if that has changed pls disregard)
I haven't looked at eventfd before and just did a quick read-through of it. The only place I could see using it is as the userspace notification mechanism which is a pretty small part of the whole equation. I'd be interested to hear if you think it could be used in a broader way.
The bigger issue is the previous point about how to deal with cases where the CPU doesn't really need to get involved as an intermediary.
CPU fallback access to the buffer is the only legit case where we need a standardized API to userspace (since CPU access isn't already associated w/ some other kernel device file where some extra ioctl can be added)
The CPU case will still need to wait on an arbitrarily backed sync primitive. It shouldn't need to know if it's backed by the gpu, camera, or dsp.
Right, this is the one place we definitely need something.. some userspace code would just get passed a dmabuf file descriptor and want to mmap it and do something, without really knowing where it came from. I *guess* we'll have to add some ioctl's to the dmabuf fd.
I personally favor having sync primitives have their own anon inode vs. strictly coupling them with dma_buf.
Although I suppose it doesn't have to be part of the first patchset, it would be easy enough to add later if we had to.
This is certainly a case where, without this functionality, we won't use what ever the solution is.
Cheers, Erik
The bigger issue is the previous point about how to deal with cases where the CPU doesn't really need to get involved as an intermediary.
CPU fallback access to the buffer is the only legit case where we need a standardized API to userspace (since CPU access isn't already associated w/ some other kernel device file where some extra ioctl can be added)
The CPU case will still need to wait on an arbitrarily backed sync primitive. It shouldn't need to know if it's backed by the gpu, camera, or dsp.
Right, this is the one place we definitely need something.. some userspace code would just get passed a dmabuf file descriptor and want to mmap it and do something, without really knowing where it came from. I *guess* we'll have to add some ioctl's to the dmabuf fd.
I personally favor having sync primitives have their own anon inode vs. strictly coupling them with dma_buf.
I think this is really the crux of the matter - do we associate sync objects with buffers or not. The approach ARM are suggesting _is_ to associate the sync objects with the buffer and do this by adding kds_resource* as a member of struct dma_buf. The main reason I want to do this is because it doesn't require changes to existing interfaces. Specifically, DRM/KMS & v4l2. These user/kernel interfaces already allow userspace to specify the handle of a buffer the driver should perform an operation on. What dma_buf has done is allowed those driver-specific buffer handles to be exported from one driver and imported into another. While new ioctls have been added to the v4l2 & DRM interfaces for dma_buf, they have only been to allow the import & export of driver-specific buffer objects. Once imported as a driver specific buffer object, existing ioctls are re-used to perform operations on those buffers (at least this is what PRIME does for DRM, I'm not so sure about v4l2?). But my point is that no new "page flip to this dma_buf fd" ioctl has been added to KMS, you use the existing drm_mode_crtc_page_flip and specify an fb_id which has been imported from a dma_buf.
If we associate sync objects with buffers, none of those device specific ioctls which perform operations on buffer objects need to be modified. It's just that internally, those drivers use kds or something similar to make sure they don't tread on each other's toes.
The alternate is to not associate sync objects with buffers and have them be distinct entities, exposed to userspace. This gives userpsace more power and flexibility and might allow for use-cases which an implicit synchronization mechanism can't satisfy - I'd be curious to know any specifics here. However, every driver which needs to participate in the synchronization mechanism will need to have its interface with userspace modified to allow the sync objects to be passed to the drivers. This seemed like a lot of work to me, which is why I prefer the implicit approach. However I don't actually know what work is needed and think it should be explored. I.e. How much work is it to add explicit sync object support to the DRM & v4l2 interfaces?
E.g. I believe DRM/GEM's job dispatch API is "in-order" in which case it might be easy to just add "wait for this fence" and "signal this fence" ioctls. Seems like vmwgfx already has something similar to this already? Could this work over having to specify a list of sync objects to wait on and another list of sync objects to signal for every operation (exec buf/page flip)? What about for v4l2?
I guess my other thought is that implicit vs explicit is not mutually exclusive, though I'd guess there'd be interesting deadlocks to have to debug if both were in use _at the same time_. :-)
Cheers,
Tom
On Thu, Jun 7, 2012 at 8:52 AM, Erik Gilling konkers@android.com wrote:
On Wed, Jun 6, 2012 at 2:14 PM, Clark, Rob rob@ti.com wrote:
If the sync primitives are backed by file descriptors (or attached to dma_bufs) you don't need an extra char dev either.
Sure.. and just fwiw, I'm partially playing devil's advocate here (because whatever kernel<->user APIs get added, we have to maintain compatibility for a long time.. so best to start w/ fewer interfaces to userspace when possible, since it is always easier to add new interfaces than to take one away).
That has to be balanced with having enough functionality to be useful in a real system or it won't get used.
sure, and I don't expect android to sit around and wait multiple merge window cycles for things to get to a point of having enough functionality to be used, but android doesn't really have the same backwards compatibility guarantees
And btw, why not eventfd? (Ok, I looked at possibly an earlier version of syncpoints and only just briefly at what you sent to the list earlier today, so if that has changed pls disregard)
I haven't looked at eventfd before and just did a quick read-through of it. The only place I could see using it is as the userspace notification mechanism which is a pretty small part of the whole equation. I'd be interested to hear if you think it could be used in a broader way.
I was just thinking for the notification to userspace part.. so it is only a small part but seems like it could be useful for that small part
The bigger issue is the previous point about how to deal with cases where the CPU doesn't really need to get involved as an intermediary.
CPU fallback access to the buffer is the only legit case where we need a standardized API to userspace (since CPU access isn't already associated w/ some other kernel device file where some extra ioctl can be added)
The CPU case will still need to wait on an arbitrarily backed sync primitive. It shouldn't need to know if it's backed by the gpu, camera, or dsp.
Right, this is the one place we definitely need something.. some userspace code would just get passed a dmabuf file descriptor and want to mmap it and do something, without really knowing where it came from. I *guess* we'll have to add some ioctl's to the dmabuf fd.
I personally favor having sync primitives have their own anon inode vs. strictly coupling them with dma_buf.
or possible it could be an ioctl in dmabuf to return an sync fd? I do think we want a way to at least optionally stuff a sync obj onto a dmabuf, because pre-existing non android userspace is expecting implicit synchronization in a number of places, and we can't really go breaking all those interfaces. But I don't think that should prevent also being able to do explicit synchronization.
BR, -R
Although I suppose it doesn't have to be part of the first patchset, it would be easy enough to add later if we had to.
This is certainly a case where, without this functionality, we won't use what ever the solution is.
Cheers, Erik
linaro-mm-sig@lists.linaro.org