On Mon, Jun 9, 2014 at 8:44 AM, Pekka Paalanen pekka.paalanen@collabora.co.uk wrote:
On Mon, 9 Jun 2014 12:23:18 +0100 Daniel Stone daniel@fooishbar.org wrote:
Hi,
On 9 June 2014 12:06, Pekka Paalanen pekka.paalanen@collabora.co.uk wrote:
On Mon, 9 Jun 2014 11:00:04 +0200 Benjamin Gaignard benjamin.gaignard@linaro.org wrote:
One of the main comments on the latest patches was that wl_dmabuf uses DRM for buffer allocation. This appears to be an issue since Wayland doesn't want to rely on one specific framework (DRM or V4L2) for buffer allocation, so we have started working on a "central dmabuf allocation" on the kernel side. The goal is to provide something as generic as possible so that it is acceptable for Wayland.
Why would Wayland need a central allocator for dmabuf?
I think you've just answered your own question further below:
On my hardware the patches you have (+ this one on gstwaylandsink https://bugzilla.gnome.org/show_bug.cgi?id=711155) allow me to do zero copy between the hardware video decoder and the display engine. I haven't implemented the GPU path yet because my hardware is able to compose a few video overlay planes and that was enough for my tests.
Right.
What I have been thinking is that the compositor must be able to use the new wl_buffer, and we need to guarantee that beforehand. If the compositor fails to use a wl_buffer when the client has already attached it to a wl_surface and it is time to repaint, it is too late and the user will see a glitch. Recovering from that requires asking the client to provide a new wl_buffer of a different kind, which might take time. Or a very rude compositor would just send a protocol error, and then we'd get bug reports like "the video player just disappears when I try to play (and ps. I have an old kernel that doesn't support importing whatever)".
I believe we must allow the compositor to test the wl_buffer before it is usable for the client. That is the reason for the roundtrippy design of the below proposal.
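[As an illustration of the roundtrip design described above: a minimal, self-contained C sketch of the client-side logic it implies. The try_dmabuf_import() helper and the created/failed outcomes are hypothetical stand-ins for "send the dmabuf fd plus metadata, then wait for the compositor's verdict"; they are not part of any existing protocol.]

#include <stdio.h>

/* Hypothetical outcome of the compositor trying to import the dmabuf
 * before the client is allowed to attach it to a wl_surface. */
enum import_result {
    IMPORT_CREATED, /* compositor sent "created": buffer is usable  */
    IMPORT_FAILED,  /* compositor sent "failed": pick another path  */
};

/* Stand-in for "send the dmabuf fd + metadata, then block in a roundtrip
 * until the compositor answers". In a real client the answer would arrive
 * as a protocol event. */
static enum import_result try_dmabuf_import(int dmabuf_fd)
{
    (void)dmabuf_fd;
    /* Pretend the compositor rejected it (old kernel, unsupported format, ...). */
    return IMPORT_FAILED;
}

int main(void)
{
    int dmabuf_fd = -1; /* would come from the decoder / allocator */

    if (try_dmabuf_import(dmabuf_fd) == IMPORT_CREATED) {
        printf("attach the dmabuf-based wl_buffer to the surface\n");
    } else {
        /* The client learns this *before* repaint time, so it can fall
         * back (e.g. to wl_shm) without a visible glitch and without
         * being killed by a protocol error. */
        printf("fall back to wl_shm buffers\n");
    }
    return 0;
}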
A central allocator would solve these issues, by having everyone agree on the restrictions upfront, instead of working out which of the media decode engine, camera, GPU, or display controller is the lowest common denominator, and forcing all allocations through there.
One such solution was discussed a while back WRT ION: https://lwn.net/Articles/565469/
See the 'possible solutions' part for a way for people to agree on restrictions wrt tiling, stride, contiguousness, etc.
Hi,
that's an excellent article. I didn't know that delayed allocation of dmabufs was not even possible yet - that is what would have allowed us to not think about import failures and simply let the client fall back with "ok, don't use dmabuf with this particular device then".
hrm? I know of at least a couple drm drivers that defer allocation of backing pages..
What is the conclusion here?
Wayland protocol does not need to consider import failures at all, and can simply punt those as protocol errors, which essentially kill the app if they ever happen?
Do we need to wait for the central allocator in kernel to materialize before we can design the protocol? Is it simply too early to try to do it now?
I do tend to think the ION/central-allocator is just substituting one problem for another. It doesn't really solve the problem of how different devices which don't actually know each other can decide on buffers that they can share. On a phone/tablet/etc you know up front when building the kernel what devices there are and in what use-cases they will be used, etc. But that isn't really solving the more general case.
Was the idea of dmabuf in-kernel constraint negotiation with delayed allocation rejected in favour of a central allocator?
not really, that I know of. I still think we need to spiff out dma-mapping to better handle placement constraints. (Although still prefer format constraints to be a userspace topic.)
pengutronix is doing some work in this area:
http://elinux.org/images/b/b0/OSELAS.Presentation-DMABUF-migration.pdf
BR, -R
Will Intel, Nouveau and Radeon support the central allocator, or will it remain for ARM-related devices only?
Thanks, pq
Hi All,
On 11 June 2014 21:30, Rob Clark robdclark@gmail.com wrote:
I do tend to think the ION/central-allocator is just substituting one problem for another. It doesn't really solve the problem of how different devices which don't actually know each other can decide on buffers that they can share. On a phone/tablet/etc you know up front when building the kernel what devices there are and in what use-cases they will be used, etc. But that isn't really solving the more general case.
I think I should establish a little better nomenclature around the 'central allocator': what we (at Linaro, including Benjamin) are referring to is more a 'constraint aware' allocator for dma-buf [1], which we are working on, where at dma_buf_attach() time each of the importers can share its 'constraints', and then at the first dma_buf_map_attachment() the exporter can allocate based on the constraints for the current set of attached devices. We're also trying to write some helper functions to help exporters with this allocation based on constraint-masks [a list of allocators that can allocate for a given constraint-mask] - and of course, the helpers and the constraint-aware allocator would be entirely optional to use, in case there are dma-buf exporters who wish to allocate / arbitrate in another way.
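[As an illustration of the constraint-aware, deferred allocation described above: a simplified, self-contained C model. The constraint bits and helper names are invented for illustration; in the kernel this logic would hang off dma_buf_attach() and the first dma_buf_map_attachment().]

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical placement-constraint bits an importing device might require. */
#define CONSTRAINT_CONTIGUOUS (1u << 0) /* physically contiguous memory */
#define CONSTRAINT_LOW_MEM    (1u << 1) /* below some DMA address limit */
#define CONSTRAINT_ALIGN_4K   (1u << 2) /* 4 KiB start alignment        */

struct deferred_buffer {
    uint32_t constraint_mask; /* union of all attached devices' needs          */
    bool allocated;           /* backing pages exist only after the first map  */
};

/* Models dma_buf_attach(): no pages are allocated yet, the exporter only
 * records what this importer needs. */
static void buffer_attach(struct deferred_buffer *buf, uint32_t needs)
{
    buf->constraint_mask |= needs;
}

/* Models the first dma_buf_map_attachment(): by now every interested device
 * has attached, so the exporter can pick a pool (CMA, carveout, plain
 * pages, ...) that satisfies the accumulated mask. */
static void buffer_first_map(struct deferred_buffer *buf)
{
    if (buf->allocated)
        return;
    printf("allocating with constraint mask 0x%x\n",
           (unsigned)buf->constraint_mask);
    buf->allocated = true;
}

int main(void)
{
    struct deferred_buffer buf = { 0 };

    buffer_attach(&buf, CONSTRAINT_CONTIGUOUS);                       /* display */
    buffer_attach(&buf, CONSTRAINT_CONTIGUOUS | CONSTRAINT_ALIGN_4K); /* decoder */
    buffer_first_map(&buf); /* allocation deferred until here */
    return 0;
}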
Was the idea of dmabuf in-kernel constraint negotiation with delayed allocation rejected in favour of a central allocator?
not really, that I know of. I still think we need to spiff out dma-mapping to better handle placement constraints. (Although still prefer format constraints to be a userspace topic.)
I hope the above comment clarifies this point too - what we're talking about is exactly delayed allocation, dependent on collecting the constraints of each importer beforehand.
Best regards, ~Sumit.
[1]: http://www.linaro.org/documents/download/3290870b08d02fea81ddbd53315ff9bf531...
On Wed, 11 Jun 2014 12:00:57 -0400 Rob Clark robdclark@gmail.com wrote:
hrm? I know of at least a couple drm drivers that defer allocation of backing pages..
I came across a bit harsh there. So it is possible, and a few drivers might even do it already, but is there even an intention of requiring all drivers to be able to defer allocation?
Though if migration is going to work, the only downside of not doing deferred allocation would be a performance penalty in the beginning, right?
I do tend to think the ION/central-allocator is just substituting one problem for another. It doesn't really solve the problem of how different devices which don't actually know each other can decide on buffers that they can share. On a phone/tablet/etc you know up front when building the kernel what devices there are and in what use-cases they will be used, etc. But that isn't really solving the more general case.
Right, as I have been following the PC side in the past a lot more than ARM or embedded, a central allocator seemed a little strange as the final solution to me too.
Was the idea of dmabuf in-kernel constraint negotiation with delayed allocation rejected in favour of a central allocator?
not really, that I know of. I still think we need to spiff out dma-mapping to better handle placement constraints. (Although still prefer format constraints to be a userspace topic.)
Sure. What I am specifically interested in is which things would be left for user space to control and match, as that would affect the Wayland protocol for dmabufs via APIs like GBM and V4L.
pengutronix is doing some work in this area:
http://elinux.org/images/b/b0/OSELAS.Presentation-DMABUF-migration.pdf
That is cool, and it also tells me that it is ok for the initial dmabuf sharing and creating a wl_buffer protocol object to be expensive (require one roundtrip per batch of buffers), as the setup may involve migration even in a good case and buffer re-use is heavily recommended.
This brings a question in my mind.
A Wayland compositor must be able to use a dmabuf-based wl_buffer for at least its fallback compositing path, let's say GLESv2, where we are able to texture directly from the dmabuf. Then the compositor sees an opportunity to promote the surface to a hardware overlay, and attempts to, say, import the dmabuf a second time as a DRM FB. If it is not possible to satisfy all of the exporter, EGL-import, and DRM-import restrictions at the same time, and especially if exporter vs. DRM-import would cause ping-ponging, it would be better to just let the DRM-import fail and continue with GLESv2 compositing.
Would you agree?
Could dmabuf related interfaces somehow allow for the user space to choose how much pain is tolerable for the import to succeed?
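[As an illustration of the "try the second import, fall back gracefully" scenario above: a sketch of the DRM FB import attempt using libdrm's PRIME import. The function name and parameters are invented, a single-plane XRGB8888 buffer is assumed, and the EGL texture path is assumed to already work; any failure simply leaves the compositor on the GLESv2 path.]

#include <stdint.h>
#include <stdio.h>
#include <xf86drm.h>
#include <xf86drmMode.h>
#include <drm_fourcc.h>

/* Attempt the *second* import of an already-EGL-usable dmabuf, this time as
 * a DRM framebuffer for a hardware overlay. Every failure is non-fatal: the
 * compositor just keeps texturing the surface with GLESv2. */
int try_promote_to_overlay(int drm_fd, int dmabuf_fd,
                           uint32_t width, uint32_t height, uint32_t pitch,
                           uint32_t *fb_id)
{
    uint32_t handles[4] = { 0 }, pitches[4] = { 0 }, offsets[4] = { 0 };

    if (drmPrimeFDToHandle(drm_fd, dmabuf_fd, &handles[0]) < 0) {
        fprintf(stderr, "PRIME import failed, staying on the GLESv2 path\n");
        return -1;
    }
    pitches[0] = pitch;

    if (drmModeAddFB2(drm_fd, width, height, DRM_FORMAT_XRGB8888,
                      handles, pitches, offsets, fb_id, 0) < 0) {
        fprintf(stderr, "AddFB2 failed, staying on the GLESv2 path\n");
        return -1;
    }
    return 0; /* *fb_id can now be placed on an overlay plane */
}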
Thanks, pq
On Thu, Jun 12, 2014 at 2:01 AM, Pekka Paalanen pekka.paalanen@collabora.co.uk wrote:
I came across a bit harsh there. So it is possible, and a few drivers might even do it already, but is there even an intention of requiring all drivers to be able to defer allocation?
not sure I'd go as far as to require it, but it is a pretty silly optimization to skip..
Though if migration is going to work, the only downside of not doing deferred allocation would be a performance penalty in the beginning, right?
right
Sure. What I am specifically interested in is which things would be left for user space to control and match, as that would affect the Wayland protocol for dmabufs via APIs like GBM and V4L.
I try to divide buffer constraints into two categories:
1) placement, i.e. where the actual pages go (contiguous, special memory range, etc.)
2) format (fourcc, tiling format, pitch restrictions)
For most (all?) of the drm drivers, at the GEM level we do not necessarily have any information about category #2. All the kernel cares about is category #1 in most cases.
Also, in at least some cases (gstreamer is a good example), there is already a mechanism in place for negotiating #2.
This is my reasoning behind the conclusion that dmabuf (and kernel level APIs) should care about #1, and userspace should care about #2.
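[As an illustration of keeping category #2 in userspace: a self-contained C sketch of a gstreamer-style format negotiation that intersects the producer's and consumer's supported fourccs and picks the stricter pitch alignment before any buffer is allocated or shared. The struct and values are invented for illustration.]

#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace description of category #2 (format) constraints;
 * category #1 (placement) stays in the kernel. */
struct format_caps {
    uint32_t fourccs[4];  /* supported pixel formats, 0-terminated */
    uint32_t pitch_align; /* required stride alignment in bytes    */
};

/* Pick the first fourcc both sides support and the stricter pitch alignment,
 * roughly what a gstreamer-style caps negotiation does. */
static int negotiate(const struct format_caps *producer,
                     const struct format_caps *consumer,
                     uint32_t *fourcc, uint32_t *pitch_align)
{
    for (int i = 0; producer->fourccs[i]; i++)
        for (int j = 0; consumer->fourccs[j]; j++)
            if (producer->fourccs[i] == consumer->fourccs[j]) {
                *fourcc = producer->fourccs[i];
                *pitch_align = producer->pitch_align > consumer->pitch_align
                             ? producer->pitch_align : consumer->pitch_align;
                return 0;
            }
    return -1; /* no common format: fall back to copying / conversion */
}

int main(void)
{
    /* 'NV12' and 'XR24' spelled as fourcc codes, purely illustrative. */
    struct format_caps decoder = { { 0x3231564e /* NV12 */, 0 }, 64 };
    struct format_caps display = { { 0x3231564e /* NV12 */,
                                     0x34325258 /* XR24 */, 0 }, 256 };
    uint32_t fourcc, align;

    if (negotiate(&decoder, &display, &fourcc, &align) == 0)
        printf("agreed on fourcc 0x%08x, pitch alignment %u\n",
               (unsigned)fourcc, (unsigned)align);
    return 0;
}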
A Wayland compositor must be able to use a dmabuf-based wl_buffer for at least its fallback compositing path, let's say GLESv2, where we are able to texture directly from the dmabuf. Then the compositor sees an opportunity to promote the surface to a hardware overlay, and attempts to, say, import the dmabuf a second time as a DRM FB. If it is not possible to satisfy all of the exporter, EGL-import, and DRM-import restrictions at the same time, and especially if exporter vs. DRM-import would cause ping-ponging, it would be better to just let the DRM-import fail and continue with GLESv2 compositing.
Would you agree?
Could dmabuf related interfaces somehow allow for the user space to choose how much pain is tolerable for the import to succeed?
hmm, this is actually an interesting idea. So far the assumption has been that if you could not actually share buffers between devices, userspace would do something different.
It seems like it would be worthwhile for userspace to know how expensive sharing will be vs just using the window surface as a texture..
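[Purely as an illustration of that idea - nothing like it exists today - a hypothetical import call could report a cost hint, letting the compositor decide whether promoting the surface to an overlay is worth it:]

#include <stdio.h>

/* Hypothetical cost hint an import API could report back to userspace;
 * the names are invented to illustrate the idea above. */
enum share_cost {
    SHARE_ZERO_COPY,       /* buffer usable as-is                        */
    SHARE_NEEDS_MIGRATION, /* one-time move to other memory              */
    SHARE_NEEDS_COPY,      /* every frame copied: probably not worth it  */
};

/* Stand-in for "import this dmabuf for scanout and tell me the cost". */
static enum share_cost query_import_cost(int dmabuf_fd)
{
    (void)dmabuf_fd;
    return SHARE_NEEDS_MIGRATION;
}

int main(void)
{
    switch (query_import_cost(-1)) {
    case SHARE_ZERO_COPY:
    case SHARE_NEEDS_MIGRATION:
        printf("promote the surface to a hardware overlay\n");
        break;
    case SHARE_NEEDS_COPY:
        printf("keep compositing with GLESv2\n");
        break;
    }
    return 0;
}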
BR, -R