On Tue, Mar 25, 2014 at 07:01:10PM +0100, Sam Ravnborg wrote:
> >
> > There are two things that don't work too well with this. First, this
> > causes the build to break if the build machine doesn't have the new
> > public header (include/uapi/linux/dma-buf.h) installed yet. So the only
> > way to make this work would be to build the kernel once with SAMPLES
> > disabled, install the headers and then build again with SAMPLES enabled.
> > Which really isn't very nice.
> >
> > One other option that I've tried is to modify the include path so that
> > the test program would get the in-tree copy of the public header file,
> > but that didn't build properly either because the header files aren't
> > properly sanitized and therefore the compiler complains about them
> > (include/uapi/linux/types.h).
> >
> > One other disadvantage of carrying the sample program in the tree is
> > that there's only infrastructure to build programs natively on the build
> > machine. That's somewhat unfortunate because if you want to run the test
> > program on a different architecture you have to either compile the
> > kernel natively on that architecture (which isn't very practical on many
> > embedded devices) or cross-compile manually.
> >
> > I think a much nicer solution would be to add infrastructure to cross-
> > compile these test programs, so that they end up being built for the
> > same architecture as the kernel image (i.e. using CROSS_COMPILE).
> >
> > Adding Michal and the linux-kbuild mailing list, perhaps this has been
> > discussed before, or maybe somebody has a better idea on how to solve
> > this.
> I actually looked into this some time ago.
> May try to dust off the patch.
> IIRC the kernel-provided headers were used for building - not the ones installed on the machine.
> And cross-compiling was supported.
That sounds exactly like what I'd want for this. If you need any help,
please let me know.
Thanks,
Thierry
On Mon, 9 Jun 2014 14:06:33 +0300
Pekka Paalanen <pekka.paalanen(a)collabora.co.uk> wrote:
> On Mon, 9 Jun 2014 11:00:04 +0200
> Benjamin Gaignard <benjamin.gaignard(a)linaro.org> wrote:
>
> > On my hardware the patches you have (+ this one on gstwaylandsink
> > https://bugzilla.gnome.org/show_bug.cgi?id=711155) allow me to do zero
> > copy between the hardware video decoder and the display engine. I
> > haven't implemented the GPU path yet because my hardware is able to
> > compose a few video overlay planes, and that was enough for my tests.
>
> Right.
>
> What I have been thinking is that the compositor must be able to use
> the new wl_buffer, and we need to guarantee that beforehand. If the
> compositor fails to use a wl_buffer when the client has already
> attached it to a wl_surface and it is time to repaint, it is too late
> and the user will see a glitch. Recovering from that requires asking
> the client to provide a new wl_buffer of a different kind, which might
> take time. Or a very rude compositor would just send a protocol error,
> and then we'd get bug reports like "the video player just disappears
> when I try to play (and ps. I have an old kernel that doesn't support
> importing whatever)".
>
> I believe we must allow the compositor to test the wl_buffer before it
> is usable for the client. That is the reason for the roundtrippy design
> of the below proposal.
>
> Because we do not even try to communicate all the possible restrictions
> to the client for it to match, we can leave the validation strictly as
> a kernel internal issue. Buffer migration inside the kernel might even
> magically solve some of the mismatches. It does leave the problem of
> what the client can do if it doesn't meet all the requirements for the
> compositor to be able to import the dmabufs. But what restrictions
> other than color format we can or should communicate, and where user
> space gets them in the first place... *hand-waving*
>
> But, this also leaves it up to the compositor to choose how/where it
> wants to import the dmabufs. If a compositor is usually compositing
> with GL, it will try to import with EGL on whatever GPU it is using. If
> the compositor uses a software renderer, it can try to mmap the dmabufs
> (or try this as a fallback, if the EGL import fails). If the compositor
> is absolutely sure it can rely on the hardware display engine to
> composite these buffers (note, buffers! You don't know which
> surfaces these buffers will be attached to), it can import directly
> with DRM as FB objects, or V4L, or whatever. A compositor with the
> fullscreen shell extension but without the sub-surface extension comes
> to mind.
>
> In summary, the compositor must be able to use the wl_buffer in its
> default/fallback compositing path. If the wl_buffer is also suitable
> for direct scanout, e.g. on an overlay, that is "just" a bonus.
>
> With the round-trippy design, I am assuming that you can
> export-pass-import a set of dmabufs once, and then reuse them as long
> as you don't need to e.g. resize them. Is this a reasonable assumption?
> Are there any, for instance, hardware video decoders that just insist on
> exporting a new buffer for every frame?
>
> I am tracking the proposal in
> http://cgit.collabora.com/git/user/pq/weston.git/log/?h=linux_dmabuf
>
> So far I added back the event to advertise the supported
> drm_fourcc formats, since that is probably quite crucial.
Yeah, about that...
https://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_image_dma_buf_i…
provides no way for the compositor to query what formats the EGL
implementation might support for importing dmabufs. I'm not sure GBM
has that yet either.
So there is no way a compositor could advertise the set of supported
formats, since it has no way of knowing, has it?
Any suggested solutions for this? Or would probing (export dmabuf, send
to compositor, wait for compositor to ack/reject) for suitable formats
be enough?
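To make the probing option concrete: on the compositor side the only
"query" EGL offers today is attempting the import and seeing whether it
fails. A minimal sketch, assuming the eglCreateImageKHR entry point
from EGL_EXT_image_dma_buf_import is available (single-plane case
only):

#define EGL_EGLEXT_PROTOTYPES
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <stdbool.h>
#include <stdint.h>

/* Try to import one dmabuf plane as an EGLImage; returns whether the
 * EGL implementation accepted this fd/format/stride combination. */
static bool
probe_dmabuf_import(EGLDisplay dpy, int fd, uint32_t fourcc,
                    int32_t width, int32_t height, int32_t stride)
{
        const EGLint attribs[] = {
                EGL_WIDTH,                     width,
                EGL_HEIGHT,                    height,
                EGL_LINUX_DRM_FOURCC_EXT,      fourcc,
                EGL_DMA_BUF_PLANE0_FD_EXT,     fd,
                EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
                EGL_DMA_BUF_PLANE0_PITCH_EXT,  stride,
                EGL_NONE
        };
        EGLImageKHR img;

        img = eglCreateImageKHR(dpy, EGL_NO_CONTEXT,
                                EGL_LINUX_DMA_BUF_EXT, NULL, attribs);
        if (img == EGL_NO_IMAGE_KHR)
                return false;  /* reject: tell the client to try something else */

        /* Import worked; a wl_buffer can safely be created for it. */
        eglDestroyImageKHR(dpy, img);
        return true;
}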
Thanks,
pq
> > 2014-06-06 17:30 GMT+02:00 Pekka Paalanen <pekka.paalanen(a)collabora.co.uk>:
> > > Hi,
> > >
> > > the previous attempt at introducing a generic wl_dmabuf protocol to
> > > Wayland didn't end too well:
> > > http://lists.freedesktop.org/archives/wayland-devel/2013-December/012390.ht…
> > > http://lists.freedesktop.org/archives/wayland-devel/2013-December/012455.ht…
> > > http://lists.freedesktop.org/archives/wayland-devel/2013-December/012566.ht…
> > > http://lists.freedesktop.org/archives/wayland-devel/2014-January/012727.html
> > >
> > > We are again interested in this, and I did a quick Friday evening draft
> > > to open the discussion again. The base of the draft was a quick look at
> > > https://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_image_dma_buf_i…
> > >
> > > The basic idea is that a client has one or more dmabufs it wants to
> > > share with the compositor, making up a single logical buffer (a single
> > > image). The client chooses where and how to export those dmabufs. The
> > > dmabuf fds and metadata are sent to the compositor, the compositor
> > > assembles and tries to import them. If the import succeeds, a wl_buffer
> > > object is created. If the import fails, the client is notified that the
> > > compositor can't use these and that it would be better to try something else.
> > >
> > > I assume that if the "import" succeeds, the compositor is able to use
> > > the buffers, e.g. at least turn them into GL textures or mmap them, if
> > > not also able to scan out or put on a hw overlay. This could be any kind
> > > of checking to verify that the buffers are usable. Finding out that it
> > > won't work after the client is already using the wl_buffer must not
> > > happen, as we have no way to recover from it: the client will get
> > > disconnected. So the point is knowing in advance that the buffers are
> > > usable on both sides, preferably before the client has filled them with
> > > data, but I suppose in the usual case the buffer is already filled.
> > >
> > > As creating a dmabuf-based wl_buffer requires a roundtrip in this
> > > scheme, I assume it only needs to be done rarely, and the same buffer
> > > can be re-used many times with proper synchronization.
> > >
> > > The crude draft is below. Some questions:
> > > - Does this sound sane to you?
> > > - What other metadata would we need? Thierry had some issues with
> > > tiling formats I think.
> > > - This "check if the dmabuf is really usable" is needed, right? We
> > > can't just assume that any dmabuf will work?
> > > - Do we need anything for fences here, or is the dmabuf fd enough?
> > > - Does someone already have something similar running?
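On the re-use question above, it is worth noting that once the import
roundtrip has produced a wl_buffer, re-using it only needs the standard
wl_buffer.release handshake. A sketch of the client side (plain
libwayland-client API; only the dmabuf-backed wl_buffer itself would
come from the new protocol):

#include <stdbool.h>
#include <stdint.h>
#include <wayland-client.h>

struct frame_buffer {
        struct wl_buffer *buffer;  /* dmabuf-backed, from the import roundtrip */
        bool busy;                 /* still held by the compositor? */
};

/* The compositor sends wl_buffer.release once it no longer reads the
 * buffer, so the client may refill and re-attach it. */
static void
buffer_release(void *data, struct wl_buffer *buffer)
{
        struct frame_buffer *fb = data;
        fb->busy = false;
}

static const struct wl_buffer_listener buffer_listener = {
        buffer_release
};

/* Call once, after the import roundtrip succeeded. */
static void
frame_buffer_init(struct frame_buffer *fb, struct wl_buffer *buffer)
{
        fb->buffer = buffer;
        fb->busy = false;
        wl_buffer_add_listener(buffer, &buffer_listener, fb);
}

/* Per frame: only attach buffers the compositor has released. */
static void
frame_buffer_submit(struct frame_buffer *fb, struct wl_surface *surface,
                    int32_t width, int32_t height)
{
        wl_surface_attach(surface, fb->buffer, 0, 0);
        wl_surface_damage(surface, 0, 0, width, height);
        wl_surface_commit(surface);
        fb->busy = true;
}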
On Mon, Jun 9, 2014 at 8:44 AM, Pekka Paalanen
<pekka.paalanen(a)collabora.co.uk> wrote:
> On Mon, 9 Jun 2014 12:23:18 +0100
> Daniel Stone <daniel(a)fooishbar.org> wrote:
>
>> Hi,
>>
>> On 9 June 2014 12:06, Pekka Paalanen <pekka.paalanen(a)collabora.co.uk> wrote:
>>
>> > On Mon, 9 Jun 2014 11:00:04 +0200
>> > Benjamin Gaignard <benjamin.gaignard(a)linaro.org> wrote:
>> > > One of the main comments on the latest patches was that wl_dmabuf uses
>> > > DRM for buffer allocation.
>> > > This appears to be an issue since Wayland doesn't want to rely on one
>> > > specific framework (DRM or V4L2) for buffer allocation, so we have
>> > > started working on a "central dmabuf allocation" on the kernel side. The
>> > > goal is to provide something as generic as possible to make it
>> > > acceptable to Wayland.
>> >
>> > Why would Wayland need a central allocator for dmabuf?
>> >
>>
>> I think you've just answered your own question further below:
>>
>>
>> > > On my hardware the patches you have (+ this one on gstwaylandsink
>> > > https://bugzilla.gnome.org/show_bug.cgi?id=711155) allow me to do zero
>> > > copy between the hardware video decoder and the display engine. I
>> > > haven't implemented the GPU path yet because my hardware is able to
>> > > compose a few video overlay planes, and that was enough for my tests.
>> >
>> > Right.
>> >
>> > What I have been thinking is that the compositor must be able to use
>> > the new wl_buffer, and we need to guarantee that beforehand. If the
>> > compositor fails to use a wl_buffer when the client has already
>> > attached it to a wl_surface and it is time to repaint, it is too late
>> > and the user will see a glitch. Recovering from that requires asking
>> > the client to provide a new wl_buffer of a different kind, which might
>> > take time. Or a very rude compositor would just send a protocol error,
>> > and then we'd get bug reports like "the video player just disappears
>> > when I try to play (and ps. I have an old kernel that doesn't support
>> > importing whatever)".
>> >
>> > I believe we must allow the compositor to test the wl_buffer before it
>> > is usable for the client. That is the reason for the roundtrippy design
>> > of the below proposal.
>> >
>>
>> A central allocator would solve these issues, by having everyone agree on
>> the restrictions upfront, instead of working out which of the media decode
>> engine, camera, GPU, or display controller is the lowest common
>> denominator, and forcing all allocations through there.
>>
>> One such solution was discussed a while back WRT ION:
>> https://lwn.net/Articles/565469/
>>
>> See the 'possible solutions' part for a way for people to agree on
>> restrictions wrt tiling, stride, contiguousness, etc.
>
> Hi,
>
> that's an excellent article. I didn't know that delayed allocation of
> dmabufs isn't even possible yet; it would have allowed us not to
> think about import failures and simply let the client fall back
> with "ok, don't use dmabuf with this particular device then".
hrm? I know of at least a couple of drm drivers that defer allocation
of backing pages...
> What is the conclusion here?
>
> Wayland protocol does not need to consider import failures at all, and
> can simply punt those as protocol errors, which essentially kill the app
> if they ever happen?
>
> Do we need to wait for the central allocator in kernel to materialize
> before we can design the protocol? Is it simply too early to try to do
> it now?
I do tend to think the ION/central-allocator is just substituting one
problem for another. It doesn't really solve the problem of how
different devices which don't actually know each other can decide on
buffers that they can share. On a phone/tablet/etc you know up front
when building the kernel what devices there are and in what use-cases
they will be used, etc. But that isn't really solving the more
general case.
> Was the idea of dmabuf in-kernel constraint negotiation with delayed
> allocation rejected in favour of a central allocator?
not really, that I know of. I still think we need to spiff out
dma-mapping to better handle placement constraints. (Although I still
prefer format constraints to be a userspace topic.)
pengutronix is doing some work in this area:
http://elinux.org/images/b/b0/OSELAS.Presentation-DMABUF-migration.pdf
BR,
-R
> Will Intel, Nouveau and Radeon support the central allocator, or will
> it remain for ARM-related devices only?
>
>
> Thanks,
> pq
On 06/09/2014 01:23 PM, Daniel Stone wrote:
> Hi,
>
> On 9 June 2014 12:06, Pekka Paalanen <pekka.paalanen(a)collabora.co.uk> wrote:
>
> On Mon, 9 Jun 2014 11:00:04 +0200
> Benjamin Gaignard <benjamin.gaignard(a)linaro.org> wrote:
> > One of the main comments on the latest patches was that wl_dmabuf uses
> > DRM for buffer allocation.
> > This appears to be an issue since Wayland doesn't want to rely on one
> > specific framework (DRM or V4L2) for buffer allocation, so we have
> > started working on a "central dmabuf allocation" on the kernel side. The
> > goal is to provide something as generic as possible to make it
> > acceptable to Wayland.
>
> Why would Wayland need a central allocator for dmabuf?
>
>
> I think you've just answered your own question further below:
>
>
> > On my hardware the patches you have (+ this one on gstwaylandsink
> > https://bugzilla.gnome.org/show_bug.cgi?id=711155) allow me to do zero
> > copy between the hardware video decoder and the display engine. I
> > haven't implemented the GPU path yet because my hardware is able to
> > compose a few video overlay planes, and that was enough for my tests.
>
> Right.
>
> What I have been thinking is that the compositor must be able to use
> the new wl_buffer, and we need to guarantee that beforehand. If the
> compositor fails to use a wl_buffer when the client has already
> attached it to a wl_surface and it is time to repaint, it is too late
> and the user will see a glitch. Recovering from that requires asking
> the client to provide a new wl_buffer of a different kind, which might
> take time. Or a very rude compositor would just send a protocol error,
> and then we'd get bug reports like "the video player just disappears
> when I try to play (and ps. I have an old kernel that doesn't support
> importing whatever)".
>
> I believe we must allow the compositor to test the wl_buffer before it
> is usable for the client. That is the reason for the roundtrippy design
> of the below proposal.
>
>
> A central allocator would solve these issues, by having everyone agree
> on the restrictions upfront, instead of working out which of the media
> decode engine, camera, GPU, or display controller is the lowest common
> denominator, and forcing all allocations through there.
>
> One such solution was discussed a while back WRT ION:
> https://lwn.net/Articles/565469/
>
> See the 'possible solutions' part for a way for people to agree on
> restrictions wrt tiling, stride, contiguousness, etc.
Hi!
I think before deciding on something like this, one also needs to
account for the virtual drivers like vmwgfx.
Here, a dma-buf internally holds an opaque handle to an object on the
host / hypervisor, and the actual memory buffer is only temporarily
allocated for dma-buf operations that strictly need it, not to hold the
data while transferring it between devices or applications.
Let's say you'd want to use a USB display controller in a virtual
machine with the vmwgfx exported prime objects, for example. There's no
common denominator. The vmwgfx driver would need to read the dma-buf
data from the host object at sg-table export (dma-buf map) time.
Whereas if you just want to share data between a Wayland server and
client, no pages are ever allocated and the only thing passed
around is in effect the opaque handle to the host / hypervisor object.
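In dma_buf_ops terms such an exporter looks roughly like the sketch
below (the vmw_host_*() helpers are made-up placeholders for the host /
hypervisor communication, not actual vmwgfx code; other mandatory ops
are omitted):

#include <linux/dma-buf.h>
#include <linux/scatterlist.h>

/* Hypothetical host communication helpers, stand-ins only: */
struct sg_table *vmw_host_alloc_and_readback(u32 handle, size_t size);
void vmw_host_writeback_and_free(u32 handle, struct sg_table *sgt);

struct vmw_prime_obj {
        u32 host_handle;        /* opaque object on the host side */
        size_t size;
};

static struct sg_table *
vmw_prime_map_dma_buf(struct dma_buf_attachment *attach,
                      enum dma_data_direction dir)
{
        struct vmw_prime_obj *obj = attach->dmabuf->priv;

        /* Backing pages exist only for the duration of the mapping:
         * allocate them now and read the contents back from the
         * host object. */
        return vmw_host_alloc_and_readback(obj->host_handle, obj->size);
}

static void
vmw_prime_unmap_dma_buf(struct dma_buf_attachment *attach,
                        struct sg_table *sgt,
                        enum dma_data_direction dir)
{
        struct vmw_prime_obj *obj = attach->dmabuf->priv;

        /* Flush any changes back to the host object and free the
         * temporary pages; the opaque handle remains the only
         * persistent representation of the buffer. */
        vmw_host_writeback_and_free(obj->host_handle, sgt);
}

static const struct dma_buf_ops vmw_prime_dmabuf_ops = {
        /* attach, detach, release, mmap, ... omitted from this sketch */
        .map_dma_buf   = vmw_prime_map_dma_buf,
        .unmap_dma_buf = vmw_prime_unmap_dma_buf,
};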
I'm currently having trouble seeing how a central allocator would be
able to deal with this?
/Thomas
>
> Cheers,
> Daniel