dma-buf non-coherent mmap

List overview All Threads
Download

newer

older

RFC: get_dma_buf_unless_zero ?

thoughts of looking at android...

Thomas Hellstrom

31 Oct 2013 31 Oct '13

5 p.m.

Hi!

I'm just looking over what's needed to implement drm Prime / dma-buf exports + imports in the vmwgfx driver. It seems like most dma-bufs ops are quite straightforward to implement except user-space mmap().

The reason being that vmwgfx dma-bufs will be using completely non-coherent memory, whenever there needs to be CPU accesses.

The accelerated contents resides in an opaque structure on the device into which we can DMA to and from, so for mmap to work we need to zap ptes and DMA to the device when doing something accelerated, and on the first page-fault DMA data back and wait for idle if the device did a write to the dma-buf.

Now this shouldn't really be a problem if dma-bufs were only used for cross-device sharing, but since people apparently want to use dma-buf file handles to share CPU data between processes it really becomes a serious problem.

Needless to say we'd want to limit the size of the DMAs, and have mmap users request regions for read, and mark regions dirty for write, something similar to gallium's texture transfer objects.

Any ideas?

/Thomas

Show replies by date

Rob Clark

31 Oct 31 Oct

5:52 p.m.

On Thu, Oct 31, 2013 at 1:00 PM, Thomas Hellstrom thellstrom@vmware.com wrote:

...

Hi!

I'm just looking over what's needed to implement drm Prime / dma-buf exports

imports in the vmwgfx driver. It seems like most dma-bufs ops are quite

straightforward to implement except user-space mmap().

The reason being that vmwgfx dma-bufs will be using completely non-coherent memory, whenever there needs to be CPU accesses.

The accelerated contents resides in an opaque structure on the device into which we can DMA to and from, so for mmap to work we need to zap ptes and DMA to the device when doing something accelerated, and on the first page-fault DMA data back and wait for idle if the device did a write to the dma-buf.

Now this shouldn't really be a problem if dma-bufs were only used for cross-device sharing, but since people apparently want to use dma-buf file handles to share CPU data between processes it really becomes a serious problem.

Needless to say we'd want to limit the size of the DMAs, and have mmap users request regions for read, and mark regions dirty for write, something similar to gallium's texture transfer objects.

Any ideas?

well, I think vmwgfx is part of the reason we decided mmap would be optional for dmabuf. So perhaps it is an option to simply ignore mmap?

BR, -R

...

/Thomas _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel

Thomas Hellstrom

8:40 p.m.

On 10/31/2013 06:52 PM, Rob Clark wrote:

...

On Thu, Oct 31, 2013 at 1:00 PM, Thomas Hellstrom thellstrom@vmware.com wrote:

...
Hi!

I'm just looking over what's needed to implement drm Prime / dma-buf exports

imports in the vmwgfx driver. It seems like most dma-bufs ops are quite

straightforward to implement except user-space mmap().

The reason being that vmwgfx dma-bufs will be using completely non-coherent memory, whenever there needs to be CPU accesses.

The accelerated contents resides in an opaque structure on the device into which we can DMA to and from, so for mmap to work we need to zap ptes and DMA to the device when doing something accelerated, and on the first page-fault DMA data back and wait for idle if the device did a write to the dma-buf.

Now this shouldn't really be a problem if dma-bufs were only used for cross-device sharing, but since people apparently want to use dma-buf file handles to share CPU data between processes it really becomes a serious problem.

Needless to say we'd want to limit the size of the DMAs, and have mmap users request regions for read, and mark regions dirty for write, something similar to gallium's texture transfer objects.

Any ideas?

well, I think vmwgfx is part of the reason we decided mmap would be optional for dmabuf. So perhaps it is an option to simply ignore mmap?

BR, -R

Well, I'd be happy to avoid mmap, but then what does optional mean in this context? That all generic user-space apps *must* implement a workaround if mmap isn't implemented?

It's unfortunate a bit like implicit synchronization mentioned in section 3) in Direct Userspace Access/mmap Support in the kernel dma-buf doc: It should be avoided, otherwise it might be relied upon by userspace and exporters not implementing it will suffer.

In reality, people will start using mmap() and won't care to implement workarounds if it's not supported, and drivers like vmwgfx and non-coherent architectures will suffer.

I haven't looked closely at how DRI3 or Wayland/weston use or will use dma-buf, but if they rely on mmap, we're sort of lost. MIR uses the following scheme: 1) Create a GBM buffer 2) Get a Prime handle, export to client 3) Client imports prime handle, casts into a GBM buffer which is typecast to a dumb buffer 4) Client uses DRM dumb mmap(). :(.

Now, step 3) and step 4) are clearly API violations, but it works by coincidence on major drivers. Now this usage-pattern could easily (and IMHO should) be blocked in DRM by refusing to map dumb buffers that haven't been created as dumb buffers, but If dma-bufs are to be shared by user-space apps to transfer contents in this manner, we need a reliable replacement that works also for non-coherent situations.

Ideally I'd like to see tex[sub]image-like operations, but with dma-buf mmap in place, I'm afraid that's what will be used...

Regars, Thomas

...

...
/Thomas _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel

Dave Airlie

8:48 p.m.

On Fri, Nov 1, 2013 at 6:40 AM, Thomas Hellstrom thellstrom@vmware.com wrote:

...

On 10/31/2013 06:52 PM, Rob Clark wrote:

...
On Thu, Oct 31, 2013 at 1:00 PM, Thomas Hellstrom thellstrom@vmware.com wrote:

...
Hi!

I'm just looking over what's needed to implement drm Prime / dma-buf exports

imports in the vmwgfx driver. It seems like most dma-bufs ops are quite

straightforward to implement except user-space mmap().

The reason being that vmwgfx dma-bufs will be using completely non-coherent memory, whenever there needs to be CPU accesses.

The accelerated contents resides in an opaque structure on the device into which we can DMA to and from, so for mmap to work we need to zap ptes and DMA to the device when doing something accelerated, and on the first page-fault DMA data back and wait for idle if the device did a write to the dma-buf.

Now this shouldn't really be a problem if dma-bufs were only used for cross-device sharing, but since people apparently want to use dma-buf file handles to share CPU data between processes it really becomes a serious problem.

Needless to say we'd want to limit the size of the DMAs, and have mmap users request regions for read, and mark regions dirty for write, something similar to gallium's texture transfer objects.

Any ideas?

well, I think vmwgfx is part of the reason we decided mmap would be optional for dmabuf. So perhaps it is an option to simply ignore mmap?

BR, -R

Well, I'd be happy to avoid mmap, but then what does optional mean in this context? That all generic user-space apps *must* implement a workaround if mmap isn't implemented?

It's unfortunate a bit like implicit synchronization mentioned in section 3) in Direct Userspace Access/mmap Support in the kernel dma-buf doc: It should be avoided, otherwise it might be relied upon by userspace and exporters not implementing it will suffer.

In reality, people will start using mmap() and won't care to implement workarounds if it's not supported, and drivers like vmwgfx and non-coherent architectures will suffer.

I haven't looked closely at how DRI3 or Wayland/weston use or will use dma-buf, but if they rely on mmap, we're sort of lost. MIR uses the following scheme:

DRI3 and wayland won't use dma-buf mmap directly,

using dma-buf mmap directly is wrong for anything that shares objects with itself.

I personally wish we hadn't added mmap support to dma-buf at all, but some people had some use cases that they'll never implement.

If you export a dma-buf to be used by a client it should be using drivers on the client to import the dma-buf and then it should be using mesa.

Dave.

Thomas Hellstrom

9:07 p.m.

On 10/31/2013 09:48 PM, Dave Airlie wrote:

...

On Fri, Nov 1, 2013 at 6:40 AM, Thomas Hellstrom thellstrom@vmware.com wrote:

...
On 10/31/2013 06:52 PM, Rob Clark wrote:

...
On Thu, Oct 31, 2013 at 1:00 PM, Thomas Hellstrom thellstrom@vmware.com wrote:

...
Hi!

I'm just looking over what's needed to implement drm Prime / dma-buf exports

imports in the vmwgfx driver. It seems like most dma-bufs ops are quite

straightforward to implement except user-space mmap().

The reason being that vmwgfx dma-bufs will be using completely non-coherent memory, whenever there needs to be CPU accesses.

The accelerated contents resides in an opaque structure on the device into which we can DMA to and from, so for mmap to work we need to zap ptes and DMA to the device when doing something accelerated, and on the first page-fault DMA data back and wait for idle if the device did a write to the dma-buf.

Now this shouldn't really be a problem if dma-bufs were only used for cross-device sharing, but since people apparently want to use dma-buf file handles to share CPU data between processes it really becomes a serious problem.

Needless to say we'd want to limit the size of the DMAs, and have mmap users request regions for read, and mark regions dirty for write, something similar to gallium's texture transfer objects.

Any ideas?

well, I think vmwgfx is part of the reason we decided mmap would be optional for dmabuf. So perhaps it is an option to simply ignore mmap?

BR, -R

Well, I'd be happy to avoid mmap, but then what does optional mean in this context? That all generic user-space apps *must* implement a workaround if mmap isn't implemented?

It's unfortunate a bit like implicit synchronization mentioned in section 3) in Direct Userspace Access/mmap Support in the kernel dma-buf doc: It should be avoided, otherwise it might be relied upon by userspace and exporters not implementing it will suffer.

In reality, people will start using mmap() and won't care to implement workarounds if it's not supported, and drivers like vmwgfx and non-coherent architectures will suffer.

I haven't looked closely at how DRI3 or Wayland/weston use or will use dma-buf, but if they rely on mmap, we're sort of lost. MIR uses the following scheme:

DRI3 and wayland won't use dma-buf mmap directly,

using dma-buf mmap directly is wrong for anything that shares objects with itself.

That sounds good to hear. Perhaps we should add that to the dma-buf docs.

...

I personally wish we hadn't added mmap support to dma-buf at all, but some people had some use cases that they'll never implement.

If you export a dma-buf to be used by a client it should be using drivers on the client to import the dma-buf and then it should be using mesa.

Agreed.

/Thomas

...

Dave

Daniel Vetter

11 p.m.

On Thu, Oct 31, 2013 at 10:07:25PM +0100, Thomas Hellstrom wrote:

...

On 10/31/2013 09:48 PM, Dave Airlie wrote:

...
On Fri, Nov 1, 2013 at 6:40 AM, Thomas Hellstrom thellstrom@vmware.com wrote:

...
Well, I'd be happy to avoid mmap, but then what does optional mean in this context? That all generic user-space apps *must* implement a workaround if mmap isn't implemented?

It's unfortunate a bit like implicit synchronization mentioned in section 3) in Direct Userspace Access/mmap Support in the kernel dma-buf doc: It should be avoided, otherwise it might be relied upon by userspace and exporters not implementing it will suffer.

In reality, people will start using mmap() and won't care to implement workarounds if it's not supported, and drivers like vmwgfx and non-coherent architectures will suffer.

I haven't looked closely at how DRI3 or Wayland/weston use or will use dma-buf, but if they rely on mmap, we're sort of lost. MIR uses the following scheme:

DRI3 and wayland won't use dma-buf mmap directly,

using dma-buf mmap directly is wrong for anything that shares objects with itself.

That sounds good to hear. Perhaps we should add that to the dma-buf docs.

Userspace mmap was essentially added as a concession to the android ion guys since they really, really wanted it. We've tried to tell them that it's a horrible idea (see all the fun with coherency and syncing), but they said that they have userspace for it already and so we let it be.

Imo if you're not running Android userspace there's no need for this at all. -Daniel

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Jakob Bornecrantz

1 Nov 1 Nov

12:17 a.m.

On Fri, Nov 1, 2013 at 12:00 AM, Daniel Vetter daniel@ffwll.ch wrote:

...

On Thu, Oct 31, 2013 at 10:07:25PM +0100, Thomas Hellstrom wrote:

...
On 10/31/2013 09:48 PM, Dave Airlie wrote:

...
On Fri, Nov 1, 2013 at 6:40 AM, Thomas Hellstrom thellstrom@vmware.com wrote:

...
Well, I'd be happy to avoid mmap, but then what does optional mean in this context? That all generic user-space apps *must* implement a workaround if mmap isn't implemented?

It's unfortunate a bit like implicit synchronization mentioned in section 3) in Direct Userspace Access/mmap Support in the kernel dma-buf doc: It should be avoided, otherwise it might be relied upon by userspace and exporters not implementing it will suffer.

In reality, people will start using mmap() and won't care to implement workarounds if it's not supported, and drivers like vmwgfx and non-coherent architectures will suffer.

I haven't looked closely at how DRI3 or Wayland/weston use or will use dma-buf, but if they rely on mmap, we're sort of lost. MIR uses the following scheme:

DRI3 and wayland won't use dma-buf mmap directly,

using dma-buf mmap directly is wrong for anything that shares objects with itself.

That sounds good to hear. Perhaps we should add that to the dma-buf docs.

Userspace mmap was essentially added as a concession to the android ion guys since they really, really wanted it. We've tried to tell them that it's a horrible idea (see all the fun with coherency and syncing), but they said that they have userspace for it already and so we let it be.

Imo if you're not running Android userspace there's no need for this at all.

But now it turns out that gstreamer is using it and our life is hell. We should have made it not work for _any_ driver if CONFIG_ANDRIOD wasn't set.

Cheers, Jakob.

Rob Clark

12:25 a.m.

On Thu, Oct 31, 2013 at 8:17 PM, Jakob Bornecrantz wallbraker@gmail.com wrote:

...

On Fri, Nov 1, 2013 at 12:00 AM, Daniel Vetter daniel@ffwll.ch wrote:

...
On Thu, Oct 31, 2013 at 10:07:25PM +0100, Thomas Hellstrom wrote:

...
On 10/31/2013 09:48 PM, Dave Airlie wrote:

...
On Fri, Nov 1, 2013 at 6:40 AM, Thomas Hellstrom thellstrom@vmware.com wrote:

...
Well, I'd be happy to avoid mmap, but then what does optional mean in this context? That all generic user-space apps *must* implement a workaround if mmap isn't implemented?

It's unfortunate a bit like implicit synchronization mentioned in section 3) in Direct Userspace Access/mmap Support in the kernel dma-buf doc: It should be avoided, otherwise it might be relied upon by userspace and exporters not implementing it will suffer.

In reality, people will start using mmap() and won't care to implement workarounds if it's not supported, and drivers like vmwgfx and non-coherent architectures will suffer.

I haven't looked closely at how DRI3 or Wayland/weston use or will use dma-buf, but if they rely on mmap, we're sort of lost. MIR uses the following scheme:

DRI3 and wayland won't use dma-buf mmap directly,

using dma-buf mmap directly is wrong for anything that shares objects with itself.

That sounds good to hear. Perhaps we should add that to the dma-buf docs.

Userspace mmap was essentially added as a concession to the android ion guys since they really, really wanted it. We've tried to tell them that it's a horrible idea (see all the fun with coherency and syncing), but they said that they have userspace for it already and so we let it be.

Imo if you're not running Android userspace there's no need for this at all.

But now it turns out that gstreamer is using it and our life is hell. We should have made it not work for _any_ driver if CONFIG_ANDRIOD wasn't set.

well, at the moment mmap is only implemented by a few of the arm drivers, from the looks of it. So I guess in the near term it is mostly going to be of interest to the SoC crowd. Not sure about the long term, perhaps we should see about turning gallium pipe_transfer/map stuff into a kernel interface (ioctl's directly on the dmabuf fd, perhaps?)

BR, -R

...

Cheers, Jakob. _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel

Jakob Bornecrantz

12:37 a.m.

On Fri, Nov 1, 2013 at 1:25 AM, Rob Clark robdclark@gmail.com wrote:

...

On Thu, Oct 31, 2013 at 8:17 PM, Jakob Bornecrantz wallbraker@gmail.com wrote:

...
On Fri, Nov 1, 2013 at 12:00 AM, Daniel Vetter daniel@ffwll.ch wrote:

...
On Thu, Oct 31, 2013 at 10:07:25PM +0100, Thomas Hellstrom wrote:

...
On 10/31/2013 09:48 PM, Dave Airlie wrote:

...
On Fri, Nov 1, 2013 at 6:40 AM, Thomas Hellstrom thellstrom@vmware.com wrote:

...
Well, I'd be happy to avoid mmap, but then what does optional mean in this context? That all generic user-space apps *must* implement a workaround if mmap isn't implemented?

It's unfortunate a bit like implicit synchronization mentioned in section 3) in Direct Userspace Access/mmap Support in the kernel dma-buf doc: It should be avoided, otherwise it might be relied upon by userspace and exporters not implementing it will suffer.

In reality, people will start using mmap() and won't care to implement workarounds if it's not supported, and drivers like vmwgfx and non-coherent architectures will suffer.

I haven't looked closely at how DRI3 or Wayland/weston use or will use dma-buf, but if they rely on mmap, we're sort of lost. MIR uses the following scheme:

DRI3 and wayland won't use dma-buf mmap directly,

using dma-buf mmap directly is wrong for anything that shares objects with itself.

That sounds good to hear. Perhaps we should add that to the dma-buf docs.

Userspace mmap was essentially added as a concession to the android ion guys since they really, really wanted it. We've tried to tell them that it's a horrible idea (see all the fun with coherency and syncing), but they said that they have userspace for it already and so we let it be.

Imo if you're not running Android userspace there's no need for this at all.

But now it turns out that gstreamer is using it and our life is hell. We should have made it not work for _any_ driver if CONFIG_ANDRIOD wasn't set.

well, at the moment mmap is only implemented by a few of the arm drivers, from the looks of it. So I guess in the near term it is mostly going to be of interest to the SoC crowd.

Ah okay, I thought this was desktop gstreamer.

...

Not sure about the long term, perhaps we should see about turning gallium pipe_transfer/map stuff into a kernel interface (ioctl's directly on the dmabuf fd, perhaps?)

Or they could just create a OpenGL context, I know it sounds heavy weight. But somebody will eventually want to synchronize this with a different client API in a non-blocking way. And OpenGL or some other Khronos API already have that integration, no real need to reinvent the wheel. Then again I think both GBM and DRI have hooks for mapping and unmapping buffers, so those might suffice.

Cheers, Jakob.

Rob Clark

12:57 a.m.

On Thu, Oct 31, 2013 at 8:37 PM, Jakob Bornecrantz wallbraker@gmail.com wrote:

...

On Fri, Nov 1, 2013 at 1:25 AM, Rob Clark robdclark@gmail.com wrote:

...
On Thu, Oct 31, 2013 at 8:17 PM, Jakob Bornecrantz wallbraker@gmail.com wrote:

...
On Fri, Nov 1, 2013 at 12:00 AM, Daniel Vetter daniel@ffwll.ch wrote:

...
On Thu, Oct 31, 2013 at 10:07:25PM +0100, Thomas Hellstrom wrote:

...
On 10/31/2013 09:48 PM, Dave Airlie wrote:

...
On Fri, Nov 1, 2013 at 6:40 AM, Thomas Hellstrom thellstrom@vmware.com wrote: >Well, I'd be happy to avoid mmap, but then what does optional mean in this >context? >That all generic user-space apps *must* implement a workaround if mmap isn't >implemented? > >It's unfortunate a bit like implicit synchronization mentioned in section 3) >in Direct Userspace Access/mmap Support >in the kernel dma-buf doc: It should be avoided, otherwise it might be >relied upon by userspace and exporters >not implementing it will suffer. > >In reality, people will start using mmap() and won't care to implement >workarounds if it's not supported, and drivers like >vmwgfx and non-coherent architectures will suffer. > >I haven't looked closely at how DRI3 or Wayland/weston use or will use >dma-buf, but if they rely on mmap, we're sort >of lost. MIR uses the following scheme: DRI3 and wayland won't use dma-buf mmap directly,

using dma-buf mmap directly is wrong for anything that shares objects with itself.

That sounds good to hear. Perhaps we should add that to the dma-buf docs.

Userspace mmap was essentially added as a concession to the android ion guys since they really, really wanted it. We've tried to tell them that it's a horrible idea (see all the fun with coherency and syncing), but they said that they have userspace for it already and so we let it be.

Imo if you're not running Android userspace there's no need for this at all.

But now it turns out that gstreamer is using it and our life is hell. We should have made it not work for _any_ driver if CONFIG_ANDRIOD wasn't set.

well, at the moment mmap is only implemented by a few of the arm drivers, from the looks of it. So I guess in the near term it is mostly going to be of interest to the SoC crowd.

Ah okay, I thought this was desktop gstreamer.

...
Not sure about the long term, perhaps we should see about turning gallium pipe_transfer/map stuff into a kernel interface (ioctl's directly on the dmabuf fd, perhaps?)

Or they could just create a OpenGL context, I know it sounds heavy weight. But somebody will eventually want to synchronize this with a different client API in a non-blocking way. And OpenGL or some other Khronos API already have that integration, no real need to reinvent the wheel. Then again I think both GBM and DRI have hooks for mapping and unmapping buffers, so those might suffice.

GL(ES) context might be a bit of a PITA, at least until open src mesa drivers are a bit more common in arm world. But not really against a userspace API.. wonder how hard that would be to stuff in GBM? At least that is a simple enough API that some sort of null-gbm "driver" could be made for folks who don't have GL(ES)..

BR, -R

...

Cheers, Jakob.

Lucas Stach

10:03 a.m.

Am Donnerstag, den 31.10.2013, 20:57 -0400 schrieb Rob Clark:

...

On Thu, Oct 31, 2013 at 8:37 PM, Jakob Bornecrantz wallbraker@gmail.com wrote:

...
On Fri, Nov 1, 2013 at 1:25 AM, Rob Clark robdclark@gmail.com wrote:

...
On Thu, Oct 31, 2013 at 8:17 PM, Jakob Bornecrantz wallbraker@gmail.com wrote:

...
On Fri, Nov 1, 2013 at 12:00 AM, Daniel Vetter daniel@ffwll.ch wrote:

...
On Thu, Oct 31, 2013 at 10:07:25PM +0100, Thomas Hellstrom wrote:

...
On 10/31/2013 09:48 PM, Dave Airlie wrote: >On Fri, Nov 1, 2013 at 6:40 AM, Thomas Hellstrom thellstrom@vmware.com wrote: >>Well, I'd be happy to avoid mmap, but then what does optional mean in this >>context? >>That all generic user-space apps *must* implement a workaround if mmap isn't >>implemented? >> >>It's unfortunate a bit like implicit synchronization mentioned in section 3) >>in Direct Userspace Access/mmap Support >>in the kernel dma-buf doc: It should be avoided, otherwise it might be >>relied upon by userspace and exporters >>not implementing it will suffer. >> >>In reality, people will start using mmap() and won't care to implement >>workarounds if it's not supported, and drivers like >>vmwgfx and non-coherent architectures will suffer. >> >>I haven't looked closely at how DRI3 or Wayland/weston use or will use >>dma-buf, but if they rely on mmap, we're sort >>of lost. MIR uses the following scheme: >DRI3 and wayland won't use dma-buf mmap directly, > >using dma-buf mmap directly is wrong for anything that shares objects >with itself.

That sounds good to hear. Perhaps we should add that to the dma-buf docs.

Userspace mmap was essentially added as a concession to the android ion guys since they really, really wanted it. We've tried to tell them that it's a horrible idea (see all the fun with coherency and syncing), but they said that they have userspace for it already and so we let it be.

Imo if you're not running Android userspace there's no need for this at all.

But now it turns out that gstreamer is using it and our life is hell. We should have made it not work for _any_ driver if CONFIG_ANDRIOD wasn't set.

well, at the moment mmap is only implemented by a few of the arm drivers, from the looks of it. So I guess in the near term it is mostly going to be of interest to the SoC crowd.

Ah okay, I thought this was desktop gstreamer.

GStreamer needs _some_ way of accessing the buffer contents with the CPU, this doesn't necessarily have to be the dumb mmap we have right now. DMA-BUF support in GSt is not really finished (I know we have a lot of patches internally to make it actually work with anything, which are trickling upstream right now), so this might be a good time to hammer out how it should be done, so we can adapt GSt before people start to rely on the dma-buf fallback mmap.

I would think the bracketed mmap idea that was thrown around by Rob some time ago when the mmap topic first surfaced is simple enough that we don't need additional userspace helpers and should be enough to make the coherency issue disappear.

...

...
...
Not sure about the long term, perhaps we should see about turning gallium pipe_transfer/map stuff into a kernel interface (ioctl's directly on the dmabuf fd, perhaps?)

Or they could just create a OpenGL context, I know it sounds heavy weight. But somebody will eventually want to synchronize this with a different client API in a non-blocking way. And OpenGL or some other Khronos API already have that integration, no real need to reinvent the wheel. Then again I think both GBM and DRI have hooks for mapping and unmapping buffers, so those might suffice.

GL(ES) context might be a bit of a PITA, at least until open src mesa drivers are a bit more common in arm world. But not really against a userspace API.. wonder how hard that would be to stuff in GBM? At least that is a simple enough API that some sort of null-gbm "driver" could be made for folks who don't have GL(ES)..

I already see the need for a dumb GBM driver for importing buffers into plain KMS drivers, so we can stick them on a plane or something. The only thing that crashes in here is GBM being a part of MESA. We certainly don't want GStreamer to depend on MESA just to access some buffers the right way.

Regards, Lucas

Daniel Vetter

10:17 a.m.

On Fri, Nov 1, 2013 at 11:03 AM, Lucas Stach l.stach@pengutronix.de wrote:

...

GStreamer needs _some_ way of accessing the buffer contents with the CPU, this doesn't necessarily have to be the dumb mmap we have right now. DMA-BUF support in GSt is not really finished (I know we have a lot of patches internally to make it actually work with anything, which are trickling upstream right now), so this might be a good time to hammer out how it should be done, so we can adapt GSt before people start to rely on the dma-buf fallback mmap.

I would think the bracketed mmap idea that was thrown around by Rob some time ago when the mmap topic first surfaced is simple enough that we don't need additional userspace helpers and should be enough to make the coherency issue disappear.

Yeah, if we add a BEGIN_MMAP/END_MMAP ioctl or so to the dma-buf mmap stuff and simply demand that userspace calls that we'd have something half-sane at least. Since we currently don't really have any real users we could still do this abi change I think.

One open issue is whether the BEGIN_MMAP ioctl should also synchronize with any outstanding gpu access. Imo it should, since that's pretty much how all the current drm drivers work, but maybe we should reserve space for a few flags so that this can be extended later on - Android seems to be pretty insistent on using explicit syncpoints everywhere instead of implicit synchronization. Or maybe we should have the flag already, but reject implicit syncing until Maarten's dma_fence stuff is in and enabled, dunno. Adding him to cc.

The clear thing otoh is that these two ioctls should only serve as barriers, not as locks as has been proposed in some other RFCs floating around (iirc the one from exynos was borked in this fashion). -Daniel

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Rob Clark

1:22 p.m.

On Fri, Nov 1, 2013 at 6:17 AM, Daniel Vetter daniel.vetter@ffwll.ch wrote:

...

On Fri, Nov 1, 2013 at 11:03 AM, Lucas Stach l.stach@pengutronix.de wrote:

...
GStreamer needs _some_ way of accessing the buffer contents with the CPU, this doesn't necessarily have to be the dumb mmap we have right now. DMA-BUF support in GSt is not really finished (I know we have a lot of patches internally to make it actually work with anything, which are trickling upstream right now), so this might be a good time to hammer out how it should be done, so we can adapt GSt before people start to rely on the dma-buf fallback mmap.

I would think the bracketed mmap idea that was thrown around by Rob some time ago when the mmap topic first surfaced is simple enough that we don't need additional userspace helpers and should be enough to make the coherency issue disappear.

Yeah, if we add a BEGIN_MMAP/END_MMAP ioctl or so to the dma-buf mmap stuff and simply demand that userspace calls that we'd have something half-sane at least. Since we currently don't really have any real users we could still do this abi change I think.

One open issue is whether the BEGIN_MMAP ioctl should also synchronize with any outstanding gpu access. Imo it should, since that's pretty much how all the current drm drivers work, but maybe we should reserve space for a few flags so that this can be extended later on - Android seems to be pretty insistent on using explicit syncpoints everywhere instead of implicit synchronization. Or maybe we should have the flag already, but reject implicit syncing until Maarten's dma_fence stuff is in and enabled, dunno. Adding him to cc.

I suppose I wouldn't mind if we made BEGIN/END_MMAP as sort of clones of the CPU_PREP/FINI ioctls I have in msm.. where the CPU_PREP takes a flag/bitmask to indicate READ/WRITE/NOSYNC

That doesn't really help with partial buffer access, like you'd get if you were using the buffer for vertex upload. Although not really sure if we'd be seeing dmabuf's for that sort of use-case. We could make a more elaborate kernel interface that maps better to all the different cases in pipe_transfer. Or we could just say that that is overkill. I don't really have strong opinion about that (so someone who does, send an RFC). But I do agree with the "barriers only, not locks".

BR, -R

...

The clear thing otoh is that these two ioctls should only serve as barriers, not as locks as has been proposed in some other RFCs floating around (iirc the one from exynos was borked in this fashion).

-Daniel

Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Lucas Stach

1:32 p.m.

Am Freitag, den 01.11.2013, 09:22 -0400 schrieb Rob Clark:

...

On Fri, Nov 1, 2013 at 6:17 AM, Daniel Vetter daniel.vetter@ffwll.ch wrote:

...
On Fri, Nov 1, 2013 at 11:03 AM, Lucas Stach l.stach@pengutronix.de wrote:

...
GStreamer needs _some_ way of accessing the buffer contents with the CPU, this doesn't necessarily have to be the dumb mmap we have right now. DMA-BUF support in GSt is not really finished (I know we have a lot of patches internally to make it actually work with anything, which are trickling upstream right now), so this might be a good time to hammer out how it should be done, so we can adapt GSt before people start to rely on the dma-buf fallback mmap.

I would think the bracketed mmap idea that was thrown around by Rob some time ago when the mmap topic first surfaced is simple enough that we don't need additional userspace helpers and should be enough to make the coherency issue disappear.

Yeah, if we add a BEGIN_MMAP/END_MMAP ioctl or so to the dma-buf mmap stuff and simply demand that userspace calls that we'd have something half-sane at least. Since we currently don't really have any real users we could still do this abi change I think.

One open issue is whether the BEGIN_MMAP ioctl should also synchronize with any outstanding gpu access. Imo it should, since that's pretty much how all the current drm drivers work, but maybe we should reserve space for a few flags so that this can be extended later on - Android seems to be pretty insistent on using explicit syncpoints everywhere instead of implicit synchronization. Or maybe we should have the flag already, but reject implicit syncing until Maarten's dma_fence stuff is in and enabled, dunno. Adding him to cc.

I don't think we need that right now. We may prepare for implicit sync by having the flag ready in the API, but don't force the exporters to implement it right now.

...

I suppose I wouldn't mind if we made BEGIN/END_MMAP as sort of clones of the CPU_PREP/FINI ioctls I have in msm.. where the CPU_PREP takes a flag/bitmask to indicate READ/WRITE/NOSYNC

That doesn't really help with partial buffer access, like you'd get if you were using the buffer for vertex upload. Although not really sure if we'd be seeing dmabuf's for that sort of use-case. We could make a more elaborate kernel interface that maps better to all the different cases in pipe_transfer. Or we could just say that that is overkill. I don't really have strong opinion about that (so someone who does, send an RFC). But I do agree with the "barriers only, not locks".

I just checked back with our GStreamer guy and he thinks that for the use-cases he sees today it would make a lot of sense to have some way to indicate that userspace is only going to access a partial range of the buffer. I think this may be a crucial optimization on all kinds of devices where you have to ensure coherency manually.

Personally I would like to have the same kind of partial access indicators in DRMs cpu_prep/fini to optimize all the userspace suballocations we have today.

I think we are all on the same page with the barriers only thing.

Regards, Lucas

Thomas Hellstrom

4 Nov 4 Nov

7:53 a.m.

On 11/01/2013 02:32 PM, Lucas Stach wrote:

...

Am Freitag, den 01.11.2013, 09:22 -0400 schrieb Rob Clark:

...
On Fri, Nov 1, 2013 at 6:17 AM, Daniel Vetter daniel.vetter@ffwll.ch wrote:

...
On Fri, Nov 1, 2013 at 11:03 AM, Lucas Stach l.stach@pengutronix.de wrote:

...
GStreamer needs _some_ way of accessing the buffer contents with the CPU, this doesn't necessarily have to be the dumb mmap we have right now. DMA-BUF support in GSt is not really finished (I know we have a lot of patches internally to make it actually work with anything, which are trickling upstream right now), so this might be a good time to hammer out how it should be done, so we can adapt GSt before people start to rely on the dma-buf fallback mmap.

I would think the bracketed mmap idea that was thrown around by Rob some time ago when the mmap topic first surfaced is simple enough that we don't need additional userspace helpers and should be enough to make the coherency issue disappear.

Yeah, if we add a BEGIN_MMAP/END_MMAP ioctl or so to the dma-buf mmap stuff and simply demand that userspace calls that we'd have something half-sane at least. Since we currently don't really have any real users we could still do this abi change I think.

One open issue is whether the BEGIN_MMAP ioctl should also synchronize with any outstanding gpu access. Imo it should, since that's pretty much how all the current drm drivers work, but maybe we should reserve space for a few flags so that this can be extended later on - Android seems to be pretty insistent on using explicit syncpoints everywhere instead of implicit synchronization. Or maybe we should have the flag already, but reject implicit syncing until Maarten's dma_fence stuff is in and enabled, dunno. Adding him to cc.

I don't think we need that right now. We may prepare for implicit sync by having the flag ready in the API, but don't force the exporters to implement it right now.

...
I suppose I wouldn't mind if we made BEGIN/END_MMAP as sort of clones of the CPU_PREP/FINI ioctls I have in msm.. where the CPU_PREP takes a flag/bitmask to indicate READ/WRITE/NOSYNC

That doesn't really help with partial buffer access, like you'd get if you were using the buffer for vertex upload. Although not really sure if we'd be seeing dmabuf's for that sort of use-case. We could make a more elaborate kernel interface that maps better to all the different cases in pipe_transfer. Or we could just say that that is overkill. I don't really have strong opinion about that (so someone who does, send an RFC). But I do agree with the "barriers only, not locks".

I just checked back with our GStreamer guy and he thinks that for the use-cases he sees today it would make a lot of sense to have some way to indicate that userspace is only going to access a partial range of the buffer. I think this may be a crucial optimization on all kinds of devices where you have to ensure coherency manually.

Personally I would like to have the same kind of partial access indicators in DRMs cpu_prep/fini to optimize all the userspace suballocations we have today.

I think we are all on the same page with the barriers only thing.

Regards, Lucas

OK, so I've attached something that would work on top of the current mmap() interface, and that would basically be NOPs on coherent architectures. Comment welcome. In order for this to be meaningful, we would need to make it mandatory if mmap is implemented. One outstanding issue Is how this might play with cache line alignment. We might be able to require the implementation to handle this, making sure that synced regions align both to the device cache and the cpu cache, and that, for example, a sync_for_cpu(DMA_TO_DEVICE) may actually result in device caches in the pad area being flushed...

Region pwrites / preads having basically an identical interface (except the linear flag) but also taking a user pointer and stride, would also work just fine. Linear preads / pwrites could, I guess, be implemented the intended way using write / read syscalls on the file-descriptor.

Still, I must say I'd prefer if we could just avoid mmap and the major drivers didn't implement it. Second on my list is probably pwrites / preads, and third this proposed interface. My preference order is based on the way vmwgfx handles this stuff from the Xorg driver today, doing unsynchronized region preads / pwrites from a shadow user-space buffer, keeping track on dirtied regions on both sides. (If it works for the Xorg driver, it should work for anything else :)).

Thanks, /Thomas

Lucas Stach

10:22 a.m.

Hi Thomas,

I inlined the patch, to discuss more easily.

Am Montag, den 04.11.2013, 08:53 +0100 schrieb Thomas Hellstrom:

...

enum dma_buf_data_direction { DMA_BUF_BIDIRECTIONAL, DMA_BUF_TO_DEVICE, DMA_BUF_FROM_DEVICE };

I don't think the DMA API semantic makes much sense here. Doing a sync from device is at least a bit complicated, as we would have to track down the device in which domain the buffer is currently residing for write. This may be doable by bouncing the sync op from the exporter to the right attachment.

The other way around is even more messier: a DMA-BUF is by definition shared between multiple devices, so which of them should you sync to? All of them?

I think the API used for userspace access should only be concerned with making sure the CPU gets a coherent view to the resource at the point of sharing i.e. sysmem. All DMA-BUF users should make sure that they sync their writes to the point of sharing strictly before signaling that they are done with the buffer. Maybe this is just a naming problem, but IMHO the API should make it really clear that the sync only makes sure that the data reaches the point of sharing.

...

/**

struct dma_buf_sync_arg - Argument to

@start_x: Upper left X coordinate of area to be synced.

Unit is format-dependent

@start_y: Upper left Y coordinate of area to be synced.

Unit is format-dependent.

@width: Width of area to be synced. Grows to the right.

Unit is format-dependent.

@height: Height of area to be synced. Grows downwards.

Unit is format-dependent.

While I see why you would need this I don't really like the concept of a two dimensional resource pushed into the API. You already need the linear flag to make this work with 1D resources and I don't see what happens when you encounter 3D or array resources.

Don't take this as a strong rejection as I see why this is useful, but I don't like how the interface looks like right now.

...

@direction: Intended transfer direction of data.

@flags: Flags to tune the synchronizing behaviour.

*/

struct dma_buf_sync_region_arg { __u32 buf_identifier; __u32 start_x; __u32 start_y; __u32 width; __u32 height; enum dma_buf_data_direction direction; __u32 flags; __u32 pad; }; /**

Force sync any outstanding rendering of the exporter before

returning.

This is strictly a performance hint. The user may omit this flag to

speed up execution of the call, if it is known that previous

rendering

affecting (by read or write) the synchronized area has already

finished. */ #define DMA_BUF_SYNC_FLAG_SYNC (1 << 0);

/**

Treat @start_x as byte offset into buffer and @width as byte

synchronization length. The values of @start_y and @height are

ignored.

(Separate ioctl?)

*/ #define DMA_BUF_SYNC_FLAG_LINEAR (1 << 1);

/**

Allow the implementation to coalesce sync_for_device calls, until

either

a) An explicit flush

b) A sync for cpu call with DMA_BUF_BIDIRECTIONAL or

DMA_BUF_TO_DEVICE

Note: An implementation may choose to ignore this flag.

*/ #define DMA_BUF_SYNC_FLAG_COALESCE (1 << 2);

/**

IOCTLS-

Kernel waits should, if possible, be performed interruptible, and

the

ioctl may sett errno to EINTR if the ioctl needs to be restarted.

To be discussed: Any sync operation may not affect areas outside

the

region indicated. (Good for vmwgfx, but plays ill with cache line

alignment) */

/**

Sync for CPU.

*/ #define DMA_BUF_SYNC_REGION_FOR_CPU \ _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync_region_arg)

/**

Sync for device. This is the default state of a dma-buf.

*/ #define DMA_BUF_SYNC_REGION_FOR_DEVICE \ _IOW(DMA_BUF_BASE, 1, struct dma_buf_sync_region_arg)

/**

Flush any coalesced SYNC_REGION_FOR_DEVICE

*/ #define DMA_BUF_SYNC_REGION_FLUSH \ _IOW(DMA_BUF_BASE, 2, __u32)

What is the use-case for this? Why don't you think that usespace could be smart enough to coalesce the flush on its own?

Regards, Lucas

Thomas Hellstrom

10:48 a.m.

On 11/04/2013 11:22 AM, Lucas Stach wrote:

...

Hi Thomas,

I inlined the patch, to discuss more easily.

Am Montag, den 04.11.2013, 08:53 +0100 schrieb Thomas Hellstrom:

...
enum dma_buf_data_direction { DMA_BUF_BIDIRECTIONAL, DMA_BUF_TO_DEVICE, DMA_BUF_FROM_DEVICE };

I don't think the DMA API semantic makes much sense here. Doing a sync from device is at least a bit complicated, as we would have to track down the device in which domain the buffer is currently residing for write. This may be doable by bouncing the sync op from the exporter to the right attachment.

The other way around is even more messier: a DMA-BUF is by definition shared between multiple devices, so which of them should you sync to? All of them?

I think the API used for userspace access should only be concerned with making sure the CPU gets a coherent view to the resource at the point of sharing i.e. sysmem. All DMA-BUF users should make sure that they sync their writes to the point of sharing strictly before signaling that they are done with the buffer. Maybe this is just a naming problem, but IMHO the API should make it really clear that the sync only makes sure that the data reaches the point of sharing.

Good point. I think the options need to remain, but we could rename DEVICE to STORAGE, or whatever In our case,

sync_for_cpu(DMA_BUF_TO_STORAGE) would be a no-op, whereas, sync_for_cpu(DMA_BUF_BIDIRECTIONAL) would trigger a DMA read.

...

...
/**

struct dma_buf_sync_arg - Argument to

@start_x: Upper left X coordinate of area to be synced.

Unit is format-dependent

@start_y: Upper left Y coordinate of area to be synced.

Unit is format-dependent.

@width: Width of area to be synced. Grows to the right.

Unit is format-dependent.

@height: Height of area to be synced. Grows downwards.

Unit is format-dependent.

While I see why you would need this I don't really like the concept of a two dimensional resource pushed into the API. You already need the linear flag to make this work with 1D resources and I don't see what happens when you encounter 3D or array resources.

Don't take this as a strong rejection as I see why this is useful, but I don't like how the interface looks like right now.

I understand. I don't think 1D only syncing options would suffice performance-wise, as the use-case I'm most afraid of is people trying to share 2D contents between user-space processes.

What do you think about supplying descriptor objects that could either describe a linear area, 2D area or whatever? Whenever (if) a new descriptor type is added to the interface we could have a legacy implementation that implements it in terms of 1D and 2D syncs.

...

...

@direction: Intended transfer direction of data.

@flags: Flags to tune the synchronizing behaviour.

*/

struct dma_buf_sync_region_arg { __u32 buf_identifier; __u32 start_x; __u32 start_y; __u32 width; __u32 height; enum dma_buf_data_direction direction; __u32 flags; __u32 pad; }; /**

Force sync any outstanding rendering of the exporter before

returning.

This is strictly a performance hint. The user may omit this flag to

speed up execution of the call, if it is known that previous

rendering

affecting (by read or write) the synchronized area has already

finished. */ #define DMA_BUF_SYNC_FLAG_SYNC (1 << 0);

/**

Treat @start_x as byte offset into buffer and @width as byte

synchronization length. The values of @start_y and @height are

ignored.

(Separate ioctl?)

*/ #define DMA_BUF_SYNC_FLAG_LINEAR (1 << 1);

/**

Allow the implementation to coalesce sync_for_device calls, until

either

a) An explicit flush

b) A sync for cpu call with DMA_BUF_BIDIRECTIONAL or

DMA_BUF_TO_DEVICE

Note: An implementation may choose to ignore this flag.

*/ #define DMA_BUF_SYNC_FLAG_COALESCE (1 << 2);

/**

IOCTLS-

Kernel waits should, if possible, be performed interruptible, and

the

ioctl may sett errno to EINTR if the ioctl needs to be restarted.

To be discussed: Any sync operation may not affect areas outside

the

region indicated. (Good for vmwgfx, but plays ill with cache line

alignment) */

/**

Sync for CPU.

*/ #define DMA_BUF_SYNC_REGION_FOR_CPU \ _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync_region_arg)

/**

Sync for device. This is the default state of a dma-buf.

*/ #define DMA_BUF_SYNC_REGION_FOR_DEVICE \ _IOW(DMA_BUF_BASE, 1, struct dma_buf_sync_region_arg)

/**

Flush any coalesced SYNC_REGION_FOR_DEVICE

*/ #define DMA_BUF_SYNC_REGION_FLUSH \ _IOW(DMA_BUF_BASE, 2, __u32)

What is the use-case for this? Why don't you think that usespace could be smart enough to coalesce the flush on its own?

Right. A thinko. We could clearly leave that out.

/Thomas

Thomas Hellstrom

1 Nov 1 Nov

1:54 p.m.

On 11/01/2013 11:17 AM, Daniel Vetter wrote:

...

On Fri, Nov 1, 2013 at 11:03 AM, Lucas Stach l.stach@pengutronix.de wrote:

...
GStreamer needs _some_ way of accessing the buffer contents with the CPU, this doesn't necessarily have to be the dumb mmap we have right now. DMA-BUF support in GSt is not really finished (I know we have a lot of patches internally to make it actually work with anything, which are trickling upstream right now), so this might be a good time to hammer out how it should be done, so we can adapt GSt before people start to rely on the dma-buf fallback mmap.

I would think the bracketed mmap idea that was thrown around by Rob some time ago when the mmap topic first surfaced is simple enough that we don't need additional userspace helpers and should be enough to make the coherency issue disappear.

Yeah, if we add a BEGIN_MMAP/END_MMAP ioctl or so to the dma-buf mmap stuff and simply demand that userspace calls that we'd have something half-sane at least. Since we currently don't really have any real users we could still do this abi change I think.

One open issue is whether the BEGIN_MMAP ioctl should also synchronize with any outstanding gpu access. Imo it should, since that's pretty much how all the current drm drivers work, but maybe we should reserve space for a few flags so that this can be extended later on - Android seems to be pretty insistent on using explicit syncpoints everywhere instead of implicit synchronization. Or maybe we should have the flag already, but reject implicit syncing until Maarten's dma_fence stuff is in and enabled, dunno. Adding him to cc.

The clear thing otoh is that these two ioctls should only serve as barriers, not as locks as has been proposed in some other RFCs floating around (iirc the one from exynos was borked in this fashion). -Daniel

I think one of the worst use-cases we could imagine would be a client doing something like the old kde screen saver that drew hundreds of thousands of small squares and in addition would flush for each square. Here begin/mmap end/mmap would not suffice, and _mandatory_ syncing would also be a bad idea.

We'd need to have a region supplied. Something like gallium texture transfers would work reasonably well, but they allow direct mappings of the underlying surface / buffer, which means that a lazy client might allocate texture transfers covering the whole buffer if coherent drivers made that a reasonably fast operation.

access using something like texsubimage (or region-type pwrites / preads) would at least involve a memcpy / kernel copy and would have the client think twice before accessing the dma-buf in this way, but I understand that this might sound frustrating for writers of coherent drivers. (I would have opposed it :))

But I'd like to stress that a single BEGIN_MMAP / END_MMAP wouldn't help us much, and that we need a read/ write/bidirectional indication.

Thanks, /Thomas

Rob Clark

31 Oct 31 Oct

9:10 p.m.

On Thu, Oct 31, 2013 at 4:40 PM, Thomas Hellstrom thellstrom@vmware.com wrote:

...

On 10/31/2013 06:52 PM, Rob Clark wrote:

...
On Thu, Oct 31, 2013 at 1:00 PM, Thomas Hellstrom thellstrom@vmware.com wrote:

...
Hi!

I'm just looking over what's needed to implement drm Prime / dma-buf exports

imports in the vmwgfx driver. It seems like most dma-bufs ops are quite

straightforward to implement except user-space mmap().

The reason being that vmwgfx dma-bufs will be using completely non-coherent memory, whenever there needs to be CPU accesses.

The accelerated contents resides in an opaque structure on the device into which we can DMA to and from, so for mmap to work we need to zap ptes and DMA to the device when doing something accelerated, and on the first page-fault DMA data back and wait for idle if the device did a write to the dma-buf.

Now this shouldn't really be a problem if dma-bufs were only used for cross-device sharing, but since people apparently want to use dma-buf file handles to share CPU data between processes it really becomes a serious problem.

Needless to say we'd want to limit the size of the DMAs, and have mmap users request regions for read, and mark regions dirty for write, something similar to gallium's texture transfer objects.

Any ideas?

well, I think vmwgfx is part of the reason we decided mmap would be optional for dmabuf. So perhaps it is an option to simply ignore mmap?

BR, -R

Well, I'd be happy to avoid mmap, but then what does optional mean in this context? That all generic user-space apps *must* implement a workaround if mmap isn't implemented?

well, mmap was mostly just added because it was needed by android for ION on dmabuf.

I think it is an option to just not use dmabuf mmap in a linux userspace. I mean, we could also define some ioctls to give us pwrite and other similar sort of functionality if it is needed.

BR, -R

...

It's unfortunate a bit like implicit synchronization mentioned in section 3) in Direct Userspace Access/mmap Support in the kernel dma-buf doc: It should be avoided, otherwise it might be relied upon by userspace and exporters not implementing it will suffer.

In reality, people will start using mmap() and won't care to implement workarounds if it's not supported, and drivers like vmwgfx and non-coherent architectures will suffer.

I haven't looked closely at how DRI3 or Wayland/weston use or will use dma-buf, but if they rely on mmap, we're sort of lost. MIR uses the following scheme:

Create a GBM buffer

Get a Prime handle, export to client

Client imports prime handle, casts into a GBM buffer which is typecast to

a dumb buffer 4) Client uses DRM dumb mmap(). :(.

Now, step 3) and step 4) are clearly API violations, but it works by coincidence on major drivers. Now this usage-pattern could easily (and IMHO should) be blocked in DRM by refusing to map dumb buffers that haven't been created as dumb buffers, but If dma-bufs are to be shared by user-space apps to transfer contents in this manner, we need a reliable replacement that works also for non-coherent situations.

Ideally I'd like to see tex[sub]image-like operations, but with dma-buf mmap in place, I'm afraid that's what will be used...

Regars, Thomas

...
...
/Thomas _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel

Thomas Hellstrom

9:38 p.m.

On 10/31/2013 10:10 PM, Rob Clark wrote:

...

On Thu, Oct 31, 2013 at 4:40 PM, Thomas Hellstrom thellstrom@vmware.com wrote:

...
On 10/31/2013 06:52 PM, Rob Clark wrote:

...
On Thu, Oct 31, 2013 at 1:00 PM, Thomas Hellstrom thellstrom@vmware.com wrote:

...
Hi!

I'm just looking over what's needed to implement drm Prime / dma-buf exports

imports in the vmwgfx driver. It seems like most dma-bufs ops are quite

straightforward to implement except user-space mmap().

The reason being that vmwgfx dma-bufs will be using completely non-coherent memory, whenever there needs to be CPU accesses.

The accelerated contents resides in an opaque structure on the device into which we can DMA to and from, so for mmap to work we need to zap ptes and DMA to the device when doing something accelerated, and on the first page-fault DMA data back and wait for idle if the device did a write to the dma-buf.

Now this shouldn't really be a problem if dma-bufs were only used for cross-device sharing, but since people apparently want to use dma-buf file handles to share CPU data between processes it really becomes a serious problem.

Needless to say we'd want to limit the size of the DMAs, and have mmap users request regions for read, and mark regions dirty for write, something similar to gallium's texture transfer objects.

Any ideas?

well, I think vmwgfx is part of the reason we decided mmap would be optional for dmabuf. So perhaps it is an option to simply ignore mmap?

BR, -R

Well, I'd be happy to avoid mmap, but then what does optional mean in this context? That all generic user-space apps *must* implement a workaround if mmap isn't implemented?

well, mmap was mostly just added because it was needed by android for ION on dmabuf.

I think it is an option to just not use dmabuf mmap in a linux userspace. I mean, we could also define some ioctls to give us pwrite and other similar sort of functionality if it is needed.

I think that if direct user-space cpu-access to dma-buf is ever needed by linux, something like that is a better option. Best IMHO would be to avoid user-space cpu-access completely.

Regards, /Thomas

...

BR, -R

Benjamin Gaignard

10:43 p.m.

user-space cpu access to dma-buf is needed for example in Gstreamer pipeline when you have mix of harware element (a video decoder) and software element (like colorspace converter).

Gstreamer already support dma-buf in a specific gstallocator: http://cgit.freedesktop.org/gstreamer/gst-plugins-base/tree/gst-libs/gst/all...

Regards, Benjamin

2013/10/31 Thomas Hellstrom thellstrom@vmware.com:

...

On 10/31/2013 10:10 PM, Rob Clark wrote:

...
On Thu, Oct 31, 2013 at 4:40 PM, Thomas Hellstrom thellstrom@vmware.com wrote:

...
On 10/31/2013 06:52 PM, Rob Clark wrote:

...
On Thu, Oct 31, 2013 at 1:00 PM, Thomas Hellstrom thellstrom@vmware.com wrote:

...
Hi!

I'm just looking over what's needed to implement drm Prime / dma-buf exports

imports in the vmwgfx driver. It seems like most dma-bufs ops are

quite straightforward to implement except user-space mmap().

The reason being that vmwgfx dma-bufs will be using completely non-coherent memory, whenever there needs to be CPU accesses.

The accelerated contents resides in an opaque structure on the device into which we can DMA to and from, so for mmap to work we need to zap ptes and DMA to the device when doing something accelerated, and on the first page-fault DMA data back and wait for idle if the device did a write to the dma-buf.

Now this shouldn't really be a problem if dma-bufs were only used for cross-device sharing, but since people apparently want to use dma-buf file handles to share CPU data between processes it really becomes a serious problem.

Needless to say we'd want to limit the size of the DMAs, and have mmap users request regions for read, and mark regions dirty for write, something similar to gallium's texture transfer objects.

Any ideas?

well, I think vmwgfx is part of the reason we decided mmap would be optional for dmabuf. So perhaps it is an option to simply ignore mmap?

BR, -R

Well, I'd be happy to avoid mmap, but then what does optional mean in this context? That all generic user-space apps *must* implement a workaround if mmap isn't implemented?

well, mmap was mostly just added because it was needed by android for ION on dmabuf.

I think it is an option to just not use dmabuf mmap in a linux userspace. I mean, we could also define some ioctls to give us pwrite and other similar sort of functionality if it is needed.

I think that if direct user-space cpu-access to dma-buf is ever needed by linux, something like that is a better option. Best IMHO would be to avoid user-space cpu-access completely.

Regards, /Thomas

...
BR, -R

Linaro-mm-sig mailing list Linaro-mm-sig@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-mm-sig

-- Benjamin Gaignard Graphic Working Group Linaro.org │ Open source software for ARM SoCs Follow Linaro: Facebook | Twitter | Blog

4602

days inactive

4606

days old

linaro-mm-sig@lists.linaro.org

20 comments

participants

tags (0)

participants (8)

Benjamin Gaignard
Daniel Vetter
Daniel Vetter
Dave Airlie
Jakob Bornecrantz
Lucas Stach
Rob Clark
Thomas Hellstrom