Thanks Jesse for initiating the mailing list.
We need to address the requirements of Graphics and Multimedia Accelerators (IPs). What we really need is a permanent, upstream solution which accommodates the following requirements and conforms to Graphics and Multimedia use cases.
1. Mechanism to map/unmap the memory. Some of the IPs have the ability to address virtual memory and some can address only physically contiguous address space. We need to address both these cases.
2. Mechanism to allocate and release memory.
3. Method to share the memory (ZERO copy is a MUST for better performance) between different device drivers (for example, output of camera to multimedia encoder).
4. Method to share the memory with different processes in userspace. The sharing mechanism should include built-in security features.
Are there any special requirements from V4L or DRM perspectives?
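To make these requirements concrete, here is a rough sketch of what a minimal userspace-facing API covering them might look like; every name here (the device, the UMM_* identifiers, the ioctl numbers) is invented purely for illustration and is not an existing interface:

/* Purely illustrative sketch of a hypothetical unified buffer manager.
 * None of these ioctls or names exist; they only restate the four
 * requirements above (map/unmap, allocate/release, driver-to-driver
 * sharing, process-to-process sharing with access control).
 */
#include <stdint.h>
#include <linux/ioctl.h>

struct umm_alloc_req {
        uint64_t size;      /* requested size in bytes */
        uint32_t flags;     /* e.g. UMM_CONTIGUOUS for IPs without an MMU */
        uint32_t handle;    /* returned: per-process buffer handle */
};

#define UMM_CONTIGUOUS      (1u << 0)

#define UMM_IOC_ALLOC       _IOWR('U', 1, struct umm_alloc_req)
#define UMM_IOC_FREE        _IOW ('U', 2, uint32_t)   /* release a handle */
#define UMM_IOC_EXPORT_FD   _IOWR('U', 3, uint32_t)   /* handle -> fd */

/* CPU mapping would be plain mmap() on the exported fd; passing that fd
 * over a unix socket (SCM_RIGHTS) gives per-buffer, kernel-enforced
 * access control for process-to-process sharing. */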
Thanks, Sree
This thread lists most (I hope) of the requirements that V4L has:
http://www.mail-archive.com/linux-media@vger.kernel.org/msg29857.html
Regards,
Hans
I'll also direct folks to review the existing "understanding" of the use cases, problem spaces and requirements on this page on the Linaro wiki:
https://wiki.linaro.org/WorkingGroups/Middleware/Graphics/Projects/UnifiedMe...
Sorry for the long URL. I'll work on getting it linked from somewhere more directly accessible. The current content was distilled from media and graphics email threads as well as the BoF at ELC, and should reflect Sree's concerns as well as the V4L thread, and hopefully even existing DRM requirements (Android as well?). I will certainly be happy to edit by proxy as I'm not sure everyone can get an account on the Linaro wiki.
cheers, Jesse
On Mon, Apr 18, 2011 at 9:45 AM, Sree Kumar sreeon@gmail.com wrote:
Are there any special requirements from V4L or DRM perspectives?
From DRI perspective.. I guess the global buffer name is restricted to a 4 byte integer, unless you change the DRI proto..
Authentication hooks for the driver (on the x11 driver side) are for a single authentication covering all buffers shared between client and server, and this is done by a 4-byte token exchange between client and server. I've not had time yet to look more closely at the authentication aspect of ION.
Those are just things off the top of my head; hopefully someone else from the X11 world chimes in with whatever else I missed. But I guess the most important thing is whether or not it can fit within the existing DRI protocol. If it does, then the drivers on the client and server side could use whatever.
BR, -R
On Tue, Apr 19, 2011 at 06:52:53PM -0700, Clark, Rob wrote:
From DRI perspective.. I guess the global buffer name is restricted to a 4 byte integer, unless you change the DRI proto..
For what it's worth, revving DRI proto isn't really that much of a Bad Thing when both the X server and the clients already need changes, such as in this case. Unlike most X protocol which must interoperate between any server and client in the universe, with DRI protocol the server and client are always on the same system, and must be upgraded simultaneously anyway for a change like this.
In fact, it might be desirable to use new protocol so that a mismatched client won't try to use a new-style handle with old-style DRM or vice versa (the failure should be more obvious with new protocol).
- Robert
On Tue, Apr 19, 2011 at 9:06 PM, rmorell@nvidia.com wrote:
For what it's worth, revving DRI proto isn't really that much of a Bad Thing when both the X server and the clients already need changes, such as in this case. Unlike most X protocol which must interoperate between any server and client in the universe, with DRI protocol the server and client are always on the same system, and must be upgraded simultaneously anyway for a change like this.
In fact, it might be desirable to use new protocol so that a mismatched client won't try to use a new-style handle with old-style DRM or vice versa (the failure should be more obvious with new protocol).
btw, one idea suggested after today's session by Daniel V. to handle the DRI part would be to keep the existing GEM flink mechanism (or whatever GPU driver mechanism exists to generate a "name" shared between xorg and client), but use file descriptors for device<->device sharing (and optionally for further sharing outside of the DRI protocol, process<->process, if needed)..
So basically the DRI part stays as it is today, and on the client side the GPU driver should provide a mechanism to convert from its driver-specific buffer name to a shared buffer fd. Probably this lives under eglImage, with some extension to get a sharable fd from an eglImage (which could then be passed to a v4l2 camera, dsp decoder, etc.).
Anyways, it seemed like a good idea and wanted to capture that in an email before I forgot.
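As a very rough sketch of how that could look from the client side (the eglExportBufferFd entry point below is invented for illustration, no such EGL extension exists, and dpy/ctx/tex are assumed to be an already-initialised EGL display, context and GL texture name):

#include <stdint.h>
#define EGL_EGLEXT_PROTOTYPES
#include <EGL/egl.h>
#include <EGL/eglext.h>

/* Hypothetical extension entry point: EGLImage -> shareable fd. */
typedef int (*PFN_eglExportBufferFd)(EGLDisplay dpy, EGLImageKHR image);

static int export_shared_fd(EGLDisplay dpy, EGLContext ctx, unsigned int tex)
{
        /* The DRI/flink path is untouched; we only bolt an fd export onto
         * the EGLImage that already wraps the driver-specific buffer. */
        EGLImageKHR img = eglCreateImageKHR(dpy, ctx, EGL_GL_TEXTURE_2D_KHR,
                                            (EGLClientBuffer)(uintptr_t)tex,
                                            NULL);

        PFN_eglExportBufferFd export_fd =
                (PFN_eglExportBufferFd)eglGetProcAddress("eglExportBufferFd");

        /* The returned fd could then be handed to a v4l2 camera, a DSP
         * decoder, or passed to another process over SCM_RIGHTS. */
        return export_fd ? export_fd(dpy, img) : -1;
}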
BR, -R
Speaking of Graphics and Multimedia - we may want to discuss IOMMU APIs and distributed memory management. These devices are becoming more prevalent and having a standard way of working with them would be useful.
I did a little of this work at Qualcomm and pushed some soundly rejected patches to the kernel, see "mm: iommu: An API to unify IOMMU, CPU and device memory management."
-Zach
Please see the following URL: https://lkml.org/lkml/2011/4/19/172
We (me, Michal, and Marek) tried it based on yours, but some people didn't like it.
Thank you, Kyungmin Park
Yeah. I wanted to list VCMM since it seemed to get a lot of feedback. It did everything wrong and seemed to generate a lot of "you should do it this way" traffic, which should help the unified memory manager discussion.
On Wed, Apr 20, 2011 at 1:30 PM, Zach Pfeffer zach.pfeffer@linaro.org wrote:
Yeah. I wanted to list VCMM since it seemed to get a lot of feedback. It did everything wrong and seemed to generate a lot of "you should do it this way" traffic, which should help the unified memory manager discussion.
Now we're working on the suggested method, DMA APIs for IOMMUs. So can we discuss it together? http://www.spinics.net/lists/linux-media/msg31524.html
I think we should. We need a better allocator, in part, because of all of these SoC engines.
On 04/19/2011 10:12 PM, Zach Pfeffer wrote:
Speaking of Graphics and Multimedia - we may want to discuss IOMMU APIs and distributed memory management. These devices are becoming more prevalent and having a standard way of working with them would be useful.
I did a little of this work at Qualcomm and pushed some soundly rejected patches to the kernel, see "mm: iommu: An API to unify IOMMU, CPU and device memory management."
-Zach
As we discussed during the meeting at ELC, IOMMU is important, but I think there is broad agreement to consolidate (eventually) on the standard APIs. I still think the memory allocation problem is the more interesting one because it affects everybody equally, MMU or not. Not that I want to shut down debate or anything; I just don't want to distract us from the larger problem that we face.
Jordan
That's true.
I think there's some value in splitting the discussion up into 3 areas:
Allocation
Mapping
Buffer flows
There seems to be a general design disconnect between the way Linux deals with buffer mapping (one mapper at a time, buffers are mapped and unmapped as they flow through the system) and the way users actually want things to work (sharing a global buffer that one entity writes to and another reads without unmapping each time).
Perhaps there's a solution that will give users the illusion of shared mappings while ensuring correctness if those mappings are different (something on-demand, perhaps).
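One way to picture that, as a non-authoritative sketch in kernel-flavoured C (every type and helper below is invented, not existing kernel API): a shared buffer remembers which device's mapping is currently "live", creates mappings lazily, and does the fixups only when ownership moves.

#include <linux/device.h>
#include <linux/dma-mapping.h>

/* Illustrative only: on-demand mappings with a single live mapper. */
struct shared_buf {
        size_t size;
        struct device *live_owner;  /* device whose mapping is currently valid */
        /* backing pages, per-device mapping cache, locking: elided */
};

/*
 * Claim the buffer for 'dev'.  If another device was the live mapper,
 * perform the hand-over fixups (cache clean/invalidate, IOMMU updates)
 * before handing out an address usable by 'dev'.
 */
static dma_addr_t shared_buf_acquire(struct shared_buf *buf, struct device *dev)
{
        if (buf->live_owner == dev)
                return lookup_mapping(buf, dev);          /* invented helper: fast path */

        if (buf->live_owner)
                flush_for_handover(buf, buf->live_owner); /* invented helper */

        buf->live_owner = dev;
        return map_for_device(buf, dev);                  /* invented helper: map lazily */
}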
-Zach
The update to the wiki based upon the ELC talk has those 3, but breaks up the mapping into separate in-kernel and user-space mapping issues. Oh, and in the discussion, I think we were referring to buffer flows as configuration, but it seems the same to me. At any rate, I think we've captured the ideas and some of the mechanisms that address them (or attempt to). Could use more detail, though.
cheers, Jesse
On Wednesday, April 20, 2011 23:24:14 Zach Pfeffer wrote:
That's true.
I think there's some value in splitting the discussion up into 3 areas:
Allocation
Mapping
Buffer flows
I agree with this split. They are for the most part independent problems, but solutions for all three are needed to make a zero-copy pipeline a reality.
With regard to V4L: there is currently no buffer sharing possible at all (except in the trivial case of buffers in virtual memory, and of course when using CMEM-type solutions). Even between a V4L capture and output device you cannot share buffers. The main reason is that the videobuf framework used in drivers was a horrid piece of code that prevented any sharing. In 2.6.39 the videobuf2 replacement framework was merged, which will make such sharing much, much easier.
This functionality is not yet implemented though, partially because this is all still very new, and partially because we would like to wait and see if a general solution will be created.
It should be fairly easy to implement such a solution in the videobuf2 framework. Buffer 'handles' can be either 32 or 64 bit, so we have no restriction there. File descriptors would be fine as handles.
A requirement to enable buffer sharing between V4L and anything else would probably be that the V4L driver must use videobuf2. Since this is brand new, the initial set of compatible drivers will be very small, but that should improve over time. Anyway, that's our problem :-)
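From the application side, fd-as-handle might end up looking roughly like the snippet below. To be clear, V4L2_MEMORY_FD does not exist in the current V4L2 API, and the fd is smuggled through m.userptr only because there is no dedicated field today; this is just a sketch of the direction, not a proposal.

#include <linux/videodev2.h>
#include <sys/ioctl.h>
#include <string.h>

/* Hypothetical memory type: "this buffer is backed by the fd I give you". */
#define V4L2_MEMORY_FD 5

static int queue_shared_buffer(int video_fd, int shared_fd, unsigned int index)
{
        struct v4l2_buffer buf;

        memset(&buf, 0, sizeof(buf));
        buf.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        buf.memory = V4L2_MEMORY_FD;                 /* hypothetical */
        buf.index  = index;
        buf.m.userptr = (unsigned long)shared_fd;    /* stand-in for a real fd field */

        return ioctl(video_fd, VIDIOC_QBUF, &buf);
}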
A separate problem that I would like to discuss is the interaction of GPU/FB and V4L display devices. There is overlap there and we need to figure out how to handle that. However, this is unrelated to the memory summit. It might be a good topic for a meeting on Tuesday and/or Wednesday morning, though.
Regards,
Hans
This discussion reminded me of a very cool use case that's looming in the future:
video analytics
In video analytics you'd:
capture
analyze (probably from user space or in a DSP - either way some data is flowing up the stack)
encode to a texture
possibly send back through analyze
To make this fast, one buffer may transition between many owners - and having multiple mappings may make that easier to deal with.
I think people are suggesting a multiple map scenario, where only one mapper is live at a time. We could make this on-demand where the appropriate fixups for a given mapping are done when that mapping becomes live.
On Thu, Apr 21, 2011 at 9:08 AM, Zach Pfeffer zach.pfeffer@linaro.org wrote:
I think people are suggesting a multiple map scenario, where only one mapper is live at a time. We could make this on-demand where the appropriate fixups for a given mapping are done when that mapping becomes live.
This is already the common case for something like videos being sent to the GPU as textures. Typically the system maps a collection of 4 or so video output buffers to the decoder and the GPU, and then they handle synchronization. Mapping them on demand is too expensive. I think the synchronization problem is a separate but related issue that could be tackled once we have this whole buffer management solution ironed out :)
Rebecca
On Wednesday 20 April 2011 03:52:53 Clark, Rob wrote:
From DRI perspective.. I guess the global buffer name is restricted to a 4 byte integer, unless you change the DRI proto..
I still like the idea of using file handles to pass the buffers between kernel subsystems. Maybe there could be an ioctl to encapsulate a buffer from DRI in a file so we can give it to another subsystem, and/or an ioctl to register a buffer from a file handle with DRI.
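Spelled out, that could be as small as a pair of ioctls per driver; the names, numbers and struct below are invented here for illustration only, not existing DRM interfaces:

#include <linux/ioctl.h>
#include <stdint.h>

/* Hypothetical: wrap a driver-local buffer in a file, or adopt a buffer
 * handed to us as a file descriptor.  Not an existing interface. */
struct drm_buf_fd {
        uint32_t handle;   /* driver-local buffer handle (e.g. a GEM handle) */
        int32_t  fd;       /* file descriptor representing the same buffer */
};

#define DRM_IOCTL_EXPORT_BUF_FD  _IOWR('d', 0xb0, struct drm_buf_fd) /* handle -> fd */
#define DRM_IOCTL_IMPORT_BUF_FD  _IOWR('d', 0xb1, struct drm_buf_fd) /* fd -> handle */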
Arnd
On Wed, Apr 20, 2011 at 8:25 AM, Arnd Bergmann arnd@arndb.de wrote:
I still like the idea of using file handles to pass the buffers between kernel subsystems. Maybe there could be an ioctl to encapsulate a buffer from DRI in a file so we can give it to another subsystem, and/or an ioctl to register a buffer from a file handle with DRI.
That's been the original design of gem, i.e. using fd handles (and perhaps even passing them around in unix domain sockets). There's one small problem with that approach: You're quickly running out of fds with the linux default limit of 1024. Hence the roll-your-own approach.
Aside: I'll be participating as a gem drm/i915 hacker. I'll send a short overview of how gem/kms tackles these problems after easter because our approach is rather different from what the arm community seems to want (as far as I can tell). -Daniel
-----Original Message----- From: linaro-mm-sig-bounces@lists.linaro.org [mailto:linaro-mm-sig- bounces@lists.linaro.org] On Behalf Of Daniel Vetter Sent: 20 April 2011 07:56 To: Arnd Bergmann Cc: linaro-mm-sig@lists.linaro.org Subject: Re: [Linaro-mm-sig] Memory Management Discussion
On Wed, Apr 20, 2011 at 8:25 AM, Arnd Bergmann arnd@arndb.de wrote:
On Wednesday 20 April 2011 03:52:53 Clark, Rob wrote:
From DRI perspective.. I guess the global buffer name is restricted to a 4 byte integer, unless you change the DRI proto..
I still like the idea of using file handles to pass the buffers between kernel subsystems. Maybe there could be an ioctl to encapsulate a buffer from DRI in a file so we can give it to another subsystem, and/or an ioctl to register a buffer from a file handle with DRI.
That's been the original design of gem, i.e. using fd handles (and perhaps even passing them around in unix domain sockets). There's one small problem with that approach: You're quickly running out of fds with the linux default limit of 1024. Hence the roll-your-own approach.
There's a big difference between GEM and what we're trying to do. GEM is designed to manage _all_ buffers used by the graphics hardware. What I believe we're trying to do is only provide a manager which allows buffers to be shared between devices. Of all the buffers and textures a GPU needs to access, only a tiny fraction of them need to be shared between devices and userspace processes. How large that fraction is I don't know, it might still be approaching the 1024 limit, but I doubt it...
So, the buffers we're interested in sharing between different processes and devices are:
* Decoded video buffers (from both cameras & video decoders) * Window back-buffers * System-wide theme textures and font glyph caches
... Anyone know of other candidates?
I guess the bottleneck will probably be the window compositor, as it will need to have a reference to all window back buffers in the system. So, do any of the DRI folks have any idea how many windows (with back buffers) a typical desktop session has? Do minimised windows have back buffers allocated in X11?
Even if there were as many as 100 top-level windows (which seems excessive), each triple-buffered, that's still pretty far from the 1024 limit?
Cheers,
Tom
On Wed, Apr 20, 2011 at 05:23:18AM -0700, Tom Cooksey wrote:
I guess the bottleneck will probably be the window compositor, as it will need to have a reference to all window back buffers in the system. So, do any of the DRI folks have any idea how many windows (with back buffers) a typical desktop session has? Do minimised windows have back buffers allocated in X11?
In a traditional X11 composited environment, all top-level window front buffers are redirected offscreen to their own memory (not only back buffers).
Even if there were as many as 100 top-level windows (which seems excessive), each triple-buffered, that's still pretty far from the 1024 limit?
If the process doing the compositing is the X server itself (rather than a direct-rendering compositing window manager), there is already file handle pressure, due to each client connection consuming a socket, plus all of the device nodes it needs to open for input and output. If each client conservatively creates only one window, then we hit the limit at fewer than 512 clients. (Throw in a few clients like the gimp that create a whole bunch of top-level windows and you use them up more quickly.)
Admittedly, the default compile-time MAXCLIENTS in the server right now is 256, but, e.g., Red Hat has patched their servers since 2007 to bump that to 512 [1], so presumably somebody is already hitting the limit and needs more than 256.
I think somebody mentioned that the X server runs as root so it can just change its ulimit, but I don't think that's a good argument since we're actually trying to get away from X needing to be suid root.
[1] https://rhn.redhat.com/errata/RHBA-2007-0756.html
- Robert
On 04/20/2011 06:23 AM, Tom Cooksey wrote:
-----Original Message----- From: linaro-mm-sig-bounces@lists.linaro.org [mailto:linaro-mm-sig- bounces@lists.linaro.org] On Behalf Of Daniel Vetter Sent: 20 April 2011 07:56 To: Arnd Bergmann Cc: linaro-mm-sig@lists.linaro.org Subject: Re: [Linaro-mm-sig] Memory Management Discussion
On Wed, Apr 20, 2011 at 8:25 AM, Arnd Bergmann <arnd@arndb.de> wrote:
On Wednesday 20 April 2011 03:52:53 Clark, Rob wrote:
From DRI perspective.. I guess the global buffer name is restricted to a 4 byte integer, unless you change the DRI proto..
I still like the idea of using file handles to pass the buffers between kernel subsystems. Maybe there could be an ioctl to encapsulate a buffer from DRI in a file so we can give it to another subsystem, and/or an ioctl to register a buffer from a file handle with DRI.
That's been the original design of gem, i.e. using fd handles (and perhaps even passing them around in unix domain sockets). There's one small problem with that approach: You're quickly running out of fds with the linux default limit of 1024. Hence the roll-your-own approach.
There's a big difference between GEM and what we're trying to do. GEM is designed to manage _all_ buffers used by the graphics hardware. What I believe we're trying to do is only provide a manager which allows buffers to be shared between devices. Of all the buffers and textures a GPU needs to access, only a tiny fraction of them need to be shared between devices and userspace processes. How large that fraction is I don't know, it might still be approaching the 1024 limit, but I doubt it...
I don't think that is an accurate characterization.
Sharing *is* important, but at least this vendor is looking for a way to allocate all multimedia related buffers, shared or not. Even if you never shared a single buffer between processes, everything else we've put on the table re memory management is still a concern: cache policy, contiguous vs paged, IOMMU, etc. I'm in the market for one memory manager to rule them all.
Jordan
On 04/20/2011 02:23 PM, Tom Cooksey wrote:
On Wed, Apr 20, 2011 at 8:25 AM, Arnd Bergmann <arnd@arndb.de> wrote:
On Wednesday 20 April 2011 03:52:53 Clark, Rob wrote:
From DRI perspective.. I guess the global buffer name is restricted to a 4 byte integer, unless you change the DRI proto..
I still like the idea of using file handles to pass the buffers between kernel subsystems. Maybe there could be an ioctl to encapsulate a buffer from DRI in a file so we can give it to another subsystem, and/or an ioctl to register a buffer from a file handle with DRI.
That's been the original design of gem, i.e. using fd handles (and perhaps even passing them around in unix domain sockets). There's one small problem with that approach: You're quickly running out of fds with the linux default limit of 1024. Hence the roll-your-own approach.
There's a big difference between GEM and what we're trying to do. GEM is designed to manage _all_ buffers used by the graphics hardware. What I believe we're trying to do is only provide a manager which allows buffers to be shared between devices. Of all the buffers and textures a GPU needs to access, only a tiny fraction of them need to be shared between devices and userspace processes. How large that fraction is I don't know, it might still be approaching the 1024 limit, but I doubt it...
Why is it that everyone wants it to be fds? It just puts requirements on dup-ing the fds over IPC instead of sending a simple int. What is wrong with the GEM FLINK names? In STE HWMEM we use the same concepts as GEM, with local device ids for buffers and exported global names (since we also planned to merge HWMEM with GEM when time permits). But the kernel ioctl API is just one piece. Then we have a kernel internal API for other devices to use, like V4L2 drivers. These can then just use the global GEM FLINK name as input in a "register_gem_buffer_ioctl". The driver then resolves and checks rights for this name/buffer on register/import. And adding some in-kernel API to GEM for resolving buffers from global names shouldn't need any big breaking changes or new revs of the ioctl API.
But then we have platforms like Android that use fds as part of their buffer security model (binder-dup(fd), like pipes). To support this, drivers need to add an extra ioctl for exporting a local buffer handle to an fd (as we allow in HWMEM). But maybe Android will change their model once we have this unified manager in Linaro/ARM (which could be an extended GEM). Of course we also need some way to disable the "authenticate and get access to all flinked buffers" feature (unless I misunderstood that). But since Android abstracts this in a HAL layer, each driver could have their own ioctls for these fd exports (but I prefer a common API). In HWMEM we also added a more granular security model, where the buffer allocator can set global &| per process rights in the form of Read/Write/Import on each buffer.
And for the 1024 limit, I don't think the backbuffers or shared textures need to be device-to-device capable. GEM should be fine, since this sharing is just inter process, but same device (as already proven in DRI & Wayland).
So, the buffers we're interested in sharing between different processes and devices are:
- Decoded video buffers (from both cameras& video decoders)
- Window back-buffers
- System-wide theme textures and font glyph caches
... Anyone know of other candidates?
The only buffers I can think of are V4L2 <-> DRM, multimedia & graphics. But that also includes GEM buffers targeted for video encode and misc V4L2 mem2mem devices.
/BR /Marcus
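For reference, a minimal userspace sketch of the FLINK-name flow Marcus describes: the two GEM ioctls below do exist in drm.h, but error handling is stripped and the exact header paths can differ per system, so treat it as an outline rather than working driver code.

#include <sys/ioctl.h>
#include <xf86drm.h>   /* libdrm's drmIoctl(); plain ioctl() works too */
#include <drm.h>

/* Process A: turn a device-local GEM handle into a global name. */
static int gem_flink(int drm_fd, unsigned int handle, unsigned int *name)
{
        struct drm_gem_flink flink = { .handle = handle };

        if (drmIoctl(drm_fd, DRM_IOCTL_GEM_FLINK, &flink))
                return -1;
        *name = flink.name;   /* a plain int that can go over any IPC channel */
        return 0;
}

/* Process B: resolve the global name back into its own local handle. */
static int gem_open_name(int drm_fd, unsigned int name, unsigned int *handle)
{
        struct drm_gem_open args = { .name = name };

        if (drmIoctl(drm_fd, DRM_IOCTL_GEM_OPEN, &args))
                return -1;
        *handle = args.handle;
        return 0;
}

The hypothetical "register_gem_buffer_ioctl" on the V4L2 side would sit on top of the same global name.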
On Wed, Apr 20, 2011 at 5:29 PM, Marcus Lorentzon marcus.xm.lorentzon@stericsson.com wrote:
And for the 1024 limit, I don't think the backbuffers or shared textures need to be device-to-device capable. GEM should be fine, since this sharing is just inter process, but same device (as already proven in DRI & Wayland).
That's a rather important thing. GEM is _only_ for buffer objects of one device, no device-device sharing possible. Dave Airlie wrote a proof-of-concept for that to enable runtime-switchable graphics (where the add-on gpu blits to the scanout buffer of the integrated gpu when running):
http://airlied.livejournal.com/71734.html
-Daniel
On 04/20/2011 05:35 PM, Daniel Vetter wrote:
On Wed, Apr 20, 2011 at 5:29 PM, Marcus Lorentzon marcus.xm.lorentzon@stericsson.com wrote:
And for the 1024 limit, I don't think the backbuffers or shared textures need to be device-to-device capable. GEM should be fine, since this sharing is just inter process, but same device (as already proven in DRI& Wayland).
That's a rather important thing. GEM is _only_ for buffer objects of one device, no device-device sharing possible. Dave Airlie wrote a proof-of-concept for that to enable runtime-switchable graphics (where the add-on gpu blits to the scanout buffer of the integrated gpu when running):
Were you just emphasizing that GEM is _currently_ only for a single device, which I understand? Or are you also saying it will not be possible to extend GEM FLINK names to be airlied PRIME names without messing up the current behavior of GEM? If so, maybe there's actually a point to a new DRM-isolated memory manager like HWMEM (which just needed a few lines of code anyway to integrate with KMS).
My view has always been that I prefer an isolated memory manager that can be used by multiple DRM drivers and V4L2 drivers (like HWMEM and some other similar ARM MMs). But maybe GEM doesn't want to fit that target?
Am I right if I say GEM only has 3 standard ioctls: FLINK, OPEN, CLOSE? If so, a second implementation/variation of the "GEM API" could use the same namespace for FLINK names across devices and still be called GEM? Just probing the GEM ground to see what could be "fixed" in GEM and what is written in stone.
What seems clear from Linaro is that we need V4L2<->DRM buffer sharing. But I don't think anyone has requested DRM<->DRM sharing like PRIME. But a common Linaro embedded & x86 desktop solution probably needs that. The simplification here is that V4L2-x<->single-DRM sharing could be easier to implement than full DRM-0<->DRM-1<->V4L2-x sharing, since the DRM driver itself could expose the device memory sharing API in the kernel and make GEM believe it's still a single device.
Or maybe a kernel-shared namespace where DRM/V4L2/CMA drivers register/resolve exported buffers is easier and more flexible (no DRM/GEM specifics in buffer sharing). Since the allocation ioctl is driver specific anyway, adding a "register_global_buffer" ioctl could also be made to work like any normal GEM allocation, just giving it a global name instead of a size. Like an in-kernel hub for sharing memory buffers between devices, independent of allocators and users.
/BR /Marcus
On Wed, Apr 20, 2011 at 6:31 PM, Marcus Lorentzon marcus.xm.lorentzon@stericsson.com wrote:
Am I right if I say GEM only have 3 standard ioctls, FLINK, OPEN, CLOSE? If so, a second implementation/variation of "GEM API" could use the same namespace for FLINK names across devices and still be called GEM? Just probing GEM ground to see what could be "fixed" in GEM and what is written in stone.
It's even better: there's not even a generic open for gem objects ;-) Well, if you neglect the recently added dumb_create, which is just there to create a framebuffer for graphical boot splash screens and is unusable for rendering.
Anyway, just a short reply, I'm a bit snowed under atm. I'll send out an overview of gem and how I think it could be integrated with the linaro wishlist after easter.
Cheers, Daniel
On Wednesday 20 April 2011, Marcus Lorentzon wrote:
On 04/20/2011 02:23 PM, Tom Cooksey wrote:
There's a big difference between GEM and what we're trying to do. GEM is designed to manage all buffers used by the graphics hardware. What I believe we're trying to do is only provide a manager which allows buffers to be shared between devices. Of all the buffers and textures a GPU needs to access, only a tiny fraction of them need to be shared between devices and userspace processes. How large that fraction is I don't know, it might still be approaching the 1024 limit, but I doubt it...
Why is it that everyone want it to be fds? It just put requirements on dup-ing the fds on IPC instead of sending a simple int. What is wrong with the GEM FLINK names? In STE HWMEM we use the same concepts as GEM with local device ids for buffers and exported global names (since we also planned to merge HWMEM with GEM when time permits). But kernel ioctl API is just one piece. Then we have a kernel internal API for other devices to use, like V4L2 drivers. These can then just use the global GEM FLINK name as input in a "register_gem_buffer_ioctl". Then the driver resolve and check rights for this name/buffer on register/import. And adding some in kernel API to GEM for resolving buffers from global names shouldn't need any big breaking changes or new revs of ioctl API.
File descriptors have a number of very nice properties that we can use:
* Efficient lookup in system calls
* Established ways to pass them around
* Easy to mmap() -- if you want to map anything into user space, you need a file descriptor anyway
* Easy to wait for events -- if you want to wait for something to happen, you always need an fd to poll() on (this might not be necessary here, but in most subsystems, it comes up sooner or later)
* Allows arbitrary subsystems to create compatible handles. Like a socket can be provided by unrelated subsystems, this file handle could be created by any of GEM, v4l, tmpfs, ... and we can have operations that each of them provides as callbacks
* Lifetime management: a process dies and do_exit() automatically closes the file descriptors, no need to clean up afterwards. See posix SHM vs SysV SHM, or posix message queues vs. sysv message queues.
Not using file descriptors basically comes at the cost of reinventing all these when we need them. IMHO, using files should be the first thing to look at and we only look for something else when there are very strong reasons not to use them.
Arnd
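To make the "established ways to pass them around" point concrete, this is the standard SCM_RIGHTS mechanism for handing an fd to another process over a unix domain socket; the receiver ends up with its own descriptor referring to the same open file. A minimal sketch, error handling omitted:

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

static int send_fd(int sock, int fd)
{
        char dummy = 'x';
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        char cmsgbuf[CMSG_SPACE(sizeof(int))];
        struct msghdr msg = {
                .msg_iov = &iov, .msg_iovlen = 1,
                .msg_control = cmsgbuf, .msg_controllen = sizeof(cmsgbuf),
        };
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}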
On 04/20/2011 06:19 PM, Arnd Bergmann wrote:
File descriptors have a number of very nice properties that we can use:
- Efficient lookup in system calls
If both use the idr I don't see how fds are faster than ints.
- Established ways to pass them around
What could be easier than passing an int? I just don't like the "feature" of passing fds where they are dup-ed without the driver knowing. If you want to store process-specific info associated with the fd, you have to use a list since you don't know if the app sends the fd to another process. Probably not a big deal, but I don't like it ;)
- Easy to mmap() -- if you want to map anything into user space, you need a file descriptor anyway
True, passing the local ids as offset is kind of messy.
- Easy to wait for events -- if you want to wait for something to happen, you always need an fd to poll() on (this might not be necessary here, but in most subsystems, it comes up sooner or later)
My experience says that you normally wait for device-related events. But if the possibility exists, maybe someone can make use of it. Video events are mostly one per buffer (like frame decoded/encoded).
- Allows arbitrary subsystems to create compatible handles. Like a socket can be provided by unrelated subsystems, this file handle could be created by any of GEM, v4l, tmpfs, ... and we can have operations that each of them provides as callbacks
Global ids should also be "compatible", since they are from the same name space. But what do you mean by callbacks? In kernel API on buffer objects? If so, idr_find(ID)->buffer_struct->callback() should be similar to idr_find(FD)->file_struct->callback()
- lifetime management: a process dies and do_exit() automatically closes the file descriptors, no need to clean up afterwards. See posix SHM vs SysV SHM, or posix message queues vs. sysv message queues.
Even if you use fds, I think these only replace the need for global ids, not local ids, like the GEM local id or V4L2 buffer index. And you really want to register the buffer only once, not every time it is used in an ioctl. So the driver/device accepting the buffer probably wants to hold some metadata for its HW for each buffer, like device MMU tables, device cache state etc. Having these local references will still force you to do clean-up. For global ids/fds, I see no point in using them to hold references to the underlying allocation.
/BR /Marcus
On Wednesday 20 April 2011, Marcus Lorentzon wrote:
On 04/20/2011 06:19 PM, Arnd Bergmann wrote:
File descriptors have a number of very nice properties that we can use:
- Efficient lookup in system calls
If both use the idr I don't see how fds are faster than ints.
File descriptors don't use idr.
- Established ways to pass them around
What could be easier than passing an int? I just don't like the "feature" of passing fds where they are dup-ed without driver knowing so. If you want to store process specific info associated with the fd you have to use a list since you don't know if the app sends the fd to another process. Probably not a big deal, but I don't like it ;)
The problem with passing an integer is that it doesn't have any concept of ownership or lifetime rules.
If you allow any process access to all integers, a malicious process might be able to guess it unless you use long cryptographic random numbers.
When you have a file descriptor, you can assume that the object is still alive until you close it. With an integer passed by some other application, that is less clear.
- Allows arbitrary subsystems to create compatible handles. Like a socket can be provided by unrelated subsystems, this file handle could be created by any of GEM, v4l, tmpfs, ... and we can have operations that each of them provides as callbacks
Global ids should also be "compatible", since they are from the same name space. But what do you mean by callbacks? In kernel API on buffer objects? If so, idr_find(ID)->buffer_struct->callback() should be similar to idr_find(FD)->file_struct->callback()
In the idr example, the modules all need to link to the code that provides the DEFINE_IDR(); in the file example, they only need to provide the same interfaces, so the modules need not link against any common code other than the VFS.
Not a major point though.
Arnd
On 04/21/2011 02:15 PM, Arnd Bergmann wrote:
On Wednesday 20 April 2011, Marcus Lorentzon wrote:
On 04/20/2011 06:19 PM, Arnd Bergmann wrote:
File descriptors have a number of very nice properties that we can use:
- Efficient lookup in system calls
If both use the idr I don't see how fds are faster than ints.
File descriptors don't use idr.
My mistake, but still, ints could use an array too, and idr or lookup in general should not be an efficiency problem anyway.
- Established ways to pass them around
What could be easier than passing an int? I just don't like the "feature" of passing fds where they are dup-ed without driver knowing so. If you want to store process specific info associated with the fd you have to use a list since you don't know if the app sends the fd to another process. Probably not a big deal, but I don't like it ;)
The problem with passing an integer is that it doesn't have any concept of ownership or lifetime rules.
The idea of passing global ids should not affect lifetime. These ints are still lifetime-controlled by the "fd" device that created them. So I think they do have a defined lifetime, that of whatever device this int is registered in. And if the process is shut down, all buffers registered in the drm/v4l2 device fd will be released, and even freed if it was the last ref. And if you mean lifetime while the process is still alive, even fds have to be closed, and ints have to be closed/unregistered using an ioctl. And this is not something that is used by applications either; these refs and allocs are handled by the user space drivers, like libEGL / libGL / X-drivers etc, so ioctl vs. close should not matter.
If you allow any process access to all integers, a malicious process might be able to guess it unless you use long cryptographic random numbers.
That's why you put a security model on top. Like GEM auth (which only has access all or nothing) or something like HWMEM, where each buffer/id has read/write/import rights per process. This is also easier to trace/debug security-wise, since the driver is notified when a buffer is transferred to another process. You never get this info from binder/pipe (dup-ed).
When you have a file descriptor, you can assume that the object is still alive until you close it. With an integer passed by some other application, that is less clear.
That's why I prefer the register/import global id step with the device. It gives the driver a chance to store metadata and prepare to use this buffer. If this has to be done for every device ioctl call, you lose efficiency and all device APIs would have to be updated with cloned ops for fds. Registering/importing an fd/global id and then using device-local handles is much more efficient and doesn't require API changes, only additions.
- Allows arbitrary subsystems to create compatible handles. Like a socket can be provided by unrelated subsystems, this file handle could be created by any of GEM, v4l, tmpfs, ... and we can have operations that each of them provides as callbacks
Global ids should also be "compatible", since they are from the same name space. But what do you mean by callbacks? In kernel API on buffer objects? If so, idr_find(ID)->buffer_struct->callback() should be similar to idr_find(FD)->file_struct->callback()
In the idr example, the modules all need to link to the code that provides does the DEFINE_IDR(), in the file example, they only need to provide the same interfaces, so the modules need not link against any common code other than the VFS.
Not a major point though.
True, but I would prefer an in-kernel API or callback interface to do the "mem ops" like resolve.
Most of this can be seen in the code @ http://git.linaro.org/gitweb?p=bsp/st-ericsson/linux-2.6.35-ux500.git%3Ba=tr... . But the idea is very close to GEM; actually we just needed a few lines of code in our KMS proto to use HWMEM in KMS & Wayland without changes to the user space drm protocols and libs. But note that I'm not promoting HWMEM as the unified mem driver, only showing some concepts that could be employed in GEM for use in Android & Wayland with a common driver (I consider X "dead" in embedded ;).
/BR /Marcus
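As an illustration of the register/import step Marcus argues for, the sketch below registers a global name once with the consuming device and uses a device-local handle afterwards; every name here (the ioctl, the struct, the fields) is invented for this example and is not HWMEM's or V4L2's actual API.

#include <stdint.h>
#include <sys/ioctl.h>

struct buf_register {
        uint32_t global_name;    /* name exported by the allocating driver */
        uint32_t local_handle;   /* out: handle local to this device fd */
};

#define BUFIOC_REGISTER_GLOBAL _IOWR('b', 0x10, struct buf_register)

static int register_with_device(int dev_fd, uint32_t name, uint32_t *handle)
{
        struct buf_register reg = { .global_name = name };

        /* The driver gets one chance here to check access rights and set up
         * its IOMMU mapping and cache state, instead of on every ioctl. */
        if (ioctl(dev_fd, BUFIOC_REGISTER_GLOBAL, &reg) < 0)
                return -1;
        *handle = reg.local_handle;   /* used in all later calls on dev_fd */
        return 0;
}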
On Thu, Apr 21, 2011 at 5:55 AM, Marcus Lorentzon < marcus.xm.lorentzon@stericsson.com> wrote:
On 04/21/2011 02:15 PM, Arnd Bergmann wrote:
On Wednesday 20 April 2011, Marcus Lorentzon wrote:
On 04/20/2011 06:19 PM, Arnd Bergmann wrote:
File descriptors have a number of very nice properties that we can use:
- Efficient lookup in system calls
If both use the idr I don't see how fds are faster than ints.
File descriptors don't use idr.
My mistake, but still, ints could use an array too, and idr or lookup in general should not be an efficiency problem anyway.
- Established ways to pass them around
What could be easier than passing an int? I just don't like the "feature" of passing fds where they are dup-ed without driver knowing so. If you want to store process specific info associated with the fd you have to use a list since you don't know if the app sends the fd to another process. Probably not a big deal, but I don't like it ;)
The problem with passing an integer is that it doesn't have any concept of ownership or lifetime rules.
The idea of passing global ids should not affect lifetime. These ints are still lifetime controlled by the "fd" device that created them. So I think they do have a defined lifetime, that of whatever device this int is registered in. And if the process is shut down, all buffers registered in the drm/v4l2 device fd will be released and even freed if it was the last ref. And if you mean lifetime while process is still alive, even fds has to be closed, and ints has to be closed/unregistered using ioctl. And this is not something that is used by applications either, these refs and allocs are handled by the user space drivers, like libEGL / libGL / X-drivers etc, so ioctl vs. close should not matter.
The problem isn't managing their lifetime on the side that created the buffer, it's managing it while they are in flight. What happens if process 1 passes a buffer to process 2 and, before process 2 takes a reference to it, process 1 crashes? Some central clearing house has to handle that. I'm guessing that's the X server in the X case. In my proposal that's handled by the extra reference being held by the passed fd itself (i.e. the kernel has a reference as long as the file struct exists in either process's file descriptor table).
If you allow any process access to all integers, a malicious process
might be able to guess it unless you use long cryptographic random numbers.
That's why you put a security model on top. Like GEM auth (which only have access all or nothing) or something like HWMEM where each buffer/id has read/write/import rights per process. This is also easier to trace/debug security since driver is notified when a buffer is transfered to another process. You never get this info from binder/pipe (dup-ed).
It's totally trivial to have debug info on what buffers are currently mapped into what processes. The kernel knows where all the memory manager's file descriptors have gone. This is already implemented in the proposal I posted. From userspace security becomes really simple, a process owns all the buffers it's created and any that have been shared with it. If it doesn't want to share a buffer with another process, it doesn't pass it to it.
When you have a file descriptor, you can assume that the object
is still alive until you close it. With an integer passed by some other application, that is less clear.
That's why I prefer the register/import global id step with device. It gives the driver a chance to store meta data and prepare to use this buffer. If this has to be done for every device ioctl call, you loose efficiency and all device APIs would have to be updated with cloned ops for fds. Register/import an fd/globalid and then use device local handles is much more efficient and don't require API changes, only additions.
That's exactly what I'm proposing, you import an fd that's been passed to you.
- Allows arbitrary subsystems to create compatible handles. Like a
socket can be provided by unrelated subsystems, this file handle could be created by any of GEM, v4l, tmpfs, ... and we can have operations that each of them provides as callbacks
Global ids should also be "compatible", since they are from the same name space. But what do you mean by callbacks? In kernel API on buffer objects? If so, idr_find(ID)->buffer_struct->callback() should be similar to idr_find(FD)->file_struct->callback()
In the idr example, the modules all need to link to the code that provides does the DEFINE_IDR(), in the file example, they only need to provide the same interfaces, so the modules need not link against any common code other than the VFS.
Not a major point though.
True, but I would prefer an in kernel API or callback ifc to do the "mem ops" like resolve.
Most of this can be seen in code @ http://git.linaro.org/gitweb?p=bsp/st-ericsson/linux-2.6.35-ux500.git%3Ba=tr.... But the idea is very close to GEM, actually we just needed a few lines of code in our KMS proto to use HWMEM in KMS & Wayland without changes to user space drm protocols and libs. But note that I'm not promoting HWMEM for unified mem driver, only showing some concepts that could be employed in GEM for use in Android & Wayland with common driver (I consider X "dead" in embedded ;).
/BR /Marcus
Linaro-mm-sig mailing list Linaro-mm-sig@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-mm-sig
On Thursday 21 April 2011 20:43:33 Rebecca Schultz Zavin wrote:
On Thu, Apr 21, 2011 at 5:55 AM, Marcus Lorentzon wrote:
On 04/21/2011 02:15 PM, Arnd Bergmann wrote:
On Wednesday 20 April 2011, Marcus Lorentzon wrote:
On 04/20/2011 06:19 PM, Arnd Bergmann wrote:
File descriptors have a number of very nice properties that we can use:
- Efficient lookup in system calls
If both use the idr I don't see how fds are faster than ints.
File descriptors don't use idr.
My mistake, but still, ints could use an array too, and idr or lookup in general should not be an efficiency problem anyway.
- Established ways to pass them around
What could be easier than passing an int? I just don't like the "feature" of passing fds where they are dup-ed without driver knowing so. If you want to store process specific info associated with the fd you have to use a list since you don't know if the app sends the fd to another process. Probably not a big deal, but I don't like it ;)
The problem with passing an integer is that it doesn't have any concept of ownership or lifetime rules.
The idea of passing global ids should not affect lifetime. These ints are still lifetime controlled by the "fd" device that created them. So I think they do have a defined lifetime, that of whatever device this int is registered in. And if the process is shut down, all buffers registered in the drm/v4l2 device fd will be released and even freed if it was the last ref. And if you mean lifetime while process is still alive, even fds has to be closed, and ints has to be closed/unregistered using ioctl. And this is not something that is used by applications either, these refs and allocs are handled by the user space drivers, like libEGL / libGL / X-drivers etc, so ioctl vs. close should not matter.
The problem isn't managing their lifetime in the side that created the buffer, it's managing it while they are in flight. What happens if process 1 passes a buffer to process 2 and before process 2 takes a reference to it, process 1 crashes? Some central clearing house has to handle that. I'm guessing that's the X server in the X case. In my proposal that's handled by the extra reference being held by the passed fd itself (ie the kernel has a reference as long as the file struct exists in either processes file descriptor table).
I don't see a real problem here. Buffers would be ref-counted in the kernel. If process 1 crashes before process 2 gets to take a reference to the buffer, the reference count will be decreased to 0 and the buffer will be freed. Process 2 will then receive an error when it tries to take a reference to the buffer.
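A minimal kernel-side sketch of the refcounting being described, using the stock kref helpers; it is illustrative only and not lifted from any existing driver:

#include <linux/kref.h>
#include <linux/slab.h>

struct shared_buf {
        struct kref refcount;
        size_t size;
        void *vaddr;
};

static struct shared_buf *shared_buf_create(size_t size)
{
        struct shared_buf *buf = kzalloc(sizeof(*buf), GFP_KERNEL);

        if (!buf)
                return NULL;
        buf->vaddr = kzalloc(size, GFP_KERNEL);
        if (!buf->vaddr) {
                kfree(buf);
                return NULL;
        }
        buf->size = size;
        kref_init(&buf->refcount);   /* the creator holds the first reference */
        return buf;
}

static void shared_buf_release(struct kref *kref)
{
        struct shared_buf *buf = container_of(kref, struct shared_buf, refcount);

        kfree(buf->vaddr);
        kfree(buf);
}

/* Taken on import by another process/device. */
static void shared_buf_get(struct shared_buf *buf)
{
        kref_get(&buf->refcount);
}

/* Dropped on release/close paths; frees the buffer on the last put. */
static void shared_buf_put(struct shared_buf *buf)
{
        kref_put(&buf->refcount, shared_buf_release);
}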
On 04/21/2011 08:43 PM, Rebecca Schultz Zavin wrote:
On Thu, Apr 21, 2011 at 5:55 AM, Marcus Lorentzon <marcus.xm.lorentzon@stericsson.com mailto:marcus.xm.lorentzon@stericsson.com> wrote:
On 04/21/2011 02:15 PM, Arnd Bergmann wrote:
On Wednesday 20 April 2011, Marcus Lorentzon wrote:
On 04/20/2011 06:19 PM, Arnd Bergmann wrote:
* Established ways to pass them around
What could be easier than passing an int? I just don't like the "feature" of passing fds where they are dup-ed without driver knowing so. If you want to store process specific info associated with the fd you have to use a list since you don't know if the app sends the fd to another process. Probably not a big deal, but I don't like it ;)
The problem with passing an integer is that it doesn't have any concept of ownership or lifetime rules.
The idea of passing global ids should not affect lifetime. These ints are still lifetime controlled by the "fd" device that created them. So I think they do have a defined lifetime, that of whatever device this int is registered in. And if the process is shut down, all buffers registered in the drm/v4l2 device fd will be released and even freed if it was the last ref. And if you mean lifetime while process is still alive, even fds has to be closed, and ints has to be closed/unregistered using ioctl. And this is not something that is used by applications either, these refs and allocs are handled by the user space drivers, like libEGL / libGL / X-drivers etc, so ioctl vs. close should not matter.
The problem isn't managing their lifetime in the side that created the buffer, it's managing it while they are in flight. What happens if process 1 passes a buffer to process 2 and before process 2 takes a reference to it, process 1 crashes? Some central clearing house has to handle that. I'm guessing that's the X server in the X case. In my proposal that's handled by the extra reference being held by the passed fd itself (ie the kernel has a reference as long as the file struct exists in either processes file descriptor table).
I see no advantage in process 2 getting a reference to the buffer before process 1 crashes. Process 1 could crash just before it sent it too, giving the same issue. It doesn't seem to solve any real problem. And if processes 1 & 2 are communicating and one crashes, then this is probably the least important problem. And the case with global ids I was trying to propose doesn't have any issue in this case either. The global id doesn't hold a reference. Only the device local ids hold references, and the global id is valid until all local handles are released. So the creator can control the lifetime of the global id even after it is sent, or until another process has imported the global id, creating its own local handle/ref. Since the local handles are device local, they will be automatically released upon device close (or process crash).
But note that I'm not saying we can't do fds. I'm just saying it's just a few lines of extra code to have both. I know fds are easy in Android due to Binder. But for all other IPC use cases, handling buffer ids specially is not that simple (having to use special pipe functions for example, which might not be available in all IPC protocols). So, if you have device local ids, exportable to either an fd or a global id, then both models will be supported (see HWMEM for example, supporting DRM & Android).
Another case that is hard to solve using the fd security model is Media process -> Application -> "Flinger" passing of DRM (Digital Rights Management) buffers. For example, if you have to extend the media framework to play back buffers applications can "see". Then fds will still give full rights to any process the buffer passes through (unless you start using EGL streams ;). But global ids with explicit buffer rights (Read / Write / Import) will allow buffers to be passed through "unsecure" processes. Even the media process could be considered unsecure if the media decoder is in the kernel (accepting global ids as buffer references at decode, resolving these in the kernel).
If you allow any process access to all integers, a malicious process might be able to guess it unless you use long cryptographic random numbers. That's why you put a security model on top. Like GEM auth (which only have access all or nothing) or something like HWMEM where each buffer/id has read/write/import rights per process. This is also easier to trace/debug security since driver is notified when a buffer is transfered to another process. You never get this info from binder/pipe (dup-ed).
It's totally trivial to have debug info on what buffers are currently mapped into what processes. The kernel knows where all the memory manager's file descriptors have gone. This is already implemented in the proposal I posted. From userspace security becomes really simple, a process owns all the buffers it's created and any that have been shared with it. If it doesn't want to share a buffer with another process, it doesn't pass it to it.
Unless you don't want to give access to the intermediate process, as in the use case above.
When you have a file descriptor, you can assume that the object is still alive until you close it. With an integer passed by some other application, that is less clear. That's why I prefer the register/import global id step with device. It gives the driver a chance to store meta data and prepare to use this buffer. If this has to be done for every device ioctl call, you loose efficiency and all device APIs would have to be updated with cloned ops for fds. Register/import an fd/globalid and then use device local handles is much more efficient and don't require API changes, only additions.
That's exactly what I'm proposing, you import an fd that's been passed to you.
And that is fine, as long as you support global ids too for those without Binder ;). Or at least start out with an attempt to map your requirements on GEM or something else that already solves most of the issues. Even if Android decides to take their own route again, Linaro has as its main target to upstream everything. And not making an attempt to merge with "GEM" or what already exists upstream will probably only make that job even harder than merging the ARM world, where everyone currently has their own "pmem".
/BR /Marcus
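A tiny sketch of the per-buffer rights Marcus mentions (HWMEM-style Read/Write/Import), so that an intermediate process could forward a protected buffer without being able to map it; the flag names and struct are invented for illustration only:

#include <stdint.h>

#define BUF_ACCESS_READ    (1u << 0)
#define BUF_ACCESS_WRITE   (1u << 1)
#define BUF_ACCESS_IMPORT  (1u << 2)   /* may resolve/forward the name only */

struct buf_grant {
        uint32_t global_name;   /* buffer being granted */
        int32_t  target_pid;    /* process receiving the rights */
        uint32_t access;        /* OR of BUF_ACCESS_* flags */
};

/* Driver-side check before honouring a map or import request. */
static int buf_access_ok(uint32_t granted, uint32_t wanted)
{
        return (granted & wanted) == wanted;
}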
- Efficient lookup in system calls
- Established ways to pass them around
- Easy to mmap() -- if you want to map anything into user space, you need a file descriptor anyway
This isn't a great boon as we've seen with GEM and TTM.
The problem for GEM was that we used shmem to back the fds, so the mmap was a cached mapping; when we later had an alternate mapping available, such as via the GTT, we had to go and add code to mmap via that drm file descriptor. Though we probably could have gotten around that by changing the mmap backing away from shmem.
Dave.
I've been lurking on this thread all day since I've been underwater on something else but I want to give some perspective on both things that we've done on Android in the past and where we are going in the future that's relevant to the fd's vs. "cookies" discussion here. I've owned the "memory manager" component on pretty much every android program the android team has been involved with directly, qualcomm's pmem was written by me, nvidia's nvmap was heavily modified, many of ti's things flowed through my fingers etc. As many of you know, we're also working on our own new solution -- yes I know we don't need one more from google, but really you already have several from google, we're trying to replace them with one that solves all the problems the others do.
As Arnd has pointed out, file descriptors give us for free a lot of things we desperately need, especially lifetime management. This is probably the single biggest source of bugs around multimedia and graphics. Moving forward we are going to make it a requirement that buffers must exist in file descriptors while they are in userspace, so they automatically get reference counted correctly when they are passed between processes. Any solution that doesn't do that will not be compatible with Android's next generation gralloc api.
That being said, I think all the objections to file descriptors can be easily worked around. My proposal is to only "promote" buffers into file descriptors while they are being passed or mmaped. This way, we can easily manage the number of fds required in the system. If you want to see how that might work, take a look at my first stab at an implementation for a new memory manager here:
https://review.source.android.com/#change,22239
I was waiting to post that until I had time to put together some cogent documentation for it, but it sounds like its time has come. Comments are welcome there or by email. Keep in mind it's intended to implement the basic api without the machinery to manage the related hardware, and it's still a work in progress.
Rebecca
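A userspace sketch of what "promote to an fd only while it is being passed" could look like; the ioctl name and struct here are invented and are not the actual API under review:

#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>

int send_fd(int sock, int fd);   /* SCM_RIGHTS helper as sketched earlier */

struct buf_to_fd {
        uint32_t handle;   /* allocator-local buffer handle */
        int32_t  fd;       /* out: temporary fd wrapping the buffer */
};

#define BUFIOC_HANDLE_TO_FD _IOWR('b', 0x20, struct buf_to_fd)

static int share_buffer(int mm_fd, int sock, uint32_t handle)
{
        struct buf_to_fd req = { .handle = handle };
        int ret;

        if (ioctl(mm_fd, BUFIOC_HANDLE_TO_FD, &req) < 0)
                return -1;
        ret = send_fd(sock, req.fd);
        close(req.fd);   /* the receiver's copy keeps the buffer alive */
        return ret;
}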
On Wed, Apr 20, 2011 at 1:48 PM, Dave Airlie airlied@gmail.com wrote:
- Efficient lookup in system calls
- Established ways to pass them around
- Easy to mmap() -- if you want to map anything into user space, you
need a file descriptor anyway
This isn't a great boon as we've seen with GEM and TTM.
The problem for GEM was we used shmem to back the fds, so the mmap was a cached mapping, now when we had alternate mapping available such as via the GTT we had to go and add code to mmap via that drm file descriptor. Though probably could have gotten around that by changing the mmap backing away from shmem.
Dave.
Please excuse my naivety, but won't SurfaceFlinger start hitting the same "I need more than 1024 file descriptors" issue as the X-server at some point? I guess most current Android devices don't have 100s of applications running at the same time, but it doesn't seem completely inconceivable they might in the future? I also seem to remember reading something about binder also using file descriptors to pass buffers between processes, so it's potentially not just graphics buffers?
On the same point, can't the X-server/SurfaceFlinger run as a user which has the file descriptor ulimit raised above 1024? Or is the 1024 a more fundamental limit/implementation constraint?
Cheers,
Tom
On Thursday 21 April 2011, Tom Cooksey wrote:
Please excuse my naivety, but won't SurfaceFlinger start hitting the same "I need more than 1024 file descriptors" issue as the X-server at some point? I guess most current Android devices don't have 100s of applications running at the same time, but it doesn't seem completely inconceivable they might in the future? I also seem to remember reading something about binder also using file descriptors to pass buffers between processes, so it's potentially not just graphics buffers?
On the same point, can't the X-server/SurfaceFlinger run as a user which has the file descriptor ulimit raised above 1024? Or is the 1024 a more fundamental limit/implementation constraint?
There are two limits to consider here:
1. the ulimit you mentioned, this can be set by any process with CAP_SYS_RESOURCE, and gets inherited by its children.
2. The libc implementation specific FD_SET_SIZE, which limits the highest file descriptor number that can be passed into the select system call.
I believe in case of Android, neither of these is a serious problem. For 1, this means giving a high ulimit to the SurfaceFlinger, as you said. For 2, setting the libc FD_SET_SIZE higher is one option (as long as you don't require binary compatibility with old apps that dynamically link to libc), another option is to replace all remaining calls to select() in SurfaceFlinger with calls to epoll(), which is probably used already because it is more efficient.
In case of glibc, changing FD_SET_SIZE is not an option, and running the X server with a higher ulimit is also not desired (we want to move to running it from regular users instead of root), but there are probably ways to work around that.
Arnd
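In code, "giving a high ulimit to the SurfaceFlinger" is simply a setrlimit() call made early by a process holding CAP_SYS_RESOURCE, before it spawns anything that should inherit the limit:

#include <sys/resource.h>

static int raise_fd_limit(rlim_t wanted)
{
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
                return -1;
        rl.rlim_cur = wanted;
        if (rl.rlim_max < wanted)
                rl.rlim_max = wanted;   /* raising the hard limit needs CAP_SYS_RESOURCE */
        return setrlimit(RLIMIT_NOFILE, &rl);
}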
On Thu, 21 Apr 2011 13:37:57 +0200 Arnd Bergmann arnd@arndb.de wrote:
On Thursday 21 April 2011, Tom Cooksey wrote:
Please excuse my naivety, but won't SurfaceFlinger start hitting the same "I need more than 1024 file descriptors" issue as the X-server at some point? I guess most current Android devices don't have 100s of applications running at the same time, but it doesn't seem completely inconceivable they might in the future? I also seem to remember reading something about binder also using file descriptors to pass buffers between processes, so it's potentially not just graphics buffers?
On the same point, can't the X-server/SurfaceFlinger run as a user which has the file descriptor ulimit raised above 1024? Or is the 1024 a more fundamental limit/implementation constraint?
There are two limits to consider here:
1. the ulimit you mentioned, this can be set by any process with CAP_SYS_RESOURCE, and gets inherited by its children.
2. The libc implementation specific FD_SET_SIZE, which limits the highest file descriptor number that can be passed into the select system call.
I believe in case of Android, neither of these is a serious problem. For 1, this means giving a high ulimit to the SurfaceFlinger, as you said. For 2, setting the libc FD_SET_SIZE higher is one option (as long as you don't require binary compatibility with old apps that dynamically link to libc), another option is to replace all remaining calls to select() in SurfaceFlinger with calls to epoll(), which is probably used already because it is more efficient.
In case of glibc, changing FD_SET_SIZE is not an option, and running the X server with a higher ulimit is also not desired (we want to move to running it from regular users instead of root), but there are probably ways to work around that.
Client apps also have to worry about the fd count, since depending on the app and object caching policy it's very easy to get over 1024 objects. But the solutions above may work for that case as well; I don't expect many apps rely on select(), and those that do can fairly easily be converted.
On Thursday 21 April 2011, Jesse Barnes wrote:
Client apps also have to worry about the fd count, since depending on the app and object caching policy it's very easy to get over 1024 objects. But the solutions above may work for that case as well; I don't expect many apps rely on select(), and those that do can fairly easily be converted.
Hmm, I think client apps are a much harder problem, because they are not under a central control. Note that there may be libraries using select(), so you would not be able to link to those, and you'd also need to solve the ulimit problem here.
Arnd
On Thu, 21 Apr 2011 17:04:58 +0200 Arnd Bergmann arnd@arndb.de wrote:
On Thursday 21 April 2011, Jesse Barnes wrote:
Client apps also have to worry about the fd count, since depending on the app and object caching policy it's very easy to get over 1024 objects. But the solutions above may work for that case as well; I don't expect many apps rely on select(), and those that do can fairly easily be converted.
Hmm, I think client apps are a much harder problem, because they are not under a central control. Note that there may be libraries using select(), so you would not be able to link to those, and you'd also need to solve the ulimit problem here.
For existing apps especially that's an issue. Lots of apps use external libs via the NDK.
But there's another option for those: put the unified mm fds in "high" fd space. I think there are patches floating around to do that on glibc and Linux, which makes things easier in general for libraries that want to allocate their own fds, and other C libraries could implement something similar, assuming we had kernel support.
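One approximation of "high fd space" that already exists in userspace is F_DUPFD, which returns the lowest free descriptor at or above a given minimum, so a library can park its buffer fds above the range that select()-based code touches (still subject to RLIMIT_NOFILE, of course):

#include <fcntl.h>
#include <unistd.h>

static int move_fd_high(int fd, int min)
{
        int high = fcntl(fd, F_DUPFD, min);   /* e.g. min = 4096 */

        if (high < 0)
                return fd;   /* keep the original descriptor on failure */
        close(fd);
        return high;
}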
-----Original Message----- From: linaro-mm-sig-bounces@lists.linaro.org [mailto:linaro-mm-sig- bounces@lists.linaro.org] On Behalf Of Jesse Barnes Sent: 21 April 2011 16:00 To: Arnd Bergmann Cc: linaro-mm-sig@lists.linaro.org Subject: Re: [Linaro-mm-sig] Memory Management Discussion
On Thu, 21 Apr 2011 13:37:57 +0200 Arnd Bergmann arnd@arndb.de wrote:
On Thursday 21 April 2011, Tom Cooksey wrote:
Please excuse my naivety, but won't SurfaceFlinger start hitting the same "I need more than 1024 file descriptors" issue as the X-server at some point? I guess most current Android devices don't have 100s of applications running at the same time, but it doesn't seem completely inconceivable they might in the future? I also seem to remember reading something about binder also using file descriptors to pass buffers between processes, so it's potentially not just graphics buffers?
On the same point, can't the X-server/SurfaceFlinger run as a user which has the file descriptor ulimit raised above 1024? Or is the 1024 a more fundamental limit/implementation constraint?
There are two limits to consider here:
1. the ulimit you mentioned, this can be set by any process with CAP_SYS_RESOURCE, and gets inherited by its children.
2. The libc implementation specific FD_SET_SIZE, which limits the highest file descriptor number that can be passed into the select system call.
I believe in case of Android, neither of these is a serious problem. For 1, this means giving a high ulimit to the SurfaceFlinger, as you said. For 2, setting the libc FD_SET_SIZE higher is one option (as long as you don't require binary compatibility with old apps that dynamically link to libc), another option is to replace all remaining calls to select() in SurfaceFlinger with calls to epoll(), which is probably used already because it is more efficient.
In case of glibc, changing FD_SET_SIZE is not an option, and running the X server with a higher ulimit is also not desired (we want to move to running it from regular users instead of root), but there are probably ways to work around that.
Client apps also have to worry about the fd count, since depending on the app and object caching policy it's very easy to get over 1024 objects. But the solutions above may work for that case as well; I don't expect many apps rely on select(), and those that do can fairly easily be converted.
Though presumably not if you only have an fd for buffers you want to share with another process or device? I think the common case is you don't want to share a texture or command buffer or whatever with another process, so most of the objects don't need an fd?
Cheers,
Tom
On Thu, 21 Apr 2011 16:09:46 +0100 Tom Cooksey Tom.Cooksey@arm.com wrote:
Client apps also have to worry about the fd count, since depending on the app and object caching policy it's very easy to get over 1024 objects. But the solutions above may work for that case as well; I don't expect many apps rely on select(), and those that do can fairly easily be converted.
Though presumably not if you only have an fd for buffers you want to share with another process or device? I think the common case is you don't want to share a texture or command buffer or whatever with another process, so most of the objects don't need an fd?
Oh sure, if you don't actually allocate fds for the objects, then there's no issue. But I think the ION proposal created fds for mapped objects as well? That count could get pretty high...
On Thu, Apr 21, 2011 at 8:14 AM, Jesse Barnes jbarnes@virtuousgeek.orgwrote:
On Thu, 21 Apr 2011 16:09:46 +0100 Tom Cooksey Tom.Cooksey@arm.com wrote:
Client apps also have to worry about the fd count, since depending on the app and object caching policy it's very easy to get over 1024 objects. But the solutions above may work for that case as well; I don't expect many apps rely on select(), and those that do can fairly easily be converted.
Though presumably not if you only have an fd for buffers you want to share with another process or device? I think the common case is you don't want to share a texture or command buffer or whatever with another process, so most of the objects don't need an fd?
Oh sure, if you don't actually allocate fds for the objects, then there's no issue. But I think the ION proposal created fds for mapped objects as well? That count could get pretty high...
That's lazy programmer (me) wanting to be able to mmap at an offset and not wanting to implement all of mmap in an ioctl. Anyway I think you can map them and then close them, wouldn't that allow you to recycle the fd?
-- Jesse Barnes, Intel Open Source Technology Center
On Thu, 21 Apr 2011 11:50:25 -0700 Rebecca Schultz Zavin rebecca@android.com wrote:
On Thu, Apr 21, 2011 at 8:14 AM, Jesse Barnes jbarnes@virtuousgeek.orgwrote:
On Thu, 21 Apr 2011 16:09:46 +0100 Tom Cooksey Tom.Cooksey@arm.com wrote:
Client apps also have to worry about the fd count, since depending on the app and object caching policy it's very easy to get over 1024 objects. But the solutions above may work for that case as well; I don't expect many apps rely on select(), and those that do can fairly easily be converted.
Though presumably not if you only have an fd for buffers you want to share with another process or device? I think the common case is you don't want to share a texture or command buffer or whatever with another process, so most of the objects don't need an fd?
Oh sure, if you don't actually allocate fds for the objects, then there's no issue. But I think the ION proposal created fds for mapped objects as well? That count could get pretty high...
That's lazy programmer (me) wanting to be able to mmap at an offset and not wanting to implement all of mmap in an ioctl. Anyway I think you can map them and then close them, wouldn't that allow you to recycle the fd?
Yeah, I think that would work fine.
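The reason the map-then-close trick works is that a mapping created with mmap() stays valid after the descriptor backing it is closed, so the fd number can be recycled straight away. A minimal sketch, with /dev/zero standing in for whatever fd the buffer allocator would actually hand back:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 4096;

        /* /dev/zero is only a stand-in for an allocator-provided fd. */
        int fd = open("/dev/zero", O_RDWR);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE, fd, 0);
        close(fd);               /* the fd number is free for reuse now */
        if (buf == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        memset(buf, 0xaa, len);  /* the mapping is still usable */
        munmap(buf, len);
        return 0;
    }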
On Thu, Apr 21, 2011 at 3:58 AM, Tom Cooksey Tom.Cooksey@arm.com wrote:
Please excuse my naivety, but won’t SurfaceFlinger start hitting the same “I need more than 1024 file descriptors” issue as the X-server at some point? I guess most current Android devices don’t have 100s of applications running at the same time, but it doesn’t seem completely inconceivable they might in the future? I also seem to remember reading something about binder also using file descriptors to pass buffers between processes, so it’s potentially not just graphics buffers?
The binder uses its own method for IPC between processes, and sometimes shares memory between processes that way.
On the same point, can’t the X-server/SurfaceFlinger run as a user which has the file descriptor ulimit raised above 1024? Or is the 1024 a more fundamental limit/implementation constraint?
I don't think that would be a problem. Also I don't think the Android stack uses select for anything.
Rebecca
Cheers,
Tom
From: Rebecca Schultz Zavin (via linaro-mm-sig-bounces@lists.linaro.org)
Sent: 20 April 2011 22:32
To: Dave Airlie
Cc: linaro-mm-sig@lists.linaro.org; Arnd Bergmann
Subject: Re: [Linaro-mm-sig] Memory Management Discussion
I've been lurking on this thread all day since I've been underwater on something else, but I want to give some perspective on both the things we've done on Android in the past and where we are going in the future that's relevant to the fds vs. "cookies" discussion here. I've owned the "memory manager" component on pretty much every Android program the Android team has been involved with directly: Qualcomm's pmem was written by me, NVIDIA's nvmap was heavily modified, many of TI's things flowed through my fingers, etc. As many of you know, we're also working on our own new solution. Yes, I know we don't need one more from Google; really, you already have several from Google, and we're trying to replace them with one that solves all the problems the others do.
As Arnd has pointed out, file descriptors give us a lot of things we desperately need for free, especially lifetime management. This is probably the single biggest source of bugs around multimedia and graphics. Moving forward, we are going to make it a requirement that buffers exist as file descriptors while they are in userspace, so they are automatically reference-counted correctly when they are passed between processes. Any solution that doesn't do that will not be compatible with Android's next-generation gralloc API.
That being said, I think all the objections to file descriptors can easily be worked around. My proposal is to only "promote" buffers into file descriptors while they are being passed or mmaped. This way, we can easily manage the number of fds required in the system. If you want to see how that might work, take a look at my first stab at an implementation for a new memory manager here:
https://review.source.android.com/#change,22239
I was waiting to post that until I had time to put together some cogent documentation for it, but it sounds like its time has come. Comments are welcome there or by email. Keep in mind it's intended to implement the basic API without the machinery to manage the related hardware, and it's still a work in progress.
Rebecca
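To illustrate the "promote on demand" idea in the post above: buffers would normally be identified by a cheap driver-local handle and would only gain a file descriptor at the moment they are mmaped or passed to another process, after which the fd can be closed again. The sketch below is purely hypothetical; the device, ioctl names and struct layouts are invented for illustration and are not the API in the change linked above.

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Hypothetical ioctls, invented for illustration only. */
    struct buf_alloc_arg  { uint64_t len; uint32_t handle; };  /* out: handle */
    struct buf_export_arg { uint32_t handle; int fd; };        /* out: fd     */
    #define BUF_IOC_ALLOC  _IOWR('M', 0, struct buf_alloc_arg)
    #define BUF_IOC_EXPORT _IOWR('M', 1, struct buf_export_arg)

    void *map_shared_buffer(int dev_fd, uint64_t len, uint32_t *handle_out)
    {
        /* Allocation returns only a small integer handle: no fd is
         * consumed for the common case of a buffer that is never shared. */
        struct buf_alloc_arg alloc = { .len = len };
        if (ioctl(dev_fd, BUF_IOC_ALLOC, &alloc) < 0)
            return NULL;
        *handle_out = alloc.handle;

        /* Only when the buffer has to be mapped (or passed to another
         * process) is it "promoted" to a file descriptor... */
        struct buf_export_arg exp = { .handle = alloc.handle };
        if (ioctl(dev_fd, BUF_IOC_EXPORT, &exp) < 0)
            return NULL;

        void *ptr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_SHARED, exp.fd, 0);

        /* ...and the fd is dropped again once it has served its purpose,
         * keeping the per-process fd count low. */
        close(exp.fd);
        return ptr == MAP_FAILED ? NULL : ptr;
    }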
On Wed, Apr 20, 2011 at 1:48 PM, Dave Airlie airlied@gmail.com wrote:
- Efficient lookup in system calls
- Established ways to pass them around
- Easy to mmap() -- if you want to map anything into user space, you need a file descriptor anyway
This isn't a great boon as we've seen with GEM and TTM.
The problem for GEM was that we used shmem to back the objects, so the mmap was a cached mapping; later, when we had alternate mappings available, such as via the GTT, we had to go and add code to mmap via the drm file descriptor instead. Though we probably could have gotten around that by changing the mmap backing away from shmem.
Dave.
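As a concrete illustration of the "mmap via that drm file descriptor" path: on i915 the GTT-mmap ioctl hands back a fake offset into the DRM device node, and that offset is then passed to an ordinary mmap() on the DRM fd rather than on a per-object fd. A rough sketch (error handling omitted; the include path may differ depending on the libdrm installation):

    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <drm/i915_drm.h>

    void *map_through_gtt(int drm_fd, unsigned long size)
    {
        /* Create a GEM object; the kernel returns a handle in create.handle. */
        struct drm_i915_gem_create create = { .size = size };
        ioctl(drm_fd, DRM_IOCTL_I915_GEM_CREATE, &create);

        /* Ask for the fake mmap offset of the GTT view of that object. */
        struct drm_i915_gem_mmap_gtt mmap_arg = { .handle = create.handle };
        ioctl(drm_fd, DRM_IOCTL_I915_GEM_MMAP_GTT, &mmap_arg);

        /* The mapping is made on the DRM device fd itself, at the fake
         * offset, not on a separate per-object fd. */
        return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                    drm_fd, mmap_arg.offset);
    }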
On Wednesday 20 April 2011, Rebecca Schultz Zavin wrote:
I've been lurking on this thread all day since I've been underwater on something else, but I want to give some perspective on both the things we've done on Android in the past and where we are going in the future that's relevant to the fds vs. "cookies" discussion here. I've owned the "memory manager" component on pretty much every Android program the Android team has been involved with directly: Qualcomm's pmem was written by me, NVIDIA's nvmap was heavily modified, many of TI's things flowed through my fingers, etc. As many of you know, we're also working on our own new solution. Yes, I know we don't need one more from Google; really, you already have several from Google, and we're trying to replace them with one that solves all the problems the others do.
I think people have stopped being bothered by code in the Android kernel that doesn't go upstream and have accepted the fact that each release is just a fork of Linux that gets thrown away with the next release.
IMO it's a very welcome change that you are now sharing your experience and work together with us. If we get it right this time, we might end up with an allocator that works for both Linux and Android.
That being said, I think all the objections to file descriptors can easily be worked around. My proposal is to only "promote" buffers into file descriptors while they are being passed or mmaped. This way, we can easily manage the number of fds required in the system. If you want to see how that might work, take a look at my first stab at an implementation for a new memory manager here:
https://review.source.android.com/#change,22239
I was waiting to post that until I had time to put together some cogent documentation for it, but it sounds like its time has come. Comments are welcome there or by email. Keep in mind it's intended to implement the basic API without the machinery to manage the related hardware, and it's still a work in progress.
The user interface looks nice and simple. Please go ahead and post the patch for review, even without the documentation.
My main concern with the code is that it tries to do something very generic, but is still linked very closely to a single device driver, which makes it unnecessarily hard to do a second driver using the same code.
Integrating the buffer management into DRM, next to GEM, would probably be one way to handle that more nicely if you find that GEM cannot do what you need. If DRM is found to be completely useless for embedded graphics, I think it would still be better to have a new framework for these and plug the ION driver into that, rather than having the buffer management code inside of that one driver.
Arnd
Hi, Sorry if I'm covering any ground that you guys have already gone over; I thought this list was just continuing to be quiet, but turns out it was all landing in my spam folder ...
On Wed, Apr 20, 2011 at 01:23:18PM +0100, Tom Cooksey wrote:
There's a big difference between GEM and what we're trying to do. GEM is designed to manage _all_ buffers used by the graphics hardware. What I believe we're trying to do is only provide a manager which allows buffers to be shared between devices. Of all the buffers and textures a GPU needs to access, only a tiny fraction of them need to be shared between devices and userspace processes. How large that fraction is I don't know, it might still be approaching the 1024 limit, but I doubt it...
I can see this being problematic from an upstream point of view. We already have two memory managers in the kernel (GEM and TTM, even if TTM is thankfully built on top of GEM); adding another graphics memory management framework for this limited usecase while explicitly requiring the driver to support or implement another sounds like it will get Linus upset at us all again[0]. And, given the parameters, it sounds very much like the intention is to let everyone keep their own custom memory managers and not bother with GEM or TTM.
Bear in mind that I'm not saying this is a terrible idea / everyone should port to GEM right now / we should design an amazing overarching memory management and allocation API that handles literally every usecase anyone can think of. I'm just saying that upstreaming it might be more difficult than you'd think.
So, the buffers we're interested in sharing between different processes and devices are:
- Decoded video buffers (from both cameras & video decoders)
- Window back-buffers
- System-wide theme textures and font glyph caches
.... Anyone know of other candidates?
Well, in the Wayland case, window frontbuffers as well. And note that 'window' covers a lot more than what you might think -- panels, your desktop background, system tray icons (although these are, for the most part, not on the critical path for GPU performance), the date & time widget in the panel, etc, etc. And all the application windows you'd think of too. And sometimes their subwindows (cf. XEMBED, and non-WMODE NSAPI browser plugins).
So I guess you really need to replace 'window back-buffers' with 'anything the compositor will ever need to address', because having to block, synchronise, flush, etc., to pull just one unaddressable surface out of X in your compositor is going to be unbelievably painful.
I guess the bottleneck will probably be the window compositor, as it will need to have a reference to all window back buffers in the system. So, do any of the DRI folks have any idea how many windows (with back buffers) a typical desktop session has?
Well, define 'typical', I guess? My grandparents have Nautilus background + two panels + Chromium + Empathy contact list + Empathy chat. A graphic designer with ADHD probably has four thousand GIMP windows, a billion Chromium windows, a billion Empathy chat windows, etc.
Do minimised windows have back buffers allocated in X11?
In a composited environment, most window managers/compositors choose to keep minimised windows redirected (so, the answer being yes) so they can continue to update the icon previews and so on.
Cheers, Daniel
[0]: He's currently most upset at both ARM and GPU/DRM guys, the former for excessive duplication of subsystems when they could be shared. So, this may rub him the wrong way, coming from ARM GPU guys. :)
On 04/20/2011 12:55 AM, Daniel Vetter wrote:
On Wed, Apr 20, 2011 at 8:25 AM, Arnd Bergmannarnd@arndb.de wrote:
On Wednesday 20 April 2011 03:52:53 Clark, Rob wrote:
From DRI perspective.. I guess the global buffer name is restricted to a 4 byte integer, unless you change the DRI proto..
I still like the idea of using file handles to pass the buffers between kernel subsystems. Maybe there could be an ioctl to encapsulate a buffer from DRI in a file so we can give it to another subsystem, and/or an ioctl to register a buffer from a file handle with DRI.
That was the original design of GEM, i.e. using fd handles (and perhaps even passing them around in unix domain sockets). There's one small problem with that approach: you quickly run out of fds with the Linux default limit of 1024. Hence the roll-your-own approach.
Aside: I'll be participating as a GEM drm/i915 hacker. I'll send a short overview of how GEM/KMS tackles these problems after Easter, because our approach is rather different from what the ARM community seems to want (as far as I can tell).
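For completeness, passing an fd between processes over a unix domain socket uses SCM_RIGHTS ancillary data. A minimal sketch, assuming 'sock' is an already-connected AF_UNIX socket (error handling and the receiving side are omitted):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    int send_buffer_fd(int sock, int buf_fd)
    {
        char dummy = 'x';
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };

        union {
            struct cmsghdr align;
            char buf[CMSG_SPACE(sizeof(int))];
        } u;
        memset(&u, 0, sizeof(u));

        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
        };

        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type  = SCM_RIGHTS;
        cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &buf_fd, sizeof(int));

        /* The kernel duplicates buf_fd into the receiver, which picks it
         * up with recvmsg() and CMSG_DATA() on its side. */
        return sendmsg(sock, &msg, 0);
    }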
I don't think the goals and aspirations of both APIs are really that different when you get down to it. I think the biggest concern that most ARM vendors have is that GEM is tied to DRM and KMS in spirit, and DRM/KMS/GEM as a whole is pretty scary. If you look at any one of our implementations the GEM wheel gets re-invented a lot, so there is a lot of overlap and a chance to collaborate.
Jordan
On Wed, Apr 20, 2011 at 1:25 AM, Arnd Bergmann arnd@arndb.de wrote:
On Wednesday 20 April 2011 03:52:53 Clark, Rob wrote:
From DRI perspective.. I guess the global buffer name is restricted to a 4 byte integer, unless you change the DRI proto..
I still like the idea of using file handles to pass the buffers between kernel subsystems. Maybe there could be an ioctl to encapsulate a buffer from DRI in a file so we can give it to another subsystem, and/or an ioctl to register a buffer from a file handle with DRI.
I know GEM explicitly avoided fds, but I think if it is possible to create buffers with a non-fd-based handle, and then later, when it is decided the buffer needs to be shared, create an fd (i.e. instead of DRM_IOCTL_GEM_FLINK), then I guess it should be ok.
(I guess in the end, Robert is correct that we can rev the DRI protocol if needed, although I think it should be avoided unless there is a good reason.)
BR, -R
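A sketch of what the export/import pair suggested here could look like from userspace. The device, ioctl numbers and struct layouts are invented purely for illustration; this is not an existing DRM interface, just the shape of the idea (wrap a local handle in an fd on one side, register the received fd for a handle on the other).

    #include <stdint.h>
    #include <sys/ioctl.h>

    /* Hypothetical ioctls, named only for this example. */
    struct gem_fd_export_arg { uint32_t handle; int fd; };   /* out: fd     */
    struct gem_fd_import_arg { int fd; uint32_t handle; };   /* out: handle */
    #define GEM_IOC_EXPORT_FD _IOWR('G', 0, struct gem_fd_export_arg)
    #define GEM_IOC_IMPORT_FD _IOWR('G', 1, struct gem_fd_import_arg)

    /* Producer: turn a driver-local handle into an fd that can be sent to
     * another process (e.g. with SCM_RIGHTS) or handed to another driver. */
    int export_buffer(int dev_fd, uint32_t handle)
    {
        struct gem_fd_export_arg arg = { .handle = handle };
        if (ioctl(dev_fd, GEM_IOC_EXPORT_FD, &arg) < 0)
            return -1;
        return arg.fd;
    }

    /* Consumer: register the received fd and get back a handle usable with
     * that device's own submission ioctls. */
    int import_buffer(int dev_fd, int buf_fd, uint32_t *handle_out)
    {
        struct gem_fd_import_arg arg = { .fd = buf_fd };
        if (ioctl(dev_fd, GEM_IOC_IMPORT_FD, &arg) < 0)
            return -1;
        *handle_out = arg.handle;
        return 0;
    }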