Memory Management Mini-Summit
Linaro Developer Summit, Budapest, May 9-11, 2011
=================================================
Hi all. Apologies for this report being so long in coming. I know others have thrown in their perceptions and opinions on how the mini-summit went, so I suppose it's my turn.
Outcomes:
---------

 * Approach (full proposal under draft, to be sent to the lists below)
    - Modified CMA for additional physically contiguous buffer support.
    - dma-mapping API changes, enhancements and ARM architecture support.
    - "struct dma_buf" based buffer sharing infrastructure with support from device drivers.
    - Pick any "low-hanging fruit" with respect to consolidation (supporting the ARM arch/sub-arch goals).
 * Proposal for work around allocation, mapping and buffer sharing to be announced on:
    - dri-devel
    - linux-arm-kernel
    - linux-kernel
    - linux-media
    - linux-mm
    - linaro-mm-sig
 * Communication and Meetings
    - New IRC channel #linaro-mm-sig for meetings and general communication between those working on and interested in these topics (already created).
    - IRC meetings will be weekly, with an option for the constituency to decide on the ultimate frequency (logs to be emailed to the linaro-mm-sig list).
    - Linaro can provide wiki services and any dial-in needed.
    - Next face-to-face meetings:
       . Linaro mid-cycle summit (August 1-5, see https://wiki.linaro.org/Events/2011-08-LDS)
       . Linux Plumbers Conference (September 7-9, see http://www.linuxplumbersconf.org/2011/ocw/proposals/567)
       . V4L2 brainstorm meeting (Hans Verkuil to update with details)
Overview and Goals for the 3 days:
----------------------------------

 * Day 1 - Component overviews, expected to spill over into day 2
 * Day 2 - Concrete use case that outlines a definition of the problem that we are trying to solve, and shows that we have solved it.
 * Day 3 - Dig into the lower level details of the current implementations. What do we have, what's missing, what's not implemented for ARM.
This is about memory management, zero-copy pipelines, kernel/userspace interfaces, memory reservations and much more :-) In particular, what we would like to end up with is:
 * Understand who is working on what; avoid work duplication.
 * Focus on a specific problem we want to solve and discuss possible solutions.
 * Come up with a plan to fix this specific problem.
 * Start enumerating work items that the Linaro Graphics WG can work on in this cycle.
Day 1:
------

The first day got off to a little bit of a stutter start as the summit scheduler would not let us indicate that our desired starting time was immediately after lunch, during the plenaries. However, that didn't stop people from flocking to the session in droves. By the time I made the kickoff comments on why we were there, and what we were there to accomplish (see "Overview and Goals for the 3 days" above), we had brought in an extra 10 chairs and there were people on the floor and spilling out into the hallway.
Based upon our experiences from the birds-of-a-feather at the Embedded Linux Conference, 2 things dominated day 1. First things first, I assigned someone to take notes ;-). Etherpad made it really easy for people to take notes collectively, including those participating remotely, and for everyone to see who was writing what, but we definitely needed someone whose focus would be capturing the proceedings, so thanks to Dave Rusling for shouldering that burden. The second thing was that we desperately needed an education in each other's components and subsystems. Without this, we would risk missing significant areas of discussion, or possibly even be violently agreeing on something without realizing it. So, we started with a series of component overviews. These were presentations on the order of 20 minutes with some room for Q&A. On day 1, we had:
 * V4L2 - Hans Verkuil
 * DRM/GEM/KMS - Daniel Vetter
 * TTM - Thomas Hellstrom
 * CMA - Marek Szyprowski
 * VCMM - Zach Pfeffer
All of these (as well as the ones from day 2) are available through links on the mini-summit wiki (https://wiki.linaro.org/Events/2011-05-MM).
Day 2:
------

The second day got off to a somewhat better start than day 1, as we communicated the start time more clearly to everyone involved and forgot about the summit scheduler. We (conceptually) picked up where day 1 left off with one more component overview:
* UMP - Ketil Johnson
and covered the MediaController API for good measure. From there, we spent a fair amount of time discussing use cases to illustrate our problem space. We started (via pre-summit submissions) with a couple of variations on what amounted to basically the same thing. I think the actual case is probably best illustrated by the PDF slides from Sakari Ailus (see the link on the mini-summit wiki). Basically, we want to take a video input, either from a camera or from a file, decode it, process it, render to it and/or with it, and display it. These pipeline stages may be handled by hardware, by software on the CPU or by some combination of the two; each stage should accept a buffer from the previous stage and operate on it in some fashion (no copies wherever possible). It turned out that still image capture can actually be a more complicated version of this use case, but even something as simple as taking input through the camera and displaying it (image preview) can involve much of the underpinnings required to support the more complicated cases. We may indeed start with this simple case as a proof-of-concept.
Once we had the use case nailed down, we moved onto the actual components/subsystems that would need to share buffers in order for the use case to work properly with the zero-copy (or at least minimal-copy) requirement. We had:
 * DRM
 * V4L2
 * fbdev
 * ALSA
 * DSP
 * User-space (kind of all encompassing and could include things like OpenCL, which also makes an interesting use case).
 * DVB
 * Out-of-tree GPU drivers
We rounded out the day by discussing exactly what metadata we would want to track in order to enable the desired levels of sharing with simultaneous device mappings, cache management and other considerations (e.g., device peculiarities). What we came up with is a struct (we called it "dma_buf") that has the following info:
 * Size
 * Creator/Allocator
 * Attributes:
    - sharable?
    - contiguous?
    - device-local?
 * Reference count
 * Pinning reference count
 * CPU cache management data
 * Device private data (e.g., quirky tiling modes)
 * Scatter list
 * Synchronization data (for managing in-flight device transactions)
 * Mapping data
These last few (device privates through mapping data) are lists of data, one for each device that has a mapping of the buffer. The mapping data is nominally an address and per-device cache management data. We actually got through this part fairly quickly. The biggest part of the discussion was what to use for handles/identifiers in the buffer sharing scheme. The discussion was between global identifiers like GEM uses, or file descriptors as favored by Android. Initially, there was an informal consensus around unique IDs, though it was not a definitive decision (yet). The atomicity of passing file descriptors between processes makes them quite attractive for the task.
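To make the metadata list above a little more concrete, here is a minimal user-space sketch of how such a structure might be laid out. It is only an illustration: every name in it (sketch_dma_buf, sketch_dev_mapping, sketch_buf_attach and so on) is hypothetical, the field types are guesses, and nothing here is kernel code or the layout that was agreed at the summit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scatter list entry: one physically contiguous chunk. */
struct sketch_sg_entry {
    uint64_t phys_addr;
    size_t length;
};

/* Hypothetical per-device record; one exists for each device mapping the buffer. */
struct sketch_dev_mapping {
    const char *dev_name;             /* device that holds this mapping */
    void *dev_priv;                   /* device private data, e.g. a tiling mode */
    const struct sketch_sg_entry *sg; /* scatter list for this device */
    size_t sg_count;
    bool in_flight;                   /* stand-in for synchronization data */
    uint64_t dev_addr;                /* mapping data: address seen by the device */
    bool dev_cache_dirty;             /* mapping data: per-device cache state */
    struct sketch_dev_mapping *next;  /* next device in the list */
};

/* Illustrative layout only; not an actual kernel structure. */
struct sketch_dma_buf {
    size_t size;
    const char *allocator;            /* creator/allocator */
    bool sharable;
    bool contiguous;
    bool device_local;
    unsigned int refcount;
    unsigned int pin_count;           /* pinning reference count */
    bool cpu_cache_dirty;             /* stand-in for CPU cache management data */
    struct sketch_dev_mapping *mappings;
};

/* Initialise a buffer record; the creator holds the first reference. */
void sketch_buf_init(struct sketch_dma_buf *buf, size_t size, const char *allocator)
{
    assert(buf != NULL && size > 0);
    buf->size = size;
    buf->allocator = allocator;
    buf->sharable = true;
    buf->contiguous = false;
    buf->device_local = false;
    buf->refcount = 1;
    buf->pin_count = 0;
    buf->cpu_cache_dirty = false;
    buf->mappings = NULL;
}

/* Link a device mapping into the per-device list and take a reference. */
void sketch_buf_attach(struct sketch_dma_buf *buf, struct sketch_dev_mapping *map)
{
    assert(buf != NULL && map != NULL);
    map->next = buf->mappings;
    buf->mappings = map;
    buf->refcount++;
}

/* Count how many devices currently have a mapping of the buffer. */
size_t sketch_buf_mapping_count(const struct sketch_dma_buf *buf)
{
    size_t n = 0;
    for (const struct sketch_dev_mapping *m = buf->mappings; m != NULL; m = m->next)
        n++;
    return n;
}

/* Pinning is counted separately from ordinary references. */
void sketch_buf_pin(struct sketch_dma_buf *buf)
{
    buf->pin_count++;
}

/* A buffer could only be migrated while nobody has it pinned. */
bool sketch_buf_can_move(const struct sketch_dma_buf *buf)
{
    return buf->pin_count == 0;
}
```

Keeping the pinning count apart from the plain reference count is what would let an allocator distinguish "someone still knows about this buffer" from "this buffer must not be moved right now".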
Day 3:
------

By the third day, there was a sense of running out of time and really needing to ensure that we left with a reasonable set of outcomes (see the overview and goals section above). In short, we wanted to make sure that we had a plan/roadmap, a reasonably actionable set of tasks that could be picked up by Linaro engineers and community members alike, and that we would not only avoid duplicating new work, but also reduce some of the existing code duplication that got us to this point in the first place.
But, we weren't done. We still had to cover the requirements around allocation and explore the dma-mapping and IOMMU APIs.
This took most of the day, but was quite a fruitful set of discussions. As with the rest of the discussions, we focused on leveraging existing technologies as much as possible. With allocations, however, this wasn't entirely possible as we have devices on ARM SoCs that do not have an IOMMU and require physically contiguous buffers in order to operate. After a fair amount of discussion, it was decided that a modified version of the current CMA (see Marek's slides linked from the wiki) would be the way forward. It assumes the pages are movable and manages them and not the mappings. There was concern that the API didn't quite fit with other related APIs, so the changes from the current state will be around those details.
On the mapping side, we focused on the dma-mapping API, layered on the IOMMU API where appropriate. Without going into crazy detail, we are looking at something like 4 implementations of the dma_map_ops functions for ARM: with and without IOMMU, with and without bounce buffer (these last two exist, but not using the dma_map_ops API). Marek has put out patches for comment on the IOMMU-based implementation of this, based upon work he had in progress. Also in the area of dma_map_ops, the sync-related APIs need a start address and offset, and the alloc and free need attribute parameters like map and unmap already have (to support cacheable/coherent/write-combined). In the "not involving dma_map_ops" category, we have a couple of changes that are likely to be non-trivial (not that any of the other proposed work is). It was proposed to modify (actually, the word thrown about in the discussions was "fix") dma_alloc_coherent for ARM to support unmapping from the kernel linear mapping and the use of HIGHMEM; two separate implementations, configured at build-time. And, last but not least, there was a fair amount of concern over the cache management API and its ability to live cleanly with the IOMMU code and to resist breakage from other architecture implementations.
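The idea of several interchangeable implementations behind one mapping interface can be sketched in plain user-space C as a table of function pointers selected per device. This is only a toy model: the names (sketch_map_ops, sketch_dma_map, sketch_direct_ops, sketch_iommu_ops) and the signatures are hypothetical and are not those of the kernel's dma_map_ops, only the with-IOMMU and without-IOMMU variants are shown (bounce buffers are left out), and the "IOMMU" merely hands out page-aligned device addresses from a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical attributes, mirroring cacheable/coherent/write-combined. */
enum sketch_attr {
    SKETCH_ATTR_COHERENT,
    SKETCH_ATTR_CACHEABLE,
    SKETCH_ATTR_WRITE_COMBINE
};

struct sketch_device;

/* Hypothetical ops table; each implementation fills in its own map function. */
struct sketch_map_ops {
    const char *name;
    uint64_t (*map)(struct sketch_device *dev, uint64_t phys, size_t len,
                    enum sketch_attr attr);
};

struct sketch_device {
    const struct sketch_map_ops *ops; /* implementation chosen for this device */
    uint64_t iova_next;               /* next free device address (IOMMU case only) */
};

/* Without an IOMMU the device simply sees the physical address. */
static uint64_t direct_map(struct sketch_device *dev, uint64_t phys, size_t len,
                           enum sketch_attr attr)
{
    (void)dev;
    (void)len;
    (void)attr;
    return phys;
}

/* With an IOMMU the device sees a remapped address, allocated in whole pages. */
static uint64_t iommu_map(struct sketch_device *dev, uint64_t phys, size_t len,
                          enum sketch_attr attr)
{
    uint64_t iova = dev->iova_next;
    size_t pages = (len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

    (void)phys;
    (void)attr;
    dev->iova_next += (uint64_t)pages * SKETCH_PAGE_SIZE;
    return iova;
}

const struct sketch_map_ops sketch_direct_ops = { "direct", direct_map };
const struct sketch_map_ops sketch_iommu_ops = { "iommu", iommu_map };

/* Callers use one entry point; the per-device table decides what happens. */
uint64_t sketch_dma_map(struct sketch_device *dev, uint64_t phys, size_t len,
                        enum sketch_attr attr)
{
    assert(dev != NULL && dev->ops != NULL && len > 0);
    return dev->ops->map(dev, phys, len, attr);
}
```

A driver written against sketch_dma_map would not need to know which table its device was given, which is the property that makes adding further variants (such as bounce buffering) possible without touching the callers.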
At this point, we reviewed what we had done and finalized the outcomes (see the outcomes section at the top). And, with half an hour to spare, I re-instigated the file descriptors versus unique identifiers discussion from day 2. I think file descriptors were winning by the end (especially after people started posting pointers to code samples of how to actually pass them between processes)...
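For readers who have not seen it done, the usual POSIX way to hand an open file descriptor to another process is an SCM_RIGHTS control message over a Unix domain socket. The sketch below is not one of the samples posted during the session; the helper names send_fd and recv_fd are made up for illustration, and error handling is kept to a minimum.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one open descriptor over a connected AF_UNIX socket; 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char payload = 'F'; /* one byte of ordinary data accompanies the control message */
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    union {
        struct cmsghdr align; /* forces correct alignment of the buffer */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS; /* "this message carries file descriptors" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor sent by send_fd; returns the new descriptor or -1. */
int recv_fd(int sock)
{
    char payload;
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    /* The kernel has installed a new descriptor in the receiving process. */
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets its own descriptor number referring to the same open file, and the transfer either happens in full or not at all, which is the atomicity that made descriptors attractive as buffer handles.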
Attendees:
----------

I will likely miss people here trying to list out everyone, especially given that some of the sessions were quite literally overflowing the room we were in. For as accurate an account of attendance as I can muster, check out the list of attendees on the mini-summit wiki page or the discussion blueprints we used for scheduling:
https://wiki.linaro.org/Events/2011-05-MM#Attendees
https://blueprints.launchpad.net/linaro-graphics-wg/+spec/linaro-graphics-me...
https://blueprints.launchpad.net/linaro-graphics-wg/+spec/linaro-graphics-me...
https://blueprints.launchpad.net/linaro-graphics-wg/+spec/linaro-graphics-me...
The occupants of the fishbowl (the front/center of the room in closest proximity to the microphones) were primarily:
Arnd Bergmann
Laurent Pinchart
Hans Verkuil
Mauro Chehab
Daniel Vetter
Sakari Ailus
Thomas Hellstrom
Marek Szyprowski
Jesse Barker
The IRC fishbowl seemed to consist of:
Rob Morell
Jordan Crouse
David Brown
There were certainly others both local and remote participating to varying degrees that I do not intend to omit, and a special thanks goes out to Joey Stanford for arranging a larger room for us on days 2 and 3 when we had people sitting on the floor and spilling into the hallway during day 1.