[Linaro-mm-sig] Memory Management Mini-Summit Report

1 Jun 2011


      Memory Management Mini-Summit
Linaro Developer Summit, Budapest, May 9-11, 2011
=================================================

Hi all.  Apologies for this report being so long in coming.  I know
others have thrown in their perceptions and opinions on how the
mini-summit went, so I suppose it's my turn.

Outcomes:
---------
 * Approach (full proposal under draft, to be sent to the lists below)
   - Modified CMA for additional physically contiguous buffer support.
   - dma-mapping API changes, enhancements and ARM architecture support.
   - "struct dma_buf" based buffer sharing infrastructure with support
     from device drivers.
   - Pick any "low-hanging fruit" with respect to consolidation
     (supporting the ARM arch/sub-arch goals).
 * Proposal for work around allocation, mapping and buffer sharing to
   be announced on:
   - dri-devel
   - linux-arm-kernel
   - linux-kernel
   - linux-media
   - linux-mm
   - linaro-mm-sig
 * Communication and Meetings
   - New IRC channel #linaro-mm-sig for meetings and general
     communication between those working on and interested in these
     topics (already created).
   - IRC meetings will be weekly, with an option for the constituency
     to decide on the ultimate frequency (logs to be emailed to the
     linaro-mm-sig list).
   - Linaro can provide wiki services and any dial-in needed.
   - Next face-to-face meetings:
     . Linaro mid-cycle summit (August 1-5, see
       https://wiki.linaro.org/Events/2011-08-LDS)
     . Linux Plumbers Conference (September 7-9, see
       http://www.linuxplumbersconf.org/2011/ocw/proposals/567)
     . V4L2 brainstorm meeting (Hans Verkuil to update with details)

Overview and Goals for the 3 days:
----------------------------------
 * Day 1 - Component overviews, expected to spill over into day 2.
 * Day 2 - Concrete use case that outlines a definition of the problem
   that we are trying to solve, and shows that we have solved it.
 * Day 3 - Dig into the lower level details of the current
   implementations.  What do we have, what's missing, what's not
   implemented for ARM.

This is about memory management, zero-copy pipelines, kernel/userspace
interfaces, memory reservations and much more  :-)

In particular, what we would like to end up with is:

 * Understand who is working on what; avoid work duplication.
 * Focus on a specific problem we want to solve and discuss possible
   solutions.
 * Come up with a plan to fix this specific problem.
 * Start enumerating work items that the Linaro Graphics WG can work
   on in this cycle.

Day 1:
------
The first day got off to a bit of a stuttering start, as the summit
scheduler would not let us indicate that our desired starting time was
immediately after lunch, during the plenaries.  However, that didn't
stop people from flocking to the session in droves.  By the time I
made the kickoff comments on why we were there, and what we were there
to accomplish (see "Overview and Goals for the 3 days" above), we had
brought in an extra 10 chairs and there were people on the floor and
spilling out into the hallway.

Based upon our experiences from the birds-of-a-feather at the Embedded
Linux Conference, two things dominated day 1.  First things first, I
assigned someone to take notes ;-).  Etherpad made it really easy for
people to take notes collectively, including those participating
remotely, and for everyone to see who was writing what, but we
definitely needed someone whose focus would be capturing the
proceedings, so thanks to Dave Rusling for shouldering that burden.
The second thing was that we desperately needed an education in each
other's components and subsystems.  Without this, we would risk missing
significant areas of discussion, or possibly even violently agreeing
on something without realizing it.  So, we started with a
series of component overviews.  These were presentations on the order
of 20 minutes with some room for Q&A.  On day 1, we had:

 * V4L2 - Hans Verkuil
 * DRM/GEM/KMS - Daniel Vetter
 * TTM - Thomas Hellstrom
 * CMA - Marek Szyprowski
 * VCMM - Zach Pfeffer

All of these (as well as the ones from day 2) are available through
links on the mini-summit wiki
(https://wiki.linaro.org/Events/2011-05-MM).

Day 2:
------
The second day got off to a somewhat better start than day 1, as we
more clearly communicated the start time to everyone involved, and
forgot about the summit scheduler.  We (conceptually) picked up where
day 1 left off with one more component overview:

 * UMP - Ketil Johnson

and covered the MediaController API for good measure.  From there, we
spent a fair amount of time discussing use cases to illustrate our
problem space.  We started (via pre-summit submissions) with a couple
of variations on what amounted to basically the same thing.  I think
the actual case is probably best illustrated by the pdf slides from
Sakari Ailus (see the link on the mini-summit wiki).  Basically, we
want to take a video input, either from a camera or from a file,
decode it, process it, render to it and/or with it and display it.
These pipeline stages may be handled by hardware, by software on the
CPU or some combination of the two; each stage should be handled by
accepting a buffer from the last stage and operating on it in some
fashion (no copies wherever possible).  It turned out that still image
capture can actually be a more complicated version of this use case,
but even something as simple as taking input through the camera and
displaying it (image preview) can involve much of the underpinnings
required to support the more complicated cases.  We may indeed start
with this simple case as a proof-of-concept.
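
As a rough illustration of the "no copies wherever possible" idea,
here is a minimal user-space sketch in C.  Every name in it (struct
frame, stage_fn, run_pipeline and the three stages) is made up for
this example, and the stages only poke at bytes; real stages would be
hardware blocks or drivers.  The only point is that each stage is
handed the same buffer the previous stage used, rather than a copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame buffer shared by every stage of the pipeline. */
struct frame {
    unsigned char *pixels;
    size_t len;
};

/* A stage operates on the frame in place and never allocates a copy. */
typedef void (*stage_fn)(struct frame *f);

/* Stand-in for capture/decode: fill the buffer with a simple ramp. */
static void stage_capture(struct frame *f)
{
    for (size_t i = 0; i < f->len; i++)
        f->pixels[i] = (unsigned char)i;
}

/* Stand-in for processing: invert every byte in place. */
static void stage_process(struct frame *f)
{
    for (size_t i = 0; i < f->len; i++)
        f->pixels[i] = (unsigned char)(255 - f->pixels[i]);
}

/* Stand-in for display: a real stage would scan the buffer out. */
static void stage_display(struct frame *f)
{
    assert(f->pixels != NULL);
}

/* The simple "image preview" case: capture, process, display. */
static const stage_fn preview_pipeline[] = {
    stage_capture, stage_process, stage_display,
};

/* Run each stage on the same buffer and return it, so callers can
 * check that the memory that came out is the memory that went in. */
const unsigned char *run_pipeline(struct frame *f,
                                  const stage_fn *stages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        stages[i](f);
    return f->pixels;
}
```

Calling run_pipeline(&f, preview_pipeline, 3) on a frame returns the
very pointer that was passed in; what this sketch leaves out (which
device may touch the buffer when, and through which mapping) is
exactly what the rest of the discussion was about.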

Once we had the use case nailed down, we moved on to the actual
components/subsystems that would need to share buffers in order for
the use case to work properly with the zero-copy (or at least
minimal-copy) requirement.  We had:

 * DRM
 * V4L2
 * fbdev
 * ALSA
 * DSP
 * User-space (kind of all-encompassing and could include things like
   OpenCL, which also makes an interesting use case).
 * DVB
 * Out-of-tree GPU drivers

We wound up the day by discussing exactly what metadata we would want
to track in order to enable the desired levels of sharing with
simultaneous device mappings, cache management and other
considerations (e.g., device peculiarities).  What we came up with is
a struct (we called it "dma_buf") that has the following info:

 * Size
 * Creator/Allocator
 * Attributes:
   - sharable?
   - contiguous?
   - device-local?
 * Reference count
 * Pinning reference count
 * CPU cache management data
 * Device private data (e.g., quirky tiling modes)
 * Scatter list
 * Synchronization data (for managing in-flight device transactions)
 * Mapping data

These last few (device privates through mapping data) are lists of
data, one for each device that has a mapping of the buffer.  The
mapping data is nominally an address and per-device cache management
data.  We actually got through this part fairly quickly.  The
biggest part of the discussion was what to use for handles/identifiers
in the buffer sharing scheme.  The discussion was between global
identifiers like GEM uses, or file descriptors as favored by Android.
Initially, there was an informal consensus around unique IDs, though
it was not a definitive decision (yet).  The atomicity of passing file
descriptors between processes makes them quite attractive for the
task.
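
To make the metadata list above a little more concrete, here is a
minimal user-space sketch of how it could be laid out in C.  It is
not the structure that was proposed: all of the type, field and
function names (dma_buf_sketch, buf_attachment, buf_create,
buf_attach, buf_put) are invented for illustration, device-specific
data is reduced to opaque pointers, and the rule that a non-sharable
buffer takes only one device is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical attribute bits for the three "Attributes" questions. */
#define BUF_SHARABLE      (1u << 0)
#define BUF_CONTIGUOUS    (1u << 1)
#define BUF_DEVICE_LOCAL  (1u << 2)

/* Per-device data: one entry for each device that maps the buffer. */
struct buf_attachment {
    int dev_id;                  /* stand-in for a device pointer */
    void *dev_private;           /* e.g. quirky tiling modes */
    void *scatter_list;          /* placeholder for a scatter list */
    void *sync_data;             /* in-flight transaction tracking */
    unsigned long map_addr;      /* mapping data: an address ... */
    void *map_cache_data;        /* ... plus per-device cache data */
    struct buf_attachment *next;
};

/* Buffer-wide metadata, following the list in the report. */
struct dma_buf_sketch {
    size_t size;
    int allocator_id;            /* creator/allocator */
    unsigned int attrs;          /* BUF_* bits */
    unsigned int refcount;
    unsigned int pin_count;      /* pinning reference count */
    void *cpu_cache_data;
    struct buf_attachment *attachments;
};

struct dma_buf_sketch *buf_create(size_t size, int allocator_id,
                                  unsigned int attrs)
{
    struct dma_buf_sketch *buf = calloc(1, sizeof(*buf));

    if (!buf)
        return NULL;
    buf->size = size;
    buf->allocator_id = allocator_id;
    buf->attrs = attrs;
    buf->refcount = 1;           /* the creator holds the first reference */
    return buf;
}

/* Record a device mapping; each attachment takes a reference.
 * Assumption: a non-sharable buffer accepts only one device. */
int buf_attach(struct dma_buf_sketch *buf, int dev_id,
               unsigned long map_addr)
{
    struct buf_attachment *att;

    if (buf->attachments && !(buf->attrs & BUF_SHARABLE))
        return -1;
    att = calloc(1, sizeof(*att));
    if (!att)
        return -1;
    att->dev_id = dev_id;
    att->map_addr = map_addr;
    att->next = buf->attachments;
    buf->attachments = att;
    buf->refcount++;
    return 0;
}

/* Drop one reference; free everything when the last one goes away.
 * Returns the remaining reference count. */
unsigned int buf_put(struct dma_buf_sketch *buf)
{
    assert(buf->refcount > 0);
    if (--buf->refcount > 0)
        return buf->refcount;
    while (buf->attachments) {
        struct buf_attachment *att = buf->attachments;

        buf->attachments = att->next;
        free(att);
    }
    free(buf);
    return 0;
}
```

Keeping the per-device items in their own list node mirrors the
remark above that device private data through mapping data exist once
per device that has a mapping, while size, attributes and the
reference counts exist once per buffer.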

Day 3:
------
By the third day, there was a sense of running out of time and really
needing to ensure that we left with a reasonable set of outcomes (see
the overview and goals section above).  In short, we wanted to make
sure that we had a plan/roadmap, a reasonably actionable set of tasks
that could be picked up by Linaro engineers and community members
alike, and that we would not only avoid duplicating new work, but also
reduce some of the existing code duplication that got us to this point
in the first place.

But, we weren't done.  We still had to cover the requirements around
allocation and explore the dma-mapping and IOMMU APIs.

This took most of the day, but was quite a fruitful set of
discussions.  As with the rest of the discussions, we focused on
leveraging existing technologies as much as possible.  With
allocations, however, this wasn't entirely possible as we have devices
on ARM SoCs that do not have an IOMMU and require physically
contiguous buffers in order to operate.  After a fair amount of
discussion, it was decided to use a modified version of the current
CMA (see Marek's slides linked from the wiki).  It assumes the pages
are movable and manages them and not the mappings.  There was concern
that the API didn't quite fit with other related APIs, so the changes
from the current state will be around those details.

On the mapping side, we focused on the dma-mapping API, layered on the
IOMMU API where appropriate.  Without going into crazy detail, we are
looking at something like 4 implementations of the dma_map_ops
functions for ARM: with and without IOMMU, with and without bounce
buffer (these last two exist, but not using the dma_map_ops API).
Marek has put out patches for comment on the IOMMU-based
implementation of this, based upon work he had in progress.  Also in
the area of dma_map_ops, the sync-related APIs need a start address
and offset, and alloc and free need attribute parameters like map and
unmap already have (to support cacheable/coherent/write-combined).  In
the "not involving dma_map_ops" category, we have a couple of changes
that are likely to be non-trivial (not that any of the other proposed
work is trivial).  It was proposed to modify (actually, the word
thrown about in the discussions was "fix") dma_alloc_coherent for ARM
to support unmapping from the kernel linear mapping and the use of
HIGHMEM; two separate implementations, configured at build-time.  And,
last but not least, there was a fair amount of concern over the cache
management API and its ability to live cleanly with the IOMMU code and
to resist breakage from other architecture implementations.
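
For readers less familiar with the ops-table approach, the following
user-space sketch shows the general shape of it: each device points
at a table of function pointers, and callers go through the table
without knowing which flavour is behind it.  This is only a model.
The struct and function names (dma_ops_sketch, device_sketch,
dev_map, dev_sync_for_device) are hypothetical, only two flavours are
shown, and the signatures are simplified; the sync call takes a start
address plus an offset and size only to illustrate the kind of
sub-range sync mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, much-reduced per-device ops table.  The real
 * dma_map_ops has more callbacks and different signatures. */
struct dma_ops_sketch {
    const char *name;
    /* Make size bytes at cpu_addr visible to the device and return
     * the address the device should use (0 on failure). */
    uintptr_t (*map)(void *cpu_addr, size_t size);
    /* Push CPU writes in [offset, offset + size) out to the device. */
    void (*sync_for_device)(void *cpu_addr, size_t offset, size_t size);
};

/* "Direct" flavour: the device can use the CPU address as-is. */
static uintptr_t direct_map(void *cpu_addr, size_t size)
{
    (void)size;
    return (uintptr_t)cpu_addr;
}

static void direct_sync(void *cpu_addr, size_t offset, size_t size)
{
    /* Cache maintenance for the sub-range would go here. */
    (void)cpu_addr;
    (void)offset;
    (void)size;
}

/* "Bounce" flavour: the device only reaches a small fixed pool, so
 * data is copied into it.  One mapping at a time, for brevity. */
static unsigned char bounce_pool[4096];

static uintptr_t bounce_map(void *cpu_addr, size_t size)
{
    if (size > sizeof(bounce_pool))
        return 0;
    memcpy(bounce_pool, cpu_addr, size);
    return (uintptr_t)bounce_pool;
}

static void bounce_sync(void *cpu_addr, size_t offset, size_t size)
{
    if (offset > sizeof(bounce_pool) ||
        size > sizeof(bounce_pool) - offset)
        return;
    memcpy(bounce_pool + offset,
           (unsigned char *)cpu_addr + offset, size);
}

const struct dma_ops_sketch direct_ops = {
    "direct", direct_map, direct_sync,
};
const struct dma_ops_sketch bounce_ops = {
    "bounce", bounce_map, bounce_sync,
};

/* Each device points at the table that suits its hardware. */
struct device_sketch {
    const struct dma_ops_sketch *ops;
};

uintptr_t dev_map(const struct device_sketch *dev, void *cpu_addr,
                  size_t size)
{
    return dev->ops->map(cpu_addr, size);
}

void dev_sync_for_device(const struct device_sketch *dev,
                         void *cpu_addr, size_t offset, size_t size)
{
    dev->ops->sync_for_device(cpu_addr, offset, size);
}
```

A driver written against dev_map() and dev_sync_for_device() behaves
the same whichever table the device carries, which is the property
that makes adding further implementations (for example an IOMMU-backed
one) tractable.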

At this point, we reviewed what we had done and finalized the outcomes
(see the outcomes section at the top).  And, with half an hour to
spare, I re-instigated the file descriptors versus unique identifiers
discussion from day 2.  I think file descriptors were winning by the
end (especially after people started posting pointers to code samples
of how to actually pass them between processes)....
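
This report does not record which code samples were posted, but the
usual way to pass a file descriptor between processes on Linux is
SCM_RIGHTS ancillary data over an AF_UNIX socket.  A minimal sketch
follows; send_fd and recv_fd are names chosen here, the two ends are
assumed to be already connected, and only a single descriptor is
carried per message.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd)
{
    char byte = 'F';    /* one byte of ordinary data rides along */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;       /* forces correct alignment */
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;   /* payload is an array of fds */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor.  Returns the new descriptor, which
 * refers to the same open file as the sender's, or -1 on failure. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The kernel installs the descriptor in the receiver as part of
delivering the message, so the receiver either gets a working
reference or gets nothing, and the underlying object stays alive for
as long as any process still holds a descriptor for it.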

Attendees:
----------
I will likely miss people here trying to list out everyone, especially
given that some of the sessions were quite literally overflowing the
room we were in.  For as accurate an account of attendance as I can
muster, check out the list of attendees on the mini-summit wiki page
or the discussion blueprints we used for scheduling:

https://wiki.linaro.org/Events/2011-05-MM#Attendees
https://blueprints.launchpad.net/linaro-graphics-wg/+spec/linaro-graphics-me...
https://blueprints.launchpad.net/linaro-graphics-wg/+spec/linaro-graphics-me...
https://blueprints.launchpad.net/linaro-graphics-wg/+spec/linaro-graphics-me...

The occupants of the fishbowl (the front/center of the room in closest
proximity to the microphones) were primarily:

Arnd Bergmann
Laurent Pinchart
Hans Verkuil
Mauro Chehab
Daniel Vetter
Sakari Ailus
Thomas Hellstrom
Marek Szyprowski
Jesse Barker

The IRC fishbowl seemed to consist of:

Rob Morell
Jordan Crouse
David Brown

There were certainly others both local and remote participating to
varying degrees that I do not intend to omit, and a special thanks
goes out to Joey Stanford for arranging a larger room for us on days 2
and 3 when we had people sitting on the floor and spilling into the
hallway during day 1.
