[Linaro-mm-sig] Memory region attribute bits and multiple mappings

Marek Szyprowski m.szyprowski at samsung.com
Wed Apr 20 12:22:22 UTC 2011


Hello,

On Tuesday, April 19, 2011 11:23 PM Arnd Bergmann wrote:

> This may be a stupid question, but do we have an agreement that it
> is actually a requirement to have uncached mappings? With the
> streaming DMA mapping API, it should be possible to work around
> noncoherent DMA by flushing the caches at the right times, which
> probably results in better performance than simply doing noncached
> mappings. What is the specific requirement for noncached memory
> regions?

Flushing the cache for large buffers also takes significant time,
especially if it is implemented by iterating over the whole buffer and
issuing a flush instruction for each cache line.
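
To give a rough feel for the per-line cost, here is a minimal sketch
(plain C, not kernel code) that counts how many cache lines a buffer
spans, i.e. how many per-line maintenance operations such a loop would
issue. The function name and the 32-byte line size used in the example
are assumptions for illustration only; the real line size depends on
the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the cache lines touched by the buffer [addr, addr + len).
 * Hypothetical helper: line_size must be a power of two and is an
 * assumed parameter, not a value taken from any specific ARM core.
 */
size_t cache_lines_touched(uintptr_t addr, size_t len, size_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

    if (len == 0)
        return 0;

    /* Round the first and last byte addresses down to a line boundary. */
    uintptr_t mask = ~(uintptr_t)(line_size - 1);
    uintptr_t first = addr & mask;
    uintptr_t last = (addr + len - 1) & mask;

    return (size_t)((last - first) / line_size) + 1;
}
```

With these assumptions, a line-aligned 1280x720 buffer at 4 bytes per
pixel (3686400 bytes) spans 115200 lines of 32 bytes, so a per-line
loop would issue that many operations on every flush.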

For most use cases the CPU write speed is not degraded on non-cached
memory areas. ARM CPUs with the write combining feature perform really
well on uncached memory.

Non-cached buffers are also the only solution for buffers that need to
be permanently mapped to userspace (like a framebuffer). Non-cached
mappings are also useful when one doesn't touch the memory with the CPU
at all (zero copy between two independent multimedia blocks).

(snipped)

> > If we really need all mappings of physical memory to have the same cache
> > attribute bits, I see three workarounds:
> >
> > 1- set aside memory at boot that never gets mapped by the kernel.  The
> > unified memory manager can then ensure there's only one mapping at a time.
> > Obvious drawbacks here are that you have to statically partition your
> > system into memory you want accessible to the unified memory manager and
> > memory you don't.  This may not be that big a deal, since most current
> > solutions, pmem, cmem, et al basically do this.  I can say that on Android
> > devices running on a high resolution display (720p and above) we're easily
> > talking about needing 256M of memory or more to dedicate to this.
> 
> Right, I believe this is what people generally do to avoid the problem,
> but I wouldn't call it a solution.

It is also a huge waste of memory. Similar solutions have been proposed to
overcome the problem of memory fragmentation. I don't think we can afford
to give away almost half of the system memory just to be able to process
a 720p movie.

That's why we came up with the idea of CMA (contiguous memory allocator),
which can 'recycle' memory areas that are not used by multimedia hardware.
CMA allows the system to allocate movable pages (like page cache, user
process memory, etc.) from a defined CMA range and migrate them away when
a request for contiguous memory arrives. For more information, please
refer to:
https://lkml.org/lkml/2011/3/31/213
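
The allocate-then-migrate idea can be sketched with a toy model. This
is only an illustration under my own assumptions: real CMA operates on
struct page and the kernel's page migration code, and every name below
(page_state, NR_PAGES, toy_cma_alloc) is hypothetical rather than part
of the posted patches.

```c
#include <assert.h>
#include <stddef.h>

/* Toy page states; PAGE_CONTIG marks pages handed out as a contiguous buffer. */
enum page_state { PAGE_FREE, PAGE_MOVABLE, PAGE_CONTIG };

#define NR_PAGES 16  /* size of the toy memory range, chosen arbitrarily */

/* Return the index of a free page outside [start, start + count), or -1. */
static int find_free_outside(const enum page_state *pages,
                             size_t start, size_t count)
{
    for (size_t i = 0; i < NR_PAGES; i++) {
        if ((i < start || i >= start + count) && pages[i] == PAGE_FREE)
            return (int)i;
    }
    return -1;
}

/*
 * Claim [start, start + count) as one contiguous block. Movable pages
 * inside the block are "migrated" to free pages outside it. Returns 0
 * on success, or -1 (leaving pages untouched) if the block overlaps an
 * existing contiguous buffer or there is no room to migrate to.
 */
int toy_cma_alloc(enum page_state *pages, size_t start, size_t count)
{
    size_t to_move = 0, free_outside = 0;

    assert(start + count <= NR_PAGES);

    /* First pass: check feasibility without modifying anything. */
    for (size_t i = 0; i < NR_PAGES; i++) {
        int inside = (i >= start && i < start + count);

        if (inside && pages[i] == PAGE_CONTIG)
            return -1;
        if (inside && pages[i] == PAGE_MOVABLE)
            to_move++;
        if (!inside && pages[i] == PAGE_FREE)
            free_outside++;
    }
    if (to_move > free_outside)
        return -1;

    /* Second pass: migrate movable pages out, then claim the block. */
    for (size_t i = start; i < start + count; i++) {
        if (pages[i] == PAGE_MOVABLE) {
            int dst = find_free_outside(pages, start, count);
            pages[dst] = PAGE_MOVABLE;
        }
        pages[i] = PAGE_CONTIG;
    }
    return 0;
}
```

In this model, movable pages may sit anywhere in the range until a
contiguous request arrives, at which point they are moved aside, so the
range is not wasted while the multimedia hardware is idle.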

I want to merge this idea with changing the kernel linear low-mem
mapping, so that 2-level page mapping is used only for the defined CMA
range, which should reduce TLB pressure. Once a contiguous block is
allocated from the CMA range, its mapping in the low-mem area can be
removed to comply with the ARM specification.

> > 3- fix up the unity mapping so the attribute bits match those desired by
> > the unified memory manager.  This could be done by removing pages from the
> > unity map.  It's complicated by the fact that the unity map makes use of
> > large pages, sections and supersections to reduce tlb pressure.  I don't
> > think this is impossible if we restrict the set of contexts from which it
> > can happen, but I'm imagining that we will also need to maintain some kind
> > of pool of memory we've moved from cached to uncached since the process is
> > likely to be expensive.  Quite likely we will have to iterate over
> > processes and update all their top level page tables.
> 
> Would it get simpler if we only allow entire supersections to be moved
> into the uncached memory allocator?

I'm not sure this would change anything. Updating the attributes of a
supersection still requires iterating over all processes and their page
tables. If we change the attributes of the supersection at system boot and
allow buffers to be allocated only from it, we end up with solution #1
(allocation of DMA buffers only from reserved memory).

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center


