On Tuesday 19 April 2011 22:06:50 Rebecca Schultz Zavin wrote:
Hey all,
While we are working out requirements, I was hoping to get some more information about another related issue that keeps coming up on mailing lists and in discussions.
Thanks for the summary and getting this started!
ARM has stated that if you have the same physical memory mapped with two different sets of attribute bits, you get undefined behavior. I think it's going to be a requirement that some of the memory allocated via the unified memory manager is mapped uncached.
This may be a stupid question, but do we have an agreement that it is actually a requirement to have uncached mappings? With the streaming DMA mapping API, it should be possible to work around noncoherent DMA by flushing the caches at the right times, which probably results in better performance than simply doing noncached mappings. What is the specific requirement for noncached memory regions?
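For reference, a minimal sketch of the streaming approach, using the standard dma_map_single()/dma_unmap_single() calls; the function name and the DMA direction are just assumptions for illustration:

#include <linux/dma-mapping.h>
#include <linux/slab.h>

/* Sketch only: receive a block of data from a device through normal
 * cached memory, relying on the streaming API to do the cache
 * maintenance instead of mapping anything uncached. */
static int receive_block(struct device *dev, size_t size)
{
        void *buf = kmalloc(size, GFP_KERNEL);  /* cached memory */
        dma_addr_t handle;

        if (!buf)
                return -ENOMEM;

        /* Hand the buffer to the device; on a noncoherent ARM this
         * cleans/invalidates the covering cache lines as needed. */
        handle = dma_map_single(dev, buf, size, DMA_FROM_DEVICE);
        if (dma_mapping_error(dev, handle)) {
                kfree(buf);
                return -EIO;
        }

        /* ... device DMAs into the buffer ... */

        /* Take it back; stale cache lines are invalidated so the CPU
         * sees the device's writes through its cached mapping. */
        dma_unmap_single(dev, handle, size, DMA_FROM_DEVICE);

        /* buf is now valid to read through the cached mapping */
        kfree(buf);
        return 0;
}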
However, because all of memory is mapped cached into the unity map at boot, we already have two mappings with different attributes. I want to understand the mechanism of the problem, because none of the solutions I can come up with are particularly nice. I'd also like to know exactly which architectures are affected, since the fix may be costly in performance, memory, or both.

Can someone at ARM explain to me why this causes a problem? I have a theory, but it's mostly a guess. I especially want to understand whether it's still a problem if we never access the memory via the mapping in the unity map. I know speculative prefetching is part of the issue, so I assume older architectures without that feature don't exhibit this behavior.
In general (not talking about ARM in particular), Linux does not support mapping RAM pages with conflicting cache attributes. E.g. on certain powerpc CPUs, you get a checkstop if you try to bypass the cache when there is already an active cache line for it.
This is a variant of the cache aliasing problem we see with virtually indexed caches: You may end up with multiple cache lines for the same physical address, with different contents. The results are unpredictable, so most CPU architectures explicitly forbid this.
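To make the hazard concrete, a deliberately broken sketch; this is exactly the pattern those architectures forbid, with kmap()/vmap() used only to construct the two aliases:

#include <linux/highmem.h>
#include <linux/vmalloc.h>

static u32 alias_hazard(struct page *page)
{
        /* Two virtual aliases of the same physical page with
         * conflicting attributes: the forbidden situation. */
        u32 *cached = kmap(page);               /* cached alias */
        u32 *uncached = vmap(&page, 1, VM_MAP,
                             pgprot_noncached(PAGE_KERNEL));
        u32 val;

        cached[0] = 0xdeadbeef; /* may only reach a dirty cache line */
        val = uncached[0];      /* bypasses the cache: can see stale
                                 * RAM, and a later writeback of the
                                 * dirty line can clobber any stores
                                 * done through the uncached alias */

        vunmap(uncached);
        kunmap(page);
        return val;
}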
If we really need all mappings of physical memory to have the same cache attribute bits, I see three workarounds:
1- set aside memory at boot that never gets mapped by the kernel. The unified memory manager can then ensure there's only one mapping at a time. The obvious drawback here is that you have to statically partition your system into memory you want accessible to the unified memory manager and memory you don't. This may not be that big a deal, since most current solutions (pmem, cmem, et al.) basically do this. I can say that on Android devices with a high-resolution display (720p and above), we're easily talking about needing 256M of memory or more to dedicate to this.
Right, I believe this is what people generally do to avoid the problem, but I wouldn't call it a solution.
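For what it's worth, the way this is usually done on ARM is a memblock carveout from the board's early reserve hook, along these lines (base address and names are made up):

#include <linux/init.h>
#include <linux/memblock.h>

#define CARVEOUT_BASE   0x80000000UL      /* board specific, made up */
#define CARVEOUT_SIZE   (256UL << 20)     /* 256 MiB */

/* Called from the machine descriptor's .reserve hook, before the
 * kernel builds its linear mapping. */
static void __init board_reserve(void)
{
        /* Remove the range from memblock entirely, so the kernel
         * never creates a cached lowmem mapping for it; the unified
         * memory manager can then map it uncached with no alias. */
        memblock_remove(CARVEOUT_BASE, CARVEOUT_SIZE);
}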
2- use highmem pages only for the unified memory manager. Highmem pages only get mapped on demand. This has some performance costs when the kernel allocates other metadata in highmem. Most embedded systems still don't have enough memory to need highmem, though I'm guessing that'll follow the current trend and shift in the next couple of years.
We are very close to needing highmem on a lot of systems, and in Linaro we generally assume that it's there. For instance, Acer has announced an Android tablet that has a full gigabyte of RAM, so they are most likely using highmem already.
There is a significant overhead in simply enabling highmem on a system where you don't need it, but it also makes it possible to use the memory for page cache that would otherwise be wasted when there is no active user of the reserved memory.
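A minimal sketch of what the highmem variant could look like, with made-up function names: allocate highmem pages and give them only an uncached vmap() alias.

#include <linux/gfp.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

/* Sketch: allocate nr_pages from highmem and map them uncached.
 * Highmem pages have no permanent kernel mapping, so the vmap()
 * below is the only kernel mapping they get and no cached alias
 * exists (provided nobody kmap()s the pages behind our back). */
static void *umm_alloc_uncached(unsigned int nr_pages,
                                struct page ***pages_out)
{
        struct page **pages;
        unsigned int i;
        void *vaddr;

        pages = kcalloc(nr_pages, sizeof(*pages), GFP_KERNEL);
        if (!pages)
                return NULL;

        for (i = 0; i < nr_pages; i++) {
                pages[i] = alloc_page(GFP_HIGHUSER);
                if (!pages[i])
                        goto err;
        }

        vaddr = vmap(pages, nr_pages, VM_MAP,
                     pgprot_noncached(PAGE_KERNEL));
        if (!vaddr)
                goto err;

        *pages_out = pages;
        return vaddr;

err:
        while (i--)
                __free_page(pages[i]);
        kfree(pages);
        return NULL;
}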
3- fix up the unity mapping so the attribute bits match those desired by the unified memory manager. This could be done by removing pages from the unity map. It's complicated by the fact that the unity map uses large pages (sections and supersections) to reduce TLB pressure. I don't think this is impossible if we restrict the set of contexts from which it can happen, but I imagine we will also need to maintain some kind of pool of memory we've moved from cached to uncached, since the process is likely to be expensive. Quite likely we will have to iterate over processes and update all their top-level page tables.
Would it get simpler if we only allowed entire supersections to be moved into the uncached memory allocator?
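If we did, the check the allocator has to make becomes trivial; a sketch, assuming the 16 MiB ARM supersection size and made-up names:

#include <linux/kernel.h>

#define UMM_SUPERSECTION_SIZE   (16UL << 20)    /* ARM supersection */

/* Only whole, naturally aligned supersections change attributes, so
 * each switch replaces exactly one unity-map entry and no large
 * mapping ever has to be split into smaller pages. */
static bool can_switch_to_uncached(unsigned long base, unsigned long size)
{
        return size &&
               IS_ALIGNED(base, UMM_SUPERSECTION_SIZE) &&
               IS_ALIGNED(size, UMM_SUPERSECTION_SIZE);
}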
Arnd