[Linaro-mm-sig] Memory region attribute bits and multiple mappings
Rebecca Schultz Zavin
rebecca at android.com
Tue Apr 19 21:37:48 UTC 2011
On Tue, Apr 19, 2011 at 2:23 PM, Arnd Bergmann <arnd at arndb.de> wrote:
> On Tuesday 19 April 2011 22:06:50 Rebecca Schultz Zavin wrote:
> > Hey all,
> > While we are working out requirements, I was hoping to get some more
> > information about another related issue that keeps coming up on mailing
> > lists and in discussions.
> Thanks for the summary and getting this started!
> > ARM has stated that if you have the same physical memory mapped with two
> > different sets of attribute bits you get undefined behavior. I think it's
> > going to be a requirement that some of the memory allocated via the
> > memory manager is mapped uncached.
> This may be a stupid question, but do we have an agreement that it
> is actually a requirement to have uncached mappings? With the
> streaming DMA mapping API, it should be possible to work around
> noncoherent DMA by flushing the caches at the right times, which
> probably results in better performance than simply doing noncached
> mappings. What is the specific requirement for noncached memory?
That was my original plan, but our graphics folks and those at our partner
companies basically have me convinced that the common case is for userspace
to stream data into memory, say copying an image into a texture, and never
read from it or touch it again. The alternative will mean a lot of cache
flushes for small memory regions, which in and of itself becomes a
performance problem. I think we want to optimize for this case, rather than
the much less likely case of read-modify-write to these buffers.
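
To put rough numbers on the flushing cost: cache maintenance by address range works one cache line at a time, so the work grows with the number of lines a buffer covers, on top of a fixed per-call overhead for every buffer. The sketch below only does that line-count arithmetic. The helper name `cache_lines_covering` and the 32-byte default line size are assumptions for illustration; the real line size depends on the core.

```python
def cache_lines_covering(addr: int, length: int, line_size: int = 32) -> int:
    """Count the cache lines touched by the byte range [addr, addr + length).

    line_size defaults to 32 bytes, which is an assumed value; check the
    actual line size of the target core.
    """
    if length <= 0:
        return 0
    first = addr // line_size                 # index of the first line touched
    last = (addr + length - 1) // line_size   # index of the last line touched
    return last - first + 1


if __name__ == "__main__":
    # Print the line count for a few illustrative buffer sizes.
    for size in (64, 4096, 1 << 20):
        print(size, cache_lines_covering(0x1000, size))
```

A buffer that is not aligned to the line size touches one extra line, which matters most for the small regions discussed above.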
> > However, because all of memory is mapped
> > cached into the unity map at boot, we already have two mappings with
> > different attributes. I want to understand the mechanism of the problem,
> > because none of the solutions I can come up with are particularly nice. I'd
> > also like to know exactly which architectures are affected, since the fix
> > may be costly in performance, memory or both. Can someone at ARM explain to
> > me why this causes a problem? I have a theory, but it's mostly a guess. I
> > especially want to understand if it's still a problem if we never access the
> > memory via the mapping in the unity map. I know speculative prefetching is
> > part of the issue, so I assume older architectures without that feature
> > don't exhibit this behaviour.
> In general (not talking about ARM in particular), Linux does not support
> mapping RAM pages with conflicting cache attributes. E.g. on certain
> CPUs, you get a checkstop if you try to bypass the cache when there is
> an active cache line for it.
> This is a variant of the cache aliasing problem we see with virtually indexed
> caches: You may end up with multiple cache lines for the same physical
> address, with different contents. The results are unpredictable, so most
> CPU architectures explicitly forbid this.
I think the extra wrinkle here is the presence of the unity mapping as
cached, even if you never access it, causes a problem. I totally understand
why you wouldn't want to access mappings with different attributes, but just
having them hang around seems like it shouldn't in general be a problem.
How does powerpc handle it when you need an uncached page for dma?
> > If we really need all mappings of physical memory to have the same cache
> > attribute bits, I see three workarounds:
> > 1- set aside memory at boot that never gets mapped by the kernel. The
> > unified memory manager can then ensure there's only one mapping at a time.
> > Obvious drawbacks here are that you have to statically partition your system
> > into memory you want accessible to the unified memory manager and memory you
> > don't. This may not be that big a deal, since most current solutions,
> > cmem, et al basically do this. I can say that on Android devices running
> > a high resolution display (720p and above) we're easily talking about
> > needing 256M of memory or more to dedicate to this.
> Right, I believe this is what people generally do to avoid the problem,
> but I wouldn't call it a solution.
> > 2- use highmem pages only for the unified memory manager. Highmem pages
> > only get mapped on demand.
> > This has some performance costs when the kernel allocates other metadata in
> > highmem. Most embedded systems still don't have enough memory to need
> > highmem, though I'm guessing that'll follow the current trend and shift in
> > the next couple of years.
> We are very close to needing highmem on a lot of systems, and in Linaro
> we generally assume that it's there. For instance, Acer has announced
> an Android tablet that has a full gigabyte of RAM, so they are most
> likely using highmem already.
> There is a significant overhead in simply enabling highmem on a system
> where you don't need it, but it also makes it possible to use the memory
> for page cache that would otherwise be wasted when there is no active
> user of the reserved memory.
> > 3- fix up the unity mapping so the attribute bits match those desired by the
> > unified memory manager. This could be done by removing pages from the unity
> > map. It's complicated by the fact that the unity map makes use of large
> > pages, sections and supersections to reduce tlb pressure. I don't think
> > this is impossible if we restrict the set of contexts from which it can
> > happen, but I'm imagining that we will also need to maintain some kind of
> > pool of memory we've moved from cached to uncached since the process is
> > likely to be expensive. Quite likely we will have to iterate
> > over processes and update all their top level page tables.
> Would it get simpler if we only allow entire supersections to be moved
> into the uncached memory allocator?
I've thought about it. It adds the requirement that we need to be able to
make a relatively high order, 16M (supersection sized) allocation at
runtime; one of the problems many of these memory managers were originally
introduced to solve. Even if we could solve that cleanly with something
like compaction, I'm not sure just modifying those attribute bits would be
that much easier than rewriting the page table for that section. Either way
we would want some way to put the cache attributes back under memory
pressure or we're back to solution 1. I have some idea how to modify the
ARM page tables in hardware to do this, but figuring out how that connects
with the page tables in linux to do this safely at runtime is where things
will get fun.
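
For a sense of the sizes involved, here is a minimal sketch of the arithmetic, assuming the ARMv7-A short-descriptor translation table format (4 KB small pages, 1 MB sections, 16 MB supersections stored as 16 identical first-level entries). The function names are hypothetical, this is not kernel code, and it only manipulates integers. Which TEX/C/B encoding actually means "uncached" depends on whether TEX remap is enabled, so the value used in the usage note is an assumption too.

```python
# Sizes in the ARMv7-A short-descriptor format.
PAGE_SIZE = 4 * 1024                 # small page
SECTION_SIZE = 1 << 20               # 1 MB, one first-level entry
SUPERSECTION_SIZE = 16 << 20         # 16 MB, 16 identical first-level entries
SMALL_PAGES_PER_SECTION = SECTION_SIZE // PAGE_SIZE   # 256 second-level entries

# Memory-type bits of a first-level section descriptor.
SECTION_B = 1 << 2
SECTION_C = 1 << 3
SECTION_TEX_SHIFT = 12
SECTION_TEX_MASK = 0x7 << SECTION_TEX_SHIFT


def allocation_order(size: int, page_size: int = PAGE_SIZE) -> int:
    """Smallest order n such that (page_size << n) >= size."""
    pages = -(-size // page_size)    # ceiling division
    return (pages - 1).bit_length()


def first_level_entries(virt_start: int, length: int) -> range:
    """Indices of the 1 MB first-level entries that overlap a virtual range."""
    first = virt_start // SECTION_SIZE
    last = (virt_start + length - 1) // SECTION_SIZE
    return range(first, last + 1)


def set_memory_type(desc: int, tex: int, c: bool, b: bool) -> int:
    """Return a section descriptor with only its TEX, C and B bits replaced."""
    desc &= ~(SECTION_TEX_MASK | SECTION_C | SECTION_B)
    desc |= (tex & 0x7) << SECTION_TEX_SHIFT
    if c:
        desc |= SECTION_C
    if b:
        desc |= SECTION_B
    return desc


if __name__ == "__main__":
    print(allocation_order(SUPERSECTION_SIZE))
    print(len(first_level_entries(0xC0000000, SUPERSECTION_SIZE)))
    print(hex(set_memory_type(0xC0E, tex=1, c=False, b=False)))
```

With these assumptions a supersection-sized allocation is order 12 with 4 KB pages, and changing its attributes means rewriting 16 first-level entries in every top-level table that maps it. Splitting a single 1 MB section instead means filling a 256-entry second-level table. The last line shows the bit edit for TEX=0b001, C=0, B=0, which is Normal non-cacheable only when TEX remap is disabled.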