On Tue, 19 Apr 2011, Rebecca Schultz Zavin wrote:
Hey all,
While we are working out requirements, I was hoping to get some more information about another related issue that keeps coming up on mailing lists and in discussions.
ARM has stated that if the same physical memory is mapped with two different sets of attribute bits, you get undefined behavior. I think it's going to be a requirement that some of the memory allocated via the unified memory manager is mapped uncached. However, because all memory is mapped cached into the unity map at boot, we already have two mappings with different attributes. I want to understand the mechanism of the problem, because none of the solutions I can come up with are particularly nice. I'd also like to know exactly which architectures are affected, since the fix may be costly in performance, memory, or both. Can someone at ARM explain why this causes a problem? I have a theory, but it's mostly a guess.
My own guess is that the cacheable attribute is tied to cache entries which are physically tagged. Access to one mapping could establish some attributes that the second mapping would inherit on cache hit.
I especially want to understand whether it's still a problem if we never access the memory via the mapping in the unity map. I know speculative prefetching is part of the issue, so I assume older architectures without that feature don't exhibit this behaviour.
Yes, speculative prefetching is what will cause spurious accesses through the kernel direct mapping (as it is called in Linux) even if you never access it explicitly. Older architectures don't have speculative prefetching, and even older ones have VIVT caches, which have no problem with multiple different mappings.
If we really need all mappings of physical memory to have the same cache attribute bits, I see three workarounds:
1- set aside memory at boot that never gets mapped by the kernel. The unified memory manager can then ensure there's only one mapping at a time. The obvious drawback here is that you have to statically partition your system into memory you want accessible to the unified memory manager and memory you don't. This may not be that big a deal, since most current solutions, pmem, cmem, et al., basically do this. I can say that on Android devices driving a high-resolution display (720p and above) we're easily talking about needing 256M of memory or more to dedicate to this.
This is obviously suboptimal.
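For illustration only, a carve-out like this on a contemporary ARM board might look roughly like the sketch below, hooked into the machine descriptor's ->reserve callback so the region never gets a cacheable mapping in the direct map. The base address, size and function name are made up for the example.

#include <linux/init.h>
#include <linux/memblock.h>

#define UMM_CARVEOUT_BASE	0x80000000UL		/* hypothetical physical base */
#define UMM_CARVEOUT_SIZE	(256UL << 20)		/* ~256M for 720p and above */

static void __init myboard_reserve(void)
{
	/*
	 * Removing the region from memblock means the kernel never creates
	 * a cacheable lowmem mapping for it, so the unified memory manager
	 * can later map it uncached without creating a conflicting alias.
	 */
	memblock_remove(UMM_CARVEOUT_BASE, UMM_CARVEOUT_SIZE);
}

The cost is exactly the static partitioning described above: that memory is gone for general use whether or not the unified memory manager ever needs it.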
2- use only highmem pages for the unified memory manager. Highmem pages only get mapped on demand. This has some performance cost when the kernel allocates other metadata in highmem. Most embedded systems still don't have enough memory to need highmem, though I'm guessing that will follow the current trend and shift in the next couple of years.
The kernel tries not to allocate its own data in highmem. Instead, highmem pages are used for user-space processes or the buffer cache, which can be populated directly by DMA and remain largely untouched by the kernel. Highmem pages are also fairly easily reclaimable, making them an easy target when large physically contiguous allocations are required.
It is true that most systems might not have enough memory to require highmem, but they can make use of it nevertheless, simply by changing the direct-mapped memory threshold.
While highmem is not free in terms of overhead, it is still quite lightweight compared to other memory partitioning schemes, and above all it is already supported across the whole kernel and relied upon by many people.
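As a rough sketch of what the allocation side of option 2 could look like, assuming the unified memory manager pulls its buffers from highmem and the only kernel mapping is an uncached vmap() of those pages (function and variable names are illustrative):

#include <linux/gfp.h>
#include <linux/highmem.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

/* Allocate npages from highmem and give them a single uncached mapping. */
static void *umm_alloc_uncached(struct page **pages, int npages)
{
	int i;

	for (i = 0; i < npages; i++) {
		/*
		 * GFP_HIGHUSER prefers ZONE_HIGHMEM, where there is no
		 * permanent cacheable direct mapping to alias against.
		 */
		pages[i] = alloc_page(GFP_HIGHUSER);
		if (!pages[i])
			goto fail;
	}

	return vmap(pages, npages, VM_MAP, pgprot_noncached(PAGE_KERNEL));

fail:
	while (i--)
		__free_page(pages[i]);
	return NULL;
}

If the allocation falls back to lowmem (no highmem configured, or highmem exhausted), the direct-map alias is back and you are in option 3 territory again.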
3- fix up the unity mapping so the attribute bits match those desired by the unified memory manager. This could be done by removing pages from the unity map. It's complicated by the fact that the unity map uses large pages, sections and supersections to reduce TLB pressure. I don't think this is impossible if we restrict the set of contexts from which it can happen, but I imagine we will also need to maintain some kind of pool of memory already moved from cached to uncached, since the process is likely to be expensive. Quite likely we will have to iterate over processes and update all their top-level page tables.
The kernel direct mapping shares the same mapping entries across all processes. So if (part of) the kernel direct mapping uses second-level page table entries, then the first-level entries will point to the same second-level page table in all processes. Hence changing memory attributes for the pages covered by that second-level table won't require any iteration over processes. Obviously the drawback here is more TLB pressure; however, if the memory set aside is not used by the kernel directly, then the associated TLB entries won't come into play.
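To make the shared-second-level-table point concrete, here is a very rough sketch of what rewriting direct-map attributes might look like, assuming the range is already covered by second-level (small page) entries rather than sections, and leaving out the cache clean/invalidate that would also be required before the switch. All names and the error handling are illustrative only.

#include <linux/errno.h>
#include <linux/mm.h>
#include <linux/sched.h>
#include <asm/pgtable.h>
#include <asm/tlbflush.h>

/* Walk init_mm's direct mapping and mark [vaddr, vaddr + size) uncached. */
static int umm_set_uncached(unsigned long vaddr, unsigned long size)
{
	unsigned long addr, end = vaddr + size;

	for (addr = vaddr; addr < end; addr += PAGE_SIZE) {
		pgd_t *pgd = pgd_offset_k(addr);
		pud_t *pud = pud_offset(pgd, addr);
		pmd_t *pmd = pmd_offset(pud, addr);
		pte_t *pte;

		/* A section/supersection would have to be split first. */
		if (pmd_none(*pmd) || pmd_bad(*pmd))
			return -EINVAL;

		pte = pte_offset_kernel(pmd, addr);
		set_pte_at(&init_mm, addr, pte,
			   pfn_pte(pte_pfn(*pte), pgprot_noncached(PAGE_KERNEL)));
	}

	flush_tlb_kernel_range(vaddr, end);
	return 0;
}

Because those second-level tables are shared by every process, doing this once against init_mm is enough; the expensive part Rebecca alludes to is splitting sections/supersections and keeping a pool of already-converted memory around so it doesn't happen on every allocation.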
These all have drawbacks, so I'd like to really understand the problem before pursuing them. Can the Linaro folks find someone who can explain the problem in more detail?
I'm happy to discuss the details whenever needed.
Nicolas