I took a quick look at that; it seems to implement option 1, setting aside memory at boot.  Once we've got memory that isn't in the unity map, I think it's pretty easy to stick an allocator in front of it, and as long as there's only the one mapping, you can safely swap that mapping between cached and uncached.
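
Something like the following is what I'm picturing -- a rough, untested sketch, with the carveout_* names made up.  It assumes the region was already removed from the unity map at boot, so the ioremap() below is the only mapping of that memory; a cached flavour would just use the arch's cached ioremap variant instead:

#include <linux/genalloc.h>
#include <linux/io.h>
#include <linux/mm.h>

/* 'base'/'size' describe memory the kernel never mapped at boot, so the
 * mapping created below is the only one and its attributes are unambiguous. */
static struct gen_pool *carveout_pool;

static int carveout_init(phys_addr_t base, size_t size)
{
        carveout_pool = gen_pool_create(PAGE_SHIFT, -1);
        if (!carveout_pool)
                return -ENOMEM;
        return gen_pool_add(carveout_pool, base, size, -1);
}

/* Hand out physical space and give it its single kernel mapping, uncached. */
static void __iomem *carveout_alloc_uncached(size_t size, phys_addr_t *phys)
{
        unsigned long addr = gen_pool_alloc(carveout_pool, size);
        void __iomem *vaddr;

        if (!addr)
                return NULL;
        vaddr = ioremap(addr, size);
        if (!vaddr) {
                gen_pool_free(carveout_pool, addr, size);
                return NULL;
        }
        *phys = addr;
        return vaddr;
}

static void carveout_free(void __iomem *vaddr, phys_addr_t phys, size_t size)
{
        iounmap(vaddr);
        gen_pool_free(carveout_pool, phys, size);
}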

Rebecca

On Tue, Apr 19, 2011 at 1:25 PM, Gross, Andy <andy.gross@ti.com> wrote:
Rebecca,

We have some of the same issues with our Tiler implementation.  We want to use uncached allocations to sidestep coherency concerns and the performance cost of flush/invalidate.  However, as you stated, there really isn't a solution out there that gives us what we need without serious drawbacks.

Looking around at the other architectures, I noticed that the ia64 arch has an uncached allocator where they convert pages from cached to uncached.  If we had something similar, it would keep us from having to resort to option 1 or 2 below.  The ia64 uncached allocator is located in arch/ia64/kernel/uncached.c.
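
Roughly, the shape of it (this is not the actual ia64 code, just the pattern) is: take pages from the normal allocator, flush them out of the caches, flip the attribute on the kernel mapping, and hand them out from a pool.  The arch_convert_page_uncached() hook below is hypothetical -- it's exactly the piece that's missing on ARM, and flush_dcache_page() is a stand-in for whatever arch-specific maintenance is really required:

#include <linux/gfp.h>
#include <linux/genalloc.h>
#include <linux/highmem.h>

/* Hypothetical arch hook: make the kernel mapping of this page uncached
 * (split sections, fix up page tables, and so on). */
extern int arch_convert_page_uncached(struct page *page);

static struct gen_pool *uc_pool;        /* created elsewhere with gen_pool_create() */

/* Grow the uncached pool by one page, roughly the way
 * arch/ia64/kernel/uncached.c grows its per-node pools. */
static int uncached_add_page(void)
{
        struct page *page = alloc_page(GFP_KERNEL);

        if (!page)
                return -ENOMEM;

        /* Push any dirty lines out before the attribute change. */
        flush_dcache_page(page);

        if (arch_convert_page_uncached(page)) {
                __free_page(page);
                return -EINVAL;
        }

        return gen_pool_add(uc_pool, (unsigned long)page_address(page),
                            PAGE_SIZE, -1);
}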


Regards,

Andy


On Tue, Apr 19, 2011 at 3:06 PM, Rebecca Schultz Zavin <rebecca@android.com> wrote:
Hey all,

While we are working out requirements, I was hoping to get some more information about another related issue that keeps coming up on mailing lists and in discussions.  

ARM has stated that if you have the same physical memory mapped with two different sets of attribute bits, you get undefined behaviour.  I think it's going to be a requirement that some of the memory allocated via the unified memory manager is mapped uncached.  However, because all of memory is mapped cached into the unity map at boot, we already have two mappings with different attributes.

I want to understand the mechanism of the problem, because none of the solutions I can come up with are particularly nice.  I'd also like to know exactly which architectures are affected, since the fix may be costly in performance, memory or both.  Can someone at ARM explain to me why this causes a problem?  I have a theory, but it's mostly a guess.  I especially want to understand whether it's still a problem if we never access the memory via the mapping in the unity map.  I know speculative prefetching is part of the issue, so I assume older architectures without that feature don't exhibit this behaviour.
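
For reference, the kind of code that creates the aliasing we're worried about looks like this -- the page already has a cached mapping in the unity map, and vmap() adds a second, uncached one.  This is purely an illustration of the problem, not a suggestion:

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

/* Illustration only: 'page' is ordinary lowmem, so the unity map already
 * maps it cacheable.  This adds a second mapping of the same physical page
 * with uncached attributes -- the case ARM describes as undefined. */
static void *map_page_uncached(struct page *page)
{
        struct page *pages[1] = { page };

        return vmap(pages, 1, VM_MAP, pgprot_noncached(PAGE_KERNEL));
}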
 
If we really need all mappings of physical memory to have the same cache attribute bits, I see three workarounds:

1- set aside memory at boot that never gets mapped by the kernel.  The unified memory manager can then ensure there's only one mapping at a time.
Obvious drawbacks here are that you have to statically partition your system into memory you want accessible to the unified memory manager and memory you don't.  This may not be that big a deal, since most current solutions (pmem, cmem, et al.) basically do this.  I can say that on Android devices running a high-resolution display (720p and above), we're easily talking about needing 256M of memory or more to dedicate to this.  (There's a rough sketch of the boot-time carve-out after the list.)

2- use highmem pages only for the unified memory manager.  Highmem pages only get mapped on demand.  
This has some performance costs when the kernel allocates other metadata in highmem.  Most embedded systems still don't have enough memory to need highmem, though I'm guessing that'll follow the current trend and shift in the next couple of years.  (There's a sketch of this approach after the list too.)

3- fix up the unity mapping so the attribute bits match those desired by the unified memory manager.  This could be done by removing pages from the unity map.  It's complicated by the fact that the unity map makes use of large pages, sections and supersections to reduce TLB pressure.  I don't think this is impossible if we restrict the set of contexts from which it can happen, but I'm imagining that we will also need to maintain some kind of pool of memory we've moved from cached to uncached, since the process is likely to be expensive.  Quite likely we will have to iterate over processes and update all their top-level page tables.
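
For option 1, the boot-time carve-out itself could be as simple as the sketch below, called from the machine's early reserve hook before the unity map is built.  The base and size here are made up; memblock_remove() takes the region away from the kernel entirely, so it never gets a cached mapping, and an allocator can then manage it on its own:

#include <linux/init.h>
#include <linux/memblock.h>

#define UMM_CARVEOUT_BASE       0x80000000UL    /* made-up placement */
#define UMM_CARVEOUT_SIZE       (256UL << 20)   /* 256M, per the numbers above */

/* Run early, from the machine's reserve callback: after this the region is
 * invisible to the page allocator and never appears in the unity map. */
static void __init umm_reserve_carveout(void)
{
        memblock_remove(UMM_CARVEOUT_BASE, UMM_CARVEOUT_SIZE);
}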
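
For option 2, the attraction is that a highmem page has no permanent kernel mapping, so the mapping we create on demand is the only one and can carry whatever attributes we want.  Again, just a sketch:

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

/* Allocate a highmem page and give it its one and only kernel mapping,
 * uncached.  There is no cached alias in the unity map to conflict with. */
static void *umm_alloc_uncached_highmem(struct page **pagep)
{
        struct page *page = alloc_page(GFP_HIGHUSER);
        void *vaddr;

        if (!page)
                return NULL;
        vaddr = vmap(&page, 1, VM_MAP, pgprot_noncached(PAGE_KERNEL));
        if (!vaddr) {
                __free_page(page);
                return NULL;
        }
        *pagep = page;
        return vaddr;
}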

These all have drawbacks, so I'd like to really understand the problem before pursuing them.  Can the Linaro folks find someone who can explain the problem in more detail?

Thanks,
Rebecca

_______________________________________________
Linaro-mm-sig mailing list
Linaro-mm-sig@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-mm-sig