On Tuesday 19 April 2011 23:37:48 Rebecca Schultz Zavin wrote:
On Tue, Apr 19, 2011 at 2:23 PM, Arnd Bergmann <arnd@arndb.de> wrote:
This may be a stupid question, but do we have an agreement that it is actually a requirement to have uncached mappings? With the streaming DMA mapping API, it should be possible to work around noncoherent DMA by flushing the caches at the right times, which probably results in better performance than simply doing noncached mappings. What is the specific requirement for noncached memory regions?
That was my original plan, but our graphics folks and those at our partner companies basically have me convinced that the common case is for userspace to stream data into memory, say copying an image into a texture, and never read from it or touch it again. The alternative will mean a lot of cache flushes for small memory regions, which in and of itself becomes a performance problem. I think we want to optimize for this case, rather than the much less likely case of read-modify-write to these buffers.
I find it hard to believe that flushing cache lines for the entire buffer once you are done writing is more expensive than doing uncached accesses all the time. That would mean that the CPU is either really good at doing uncached writes or that the cache management operations are really bad. Has anyone actually measured this?
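To make the comparison concrete, here is a rough operation count, not a measurement. It assumes a 64-byte cache line, 4-byte uncached stores, and no write combining; all three are illustrative assumptions and the helper names are hypothetical. It says nothing about the cost of each individual operation, which is exactly what would need to be measured.

```c
#include <stddef.h>

/* Assumed values for illustration only; real hardware differs. */
#define CACHE_LINE_BYTES 64u /* hypothetical cache line size */
#define STORE_BYTES       4u /* hypothetical width of one uncached store */

/* Number of per-line cache maintenance operations needed to flush
 * a buffer of buf_bytes once, after the CPU has finished writing it. */
size_t flush_ops(size_t buf_bytes)
{
	return (buf_bytes + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
}

/* Number of individual stores needed to fill the same buffer through
 * an uncached mapping, assuming stores are not merged on the way out. */
size_t uncached_stores(size_t buf_bytes)
{
	return (buf_bytes + STORE_BYTES - 1) / STORE_BYTES;
}
```

Under these assumptions a 4096-byte buffer needs 64 line flushes against 1024 uncached stores, so uncached access only wins if a single line flush costs more than about 16 uncached stores.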
This is a variant of the cache aliasing problem we see with virtually indexed caches: You may end up with multiple cache lines for the same physical address, with different contents. The results are unpredictable, so most CPU architectures explicitly forbid this.
I think the extra wrinkle here is that the presence of the unity mapping as cached, even if you never access it, causes a problem. I totally understand why you wouldn't want to access mappings with different attributes, but just having them hang around seems like it shouldn't in general be a problem. How does powerpc handle it when you need an uncached page for DMA?
You don't need uncached pages for DMA. On powerpc, Linux only supports systems that are coherent, which has pretty much solved the problem by forcing hardware designers to do it the easy way, rather than requiring the software to add extra overhead to work around it.
Arnd