On Wednesday 20 April 2011, Marek Szyprowski wrote:
Hello,
On Tuesday, April 19, 2011 11:23 PM Arnd Bergmann wrote:
This may be a stupid question, but do we have an agreement that it is actually a requirement to have uncached mappings? With the streaming DMA mapping API, it should be possible to work around noncoherent DMA by flushing the caches at the right times, which probably results in better performance than simply doing noncached mappings. What is the specific requirement for noncached memory regions?
Flushing the cache for large buffers also takes significant time, especially if it is implemented by iterating over the whole buffer and calling a flush instruction for each line.
For most use cases the CPU write speed is not degraded on non-cached memory areas. ARM CPUs with the write combining feature perform really well on uncached memory.
Ok, makes sense.
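To put the per-line flushing cost mentioned above in perspective, here is a minimal sketch that counts how many cache-line operations a full-buffer flush would issue. The helper names, the 2 bytes per pixel, and the 32-byte cache line are assumptions chosen for illustration; they do not come from the thread or from any kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of per-line cache maintenance operations
 * needed to cover buf_bytes when the cache line is line_bytes long.
 * Assumes the buffer starts on a cache-line boundary. */
static size_t cache_lines_to_flush(size_t buf_bytes, size_t line_bytes)
{
    assert(line_bytes > 0);
    /* Round up so a partially used last line is still counted. */
    return (buf_bytes + line_bytes - 1) / line_bytes;
}

/* Illustrative frame size: width x height x bytes per pixel. */
static size_t frame_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}
```

Under these assumptions, a 1280x720 frame at 2 bytes per pixel occupies 1,843,200 bytes, so `cache_lines_to_flush(frame_bytes(1280, 720, 2), 32)` yields 57,600 flush operations per frame, which is the kind of overhead the quoted paragraph is concerned about.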
Non-cached buffers are also the only solution for buffers that need to be permanently mapped to userspace (like framebuffer).
Why? Are the cache flush operations privileged on ARM?
Non-cached mappings are also useful when one doesn't touch the memory with the CPU at all (zero copy between 2 independent multimedia blocks).
I would think that if we don't want to touch the data, ideally it should not be mapped at all into the kernel address space.
Right, I believe this is what people generally do to avoid the problem, but I wouldn't call it a solution.
It is also a huge memory waste. Similar solutions have been proposed to overcome the problem of memory fragmentation. I don't think we can afford giving away almost half of the system memory just to have the possibility of processing a 720p movie.
Yes, that was my point. It's not a solution at all, just the easiest way to ignore the problem by creating a different one.
That's why we came up with the idea of CMA (contiguous memory allocator), which can 'recycle' memory areas that are not used by multimedia hardware. CMA allows the system to allocate movable pages (like page cache, user process memory, etc.) from a defined CMA range and migrate them on an allocation request for contiguous memory. For more information, please refer to: https://lkml.org/lkml/2011/3/31/213
I thought CMA was mostly about dealing with systems that don't have an IOMMU, which is a related problem, but is not the same as dealing with noncoherent DMA.
I want to merge this idea with changing the kernel linear low-mem mapping, so 2-level page mapping will be done only for the defined CMA range, which should reduce TLB pressure. Once the contiguous block is allocated from the CMA range, the mapping in the low-mem area can be removed to fulfill the ARM specification.
I'm not convinced that trying to solve both issues at the same time is a good idea. For systems that have an IOMMU, we just want a bunch of pages and map them virtually contiguous into the bus address space. For other systems, we really need physically contiguous memory. Independent of that, you may or may not need to unmap them from the linear mapping and make them noncached, depending on whether there is prefetching and noncoherent DMA involved.
Arnd
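
The two independent decisions described in the last paragraph can be sketched as follows. This is only one reading of the argument: the structure, the enum, and both function names are hypothetical and do not correspond to any kernel interface, and treating "prefetching and noncoherent DMA" as a logical AND is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical platform description; not a kernel structure. */
struct dma_platform {
    bool has_iommu;        /* device sits behind an IOMMU */
    bool cpu_prefetches;   /* CPU may prefetch through the linear mapping */
    bool dma_noncoherent;  /* DMA is not cache coherent */
};

enum alloc_kind { ALLOC_SCATTERED_PAGES, ALLOC_PHYS_CONTIGUOUS };

/* First decision: with an IOMMU, any pages can be mapped contiguously
 * into the bus address space; without one, the memory itself must be
 * physically contiguous. */
static enum alloc_kind choose_allocation(const struct dma_platform *p)
{
    assert(p != NULL);
    return p->has_iommu ? ALLOC_SCATTERED_PAGES : ALLOC_PHYS_CONTIGUOUS;
}

/* Second decision, independent of the first: whether removing the
 * linear mapping and using a noncached mapping may be needed at all. */
static bool may_need_unmap_and_noncached(const struct dma_platform *p)
{
    assert(p != NULL);
    return p->cpu_prefetches && p->dma_noncoherent;
}
```

Keeping the two functions separate mirrors the point of the reply: the choice of allocator does not feed into the caching decision, and neither does the reverse.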