Hi,
My question concerns this patch:

--------
commit 2ffe2da3e71652d4f4cae19539b5c78c2a239136
Author: Russell King <rmk+kernel@arm.linux.org.uk>
Date:   Sat Oct 31 16:52:16 2009 +0000
ARMv6 and ARMv7 CPUs can perform speculative prefetching, which makes DMA cache coherency handling slightly more interesting. Rather than being able to rely upon the CPU not accessing the DMA buffer until DMA has completed, we now must expect that the cache could be loaded with possibly stale data from the DMA buffer.
Where DMA involves data being transferred to the device, we clean the cache before handing it over for DMA, otherwise we invalidate the buffer to get rid of potential writebacks. On DMA Completion, if data was transferred from the device, we invalidate the buffer to get rid of any stale speculative prefetches.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
--------
file: arch/arm/mm/dma-mapping.c
void ___dma_page_cpu_to_dev(struct page *page, unsigned long off,
	size_t size, enum dma_data_direction dir)
{
	...
	if (dir == DMA_FROM_DEVICE) {
		outer_inv_range(paddr, paddr + size);
	...
}
EXPORT_SYMBOL(___dma_page_cpu_to_dev);
void ___dma_page_dev_to_cpu(struct page *page, unsigned long off,
	size_t size, enum dma_data_direction dir)
{
	...
	if (dir != DMA_TO_DEVICE)
		outer_inv_range(paddr, paddr + size);
	...
}
outer_inv_range() is called twice for DMA_FROM_DEVICE: the first time to "get rid of potential writebacks", the second time to "get rid of any stale speculative prefetches". outer_inv_range() is a rather expensive operation. In the first case, isn't it enough to just call cache_sync()?
What about:

void ___dma_page_cpu_to_dev(struct page *page, unsigned long off,
	size_t size, enum dma_data_direction dir)
{
	...
	if (dir == DMA_FROM_DEVICE) {
-		outer_inv_range(paddr, paddr + size);
+		outer_sync();
	...
}
/Per
On Tue, Feb 15, 2011 at 02:14:55PM +0100, Per Forlin wrote:
outer_inv_range() is called twice for DMA_FROM_DEVICE: the first time to "get rid of potential writebacks", the second time to "get rid of any stale speculative prefetches".
Correct.
outer_inv_range() is a rather expensive operation. In the first case isn't it enough to just call cache_sync()?
No. If the CPU speculatively fetches data from the DMA buffer after it's been mapped for DMA, it will bring data into the L2 cache. This data may or may not be up to date with the DMA buffer contents once DMA has completed.
As there is no way to know, we have to invalidate the L2 cache (and the L1 cache) after the DMA has completed to avoid any possibility of data corruption.
On Tue, Feb 15, 2011 at 01:32:32PM +0000, Russell King - ARM Linux wrote:
On Tue, Feb 15, 2011 at 02:14:55PM +0100, Per Forlin wrote:
outer_inv_range() is called twice for DMA_FROM_DEVICE: the first time to "get rid of potential writebacks", the second time to "get rid of any stale speculative prefetches".
Correct.
outer_inv_range() is a rather expensive operation. In the first case isn't it enough to just call cache_sync()?
No. If the CPU speculatively fetches data from the DMA buffer after it's been mapped for DMA, it will bring data into the L2 cache. This data may or may not be up to date with the DMA buffer contents once DMA has completed.
As there is no way to know, we have to invalidate the L2 cache (and the L1 cache) after the DMA has completed to avoid any possibility of data corruption.
I should add: the solution to all of this is to have cache coherent DMA.
As CPUs become more complex and start playing tricks like speculative prefetching, we have seen cache maintenance for DMA become more expensive. The only way to reduce the cost of that is to have cache coherency for DMA.
There is no way to safely avoid the double-invalidate for DMA_FROM_DEVICE.
On 15 February 2011 14:41, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
On Tue, Feb 15, 2011 at 01:32:32PM +0000, Russell King - ARM Linux wrote:
On Tue, Feb 15, 2011 at 02:14:55PM +0100, Per Forlin wrote:
outer_inv_range() is called twice for DMA_FROM_DEVICE: the first time to "get rid of potential writebacks", the second time to "get rid of any stale speculative prefetches".
Correct.
outer_inv_range() is a rather expensive operation. In the first case isn't it enough to just call cache_sync()?
No. If the CPU speculatively fetches data from the DMA buffer after it's been mapped for DMA, it will bring data into the L2 cache. This data may or may not be up to date with the DMA buffer contents once DMA has completed.
As there is no way to know, we have to invalidate the L2 cache (and the L1 cache) after the DMA has completed to avoid any possibility of data corruption.
I should add: the solution to all of this is to have cache coherent DMA.
As CPUs become more complex and start playing tricks like speculative prefetching, we have seen cache maintenance for DMA become more expensive. The only way to reduce the cost of that is to have cache coherency for DMA.
There is no way to safely avoid the double-invalidate for DMA_FROM_DEVICE.
I don't fully understand this yet. I think you are right but I need a little help to get there myself. I agree, the cache (L1 and L2) must be invalidated after the DMA has completed. Before starting the DMA the write buffers must be drained (cache_sync).
Why invalidate the cache before starting the DMA?
The user shouldn't care about the cache until the DMA has finished and the cache is invalidated. I don't understand how the DMA-transferred data can be corrupted from a CPU perspective if the cache is invalidated after the DMA transfer is done, but _not_ before DMA is started?
Thanks for your fast response /Per
On Tue, Feb 15, 2011 at 02:54:21PM +0100, Per Forlin wrote:
I don't fully understand this yet. I think you are right but I need a little help to get there myself. I agree, the cache (L1 and L2) must be invalidated after the DMA has completed. Before starting the DMA the write buffers must be drained (cache_sync).
Why invalidate the cache before starting the DMA?
Think about what happens if you have dirty cache lines in the DMA region. These can be evicted when other cache lines are loaded, which will result in them overwriting contents of memory. If the DMA device has already written to that memory, the result is data corruption.
So, the invalidate prior to DMA is to get rid of any dirty cache lines which could be written back to memory.
On 15 February 2011 15:12, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
On Tue, Feb 15, 2011 at 02:54:21PM +0100, Per Forlin wrote:
I don't fully understand this yet. I think you are right but I need a little help to get there myself. I agree, the cache (L1 and L2) must be invalidated after the DMA has completed. Before starting the DMA the write buffers must be drained (cache_sync).
Why invalidate the cache before starting the DMA?
Think about what happens if you have dirty cache lines in the DMA region. These can be evicted when other cache lines are loaded, which will result in them overwriting contents of memory. If the DMA device has already written to that memory, the result is data corruption.
So, the invalidate prior to DMA is to get rid of any dirty cache lines which could be written back to memory.
True. In my case write back is disabled but read back is enabled; that's why I didn't consider it. Do you think it is feasible to let dma-mapping detect the cache configuration at runtime in order to prevent "unnecessary" cache operations? I can see some FIXME comments in the code which indicate you may have plans to change it.
I can try to think of a proposal if you agree.
Thanks again, /Per
On Tue, Feb 15, 2011 at 04:11:09PM +0100, Per Forlin wrote:
True. In my case write back is disabled but read back is enabled; that's why I didn't consider it.
What do you mean by "write back is disabled"? You can only prevent writebacks by using a write-through cache.
Any writeback cache, whether in read-allocate or read-write-allocate mode will have writebacks.
Do you think it is feasible to let dma-mapping detect the cache configuration at runtime in order to prevent "unnecessary" cache operations?
Let's first straighten out your cache understanding above, because I don't think you understand properly yet.
On 15 February 2011 16:46, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
On Tue, Feb 15, 2011 at 04:11:09PM +0100, Per Forlin wrote:
True. In my case write back is disabled but read back is enabled; that's why I didn't consider it.
What do you mean by "write back is disabled"? You can only prevent writebacks by using a write-through cache.
I meant "no write allocate"
Any writeback cache, whether in read-allocate or read-write-allocate mode will have writebacks.
Yepp, this is where I jumped to conclusions. I thought there would be no writebacks if using "no write allocate". Now that I look at the cache documentation again it feels so obvious.
Thanks for explaining this to me, Per