From: Mans Rullgard [mailto:mans.rullgard@linaro.org] Sent: Sunday, November 27, 2011 6:26 PM
Hi,
By the way, do you know whether it is safe to use "SCU Speculative linefills" with Cortex-A9 r2pX and PL310 r3pX?
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0407f/BABEBFBH.html
As a quick and dirty test, it can be enabled in 'arch/arm/kernel/smp_scu.c' by setting the extra (1 << 3) bit in the SCU Control Register from the 'scu_enable' function.
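For anyone who wants to see the shape of that change, here is a rough standalone sketch of the bit manipulation. It is not the actual kernel code: the helper names (scu_ctrl_value, scu_enable_sketch) and the spec_linefill flag are made up for illustration, and the register is modelled as a plain memory word instead of an ioremapped SCU base. The only things taken from the TRM page above are that the SCU Control Register sits at offset 0x00, bit 0 is the SCU enable, and bit 3 is the speculative linefill enable.

```c
#include <stdint.h>

#define SCU_CTRL           0x00        /* SCU Control Register offset */
#define SCU_ENABLE         (1u << 0)   /* bit 0: SCU enable */
#define SCU_SPEC_LINEFILL  (1u << 3)   /* bit 3: SCU speculative linefills enable */

/* Hypothetical helper: compute the value to write back, keeping all other bits. */
static inline uint32_t scu_ctrl_value(uint32_t old, int spec_linefill)
{
    uint32_t val = old | SCU_ENABLE;

    if (spec_linefill)
        val |= SCU_SPEC_LINEFILL;
    return val;
}

/*
 * Hypothetical stand-in for the register update: read-modify-write of the
 * control word. scu_base is assumed to point at the SCU register block.
 */
static inline void scu_enable_sketch(volatile uint32_t *scu_base, int spec_linefill)
{
    uint32_t ctrl = scu_base[SCU_CTRL / 4];

    scu_base[SCU_CTRL / 4] = scu_ctrl_value(ctrl, spec_linefill);
}
```

In a real kernel the access would go through the usual MMIO accessors, and the bit should only be set on parts where the relevant errata do not apply.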
<snip>
Be careful. That chip has PL310 r3p0 so it's affected by erratum 729806, "Speculative reads from the Cortex-A9 MPCore processor can cause deadlock".
For OMAP4 it should be OK as long as the other necessary errata workarounds are activated, as they are in the code today. However, as Mans points out, other partners' chips might have an issue.
Your benchmark results are interesting. I had a couple of threads with an expert on this point, and your result matches what I was told. The impact depends on the data set size of the use case and the configured speed of the PL310 logic.
It was explained that when one processor requests shared data (all Linux SMP memory is marked with the S-bit), the SCU can signal a hint to the PL310 to start a parallel L2 tag lookup while the SCU snoop tag is checked (to see if a coherent L1 cache-to-cache transfer needs to happen). In case of a snoop-cache-miss, you are now 2-3 cycles ahead of the game into the PL310 cache-tag lookup. If the data is in the other processor's cache, you will have burned some power on a needless lookup and perhaps (depending on resource load) delayed some valid request.
Regards, Richard W.