On 31 October 2012 02:25, Peter Fordham <peter.fordham@gmail.com> wrote:
I'm working with a dual Cortex-A9 system and I've been trying to understand exactly why the spinlock functions need to use DMB. It seems that as long as the merging store buffer is flushed, the lock value should end up in the L1 of the unlocking core, and the SCU should either invalidate or update the value in the L1 of the other core. That should be enough to maintain coherency and safe locking, right? And doesn't STREX skip the merging store buffer anyway, meaning we don't even need the flush?
DMB appears to be something of a blunt hammer, especially since it defaults to the system domain, which likely means pushing the write all the way out to main memory; that can be expensive.
Are the DMBs in the locks there as a workaround for drivers that don't use smp_mb() properly?
Based on the performance counters, I'm currently seeing about 5% of my system cycles disappearing into stalls caused by DMB.
Hi Peter,
The following is taken from Documentation/memory-barriers.txt:
(5) LOCK operations.
This acts as a one-way permeable barrier. It guarantees that all memory operations after the LOCK operation will appear to happen after the LOCK operation with respect to the other components of the system.
Memory operations that occur before a LOCK operation may appear to happen after it completes.
(6) UNLOCK operations.
This also acts as a one-way permeable barrier. It guarantees that all memory operations before the UNLOCK operation will appear to happen before the UNLOCK operation with respect to the other components of the system.
Memory operations that occur after an UNLOCK operation may appear to happen before it completes.
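To make the one-way semantics concrete, here is a minimal sketch (shared_data, lock and the function names are mine, purely for illustration):

	int shared_data;	/* protected by 'lock' */
	spinlock_t lock;

	/* CPU 0 */
	void writer(void)
	{
		spin_lock(&lock);
		shared_data = 42;	/* must become visible before the     */
		spin_unlock(&lock);	/* lock release is seen by other CPUs */
	}

	/* CPU 1 */
	void reader(void)
	{
		int x;

		spin_lock(&lock);	/* the barrier after taking the lock  */
		x = shared_data;	/* stops this load from being hoisted */
		spin_unlock(&lock);	/* above the lock acquisition         */
	}

Without the barrier in unlock, the store to shared_data could still be sitting in CPU 0's store buffer when CPU 1 already sees the lock as free.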
Because ARMv6 and above are weakly ordered, we need to guarantee that the code after the lock executes only after the lock is actually taken, and so there is a barrier there.
Similarly, in unlock we must guarantee that the code before the unlock executes before the lock is released, so there is an smp_mb() at the beginning of the unlock routine.
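For reference, a simplified sketch along the lines of the ARMv6+ code in arch/arm/include/asm/spinlock.h (WFE/SEV and the newer ticket-lock logic are omitted, so this is not the exact kernel code):

	static inline void arch_spin_lock(arch_spinlock_t *lock)
	{
		unsigned long tmp;

		__asm__ __volatile__(
	"1:	ldrex	%0, [%1]\n"		/* load lock value exclusively */
	"	teq	%0, #0\n"		/* already held?               */
	"	strexeq	%0, %2, [%1]\n"		/* if free, try to claim it    */
	"	teqeq	%0, #0\n"		/* did the store succeed?      */
	"	bne	1b"			/* no: retry                   */
		: "=&r" (tmp)
		: "r" (&lock->lock), "r" (1)
		: "cc");

		smp_mb();	/* dmb on ARMv7 SMP: keeps the critical
				   section below the lock acquisition */
	}

	static inline void arch_spin_unlock(arch_spinlock_t *lock)
	{
		smp_mb();	/* dmb: make the critical section's stores
				   visible before the lock word is cleared */

		__asm__ __volatile__(
	"	str	%1, [%0]\n"	/* release the lock */
		:
		: "r" (&lock->lock), "r" (0)
		: "cc");
	}

The smp_mb() after the strex loop is what gives the acquire behaviour described in (5) above; the one before the plain str gives the release behaviour in (6).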
Adding Russell and Arnd in cc to correct my statements :)
-- viresh