Hi,
I'm working with a dual-core Cortex-A9 system and I've been trying to understand exactly why the spinlock functions need to use DMB. It seems that as long as the merging store buffer is flushed, the lock value should end up in the L1 of the unlocking core, and the SCU should either invalidate or update the value in the L1 of the other core. That should be enough to maintain coherency and safe locking, right? And doesn't STREX skip the merging store buffer anyway, meaning we don't even need the flush?
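To make the question concrete, here is a minimal sketch of where I understand the barriers to sit. It uses C11 atomics rather than the kernel's actual assembly, and demo_spinlock_t, demo_lock, demo_unlock, demo_increment and shared_counter are names I made up for illustration. As far as I understand, on ARMv7 a compiler would typically turn the acquire into an LDREX/STREX loop followed by a DMB, and the release into a DMB followed by a plain store.

```c
#include <stdatomic.h>

/* Hypothetical lock type for illustration; not the kernel's arch_spinlock_t. */
typedef struct {
    atomic_int locked;   /* 0 = free, 1 = held */
} demo_spinlock_t;

/* Plain (non-atomic) data that the lock is supposed to protect. */
static int shared_counter;

static void demo_lock(demo_spinlock_t *l)
{
    int expected = 0;

    /* Acquire ordering: accesses after this point must not be observed
     * before the lock is seen as taken. This is where the first barrier
     * would sit. */
    while (!atomic_compare_exchange_weak_explicit(&l->locked, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
        expected = 0;    /* a failed CAS overwrites expected; reset and retry */
    }
}

static void demo_unlock(demo_spinlock_t *l)
{
    /* Release ordering: writes to the protected data must be observable
     * before the lock is seen as free. This is where the second barrier
     * would sit. */
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Increment the protected counter under the lock and return the new value. */
static int demo_increment(demo_spinlock_t *l)
{
    demo_lock(l);
    int v = ++shared_counter;
    demo_unlock(l);
    return v;
}
```

In this sketch the ordering that matters is between the write to shared_counter and the store that frees the lock, which involves two different addresses. My question is whether cache coherency on the lock word alone already covers that.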
DMB appears to be something of a blunt hammer, especially since it defaults to the full-system domain. That likely means a write all the way to main memory, which can be expensive.
Are the DMBs in the locks there as a workaround for drivers that don't use smp_mb() properly?
I'm currently seeing, based on the performance counters, about 5% of my system cycles disappearing in stalls caused by DMB.
-Pete