On Wed, Jun 20, 2018 at 09:17:16AM +0100, Will Deacon wrote:
On Wed, Jun 20, 2018 at 11:31:55AM +0800, 陈华才 wrote:
Loongson-3's Store Fill Buffer (SFB) is nearly the same as your "Store Buffer", and it makes the memory ordering weaker. So smp_cond_load_acquire() only needs a __smp_mb() before the loop, not after every READ_ONCE(). In other words, the following code is fine:
#define smp_cond_load_acquire(ptr, cond_expr) \
({ \
	typeof(ptr) __PTR = (ptr); \
	typeof(*ptr) VAL; \
	__smp_mb(); \
	for (;;) { \
		VAL = READ_ONCE(*__PTR); \
		if (cond_expr) \
			break; \
		cpu_relax(); \
	} \
	__smp_mb(); \
	VAL; \
})
The __smp_mb() before the loop is there to avoid "reads prioritised over writes", which is caused by the SFB's weak ordering and is similar to the ARM11MPCore behaviour (mentioned by Will Deacon).
Sure, but smp_cond_load_acquire() isn't the only place you'll see this sort of pattern in the kernel. In other places, the only existing arch hook is cpu_relax(), so unless you want to audit all loops and add a special MIPS-specific smp_mb() to those that are affected, I think your only option is to stick it in cpu_relax().
I assume you don't have a control register that can disable this prioritisation in the SFB?
Right, I think we also want to clarify that this 'feature' is not supported by the Linux kernel in general and by the LKMM in particular.
It really is a CPU bug, and the cpu_relax() change is a best-effort workaround.
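
For illustration only, the shape of the cpu_relax() workaround discussed above can be sketched in userspace C11. This is not the kernel patch: cpu_relax_sketch() and wait_for_flag_sketch() are hypothetical names, and atomic_thread_fence(memory_order_seq_cst) is assumed here as a stand-in for the kernel's smp_mb(). The point is only that once the full barrier lives in the relax hook, every spin loop that calls the hook picks it up without being audited individually.

```c
#include <stdatomic.h>

/*
 * Hypothetical userspace stand-in for cpu_relax() on an affected CPU.
 * Assumption: a sequentially consistent fence plays the role that
 * smp_mb() would play in the kernel, so that stores issued before the
 * spin loop are not left pending behind the loop's repeated loads.
 */
static inline void cpu_relax_sketch(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Generic spin-wait: poll *flag with relaxed loads until it becomes
 * non-zero, calling the relax hook on every failed iteration, then
 * upgrade to acquire ordering before returning the observed value.
 */
static inline int wait_for_flag_sketch(atomic_int *flag)
{
	int val;

	for (;;) {
		val = atomic_load_explicit(flag, memory_order_relaxed);
		if (val)
			break;
		cpu_relax_sketch();	/* barrier comes "for free" here */
	}
	atomic_thread_fence(memory_order_acquire);
	return val;
}
```

A waiter would call wait_for_flag_sketch(&flag) while another thread eventually stores a non-zero value to flag. The loop body itself carries no architecture-specific barrier, which is the property that makes the cpu_relax() approach attractive for loops outside smp_cond_load_acquire().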