Hi, Paul and Peter,
I think we have found the real root cause: READ_ONCE() itself doesn't need any barriers; the problematic code is in queued_spin_lock_slowpath() in kernel/locking/qspinlock.c:
	if (old & _Q_TAIL_MASK) {
		prev = decode_tail(old);

		/* Link @node into the waitqueue. */
		WRITE_ONCE(prev->next, node);

		pv_wait_node(node, prev);
		arch_mcs_spin_lock_contended(&node->locked);

		/*
		 * While waiting for the MCS lock, the next pointer may have
		 * been set by another lock waiter. We optimistically load
		 * the next pointer & prefetch the cacheline for writing
		 * to reduce latency in the upcoming MCS unlock operation.
		 */
		next = READ_ONCE(node->next);
		if (next)
			prefetchw(next);
	}
After WRITE_ONCE(prev->next, node), arch_mcs_spin_lock_contended() enters a READ_ONCE() loop (it is now implemented with smp_cond_load_acquire()), so the effect of the WRITE_ONCE() sits in the store buffer and is invisible to other cores. As a result, arch_mcs_spin_lock_contended() will wait forever, because the CPU waiting on prev->next also waits forever - it never sees the store. I think the right way to fix this is to flush the SFB after this WRITE_ONCE(), but I don't have a good solution:

1. MIPS has wbflush(), which can be used to flush the SFB, but other architectures don't have it.

2. Every architecture has mb(), and adding mb() after the WRITE_ONCE() does in fact solve Loongson's problem (a rough sketch is below), but semantically mb() is not the same thing as wbflush().

3. Maybe we can define a Loongson-specific WRITE_ONCE(), but not every WRITE_ONCE() needs wbflush(); we only need wbflush() between a WRITE_ONCE() and a READ_ONCE() loop.
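For illustration, a rough sketch of option 2 in queued_spin_lock_slowpath() - not a proposed patch, and it assumes mb() is strong enough on Loongson-3 to drain the SFB:

		/* Link @node into the waitqueue. */
		WRITE_ONCE(prev->next, node);
		/* Option 2 (sketch): drain the SFB so other cores see the store */
		mb();

		pv_wait_node(node, prev);
		arch_mcs_spin_lock_contended(&node->locked);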
Any ideas?
Huacai
On Tue, Jul 10, 2018 at 12:49 AM, Paul Burton <paul.burton@mips.com> wrote:
Hi Huacai,
On Mon, Jul 09, 2018 at 10:26:38AM +0800, Huacai Chen wrote:
After commit 7f56b58a92aaf2c ("locking/mcs: Use smp_cond_load_acquire() in MCS spin loop"), Loongson-3 fails to boot. This is because Loongson-3 has an SFB (Store Fill Buffer) and the weak ordering may cause READ_ONCE() to get a stale value in a tight loop. So in smp_cond_load_acquire() we need a __smp_rmb() before the READ_ONCE() loop.

This patch introduces a Loongson-specific smp_cond_load_acquire(). It should be backported to as early as linux-4.5, the release in which smp_cond_acquire() was introduced.

There may be other cases where memory barriers are needed; we will fix them one by one.
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/include/asm/barrier.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index a5eb1bb..e8c4c63 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -222,6 +222,24 @@
 #define __smp_mb__before_atomic()	__smp_mb__before_llsc()
 #define __smp_mb__after_atomic()	smp_llsc_mb()
+#ifdef CONFIG_CPU_LOONGSON3
+/* Loongson-3 need a __smp_rmb() before READ_ONCE() loop */
+#define smp_cond_load_acquire(ptr, cond_expr)				\
+({									\
+	typeof(ptr) __PTR = (ptr);					\
+	typeof(*ptr) VAL;						\
+	__smp_rmb();							\
+	for (;;) {							\
+		VAL = READ_ONCE(*__PTR);				\
+		if (cond_expr)						\
+			break;						\
+		cpu_relax();						\
+	}								\
+	__smp_rmb();							\
+	VAL;								\
+})
+#endif /* CONFIG_CPU_LOONGSON3 */
+
The discussion on v1 of this patch [1] seemed to converge on the view that Loongson suffers from the same problem as ARM platforms which enable the CONFIG_ARM_ERRATA_754327 workaround, and that we might require a similar workaround.
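For reference, the ARM workaround essentially boils down to turning cpu_relax() into a full barrier when the erratum workaround is enabled. A hypothetical Loongson equivalent - just a sketch, assuming smp_mb() is enough to force the SFB to drain - might look like this in arch/mips/include/asm/processor.h:

#ifdef CONFIG_CPU_LOONGSON3
/*
 * Sketch only: Loongson-3's SFB may hold back stores while the CPU sits
 * in a tight read loop, so make cpu_relax() a full barrier to force any
 * pending stores out, mirroring the CONFIG_ARM_ERRATA_754327 approach.
 */
#define cpu_relax()	smp_mb()
#else
#define cpu_relax()	barrier()
#endif

That would cover every polling loop that already calls cpu_relax(), not just smp_cond_load_acquire().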
Is there a reason you've not done that, and have instead tweaked this patch, which is specific to the smp_cond_load_acquire() case? I'm not comfortable with fixing just this one case when there could be many similar problematic pieces of code that you just haven't hit yet.
Please also keep the LKMM maintainers on copy for this - their feedback will be valuable & I'll be much more comfortable applying a workaround for Loongson's behavior here if it's one that they're OK with.
Thanks,
    Paul
[1] https://www.linux-mips.org/archives/linux-mips/2018-06/msg00139.html