Hi, Paul and Peter,
I think we have found the real root cause: READ_ONCE() itself doesn't need any barriers; the problematic code is in queued_spin_lock_slowpath() in kernel/locking/qspinlock.c:
	if (old & _Q_TAIL_MASK) {
		prev = decode_tail(old);

		/* Link @node into the waitqueue. */
		WRITE_ONCE(prev->next, node);

		pv_wait_node(node, prev);
		arch_mcs_spin_lock_contended(&node->locked);

		/*
		 * While waiting for the MCS lock, the next pointer may have
		 * been set by another lock waiter. We optimistically load
		 * the next pointer & prefetch the cacheline for writing
		 * to reduce latency in the upcoming MCS unlock operation.
		 */
		next = READ_ONCE(node->next);
		if (next)
			prefetchw(next);
	}
After WRITE_ONCE(prev->next, node), arch_mcs_spin_lock_contended() enters a READ_ONCE() loop (it is now implemented with smp_cond_load_acquire()), so the effect of the WRITE_ONCE() sits in the store buffer and is invisible to other cores. As a result, arch_mcs_spin_lock_contended() will wait forever, because the CPU waiting on prev->next also waits forever - it never sees the store. I think the right way to fix this is to flush the SFB after this WRITE_ONCE(), but I don't have a good solution:

1. MIPS has wbflush(), which can be used to flush the SFB, but other architectures don't have it.

2. Every architecture has mb(), and adding mb() after the WRITE_ONCE() does in fact solve Loongson's problem (a rough sketch is below), but semantically mb() is not the same thing as wbflush().

3. Maybe we can define a Loongson-specific WRITE_ONCE(), but not every WRITE_ONCE() needs wbflush(); we only need wbflush() between a WRITE_ONCE() and a READ_ONCE() loop.
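For illustration, a rough sketch of option 2 in queued_spin_lock_slowpath() - not a proposed patch, and it assumes mb() is strong enough on Loongson-3 to drain the SFB:

		/* Link @node into the waitqueue. */
		WRITE_ONCE(prev->next, node);
		/* Option 2 (sketch): drain the SFB so other cores see the store */
		mb();

		pv_wait_node(node, prev);
		arch_mcs_spin_lock_contended(&node->locked);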
Any ideas?
Huacai
On Tue, Jul 10, 2018 at 12:49 AM, Paul Burton <paul.burton@mips.com> wrote:
Hi Huacai,
On Mon, Jul 09, 2018 at 10:26:38AM +0800, Huacai Chen wrote:
After commit 7f56b58a92aaf2c ("locking/mcs: Use smp_cond_load_acquire() in MCS spin loop"), Loongson-3 fails to boot. This is because Loongson-3 has an SFB (Store Fill Buffer) and the weak ordering may cause READ_ONCE() to get a stale value in a tight loop. So in smp_cond_load_acquire() we need a __smp_rmb() before the READ_ONCE() loop.

This patch introduces a Loongson-specific smp_cond_load_acquire(). It should be backported to as early as linux-4.5, the release in which smp_cond_acquire() was introduced.

There may be other cases where memory barriers are needed; we will fix them one by one.
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/include/asm/barrier.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index a5eb1bb..e8c4c63 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -222,6 +222,24 @@
 #define __smp_mb__before_atomic()	__smp_mb__before_llsc()
 #define __smp_mb__after_atomic()	smp_llsc_mb()
+#ifdef CONFIG_CPU_LOONGSON3
+/* Loongson-3 need a __smp_rmb() before READ_ONCE() loop */
+#define smp_cond_load_acquire(ptr, cond_expr)				\
+({									\
+	typeof(ptr) __PTR = (ptr);					\
+	typeof(*ptr) VAL;						\
+	__smp_rmb();							\
+	for (;;) {							\
+		VAL = READ_ONCE(*__PTR);				\
+		if (cond_expr)						\
+			break;						\
+		cpu_relax();						\
+	}								\
+	__smp_rmb();							\
+	VAL;								\
+})
+#endif /* CONFIG_CPU_LOONGSON3 */
+
The discussion on v1 of this patch [1] seemed to converge on the view that Loongson suffers from the same problem as ARM platforms which enable the CONFIG_ARM_ERRATA_754327 workaround, and that we might require a similar workaround.
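For reference, the ARM workaround essentially boils down to turning cpu_relax() into a full barrier when the erratum workaround is enabled. A hypothetical Loongson equivalent - just a sketch, assuming smp_mb() is enough to force the SFB to drain - might look like this in arch/mips/include/asm/processor.h:

#ifdef CONFIG_CPU_LOONGSON3
/*
 * Sketch only: Loongson-3's SFB may hold back stores while the CPU sits
 * in a tight read loop, so make cpu_relax() a full barrier to force any
 * pending stores out, mirroring the CONFIG_ARM_ERRATA_754327 approach.
 */
#define cpu_relax()	smp_mb()
#else
#define cpu_relax()	barrier()
#endif

That would cover every polling loop that already calls cpu_relax(), not just smp_cond_load_acquire().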
Is there a reason you've not done that, and have instead tweaked this patch, which is specific to the smp_cond_load_acquire() case? I'm not comfortable with fixing just this one case when there could be many similar problematic pieces of code that you just haven't hit yet.
Please also keep the LKMM maintainers on copy for this - their feedback will be valuable & I'll be much more comfortable applying a workaround for Loongson's behavior here if it's one that they're OK with.
Thanks,
    Paul
[1] https://www.linux-mips.org/archives/linux-mips/2018-06/msg00139.html