Hi, Peter,
I'm afraid that you have missed something...
Firstly, our previous conclusion (READ_ONCE() needs a barrier to avoid 'reads prioritised over writes') is totally wrong. So defining cpu_relax() as smp_mb(), as ARM11MPCore does, is incorrect, even if it can 'solve' Loongson's problem. Secondly, I think the real problem is like this:
1, CPU0 sets the lock to 0, then does something;
2, While CPU0 is doing something, CPU1 sets the flag to 1 with WRITE_ONCE(), and then waits for the lock to become 1 with a READ_ONCE() loop;
3, After CPU0 completes its work, it waits for the flag to become 1, and if so it sets the lock to 1;
4, When the lock becomes 1, CPU1 leaves the READ_ONCE() loop.
Without the SFB, everything is OK. But with the SFB, the READ_ONCE() loop in step 2 comes right after the WRITE_ONCE(), which leaves the flag cached in the SFB (and so invisible to other CPUs) forever, so both CPU0 and CPU1 wait forever.
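To make steps 1 to 4 concrete, here is a minimal userspace sketch of the handshake. It is only an illustration under assumptions: C11 relaxed atomics stand in for WRITE_ONCE()/READ_ONCE(), pthreads stand in for the two CPUs, and the names lock_word, flag_word, cpu0_thread, cpu1_thread and run_handshake are made up for this example. The fence inside CPU1's loop is meant to play the role of the 'sync'; it is not the kernel code.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for "the lock" and "the flag" in steps 1 to 4. */
static atomic_int lock_word;
static atomic_int flag_word;

/* CPU1: step 2 (set the flag, then spin on the lock) and step 4. */
static void *cpu1_thread(void *unused)
{
    (void)unused;
    /* Stand-in for WRITE_ONCE(flag, 1): a plain store with no ordering. */
    atomic_store_explicit(&flag_word, 1, memory_order_relaxed);
    /* Stand-in for the READ_ONCE() loop that follows the store directly. */
    while (atomic_load_explicit(&lock_word, memory_order_relaxed) != 1) {
        /* Assumed fix: a full fence in the loop, playing the role of 'sync'. */
        atomic_thread_fence(memory_order_seq_cst);
    }
    return NULL;
}

/* CPU0: step 3 (wait for the flag, then set the lock to 1). */
static void *cpu0_thread(void *unused)
{
    (void)unused;
    while (atomic_load_explicit(&flag_word, memory_order_relaxed) != 1)
        ; /* CPU0 only makes progress once CPU1's store is visible */
    atomic_store_explicit(&lock_word, 1, memory_order_relaxed);
    return NULL;
}

/* Runs the handshake once; returns the final lock value, or -1 on error. */
static int run_handshake(void)
{
    pthread_t t0, t1;

    atomic_store(&lock_word, 0); /* step 1: the lock starts at 0 */
    atomic_store(&flag_word, 0);

    if (pthread_create(&t0, NULL, cpu0_thread, NULL) != 0)
        return -1;
    if (pthread_create(&t1, NULL, cpu1_thread, NULL) != 0) {
        atomic_store(&flag_word, 1); /* release CPU0 so it can be joined */
        pthread_join(t0, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t0, NULL);
    return atomic_load(&lock_word);
}
```

On an ordinary host this finishes with or without the fence, so it only shows the shape of the code: CPU0 cannot make progress until CPU1's store to the flag becomes visible, and CPU1 does nothing but read the lock after that store.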
I don't think this is a hardware bug. By design, the SFB is flushed to the L1 cache in three cases: 1, the data in the SFB is full (a complete cache line); 2, there is a subsequent read access to the same cache line; 3, a 'sync' instruction is executed.
In this case, there is no other memory access (read or write) between the WRITE_ONCE() and the READ_ONCE() loop. So Case 1 and Case 2 will not happen, and the only way to make the flag visible is wbflush (wbflush is sync in Loongson's case).
I think this problem does not only happen on Loongson, but will also happen on other CPUs which have a write buffer (unless the write buffer has a 4th case in which it is flushed).
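The argument can also be checked with a small single-threaded toy model of the three flush cases. Everything in it is an assumption made for illustration: a one-entry buffer for CPU1 only, a 64-byte line size, CPU0's own store becoming visible immediately, and the names toy_sfb, toy_machine and toy_handshake. It is not a description of the real Loongson-3 pipeline.

```c
#include <stdbool.h>

enum { FLAG = 0, LOCK = 1, NVARS = 2 };
enum { LINE_BYTES = 64 }; /* assumed cache-line size, for the toy model only */

/* One buffered store that other CPUs cannot see yet (toy: a single entry). */
struct toy_sfb {
    bool pending;
    int  var;   /* which variable the buffered store targets */
    int  value;
};

struct toy_machine {
    int mem[NVARS];     /* globally visible values */
    int line_of[NVARS]; /* cache line that holds each variable */
    struct toy_sfb sfb; /* CPU1's buffer; CPU0's stores are not modelled */
};

static void sfb_flush(struct toy_machine *m)
{
    if (m->sfb.pending) {
        m->mem[m->sfb.var] = m->sfb.value;
        m->sfb.pending = false;
    }
}

static void cpu1_store(struct toy_machine *m, int var, int value, int bytes)
{
    m->sfb.pending = true;
    m->sfb.var = var;
    m->sfb.value = value;
    if (bytes == LINE_BYTES) /* case 1: the buffer holds a complete line */
        sfb_flush(m);
}

static int cpu1_load(struct toy_machine *m, int var)
{
    /* case 2: a later read of the same cache line flushes the buffer */
    if (m->sfb.pending && m->line_of[m->sfb.var] == m->line_of[var])
        sfb_flush(m);
    return m->mem[var];
}

static void cpu1_sync(struct toy_machine *m)
{
    sfb_flush(m); /* case 3: an explicit 'sync' */
}

/* Returns true if CPU1 leaves its read loop within max_iters iterations. */
static bool toy_handshake(bool sync_in_loop, bool same_line, int max_iters)
{
    struct toy_machine m = {
        .mem = { 0, 0 }, /* step 1: the lock is 0 */
        .line_of = { 0, same_line ? 0 : 1 },
    };

    cpu1_store(&m, FLAG, 1, (int)sizeof(int)); /* step 2: WRITE_ONCE(flag, 1) */
    for (int i = 0; i < max_iters; i++) {
        if (sync_in_loop)
            cpu1_sync(&m);
        if (cpu1_load(&m, LOCK) == 1) /* step 2/4: READ_ONCE() loop on the lock */
            return true;
        if (m.mem[FLAG] == 1)         /* step 3: CPU0 only sees visible memory */
            m.mem[LOCK] = 1;
    }
    return false;
}
```

In this model, toy_handshake(false, false, 1000) never completes because none of the three cases fires, while adding the sync in the loop, or placing the flag and the lock in the same cache line, lets CPU1 leave the loop.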
Huacai
------------------ Original ------------------
From: "Peter Zijlstra" <peterz@infradead.org>;
Date: Tue, Jul 10, 2018 06:54 PM
To: "Huacai Chen" <chenhc@lemote.com>;
Cc: "Paul Burton" <paul.burton@mips.com>; "Ralf Baechle" <ralf@linux-mips.org>; "James Hogan" <jhogan@kernel.org>; "linux-mips" <linux-mips@linux-mips.org>; "Fuxin Zhang" <zhangfx@lemote.com>; "wuzhangjin" <wuzhangjin@gmail.com>; "stable" <stable@vger.kernel.org>; "Alan Stern" <stern@rowland.harvard.edu>; "Andrea Parri" <andrea.parri@amarulasolutions.com>; "Will Deacon" <will.deacon@arm.com>; "Boqun Feng" <boqun.feng@gmail.com>; "Nicholas Piggin" <npiggin@gmail.com>; "David Howells" <dhowells@redhat.com>; "Jade Alglave" <j.alglave@ucl.ac.uk>; "Luc Maranget" <luc.maranget@inria.fr>; "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>; "Akira Yokosawa" <akiyks@gmail.com>; "LKML" <linux-kernel@vger.kernel.org>;
Subject: Re: [PATCH V2] MIPS: implement smp_cond_load_acquire() for Loongson-3

On Tue, Jul 10, 2018 at 11:36:37AM +0200, Peter Zijlstra wrote:
> So now explain why the cpu_relax() hack that arm did doesn't work for you?

So below is the patch I think you want; if not explain in detail how this is wrong.

diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index af34afbc32d9..e59773de6528 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -386,7 +386,17 @@ unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk) (task_pt_regs(tsk)->regs[29])
 #define KSTK_STATUS(tsk) (task_pt_regs(tsk)->cp0_status)
 
+#ifdef CONFIG_CPU_LOONGSON3
+/*
+ * Loongson-3 has a CPU bug where the store buffer gets starved when stuck in a
+ * read loop. Since spin loops of any kind should have a cpu_relax() in them,
+ * force a store-buffer flush from cpu_relax() such that any pending writes
+ * will become available as expected.
+ */
+#define cpu_relax() smp_mb()
+#else
 #define cpu_relax() barrier()
+#endif
 
 /*
  * Return_address is a replacement for __builtin_return_address(count)