Linux expects that if a CPU modifies a memory location, then that modification will eventually become visible to other CPUs in the system.
On Loongson-3 processor with SFB (Store Fill Buffer), loads may be prioritised over stores so it is possible for a store operation to be postponed if a polling loop immediately follows it. If the variable being polled indirectly depends on the outstanding store [for example, another CPU may be polling the variable that is pending modification] then there is the potential for deadlock if interrupts are disabled. This deadlock occurs in qspinlock code.
This patch changes the definition of cpu_relax() to smp_mb() for Loongson-3, forcing a flushing of the SFB on SMP systems before the next load takes place. If the Kernel is not compiled for SMP support, this will expand to a barrier() as before.
References: 534be1d5a2da940 (ARM: 6194/1: change definition of cpu_relax() for ARM11MPCore) Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen chenhc@lemote.com --- arch/mips/include/asm/processor.h | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h index af34afb..a8c4a3a 100644 --- a/arch/mips/include/asm/processor.h +++ b/arch/mips/include/asm/processor.h @@ -386,7 +386,17 @@ unsigned long get_wchan(struct task_struct *p); #define KSTK_ESP(tsk) (task_pt_regs(tsk)->regs[29]) #define KSTK_STATUS(tsk) (task_pt_regs(tsk)->cp0_status)
+#ifdef CONFIG_CPU_LOONGSON3 +/* + * Loongson-3's SFB (Store-Fill-Buffer) may get starved when stuck in a read + * loop. Since spin loops of any kind should have a cpu_relax() in them, force + * a Store-Fill-Buffer flush from cpu_relax() such that any pending writes will + * become available as expected. + */ +#define cpu_relax() smp_mb() +#else #define cpu_relax() barrier() +#endif
/* * Return_address is a replacement for __builtin_return_address(count)
Hi Huacai,
On Fri, Jul 13, 2018 at 03:37:57PM +0800, Huacai Chen wrote:
Linux expects that if a CPU modifies a memory location, then that modification will eventually become visible to other CPUs in the system.
On Loongson-3 processor with SFB (Store Fill Buffer), loads may be prioritised over stores so it is possible for a store operation to be postponed if a polling loop immediately follows it. If the variable being polled indirectly depends on the outstanding store [for example, another CPU may be polling the variable that is pending modification] then there is the potential for deadlock if interrupts are disabled. This deadlock occurs in qspinlock code.
This patch changes the definition of cpu_relax() to smp_mb() for Loongson-3, forcing a flushing of the SFB on SMP systems before the next load takes place. If the Kernel is not compiled for SMP support, this will expand to a barrier() as before.
References: 534be1d5a2da940 (ARM: 6194/1: change definition of cpu_relax() for ARM11MPCore) Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen chenhc@lemote.com
arch/mips/include/asm/processor.h | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h index af34afb..a8c4a3a 100644 --- a/arch/mips/include/asm/processor.h +++ b/arch/mips/include/asm/processor.h @@ -386,7 +386,17 @@ unsigned long get_wchan(struct task_struct *p); #define KSTK_ESP(tsk) (task_pt_regs(tsk)->regs[29]) #define KSTK_STATUS(tsk) (task_pt_regs(tsk)->cp0_status) +#ifdef CONFIG_CPU_LOONGSON3 +/*
- Loongson-3's SFB (Store-Fill-Buffer) may get starved when stuck in a read
- loop. Since spin loops of any kind should have a cpu_relax() in them, force
- a Store-Fill-Buffer flush from cpu_relax() such that any pending writes will
- become available as expected.
- */
I think "may starve writes" or "may queue writes indefinitely" would be clearer than "may get starved".
+#define cpu_relax() smp_mb() +#else #define cpu_relax() barrier() +#endif /*
- Return_address is a replacement for __builtin_return_address(count)
-- 2.7.0
Apart from the comment above though this looks better to me.
Re-copying the LKMM maintainers - are you happy(ish) with this?
Thanks, Paul
On Tue, Jul 17, 2018 at 10:52:32AM -0700, Paul Burton wrote:
Hi Huacai,
On Fri, Jul 13, 2018 at 03:37:57PM +0800, Huacai Chen wrote:
Linux expects that if a CPU modifies a memory location, then that modification will eventually become visible to other CPUs in the system.
On Loongson-3 processor with SFB (Store Fill Buffer), loads may be prioritised over stores so it is possible for a store operation to be postponed if a polling loop immediately follows it. If the variable being polled indirectly depends on the outstanding store [for example, another CPU may be polling the variable that is pending modification] then there is the potential for deadlock if interrupts are disabled. This deadlock occurs in qspinlock code.
This patch changes the definition of cpu_relax() to smp_mb() for Loongson-3, forcing a flushing of the SFB on SMP systems before the next load takes place. If the Kernel is not compiled for SMP support, this will expand to a barrier() as before.
References: 534be1d5a2da940 (ARM: 6194/1: change definition of cpu_relax() for ARM11MPCore) Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen chenhc@lemote.com
arch/mips/include/asm/processor.h | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h index af34afb..a8c4a3a 100644 --- a/arch/mips/include/asm/processor.h +++ b/arch/mips/include/asm/processor.h @@ -386,7 +386,17 @@ unsigned long get_wchan(struct task_struct *p); #define KSTK_ESP(tsk) (task_pt_regs(tsk)->regs[29]) #define KSTK_STATUS(tsk) (task_pt_regs(tsk)->cp0_status) +#ifdef CONFIG_CPU_LOONGSON3 +/*
- Loongson-3's SFB (Store-Fill-Buffer) may get starved when stuck in a read
- loop. Since spin loops of any kind should have a cpu_relax() in them, force
- a Store-Fill-Buffer flush from cpu_relax() such that any pending writes will
- become available as expected.
- */
I think "may starve writes" or "may queue writes indefinitely" would be clearer than "may get starved".
+#define cpu_relax() smp_mb() +#else #define cpu_relax() barrier() +#endif /*
- Return_address is a replacement for __builtin_return_address(count)
-- 2.7.0
Apart from the comment above though this looks better to me.
Re-copying the LKMM maintainers - are you happy(ish) with this?
This looks much better to me.
Thanx, Paul
On Tue, Jul 17, 2018 at 10:52:32AM -0700, Paul Burton wrote:
On Fri, Jul 13, 2018 at 03:37:57PM +0800, Huacai Chen wrote:
Linux expects that if a CPU modifies a memory location, then that modification will eventually become visible to other CPUs in the system.
On Loongson-3 processor with SFB (Store Fill Buffer), loads may be prioritised over stores so it is possible for a store operation to be postponed if a polling loop immediately follows it. If the variable being polled indirectly depends on the outstanding store [for example, another CPU may be polling the variable that is pending modification] then there is the potential for deadlock if interrupts are disabled. This deadlock occurs in qspinlock code.
This patch changes the definition of cpu_relax() to smp_mb() for Loongson-3, forcing a flushing of the SFB on SMP systems before the next load takes place. If the Kernel is not compiled for SMP support, this will expand to a barrier() as before.
References: 534be1d5a2da940 (ARM: 6194/1: change definition of cpu_relax() for ARM11MPCore) Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen chenhc@lemote.com
arch/mips/include/asm/processor.h | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h index af34afb..a8c4a3a 100644 --- a/arch/mips/include/asm/processor.h +++ b/arch/mips/include/asm/processor.h @@ -386,7 +386,17 @@ unsigned long get_wchan(struct task_struct *p); #define KSTK_ESP(tsk) (task_pt_regs(tsk)->regs[29]) #define KSTK_STATUS(tsk) (task_pt_regs(tsk)->cp0_status) +#ifdef CONFIG_CPU_LOONGSON3 +/*
- Loongson-3's SFB (Store-Fill-Buffer) may get starved when stuck in a read
- loop. Since spin loops of any kind should have a cpu_relax() in them, force
- a Store-Fill-Buffer flush from cpu_relax() such that any pending writes will
- become available as expected.
- */
I think "may starve writes" or "may queue writes indefinitely" would be clearer than "may get starved".
Agreed.
+#define cpu_relax() smp_mb() +#else #define cpu_relax() barrier() +#endif /*
- Return_address is a replacement for __builtin_return_address(count)
Apart from the comment above though this looks better to me.
Re-copying the LKMM maintainers - are you happy(ish) with this?
Right, thanks for adding us back on :-)
Yes, this is much better, although I myself would also prefer explicit mention that this is a work-around for a hardware bug.
But aside from the actual comment bike-shedding, this looks entirely acceptible (also because ARM is already doing this -- and the Changelog might want to refer to that particular patch).
On Wed, Jul 18, 2018 at 1:52 AM, Paul Burton paul.burton@mips.com wrote:
Hi Huacai,
On Fri, Jul 13, 2018 at 03:37:57PM +0800, Huacai Chen wrote:
Linux expects that if a CPU modifies a memory location, then that modification will eventually become visible to other CPUs in the system.
On Loongson-3 processor with SFB (Store Fill Buffer), loads may be prioritised over stores so it is possible for a store operation to be postponed if a polling loop immediately follows it. If the variable being polled indirectly depends on the outstanding store [for example, another CPU may be polling the variable that is pending modification] then there is the potential for deadlock if interrupts are disabled. This deadlock occurs in qspinlock code.
This patch changes the definition of cpu_relax() to smp_mb() for Loongson-3, forcing a flushing of the SFB on SMP systems before the next load takes place. If the Kernel is not compiled for SMP support, this will expand to a barrier() as before.
References: 534be1d5a2da940 (ARM: 6194/1: change definition of cpu_relax() for ARM11MPCore) Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen chenhc@lemote.com
arch/mips/include/asm/processor.h | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h index af34afb..a8c4a3a 100644 --- a/arch/mips/include/asm/processor.h +++ b/arch/mips/include/asm/processor.h @@ -386,7 +386,17 @@ unsigned long get_wchan(struct task_struct *p); #define KSTK_ESP(tsk) (task_pt_regs(tsk)->regs[29]) #define KSTK_STATUS(tsk) (task_pt_regs(tsk)->cp0_status)
+#ifdef CONFIG_CPU_LOONGSON3 +/*
- Loongson-3's SFB (Store-Fill-Buffer) may get starved when stuck in a read
- loop. Since spin loops of any kind should have a cpu_relax() in them, force
- a Store-Fill-Buffer flush from cpu_relax() such that any pending writes will
- become available as expected.
- */
I think "may starve writes" or "may queue writes indefinitely" would be clearer than "may get starved".
Need I change the comment and resend? Or you change the comment and get merged?
Huacai
+#define cpu_relax() smp_mb() +#else #define cpu_relax() barrier() +#endif
/*
- Return_address is a replacement for __builtin_return_address(count)
-- 2.7.0
Apart from the comment above though this looks better to me.
Re-copying the LKMM maintainers - are you happy(ish) with this?
Thanks, Paul
Hi Huacai,
On Wed, Jul 18, 2018 at 09:15:46AM +0800, Huacai Chen wrote:
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h index af34afb..a8c4a3a 100644 --- a/arch/mips/include/asm/processor.h +++ b/arch/mips/include/asm/processor.h @@ -386,7 +386,17 @@ unsigned long get_wchan(struct task_struct *p); #define KSTK_ESP(tsk) (task_pt_regs(tsk)->regs[29]) #define KSTK_STATUS(tsk) (task_pt_regs(tsk)->cp0_status)
+#ifdef CONFIG_CPU_LOONGSON3 +/*
- Loongson-3's SFB (Store-Fill-Buffer) may get starved when stuck in a read
- loop. Since spin loops of any kind should have a cpu_relax() in them, force
- a Store-Fill-Buffer flush from cpu_relax() such that any pending writes will
- become available as expected.
- */
I think "may starve writes" or "may queue writes indefinitely" would be clearer than "may get starved".
Need I change the comment and resend? Or you change the comment and get merged?
I'm happy to fix up the comment - but have a couple more questions.
Looking into the history, would it be fair to say that this is only a problem after commit 1e820da3c9af ("MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT") when CONFIG_LOONGSON3_ENHANCEMENT=y, which adds code to enable the SFB?
If so would it make sense to use CONFIG_LOONGSON3_ENHANCEMENT to select the use of smp_mb()?
How much does performance gain does enabling the SFB give you? Would it be reasonable to just disable it, rather than using this workaround?
Thanks, Paul
Hi, Paul,
SFB can improve the memory bandwidth as much as 30%, and we are planning to enable SFB by default. So, we want to control cpu_relax() under CONFIG_CPU_LOONGSON3, not under CONFIG_LOONGSON3_ENHANCEMENT.
Huacai
------------------ Original ------------------ From: "Paul Burton"paul.burton@mips.com; Date: Fri, Jul 20, 2018 05:15 AM To: "Huacai Chen"chenhc@lemote.com; Cc: "Ralf Baechle"ralf@linux-mips.org; "James Hogan"jhogan@kernel.org; "linux-mips"linux-mips@linux-mips.org; "Fuxin Zhang"zhangfx@lemote.com; "wuzhangjin"wuzhangjin@gmail.com; "stable"stable@vger.kernel.org; "Alan Stern"stern@rowland.harvard.edu; "Andrea Parri"andrea.parri@amarulasolutions.com; "Will Deacon"will.deacon@arm.com; "Peter Zijlstra"peterz@infradead.org; "Boqun Feng"boqun.feng@gmail.com; "Nicholas Piggin"npiggin@gmail.com; "David Howells"dhowells@redhat.com; "Jade Alglave"j.alglave@ucl.ac.uk; "Luc Maranget"luc.maranget@inria.fr; "Paul E. McKenney"paulmck@linux.vnet.ibm.com; "Akira Yokosawa"akiyks@gmail.com; "LKML"linux-kernel@vger.kernel.org; Subject: Re: [PATCH] MIPS: Change definition of cpu_relax() for Loongson-3
Hi Huacai,
On Wed, Jul 18, 2018 at 09:15:46AM +0800, Huacai Chen wrote:
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h index af34afb..a8c4a3a 100644 --- a/arch/mips/include/asm/processor.h +++ b/arch/mips/include/asm/processor.h @@ -386,7 +386,17 @@ unsigned long get_wchan(struct task_struct *p); #define KSTK_ESP(tsk) (task_pt_regs(tsk)->regs[29]) #define KSTK_STATUS(tsk) (task_pt_regs(tsk)->cp0_status)
+#ifdef CONFIG_CPU_LOONGSON3 +/*
- Loongson-3's SFB (Store-Fill-Buffer) may get starved when stuck in a read
- loop. Since spin loops of any kind should have a cpu_relax() in them, force
- a Store-Fill-Buffer flush from cpu_relax() such that any pending writes will
- become available as expected.
- */
I think "may starve writes" or "may queue writes indefinitely" would be clearer than "may get starved".
Need I change the comment and resend? Or you change the comment and get merged?
I'm happy to fix up the comment - but have a couple more questions.
Looking into the history, would it be fair to say that this is only a problem after commit 1e820da3c9af ("MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT") when CONFIG_LOONGSON3_ENHANCEMENT=y, which adds code to enable the SFB?
If so would it make sense to use CONFIG_LOONGSON3_ENHANCEMENT to select the use of smp_mb()?
How much does performance gain does enabling the SFB give you? Would it be reasonable to just disable it, rather than using this workaround?
Thanks, Paul
Hi Huacai,
On Sat, Jul 21, 2018 at 09:35:59AM +0800, 陈华才 wrote:
SFB can improve the memory bandwidth as much as 30%, and we are planning to enable SFB by default. So, we want to control cpu_relax() under CONFIG_CPU_LOONGSON3, not under CONFIG_LOONGSON3_ENHANCEMENT.
OK, applied to mips-next for 4.19 with changes to the commit message & comment.
Thanks, Paul
linux-stable-mirror@lists.linaro.org