On Mon, Jun 18, 2018 at 11:51:41AM -0700, Paul Burton wrote:
Hi Huacai,
On Fri, Jun 15, 2018 at 02:07:38PM +0800, Huacai Chen wrote:
After commit 7f56b58a92aaf2c ("locking/mcs: Use smp_cond_load_acquire() in MCS spin loop") Loongson-3 fails to boot. This is because Loongson-3 has SFB (Store Fill Buffer) and READ_ONCE() may get an old value in a tight loop. So in smp_cond_load_acquire() we need a __smp_mb() after every READ_ONCE().
Thanks - modifying smp_cond_load_acquire() is a step better than modifying arch_mcs_spin_lock_contended() to avoid it, but I'm still not sure we've reached the root of the problem.
Agreed, this looks entirely dodgy.
If tight loops using READ_ONCE() are at fault then what's special about smp_cond_load_acquire()? Could other such loops not hit the same problem?
Right again, Linux has a number of places where it relies on loops like this.
for (;;) { if (READ_ONCE(*ptr)) break;
cpu_relax(); }
That is assumed to terminate -- provided the store to make *ptr != 0 happens of course.
And this has nothing to do with store buffers per se, sure store-buffers might delay the store from being visible for a (little) while, but we very much assume store buffers will not indefinitely hold on to data.
The proposed __smp_mb() doesn't make any kind of sense here. I presume it's effect is to flush remote store buffers, but that is not something LKMM allows for.
Is the scenario you encounter the same as that outlined in the "DATA DEPENDENCY BARRIERS (HISTORICAL)" section of Documentation/memory-barriers.txt by any chance? If so then perhaps it would be better to implement smp_read_barrier_depends(), or just raw read_barrier_depends() depending upon how your SFB functions.
That doesn't make any sense, there is no actual data dependency here. We load a single variable. Data dependencies are where you have (at least) 2 loads, where the second depends on the first, like:
struct obj *obj = rcu_dereference(objp); int val = obj->val;
Here the load of val depends on the load of obj.
Is there any public documentation describing the behaviour of the store fill buffer in Loongson-3?
I know of Store Buffers, but what exactly is a Store Fill Buffer? Better would be to get a coherent memory model document on Loongson, because this all smells horribly.
Part of the problem is that I'm still not sure what's actually happening in your system - it would be helpful to have further information in the commit message about what actually happens. For example if you could walk us through an example of the problem step by step in the style of the diagrams you'll see in Documentation/memory-barriers.txt then I think that would help us to see what the best solution here is.
I've copied the LKMM maintainers in case they have further input.
Thanks, patches like proposed certainly require closer scrutiny and I agree with you that this needs far better explanation.