On Tue, Nov 08, 2022 at 01:47:15PM +0100, Linus Walleij wrote:
Hi Maria,
thanks for your patch!
On Thu, Oct 27, 2022 at 8:51 AM Maria Yu <quic_aiquny@quicinc.com> wrote:
We've got a dump showing that the current CPU is in pinctrl_commit_state with old_state != p->state while the stack is still in the middle of pinmux_disable_setting. This means that even though p->state has already been changed to the new state, the settings have not yet been completely enabled.
Currently p->state takes different values to synchronize the behavior of pinctrl_commit_state. p->state goes through the transition old_state -> NULL -> new_state. While in old_state, it tries to disable all of the old state's settings. Only after the new state's settings have been enabled is p->state changed to the new state. So use smp_mb to keep the p->state variable and the settings in order.
 drivers/pinctrl/core.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/drivers/pinctrl/core.c b/drivers/pinctrl/core.c
index 9e57f4c62e60..cd917a5b1a0a 100644
--- a/drivers/pinctrl/core.c
+++ b/drivers/pinctrl/core.c
@@ -1256,6 +1256,7 @@ static int pinctrl_commit_state(struct pinctrl *p, struct pinctrl_state *state)
 		}
 	}
 
+	smp_mb();
 	p->state = NULL;
 
 	/* Apply all the settings for the new state - pinmux first */
@@ -1305,6 +1306,7 @@ static int pinctrl_commit_state(struct pinctrl *p, struct pinctrl_state *state)
 			pinctrl_link_add(setting->pctldev, p->dev);
 	}
 
+	smp_mb();
 	p->state = state;
 
 	return 0;
Ow!
It's not often that I loop in Paul McKenney on patches, but this is in the core of the subsystem used across all architectures so if this is a generic problem of concurrency, I really want some professional concurrency person to look at it before I apply it.
Hello, Linus and Maria!
Insertion of unadorned and uncommented memory barriers does rouse more than a bit of suspicion, to be sure. ;-)
Could you please outline what ordering this smp_mb() is intended to provide? Yes, my guess is that the p->state change is to be seen as happening after the prior memory accesses, but:
1. What is the other side of the interaction doing? My guess is that something is reading p->state and then referencing the same memory referenced prior to the pair of smp_mb() calls above. For example, are the other relevant memory references referenced by the pointer "p"?
For example, what happens if two of the above updates happen in quick succession during the execution of a single instance of the other side of the interaction?
2. Why smp_mb() rather than using smp_store_release() to update p->state?
3. More generally, why unmarked accesses to p->state? Are the other relevant accesses also unmarked?
Please see these LWN articles for more on the potential dangers of unmarked accesses to shared variables:
Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/793253/
Calibrating your fear of big bad optimizing compilers https://lwn.net/Articles/799218/
4. There are some tools that can help with this sort of ordering code, for example:
Concurrency bugs should fear the big bad data-race detector (part 1) https://lwn.net/Articles/816850/
Concurrency bugs should fear the big bad data-race detector (part 2) https://lwn.net/Articles/816854/
For this tool (KCSAN) to find a problem, your testing must come close to making it happen.
A design-level full-state-space tool may be found in tools/memory-model. This tool, as you might expect, is restricted to very short code fragments, but does fully handle concurrency. It might take some work to squeeze what you have into the confines of this tool.
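To make questions 2 and 3 a bit more concrete, here is a minimal userspace sketch of the release/acquire pattern. It assumes that the reader only needs to see the settings that were completed before p->state was published, which may or may not match what the real reader does. It uses C11 atomics as a stand-in for the kernel's smp_store_release() and smp_load_acquire(), and every demo_* name is hypothetical. This is not the actual pinctrl code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real struct pinctrl_state / struct pinctrl. */
struct demo_state {
	int settings_applied;	/* plain data published through ->state */
};

struct demo_pinctrl {
	_Atomic(struct demo_state *) state;	/* marked (atomic) pointer */
};

static struct demo_state demo_new_state;
static struct demo_pinctrl demo_pc;	/* ->state starts out NULL */

/* Updater: finish the settings first, then publish the pointer. */
static void *demo_commit(void *arg)
{
	(void)arg;
	demo_new_state.settings_applied = 1;
	/* Userspace analogue of smp_store_release(&p->state, state). */
	atomic_store_explicit(&demo_pc.state, &demo_new_state,
			      memory_order_release);
	return NULL;
}

/* Reader: acquire-load the pointer, and only then look at the settings. */
static void *demo_reader(void *arg)
{
	int *seen = arg;
	struct demo_state *s;

	/* Userspace analogue of smp_load_acquire(&p->state). */
	while (!(s = atomic_load_explicit(&demo_pc.state,
					  memory_order_acquire)))
		;	/* spin until a state has been published */
	*seen = s->settings_applied;
	return NULL;
}

/* Runs one updater and one reader; returns what the reader observed. */
int demo_run(void)
{
	pthread_t updater, reader;
	int seen = 0;
	int rc;

	rc = pthread_create(&reader, NULL, demo_reader, &seen);
	assert(rc == 0);
	rc = pthread_create(&updater, NULL, demo_commit, NULL);
	assert(rc == 0);
	pthread_join(updater, NULL);
	pthread_join(reader, NULL);
	return seen;
}
```

The release store orders the earlier settings update before the pointer update, and the acquire load orders the pointer read before the later settings read, so a reader that sees the new pointer also sees the completed settings. Note that this sketch says nothing about the old_state -> NULL -> new_state sequence or about two updates in quick succession, which is exactly why the questions above still need answers.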
Again, to evaluate this change, I need to understand what it is trying to accomplish.
Thanx, Paul