Re: arm64: allmodconfig: BUG: KCSAN: data-race in p9_client_cb / p9_client_rpc

3 Dec 2022


      Marco Elver wrote on Sat, Dec 03, 2022 at 05:46:46PM +0100:
...
...
But I can't really find a problem with what KCSAN complains about --
we are indeed accessing status from two threads without any locks.
Instead of a lock, we're using a barrier so that:

recv thread/cb: writes to req stuff || write to req status
p9_client_rpc: reads req status || reads other fields from req

Which has been working well enough (at least, without the barrier things
blow up quite fast).
So can I'll just consider this a false positive, but if someone knows
how much one can read into this that'd be appreciated.
The barriers only ensure ordering, but not atomicity of the accesses
themselves (for one, the compiler is well in its right to transform
plain accesses in ways that the concurrent algorithm wasn't designed
for). In this case it looks like it's just missing
READ_ONCE()/WRITE_ONCE().
Aha! Thanks for this!
I've always believed plain int types accesses are always atomic and the
only thing to watch for would be compilers reordering instrucions, which
would be ensured by the barrier in this case, but I guess there are some
architectures or places where this isn't true?
I'm a bit confused though, I can only see five places where wait_event*
functions use READ_ONCE and I believe they more or less all would
require such a marker -- I guess non-equality checks might be safe
(waiting for a value to change from a known value) but if non-atomic
updates are on the table equality and comparisons checks all would need
to be decorated with READ_ONCE; afaiu, unlike usespace loops with
pthread_cond_wait there is nothing protecting the condition itself.
Should I just update the wrapped condition, as below?
-       err = wait_event_killable(req->wq, req->status >= REQ_STATUS_RCVD);
+       err = wait_event_killable(req->wq,
+                                 READ_ONCE(req->status) >= REQ_STATUS_RCVD);
The writes all are straightforward, there's all the error paths to
convert to WRITE_ONCE too but that's not difficult (leaving only the
init without such a marker); I'll send a patch when you've confirmed the
read looks good.
(the other reads are a bit less obvious as some are protected by a lock
in trans_fd, which should cover all cases of possible concurrent updates
there as far as I can see, but this mixed model is definitely hard to
reason with... Well, that's how it was written and I won't ever have time
to rewrite any of this. Enough ranting.)
...
A (relatively) quick primer on the kernel's memory model and
where/what/how we need to "mark" accesses:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
I read Documentation/memory-barriers.txt ages ago but wasn't aware of
this memory-model directory; I've skimmed through and will have a proper
read as time permits.
Thank you,
-- 
Dominique

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: arm64: allmodconfig: BUG: KCSAN: data-race in p9_client_cb / p9_client_rpc