On 11/12/25 1:22 AM, Bart Van Assche wrote:
On 11/10/25 10:25 PM, Nilay Shroff wrote:
I applied your patch to my Linux tree and ran some tests. As I suspected earlier, KCSAN reported the following race:
[ ... ]
Thank you for running these tests. Unfortunately, I could not trigger these KCSAN complaints in my own tests, even with KCSAN enabled in the kernel configuration.
So the above trace makes it clear that we need to annotate both the writers and the readers to avoid a potential data race.
That would be an intrusive change. I don't think the kernel maintainers would agree to wrapping every read of rq_timeout and ra_pages in READ_ONCE(). Instead, I propose annotating the rq_timeout and ra_pages data members with __data_racy to suppress these KCSAN reports.
Yes, that should also work. I validated the use of __data_racy on my test kernel.
However, while compiling the kernel with __data_racy applied to q->rq_timeout, I hit a build failure. After some investigation, I found that the failure occurred because my kernel configuration had CONFIG_DEBUG_INFO_BTF enabled: the build step that generates BTF type information from the vmlinux.unstripped binary failed.
My guess is that some compilation units have KCSAN disabled, in which case the preprocessor expands __data_racy to nothing, while in units where KCSAN is enabled, __data_racy expands to the volatile qualifier. As a result, the BTF resolver encountered two versions of struct request_queue: one where q->rq_timeout was declared volatile and another where it was not. This type mismatch caused the resolver to fail during the BTF extraction phase.
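To illustrate the suspected mismatch, here is a simplified user-space model, assuming (as described above) that __data_racy expands to volatile in KCSAN-enabled units and to nothing elsewhere. DATA_RACY(), struct rq_kcsan_on, struct rq_kcsan_off and IS_VOLATILE_UINT() are hypothetical names for illustration only; they are not kernel APIs, and the two struct tags stand in for the one struct request_queue as seen by two different compilation units.

```c
#include <assert.h> /* for static_assert (C11) */

/*
 * Hypothetical, simplified model of __data_racy (not the kernel's macro):
 * DATA_RACY(1) mimics a KCSAN-enabled unit (expands to volatile),
 * DATA_RACY(0) a unit with KCSAN disabled (expands to nothing).
 */
#define DATA_RACY_1 volatile
#define DATA_RACY_0
#define DATA_RACY(kcsan) DATA_RACY_##kcsan

/* The "same" structure as seen by a KCSAN-enabled unit... */
struct rq_kcsan_on {
	unsigned int DATA_RACY(1) rq_timeout;
};

/* ...and as seen by a unit built with KCSAN disabled. */
struct rq_kcsan_off {
	unsigned int DATA_RACY(0) rq_timeout;
};

/*
 * Yields 1 if lv is a volatile unsigned int lvalue, else 0. The qualifier
 * is checked through the pointer type because _Generic may discard
 * top-level qualifiers of its controlling expression.
 */
#define IS_VOLATILE_UINT(lv) \
	_Generic(&(lv), volatile unsigned int *: 1, default: 0)

static struct rq_kcsan_on q_on;
static struct rq_kcsan_off q_off;

/* Same size and layout: only the member's type qualifier differs. */
static_assert(sizeof(struct rq_kcsan_on) == sizeof(struct rq_kcsan_off),
	      "layouts should match");

int rq_timeout_is_volatile_on(void)
{
	return IS_VOLATILE_UINT(q_on.rq_timeout);
}

int rq_timeout_is_volatile_off(void)
{
	return IS_VOLATILE_UINT(q_off.rq_timeout);
}
```

Both variants have identical size and layout, yet their member types differ in the volatile qualifier, which is the kind of discrepancy a tool that deduplicates types across compilation units would see as two distinct definitions.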
Yes, this is not related to the block layer and has to be fixed by the KCSAN/eBPF developers.
Thanks,
--Nilay