[Please ignore if it is already reported, and not an expert of KCSAN]
While booting arm64 with allmodconfig following kernel BUG found, this build is enabled with CONFIG_INIT_STACK_NONE=y
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034] [ 0.000000] Linux version 6.1.0-rc7-next-20221130 (tuxmake@tuxmake) (aarch64-linux-gnu-gcc (Debian 11.3.0-6) 11.3.0, GNU ld (GNU Binutils for Debian) 2.39) #2 SMP PREEMPT_DYNAMIC @1669786411 [ 0.000000] random: crng init done [ 0.000000] Machine model: linux,dummy-virt ... [ 424.408466] ================================================================== [ 424.412792] BUG: KCSAN: data-race in p9_client_cb / p9_client_rpc [ 424.416806] [ 424.418214] write to 0xffff00000a753000 of 4 bytes by interrupt on cpu 0: [ 424.422437] p9_client_cb+0x84/0x100 [ 424.425048] req_done+0xfc/0x1c0 [ 424.427443] vring_interrupt+0x174/0x1c0 [ 424.430204] __handle_irq_event_percpu+0x2c8/0x680 [ 424.433455] handle_irq_event+0x9c/0x180 [ 424.436187] handle_fasteoi_irq+0x2b0/0x340 [ 424.439139] generic_handle_domain_irq+0x78/0xc0 [ 424.442323] __gic_handle_irq_from_irqson.isra.0+0x3d8/0x480 [ 424.446054] gic_handle_irq+0xb4/0x100 [ 424.448663] call_on_irq_stack+0x2c/0x38 [ 424.451443] do_interrupt_handler+0xd0/0x140 [ 424.454452] el1_interrupt+0x88/0xc0 [ 424.457001] el1h_64_irq_handler+0x18/0x40 [ 424.459856] el1h_64_irq+0x78/0x7c [ 424.462331] arch_local_irq_enable+0x50/0x80 [ 424.465273] arm64_preempt_schedule_irq+0x80/0xc0 [ 424.468497] el1_interrupt+0x90/0xc0 [ 424.471096] el1h_64_irq_handler+0x18/0x40 [ 424.474009] el1h_64_irq+0x78/0x7c [ 424.476464] __tsan_read8+0x118/0x280 [ 424.479086] __delay+0x104/0x140 [ 424.481521] __udelay+0x5c/0xc0 [ 424.483905] kcsan_setup_watchpoint+0x6cc/0x7c0 [ 424.487081] __tsan_read4+0x168/0x280 [ 424.489729] p9_client_rpc+0x1d0/0x580 [ 424.492429] p9_client_getattr_dotl+0xd0/0x3c0 [ 424.495457] v9fs_inode_from_fid_dotl+0x48/0x1c0 [ 424.498602] v9fs_vfs_lookup+0x23c/0x3c0 [ 424.501386] __lookup_slow+0x1b0/0x240 [ 424.504056] walk_component+0x168/0x280 [ 424.506807] path_lookupat+0x154/0x2c0 [ 424.509489] filename_lookup+0x160/0x2c0 [ 424.512261] vfs_statx+0xc0/0x280 [ 424.514710] vfs_fstatat+0x84/0x100 [ 424.517308] __do_sys_newfstatat+0x64/0x100 [ 424.520189] __arm64_sys_newfstatat+0x74/0xc0 [ 424.523262] invoke_syscall+0xb0/0x1c0 [ 424.525939] el0_svc_common.constprop.0+0x10c/0x180 [ 424.529219] do_el0_svc+0x54/0x80 [ 424.531662] el0_svc+0x4c/0xc0 [ 424.533944] el0t_64_sync_handler+0xc8/0x180 [ 424.536837] el0t_64_sync+0x1a4/0x1a8 [ 424.539436] [ 424.540810] read to 0xffff00000a753000 of 4 bytes by task 74 on cpu 0: [ 424.544927] p9_client_rpc+0x1d0/0x580 [ 424.547692] p9_client_getattr_dotl+0xd0/0x3c0 [ 424.550564] v9fs_inode_from_fid_dotl+0x48/0x1c0 [ 424.553550] v9fs_vfs_lookup+0x23c/0x3c0 [ 424.556144] __lookup_slow+0x1b0/0x240 [ 424.558655] walk_component+0x168/0x280 [ 424.561192] path_lookupat+0x154/0x2c0 [ 424.563721] filename_lookup+0x160/0x2c0 [ 424.566337] vfs_statx+0xc0/0x280 [ 424.568638] vfs_fstatat+0x84/0x100 [ 424.571051] __do_sys_newfstatat+0x64/0x100 [ 424.573821] __arm64_sys_newfstatat+0x74/0xc0 [ 424.576650] invoke_syscall+0xb0/0x1c0 [ 424.579144] el0_svc_common.constprop.0+0x10c/0x180 [ 424.582212] do_el0_svc+0x54/0x80 [ 424.584475] el0_svc+0x4c/0xc0 [ 424.586611] el0t_64_sync_handler+0xc8/0x180 [ 424.589347] el0t_64_sync+0x1a4/0x1a8 [ 424.591758] [ 424.593045] 1 lock held by systemd-journal/74: [ 424.595821] #0: ffff00000a0ead88 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x158/0x280 [ 424.601588] irq event stamp: 416642 [ 424.603875] hardirqs last enabled at (416641): [<ffff80000a552040>] preempt_schedule_irq+0x40/0x100 [ 424.609078] hardirqs last disabled at (416642): [<ffff80000a5422b8>] el1_interrupt+0x78/0xc0 [ 424.613887] softirqs last enabled at (416464): [<ffff800008011130>] __do_softirq+0x5b0/0x694 [ 424.618699] softirqs last disabled at (416453): [<ffff80000801a9b0>] ____do_softirq+0x30/0x80 [ 424.623562] [ 424.624841] value changed: 0x00000002 -> 0x00000003 [ 424.627838] [ 424.629117] Reported by Kernel Concurrency Sanitizer on: [ 424.632298] CPU: 0 PID: 74 Comm: systemd-journal Tainted: G T 6.1.0-rc7-next-20221130 #2 26b4d3787db66414ab23fce17d22967bb2169e1f [ 424.639393] Hardware name: linux,dummy-virt (DT)
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
-- Linaro LKFT https://lkft.linaro.org
On Wed, 30 Nov 2022 at 13:50, Naresh Kamboju naresh.kamboju@linaro.org wrote:
[Please ignore if it is already reported, and not an expert of KCSAN]
While booting arm64 with allmodconfig following kernel BUG found, this build is enabled with CONFIG_INIT_STACK_NONE=y
Unsure why CONFIG_INIT_STACK_NONE=y is relevant.
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034] [ 0.000000] Linux version 6.1.0-rc7-next-20221130 (tuxmake@tuxmake) (aarch64-linux-gnu-gcc (Debian 11.3.0-6) 11.3.0, GNU ld (GNU Binutils for Debian) 2.39) #2 SMP PREEMPT_DYNAMIC @1669786411 [ 0.000000] random: crng init done [ 0.000000] Machine model: linux,dummy-virt ... [ 424.408466] ================================================================== [ 424.412792] BUG: KCSAN: data-race in p9_client_cb / p9_client_rpc [ 424.416806] [ 424.418214] write to 0xffff00000a753000 of 4 bytes by interrupt on cpu 0: [ 424.422437] p9_client_cb+0x84/0x100
Please always provide line numbers and kernel commit hash or tag (I think it's next-20221130, but not entirely clear).
Then we can look at git blame of the lines and see if it's new code.
[ 424.425048] req_done+0xfc/0x1c0 [ 424.427443] vring_interrupt+0x174/0x1c0 [ 424.430204] __handle_irq_event_percpu+0x2c8/0x680 [ 424.433455] handle_irq_event+0x9c/0x180 [ 424.436187] handle_fasteoi_irq+0x2b0/0x340 [ 424.439139] generic_handle_domain_irq+0x78/0xc0 [ 424.442323] __gic_handle_irq_from_irqson.isra.0+0x3d8/0x480 [ 424.446054] gic_handle_irq+0xb4/0x100 [ 424.448663] call_on_irq_stack+0x2c/0x38 [ 424.451443] do_interrupt_handler+0xd0/0x140 [ 424.454452] el1_interrupt+0x88/0xc0 [ 424.457001] el1h_64_irq_handler+0x18/0x40 [ 424.459856] el1h_64_irq+0x78/0x7c [ 424.462331] arch_local_irq_enable+0x50/0x80 [ 424.465273] arm64_preempt_schedule_irq+0x80/0xc0 [ 424.468497] el1_interrupt+0x90/0xc0 [ 424.471096] el1h_64_irq_handler+0x18/0x40 [ 424.474009] el1h_64_irq+0x78/0x7c [ 424.476464] __tsan_read8+0x118/0x280 [ 424.479086] __delay+0x104/0x140 [ 424.481521] __udelay+0x5c/0xc0 [ 424.483905] kcsan_setup_watchpoint+0x6cc/0x7c0 [ 424.487081] __tsan_read4+0x168/0x280 [ 424.489729] p9_client_rpc+0x1d0/0x580 [ 424.492429] p9_client_getattr_dotl+0xd0/0x3c0 [ 424.495457] v9fs_inode_from_fid_dotl+0x48/0x1c0 [ 424.498602] v9fs_vfs_lookup+0x23c/0x3c0 [ 424.501386] __lookup_slow+0x1b0/0x240 [ 424.504056] walk_component+0x168/0x280 [ 424.506807] path_lookupat+0x154/0x2c0 [ 424.509489] filename_lookup+0x160/0x2c0 [ 424.512261] vfs_statx+0xc0/0x280 [ 424.514710] vfs_fstatat+0x84/0x100 [ 424.517308] __do_sys_newfstatat+0x64/0x100 [ 424.520189] __arm64_sys_newfstatat+0x74/0xc0 [ 424.523262] invoke_syscall+0xb0/0x1c0 [ 424.525939] el0_svc_common.constprop.0+0x10c/0x180 [ 424.529219] do_el0_svc+0x54/0x80 [ 424.531662] el0_svc+0x4c/0xc0 [ 424.533944] el0t_64_sync_handler+0xc8/0x180 [ 424.536837] el0t_64_sync+0x1a4/0x1a8 [ 424.539436] [ 424.540810] read to 0xffff00000a753000 of 4 bytes by task 74 on cpu 0: [ 424.544927] p9_client_rpc+0x1d0/0x580 [ 424.547692] p9_client_getattr_dotl+0xd0/0x3c0 [ 424.550564] v9fs_inode_from_fid_dotl+0x48/0x1c0 [ 424.553550] v9fs_vfs_lookup+0x23c/0x3c0 [ 424.556144] __lookup_slow+0x1b0/0x240 [ 424.558655] walk_component+0x168/0x280 [ 424.561192] path_lookupat+0x154/0x2c0 [ 424.563721] filename_lookup+0x160/0x2c0 [ 424.566337] vfs_statx+0xc0/0x280 [ 424.568638] vfs_fstatat+0x84/0x100 [ 424.571051] __do_sys_newfstatat+0x64/0x100 [ 424.573821] __arm64_sys_newfstatat+0x74/0xc0 [ 424.576650] invoke_syscall+0xb0/0x1c0 [ 424.579144] el0_svc_common.constprop.0+0x10c/0x180 [ 424.582212] do_el0_svc+0x54/0x80 [ 424.584475] el0_svc+0x4c/0xc0 [ 424.586611] el0t_64_sync_handler+0xc8/0x180 [ 424.589347] el0t_64_sync+0x1a4/0x1a8 [ 424.591758] [ 424.593045] 1 lock held by systemd-journal/74: [ 424.595821] #0: ffff00000a0ead88 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x158/0x280 [ 424.601588] irq event stamp: 416642 [ 424.603875] hardirqs last enabled at (416641): [<ffff80000a552040>] preempt_schedule_irq+0x40/0x100 [ 424.609078] hardirqs last disabled at (416642): [<ffff80000a5422b8>] el1_interrupt+0x78/0xc0 [ 424.613887] softirqs last enabled at (416464): [<ffff800008011130>] __do_softirq+0x5b0/0x694 [ 424.618699] softirqs last disabled at (416453): [<ffff80000801a9b0>] ____do_softirq+0x30/0x80 [ 424.623562] [ 424.624841] value changed: 0x00000002 -> 0x00000003 [ 424.627838] [ 424.629117] Reported by Kernel Concurrency Sanitizer on: [ 424.632298] CPU: 0 PID: 74 Comm: systemd-journal Tainted: G T 6.1.0-rc7-next-20221130 #2 26b4d3787db66414ab23fce17d22967bb2169e1f [ 424.639393] Hardware name: linux,dummy-virt (DT)
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
-- Linaro LKFT https://lkft.linaro.org
On Wed, 30 Nov 2022 at 18:25, Marco Elver elver@google.com wrote:
On Wed, 30 Nov 2022 at 13:50, Naresh Kamboju naresh.kamboju@linaro.org wrote:
[Please ignore if it is already reported, and not an expert of KCSAN]
While booting arm64 with allmodconfig following kernel BUG found, this build is enabled with CONFIG_INIT_STACK_NONE=y
Unsure why CONFIG_INIT_STACK_NONE=y is relevant.
I agree.
This is from qemu-arm64 boot log.
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034] [ 0.000000] Linux version 6.1.0-rc7-next-20221130 (tuxmake@tuxmake) (aarch64-linux-gnu-gcc (Debian 11.3.0-6) 11.3.0, GNU ld (GNU Binutils for Debian) 2.39) #2 SMP PREEMPT_DYNAMIC @1669786411 [ 0.000000] random: crng init done [ 0.000000] Machine model: linux,dummy-virt ... [ 424.408466] ================================================================== [ 424.412792] BUG: KCSAN: data-race in p9_client_cb / p9_client_rpc [ 424.416806] [ 424.418214] write to 0xffff00000a753000 of 4 bytes by interrupt on cpu 0: [ 424.422437] p9_client_cb+0x84/0x100
Please always provide line numbers and kernel commit hash or tag (I think it's next-20221130, but not entirely clear).
It is the Linux next-20221130 tag. https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/?h=next-...
Then we can look at git blame of the lines and see if it's new code.
True. Hope that tree and tag could help you get git details.
[ 424.425048] req_done+0xfc/0x1c0 [ 424.427443] vring_interrupt+0x174/0x1c0 [ 424.430204] __handle_irq_event_percpu+0x2c8/0x680 [ 424.433455] handle_irq_event+0x9c/0x180 [ 424.436187] handle_fasteoi_irq+0x2b0/0x340 [ 424.439139] generic_handle_domain_irq+0x78/0xc0 [ 424.442323] __gic_handle_irq_from_irqson.isra.0+0x3d8/0x480 [ 424.446054] gic_handle_irq+0xb4/0x100 [ 424.448663] call_on_irq_stack+0x2c/0x38 [ 424.451443] do_interrupt_handler+0xd0/0x140 [ 424.454452] el1_interrupt+0x88/0xc0 [ 424.457001] el1h_64_irq_handler+0x18/0x40 [ 424.459856] el1h_64_irq+0x78/0x7c [ 424.462331] arch_local_irq_enable+0x50/0x80 [ 424.465273] arm64_preempt_schedule_irq+0x80/0xc0 [ 424.468497] el1_interrupt+0x90/0xc0 [ 424.471096] el1h_64_irq_handler+0x18/0x40 [ 424.474009] el1h_64_irq+0x78/0x7c [ 424.476464] __tsan_read8+0x118/0x280 [ 424.479086] __delay+0x104/0x140 [ 424.481521] __udelay+0x5c/0xc0 [ 424.483905] kcsan_setup_watchpoint+0x6cc/0x7c0 [ 424.487081] __tsan_read4+0x168/0x280 [ 424.489729] p9_client_rpc+0x1d0/0x580 [ 424.492429] p9_client_getattr_dotl+0xd0/0x3c0 [ 424.495457] v9fs_inode_from_fid_dotl+0x48/0x1c0 [ 424.498602] v9fs_vfs_lookup+0x23c/0x3c0 [ 424.501386] __lookup_slow+0x1b0/0x240 [ 424.504056] walk_component+0x168/0x280 [ 424.506807] path_lookupat+0x154/0x2c0 [ 424.509489] filename_lookup+0x160/0x2c0 [ 424.512261] vfs_statx+0xc0/0x280 [ 424.514710] vfs_fstatat+0x84/0x100 [ 424.517308] __do_sys_newfstatat+0x64/0x100 [ 424.520189] __arm64_sys_newfstatat+0x74/0xc0 [ 424.523262] invoke_syscall+0xb0/0x1c0 [ 424.525939] el0_svc_common.constprop.0+0x10c/0x180 [ 424.529219] do_el0_svc+0x54/0x80 [ 424.531662] el0_svc+0x4c/0xc0 [ 424.533944] el0t_64_sync_handler+0xc8/0x180 [ 424.536837] el0t_64_sync+0x1a4/0x1a8 [ 424.539436] [ 424.540810] read to 0xffff00000a753000 of 4 bytes by task 74 on cpu 0: [ 424.544927] p9_client_rpc+0x1d0/0x580 [ 424.547692] p9_client_getattr_dotl+0xd0/0x3c0 [ 424.550564] v9fs_inode_from_fid_dotl+0x48/0x1c0 [ 424.553550] v9fs_vfs_lookup+0x23c/0x3c0 [ 424.556144] __lookup_slow+0x1b0/0x240 [ 424.558655] walk_component+0x168/0x280 [ 424.561192] path_lookupat+0x154/0x2c0 [ 424.563721] filename_lookup+0x160/0x2c0 [ 424.566337] vfs_statx+0xc0/0x280 [ 424.568638] vfs_fstatat+0x84/0x100 [ 424.571051] __do_sys_newfstatat+0x64/0x100 [ 424.573821] __arm64_sys_newfstatat+0x74/0xc0 [ 424.576650] invoke_syscall+0xb0/0x1c0 [ 424.579144] el0_svc_common.constprop.0+0x10c/0x180 [ 424.582212] do_el0_svc+0x54/0x80 [ 424.584475] el0_svc+0x4c/0xc0 [ 424.586611] el0t_64_sync_handler+0xc8/0x180 [ 424.589347] el0t_64_sync+0x1a4/0x1a8 [ 424.591758] [ 424.593045] 1 lock held by systemd-journal/74: [ 424.595821] #0: ffff00000a0ead88 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x158/0x280 [ 424.601588] irq event stamp: 416642 [ 424.603875] hardirqs last enabled at (416641): [<ffff80000a552040>] preempt_schedule_irq+0x40/0x100 [ 424.609078] hardirqs last disabled at (416642): [<ffff80000a5422b8>] el1_interrupt+0x78/0xc0 [ 424.613887] softirqs last enabled at (416464): [<ffff800008011130>] __do_softirq+0x5b0/0x694 [ 424.618699] softirqs last disabled at (416453): [<ffff80000801a9b0>] ____do_softirq+0x30/0x80 [ 424.623562] [ 424.624841] value changed: 0x00000002 -> 0x00000003 [ 424.627838] [ 424.629117] Reported by Kernel Concurrency Sanitizer on: [ 424.632298] CPU: 0 PID: 74 Comm: systemd-journal Tainted: G T 6.1.0-rc7-next-20221130 #2 26b4d3787db66414ab23fce17d22967bb2169e1f [ 424.639393] Hardware name: linux,dummy-virt (DT)
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
-- Linaro LKFT https://lkft.linaro.org
Naresh Kamboju wrote on Wed, Nov 30, 2022 at 09:34:45PM +0530:
[ 424.418214] write to 0xffff00000a753000 of 4 bytes by interrupt on cpu 0: [ 424.422437] p9_client_cb+0x84/0x100
Then we can look at git blame of the lines and see if it's new code.
True. Hope that tree and tag could help you get git details.
Even with the git tag, if we don't build for the same arch with the same compiler version/options and the same .config we aren't likely to have identical binaries, so we cannot make sense of these offsets without much work.
As much as I'd like to investigate a data race in 9p (and geez that code has been such a headache from syzbot already so I don't doubt there are more), having line numbers is really not optional if we want to scale at all. If you still have the vmlinux binary from that build (or if you can rebuild with the same options), running this text through addr2line should not take you too long. (You might need to build with at least CONFIG_DEBUG_INFO_REDUCED (or not reduced), but that is on by default for aarch64)
-- Dominique
On Thu, 1 Dec 2022 at 01:35, Dominique Martinet asmadeus@codewreck.org wrote:
Naresh Kamboju wrote on Wed, Nov 30, 2022 at 09:34:45PM +0530:
[ 424.418214] write to 0xffff00000a753000 of 4 bytes by interrupt on cpu 0: [ 424.422437] p9_client_cb+0x84/0x100
Then we can look at git blame of the lines and see if it's new code.
True. Hope that tree and tag could help you get git details.
Even with the git tag, if we don't build for the same arch with the same compiler version/options and the same .config we aren't likely to have identical binaries, so we cannot make sense of these offsets without much work.
As much as I'd like to investigate a data race in 9p (and geez that code has been such a headache from syzbot already so I don't doubt there are more), having line numbers is really not optional if we want to scale at all. If you still have the vmlinux binary from that build (or if you can rebuild with the same options), running this text through addr2line should not take you too long.
Please find build artifacts in this link, - config - vmlinux - System.map https://people.linaro.org/~anders.roxell/next-20221130-allmodconfig-arm64-tu...
And
# aarch64-linux-gnu-objdump -D vmlinux|less search for p9_client_cb
Attached objdump log and here is the link. - http://ix.io/4hk1
(You might need to build with at least CONFIG_DEBUG_INFO_REDUCED (or not reduced), but that is on by default for aarch64)
Thanks for the suggestions. The Kconfig is enabled now. CONFIG_DEBUG_INFO_REDUCED=y
-- Dominique
- Naresh
(reply out of order)
Naresh Kamboju wrote on Thu, Dec 01, 2022 at 01:13:25PM +0530:
(You might need to build with at least CONFIG_DEBUG_INFO_REDUCED (or not reduced), but that is on by default for aarch64)
Thanks for the suggestions. The Kconfig is enabled now. CONFIG_DEBUG_INFO_REDUCED=y
It looks enabled in your the config file you linked at, I don't understand this remark? Did you produce the trace the other day without it and rebuild the kernel with it? In this case you also have CONFIG_DEBUG_INFO_SPLIT set, so the vmlinux file does not contain enough informations to retrieve line numbers or types, and in particular addr2line cannot be used on the files you provided. I've never used split debug infos before, but digging old threads I'm not too hopeful unless that changed: https://lkml.iu.edu/hypermail/linux/kernel/1711.1/03393.html https://sourceware.org/bugzilla/show_bug.cgi?id=22434
(...a test build later, it's still mostly useless... normal build $ ./scripts/faddr2line vmlinux __schedule+0x314 __schedule+0x314/0x6c0: perf_fetch_caller_regs at include/linux/perf_event.h:1286 (inlined by) __perf_sw_event_sched at include/linux/perf_event.h:1307 (inlined by) perf_event_task_sched_out at include/linux/perf_event.h:1347 (inlined by) prepare_task_switch at kernel/sched/core.c:5053 (inlined by) context_switch at kernel/sched/core.c:5195 (inlined by) __schedule at kernel/sched/core.c:6561
split dwarf build $ ./scripts/faddr2line vmlinux __schedule+0x314 aarch64-linux-gnu-addr2line: DWARF error: could not find abbrev number 860923 __schedule+0x314/0x780: aarch64-linux-gnu-addr2line: DWARF error: could not find abbrev number 860923 __schedule at core.c:?
I'd tend to agree build time/space savings aren't worth the developer time. )
Anyway, address sanitizer used to have a kasan_symbolize.py script but it looks like it got removed as no longer maintained, and I'm not sure what's a good tool to just run these logs through nowadays, might want to ask other test projects folks what they use...
If you still have the vmlinux binary from that build (or if you can rebuild with the same options), running this text through addr2line should not take you too long.
Please find build artifacts in this link,
- config
- vmlinux
- System.map
https://people.linaro.org/~anders.roxell/next-20221130-allmodconfig-arm64-tu...
So from the disassembly...
- p9_client_cb+0x84 is right before the wake_up and after the wmb(), so I assume we're on writing req->status line 441:
--- p9_client_cb(...) { ... smp_wmb(); req->status = status;
wake_up(&req->wq); ---
report is about a write from 2 to 3, this makes sense we're going from REQ_STATUS_SENT (2) to REQ_STATUS_RCVD (3).
- p9_client_rpc+0x1d0 isn't as simple to pin down as I'm having a hard time making sense of the kcsan instrumentations... The report is talking about a READ of 4 bytes at the same address, so I'd expect to see an ccess to req->status (and we're likely spot on wait_event_killable which checks req->status), but this doesn't seem to match up with the assembly: here's the excerpt from disass around 0x1d0 = 464 (why doesn't gdb provide hex offsets..) --- 0xffff80000a46e9b8 <+440>: cmn w28, #0x200 0xffff80000a46e9bc <+444>: ccmn w28, #0xe, #0x4, ne // ne = any 0xffff80000a46e9c0 <+448>: b.eq 0xffff80000a46ecfc <p9_client_rpc+1276> // b.none 0xffff80000a46e9c4 <+452>: mov x0, x25 0xffff80000a46e9c8 <+456>: bl 0xffff800008543640 <__tsan_write4> 0xffff80000a46e9cc <+460>: mov w0, #0x2 // #2 0xffff80000a46e9d0 <+464>: str w0, [x21, #88] 0xffff80000a46e9d4 <+468>: b 0xffff80000a46ecfc <p9_client_rpc+1276> 0xffff80000a46e9d8 <+472>: mov w27, #0x1 // #1 0xffff80000a46e9dc <+476>: mov x0, x23 0xffff80000a46e9e0 <+480>: mov w1, #0x2bc // #700 0xffff80000a46e9e4 <+484>: bl 0xffff800008192d80 <__might_sleep> ---
+464 is a write to x21 (client 'c', from looking at how it is passed into x0 for other function calls) at offset 88 (status field according to dwarf infos from a rebuild with your config/same sources)
So, err, I'm a bit lost on this side. But I can't really find a problem with what KCSAN complains about -- we are indeed accessing status from two threads without any locks. Instead of a lock, we're using a barrier so that: - recv thread/cb: writes to req stuff || write to req status - p9_client_rpc: reads req status || reads other fields from req
Which has been working well enough (at least, without the barrier things blow up quite fast).
So can I'll just consider this a false positive, but if someone knows how much one can read into this that'd be appreciated.
Thanks, -- Dominique
On Sat, 3 Dec 2022 at 16:37, Dominique Martinet asmadeus@codewreck.org wrote:
(reply out of order)
Naresh Kamboju wrote on Thu, Dec 01, 2022 at 01:13:25PM +0530:
(You might need to build with at least CONFIG_DEBUG_INFO_REDUCED (or not reduced), but that is on by default for aarch64)
Thanks for the suggestions. The Kconfig is enabled now. CONFIG_DEBUG_INFO_REDUCED=y
It looks enabled in your the config file you linked at, I don't understand this remark? Did you produce the trace the other day without it and rebuild the kernel with it? In this case you also have CONFIG_DEBUG_INFO_SPLIT set, so the vmlinux file does not contain enough informations to retrieve line numbers or types, and in particular addr2line cannot be used on the files you provided. I've never used split debug infos before, but digging old threads I'm not too hopeful unless that changed: https://lkml.iu.edu/hypermail/linux/kernel/1711.1/03393.html https://sourceware.org/bugzilla/show_bug.cgi?id=22434
(...a test build later, it's still mostly useless... normal build $ ./scripts/faddr2line vmlinux __schedule+0x314 __schedule+0x314/0x6c0: perf_fetch_caller_regs at include/linux/perf_event.h:1286 (inlined by) __perf_sw_event_sched at include/linux/perf_event.h:1307 (inlined by) perf_event_task_sched_out at include/linux/perf_event.h:1347 (inlined by) prepare_task_switch at kernel/sched/core.c:5053 (inlined by) context_switch at kernel/sched/core.c:5195 (inlined by) __schedule at kernel/sched/core.c:6561
split dwarf build $ ./scripts/faddr2line vmlinux __schedule+0x314 aarch64-linux-gnu-addr2line: DWARF error: could not find abbrev number 860923 __schedule+0x314/0x780: aarch64-linux-gnu-addr2line: DWARF error: could not find abbrev number 860923 __schedule at core.c:?
I'd tend to agree build time/space savings aren't worth the developer time. )
Anyway, address sanitizer used to have a kasan_symbolize.py script but it looks like it got removed as no longer maintained, and I'm not sure what's a good tool to just run these logs through nowadays, might want to ask other test projects folks what they use...
If you still have the vmlinux binary from that build (or if you can rebuild with the same options), running this text through addr2line should not take you too long.
Please find build artifacts in this link,
- config
- vmlinux
- System.map
https://people.linaro.org/~anders.roxell/next-20221130-allmodconfig-arm64-tu...
So from the disassembly...
- p9_client_cb+0x84 is right before the wake_up and after the wmb(), so
I assume we're on writing req->status line 441:
p9_client_cb(...) { ... smp_wmb(); req->status = status;
wake_up(&req->wq);
report is about a write from 2 to 3, this makes sense we're going from REQ_STATUS_SENT (2) to REQ_STATUS_RCVD (3).
- p9_client_rpc+0x1d0 isn't as simple to pin down as I'm having a hard
time making sense of the kcsan instrumentations... The report is talking about a READ of 4 bytes at the same address, so I'd expect to see an ccess to req->status (and we're likely spot on wait_event_killable which checks req->status), but this doesn't seem to match up with the assembly: here's the excerpt from disass around 0x1d0 = 464 (why doesn't gdb provide hex offsets..)
0xffff80000a46e9b8 <+440>: cmn w28, #0x200 0xffff80000a46e9bc <+444>: ccmn w28, #0xe, #0x4, ne // ne = any 0xffff80000a46e9c0 <+448>: b.eq 0xffff80000a46ecfc <p9_client_rpc+1276> // b.none 0xffff80000a46e9c4 <+452>: mov x0, x25 0xffff80000a46e9c8 <+456>: bl 0xffff800008543640 <__tsan_write4> 0xffff80000a46e9cc <+460>: mov w0, #0x2 // #2 0xffff80000a46e9d0 <+464>: str w0, [x21, #88] 0xffff80000a46e9d4 <+468>: b 0xffff80000a46ecfc <p9_client_rpc+1276> 0xffff80000a46e9d8 <+472>: mov w27, #0x1 // #1 0xffff80000a46e9dc <+476>: mov x0, x23 0xffff80000a46e9e0 <+480>: mov w1, #0x2bc // #700 0xffff80000a46e9e4 <+484>: bl 0xffff800008192d80 <__might_sleep>
+464 is a write to x21 (client 'c', from looking at how it is passed into x0 for other function calls) at offset 88 (status field according to dwarf infos from a rebuild with your config/same sources)
So, err, I'm a bit lost on this side. But I can't really find a problem with what KCSAN complains about -- we are indeed accessing status from two threads without any locks. Instead of a lock, we're using a barrier so that:
- recv thread/cb: writes to req stuff || write to req status
- p9_client_rpc: reads req status || reads other fields from req
Which has been working well enough (at least, without the barrier things blow up quite fast).
So can I'll just consider this a false positive, but if someone knows how much one can read into this that'd be appreciated.
The barriers only ensure ordering, but not atomicity of the accesses themselves (for one, the compiler is well in its right to transform plain accesses in ways that the concurrent algorithm wasn't designed for). In this case it looks like it's just missing READ_ONCE()/WRITE_ONCE().
A (relatively) quick primer on the kernel's memory model and where/what/how we need to "mark" accesses: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
Thanks, -- Marco
Marco Elver wrote on Sat, Dec 03, 2022 at 05:46:46PM +0100:
But I can't really find a problem with what KCSAN complains about -- we are indeed accessing status from two threads without any locks. Instead of a lock, we're using a barrier so that:
- recv thread/cb: writes to req stuff || write to req status
- p9_client_rpc: reads req status || reads other fields from req
Which has been working well enough (at least, without the barrier things blow up quite fast).
So can I'll just consider this a false positive, but if someone knows how much one can read into this that'd be appreciated.
The barriers only ensure ordering, but not atomicity of the accesses themselves (for one, the compiler is well in its right to transform plain accesses in ways that the concurrent algorithm wasn't designed for). In this case it looks like it's just missing READ_ONCE()/WRITE_ONCE().
Aha! Thanks for this!
I've always believed plain int types accesses are always atomic and the only thing to watch for would be compilers reordering instrucions, which would be ensured by the barrier in this case, but I guess there are some architectures or places where this isn't true?
I'm a bit confused though, I can only see five places where wait_event* functions use READ_ONCE and I believe they more or less all would require such a marker -- I guess non-equality checks might be safe (waiting for a value to change from a known value) but if non-atomic updates are on the table equality and comparisons checks all would need to be decorated with READ_ONCE; afaiu, unlike usespace loops with pthread_cond_wait there is nothing protecting the condition itself.
Should I just update the wrapped condition, as below?
- err = wait_event_killable(req->wq, req->status >= REQ_STATUS_RCVD); + err = wait_event_killable(req->wq, + READ_ONCE(req->status) >= REQ_STATUS_RCVD);
The writes all are straightforward, there's all the error paths to convert to WRITE_ONCE too but that's not difficult (leaving only the init without such a marker); I'll send a patch when you've confirmed the read looks good. (the other reads are a bit less obvious as some are protected by a lock in trans_fd, which should cover all cases of possible concurrent updates there as far as I can see, but this mixed model is definitely hard to reason with... Well, that's how it was written and I won't ever have time to rewrite any of this. Enough ranting.)
A (relatively) quick primer on the kernel's memory model and where/what/how we need to "mark" accesses: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
I read Documentation/memory-barriers.txt ages ago but wasn't aware of this memory-model directory; I've skimmed through and will have a proper read as time permits.
Thank you,
On Sun, 4 Dec 2022 at 00:08, Dominique Martinet asmadeus@codewreck.org wrote:
Marco Elver wrote on Sat, Dec 03, 2022 at 05:46:46PM +0100:
But I can't really find a problem with what KCSAN complains about -- we are indeed accessing status from two threads without any locks. Instead of a lock, we're using a barrier so that:
- recv thread/cb: writes to req stuff || write to req status
- p9_client_rpc: reads req status || reads other fields from req
Which has been working well enough (at least, without the barrier things blow up quite fast).
So can I'll just consider this a false positive, but if someone knows how much one can read into this that'd be appreciated.
The barriers only ensure ordering, but not atomicity of the accesses themselves (for one, the compiler is well in its right to transform plain accesses in ways that the concurrent algorithm wasn't designed for). In this case it looks like it's just missing READ_ONCE()/WRITE_ONCE().
Aha! Thanks for this!
I've always believed plain int types accesses are always atomic and the only thing to watch for would be compilers reordering instrucions, which would be ensured by the barrier in this case, but I guess there are some architectures or places where this isn't true?
I'm a bit confused though, I can only see five places where wait_event* functions use READ_ONCE and I believe they more or less all would require such a marker -- I guess non-equality checks might be safe (waiting for a value to change from a known value) but if non-atomic updates are on the table equality and comparisons checks all would need to be decorated with READ_ONCE; afaiu, unlike usespace loops with pthread_cond_wait there is nothing protecting the condition itself.
Should I just update the wrapped condition, as below?
err = wait_event_killable(req->wq, req->status >= REQ_STATUS_RCVD);
err = wait_event_killable(req->wq,
READ_ONCE(req->status) >= REQ_STATUS_RCVD);
Yes, this looks good!
The writes all are straightforward, there's all the error paths to convert to WRITE_ONCE too but that's not difficult (leaving only the init without such a marker); I'll send a patch when you've confirmed the read looks good. (the other reads are a bit less obvious as some are protected by a lock in trans_fd, which should cover all cases of possible concurrent updates there as far as I can see, but this mixed model is definitely hard to reason with... Well, that's how it was written and I won't ever have time to rewrite any of this. Enough ranting.)
If the lock-protected accesses indeed are non-racy, they should be left unmarked. If some assumption here turns out to be wrong, KCSAN would (hopefully) tell us one way or another.
Thanks!
-- Marco
Marco Elver wrote on Mon, Dec 05, 2022 at 08:00:00AM +0100:
Should I just update the wrapped condition, as below?
err = wait_event_killable(req->wq, req->status >= REQ_STATUS_RCVD);
err = wait_event_killable(req->wq,
READ_ONCE(req->status) >= REQ_STATUS_RCVD);
Yes, this looks good!
The writes all are straightforward, there's all the error paths to convert to WRITE_ONCE too but that's not difficult (leaving only the init without such a marker); I'll send a patch when you've confirmed the read looks good. (the other reads are a bit less obvious as some are protected by a lock in trans_fd, which should cover all cases of possible concurrent updates there as far as I can see, but this mixed model is definitely hard to reason with... Well, that's how it was written and I won't ever have time to rewrite any of this. Enough ranting.)
If the lock-protected accesses indeed are non-racy, they should be left unmarked. If some assumption here turns out to be wrong, KCSAN would (hopefully) tell us one way or another.
Great, that makes sense.
I've left the commit at home, will submit it tonight -- you and Naresh will be in Cc from suggested/reported-by tags.