While running "mkfs -t ext4" on an arm64 Juno-r2 device with an SSD drive attached, the following kernel warning was reported on the stable-rc 5.9.13-rc1 kernel.
Steps to reproduce:
------------------
# boot arm64 Juno-r2 device with stable-rc 5.9.13-rc1.
# Connect SSD drive
# Format the file system ext4 type
 mkfs -t ext4 <SSD-drive>
# you will notice this warning
Crash log:
--------------
Writing superblocks and filesystem accounting information: 0/895
[ 86.131095]
[ 86.132592] =====================================
[ 86.137300] WARNING: bad unlock balance detected!
[ 86.142012] 5.9.13-rc1 #1 Not tainted
[ 86.145675] -------------------------------------
[ 86.150384] mkfs.ext4/426 is trying to release lock (rcu_read_lock) at:
[ 86.157020] [<ffff80001063478c>] blk_queue_exit+0xcc/0x1b0
[ 86.162511] but there are no more locks to release!
[ 86.167392]
[ 86.167392] other info that might help us debug this:
[ 86.173929] no locks held by mkfs.ext4/426.
[ 86.178114]
[ 86.178114] stack backtrace:
[ 86.182478] CPU: 1 PID: 426 Comm: mkfs.ext4 Not tainted 5.9.13-rc1 #1
[ 86.188926] Hardware name: ARM Juno development board (r2) (DT)
[ 86.194853] Call trace:
[ 86.197302]  dump_backtrace+0x0/0x1f8
[ 86.200967]  show_stack+0x2c/0x38
[ 86.204287]  dump_stack+0xec/0x158
[ 86.207694]  print_unlock_imbalance_bug+0xec/0xf0
[ 86.212404]  lock_release+0x300/0x388
[ 86.216070]  blk_queue_exit+0xe0/0x1b0
[ 86.219822]  blk_mq_submit_bio+0x250/0xa08
[ 86.223922]  submit_bio_noacct+0x468/0x518
[ 86.228022]  submit_bio+0x4c/0x230
[ 86.231429]  submit_bh_wbc+0x17c/0x218
[ 86.235182]  __block_write_full_page+0x210/0x648
[ 86.239805]  block_write_full_page+0x8c/0x150
[ 86.244167]  blkdev_writepage+0x30/0x40
[ 86.248008]  __writepage+0x38/0xd8
[ 86.251412]  write_cache_pages+0x1fc/0x590
[ 86.255513]  generic_writepages+0x64/0xa0
[ 86.259526]  blkdev_writepages+0x28/0x38
[ 86.263452]  do_writepages+0x6c/0x138
[ 86.267118]  __filemap_fdatawrite_range+0x10c/0x148
[ 86.272001]  file_write_and_wait_range+0x6c/0xd0
[ 86.276623]  blkdev_fsync+0x3c/0x68
[ 86.280113]  vfs_fsync_range+0x4c/0x90
[ 86.283864]  do_fsync+0x48/0x78
[ 86.287007]  __arm64_sys_fsync+0x24/0x38
[ 86.290933]  el0_svc_common.constprop.3+0x7c/0x198
[ 86.295729]  do_el0_svc+0x34/0xa0
[ 86.299047]  el0_sync_handler+0x16c/0x210
[ 86.303060]  el0_sync+0x140/0x180
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Full test log link, https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.9.y/build/v5.9.12...
metadata:
  git branch: linux-5.9.y
  git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
  git commit: 1372e1af58d410676db7917cc3484ca22d471623
  git describe: v5.9.12-47-g1372e1af58d4
  make_kernelversion: 5.9.13-rc1
  kernel-config: http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/juno/lkft/linux-stab...
On Mon, Dec 07, 2020 at 11:17:29AM +0530, Naresh Kamboju wrote:
> While running "mkfs -t ext4" on an arm64 Juno-r2 device with an SSD drive attached, the following kernel warning was reported on the stable-rc 5.9.13-rc1 kernel.
>
> Steps to reproduce:
>
> # boot arm64 Juno-r2 device with stable-rc 5.9.13-rc1.
> # Connect SSD drive
> # Format the file system ext4 type
>  mkfs -t ext4 <SSD-drive>
> # you will notice this warning
Does it happen easily? Can you bisect?
> Crash log:
>
> Writing superblocks and filesystem accounting information: 0/895
> [ 86.131095]
> [ 86.132592] =====================================
> [ 86.137300] WARNING: bad unlock balance detected!
> [ 86.142012] 5.9.13-rc1 #1 Not tainted
> [ 86.145675] -------------------------------------
> [ 86.150384] mkfs.ext4/426 is trying to release lock (rcu_read_lock) at:
> [ 86.157020] [<ffff80001063478c>] blk_queue_exit+0xcc/0x1b0
> [ 86.162511] but there are no more locks to release!
This really doesn't make much sense. blk_queue_exit() in 5.9.12 does:
	percpu_ref_put(&q->q_usage_counter);

(literally, that's the entire function)
percpu_ref_put() does:
	rcu_read_lock();

	if (__ref_is_percpu(ref, &percpu_count))
		this_cpu_sub(*percpu_count, nr);
	else if (unlikely(atomic_long_sub_and_test(nr, &ref->count)))
		ref->release(ref);

	rcu_read_unlock();
Unless ->release() has an unbalanced rcu_read_unlock(), there definitely is a lock to release! Some archaeology says that ->release is blk_queue_usage_counter_release(), which calls

	wake_up_all(&q->mq_freeze_wq);

which doesn't appear to use RCU at all. So this trace makes no sense, and all I can do is ask you to bisect it.
On Mon, 7 Dec 2020 at 11:37, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Dec 07, 2020 at 11:17:29AM +0530, Naresh Kamboju wrote:
> > While running "mkfs -t ext4" on an arm64 Juno-r2 device with an SSD drive attached, the following kernel warning was reported on the stable-rc 5.9.13-rc1 kernel.
> >
> > Steps to reproduce:
> >
> > # boot arm64 Juno-r2 device with stable-rc 5.9.13-rc1.
> > # Connect SSD drive
> > # Format the file system ext4 type
> >  mkfs -t ext4 <SSD-drive>
> > # you will notice this warning
>
> Does it happen easily? Can you bisect?
I have been running multiple test loops to reproduce this problem, but no luck yet :( Since it is hard to reproduce, we cannot bisect it.
- Naresh