On Wed, 6 Oct 2021 at 00:37, Thomas Backlund tmb@iki.fi wrote:
Den 2021-10-05 kl. 18:59, skrev Guenter Roeck:
On Tue, Oct 05, 2021 at 10:38:40AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.14.10 release. There are 173 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 07 Oct 2021 08:32:44 +0000. Anything received after that time might be too late.
AFAICS the warning problems are still seen. Unfortunately I won't be able to bisect since I have limited internet access.
Guenter
========================= WARNING: held lock freed! 5.14.10-rc2-00174-g355f3195d051 #1 Not tainted
ip/202 is freeing memory c000000009918900-c000000009918f7f, with a lock still held there! c000000009918a20 (sk_lock-AF_INET){+.+.}-{0:0}, at: .sk_common_release+0x4c/0x1b0 2 locks held by ip/202: #0: c00000000ae149d0 (&sb->s_type->i_mutex_key#5){+.+.}-{3:3}, at: .__sock_release+0x4c/0x150 #1: c000000009918a20 (sk_lock-AF_INET){+.+.}-{0:0}, at: .sk_common_release+0x4c/0x1b0
When I reverted the following two patches the warning got fixed.
73a03563f123 af_unix: fix races in sk_peer_pid and sk_peer_cred accesses b226d61807f1 net: introduce and use lock_sock_fast_nested()
Isn't this a fallout of:
queue-5.14/net-introduce-and-use-lock_sock_fast_nested.patch that has: Fixes: 2dcb96bacce3 ("net: core: Correct the sock::sk_lock.owned lockdep annotations")
I have cherry-picked and tested but still I see this new warning.
old warings are: ---------------------- [ 22.528947] WARNING: held lock freed! or [ 36.765439] WARNING: lock held when returning to user space!
new warning after the cherry pick. ------------------ [ 22.330646] ============================================ [ 22.335955] WARNING: possible recursive locking detected [ 22.341260] 5.14.10-rc2 #1 Not tainted [ 22.345004] -------------------------------------------- [ 22.348869] igb 0000:02:00.0 eno2: renamed from eth1 [ 22.350309] sd-resolve/345 is trying to acquire lock: [ 22.350310] ffff9a39c9580120 (sk_lock-AF_INET){+.+.}-{0:0}, at: udp_destroy_sock+0x3a/0xe0 [ 22.350317] [ 22.350317] but task is already holding lock: [ 22.350317] ffff9a39c9580120 (sk_lock-AF_INET){+.+.}-{0:0}, at: sk_common_release+0x22/0x110 [ 22.350321] [ 22.350321] other info that might help us debug this: [ 22.350321] Possible unsafe locking scenario: [ 22.350321] [ 22.350322] CPU0 [ 22.350322] ---- [ 22.350322] lock(sk_lock-AF_INET); [ 22.350323] lock(sk_lock-AF_INET); [ 22.350324] [ 22.350324] *** DEADLOCK *** [ 22.350324] [ 22.350324] May be due to missing lock nesting notation [ 22.350324] [ 22.350325] 2 locks held by sd-resolve/345: [ 22.350326] #0: ffff9a39c0610c10 (&sb->s_type->i_mutex_key#6){+.+.}-{3:3}, at: __sock_release+0x32/0xb0 [ 22.424188] #1: ffff9a39c9580120 (sk_lock-AF_INET){+.+.}-{0:0}, at: sk_common_release+0x22/0x110 [ 22.424191] [ 22.424191] stack backtrace: [ 22.424192] CPU: 2 PID: 345 Comm: sd-resolve Not tainted 5.14.10-rc2 #1 [ 22.424194] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.0b 07/27/2017 [ 22.424195] Call Trace: [ 22.424196] dump_stack_lvl+0x49/0x5e [ 22.424200] dump_stack+0x10/0x12 [ 22.424202] __lock_acquire.cold+0x21f/0x2b8 [ 22.424204] ? lock_is_held_type+0x9d/0x110 [ 22.424207] lock_acquire+0xb5/0x2c0 [ 22.424209] ? udp_destroy_sock+0x3a/0xe0 [ 22.424212] ? sk_common_release+0x22/0x110 [ 22.424214] __lock_sock_fast+0x34/0x90 [ 22.424216] ? udp_destroy_sock+0x3a/0xe0 [ 22.424217] udp_destroy_sock+0x3a/0xe0 [ 22.424219] ? sk_common_release+0x22/0x110 [ 22.424220] sk_common_release+0x22/0x110 [ 22.424221] udp_lib_close+0x9/0x10 [ 22.424223] inet_release+0x48/0x80 [ 22.424225] __sock_release+0x42/0xb0 [ 22.424227] sock_close+0x18/0x20 [ 22.424228] __fput+0xbb/0x270 [ 22.424230] ____fput+0xe/0x10 [ 22.424231] task_work_run+0x64/0xb0 [ 22.424234] exit_to_user_mode_prepare+0x201/0x210 [ 22.424237] syscall_exit_to_user_mode+0x1e/0x50 [ 22.424239] do_syscall_64+0x69/0x80 [ 22.424241] ? do_syscall_64+0x69/0x80 [ 22.424243] ? exc_page_fault+0x7c/0x220 [ 22.424244] ? asm_exc_page_fault+0x8/0x30 [ 22.424246] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 22.424248] RIP: 0033:0x7f509b8c4837 [ 22.424250] Code: 2c 00 f7 d8 64 89 02 b8 ff ff ff ff c3 48 8b 0d 57 86 2c 00 f7 d8 64 89 01 b8 ff ff ff ff eb c2 0f 1f 00 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 31 86 2c 00 f7 d8 64 89 02 b8 [ 22.424251] RSP: 002b:00007f509a9419c8 EFLAGS: 00000213 ORIG_RAX: 0000000000000003 [ 22.424253] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00007f509b8c4837 [ 22.424254] RDX: 0000000000001387 RSI: 0000000000000000 RDI: 000000000000000c [ 22.424255] RBP: 00007f509a949db8 R08: 0000000000000000 R09: 0000000000005d18 [ 22.424256] R10: 00007ffc8831c080 R11: 0000000000000213 R12: 0000000000000000 [ 22.424257] R13: 0000000000000000 R14: 00007f509a949db8 R15: 0000000000000004
ref: new warning after the cherry-pick full test log link, https://lkft.validation.linaro.org/scheduler/job/3673750
on RC2 the original reported warning links, https://lkft.validation.linaro.org/scheduler/job/3672925#L1185 https://lkft.validation.linaro.org/scheduler/job/3672930#L1261
BUT:
$ git describe --contains 2dcb96bacce3 v5.15-rc3~30^2~26
-- Thomas
- Naresh