Sorry for top-posting and quoting this all, but the actual people involved with the wchan changes don't seem to be on the participant list.
And if people see this in stable kernels, I suspect it happens in my kernel too, but there are more users of stable backports.
I already complained about the locking for the get_wchan() patch for other reasons, but it looks like this is an actual unwinder problem that triggers this issue.
Which I'm not AT ALL surprised about. In my locking complaint, I explicitly mentioned that we've had unwinder issues before, and that it's probably a really bad idea to call some random unwinder inside a critical lock.
I really don't think the WCHAN code should use unwinders at all. It's too damn fragile, and it's too easily triggered from user space.
So I think we need to revert all the wchan changes. Not just in stable, but in mainline too.
And I hope people start taking the whole "unwinding is so fraught with problems that you shouldn't do it in any normal circumstances" seriously.
We've had toolchain issues with unwinding markers being wrong or incomplete.
We've had our own asm files not have all the proper annotations.
And most importantly, we've had unwinding code that is written by people who don't take these issues seriously enough, and just blindly follow possibly corrupted or wrong pointers.
Ironically, in the scheduler pull request for this all, the claim was "Make wchan() more robust".
Which was very much the opposite of what it actually did.
Linus
On Wed, Nov 17, 2021 at 3:32 PM Holger Hoffstätte holger@applied-asynchrony.com wrote:
On 2021-11-17 22:32, Justin Forbes wrote:
On Wed, Nov 17, 2021 at 11:19:15AM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.3 release. There are 923 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 19 Nov 2021 10:14:52 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.3-rc3.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
I replied to Bruno's original message to lkml which has CKI artifacts for the issue, but I am still seeing it with rc3 on x86:
[ 4.435551] BUG: unable to handle page fault for address: ffffb381402d7de0 [ 4.437498] #PF: supervisor read access in kernel mode [ 4.438937] #PF: error_code(0x0000) - not-present page [ 4.440373] PGD 100000067 P4D 100000067 PUD 1001d7067 PMD 100a1f067 PTE 0 [ 4.442269] Oops: 0000 [#1] SMP PTI [ 4.443256] CPU: 1 PID: 1 Comm: systemd Not tainted 5.15.3-0.rc3.1.fc35.x86_64 #1 [ 4.445230] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-3.fc34 04/01/2014 [ 4.447514] RIP: 0010:__unwind_start+0x10b/0x1e0 [ 4.448749] Code: af fb ff 85 c0 75 d2 eb c0 65 48 8b 04 25 c0 fb 01 00 48 39 c6 0f 84 86 00 00 00 48 8b 86 98 23 00 00 48 8d 78 38 48 89 7d 38 <48> 8b 50 28 48 89 55 40 48 8b 40 30 48 89 45 48 48 3d 80 43 00 a1 [ 4.453406] RSP: 0018:ffffb38140017c18 EFLAGS: 00010006 [ 4.454672] RAX: ffffb381402d7db8 RBX: ffffb381402d7db8 RCX: 0000000000000000 [ 4.456370] RDX: 0000000000000000 RSI: ffff9b5080c08000 RDI: ffffb381402d7df0 [ 4.458065] RBP: ffffb38140017c38 R08: 0000000000000040 R09: 0000000000005000 [ 4.459689] R10: 8000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 4.461306] R13: ffff9b5080c08c74 R14: 000000000000024b R15: 0000000000000001 [ 4.462857] FS: 00007f8d7729c340(0000) GS:ffff9b51f7d00000(0000) knlGS:0000000000000000 [ 4.464613] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4.465825] CR2: ffffb381402d7de0 CR3: 0000000100244004 CR4: 0000000000770ee0 [ 4.467301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4.468789] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 4.470217] PKRU: 55555554 [ 4.470777] Call Trace: [ 4.471280] <TASK> [ 4.471718] __get_wchan+0x35/0x80 [ 4.472415] get_wchan+0x65/0x80 [ 4.473085] do_task_stat+0xcd9/0xde0 [ 4.473821] proc_single_show+0x4d/0xb0 [ 4.474583] seq_read_iter+0x120/0x4b0 [ 4.475327] seq_read+0xed/0x120 [ 4.475973] ? cap_convert_nscap+0x160/0x1b0 [ 4.476832] vfs_read+0x95/0x190 [ 4.477472] ksys_read+0x4f/0xc0 [ 4.478115] do_syscall_64+0x3b/0x90 [ 4.478830] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 4.479823] RIP: 0033:0x7f8d77e2c31c [ 4.480537] Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 f9 49 f9 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 4f 4a f9 ff 48 [ 4.484140] RSP: 002b:00007ffc2434e8c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 4.485608] RAX: ffffffffffffffda RBX: 000055aa6dc4f650 RCX: 00007f8d77e2c31c [ 4.486991] RDX: 0000000000000400 RSI: 000055aa6dcaf960 RDI: 0000000000000005 [ 4.488376] RBP: 00007f8d77f00300 R08: 0000000000000000 R09: 0000000000000001 [ 4.489761] R10: 0000000000001000 R11: 0000000000000246 R12: 00007f8d7729c0f8 [ 4.491159] R13: 0000000000000d68 R14: 00007f8d77eff700 R15: 0000000000000d68 [ 4.492545] </TASK> [ 4.492982] Modules linked in: xfs crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_console virtio_blk virtio_net net_failover failover qemu_fw_cfg pkcs8_key_parser [ 4.496354] CR2: ffffb381402d7de0 [ 4.497010] ---[ end trace dc5691b47f8ba15b ]--- [ 4.497913] RIP: 0010:__unwind_start+0x10b/0x1e0 [ 4.498822] Code: af fb ff 85 c0 75 d2 eb c0 65 48 8b 04 25 c0 fb 01 00 48 39 c6 0f 84 86 00 00 00 48 8b 86 98 23 00 00 48 8d 78 38 48 89 7d 38 <48> 8b 50 28 48 89 55 40 48 8b 40 30 48 89 45 48 48 3d 80 43 00 a1 [ 4.502401] RSP: 0018:ffffb38140017c18 EFLAGS: 00010006 [ 4.503418] RAX: ffffb381402d7db8 RBX: ffffb381402d7db8 RCX: 0000000000000000 [ 4.504803] RDX: 0000000000000000 RSI: ffff9b5080c08000 RDI: ffffb381402d7df0 [ 4.506185] RBP: ffffb38140017c38 R08: 0000000000000040 R09: 0000000000005000 [ 4.507582] R10: 8000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 4.508956] R13: ffff9b5080c08c74 R14: 000000000000024b R15: 0000000000000001 [ 4.510339] FS: 00007f8d7729c340(0000) GS:ffff9b51f7d00000(0000) knlGS:0000000000000000 [ 4.511914] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4.513032] CR2: ffffb381402d7de0 CR3: 0000000100244004 CR4: 0000000000770ee0 [ 4.514420] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4.515803] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 4.517182] PKRU: 55555554 [ 4.517724] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 [ 4.519317] Kernel Offset: 0x20000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 4.521398] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---
This is great! Several people (incl. me) have seen the _exact same_ trace, but with BMQ/PDS (custom CPU schedulers) so we suspected a locking issue/incompatibility in get_wchan()'s spinlocking & task diddling compared to CFS. The fact that this happens with vanilla means it's a generic problem with either: "sched: Add wrapper for get_wchan() to keep task blocked" or "x86: Fix get_wchan() to support the ORC unwinder" or both. I have been running with a dummy implementation of get_wchan that just returns 0 (effectively disabling wchan) and 5.15.3-rc3 has been rock-solid again.
Maybe just revert all the wchan stuff and let it stew in mainline a bit longer?
-h