Hi Dylan,
On Wed, Dec 31, 2025 at 12:00:07PM -0800, Dylan E. wrote:
Hello,
When booting the v6.18.2 tagged kernel from linux-stable, I get the following stack trace in roughly 1 in 5 boots, usually during fsck or early systemd service initialization:
BUG: kernel NULL pointer dereference, address: 0000000000000051
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP
CPU: 0 UID: 0 PID: 15 Comm: rcu_preempt Not tainted 6.18.2 #2 PREEMPT(full)
Hardware name: /SKYBAY, BIOS 5.12 06/27/2017
RIP: 0010:pick_task_fair+0x57/0x160
Code: 66 90 66 90 48 8b 5d 50 48 85 db 74 10 48 8b 73 70 48 89 ef e8 3a 74 ff ff 85 c0 75 71 be 01 00 00 00 48 89 ef e8 29 a5 ff ff <80> 78 51 00 48 89 c3 0f 85 80 00 00 00 48 85 c0 0f 84 87 00 00 00
RSP: 0000:ffffc900000d3cf8 EFLAGS: 00010086
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000800
RDX: fffffc02295d3c00 RSI: 0000000000000800 RDI: 0000000002edc4f2
RBP: ffff888108f13000 R08: 0000000000000400 R09: 0000000000000002
R10: 0000000000000260 R11: ffff888108b74200 R12: ffff888265c2cd00
R13: 0000000000000000 R14: ffff888265c2cd80 R15: ffffffff827c6fa0
FS: 0000000000000000(0000) GS:ffff8882e2724000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000051 CR3: 00000001110a5003 CR4: 00000000003706f0
Call Trace:
 <TASK>
 pick_next_task_fair+0x1d/0x3d0
 __schedule+0x1ee/0x10c0
 ? lock_timer_base+0x6d/0x90
 ? rcu_gp_cleanup+0x560/0x560
 schedule+0x23/0xc0
 schedule_timeout+0x6e/0xe0
 ? hrtimers_cpu_dying+0x1b0/0x1b0
 rcu_gp_fqs_loop+0xfb/0x510
 rcu_gp_kthread+0xcd/0x160
 kthread+0xf5/0x1e0
 ? kthreads_online_cpu+0x100/0x100
 ? kthreads_online_cpu+0x100/0x100
 ret_from_fork+0x114/0x140
 ? kthreads_online_cpu+0x100/0x100
 ret_from_fork_asm+0x11/0x20
 </TASK>
Modules linked in: i915 drm_buddy intel_gtt drm_client_lib drm_display_helper drm_kms_helper igb cec dca rc_core i2c_algo_bit ttm agpgart e1000e serio_raw hwmon drm mei_wdt i2c_core intel_oc_wdt video wmi
CR2: 0000000000000051
---[ end trace 0000000000000000 ]---
RIP: 0010:pick_task_fair+0x57/0x160
Code: 66 90 66 90 48 8b 5d 50 48 85 db 74 10 48 8b 73 70 48 89 ef e8 3a 74 ff ff 85 c0 75 71 be 01 00 00 00 48 89 ef e8 29 a5 ff ff <80> 78 51 00 48 89 c3 0f 85 80 00 00 00 48 85 c0 0f 84 87 00 00 00
RSP: 0000:ffffc900000d3cf8 EFLAGS: 00010086
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000800
RDX: fffffc02295d3c00 RSI: 0000000000000800 RDI: 0000000002edc4f2
RBP: ffff888108f13000 R08: 0000000000000400 R09: 0000000000000002
R10: 0000000000000260 R11: ffff888108b74200 R12: ffff888265c2cd00
R13: 0000000000000000 R14: ffff888265c2cd80 R15: ffffffff827c6fa0
FS: 0000000000000000(0000) GS:ffff8882e2724000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000051 CR3: 00000001110a5003 CR4: 00000000003706f0
Kernel panic - not syncing: Fatal exception
Shutting down cpus with NMI
Kernel Offset: disabled
Rebooting in 30 seconds..
I can't seem to reproduce this issue with the v6.18.1 tagged linux-stable build, and after bisecting between v6.18.1 and v6.18.2, I land on this commit (which is clearly not the problem):
d911fa97dab3ba026a8b96bb7f833d007b7fc4e1 | wifi: ath12k: fix VHT MCS assignment
I don't have any ath12k radios in my system, but I do have 1 ath9k and 2 ath10k radios. A little up the tree I see this patch which *could* be related, but I lack the knowledge to know:
b1497ea246396962156b63d5c568a16d6e32de0b | wifi: ath10k: move recovery check logic into a new work
Let me know if there's any more info that's needed or additional steps I can take to further diagnose the bug.
This should be the same issue as reported in https://lore.kernel.org/oe-lkp/202510211205.1e0f5223-lkp@intel.com/ and then fixed in mainline with 127b90315ca0 ("sched/proxy: Yield the donor task").
Can you confirm picking this commit fixes the issue?
Regards, Salvatore