Hi,
I have found a 100% reproducible soft lockup on boot on my laptop: Purism Librem 14, Intel Core i7-10710U, 48 GB RAM, Samsung 970 EVO Plus SSD, coreboot firmware, GRUB bootloader, Arch Linux.
The last working release is kernel 6.9.10; every release from 6.10 onwards reliably exhibits the issue. Based on journalctl logs, it seems to be triggered somewhere in systemd-udev: https://gitlab.archlinux.org/-/project/42594/uploads/04583baf22189a0a8bb2f87...
Bisect points to commit 5186ba33234c9a90833f7c93ce7de80e25fac6f5
At a glance, I see two potentially problematic changes in this diff, both in the refactoring that moves the call to rdt_ctrl_update inside the loop that walks over r->domains:
1. the change from on_each_cpu_mask to smp_call_function_any means that preemption is no longer disabled around the call to rdt_ctrl_update, which could plausibly be a problem
2. there's now a potential race condition on the msr_params struct: as far as I can tell there's no write barrier, so if the call to rdt_ctrl_update is executed on a different CPU, it could plausibly read a stale value of the dom field. Prior to this series of patches, dom wasn't passed as an explicit parameter but derived inside rdt_ctrl_update.
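To make the ordering concern in point 2 concrete, here is a minimal userspace sketch. It is not kernel code and does not reproduce the bug. It only shows the general pattern of filling in a parameter struct on one thread and having another thread read it safely. All names (struct params, shared, ready, remote, run_once) are hypothetical, and it uses pthreads and C11 atomics as a stand-in for the cross-CPU call. It says nothing about whether the kernel's cross-CPU call path already provides equivalent ordering.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a parameter block handed to another CPU. */
struct params {
    int dom;   /* plays the role of the dom field discussed above */
    int low;
    int high;
};

static struct params shared;   /* written by the caller, read remotely */
static atomic_int ready;       /* publication flag */
static int observed_dom;       /* what the remote side actually saw */

/* Remote side: wait until the parameters are published, then read them. */
static void *remote(void *arg)
{
    (void)arg;
    /*
     * The acquire load pairs with the release store in run_once():
     * once ready == 1 is observed, the earlier writes to 'shared'
     * are guaranteed to be visible here.
     */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;
    observed_dom = shared.dom;
    return NULL;
}

/* Caller side: fill in the parameters, publish them, wait for the reader. */
static int run_once(int dom)
{
    pthread_t t;

    atomic_store_explicit(&ready, 0, memory_order_relaxed);
    observed_dom = -1;
    if (pthread_create(&t, NULL, remote, NULL) != 0)
        return -1;

    shared.dom = dom;
    shared.low = 0;
    shared.high = 8;
    /* Release store: orders the writes above before the flag update. */
    atomic_store_explicit(&ready, 1, memory_order_release);

    pthread_join(t, NULL);
    return observed_dom;
}
```

With the release/acquire pair in place, run_once(3) is expected to return 3. If both accesses to the flag were relaxed instead, the C memory model would no longer guarantee that the reader sees the updated dom value, which is the kind of stale read point 2 worries about.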
For the initial report to the Arch Linux bug tracker and the bisect log, see: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/74
Best,
Hugues