On Mon, Sep 9, 2024 at 1:02 AM Borislav Petkov <bp@alien8.de> wrote:
On Sun, Sep 08, 2024 at 11:53:56PM -0700, Hugues Bruant wrote:
Hi,
I have discovered a 100% reliable soft lockup on boot on my laptop: Purism Librem 14, Intel Core i7-10710U, 48 GB RAM, Samsung Evo Plus 970 SSD, CoreBoot BIOS, grub bootloader, Arch Linux.
The last working release is kernel 6.9.10; every release from 6.10 onwards reliably exhibits the issue, which, based on journalctl logs, seems to be triggered somewhere in systemd-udev: https://gitlab.archlinux.org/-/project/42594/uploads/04583baf22189a0a8bb2f87...
Bisect points to commit 5186ba33234c9a90833f7c93ce7de80e25fac6f5
That's a merge commit, meaning the bisection likely went in the wrong direction.
I double-checked and the bisection results seem quite consistent. While merge commits are unlikely to be correct bisection results, they're entirely possible if the bug is triggered by an unexpected interaction between multiple unrelated commits.
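To illustrate how a merge commit can be a legitimate bisection result, here is a toy model, not the actual kernel history. The names `change_a` and `change_b` are hypothetical placeholders for two unrelated changes that only misbehave when both are present:

```python
# Toy history: each commit maps to the set of changes reachable from it.
# "change_a" and "change_b" are hypothetical; they stand for two unrelated
# changes that landed on different branches.
history = {
    "base": set(),
    "branch1_tip": {"change_a"},
    "branch2_tip": {"change_b"},
    "merge": {"change_a", "change_b"},
}

# Parent links of the toy history; the merge has two parents.
parents = {
    "base": [],
    "branch1_tip": ["base"],
    "branch2_tip": ["base"],
    "merge": ["branch1_tip", "branch2_tip"],
}


def is_bad(changes):
    # Assumption of this sketch: the bug needs both changes together.
    return {"change_a", "change_b"} <= changes


def first_bad_commits(history, parents):
    # A commit is "first bad" if it is bad while all of its parents are good.
    return [
        commit
        for commit in history
        if is_bad(history[commit])
        and all(not is_bad(history[p]) for p in parents[commit])
    ]


print(first_bad_commits(history, parents))
```

In this model both branch tips test good on their own, so the only commit a bisection can land on is the merge itself.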
However, you have out-of-tree modules. Try reproducing it without them.
That was the first suggestion on the Arch bug tracker. The whole bisection was done without out-of-tree modules.
Now, for the fun part: the kind soul on the Arch bug tracker who provided me with the kernel images for bisection built a patched 6.10.9 at my request, reverting just Tony's RDT changes that were flagged by the bisection: bd4955d4bc2182ccb660c9c30a4dd7f36feaf943 and e3ca96e479c91d6ee657d3caa5092a6a3a620f9f.
That patch brings the boot success rate on my machine from 0/10 up to 4/10. Even though this code is not supposed to be used, its presence is clearly impactful!
The framebuffer fix seems to also have a positive (though smaller, closer to 20%) impact on boot success rate, so I'm planning to test the combination of both as a next step.
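As a rough sanity check on sample sizes this small, the 0/10 versus 4/10 comparison can be run through a one-sided Fisher exact test. This is only a sketch: the function name `fisher_one_sided` is made up for illustration, and it assumes every boot attempt is independent, which may not hold for a timing-dependent lockup.

```python
from math import comb


def fisher_one_sided(success_a, n_a, success_b, n_b):
    """Chance of at least `success_a` successes landing in group A,
    given the combined success count of both groups (hypergeometric tail)."""
    total = n_a + n_b
    k = success_a + success_b  # total successes across both groups
    upper = min(k, n_a)
    favourable = sum(
        comb(k, x) * comb(total - k, n_a - x)
        for x in range(success_a, upper + 1)
    )
    return favourable / comb(total, n_a)


# Patched kernel: 4 good boots out of 10; unpatched: 0 out of 10.
p = fisher_one_sided(4, 10, 0, 10)
print(f"one-sided p-value: {p:.3f}")
```

With ten boots per kernel the result sits just under the conventional 0.05 threshold, so more boot attempts per configuration would make the comparison with the framebuffer fix more convincing.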
Some extra boot logs are attached.