This patch is fatal for me in the downstream Raspberry Pi kernel - it locks up on boot even before the earlycon output is available. Hacking jump_label_init to skip the jump_entry for "crng_is_ready" allows it to boot, but is likely to have consequences further down the line.
The stable branch may not be living up to its name, but I don't think this is a quick fix.
Phil
Hi Phil,
Thanks for testing this. Can you let me know if v1 of this works?
https://lore.kernel.org/lkml/20220602212234.344394-1-Jason@zx2c4.com/
(I'll also fashion a revert for this part of stable.)
Jason
Hi Jason,
On 07/06/2022 09:30, Jason A. Donenfeld wrote:
Hi Phil,
Thanks for testing this. Can you let me know if v1 of this works?
https://lore.kernel.org/lkml/20220602212234.344394-1-Jason@zx2c4.com/
(I'll also fashion a revert for this part of stable.)
Jason
Thanks for the quick response, but that doesn't work for me either. Let me say again that I'm on a downstream kernel (rpi-5.15.y) so this may not be a universal problem, but merging either of these fixing patches would be fatal for us.
Phil
Hi Phil,
On Tue, Jun 7, 2022 at 10:47 AM Phil Elwell phil@raspberrypi.com wrote:
Thanks for the quick response, but that doesn't work for me either. Let me say again that I'm on a downstream kernel (rpi-5.15.y) so this may not be a universal problem, but merging either of these fixing patches would be fatal for us.
Alright, thanks. And I'm guessing you don't currently have a problem *without* either of the fixing patches, because your device tree doesn't use rng-seed. Is that right?
In anycase, I sent in a revert to get all the static branch stuff out of stable -- https://lore.kernel.org/stable/20220607084005.666059-1-Jason@zx2c4.com/ -- so the "urgency" of this should decrease and we can fix this as normal during the 5.19 cycle. But with that said, I do want to get this fixed as soon as possible still. I'll be back at my desk tonight and will finally have access to real hardware again. Are you hitting this by loading a 32bit kernel on a raspi4? Or running a 32bit kernel on the raspi1? Or some other combo?
Jason
Hi Jason,
On Tue, Jun 07, 2022 at 10:51:41AM +0200, Jason A. Donenfeld wrote:
On Tue, Jun 7, 2022 at 10:47 AM Phil Elwell phil@raspberrypi.com wrote:
Thanks for the quick response, but that doesn't work for me either. Let me say again that I'm on a downstream kernel (rpi-5.15.y) so this may not be a universal problem, but merging either of these fixing patches would be fatal for us.
Alright, thanks. And I'm guessing you don't currently have a problem *without* either of the fixing patches, because your device tree doesn't use rng-seed. Is that right?
In anycase, I sent in a revert to get all the static branch stuff out of stable -- https://lore.kernel.org/stable/20220607084005.666059-1-Jason@zx2c4.com/ -- so the "urgency" of this should decrease and we can fix this as normal during the 5.19 cycle.
Since the above revert got queued in -stable, I assume you don't need commit 73e2d827a501 ("arm64: Initialize jump labels before setup_machine_fdt()") in stable either.
Do you plan to fix the crng_ready() static branch differently? If you do, I'd like to revert the corresponding arm64 commit as well. It seems to be harmless but I'd rather not keep it if no longer needed. So please keep me updated whatever you decide.
Thanks.
Hi Catalin,
On Tue, Jun 07, 2022 at 11:06:56AM +0100, Catalin Marinas wrote:
Hi Jason,
On Tue, Jun 07, 2022 at 10:51:41AM +0200, Jason A. Donenfeld wrote:
On Tue, Jun 7, 2022 at 10:47 AM Phil Elwell phil@raspberrypi.com wrote:
Thanks for the quick response, but that doesn't work for me either. Let me say again that I'm on a downstream kernel (rpi-5.15.y) so this may not be a universal problem, but merging either of these fixing patches would be fatal for us.
Alright, thanks. And I'm guessing you don't currently have a problem *without* either of the fixing patches, because your device tree doesn't use rng-seed. Is that right?
In anycase, I sent in a revert to get all the static branch stuff out of stable -- https://lore.kernel.org/stable/20220607084005.666059-1-Jason@zx2c4.com/ -- so the "urgency" of this should decrease and we can fix this as normal during the 5.19 cycle.
Since the above revert got queued in -stable, I assume you don't need commit 73e2d827a501 ("arm64: Initialize jump labels before setup_machine_fdt()") in stable either.
I made the point here: https://lore.kernel.org/stable/Yp8i9DH57dRGfTNf@kroah.com/T/#m8f33bc14b67798...
Do you plan to fix the crng_ready() static branch differently? If you do, I'd like to revert the corresponding arm64 commit as well. It seems to be harmless but I'd rather not keep it if no longer needed. So please keep me updated whatever you decide.
I sent a "backup commit" for that here: https://lore.kernel.org/all/20220607100210.683136-1-Jason@zx2c4.com/ But I would like a few days to see if there's some trivial way of not needing that on arm32. If it turns out to be easy, then I'd prefer the direct fix akin to the arm64 one. If it turns out to be not easy, then I'll merge the backup commit. I'll keep you posted (and I assume anyway you'll see the arm32 attempts progress or fail here, also).
Jason
Hi Catalin,
On Tue, Jun 7, 2022 at 12:07 PM Catalin Marinas catalin.marinas@arm.com wrote:
Do you plan to fix the crng_ready() static branch differently? If you do, I'd like to revert the corresponding arm64 commit as well. It seems to be harmless but I'd rather not keep it if no longer needed. So please keep me updated whatever you decide.
Feel free to revert. I'm aborting this line of inquiry and will just fix it downstream in random.c.
Jason
Hi Jason,
On Wed, Jun 08, 2022 at 10:20:21AM +0200, Jason A. Donenfeld wrote:
On Tue, Jun 7, 2022 at 12:07 PM Catalin Marinas catalin.marinas@arm.com wrote:
Do you plan to fix the crng_ready() static branch differently? If you do, I'd like to revert the corresponding arm64 commit as well. It seems to be harmless but I'd rather not keep it if no longer needed. So please keep me updated whatever you decide.
Feel free to revert. I'm aborting this line of inquiry and will just fix it downstream in random.c.
Thanks for the update.
On Tue, Jun 07, 2022 at 09:47:30AM +0100, Phil Elwell wrote:
Hi Jason,
On 07/06/2022 09:30, Jason A. Donenfeld wrote:
Hi Phil,
Thanks for testing this. Can you let me know if v1 of this works?
https://lore.kernel.org/lkml/20220602212234.344394-1-Jason@zx2c4.com/
(I'll also fashion a revert for this part of stable.)
Jason
Thanks for the quick response, but that doesn't work for me either. Let me say again that I'm on a downstream kernel (rpi-5.15.y) so this may not be a universal problem, but merging either of these fixing patches would be fatal for us.
I have reports of a "clean" 5.15.45 working just fine on a rpi. Anything special in your tree that isn't upstream yet that might be conflicting with this? Any chance you can try a kernel.org release instead?
thanks,
greg k-h
Hi Greg,
On 07/06/2022 10:10, Greg KH wrote:
On Tue, Jun 07, 2022 at 09:47:30AM +0100, Phil Elwell wrote:
Hi Jason,
On 07/06/2022 09:30, Jason A. Donenfeld wrote:
Hi Phil,
Thanks for testing this. Can you let me know if v1 of this works?
https://lore.kernel.org/lkml/20220602212234.344394-1-Jason@zx2c4.com/
(I'll also fashion a revert for this part of stable.)
Jason
Thanks for the quick response, but that doesn't work for me either. Let me say again that I'm on a downstream kernel (rpi-5.15.y) so this may not be a universal problem, but merging either of these fixing patches would be fatal for us.
I have reports of a "clean" 5.15.45 working just fine on a rpi. Anything special in your tree that isn't upstream yet that might be conflicting with this? Any chance you can try a kernel.org release instead?
A clean 5.15.45 boots cleanly, whereas a downstream kernel shows the static key warning (but it does go on to boot). The significant difference is that our defconfigs set CONFIG_RANDOM_TRUST_BOOTLOADER=y - defining that on top of multi_v7_defconfig demonstrates the issue on a clean 5.15.45. Conversely, not setting that option in a downstream kernel build avoids the warning, presumably because it takes much longer to accumulate the required entropy.
Phil
Hi Phil,
On Tue, Jun 07, 2022 at 12:04:13PM +0100, Phil Elwell wrote:
A clean 5.15.45 boots cleanly, whereas a downstream kernel shows the static key warning (but it does go on to boot). The significant difference is that our defconfigs set CONFIG_RANDOM_TRUST_BOOTLOADER=y - defining that on top of multi_v7_defconfig demonstrates the issue on a clean 5.15.45. Conversely, not setting that option in a downstream kernel build avoids the warning
Ah, that makes sense. Note that I've got a patch out for changing that defconfig as well to be Y, which means the CI will catch this sort of thing in the future.
Jason
On Tue, Jun 07, 2022 at 10:30:49AM +0200, Jason A. Donenfeld wrote:
Hi Phil,
Thanks for testing this. Can you let me know if v1 of this works?
https://lore.kernel.org/lkml/20220602212234.344394-1-Jason@zx2c4.com/
(I'll also fashion a revert for this part of stable.)
As the arm32 version hasn't been merged yet, how is it in stable already? If it is in stable, isn't that a yet another violation of the stable kernel rules?
On 6/7/22, Russell King (Oracle) linux@armlinux.org.uk wrote:
On Tue, Jun 07, 2022 at 10:30:49AM +0200, Jason A. Donenfeld wrote:
Hi Phil,
Thanks for testing this. Can you let me know if v1 of this works?
https://lore.kernel.org/lkml/20220602212234.344394-1-Jason@zx2c4.com/
(I'll also fashion a revert for this part of stable.)
As the arm32 version hasn't been merged yet, how is it in stable already? If it is in stable, isn't that a yet another violation of the stable kernel rules?
You misunderstood my ambiguous sentence, sorry. I meant a revert of the thing in stable that makes this discussion here a stable discussion rather than a 5.19 discussion. No arm32 stuff has been committed anywhere.
Jason
Hi Phil,
On Tue, Jun 7, 2022 at 10:29 AM Phil Elwell phil@raspberrypi.com wrote:
This patch is fatal for me in the downstream Raspberry Pi kernel - it locks up on boot even before the earlycon output is available. Hacking jump_label_init to skip the jump_entry for "crng_is_ready" allows it to boot, but is likely to have consequences further down the line.
Also, reading this a few times, I'm not 100% sure I understand what you did to hack around this and why that works. Think you could paste your hackpatch just out of interest to the discussion (but obviously not to be applied)?
Jason
On 07/06/2022 09:43, Jason A. Donenfeld wrote:
Hi Phil,
On Tue, Jun 7, 2022 at 10:29 AM Phil Elwell phil@raspberrypi.com wrote:
This patch is fatal for me in the downstream Raspberry Pi kernel - it locks up on boot even before the earlycon output is available. Hacking jump_label_init to skip the jump_entry for "crng_is_ready" allows it to boot, but is likely to have consequences further down the line.
Also, reading this a few times, I'm not 100% sure I understand what you did to hack around this and why that works. Think you could paste your hackpatch just out of interest to the discussion (but obviously not to be applied)?
This is the minimal version of my workaround patch that at least allows the board to boot. Bear in mind that it was written with no previous knowledge of jump labels and was arrived at by iteratively bisecting the list of jump_labels until the first dangerous one was found, then later working out that there was only one.
diff --git a/kernel/jump_label.c b/kernel/jump_label.c index b156e152d6b48..7b053521f7216 100644 --- a/kernel/jump_label.c +++ b/kernel/jump_label.c @@ -466,6 +466,7 @@ void __init jump_label_init(void) struct jump_entry *iter_stop = __stop___jump_table; struct static_key *key = NULL; struct jump_entry *iter; + int iter_count = 0;
/* * Since we are initializing the static_key.enabled field with @@ -499,7 +500,9 @@ void __init jump_label_init(void) continue;
key = iterk; - static_key_set_entries(key, iter); + iter_count++; + if (iter_count != 1083) + static_key_set_entries(key, iter); } static_key_initialized = true; jump_label_unlock();
I'm sure this could be rewritten in a less fragile manner making reference to crng_is_ready directly, but this is where I got to yesterday.
Ha! I just proved the patch's fragility by switching from a Pi 2 to a Pi 4, so the saner version is:
diff --git a/drivers/char/random.c b/drivers/char/random.c index ca17a658c2147..ca7c8a67b8314 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -79,7 +79,7 @@ static enum { CRNG_EARLY = 1, /* At least POOL_EARLY_BITS collected */ CRNG_READY = 2 /* Fully initialized with POOL_READY_BITS collected */ } crng_init __read_mostly = CRNG_EMPTY; -static DEFINE_STATIC_KEY_FALSE(crng_is_ready); +DEFINE_STATIC_KEY_FALSE(crng_is_ready); #define crng_ready() (static_branch_likely(&crng_is_ready) || crng_init >= CRNG_READY) /* Various types of waiters for crng_init->CRNG_READY transition. */ static DECLARE_WAIT_QUEUE_HEAD(crng_init_wait);
diff --git a/kernel/jump_label.c b/kernel/jump_label.c index b156e152d6b48..b79de9da036fd 100644 --- a/kernel/jump_label.c +++ b/kernel/jump_label.c @@ -484,6 +484,7 @@ void __init jump_label_init(void) jump_label_sort_entries(iter_start, iter_stop);
for (iter = iter_start; iter < iter_stop; iter++) { + extern struct static_key_false crng_is_ready; struct static_key *iterk; bool in_init;
@@ -499,7 +500,8 @@ void __init jump_label_init(void) continue;
key = iterk; - static_key_set_entries(key, iter); + if (key != &crng_is_ready.key) + static_key_set_entries(key, iter); } static_key_initialized = true; jump_label_unlock();
And to answer your questions from the other thread:
* Without any fixing patches we see the warning messages because we are using rng-seed in DT.
* I've seen the problem on a Pi 2 and a Pi 4 running 32-bit kernels.
Phil
Hey again,
On Tue, Jun 07, 2022 at 10:15:27AM +0100, Phil Elwell wrote:
On 07/06/2022 09:43, Jason A. Donenfeld wrote:
Hi Phil,
On Tue, Jun 7, 2022 at 10:29 AM Phil Elwell phil@raspberrypi.com wrote:
This patch is fatal for me in the downstream Raspberry Pi kernel - it locks up on boot even before the earlycon output is available. Hacking jump_label_init to skip the jump_entry for "crng_is_ready" allows it to boot, but is likely to have consequences further down the line.
Also, reading this a few times, I'm not 100% sure I understand what you did to hack around this and why that works. Think you could paste your hackpatch just out of interest to the discussion (but obviously not to be applied)?
This is the minimal version of my workaround patch that at least allows the board to boot. Bear in mind that it was written with no previous knowledge of jump labels and was arrived at by iteratively bisecting the list of jump_labels until the first dangerous one was found, then later working out that there was only one.
Looks like this patch fails due to CONFIG_STRICT_KERNEL_RWX. Investigating deeper now, but that for starters seems to be the differentiating factor between my prior test rig and one that reproduces the error. I assume your raspi also sets CONFIG_STRICT_KERNEL_RWX.
Jason
Jason,
On 07/06/2022 16:14, Jason A. Donenfeld wrote:
Hey again,
On Tue, Jun 07, 2022 at 10:15:27AM +0100, Phil Elwell wrote:
On 07/06/2022 09:43, Jason A. Donenfeld wrote:
Hi Phil,
On Tue, Jun 7, 2022 at 10:29 AM Phil Elwell phil@raspberrypi.com wrote:
This patch is fatal for me in the downstream Raspberry Pi kernel - it locks up on boot even before the earlycon output is available. Hacking jump_label_init to skip the jump_entry for "crng_is_ready" allows it to boot, but is likely to have consequences further down the line.
Also, reading this a few times, I'm not 100% sure I understand what you did to hack around this and why that works. Think you could paste your hackpatch just out of interest to the discussion (but obviously not to be applied)?
This is the minimal version of my workaround patch that at least allows the board to boot. Bear in mind that it was written with no previous knowledge of jump labels and was arrived at by iteratively bisecting the list of jump_labels until the first dangerous one was found, then later working out that there was only one.
Looks like this patch fails due to CONFIG_STRICT_KERNEL_RWX. Investigating deeper now, but that for starters seems to be the differentiating factor between my prior test rig and one that reproduces the error. I assume your raspi also sets CONFIG_STRICT_KERNEL_RWX.
Yes, it does, as does multi_v7_defconfig.
Phil
Hi Phil,
On Tue, Jun 07, 2022 at 04:35:32PM +0100, Phil Elwell wrote:
Jason,
On 07/06/2022 16:14, Jason A. Donenfeld wrote:
Hey again,
On Tue, Jun 07, 2022 at 10:15:27AM +0100, Phil Elwell wrote:
On 07/06/2022 09:43, Jason A. Donenfeld wrote:
Hi Phil,
On Tue, Jun 7, 2022 at 10:29 AM Phil Elwell phil@raspberrypi.com wrote:
This patch is fatal for me in the downstream Raspberry Pi kernel - it locks up on boot even before the earlycon output is available. Hacking jump_label_init to skip the jump_entry for "crng_is_ready" allows it to boot, but is likely to have consequences further down the line.
Also, reading this a few times, I'm not 100% sure I understand what you did to hack around this and why that works. Think you could paste your hackpatch just out of interest to the discussion (but obviously not to be applied)?
This is the minimal version of my workaround patch that at least allows the board to boot. Bear in mind that it was written with no previous knowledge of jump labels and was arrived at by iteratively bisecting the list of jump_labels until the first dangerous one was found, then later working out that there was only one.
Looks like this patch fails due to CONFIG_STRICT_KERNEL_RWX. Investigating deeper now, but that for starters seems to be the differentiating factor between my prior test rig and one that reproduces the error. I assume your raspi also sets CONFIG_STRICT_KERNEL_RWX.
Yes, it does, as does multi_v7_defconfig.
Oh good. Adjusting my CI now to have that.
Having tickled arch/arm/ a little bit now, this is looking sort of complicated. So I think I might be leaning toward giving up and just rolling with https://git.zx2c4.com/linux-rng/commit/?id=78f79dda.
Unless of course somebody has some ARM chops and can think of a quick easy fix.
Jason
On 07/06/2022 16:44, Jason A. Donenfeld wrote:
Hi Phil,
On Tue, Jun 07, 2022 at 04:35:32PM +0100, Phil Elwell wrote:
Jason,
On 07/06/2022 16:14, Jason A. Donenfeld wrote:
Hey again,
On Tue, Jun 07, 2022 at 10:15:27AM +0100, Phil Elwell wrote:
On 07/06/2022 09:43, Jason A. Donenfeld wrote:
Hi Phil,
On Tue, Jun 7, 2022 at 10:29 AM Phil Elwell phil@raspberrypi.com wrote:
This patch is fatal for me in the downstream Raspberry Pi kernel - it locks up on boot even before the earlycon output is available. Hacking jump_label_init to skip the jump_entry for "crng_is_ready" allows it to boot, but is likely to have consequences further down the line.
Also, reading this a few times, I'm not 100% sure I understand what you did to hack around this and why that works. Think you could paste your hackpatch just out of interest to the discussion (but obviously not to be applied)?
This is the minimal version of my workaround patch that at least allows the board to boot. Bear in mind that it was written with no previous knowledge of jump labels and was arrived at by iteratively bisecting the list of jump_labels until the first dangerous one was found, then later working out that there was only one.
Looks like this patch fails due to CONFIG_STRICT_KERNEL_RWX. Investigating deeper now, but that for starters seems to be the differentiating factor between my prior test rig and one that reproduces the error. I assume your raspi also sets CONFIG_STRICT_KERNEL_RWX.
Yes, it does, as does multi_v7_defconfig.
Oh good. Adjusting my CI now to have that.
Having tickled arch/arm/ a little bit now, this is looking sort of complicated. So I think I might be leaning toward giving up and just rolling with https://git.zx2c4.com/linux-rng/commit/?id=78f79dda.
Unless of course somebody has some ARM chops and can think of a quick easy fix.
I've not looked at the performance implications (if any), but in terms of when the RNG initilialization completes and the lack of a WARN I'm happy with that patch, a.k.a. [1].
Phil
[1] https://lore.kernel.org/lkml/20220607100210.683136-1-Jason@zx2c4.com/
linux-stable-mirror@lists.linaro.org