Regression for PXE boot from patch "Remove the 'bugger off' message" in stable 6.6.18

Ulrich Gemkow

4 Mar 2025

2:49 p.m.

Hello,

Starting with stable kernel 6.6.18, we have problems with PXE booting. A bisect shows that the following patch is guilty:

From 768171d7ebbce005210e1cf8456f043304805c15 Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel ardb@kernel.org
Date: Tue, 12 Sep 2023 09:00:55 +0000
Subject: x86/boot: Remove the 'bugger off' message

Signed-off-by: Ard Biesheuvel ardb@kernel.org
Signed-off-by: Ingo Molnar mingo@kernel.org
Acked-by: H. Peter Anvin (Intel) hpa@zytor.com
Link: https://lore.kernel.org/r/20230912090051.4014114-21-ardb@google.com

With this patch applied, PXE starts and requests the kernel and the initrd. The boot process then stops without showing anything on the console. It seems that the kernel crashes very early.

With stable kernel 6.6.17 PXE boot works without problems.

Reverting this single patch (which is part of a larger set of patches) solved the problem for us; PXE boot is working again.

We use the packages syslinux-efi and syslinux-common from Debian 12. The boot files used are /efi64/syslinux.efi and /ldlinux.e64.

Our config file (for 6.6.80) is attached.

Regarding the patch description, we really do not boot with a floppy :-)

Any help would be greatly appreciated. I have a bit of a bad feeling about simply reverting a patch at such a deep level in the kernel.

Thank you and best regards

Ulrich

--
|-----------------------------------------------------------------------
| Ulrich Gemkow
| University of Stuttgart
| Institute of Communication Networks and Computer Engineering (IKR)
|-----------------------------------------------------------------------

Attachments:

config.xz (application/x-xz — 24.9 KB)

Greg KH

4 Mar

4:20 p.m.

On Tue, Mar 04, 2025 at 03:49:35PM +0100, Ulrich Gemkow wrote:

> starting with stable kernel 6.6.18 we have problems with PXE booting. A bisect shows that the following patch is guilty:
> [...]
> Reverting this single patch (which is part of a larger set of patches) solved the problem for us, PXE boot is working again.
> [...]

Do kernels newer than 6.7.y work properly? What about the latest 6.12.y release?

thanks,

greg k-h

Ulrich Gemkow

4:59 p.m.

Hello,

On Tuesday 04 March 2025, Greg KH wrote:

> Do kernels newer than 6.7.y work properly? What about the latest 6.12.y release?

Thanks for looking into this!

The latest 6.12.y kernel has the same problem; it also needs the mentioned patch reverted. I did not test the kernels in between, but I am happy to do so if that would give a hint.

Thanks again and best regards

Ulrich

Greg KH

5:40 p.m.

On Tue, Mar 04, 2025 at 05:59:32PM +0100, Ulrich Gemkow wrote:

> The latest 6.12.y kernel has the same problem; it also needs the mentioned patch reverted.
> [...]

Great, then this is an issue in Linus's tree and should be fixed there first.

thanks,

greg k-h

Ard Biesheuvel

6 Mar

10 a.m.

On Tue, 4 Mar 2025 at 15:49, Ulrich Gemkow ulrich.gemkow@ikr.uni-stuttgart.de wrote:

> starting with stable kernel 6.6.18 we have problems with PXE booting. A bisect shows that the following patch is guilty:
> [...]

Hello Ulrich,

Thanks for the report, and apologies for the breakage.

I will look into this today - hopefully it is something that can be resolved swiftly.

Can you share your syslinux config too, please?

Ulrich Gemkow

10:07 a.m.

On Thursday 06 March 2025, Ard Biesheuvel wrote:

> Can you share your syslinux config too, please?

Hello Ard,

Thank you! The config file is attached. Please feel free to ask for more info.

Best regards

Ulrich

Ard Biesheuvel

2:36 p.m.

(cc Peter)

On Tue, 4 Mar 2025 at 15:49, Ulrich Gemkow ulrich.gemkow@ikr.uni-stuttgart.de wrote:

> [...]
> We use the packages syslinux-efi and syslinux-common from Debian 12. The used boot files are /efi64/syslinux.efi and /ldlinux.e64.

I managed to track this down to a bug in syslinux, fixed by the hunk below. The problem is that syslinux violates the x86 boot protocol, which stipulates that the setup header (starting at 0x1f1 bytes into the bzImage) must be copied into a zeroed boot_params structure, but it also copies the preceding bytes, which could be any value, as they overlap with the PE/COFF header or other header data. This produces a command line pointer with garbage in the top 32 bits, resulting in an early crash.

In your case, you might be able to work around this by removing the padding value (=0xffffffff) from arch/x86/boot/setup.ld, given that you are building with CONFIG_EFI_STUB disabled. However, this still requires fixing on the syslinux side.

[syslinux base commit 05ac953c23f90b2328d393f7eecde96e41aed067]

--- a/efi/main.c
+++ b/efi/main.c
@@ -1139,10 +1139,14 @@
        bp = (struct boot_params *)(UINTN)addr;

        memset((void *)bp, 0x0, BOOT_PARAM_BLKSIZE);
-       /* Copy the first two sectors to boot_params */
-       memcpy((char *)bp, kernel_buf, 2 * 512);
        hdr = (struct linux_header *)bp;

+       /* Copy the setup header to boot_params */
+       memcpy(&hdr->setup_sects,
+              &((struct linux_header *)kernel_buf)->setup_sects,
+              sizeof(struct linux_header) -
+              offsetof(struct linux_header, setup_sects));
+
        setup_sz = (hdr->setup_sects + 1) * 512;
        if (hdr->version >= 0x20a) {
                pref_address = hdr->pref_address;
--- a/com32/include/syslinux/linux.h
+++ b/com32/include/syslinux/linux.h
@@ -116,6 +116,7 @@ struct linux_header {
        uint64_t pref_address;
        uint32_t init_size;
        uint32_t handover_offset;
+       uint32_t kernel_info_offset;
 } __packed;

 struct screen_info {
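
[Editorial illustration, not part of the original message.] The difference between the two copy strategies discussed above can be imitated on a fake image. This is only a sketch under stated assumptions, not syslinux code: the helper names and the fake image are hypothetical, and the offsets 0x0c8 (ext_cmd_line_ptr, the upper 32 bits of the command line address) and 0x26c (assumed end of the setup header, just past kernel_info_offset) are taken from the kernel's zero-page and boot protocol documentation and should be checked against it before reuse.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All offsets below are assumptions of this sketch; verify them against
 * the kernel's zero-page and boot protocol documentation. */
#define BOOT_PARAM_BLKSIZE   4096
#define FAKE_IMAGE_SIZE      (2 * 512)
#define OFF_EXT_CMD_LINE_PTR 0x0c8  /* upper 32 bits of the command line address */
#define OFF_SETUP_HEADER     0x1f1  /* first byte of the setup header (setup_sects) */
#define OFF_SETUP_HEADER_END 0x26c  /* assumed end, just past kernel_info_offset */

/* Read a 32-bit value from an arbitrary (possibly unaligned) offset. */
static uint32_t read_u32(const uint8_t *buf, size_t off)
{
    uint32_t v;
    memcpy(&v, buf + off, sizeof(v));
    return v;
}

/* Hypothetical image: 0xff filler before the setup header, zeros after,
 * and a recognisable setup_sects value in the first header byte. */
static void make_fake_image(uint8_t *img)
{
    memset(img, 0x00, FAKE_IMAGE_SIZE);
    memset(img, 0xff, OFF_SETUP_HEADER);
    img[OFF_SETUP_HEADER] = 4;
}

/* Imitates the old behaviour: the first two sectors are copied wholesale,
 * so the filler bytes land in fields that must stay zero. */
static void copy_two_sectors(uint8_t *bp, const uint8_t *kernel_buf)
{
    memset(bp, 0, BOOT_PARAM_BLKSIZE);
    memcpy(bp, kernel_buf, 2 * 512);
}

/* Imitates the fixed behaviour: only the setup header is copied into the
 * zeroed structure. */
static void copy_setup_header(uint8_t *bp, const uint8_t *kernel_buf)
{
    memset(bp, 0, BOOT_PARAM_BLKSIZE);
    memcpy(bp + OFF_SETUP_HEADER, kernel_buf + OFF_SETUP_HEADER,
           OFF_SETUP_HEADER_END - OFF_SETUP_HEADER);
}
```

With this fake image, copy_two_sectors() leaves 0xffffffff at offset 0x0c8, which matches the "garbage in the top 32 bits" described above, while copy_setup_header() leaves that field zero and still carries setup_sects over.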

H. Peter Anvin

2:38 p.m.

On March 6, 2025 6:36:04 AM PST, Ard Biesheuvel ardb@kernel.org wrote:

> [...] syslinux violates the x86 boot protocol, which stipulates that the setup header (starting at 0x1f1 bytes into the bzImage) must be copied into a zeroed boot_params structure, but it also copies the preceding bytes [...]

Interesting. Embarrassing, first of all :) but also interesting, because this is exactly why we have the "sentinel" field at 0x1f0 to catch *this specific error* and work around it.

Ard Biesheuvel

2:44 p.m.

On Thu, 6 Mar 2025 at 15:39, H. Peter Anvin hpa@zytor.com wrote:

> Interesting. Embarrassing, first of all :) but also interesting, because this is exactly why we have the "sentinel" field at 0x1f0 to catch *this specific error* and work around it.

We're crashing way earlier than the sentinel check - the bogus command line pointer is dereferenced via

  startup_64()
    configure_5level_paging()
      cmdline_find_option_bool()

whereas sanitize_bootparams() is only called much later, from extract_kernel().
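
[Editorial illustration, not part of the original message.] Why the order of the two steps matters can be shown with a heavily reduced model. This is a sketch only: struct mini_boot_params and both functions are hypothetical stand-ins, the real structure has many more fields, and the real sanitize step clears more than the single field cleared here.

```c
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for struct boot_params. */
struct mini_boot_params {
    uint32_t ext_cmd_line_ptr; /* upper 32 bits of the command line address */
    uint8_t  sentinel;         /* stays 0 when the loader zeroed the structure */
    uint32_t cmd_line_ptr;     /* lower 32 bits, part of the setup header */
};

/* Simplified model of the sanitize step: a non-zero sentinel means the
 * loader copied too much, so fields outside the setup header are untrusted
 * and get cleared. */
static void mini_sanitize(struct mini_boot_params *bp)
{
    if (bp->sentinel) {
        bp->ext_cmd_line_ptr = 0;
        bp->sentinel = 0;
    }
}

/* Combine both halves into the 64-bit address an early command line
 * lookup would dereference. */
static uint64_t cmdline_address(const struct mini_boot_params *bp)
{
    return ((uint64_t)bp->ext_cmd_line_ptr << 32) | bp->cmd_line_ptr;
}
```

In this model, a structure with ext_cmd_line_ptr = 0xffffffff, a non-zero sentinel and cmd_line_ptr = 0x00090000 yields the address 0xffffffff00090000 when read before mini_sanitize() and 0x00090000 when read after it. Any code that reads the command line before the sanitize step therefore sees the bogus pointer.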

H. Peter Anvin

3:23 p.m.

On March 6, 2025 6:44:11 AM PST, Ard Biesheuvel ardb@kernel.org wrote:

> We're crashing way earlier than the sentinel check - the bogus command line pointer is dereferenced via
>
>   startup_64()
>     configure_5level_paging()
>       cmdline_find_option_bool()
>
> whereas sanitize_bootparams() is only called much later, from extract_kernel().

That is a bug in the kernel then. The whole point of the sentinel check is that it needs to be done before any of the fields touched by the sentinel check are accessed.

Ard Biesheuvel

4:03 p.m.

On Thu, 6 Mar 2025 at 16:23, H. Peter Anvin hpa@zytor.com wrote:

> That is a bug in the kernel then. The whole point of the sentinel check is that it needs to be done before any of the fields touched by the sentinel check are accessed.

Indeed - I have just sent out a fix for this.

Ulrich Gemkow

4:50 p.m.

On Thursday 06 March 2025, Ard Biesheuvel wrote:

> Indeed - I have just sent out a fix for this.

Hello Ard,

Thanks for the patch! It does not apply cleanly to 6.6.80 (the includes are different), so I applied it manually and it helps: the system boots.

One remark regarding the patch description: in our kernel, CONFIG_X86_5LEVEL is not set. The patch helps anyway :-)

Thanks again and best regards

Ulrich

Ard Biesheuvel

5:07 p.m.

On Thu, 6 Mar 2025 at 17:50, Ulrich Gemkow ulrich.gemkow@ikr.uni-stuttgart.de wrote:

> thanks for the patch! It does not apply cleanly to 6.6.80 (the includes are different) so I applied it manually and it helps - the system boots.
> [...]

Thanks for testing. I will take this as a Tested-by.

linux-stable-mirror@lists.linaro.org
