On Fri, Oct 25, 2024 at 09:30:51AM +0200, Ard Biesheuvel wrote:
On Fri, 25 Oct 2024 at 07:09, Jiri Slaby <jirislaby@kernel.org> wrote:
On 25. 10. 24, 7:07, Jiri Slaby wrote:
On 24. 10. 24, 18:20, Jiri Slaby wrote:
==== EFI_ACPI_RECLAIM_MEMORY
This memory is to be preserved by the UEFI OS loader and OS until ACPI is enabled. Once ACPI is enabled, the memory in this range is available for general use. ====
BTW, doesn't the above mean it is released by the time the TPM code actually reads it?
Isn't the proper fix to actually memblock_reserve() that TPM portion, the same as is done for memattr in efi_memattr_init()?
And this is actually done in efi_tpm_eventlog_init().
EFI_ACPI_RECLAIM_MEMORY may be reclaimed by the OS, but we never actually do that in Linux.
To me, it seems like the use of EFI_ACPI_RECLAIM_MEMORY in this case simply tickles a bug in the firmware that causes it to corrupt the memory attributes table. The fact that cold boot behaves differently is a strong indicator here.
I didn't see the results of the memory attribute table dumps on the bugzilla thread, but dumping this table from EFI is not very useful because it will get regenerated/updated at ExitBootServices() time. Unfortunately, that also takes away the console, so capturing the state of that table before the EFI stub boots the kernel is not an easy thing to do.
Is the memattr table completely corrupted? It also has a version field, and only versions 1 and 2 are defined, so we might use that to detect corruption.
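Below is a minimal sketch of that kind of version check, in C. The struct is a hypothetical mirror of the header of the UEFI EFI_MEMORY_ATTRIBUTES_TABLE (version, entry count, descriptor size, flags/reserved word). The field names, the `max_bytes` bound and the descriptor-size test are assumptions added for illustration. This is not the kernel's efi_memattr_init() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the EFI_MEMORY_ATTRIBUTES_TABLE header as laid
 * out in the UEFI spec; field names are illustrative only.
 */
struct memattr_hdr {
	uint32_t version;     /* only versions 1 and 2 are defined */
	uint32_t num_entries; /* number of descriptors following the header */
	uint32_t desc_size;   /* size of each descriptor, in bytes */
	uint32_t flags;       /* flags/reserved word, not checked here */
};

/* An EFI_MEMORY_DESCRIPTOR is 40 bytes with padding; firmware may report more. */
#define MEMATTR_MIN_DESC_SIZE 40u

/*
 * Returns true if the header looks plausible: a defined version, a
 * descriptor size no smaller than a real descriptor, and a body that fits
 * in max_bytes (a hypothetical caller-supplied bound, e.g. the mapped size).
 */
static bool memattr_hdr_plausible(const struct memattr_hdr *hdr,
				  uint64_t max_bytes)
{
	assert(hdr != NULL);

	if (hdr->version < 1 || hdr->version > 2)
		return false;
	if (hdr->desc_size < MEMATTR_MIN_DESC_SIZE)
		return false;
	/* The 64-bit product of two 32-bit values cannot overflow. */
	return (uint64_t)hdr->num_entries * hdr->desc_size <= max_bytes;
}
```

If the header were filled with random data, the version field alone would pass only 2 out of 2^32 values. That makes it a cheap first filter. The two size checks only tighten what "plausible" means.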
When we initially identified the TPM log corruption issue, I had a gut feeling we were about to discover a lot more corruption along the same lines. It feels like e820 should have significantly more entries marked ACPI, to keep kexec from touching them, instead of just one or two.
Hopefully I'm wrong; I'll take a look at the raw memory attributes on a few systems and see whether UEFI and e820 disagree.
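Here is one rough way to hunt for the kind of UEFI/e820 disagreement described above. For each UEFI memory map entry, check that e820 covers the same range with the type the EFI stub would be expected to derive from it. The EFI-to-e820 translation below is an assumed, simplified one (it ignores soft-reserved and unaccepted memory). The helper names and the `struct e820_entry` layout are hypothetical, intended for a userspace tool fed with tables dumped by some other means.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* UEFI memory type values, in spec order (0..14). */
enum efi_mem_type {
	EFI_RESERVED_TYPE, EFI_LOADER_CODE, EFI_LOADER_DATA,
	EFI_BOOT_SERVICES_CODE, EFI_BOOT_SERVICES_DATA,
	EFI_RUNTIME_SERVICES_CODE, EFI_RUNTIME_SERVICES_DATA,
	EFI_CONVENTIONAL_MEMORY, EFI_UNUSABLE_MEMORY,
	EFI_ACPI_RECLAIM_MEMORY, EFI_ACPI_MEMORY_NVS,
	EFI_MEMORY_MAPPED_IO, EFI_MEMORY_MAPPED_IO_PORT_SPACE,
	EFI_PAL_CODE, EFI_PERSISTENT_MEMORY,
};

/* e820 range types. */
enum e820_type {
	E820_TYPE_RAM = 1, E820_TYPE_RESERVED = 2, E820_TYPE_ACPI = 3,
	E820_TYPE_NVS = 4, E820_TYPE_UNUSABLE = 5, E820_TYPE_PMEM = 7,
};

/* Hypothetical e820 entry as dumped by the tool. */
struct e820_entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

#define EFI_PAGE_SIZE 4096u /* UEFI pages are always 4 KiB */

/* Assumed, simplified EFI -> e820 type translation. */
static uint32_t expected_e820_type(uint32_t efi_type)
{
	switch (efi_type) {
	case EFI_LOADER_CODE:
	case EFI_LOADER_DATA:
	case EFI_BOOT_SERVICES_CODE:
	case EFI_BOOT_SERVICES_DATA:
	case EFI_CONVENTIONAL_MEMORY:
		return E820_TYPE_RAM;
	case EFI_ACPI_RECLAIM_MEMORY:
		return E820_TYPE_ACPI;
	case EFI_ACPI_MEMORY_NVS:
		return E820_TYPE_NVS;
	case EFI_UNUSABLE_MEMORY:
		return E820_TYPE_UNUSABLE;
	case EFI_PERSISTENT_MEMORY:
		return E820_TYPE_PMEM;
	default:
		return E820_TYPE_RESERVED; /* runtime, MMIO, reserved, ... */
	}
}

/*
 * True if [start, start + size) is fully covered by entries of type want,
 * even when the range is split across several adjacent entries.
 * Overflow of start + size is not checked in this sketch.
 */
static bool range_covered(const struct e820_entry *tbl, size_t n,
			  uint64_t start, uint64_t size, uint32_t want)
{
	uint64_t cur = start;
	uint64_t end = start + size;

	while (cur < end) {
		bool advanced = false;

		for (size_t i = 0; i < n; i++) {
			uint64_t e_end = tbl[i].addr + tbl[i].size;

			if (tbl[i].type == want && tbl[i].addr <= cur && cur < e_end) {
				cur = e_end; /* skip to the end of this entry */
				advanced = true;
				break;
			}
		}
		if (!advanced)
			return false; /* gap or type mismatch at cur */
	}
	return true;
}

/* Does one UEFI memory map entry agree with the e820 table? */
static bool efi_region_agrees_with_e820(const struct e820_entry *tbl, size_t n,
					uint64_t phys_start, uint64_t num_pages,
					uint32_t efi_type)
{
	return range_covered(tbl, n, phys_start, num_pages * EFI_PAGE_SIZE,
			     expected_e820_type(efi_type));
}
```

Walking a cursor to the end of the range lets a region split across adjacent e820 entries still count as covered. A region that only partly matches is flagged. For example, an EFI_ACPI_RECLAIM_MEMORY region that e820 reports partly as RAM would be flagged, and that is the situation in which kexec might reuse it.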
Not looking forward to a thrilling game of whack-a-mole :[
~Gregory