Nikolay reports that accessing BIOS data (first 1MB of the physical address space) via /dev/mem results in a "crash" / terminated VM (unhandled SEPT violation). See report [1] for details.
The cause is ioremap() (via xlate_dev_mem_ptr()) establishes an unencrypted mapping where the kernel had established an encrypted mapping previously. The CPU enforces mapping consistency with a fault upon detecting a mismatch. A similar situation arises with devmem access to "unaccepted" confidential memory. In summary, it is fraught to allow uncoordinated userspace mapping of confidential memory.
While there is an existing mitigation to simulate and redirect access to the BIOS data area with STRICT_DEVMEM=y, it is insufficient. Specifically, STRICT_DEVMEM=y traps read(2) access to the BIOS data area, and returns a zeroed buffer. However, it turns out the kernel fails to enforce the same via mmap(2), and a direct mapping is established. This is a hole, and unfortunately userspace has learned to exploit it [2].
This means the kernel either needs: a mechanism to ensure consistent plus accepted "encrypted" mappings of this /dev/mem mmap() hole, close the hole by mapping the zero page in the mmap(2) path, block only BIOS data access and let typical STRICT_DEVMEM protect the rest, or disable /dev/mem altogether.
The simplest option for now is arrange for /dev/mem to always behave as if lockdown is enabled for confidential guests. Require confidential guest userspace to jettison legacy dependencies on /dev/mem similar to how other legacy mechanisms are jettisoned for confidential operation. Recall that modern methods for BIOS data access are available like /sys/firmware/dmi/tables.
Now, this begs the question what to do with PCI sysfs which allows userspace mappings of confidential MMIO with similar mapping consistency and acceptance expectations? Here, the existing mitigation of IO_STRICT_DEVMEM is likely sufficient. The kernel is expected to use request_mem_region() when toggling the state of MMIO. With IO_STRICT_DEVMEM that enforces kernel-exclusive access until release_mem_region(), i.e. mapping conflicts are prevented.
Cc: x86@kernel.org Cc: Kees Cook kees@kernel.org Cc: Ingo Molnar mingo@kernel.org Cc: Vishal Annapurve vannapurve@google.com Cc: Kirill Shutemov kirill.shutemov@linux.intel.com Cc: Arnd Bergmann arnd@arndb.de Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Michael Ellerman mpe@ellerman.id.au Cc: Suzuki K Poulose suzuki.poulose@arm.com Link: https://sources.debian.org/src/libdebian-installer/0.125/src/system/subarch-... [2] Reported-by: Nikolay Borisov nik.borisov@suse.com Closes: http://lore.kernel.org/20250318113604.297726-1-nik.borisov@suse.com [1] Fixes: 9aa6ea69852c ("x86/tdx: Make pages shared in ioremap()") Cc: stable@vger.kernel.org Acked-by: Dave Hansen dave.hansen@linux.intel.com Reviewed-by: Nikolay Borisov nik.borisov@suse.com Acked-by: Naveen N Rao (AMD) naveen@kernel.org Tested-by: Naveen N Rao (AMD) naveen@kernel.org Signed-off-by: Dan Williams dan.j.williams@intel.com --- arch/x86/Kconfig | 4 ++++ drivers/char/mem.c | 10 ++++++++++ 2 files changed, 14 insertions(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 4b9f378e05f6..36f11aad1ae5 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -891,6 +891,8 @@ config INTEL_TDX_GUEST depends on X86_X2APIC depends on EFI_STUB depends on PARAVIRT + select STRICT_DEVMEM + select IO_STRICT_DEVMEM select ARCH_HAS_CC_PLATFORM select X86_MEM_ENCRYPT select X86_MCE @@ -1510,6 +1512,8 @@ config AMD_MEM_ENCRYPT bool "AMD Secure Memory Encryption (SME) support" depends on X86_64 && CPU_SUP_AMD depends on EFI_STUB + select STRICT_DEVMEM + select IO_STRICT_DEVMEM select DMA_COHERENT_POOL select ARCH_USE_MEMREMAP_PROT select INSTRUCTION_DECODER diff --git a/drivers/char/mem.c b/drivers/char/mem.c index 48839958b0b1..47729606b817 100644 --- a/drivers/char/mem.c +++ b/drivers/char/mem.c @@ -30,6 +30,7 @@ #include <linux/uio.h> #include <linux/uaccess.h> #include <linux/security.h> +#include <linux/cc_platform.h>
#define DEVMEM_MINOR 1 #define DEVPORT_MINOR 4 @@ -595,6 +596,15 @@ static int open_port(struct inode *inode, struct file *filp) if (rc) return rc;
+ /* + * Enforce encrypted mapping consistency and avoid unaccepted + * memory conflicts, "lockdown" /dev/mem for confidential + * guests. + */ + if (IS_ENABLED(CONFIG_STRICT_DEVMEM) && + cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) + return -EPERM; + if (iminor(inode) != DEVMEM_MINOR) return 0;