[Actually copy Tom]
On Wed, Apr 23, 2025 at 01:36:33PM -0700, Dan Williams wrote:
Naveen N Rao wrote:
On Fri, Apr 18, 2025 at 01:04:02PM -0700, Dan Williams wrote:
Nikolay reports [1] that accessing BIOS data (first 1MB of the physical address space) via /dev/mem results in an SEPT violation.
The cause is ioremap() (via xlate_dev_mem_ptr()) establishes an unencrypted mapping where the kernel had established an encrypted mapping previously.
Linux traps read(2) access to the BIOS data area, and returns zero. However, it turns out the kernel fails to enforce the same via mmap(2). This is a hole, and unfortunately userspace has learned to exploit it [2].
This means the kernel either needs a mechanism to ensure consistent "encrypted" mappings of this /dev/mem mmap() hole, close the hole by mapping the zero page in the mmap(2) path, block only BIOS data access and let typical STRICT_DEVMEM protect the rest, or disable /dev/mem altogether.
The simplest option for now is arrange for /dev/mem to always behave as if lockdown is enabled for confidential guests. Require confidential guest userspace to jettison legacy dependencies on /dev/mem similar to how other legacy mechanisms are jettisoned for confidential operation. Recall that modern methods for BIOS data access are available like /sys/firmware/dmi/tables.
Cc: x86@kernel.org Cc: Kees Cook kees@kernel.org Cc: Ingo Molnar mingo@kernel.org Cc: "Naveen N Rao" naveen@kernel.org Cc: Vishal Annapurve vannapurve@google.com Cc: Kirill Shutemov kirill.shutemov@linux.intel.com Link: https://sources.debian.org/src/libdebian-installer/0.125/src/system/subarch-... [2] Reported-by: Nikolay Borisov nik.borisov@suse.com Closes: http://lore.kernel.org/20250318113604.297726-1-nik.borisov@suse.com [1] Fixes: 9aa6ea69852c ("x86/tdx: Make pages shared in ioremap()") Cc: stable@vger.kernel.org Acked-by: Dave Hansen dave.hansen@linux.intel.com Signed-off-by: Dan Williams dan.j.williams@intel.com
Changes since v3
- Fix a 0day kbuild robot report about missing cc_platform.h include.
arch/x86/Kconfig | 4 ++++ drivers/char/mem.c | 10 ++++++++++ 2 files changed, 14 insertions(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 4b9f378e05f6..bf4528d9fd0a 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -891,6 +891,8 @@ config INTEL_TDX_GUEST depends on X86_X2APIC depends on EFI_STUB depends on PARAVIRT
- depends on STRICT_DEVMEM
- depends on IO_STRICT_DEVMEM select ARCH_HAS_CC_PLATFORM select X86_MEM_ENCRYPT select X86_MCE
@@ -1510,6 +1512,8 @@ config AMD_MEM_ENCRYPT
As far as I know, AMD_MEM_ENCRYPT is for the host SME support. Since this is for encrypted guests, should the below dependencies be added to CONFIG_SEV_GUEST instead?
Tom?
The placement rationale here was to have the DEVMEM restrictions next to the ARCH_HAS_CC_PLATFORM 'select' statement which is INTEL_TDX_GUEST and AMD_MEM_ENCRYPT with SEV_GUEST depending on AMD_MEM_ENCRYPT.
bool "AMD Secure Memory Encryption (SME) support" depends on X86_64 && CPU_SUP_AMD depends on EFI_STUB
- depends on STRICT_DEVMEM
- depends on IO_STRICT_DEVMEM
Can we use 'select' for the dependency on IO_STRICT_DEVMEM, if not both the above?
IO_STRICT_DEVMEM in particular is not enabled by default, so applying this patch and doing a 'make olddefconfig' disabled AMD_MEM_ENCRYPT, which is not so good. Given that IO_STRICT_DEVMEM only depends on STRICT_DEVMEM, I think a 'select' is ok.
Agree, that makes sense, and I do not think it will lead to any select dependency problems given STRICT_DEVMEM is "default y" for x86.
select DMA_COHERENT_POOL select ARCH_USE_MEMREMAP_PROT select INSTRUCTION_DECODER diff --git a/drivers/char/mem.c b/drivers/char/mem.c index 48839958b0b1..47729606b817 100644 --- a/drivers/char/mem.c +++ b/drivers/char/mem.c @@ -30,6 +30,7 @@ #include <linux/uio.h> #include <linux/uaccess.h> #include <linux/security.h> +#include <linux/cc_platform.h> #define DEVMEM_MINOR 1 #define DEVPORT_MINOR 4 @@ -595,6 +596,15 @@ static int open_port(struct inode *inode, struct file *filp) if (rc) return rc;
- /*
* Enforce encrypted mapping consistency and avoid unaccepted
* memory conflicts, "lockdown" /dev/mem for confidential
* guests.
*/
- if (IS_ENABLED(CONFIG_STRICT_DEVMEM) &&
cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
return -EPERM;
- if (iminor(inode) != DEVMEM_MINOR) return 0;
Otherwise, this looks good to me.
Thanks Naveen, can I take that as an Acked-by?
Yes. I tested this and it solves the issue we see with SEV-SNP guest userspace access to video ROM range. For this patch: Acked-by: Naveen N Rao (AMD) naveen@kernel.org Tested-by: Naveen N Rao (AMD) naveen@kernel.org
Thanks, Naveen