Changes since v2 [1]: * Drop the new x86_platform_op and just use cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) directly where needed (Naveen) * Make the restriction identical to lockdown and stop playing games with devmem_is_allowed() * Ensure that CONFIG_IO_STRICT_DEVMEM is enabled to avoid conflicting mappings for userspace mappings of PCI MMIO.
The original response to Nikolay's report of an SEPT violation triggered by /dev/mem access to private memory was "let's just turn off /dev/mem".
After some machinations of x86_platform_ops to block a subset of problematic access, spelunking the history of devmem_is_allowed() returning "2" to enable some compatibility benefits while blocking access, and discovering that userspace depends buggy kernel behavior for mmap(2) of the first 1MB of memory on x86, the proposal has circled back to "disable /dev/mem".
Require both STRICT_DEVMEM and IO_STRICT_DEVMEM for x86 confidential guests to close /dev/mem hole while still allowing for userspace mapping of PCI MMIO as long as the kernel and userspace are not mapping the range at the same time.
The range_is_allowed() cleanup is not strictly necessary, but might as well close a 17 year-old "TODO".
---
Dan Williams (2): x86/devmem: Remove duplicate range_is_allowed() definition x86/devmem: Drop /dev/mem access for confidential guests
arch/x86/Kconfig | 4 ++++ arch/x86/mm/pat/memtype.c | 31 ++++--------------------------- drivers/char/mem.c | 27 +++++++++------------------ include/linux/io.h | 21 +++++++++++++++++++++ 4 files changed, 38 insertions(+), 45 deletions(-)
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
Nikolay reports [1] that accessing BIOS data (first 1MB of the physical address space) via /dev/mem results in an SEPT violation.
The cause is ioremap() (via xlate_dev_mem_ptr()) establishes an unencrypted mapping where the kernel had established an encrypted mapping previously.
Linux traps read(2) access to the BIOS data area, and returns zero. However, it turns out the kernel fails to enforce the same via mmap(2). This is a hole, and unfortunately userspace has learned to exploit it [2].
This means the kernel either needs a mechanism to ensure consistent "encrypted" mappings of this /dev/mem mmap() hole, close the hole by mapping the zero page in the mmap(2) path, block only BIOS data access and let typical STRICT_DEVMEM protect the rest, or disable /dev/mem altogether.
The simplest option for now is arrange for /dev/mem to always behave as if lockdown is enabled for confidential guests. Require confidential guest userspace to jettison legacy dependencies on /dev/mem similar to how other legacy mechanisms are jettisoned for confidential operation. Recall that modern methods for BIOS data access are available like /sys/firmware/dmi/tables.
Cc: x86@kernel.org Cc: Kees Cook kees@kernel.org Cc: Ingo Molnar mingo@kernel.org Cc: "Naveen N Rao" naveen@kernel.org Cc: Vishal Annapurve vannapurve@google.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Kirill Shutemov kirill.shutemov@linux.intel.com Link: https://sources.debian.org/src/libdebian-installer/0.125/src/system/subarch-... [2] Reported-by: Nikolay Borisov nik.borisov@suse.com Closes: http://lore.kernel.org/20250318113604.297726-1-nik.borisov@suse.com [1] Fixes: 9aa6ea69852c ("x86/tdx: Make pages shared in ioremap()") Cc: stable@vger.kernel.org Signed-off-by: Dan Williams dan.j.williams@intel.com --- arch/x86/Kconfig | 4 ++++ drivers/char/mem.c | 9 +++++++++ 2 files changed, 13 insertions(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 4b9f378e05f6..bf4528d9fd0a 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -891,6 +891,8 @@ config INTEL_TDX_GUEST depends on X86_X2APIC depends on EFI_STUB depends on PARAVIRT + depends on STRICT_DEVMEM + depends on IO_STRICT_DEVMEM select ARCH_HAS_CC_PLATFORM select X86_MEM_ENCRYPT select X86_MCE @@ -1510,6 +1512,8 @@ config AMD_MEM_ENCRYPT bool "AMD Secure Memory Encryption (SME) support" depends on X86_64 && CPU_SUP_AMD depends on EFI_STUB + depends on STRICT_DEVMEM + depends on IO_STRICT_DEVMEM select DMA_COHERENT_POOL select ARCH_USE_MEMREMAP_PROT select INSTRUCTION_DECODER diff --git a/drivers/char/mem.c b/drivers/char/mem.c index 48839958b0b1..f394f941b113 100644 --- a/drivers/char/mem.c +++ b/drivers/char/mem.c @@ -595,6 +595,15 @@ static int open_port(struct inode *inode, struct file *filp) if (rc) return rc;
+ /* + * Enforce encrypted mapping consistency and avoid unaccepted + * memory conflicts, "lockdown" /dev/mem for confidential + * guests. + */ + if (IS_ENABLED(CONFIG_STRICT_DEVMEM) && + cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) + return -EPERM; + if (iminor(inode) != DEVMEM_MINOR) return 0;
On 4/17/25 12:12, Dan Williams wrote: ...
- /*
* Enforce encrypted mapping consistency and avoid unaccepted
* memory conflicts, "lockdown" /dev/mem for confidential
* guests.
*/
- if (IS_ENABLED(CONFIG_STRICT_DEVMEM) &&
cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
return -EPERM;
A lot of /dev/mem use seems to be poking at random hardware details like BIOS internals, ACPI tables or hardware devices. Those all have modern alternatives. So while I worry that this will make some userspace mad, I have a hard time imagining that it's _relevant_ userspace on a modern x86 CoCo platform where that userspace isn't buggy already.
Acked-by: Dave Hansen dave.hansen@linux.intel.com
Hi Dan,
kernel test robot noticed the following build errors:
[auto build test ERROR on 0af2f6be1b4281385b618cb86ad946eded089ac8]
url: https://github.com/intel-lab-lkp/linux/commits/Dan-Williams/x86-devmem-Remov... base: 0af2f6be1b4281385b618cb86ad946eded089ac8 patch link: https://lore.kernel.org/r/174491712829.1395340.5054725417641299524.stgit%40d... patch subject: [PATCH v3 2/2] x86/devmem: Drop /dev/mem access for confidential guests config: arc-randconfig-001-20250418 (https://download.01.org/0day-ci/archive/20250418/202504180628.qlDJEl1e-lkp@i...) compiler: arc-linux-gcc (GCC) 14.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250418/202504180628.qlDJEl1e-lkp@i...)
If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot lkp@intel.com | Closes: https://lore.kernel.org/oe-kbuild-all/202504180628.qlDJEl1e-lkp@intel.com/
All errors (new ones prefixed by >>):
drivers/char/mem.c: In function 'open_port':
drivers/char/mem.c:604:13: error: implicit declaration of function 'cc_platform_has' [-Wimplicit-function-declaration]
604 | cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) | ^~~~~~~~~~~~~~~
drivers/char/mem.c:604:29: error: 'CC_ATTR_GUEST_MEM_ENCRYPT' undeclared (first use in this function)
604 | cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) | ^~~~~~~~~~~~~~~~~~~~~~~~~ drivers/char/mem.c:604:29: note: each undeclared identifier is reported only once for each function it appears in
vim +/cc_platform_has +604 drivers/char/mem.c
586 587 static int open_port(struct inode *inode, struct file *filp) 588 { 589 int rc; 590 591 if (!capable(CAP_SYS_RAWIO)) 592 return -EPERM; 593 594 rc = security_locked_down(LOCKDOWN_DEV_MEM); 595 if (rc) 596 return rc; 597 598 /* 599 * Enforce encrypted mapping consistency and avoid unaccepted 600 * memory conflicts, "lockdown" /dev/mem for confidential 601 * guests. 602 */ 603 if (IS_ENABLED(CONFIG_STRICT_DEVMEM) &&
604 cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
605 return -EPERM; 606 607 if (iminor(inode) != DEVMEM_MINOR) 608 return 0; 609 610 /* 611 * Use a unified address space to have a single point to manage 612 * revocations when drivers want to take over a /dev/mem mapped 613 * range. 614 */ 615 filp->f_mapping = iomem_get_mapping(); 616 617 return 0; 618 } 619
Hi Dan,
kernel test robot noticed the following build errors:
[auto build test ERROR on 0af2f6be1b4281385b618cb86ad946eded089ac8]
url: https://github.com/intel-lab-lkp/linux/commits/Dan-Williams/x86-devmem-Remov... base: 0af2f6be1b4281385b618cb86ad946eded089ac8 patch link: https://lore.kernel.org/r/174491712829.1395340.5054725417641299524.stgit%40d... patch subject: [PATCH v3 2/2] x86/devmem: Drop /dev/mem access for confidential guests config: arm-randconfig-004-20250418 (https://download.01.org/0day-ci/archive/20250418/202504180754.vQCz7zWh-lkp@i...) compiler: clang version 21.0.0git (https://github.com/llvm/llvm-project f819f46284f2a79790038e1f6649172789734ae8) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250418/202504180754.vQCz7zWh-lkp@i...)
If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot lkp@intel.com | Closes: https://lore.kernel.org/oe-kbuild-all/202504180754.vQCz7zWh-lkp@intel.com/
All errors (new ones prefixed by >>):
drivers/char/mem.c:604:6: error: call to undeclared function 'cc_platform_has'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
604 | cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) | ^
drivers/char/mem.c:604:22: error: use of undeclared identifier 'CC_ATTR_GUEST_MEM_ENCRYPT'
604 | cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) | ^ 2 errors generated.
vim +/cc_platform_has +604 drivers/char/mem.c
586 587 static int open_port(struct inode *inode, struct file *filp) 588 { 589 int rc; 590 591 if (!capable(CAP_SYS_RAWIO)) 592 return -EPERM; 593 594 rc = security_locked_down(LOCKDOWN_DEV_MEM); 595 if (rc) 596 return rc; 597 598 /* 599 * Enforce encrypted mapping consistency and avoid unaccepted 600 * memory conflicts, "lockdown" /dev/mem for confidential 601 * guests. 602 */ 603 if (IS_ENABLED(CONFIG_STRICT_DEVMEM) &&
604 cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
605 return -EPERM; 606 607 if (iminor(inode) != DEVMEM_MINOR) 608 return 0; 609 610 /* 611 * Use a unified address space to have a single point to manage 612 * revocations when drivers want to take over a /dev/mem mapped 613 * range. 614 */ 615 filp->f_mapping = iomem_get_mapping(); 616 617 return 0; 618 } 619
Nikolay reports [1] that accessing BIOS data (first 1MB of the physical address space) via /dev/mem results in an SEPT violation.
The cause is ioremap() (via xlate_dev_mem_ptr()) establishes an unencrypted mapping where the kernel had established an encrypted mapping previously.
Linux traps read(2) access to the BIOS data area, and returns zero. However, it turns out the kernel fails to enforce the same via mmap(2). This is a hole, and unfortunately userspace has learned to exploit it [2].
This means the kernel either needs a mechanism to ensure consistent "encrypted" mappings of this /dev/mem mmap() hole, close the hole by mapping the zero page in the mmap(2) path, block only BIOS data access and let typical STRICT_DEVMEM protect the rest, or disable /dev/mem altogether.
The simplest option for now is arrange for /dev/mem to always behave as if lockdown is enabled for confidential guests. Require confidential guest userspace to jettison legacy dependencies on /dev/mem similar to how other legacy mechanisms are jettisoned for confidential operation. Recall that modern methods for BIOS data access are available like /sys/firmware/dmi/tables.
Cc: x86@kernel.org Cc: Kees Cook kees@kernel.org Cc: Ingo Molnar mingo@kernel.org Cc: "Naveen N Rao" naveen@kernel.org Cc: Vishal Annapurve vannapurve@google.com Cc: Kirill Shutemov kirill.shutemov@linux.intel.com Link: https://sources.debian.org/src/libdebian-installer/0.125/src/system/subarch-... [2] Reported-by: Nikolay Borisov nik.borisov@suse.com Closes: http://lore.kernel.org/20250318113604.297726-1-nik.borisov@suse.com [1] Fixes: 9aa6ea69852c ("x86/tdx: Make pages shared in ioremap()") Cc: stable@vger.kernel.org Acked-by: Dave Hansen dave.hansen@linux.intel.com Signed-off-by: Dan Williams dan.j.williams@intel.com --- Changes since v3 * Fix a 0day kbuild robot report about missing cc_platform.h include.
arch/x86/Kconfig | 4 ++++ drivers/char/mem.c | 10 ++++++++++ 2 files changed, 14 insertions(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 4b9f378e05f6..bf4528d9fd0a 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -891,6 +891,8 @@ config INTEL_TDX_GUEST depends on X86_X2APIC depends on EFI_STUB depends on PARAVIRT + depends on STRICT_DEVMEM + depends on IO_STRICT_DEVMEM select ARCH_HAS_CC_PLATFORM select X86_MEM_ENCRYPT select X86_MCE @@ -1510,6 +1512,8 @@ config AMD_MEM_ENCRYPT bool "AMD Secure Memory Encryption (SME) support" depends on X86_64 && CPU_SUP_AMD depends on EFI_STUB + depends on STRICT_DEVMEM + depends on IO_STRICT_DEVMEM select DMA_COHERENT_POOL select ARCH_USE_MEMREMAP_PROT select INSTRUCTION_DECODER diff --git a/drivers/char/mem.c b/drivers/char/mem.c index 48839958b0b1..47729606b817 100644 --- a/drivers/char/mem.c +++ b/drivers/char/mem.c @@ -30,6 +30,7 @@ #include <linux/uio.h> #include <linux/uaccess.h> #include <linux/security.h> +#include <linux/cc_platform.h>
#define DEVMEM_MINOR 1 #define DEVPORT_MINOR 4 @@ -595,6 +596,15 @@ static int open_port(struct inode *inode, struct file *filp) if (rc) return rc;
+ /* + * Enforce encrypted mapping consistency and avoid unaccepted + * memory conflicts, "lockdown" /dev/mem for confidential + * guests. + */ + if (IS_ENABLED(CONFIG_STRICT_DEVMEM) && + cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) + return -EPERM; + if (iminor(inode) != DEVMEM_MINOR) return 0;
On 18.04.25 г. 23:04 ч., Dan Williams wrote:
Nikolay reports [1] that accessing BIOS data (first 1MB of the physical address space) via /dev/mem results in an SEPT violation.
The cause is ioremap() (via xlate_dev_mem_ptr()) establishes an unencrypted mapping where the kernel had established an encrypted mapping previously.
Linux traps read(2) access to the BIOS data area, and returns zero. However, it turns out the kernel fails to enforce the same via mmap(2). This is a hole, and unfortunately userspace has learned to exploit it [2].
This means the kernel either needs a mechanism to ensure consistent "encrypted" mappings of this /dev/mem mmap() hole, close the hole by mapping the zero page in the mmap(2) path, block only BIOS data access and let typical STRICT_DEVMEM protect the rest, or disable /dev/mem altogether.
The simplest option for now is arrange for /dev/mem to always behave as if lockdown is enabled for confidential guests. Require confidential guest userspace to jettison legacy dependencies on /dev/mem similar to how other legacy mechanisms are jettisoned for confidential operation. Recall that modern methods for BIOS data access are available like /sys/firmware/dmi/tables.
Cc: x86@kernel.org Cc: Kees Cook kees@kernel.org Cc: Ingo Molnar mingo@kernel.org Cc: "Naveen N Rao" naveen@kernel.org Cc: Vishal Annapurve vannapurve@google.com Cc: Kirill Shutemov kirill.shutemov@linux.intel.com Link: https://sources.debian.org/src/libdebian-installer/0.125/src/system/subarch-... [2] Reported-by: Nikolay Borisov nik.borisov@suse.com Closes: http://lore.kernel.org/20250318113604.297726-1-nik.borisov@suse.com [1] Fixes: 9aa6ea69852c ("x86/tdx: Make pages shared in ioremap()") Cc: stable@vger.kernel.org Acked-by: Dave Hansen dave.hansen@linux.intel.com Signed-off-by: Dan Williams dan.j.williams@intel.com
Changes since v3
Fix a 0day kbuild robot report about missing cc_platform.h include.
arch/x86/Kconfig | 4 ++++ drivers/char/mem.c | 10 ++++++++++ 2 files changed, 14 insertions(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 4b9f378e05f6..bf4528d9fd0a 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -891,6 +891,8 @@ config INTEL_TDX_GUEST depends on X86_X2APIC depends on EFI_STUB depends on PARAVIRT
- depends on STRICT_DEVMEM
- depends on IO_STRICT_DEVMEM select ARCH_HAS_CC_PLATFORM select X86_MEM_ENCRYPT select X86_MCE
@@ -1510,6 +1512,8 @@ config AMD_MEM_ENCRYPT bool "AMD Secure Memory Encryption (SME) support" depends on X86_64 && CPU_SUP_AMD depends on EFI_STUB
- depends on STRICT_DEVMEM
- depends on IO_STRICT_DEVMEM select DMA_COHERENT_POOL select ARCH_USE_MEMREMAP_PROT select INSTRUCTION_DECODER
diff --git a/drivers/char/mem.c b/drivers/char/mem.c index 48839958b0b1..47729606b817 100644 --- a/drivers/char/mem.c +++ b/drivers/char/mem.c @@ -30,6 +30,7 @@ #include <linux/uio.h> #include <linux/uaccess.h> #include <linux/security.h> +#include <linux/cc_platform.h> #define DEVMEM_MINOR 1 #define DEVPORT_MINOR 4 @@ -595,6 +596,15 @@ static int open_port(struct inode *inode, struct file *filp) if (rc) return rc;
- /*
* Enforce encrypted mapping consistency and avoid unaccepted
* memory conflicts, "lockdown" /dev/mem for confidential
* guests.
*/
- if (IS_ENABLED(CONFIG_STRICT_DEVMEM) &&
cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
return -EPERM;
Just confirming - the STRIC_DEVMEM check here is needed in case other CoCo technologies i.e ARM's CCA or risc-v tvm doesn't depend on it? Because for the x86 world it's redundant since both implementations imply STRICT_DEVMEM.
- if (iminor(inode) != DEVMEM_MINOR) return 0;
On 17.04.25 г. 22:11 ч., Dan Williams wrote:
Changes since v2 [1]:
- Drop the new x86_platform_op and just use cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) directly where needed (Naveen)
- Make the restriction identical to lockdown and stop playing games with devmem_is_allowed()
- Ensure that CONFIG_IO_STRICT_DEVMEM is enabled to avoid conflicting mappings for userspace mappings of PCI MMIO.
The original response to Nikolay's report of an SEPT violation triggered by /dev/mem access to private memory was "let's just turn off /dev/mem".
After some machinations of x86_platform_ops to block a subset of problematic access, spelunking the history of devmem_is_allowed() returning "2" to enable some compatibility benefits while blocking access, and discovering that userspace depends buggy kernel behavior for mmap(2) of the first 1MB of memory on x86, the proposal has circled back to "disable /dev/mem".
Require both STRICT_DEVMEM and IO_STRICT_DEVMEM for x86 confidential guests to close /dev/mem hole while still allowing for userspace mapping of PCI MMIO as long as the kernel and userspace are not mapping the range at the same time.
The range_is_allowed() cleanup is not strictly necessary, but might as well close a 17 year-old "TODO".
Dan Williams (2): x86/devmem: Remove duplicate range_is_allowed() definition x86/devmem: Drop /dev/mem access for confidential guests
arch/x86/Kconfig | 4 ++++ arch/x86/mm/pat/memtype.c | 31 ++++--------------------------- drivers/char/mem.c | 27 +++++++++------------------ include/linux/io.h | 21 +++++++++++++++++++++ 4 files changed, 38 insertions(+), 45 deletions(-)
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
Reviewed-by: Nikolay Borisov nik.borisov@suse.com
linux-stable-mirror@lists.linaro.org