From: Jeff Xu jeffxu@chromium.org
This is V9 version, addressing comments from V8, without code logic change.
-------------------------------------------------------------------
As discussed during mseal() upstream process [1], mseal() protects the VMAs of a given virtual memory range against modifications, such as the read/write (RW) and no-execute (NX) bits. For complete descriptions of memory sealing, please see mseal.rst [2].
mseal() is useful for mitigating memory corruption issues where a corrupted pointer is passed to a memory management system. For example, such an attacker primitive can break control-flow integrity guarantees, since read-only memory that is supposed to be trusted can become writable or .text pages can get remapped.
The system mappings are read-only, so memory sealing can protect them from ever being changed to writable, or from being unmapped or remapped with different attributes.
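The effect described above can be observed from userspace on an ordinary anonymous mapping. The following is a minimal sketch, not part of this series: the helper name seal_blocks_mprotect() is hypothetical, and the fallback syscall number 462 is an assumption for toolchains whose headers do not yet define __NR_mseal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
#define __NR_mseal 462 /* assumption: number assigned when mseal() was merged */
#endif

/*
 * Hypothetical helper. Returns:
 *   1  if the page was sealed and a later mprotect() was refused with EPERM,
 *   0  if mseal() is not usable here (old kernel, 32-bit, sandbox),
 *  -1  on any unexpected result.
 */
static int seal_blocks_mprotect(void)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	assert(page > 0);
	p = mmap(NULL, (size_t)page, PROT_READ,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* flags must currently be 0 */
	if (syscall(__NR_mseal, p, (size_t)page, 0UL) != 0) {
		munmap(p, (size_t)page);
		return 0;
	}

	/* Sealed: an attempt to make the page writable should fail. */
	if (mprotect(p, (size_t)page, PROT_READ | PROT_WRITE) == 0)
		return -1;
	return errno == EPERM ? 1 : -1;
}
```

Note that a successfully sealed page cannot be unmapped afterwards, so the sketch deliberately leaves it mapped until the process exits.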
System mappings such as vdso, vvar, vvar_vclock, vectors (arm compat-mode), sigpage (arm compat-mode), are created by the kernel during program initialization, and could be sealed after creation.
Unlike the aforementioned mappings, the uprobe mapping is not established during program startup. However, its lifetime is the same as the process's lifetime [3]. It could be sealed from creation.
The vsyscall on x86-64 uses a special address (0xffffffffff600000), which is outside the mm managed range. This means mprotect, munmap, and mremap won't work on the vsyscall. Since sealing doesn't enhance the vsyscall's security, it is skipped in this patch. If we ever seal the vsyscall, it is probably only for decorative purpose, i.e. showing the 'sl' flag in the /proc/pid/smaps. For this patch, it is ignored.
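To make the "outside the mm managed range" point concrete, here is a small sketch that compares the vsyscall address with a user address limit. The limit used is an assumption (x86-64 with 4-level paging), and in_user_range() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VSYSCALL_ADDR   UINT64_C(0xffffffffff600000) /* address quoted above */
#define PAGE_SIZE_4K    UINT64_C(4096)
/* Assumption: user address limit on x86-64 with 4-level paging. */
#define USER_ADDR_LIMIT ((UINT64_C(1) << 47) - PAGE_SIZE_4K)

/* True if [start, start + len) lies entirely below the user address limit. */
static bool in_user_range(uint64_t start, uint64_t len)
{
	assert(len > 0);
	return start < USER_ADDR_LIMIT && len <= USER_ADDR_LIMIT - start;
}
```

Under these assumptions the vsyscall page fails the check, while a typical vdso address such as 0x7ffff7fc1000 passes it.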
It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may alter the system mappings during restore operations. UML (User Mode Linux), gVisor, and rr are also known to change the vdso/vvar mappings. Consequently, this feature cannot be universally enabled across all systems. As such, CONFIG_MSEAL_SYSTEM_MAPPINGS is disabled by default.
To support mseal of system mappings, architectures must define CONFIG_ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS and update their special mappings calls to pass the mseal flag. Additionally, architectures must confirm they do not unmap/remap system mappings during the process lifetime. The existence of this flag for an architecture implies that it does not require the remapping of these system mappings during process lifetime, so sealing these mappings is safe from a kernel perspective.
This version covers the x86-64 and arm64 architectures as a minimum viable feature.
While no specific CPU hardware features are required to enable this feature on an architecture, memory sealing requires a 64-bit kernel. Other architectures can choose whether or not to adopt this feature. Currently, I'm not aware of any instances in the kernel code that actively munmap/mremap a system mapping without a request from userspace. PPC does call munmap when _install_special_mapping fails for vdso; however, it's uncertain if this will ever fail for PPC. This needs to be investigated by PPC in the future [4]. The UML kernel can add this support when KUnit tests require it [5].
In this version, we've improved the handling of system mapping sealing compared with previous versions. Instead of modifying the _install_special_mapping function itself, which would affect all architectures, we now call _install_special_mapping with a sealing flag only within the specific architectures that require it. This targeted approach offers two key advantages: 1) it limits the code change's impact to the necessary architectures, and 2) it aligns with the software architecture by keeping the core memory management within the mm layer while delegating the decision to seal system mappings to the individual architecture, which is particularly relevant since 32-bit architectures never require sealing.
Prior to this patch series, we explored sealing special mappings from userspace using glibc's dynamic linker. This approach revealed several issues:
- The PT_LOAD header may report an incorrect length for the vdso (smaller than its actual size). The dynamic linker, which relies on PT_LOAD information to determine mapping size, would then split and partially seal the vdso mapping. Since each architecture has its own vdso/vvar code, fixing this in the kernel would require going through each architecture. Our initial goal was to enable sealing of read-only mappings, e.g. .text, across all architectures; sealing the vdso from the kernel at creation appears to be simpler than sealing the vdso in glibc.
- The [vvar] mapping header only contains address information, not length information. Similar issues might exist for other special mappings.
- Mappings like uprobe are not covered by the dynamic linker, and there is no effective solution for them.
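For readers who want to see which length a loader would derive from the vdso program headers, the sketch below walks the PT_LOAD entries of the vdso image found through AT_SYSINFO_EHDR. It only illustrates the PT_LOAD-based calculation referred to above; the helper names are hypothetical, and whether the result differs from the real mapping size has to be checked against /proc/self/maps on the system in question.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <elf.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>

/*
 * Span covered by the PT_LOAD segments of an in-memory ELF image,
 * i.e. the length derived purely from the program headers.
 * Returns 0 if the image has no PT_LOAD segment.
 */
static size_t pt_load_span(const ElfW(Ehdr) *eh)
{
	const ElfW(Phdr) *ph;
	uintptr_t lo = UINTPTR_MAX, hi = 0;

	assert(eh != NULL);
	ph = (const ElfW(Phdr) *)((const char *)eh + eh->e_phoff);

	for (unsigned int i = 0; i < eh->e_phnum; i++) {
		uintptr_t start, end;

		if (ph[i].p_type != PT_LOAD)
			continue;
		start = (uintptr_t)ph[i].p_vaddr;
		end = start + (uintptr_t)ph[i].p_memsz;
		if (start < lo)
			lo = start;
		if (end > hi)
			hi = end;
	}
	return hi > lo ? (size_t)(hi - lo) : 0;
}

/* Span for the vdso of the calling process, or 0 if no vdso is mapped. */
static size_t vdso_pt_load_span(void)
{
	unsigned long base = getauxval(AT_SYSINFO_EHDR);

	return base ? pt_load_span((const ElfW(Ehdr) *)base) : 0;
}
```

Comparing vdso_pt_load_span() with the size of the [vdso] line in /proc/self/maps is one way to check for the mismatch on a given architecture.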
This feature's security enhancements will benefit ChromeOS, Android, and other high security systems.
Testing: This feature was tested on ChromeOS and Android for both x86-64 and ARM64.
- Enable sealing and verify that vdso/vvar, sigpage, and vectors are sealed properly, i.e. "sl" is shown in the smaps for those mappings and mremap is blocked.
- Pass various automated tests (e.g. pre-checkin) on ChromeOS and Android to ensure the sealing doesn't affect the functionality of Chromebooks and Android phones.
I also tested the feature on Ubuntu on x86-64:
- With the config disabled, vdso/vvar is not sealed.
- With the config enabled, vdso/vvar is sealed, Ubuntu boots fine, and normal operations such as browsing the web and opening/editing documents work.
Link: https://lore.kernel.org/all/20240415163527.626541-1-jeffxu@chromium.org/ [1]
Link: Documentation/userspace-api/mseal.rst [2]
Link: https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxY... [3]
Link: https://lore.kernel.org/all/CABi2SkV6JJwJeviDLsq9N4ONvQ=EFANsiWkgiEOjyT9TQSt... [4]
Link: https://lore.kernel.org/all/202502251035.239B85A93@keescook/ [5]
-------------------------------------------
History:
V9:
- Add negative test in selftest (Kees Cook)
- Fix typos in text (Kees Cook)
V8:
- Change ARCH_SUPPORTS_MSEAL_X to ARCH_SUPPORTS_MSEAL_X (Liam R. Howlett)
- Update comments in Kconfig and mseal.rst (Lorenzo Stoakes, Liam R. Howlett)
- Change patch header prefix to "mseal sysmap" (Lorenzo Stoakes)
- Remove "vm_flags =" (Kees Cook, Liam R. Howlett, Oleg Nesterov)
- Drop uml architecture (Lorenzo Stoakes, Kees Cook)
- Add a selftest to verify system mappings are sealed (Lorenzo Stoakes)
V7: https://lore.kernel.org/all/20250224225246.3712295-1-jeffxu@google.com/
- Remove cover letter from the first patch (Liam R. Howlett)
- Change macro name to VM_SEALED_SYSMAP (Liam R. Howlett)
- Logging and fclose() in selftest (Liam R. Howlett)
V6: https://lore.kernel.org/all/20250224174513.3600914-1-jeffxu@google.com/
- mseal.rst: fix a typo (Randy Dunlap)
- security/Kconfig: add rr into note (Liam R. Howlett)
- Remove mseal_system_mappings() and use macro instead (Liam R. Howlett)
- mseal.rst: add incompatible userland software (Lorenzo Stoakes)
- Remove RFC from title (Kees Cook)
V5: https://lore.kernel.org/all/20250212032155.1276806-1-jeffxu@google.com/
- Remove kernel cmd line (Lorenzo Stoakes)
- Add test info (Lorenzo Stoakes)
- Add threat model info (Lorenzo Stoakes)
- Fix x86 selftest: test_mremap_vdso
- Restrict code change to ARM64/x86-64/UM arch only.
- Add userprocess.h to include seal_system_mapping().
- Remove sealing vsyscall.
- Split the patch.
V4: https://lore.kernel.org/all/20241125202021.3684919-1-jeffxu@google.com/
- ARCH_HAS_SEAL_SYSTEM_MAPPINGS (Lorenzo Stoakes)
- Test info (Lorenzo Stoakes)
- Update mseal.rst (Liam R. Howlett)
- Update test_mremap_vdso.c (Liam R. Howlett)
- Misc. style, comments, doc update (Liam R. Howlett)
V3: https://lore.kernel.org/all/20241113191602.3541870-1-jeffxu@google.com/
- Revert uprobe to v1 logic (Oleg Nesterov)
- Use CONFIG_SEAL_SYSTEM_MAPPINGS instead of _ALWAYS/_NEVER (Kees Cook)
- Move kernel cmd line from fs/exec.c to mm/mseal.c and misc. (Liam R. Howlett)
V2: https://lore.kernel.org/all/20241014215022.68530-1-jeffxu@google.com/
- Seal uprobe always (Oleg Nesterov)
- Update comments and description (Randy Dunlap, Liam R. Howlett, Oleg Nesterov)
- Rebase to linux_main
V1: https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@google.com/
--------------------------------------------------
Jeff Xu (7):
  mseal sysmap: kernel config and header change
  selftests: x86: test_mremap_vdso: skip if vdso is msealed
  mseal sysmap: enable x86-64
  mseal sysmap: enable arm64
  mseal sysmap: uprobe mapping
  mseal sysmap: update mseal.rst
  selftest: test system mappings are sealed.

 Documentation/userspace-api/mseal.rst         |  20 +++
 arch/arm64/Kconfig                            |   1 +
 arch/arm64/kernel/vdso.c                      |  12 +-
 arch/x86/Kconfig                              |   1 +
 arch/x86/entry/vdso/vma.c                     |   7 +-
 include/linux/mm.h                            |  10 ++
 init/Kconfig                                  |  22 ++++
 kernel/events/uprobes.c                       |   3 +-
 security/Kconfig                              |  21 ++++
 tools/testing/selftests/Makefile              |   1 +
 .../mseal_system_mappings/.gitignore          |   2 +
 .../selftests/mseal_system_mappings/Makefile  |   6 +
 .../selftests/mseal_system_mappings/config    |   1 +
 .../mseal_system_mappings/sysmap_is_sealed.c  | 119 ++++++++++++++++++
 .../testing/selftests/x86/test_mremap_vdso.c  |  43 +++++++
 15 files changed, 261 insertions(+), 8 deletions(-)
 create mode 100644 tools/testing/selftests/mseal_system_mappings/.gitignore
 create mode 100644 tools/testing/selftests/mseal_system_mappings/Makefile
 create mode 100644 tools/testing/selftests/mseal_system_mappings/config
 create mode 100644 tools/testing/selftests/mseal_system_mappings/sysmap_is_sealed.c
From: Jeff Xu jeffxu@chromium.org
Provide infrastructure to mseal system mappings. Establish two kernel configs (CONFIG_MSEAL_SYSTEM_MAPPINGS, ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS) and VM_SEALED_SYSMAP macro for future patches.
Signed-off-by: Jeff Xu jeffxu@chromium.org
Reviewed-by: Kees Cook kees@kernel.org
---
 include/linux/mm.h | 10 ++++++++++
 init/Kconfig       | 22 ++++++++++++++++++++++
 security/Kconfig   | 21 +++++++++++++++++++++
 3 files changed, 53 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7b1068ddcbb7..8b800941678d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4155,4 +4155,14 @@ int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *st
 int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status);
 int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
 
+
+/*
+ * mseal of userspace process's system mappings.
+ */
+#ifdef CONFIG_MSEAL_SYSTEM_MAPPINGS
+#define VM_SEALED_SYSMAP	VM_SEALED
+#else
+#define VM_SEALED_SYSMAP	VM_NONE
+#endif
+
 #endif /* _LINUX_MM_H */
diff --git a/init/Kconfig b/init/Kconfig
index d0d021b3fa3b..7f67d8942a09 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1882,6 +1882,28 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
 config ARCH_HAS_MEMBARRIER_SYNC_CORE
 	bool
 
+config ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
+	bool
+	help
+	  Control MSEAL_SYSTEM_MAPPINGS access based on architecture.
+
+	  A 64-bit kernel is required for the memory sealing feature.
+	  No specific hardware features from the CPU are needed.
+
+	  To enable this feature, the architecture needs to update their
+	  special mappings calls to include the sealing flag and confirm
+	  that it doesn't unmap/remap system mappings during the life
+	  time of the process. The existence of this flag for an architecture
+	  implies that it does not require the remapping of the system
+	  mappings during process lifetime, so sealing these mappings is safe
+	  from a kernel perspective.
+
+	  After the architecture enables this, a distribution can set
+	  CONFIG_MSEAL_SYSTEM_MAPPING to manage access to the feature.
+
+	  For complete descriptions of memory sealing, please see
+	  Documentation/userspace-api/mseal.rst
+
 config HAVE_PERF_EVENTS
 	bool
 	help
diff --git a/security/Kconfig b/security/Kconfig
index f10dbf15c294..a914a02df27e 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -51,6 +51,27 @@ config PROC_MEM_NO_FORCE
 
 endchoice
 
+config MSEAL_SYSTEM_MAPPINGS
+	bool "mseal system mappings"
+	depends on 64BIT
+	depends on ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
+	depends on !CHECKPOINT_RESTORE
+	help
+	  Apply mseal on system mappings.
+	  The system mappings includes vdso, vvar, vvar_vclock,
+	  vectors (arm compat-mode), sigpage (arm compat-mode), uprobes.
+
+	  A 64-bit kernel is required for the memory sealing feature.
+	  No specific hardware features from the CPU are needed.
+
+	  WARNING: This feature breaks programs which rely on relocating
+	  or unmapping system mappings. Known broken software at the time
+	  of writing includes CHECKPOINT_RESTORE, UML, gVisor, rr. Therefore
+	  this config can't be enabled universally.
+
+	  For complete descriptions of memory sealing, please see
+	  Documentation/userspace-api/mseal.rst
+
 config SECURITY
 	bool "Enable different security models"
 	depends on SYSFS
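The way VM_SEALED_SYSMAP collapses to a no-op when the option is off can be modelled in a few lines of userspace C. This is only a sketch: the flag values are stand-ins chosen for illustration (the real definitions live in include/linux/mm.h), CONFIG_MSEAL_SYSTEM_MAPPINGS is defined by hand here to model an enabled config, and vdso_text_flags() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Model an enabled config; remove this line to model a disabled one. */
#define CONFIG_MSEAL_SYSTEM_MAPPINGS 1

/* Illustrative stand-in values, not copied from the kernel headers. */
#define VM_NONE   UINT64_C(0)
#define VM_READ   UINT64_C(0x1)
#define VM_EXEC   UINT64_C(0x4)
#define VM_SEALED (UINT64_C(1) << 63)

#ifdef CONFIG_MSEAL_SYSTEM_MAPPINGS
#define VM_SEALED_SYSMAP VM_SEALED
#else
#define VM_SEALED_SYSMAP VM_NONE
#endif

/* Flags an architecture might pass when installing a vdso text mapping. */
static uint64_t vdso_text_flags(void)
{
	uint64_t flags = VM_READ | VM_EXEC | VM_SEALED_SYSMAP;

	/* The seal bit is present exactly when the option is enabled. */
	assert((flags & VM_SEALED) == VM_SEALED_SYSMAP);
	return flags;
}
```

Because the macro expands to VM_NONE when the option is off, call sites can OR it in unconditionally without any #ifdef of their own.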
On Wed, Mar 05, 2025 at 02:17:05AM +0000, jeffxu@chromium.org wrote:
From: Jeff Xu jeffxu@chromium.org
Provide infrastructure to mseal system mappings. Establish two kernel configs (CONFIG_MSEAL_SYSTEM_MAPPINGS, ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS) and VM_SEALED_SYSMAP macro for future patches.
Signed-off-by: Jeff Xu jeffxu@chromium.org Reviewed-by: Kees Cook kees@kernel.org
Umm... I reviewed this too? :) unless you made substantial changes here (doesn't appear so), please do propagate tags for each revision :>)
Anyway, FWIW:
Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
On Wed, Mar 05, 2025 at 05:54:24AM +0000, Lorenzo Stoakes wrote:
On Wed, Mar 05, 2025 at 02:17:05AM +0000, jeffxu@chromium.org wrote:
From: Jeff Xu jeffxu@chromium.org
Provide infrastructure to mseal system mappings. Establish two kernel configs (CONFIG_MSEAL_SYSTEM_MAPPINGS, ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS) and VM_SEALED_SYSMAP macro for future patches.
Signed-off-by: Jeff Xu jeffxu@chromium.org Reviewed-by: Kees Cook kees@kernel.org
Umm... I reviewed this too? :) unless you made substantial changes here (doesn't appear so), please do propagate tags for each revision :>)
Anyway, FWIW:
Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
(you also forgot to propagate Liam's tag here)
On Tue, Mar 4, 2025 at 9:57 PM Lorenzo Stoakes lorenzo.stoakes@oracle.com wrote:
On Wed, Mar 05, 2025 at 05:54:24AM +0000, Lorenzo Stoakes wrote:
On Wed, Mar 05, 2025 at 02:17:05AM +0000, jeffxu@chromium.org wrote:
From: Jeff Xu jeffxu@chromium.org
Provide infrastructure to mseal system mappings. Establish two kernel configs (CONFIG_MSEAL_SYSTEM_MAPPINGS, ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS) and VM_SEALED_SYSMAP macro for future patches.
Signed-off-by: Jeff Xu jeffxu@chromium.org Reviewed-by: Kees Cook kees@kernel.org
Umm... I reviewed this too? :) unless you made substantial changes here (doesn't appear so), please do propagate tags for each revision :>)
Anyway, FWIW:
Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
(you also forgot to propagate Liam's tag here)
Sorry about that, I missed the "Reviewed-by" tags from you and Liam from V8 [1] [2].
[1] https://lore.kernel.org/all/maamck3gjqjikefwlubtzg4ymaa6vh47hlxqqn4v23gqwl2t...
[2] https://lore.kernel.org/all/0ea20f84-bd66-4180-aa04-0f66ce91bdf6@lucifer.loc...
Thanks
From: Jeff Xu jeffxu@chromium.org
Add code to detect if the vdso is memory sealed, skip the test if it is.
Signed-off-by: Jeff Xu jeffxu@chromium.org
Reviewed-by: Kees Cook kees@kernel.org
Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
Reviewed-by: Liam R. Howlett Liam.Howlett@oracle.com
---
 .../testing/selftests/x86/test_mremap_vdso.c  | 43 +++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/tools/testing/selftests/x86/test_mremap_vdso.c b/tools/testing/selftests/x86/test_mremap_vdso.c
index d53959e03593..94bee6e0c813 100644
--- a/tools/testing/selftests/x86/test_mremap_vdso.c
+++ b/tools/testing/selftests/x86/test_mremap_vdso.c
@@ -14,6 +14,7 @@
 #include <errno.h>
 #include <unistd.h>
 #include <string.h>
+#include <stdbool.h>
 
 #include <sys/mman.h>
 #include <sys/auxv.h>
@@ -55,13 +56,55 @@ static int try_to_remap(void *vdso_addr, unsigned long size)
 
 }
 
+#define VDSO_NAME "[vdso]"
+#define VMFLAGS "VmFlags:"
+#define MSEAL_FLAGS "sl"
+#define MAX_LINE_LEN 512
+
+bool vdso_sealed(FILE *maps)
+{
+	char line[MAX_LINE_LEN];
+	bool has_vdso = false;
+
+	while (fgets(line, sizeof(line), maps)) {
+		if (strstr(line, VDSO_NAME))
+			has_vdso = true;
+
+		if (has_vdso && !strncmp(line, VMFLAGS, strlen(VMFLAGS))) {
+			if (strstr(line, MSEAL_FLAGS))
+				return true;
+
+			return false;
+		}
+	}
+
+	return false;
+}
+
 int main(int argc, char **argv, char **envp)
 {
 	pid_t child;
+	FILE *maps;
 
 	ksft_print_header();
 	ksft_set_plan(1);
 
+	maps = fopen("/proc/self/smaps", "r");
+	if (!maps) {
+		ksft_test_result_skip(
+			"Could not open /proc/self/smaps, errno=%d\n",
+			 errno);
+
+		return 0;
+	}
+
+	if (vdso_sealed(maps)) {
+		ksft_test_result_skip("vdso is sealed\n");
+		return 0;
+	}
+
+	fclose(maps);
+
 	child = fork();
 	if (child == -1)
 		ksft_exit_fail_msg("failed to fork (%d): %m\n", errno);
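The smaps check in the patch can also be written so that it works on any stream and matches "sl" as a whole token. The following is a sketch for illustration only, not the selftest's code; vmflags_has_seal() and mapping_sealed() are hypothetical names, and the 512-byte line buffer mirrors MAX_LINE_LEN above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if the whitespace-separated flag list contains the token "sl". */
static int vmflags_has_seal(const char *flags)
{
	char buf[512];
	char *tok, *save;

	assert(flags != NULL);
	snprintf(buf, sizeof(buf), "%s", flags);
	for (tok = strtok_r(buf, " \t\n", &save); tok;
	     tok = strtok_r(NULL, " \t\n", &save))
		if (!strcmp(tok, "sl"))
			return 1;
	return 0;
}

/*
 * Scan smaps-formatted text for the mapping called `name`.
 * Returns 1 if it is sealed, 0 if it is not, and -1 if the mapping
 * or its VmFlags line was not found.
 */
static int mapping_sealed(FILE *smaps, const char *name)
{
	char line[512];
	int found = 0;

	while (fgets(line, sizeof(line), smaps)) {
		if (!found) {
			found = strstr(line, name) != NULL;
			continue;
		}
		if (!strncmp(line, "VmFlags:", 8))
			return vmflags_has_seal(line + 8);
	}
	return -1;
}
```

A caller would typically open /proc/self/smaps, pass the stream together with "[vdso]", and close the stream on every return path. The three-way result lets a test tell "not sealed" apart from "mapping not present".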
From: Jeff Xu jeffxu@chromium.org
Provide support for CONFIG_MSEAL_SYSTEM_MAPPINGS on x86-64, covering the vdso, vvar, vvar_vclock.
Production release testing passes on Android and Chrome OS.
Signed-off-by: Jeff Xu jeffxu@chromium.org
Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
Reviewed-by: Liam R. Howlett Liam.Howlett@oracle.com
Reviewed-by: Kees Cook kees@kernel.org
---
 arch/x86/Kconfig          | 1 +
 arch/x86/entry/vdso/vma.c | 7 ++++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index be2c311f5118..c6f9ebcbe009 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -26,6 +26,7 @@ config X86_64
 	depends on 64BIT
 	# Options that are inherently 64-bit kernel only:
 	select ARCH_HAS_GIGANTIC_PAGE
+	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
 	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 39e6efc1a9ca..a4f312495de1 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -268,7 +268,8 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
 				       text_start,
 				       image->size,
 				       VM_READ|VM_EXEC|
-				       VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+				       VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC|
+				       VM_SEALED_SYSMAP,
 				       &vdso_mapping);
 
 	if (IS_ERR(vma)) {
@@ -280,7 +281,7 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
 				       addr,
 				       (__VVAR_PAGES - VDSO_NR_VCLOCK_PAGES) * PAGE_SIZE,
 				       VM_READ|VM_MAYREAD|VM_IO|VM_DONTDUMP|
-				       VM_PFNMAP,
+				       VM_PFNMAP|VM_SEALED_SYSMAP,
 				       &vvar_mapping);
 
 	if (IS_ERR(vma)) {
@@ -293,7 +294,7 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
 				       addr + (__VVAR_PAGES - VDSO_NR_VCLOCK_PAGES) * PAGE_SIZE,
 				       VDSO_NR_VCLOCK_PAGES * PAGE_SIZE,
 				       VM_READ|VM_MAYREAD|VM_IO|VM_DONTDUMP|
-				       VM_PFNMAP,
+				       VM_PFNMAP|VM_SEALED_SYSMAP,
 				       &vvar_vclock_mapping);
 
 	if (IS_ERR(vma)) {
From: Jeff Xu jeffxu@chromium.org
Provide support for CONFIG_MSEAL_SYSTEM_MAPPINGS on arm64, covering the vdso, vvar, and compat-mode vectors and sigpage mappings.
Production release testing passes on Android and Chrome OS.
Signed-off-by: Jeff Xu jeffxu@chromium.org
Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
Reviewed-by: Liam R. Howlett Liam.Howlett@oracle.com
Reviewed-by: Kees Cook kees@kernel.org
---
 arch/arm64/Kconfig       |  1 +
 arch/arm64/kernel/vdso.c | 12 ++++++++----
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 940343beb3d4..282d6cb13cfb 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -38,6 +38,7 @@ config ARM64
 	select ARCH_HAS_KEEPINITRD
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_MEM_ENCRYPT
+	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
 	select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
 	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	select ARCH_HAS_NONLEAF_PMD_YOUNG if ARM64_HAFT
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index e8ed8e5b713b..69d2b5ceb092 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -198,7 +198,8 @@ static int __setup_additional_pages(enum vdso_abi abi,
 	}
 
 	ret = _install_special_mapping(mm, vdso_base, VVAR_NR_PAGES * PAGE_SIZE,
-				       VM_READ|VM_MAYREAD|VM_PFNMAP,
+				       VM_READ|VM_MAYREAD|VM_PFNMAP|
+				       VM_SEALED_SYSMAP,
 				       &vvar_map);
 	if (IS_ERR(ret))
 		goto up_fail;
@@ -210,7 +211,8 @@ static int __setup_additional_pages(enum vdso_abi abi,
 	mm->context.vdso = (void *)vdso_base;
 	ret = _install_special_mapping(mm, vdso_base, vdso_text_len,
 				       VM_READ|VM_EXEC|gp_flags|
-				       VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+				       VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC|
+				       VM_SEALED_SYSMAP,
 				       vdso_info[abi].cm);
 	if (IS_ERR(ret))
 		goto up_fail;
@@ -336,7 +338,8 @@ static int aarch32_kuser_helpers_setup(struct mm_struct *mm)
 	 */
 	ret = _install_special_mapping(mm, AARCH32_VECTORS_BASE, PAGE_SIZE,
 				       VM_READ | VM_EXEC |
-				       VM_MAYREAD | VM_MAYEXEC,
+				       VM_MAYREAD | VM_MAYEXEC |
+				       VM_SEALED_SYSMAP,
 				       &aarch32_vdso_maps[AA32_MAP_VECTORS]);
 
 	return PTR_ERR_OR_ZERO(ret);
@@ -359,7 +362,8 @@ static int aarch32_sigreturn_setup(struct mm_struct *mm)
 	 */
 	ret = _install_special_mapping(mm, addr, PAGE_SIZE,
 				       VM_READ | VM_EXEC | VM_MAYREAD |
-				       VM_MAYWRITE | VM_MAYEXEC,
+				       VM_MAYWRITE | VM_MAYEXEC |
+				       VM_SEALED_SYSMAP,
 				       &aarch32_vdso_maps[AA32_MAP_SIGPAGE]);
 	if (IS_ERR(ret))
 		goto out;
From: Jeff Xu jeffxu@chromium.org
Provide support to mseal the uprobe mapping.
Unlike other system mappings, the uprobe mapping is not established during program startup. However, its lifetime is the same as the process's lifetime. It could be sealed from creation.
Testing was done with the perf tool, and the uprobe mapping was observed to be sealed.
Signed-off-by: Jeff Xu jeffxu@chromium.org
Reviewed-by: Oleg Nesterov oleg@redhat.com
Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
Reviewed-by: Liam R. Howlett Liam.Howlett@oracle.com
Reviewed-by: Kees Cook kees@kernel.org
---
 kernel/events/uprobes.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index bf2a87a0a378..98632bc47216 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1683,7 +1683,8 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
 	}
 
 	vma = _install_special_mapping(mm, area->vaddr, PAGE_SIZE,
-				VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO,
+				VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO|
+				VM_SEALED_SYSMAP,
 				&xol_mapping);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
From: Jeff Xu jeffxu@chromium.org
Update memory sealing documentation to include details about system mappings.
Signed-off-by: Jeff Xu jeffxu@chromium.org Reviewed-by: Kees Cook kees@kernel.org Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com Reviewed-by: Liam R. Howlett Liam.Howlett@oracle.com --- Documentation/userspace-api/mseal.rst | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/Documentation/userspace-api/mseal.rst b/Documentation/userspace-api/mseal.rst
index 41102f74c5e2..56aee46a9307 100644
--- a/Documentation/userspace-api/mseal.rst
+++ b/Documentation/userspace-api/mseal.rst
@@ -130,6 +130,26 @@ Use cases
 
 - Chrome browser: protect some security sensitive data structures.
 
+- System mappings:
+  The system mappings are created by the kernel and include vdso, vvar,
+  vvar_vclock, vectors (arm compat-mode), sigpage (arm compat-mode), uprobes.
+
+  Those system mappings are read-only or execute-only; memory sealing can
+  protect them from ever being changed to writable or unmapped/remapped with
+  different attributes. This is useful to mitigate memory corruption issues
+  where a corrupted pointer is passed to a memory management system.
+
+  If supported by an architecture (CONFIG_ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS),
+  CONFIG_MSEAL_SYSTEM_MAPPINGS seals all system mappings of this
+  architecture.
+
+  The following architectures currently support this feature: x86-64 and arm64.
+
+  WARNING: This feature breaks programs which rely on relocating
+  or unmapping system mappings. Known broken software at the time
+  of writing includes CHECKPOINT_RESTORE, UML, gVisor, rr. Therefore
+  this config can't be enabled universally.
+
 When not to use mseal
 =====================
 Applications can apply sealing to any virtual memory region from userspace,
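
As a rough illustration of what sealing means from userspace, the sketch below probes the vdso by asking mprotect() for the protection the vdso text normally already has. It is only a sketch under stated assumptions: demo_vdso_is_sealed() is a hypothetical helper, it assumes a sealed mapping rejects mprotect() with EPERM even when the protection is unchanged, and EPERM can have other causes (for example a seccomp filter). On arm64 with BTI the call may drop PROT_BTI from an unsealed vdso, so treat it as an x86-64 illustration.

```c
#include <elf.h>
#include <errno.h>
#include <stddef.h>
#include <sys/auxv.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if mprotect() on the first vdso page fails with EPERM
 * (consistent with a sealed mapping), 0 if mprotect() succeeds
 * (not sealed), and -1 if the result is inconclusive.
 */
int demo_vdso_is_sealed(void)
{
	unsigned long vdso = getauxval(AT_SYSINFO_EHDR);
	long page = sysconf(_SC_PAGESIZE);

	if (!vdso || page <= 0)
		return -1;	/* no vdso reported for this process */

	/* Ask for r-x, which vdso text normally has, so success changes nothing. */
	if (mprotect((void *)vdso, (size_t)page, PROT_READ | PROT_EXEC) == 0)
		return 0;

	return errno == EPERM ? 1 : -1;
}
```

Checking the 'sl' flag in /proc/self/smaps, as the selftest in this series does, remains the more direct way to confirm sealing.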
From: Jeff Xu jeffxu@chromium.org
Add sysmap_is_sealed.c to test that system mappings are sealed.
Note: CONFIG_MSEAL_SYSTEM_MAPPINGS must be set, as indicated in the config file.
Signed-off-by: Jeff Xu jeffxu@chromium.org
Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
---
 tools/testing/selftests/Makefile                   |   1 +
 .../mseal_system_mappings/.gitignore               |   2 +
 .../selftests/mseal_system_mappings/Makefile       |   6 +
 .../selftests/mseal_system_mappings/config         |   1 +
 .../mseal_system_mappings/sysmap_is_sealed.c       | 119 ++++++++++++++++++
 5 files changed, 129 insertions(+)
 create mode 100644 tools/testing/selftests/mseal_system_mappings/.gitignore
 create mode 100644 tools/testing/selftests/mseal_system_mappings/Makefile
 create mode 100644 tools/testing/selftests/mseal_system_mappings/config
 create mode 100644 tools/testing/selftests/mseal_system_mappings/sysmap_is_sealed.c
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 8daac70c2f9d..be836be8f03f 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -61,6 +61,7 @@ TARGETS += mount
 TARGETS += mount_setattr
 TARGETS += move_mount_set_group
 TARGETS += mqueue
+TARGETS += mseal_system_mappings
 TARGETS += nci
 TARGETS += net
 TARGETS += net/af_unix
diff --git a/tools/testing/selftests/mseal_system_mappings/.gitignore b/tools/testing/selftests/mseal_system_mappings/.gitignore
new file mode 100644
index 000000000000..319c497a595e
--- /dev/null
+++ b/tools/testing/selftests/mseal_system_mappings/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+sysmap_is_sealed
diff --git a/tools/testing/selftests/mseal_system_mappings/Makefile b/tools/testing/selftests/mseal_system_mappings/Makefile
new file mode 100644
index 000000000000..2b4504e2f52f
--- /dev/null
+++ b/tools/testing/selftests/mseal_system_mappings/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0-only
+CFLAGS += -std=c99 -pthread -Wall $(KHDR_INCLUDES)
+
+TEST_GEN_PROGS := sysmap_is_sealed
+
+include ../lib.mk
diff --git a/tools/testing/selftests/mseal_system_mappings/config b/tools/testing/selftests/mseal_system_mappings/config
new file mode 100644
index 000000000000..675cb9f37b86
--- /dev/null
+++ b/tools/testing/selftests/mseal_system_mappings/config
@@ -0,0 +1 @@
+CONFIG_MSEAL_SYSTEM_MAPPINGS=y
diff --git a/tools/testing/selftests/mseal_system_mappings/sysmap_is_sealed.c b/tools/testing/selftests/mseal_system_mappings/sysmap_is_sealed.c
new file mode 100644
index 000000000000..0d2af30c3bf5
--- /dev/null
+++ b/tools/testing/selftests/mseal_system_mappings/sysmap_is_sealed.c
@@ -0,0 +1,119 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * test system mappings are sealed when
+ * CONFIG_MSEAL_SYSTEM_MAPPINGS=y
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <errno.h>
+#include <unistd.h>
+#include <string.h>
+#include <stdbool.h>
+
+#include "../kselftest.h"
+#include "../kselftest_harness.h"
+
+#define VMFLAGS "VmFlags:"
+#define MSEAL_FLAGS "sl"
+#define MAX_LINE_LEN 512
+
+bool has_mapping(char *name, FILE *maps)
+{
+	char line[MAX_LINE_LEN];
+
+	while (fgets(line, sizeof(line), maps)) {
+		if (strstr(line, name))
+			return true;
+	}
+
+	return false;
+}
+
+bool mapping_is_sealed(char *name, FILE *maps)
+{
+	char line[MAX_LINE_LEN];
+
+	while (fgets(line, sizeof(line), maps)) {
+		if (!strncmp(line, VMFLAGS, strlen(VMFLAGS))) {
+			if (strstr(line, MSEAL_FLAGS))
+				return true;
+
+			return false;
+		}
+	}
+
+	return false;
+}
+
+FIXTURE(basic) {
+	FILE *maps;
+};
+
+FIXTURE_SETUP(basic)
+{
+	self->maps = fopen("/proc/self/smaps", "r");
+	if (!self->maps)
+		SKIP(return, "Could not open /proc/self/smaps, errno=%d",
+			errno);
+};
+
+FIXTURE_TEARDOWN(basic)
+{
+	if (self->maps)
+		fclose(self->maps);
+};
+
+FIXTURE_VARIANT(basic)
+{
+	char *name;
+	bool sealed;
+};
+
+FIXTURE_VARIANT_ADD(basic, vdso) {
+	.name = "[vdso]",
+	.sealed = true,
+};
+
+FIXTURE_VARIANT_ADD(basic, vvar) {
+	.name = "[vvar]",
+	.sealed = true,
+};
+
+FIXTURE_VARIANT_ADD(basic, vvar_vclock) {
+	.name = "[vvar_vclock]",
+	.sealed = true,
+};
+
+FIXTURE_VARIANT_ADD(basic, sigpage) {
+	.name = "[sigpage]",
+	.sealed = true,
+};
+
+FIXTURE_VARIANT_ADD(basic, vectors) {
+	.name = "[vectors]",
+	.sealed = true,
+};
+
+FIXTURE_VARIANT_ADD(basic, uprobes) {
+	.name = "[uprobes]",
+	.sealed = true,
+};
+
+FIXTURE_VARIANT_ADD(basic, stack) {
+	.name = "[stack]",
+	.sealed = false,
+};
+
+TEST_F(basic, check_sealed)
+{
+	if (!has_mapping(variant->name, self->maps)) {
+		SKIP(return, "could not find the mapping, %s",
+			variant->name);
+	}
+
+	EXPECT_EQ(variant->sealed,
+		mapping_is_sealed(variant->name, self->maps));
+};
+
+TEST_HARNESS_MAIN
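
The selftest looks for "sl" with strstr() on the VmFlags line. A stricter variant could compare whole space-separated tokens instead. The following is a minimal sketch of that idea; demo_vmflags_has() is a hypothetical helper and is not part of this patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns true if @line is a "VmFlags:" line from smaps and contains
 * @flag as a whole whitespace-separated token.
 */
bool demo_vmflags_has(const char *line, const char *flag)
{
	static const char prefix[] = "VmFlags:";
	size_t flen = strlen(flag);
	const char *p;

	if (strncmp(line, prefix, sizeof(prefix) - 1))
		return false;

	p = line + sizeof(prefix) - 1;
	while (*p) {
		size_t tlen;

		p += strspn(p, " \t\n");	/* skip separators */
		tlen = strcspn(p, " \t\n");	/* length of the next token */
		if (tlen > 0 && tlen == flen && !strncmp(p, flag, flen))
			return true;
		p += tlen;
	}

	return false;
}
```

In the selftest's loop this would take the place of the strncmp()/strstr() pair, called on each line read after has_mapping() has found the mapping header.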
On Wed, Mar 05, 2025 at 02:17:11AM +0000, jeffxu@chromium.org wrote:
From: Jeff Xu jeffxu@chromium.org
Add sysmap_is_sealed.c to test that system mappings are sealed.
Note: CONFIG_MSEAL_SYSTEM_MAPPINGS must be set, as indicated in the config file.
Signed-off-by: Jeff Xu jeffxu@chromium.org
Great! Thanks for the negative test addition. :)
Reviewed-by: Kees Cook kees@kernel.org
On Wed, Mar 05, 2025 at 02:17:11AM +0000, jeffxu@chromium.org wrote:
From: Jeff Xu jeffxu@chromium.org
Add sysmap_is_sealed.c to test that system mappings are sealed.
Note: CONFIG_MSEAL_SYSTEM_MAPPINGS must be set, as indicated in the config file.
Signed-off-by: Jeff Xu jeffxu@chromium.org
Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com

 tools/testing/selftests/Makefile                   |   1 +
 .../mseal_system_mappings/.gitignore               |   2 +
 .../selftests/mseal_system_mappings/Makefile       |   6 +
 .../selftests/mseal_system_mappings/config         |   1 +
 .../mseal_system_mappings/sysmap_is_sealed.c       | 119 ++++++++++++++++++
 5 files changed, 129 insertions(+)
 create mode 100644 tools/testing/selftests/mseal_system_mappings/.gitignore
 create mode 100644 tools/testing/selftests/mseal_system_mappings/Makefile
 create mode 100644 tools/testing/selftests/mseal_system_mappings/config
 create mode 100644 tools/testing/selftests/mseal_system_mappings/sysmap_is_sealed.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 8daac70c2f9d..be836be8f03f 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -61,6 +61,7 @@ TARGETS += mount
 TARGETS += mount_setattr
 TARGETS += move_mount_set_group
 TARGETS += mqueue
+TARGETS += mseal_system_mappings
Thanks!
On Wed, Mar 05, 2025 at 02:17:04AM +0000, jeffxu@chromium.org wrote:
From: Jeff Xu jeffxu@chromium.org
This is the V9 version, addressing comments from V8, without code logic changes.
As discussed during the mseal() upstreaming process [1], mseal() protects the VMAs of a given virtual memory range against modifications, such as changes to the read/write (RW) and no-execute (NX) bits. For a complete description of memory sealing, please see mseal.rst [2].
mseal() is useful to mitigate memory corruption issues where a corrupted pointer is passed to a memory management system. For example, such an attacker primitive can break control-flow integrity guarantees, since read-only memory that is supposed to be trusted can become writable or .text pages can get remapped.
The system mappings are read-only; memory sealing can protect them from ever being changed to writable or being unmapped/remapped with different attributes.
System mappings such as vdso, vvar, vvar_vclock, vectors (arm compat-mode), and sigpage (arm compat-mode) are created by the kernel during program initialization, and can be sealed after creation.
Unlike the aforementioned mappings, the uprobe mapping is not established during program startup. However, its lifetime is the same as the process's lifetime [3]. It can be sealed at creation.
The vsyscall on x86-64 uses a special address (0xffffffffff600000), which is outside the mm managed range. This means mprotect, munmap, and mremap won't work on the vsyscall. Since sealing doesn't enhance the vsyscall's security, it is skipped in this patch. If we ever seal the vsyscall, it will probably be only for decorative purposes, i.e. showing the 'sl' flag in /proc/pid/smaps. For this patch, it is ignored.
It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may alter the system mappings during restore operations. UML (User Mode Linux), gVisor, and rr are also known to change the vdso/vvar mappings. Consequently, this feature cannot be universally enabled across all systems. As such, CONFIG_MSEAL_SYSTEM_MAPPINGS is disabled by default.
To support mseal of system mappings, architectures must define CONFIG_ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS and update their special mapping calls to pass the mseal flag. Additionally, architectures must confirm they do not unmap/remap system mappings during the process lifetime. The existence of this flag for an architecture implies that it does not require the remapping of these system mappings during process lifetime, so sealing these mappings is safe from a kernel perspective.
This version covers the x86-64 and arm64 architectures as a minimum viable feature.
While no specific CPU hardware features are required to enable this feature on an architecture, memory sealing requires a 64-bit kernel. Other architectures can choose whether or not to adopt this feature. Currently, I'm not aware of any instances in the kernel code that actively munmap/mremap a system mapping without a request from userspace. The PPC does call munmap when _install_special_mapping fails for vdso; however, it's uncertain if this will ever fail for PPC - this needs to be investigated by PPC in the future [4]. The UML kernel can add this support when KUnit tests require it [5].
In this version, we've improved the handling of system mapping sealing compared with previous versions. Instead of modifying the _install_special_mapping function itself, which would affect all architectures, we now call _install_special_mapping with a sealing flag only within the specific architecture that requires it. This targeted approach offers two key advantages: 1) It limits the code change's impact to the necessary architectures, and 2) It aligns with the software architecture by keeping the core memory management within the mm layer, while delegating the decision of sealing system mappings to the individual architecture, which is particularly relevant since 32-bit architectures never require sealing.
Prior to this patch series, we explored sealing special mappings from userspace using glibc's dynamic linker. This approach revealed several issues:
- The PT_LOAD header may report an incorrect length for the vdso (smaller than its actual size). The dynamic linker, which relies on PT_LOAD information to determine mapping size, would then split and partially seal the vdso mapping. Since each architecture has its own vdso/vvar code, fixing this in the kernel would require going through each architecture. Our initial goal was to enable sealing read-only mappings, e.g. .text, across all architectures; sealing the vdso from the kernel at creation appears to be simpler than sealing it in glibc.
- The [vvar] mapping header only contains address information, not length information. Similar issues might exist for other special mappings.
- Mappings like uprobe are not covered by the dynamic linker, and there is no effective solution for them.
This feature's security enhancements will benefit ChromeOS, Android, and other high security systems.
Testing: This feature was tested on ChromeOS and Android for both x86-64 and ARM64.
- Enable sealing and verify vdso/vvar, sigpage, vector are sealed properly, i.e. "sl" shown in the smaps for those mappings, and mremap is blocked.
- Passing various automation tests (e.g. pre-checkin) on ChromeOS and Android to ensure the sealing doesn't affect the functionality of Chromebook and Android phone.
I also tested the feature on Ubuntu on x86-64:
- With the config disabled, vdso/vvar is not sealed.
- With the config enabled, vdso/vvar is sealed, Ubuntu boots up fine, and normal operations such as browsing the web and opening/editing documents work.
Link: https://lore.kernel.org/all/20240415163527.626541-1-jeffxu@chromium.org/ [1]
Link: Documentation/userspace-api/mseal.rst [2]
Link: https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxY... [3]
Link: https://lore.kernel.org/all/CABi2SkV6JJwJeviDLsq9N4ONvQ=EFANsiWkgiEOjyT9TQSt... [4]
Link: https://lore.kernel.org/all/202502251035.239B85A93@keescook/ [5]
History:
V9:
- Add negative test in selftest (Kees Cook)
- Fix typos in text (Kees Cook)
You have a bad habit of missing stuff off these logs. Usually I don't comment, as it's trivial, but while we're here :)
Please try to keep an accurate log of changes requested so you can populate these properly.
Obviously this is not going to block anything. But for future reference...
- Add selftest to main selftest Makefile (Lorenzo Stoakes)
V8:
Nit, but no lore link?
- Change ARCH_SUPPORTS_MSEAL_X to ARCH_SUPPORTS_MSEAL_X (Liam R. Howlett)
- Update comments in Kconfig and mseal.rst (Lorenzo Stoakes, Liam R. Howlett)
- Change patch header prefix to "mseal sysmap" (Lorenzo Stoakes)
- Remove "vm_flags =" (Kees Cook, Liam R. Howlett, Oleg Nesterov)
- Drop uml architecture (Lorenzo Stoakes, Kees Cook)
- Add a selftest to verify system mappings are sealed (Lorenzo Stoakes)
V7: https://lore.kernel.org/all/20250224225246.3712295-1-jeffxu@google.com/
- Remove cover letter from the first patch (Liam R. Howlett)
- Change macro name to VM_SEALED_SYSMAP (Liam R. Howlett)
- logging and fclose() in selftest (Liam R. Howlett)
V6: https://lore.kernel.org/all/20250224174513.3600914-1-jeffxu@google.com/
- mseal.rst: fix a typo (Randy Dunlap)
- security/Kconfig: add rr into note (Liam R. Howlett)
- remove mseal_system_mappings() and use macro instead (Liam R. Howlett)
- mseal.rst: add incompatible userland software (Lorenzo Stoakes)
- remove RFC from title (Kees Cook)
V5: https://lore.kernel.org/all/20250212032155.1276806-1-jeffxu@google.com/
- Remove kernel cmd line (Lorenzo Stoakes)
- Add test info (Lorenzo Stoakes)
- Add threat model info (Lorenzo Stoakes)
- Fix x86 selftest: test_mremap_vdso
- Restrict code change to ARM64/x86-64/UM arch only.
- Add userprocess.h to include seal_system_mapping().
- Remove sealing vsyscall.
- Split the patch.
V4: https://lore.kernel.org/all/20241125202021.3684919-1-jeffxu@google.com/
- ARCH_HAS_SEAL_SYSTEM_MAPPINGS (Lorenzo Stoakes)
- test info (Lorenzo Stoakes)
- Update mseal.rst (Liam R. Howlett)
- Update test_mremap_vdso.c (Liam R. Howlett)
- Misc. style, comments, doc update (Liam R. Howlett)
V3: https://lore.kernel.org/all/20241113191602.3541870-1-jeffxu@google.com/
- Revert uprobe to v1 logic (Oleg Nesterov)
- use CONFIG_SEAL_SYSTEM_MAPPINGS instead of _ALWAYS/_NEVER (Kees Cook)
- Move kernel cmd line from fs/exec.c to mm/mseal.c and misc. (Liam R. Howlett)
V2: https://lore.kernel.org/all/20241014215022.68530-1-jeffxu@google.com/
- Seal uprobe always (Oleg Nesterov)
- Update comments and description (Randy Dunlap, Liam R. Howlett, Oleg Nesterov)
- Rebase to linux_main
V1:
Jeff Xu (7):
  mseal sysmap: kernel config and header change
  selftests: x86: test_mremap_vdso: skip if vdso is msealed
  mseal sysmap: enable x86-64
  mseal sysmap: enable arm64
  mseal sysmap: uprobe mapping
  mseal sysmap: update mseal.rst
  selftest: test system mappings are sealed.

 Documentation/userspace-api/mseal.rst              |  20 +++
 arch/arm64/Kconfig                                 |   1 +
 arch/arm64/kernel/vdso.c                           |  12 +-
 arch/x86/Kconfig                                   |   1 +
 arch/x86/entry/vdso/vma.c                          |   7 +-
 include/linux/mm.h                                 |  10 ++
 init/Kconfig                                       |  22 ++++
 kernel/events/uprobes.c                            |   3 +-
 security/Kconfig                                   |  21 ++++
 tools/testing/selftests/Makefile                   |   1 +
 .../mseal_system_mappings/.gitignore               |   2 +
 .../selftests/mseal_system_mappings/Makefile       |   6 +
 .../selftests/mseal_system_mappings/config         |   1 +
 .../mseal_system_mappings/sysmap_is_sealed.c       | 119 ++++++++++++++++++
 .../testing/selftests/x86/test_mremap_vdso.c       |  43 +++++++
 15 files changed, 261 insertions(+), 8 deletions(-)
 create mode 100644 tools/testing/selftests/mseal_system_mappings/.gitignore
 create mode 100644 tools/testing/selftests/mseal_system_mappings/Makefile
 create mode 100644 tools/testing/selftests/mseal_system_mappings/config
 create mode 100644 tools/testing/selftests/mseal_system_mappings/sysmap_is_sealed.c
-- 2.48.1.711.g2feabab25a-goog
On Tue, Mar 4, 2025 at 9:51 PM Lorenzo Stoakes lorenzo.stoakes@oracle.com wrote:
On Wed, Mar 05, 2025 at 02:17:04AM +0000, jeffxu@chromium.org wrote:
You have a bad habit of missing stuff off these logs. Usually I don't comment, as it's trivial, but while we're here :)
Please try to keep an accurate log of changes requested so you can populate these properly.
Obviously this is not going to block anything. But for future reference...
- Add selftest to main selftest Makefile (Lorenzo Stoakes)
V8:
Nit, but no lore link?
https://lore.kernel.org/all/20250303050921.3033083-1-jeffxu@google.com/
Thanks for noticing this.