From: Jeff Xu <jeffxu@google.com>
This is the first set of Memory mapping (VMA) protection patches using PKU.
* * *
Background:
As discussed previously in the kernel mailing list [1], V8 CFI [2] uses PKU to protect memory, and Stephen Röttger proposes to extend the PKU to memory mapping [3].
We're using PKU for in-process isolation to enforce control-flow integrity for a JIT compiler. In our threat model, an attacker exploits a vulnerability and gains arbitrary read/write access to the whole process address space, concurrently with other running threads. The attacker can also manipulate some arguments to syscalls made from some threads.
Under such a powerful attack, we want to create a “safe/isolated” thread environment. We assign dedicated PKEYs to this thread and use them to protect the thread’s runtime environment. The thread has exclusive access to its runtime memory, including the ability to modify the protection of the memory mapping or to munmap it after use. Other threads cannot access the memory or modify the memory mapping (VMA) belonging to the thread.
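To make the per-thread rights concrete, here is a minimal user-space sketch of how the rights for one key are encoded in the x86 PKRU register (two bits per key: access-disable and write-disable). The helper names are hypothetical and not part of this patch set; only the bit values mirror the existing PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE uapi constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-key rights bits; same values as the Linux uapi constants. */
#define PKEY_DISABLE_ACCESS 0x1u
#define PKEY_DISABLE_WRITE  0x2u
#define PKRU_BITS_PER_PKEY  2
#define NUM_PKEYS           16 /* x86 PKRU holds 16 keys in 32 bits */

/* Bit mask covering both rights bits of one key (hypothetical helper). */
static uint32_t pkru_mask(int pkey)
{
    assert(pkey >= 0 && pkey < NUM_PKEYS);
    return (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)
           << (pkey * PKRU_BITS_PER_PKEY);
}

/* PKRU value with all access to pkey disabled. */
static uint32_t pkru_deny(uint32_t pkru, int pkey)
{
    return pkru | pkru_mask(pkey);
}

/* PKRU value with full read/write access to pkey. */
static uint32_t pkru_allow(uint32_t pkru, int pkey)
{
    return pkru & ~pkru_mask(pkey);
}

/* A thread can write through pkey only if neither bit is set. */
static bool pkru_can_write(uint32_t pkru, int pkey)
{
    return (pkru & pkru_mask(pkey)) == 0;
}
```

In this model the isolated thread would run with its key's bits cleared (pkru_allow), while every other thread would run with a value such as pkru_deny(pkru, key).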
* * *
Proposed changes:
This patch introduces a new flag, PKEY_ENFORCE_API, for pkey_alloc(). When a PKEY is created with this flag, any thread that wants to change a memory mapping protected by that PKEY (for example, via mprotect) must have write access to the PKEY. PKEYs created without this flag continue to work as they do now, for backwards compatibility.
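As a rough illustration of how user space might request such a key, the sketch below allocates a PKEY with the new flag and tags a fresh anonymous mapping with it. The numeric value of PKEY_ENFORCE_API used here is an assumption for illustration only (the real definition lives in the patched include/uapi/linux/mman.h), and map_with_enforced_pkey() is a hypothetical helper. Raw syscalls are used so the sketch does not depend on libc wrappers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* ASSUMPTION: placeholder value, not taken from the patch. */
#ifndef PKEY_ENFORCE_API
#define PKEY_ENFORCE_API (1UL << 0)
#endif

/*
 * Hypothetical helper: map len bytes and tag them with a newly
 * allocated PKEY that was requested with PKEY_ENFORCE_API.
 * Returns the pkey (> 0) and stores the mapping in *out on success,
 * or a negative errno value on failure.
 */
static int map_with_enforced_pkey(size_t len, void **out)
{
    long pkey = syscall(SYS_pkey_alloc, (unsigned long)PKEY_ENFORCE_API, 0UL);
    if (pkey < 0)
        return -errno;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        int err = errno;
        syscall(SYS_pkey_free, pkey);
        return -err;
    }

    /* Associate the mapping with the key. */
    if (syscall(SYS_pkey_mprotect, p, len,
                (unsigned long)(PROT_READ | PROT_WRITE), pkey) != 0) {
        int err = errno;
        munmap(p, len);
        syscall(SYS_pkey_free, pkey);
        return -err;
    }

    *out = p;
    return (int)pkey;
}
```

On a kernel without this patch set, the pkey_alloc call is expected to fail (the pkey_alloc man page documents EINVAL for unsupported flags), so the helper returns a negative value there. On a patched kernel, a later mprotect on the returned mapping from a thread without write access to the key would be the case the new flag is meant to reject.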
Only PKEYs created from user space can have the new flag set; PKEYs allocated internally by the kernel will not have it. In other words, ARCH_DEFAULT_PKEY(0) and execute_only_pkey won’t have this flag set and will continue to work as they do today.
This flag is checked only at syscall entry, such as mprotect/munmap in this set of patches. It does not apply to other call paths. In other words, if the kernel wants to change the attributes of a VMA for its own reasons, it is free to do so and is not affected by this new flag.
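The rule described above can be summarized as a small decision function. This is only a user-space model of the stated behavior, not the kernel implementation; the enum and function names are hypothetical and do not reflect the patch's actual types or signatures.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only. */
enum change_origin {
    FROM_KERNEL,  /* kernel-internal call path */
    FROM_SYSCALL, /* user request such as mprotect() or munmap() */
};

/*
 * Model of the check: may a change to a mapping proceed?
 *   key_enforces: the mapping's PKEY was allocated with PKEY_ENFORCE_API
 *   can_write:    the calling thread has write access to that PKEY
 */
static bool mapping_change_allowed(enum change_origin origin,
                                   bool key_enforces, bool can_write)
{
    if (origin != FROM_SYSCALL)
        return true; /* other call paths are not affected by the flag */
    if (!key_enforces)
        return true; /* keys without the flag, incl. kernel-allocated ones */
    return can_write; /* enforcing key: write access to the PKEY required */
}
```

The only combination that is rejected in this model is a syscall-originated change on an enforcing key by a thread without write access to that key.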
This patch set covers mprotect/munmap; I plan to work on other syscalls after this.
* * *
Testing:
I have tested this patch set on Linux kernels 5.15, 6.1, and 6.4-rc1. A new selftest is added in pkey_enforce_api.c.
* * *
Discussion:
We believe that this patch provides a valuable security feature. It allows us to create “safe/isolated” thread environments that are protected from attackers with arbitrary read/write access to the process space.
We believe that the interface change and the patch don't introduce backwards compatibility risk.
We would like to discuss this patch with the Linux kernel community for feedback and support.
* * *
Reference:
[1] https://lore.kernel.org/all/202208221331.71C50A6F@keescook/
[2] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXgea...
[3] https://docs.google.com/document/d/1qqVoVfRiF2nRylL3yjZyCQvzQaej1HRPh3f5wj1A...
* * *
Current status:
There are ongoing discussions related to the threat model and io_uring; we will continue them in the v0 thread.
* * *
PATCH history:
v1: update code to address review comments:
mprotect.c:
  remove syscall from do_mprotect_pkey()
  remove pr_warn_ratelimited
munmap.c:
  change syscall to enum caller_origin
  remove pr_warn_ratelimited
v0: https://lore.kernel.org/linux-mm/20230515130553.2311248-1-jeffxu@chromium.or...
Best Regards,
-Jeff Xu

Jeff Xu (6):
  PKEY: Introduce PKEY_ENFORCE_API flag
  PKEY: Add arch_check_pkey_enforce_api()
  PKEY: Apply PKEY_ENFORCE_API to mprotect
  PKEY:selftest pkey_enforce_api for mprotect
  PKEY: Apply PKEY_ENFORCE_API to munmap
  PKEY:selftest pkey_enforce_api for munmap

 arch/powerpc/include/asm/pkeys.h              |   19 +-
 arch/x86/include/asm/mmu.h                    |    7 +
 arch/x86/include/asm/pkeys.h                  |   92 +-
 arch/x86/mm/pkeys.c                           |    2 +-
 include/linux/mm.h                            |    8 +-
 include/linux/pkeys.h                         |   18 +-
 include/uapi/linux/mman.h                     |    5 +
 mm/mmap.c                                     |   31 +-
 mm/mprotect.c                                 |   17 +-
 mm/mremap.c                                   |    6 +-
 tools/testing/selftests/mm/Makefile           |    1 +
 tools/testing/selftests/mm/pkey_enforce_api.c | 1312 +++++++++++++++++
 12 files changed, 1499 insertions(+), 19 deletions(-)
 create mode 100644 tools/testing/selftests/mm/pkey_enforce_api.c

base-commit: ba0ad6ed89fd5dada3b7b65ef2b08e95d449d4ab