On 11/6/20 3:29 PM, ira.weiny@intel.com wrote:
From: Fenghua Yu <fenghua.yu@intel.com>
PKS allows kernel users to define domains of page mappings which have additional protections beyond the paging protections.
Add an API to allocate, use, and free a protection key which identifies such a domain. Export 5 new symbols pks_key_alloc(), pks_mknoaccess(), pks_mkread(), pks_mkrdwr(), and pks_key_free(). Add 2 new macros; PAGE_KERNEL_PKEY(key) and _PAGE_PKEY(pkey).
Update the protection key documentation to cover pkeys on supervisor pages.
Co-developed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
 Documentation/core-api/protection-keys.rst | 102 +++++++++++++---
 arch/x86/include/asm/pgtable_types.h       |  12 ++
 arch/x86/include/asm/pkeys.h               |  11 ++
 arch/x86/include/asm/pkeys_common.h        |   4 +
 arch/x86/mm/pkeys.c                        | 128 +++++++++++++++++++++
 include/linux/pgtable.h                    |   4 +
 include/linux/pkeys.h                      |  24 ++++
 7 files changed, 267 insertions(+), 18 deletions(-)
diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index ec575e72d0b2..c4e6c480562f 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -4,25 +4,33 @@ Memory Protection Keys
======================
-Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
-which is found on Intel's Skylake (and later) "Scalable Processor"
-Server CPUs. It will be available in future non-server Intel parts
-and future AMD processors.
-For anyone wishing to test or use this feature, it is available in
-Amazon's EC2 C5 instances and is known to work there using an Ubuntu
-17.04 image.
Memory Protection Keys provides a mechanism for enforcing page-based
provide
protections, but without requiring modification of the page tables
-when an application changes protection domains. It works by
-dedicating 4 previously ignored bits in each page table entry to a
-"protection key", giving 16 possible keys.
+when an application changes protection domains.
+PKeys Userspace (PKU) is a feature which is found on Intel's Skylake "Scalable
+Processor" Server CPUs and later. And It will be available in future
it
+non-server Intel parts and future AMD processors.
+Future Intel processors will support Protection Keys for Supervisor pages
+(PKS).
+For anyone wishing to test or use user space pkeys, it is available in Amazon's
+EC2 C5 instances and is known to work there using an Ubuntu 17.04 image.
+pkeys work by dedicating 4 previously Reserved bits in each page table entry to
+a "protection key", giving 16 possible keys. User and Supervisor pages are
+treated separately.
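To make the "4 bits, 16 keys" arithmetic concrete, here is a small user-space sketch of packing a key into a 64-bit PTE and reading it back. It assumes the x86-64 layout in which the key occupies PTE bits 62:59 (what the kernel calls _PAGE_BIT_PKEY_BIT0..3). pte_set_pkey() and pte_get_pkey() are hypothetical helpers for illustration only; they are not the _PAGE_PKEY()/PAGE_KERNEL_PKEY() macros this patch adds.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 layout: the 4-bit protection key lives in PTE bits 62:59. */
#define PKEY_SHIFT 59
#define PKEY_MASK  (UINT64_C(0xF) << PKEY_SHIFT)
#define NR_PKEYS   16 /* 2^4 values fit in a 4-bit field */

/* Hypothetical helper: return pte with its key field replaced by pkey. */
static uint64_t pte_set_pkey(uint64_t pte, int pkey)
{
	assert(pkey >= 0 && pkey < NR_PKEYS);
	return (pte & ~PKEY_MASK) | ((uint64_t)pkey << PKEY_SHIFT);
}

/* Hypothetical helper: extract the key field from a pte. */
static int pte_get_pkey(uint64_t pte)
{
	return (int)((pte & PKEY_MASK) >> PKEY_SHIFT);
}
```

The other PTE bits (NX in bit 63, present/RW/accessed/dirty in the low bits) pass through untouched, which is why a key can be added to an existing mapping without changing its ordinary protections.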
+Protections for each page are controlled with per CPU registers for each type
per-CPU
+of page User and Supervisor. Each of these 32 bit register stores two separate
32-bit registers
+bits (Access Disable and Write Disable) for each key.
-There is also a new user-accessible register (PKRU) with two separate
-bits (Access Disable and Write Disable) for each key. Being a CPU
-register, PKRU is inherently thread-local, potentially giving each
-thread a different set of protections from every other thread.
+For Userspace the register is user-accessible (rdpkru/wrpkru). For
+Supervisor, the register (MSR_IA32_PKRS) is accessible only to the kernel.
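A sketch of the "two bits per key" layout: in PKRU, key k's Access Disable bit sits at bit 2k and its Write Disable bit at bit 2k+1, so 16 keys exactly fill the 32-bit register. This sketch assumes PKRS uses the same positions, as the text implies. pkr_update() and enum pkey_state are hypothetical names; the function only computes a new register value and never executes rdpkru/wrpkru or touches MSR_IA32_PKRS.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: AD for key k at bit 2k, WD at bit 2k+1. */
#define PKR_AD_BIT(k) (UINT32_C(1) << (2 * (k)))
#define PKR_WD_BIT(k) (UINT32_C(1) << (2 * (k) + 1))

enum pkey_state { PKEY_NOACCESS, PKEY_READONLY, PKEY_READWRITE };

/* Return reg with key k's AD/WD pair rewritten for the requested state. */
static uint32_t pkr_update(uint32_t reg, int k, enum pkey_state st)
{
	assert(k >= 0 && k < 16);
	reg &= ~(PKR_AD_BIT(k) | PKR_WD_BIT(k)); /* clear both bits first */
	switch (st) {
	case PKEY_NOACCESS:
		reg |= PKR_AD_BIT(k); /* AD alone blocks reads and writes */
		break;
	case PKEY_READONLY:
		reg |= PKR_WD_BIT(k); /* reads allowed, writes blocked */
		break;
	case PKEY_READWRITE:
		break;                /* both bits clear: full access */
	}
	return reg;
}
```

Because only one key's pair is rewritten, a read-modify-write of this kind leaves every other key's protections intact.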
+Being a CPU register, pkeys are inherently thread-local, potentially giving
+each thread an independent set of protections from every other thread.
There are two new instructions (RDPKRU/WRPKRU) for reading and writing
to the new register. The feature is only available in 64-bit mode,
@@ -30,8 +38,11 @@ even though there is theoretically space in the PAE PTEs. These
permissions are enforced on data access only and have no effect on
instruction fetches.
-Syscalls
+For kernel space rdmsr/wrmsr are used to access the kernel MSRs.
+Syscalls for user space keys
+============================
There are 3 system calls which directly interact with pkeys::
@@ -98,3 +109,58 @@ with a read()::
The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.
+Kernel API for PKS support
+==========================
+The following interface is used to allocate, use, and free a pkey which defines
+a 'protection domain' within the kernel. Setting a pkey value in a supervisor
+mapping adds that mapping to the protection domain.
int pks_key_alloc(const char * const pkey_user, int flags);
#define PAGE_KERNEL_PKEY(pkey)
#define _PAGE_KEY(pkey)
void pks_mk_noaccess(int pkey);
void pks_mk_readonly(int pkey);
void pks_mk_readwrite(int pkey);
void pks_key_free(int pkey);
+pks_key_alloc() allocates keys dynamically to allow better use of the limited
+key space. 'flags' alter the allocation based on the users need. Currently
user's or maybe users'
+they can request an exclusive key.
+Callers of pks_key_alloc() _must_ be prepared for it to fail and take
+appropriate action. This is due mainly to the fact that PKS may not be
+available on all arch's. Failure to check the return of pks_key_alloc() and
+using any of the rest of the API is undefined.
+Kernel users must set the PTE permissions in the page table entries for the
+mappings they want to protect. This can be done with PAGE_KERNEL_PKEY() or
+_PAGE_KEY().
+The pks_mk*() family of calls allows kernel users the ability to change the
+protections for the domain identified by the pkey specified. 3 states are
+available pks_mk_noaccess(), pks_mk_readonly(), and pks_mk_readwrite() which
available:
+set the access to none, read, and read/write respectively.
+Finally, pks_key_free() allows a user to return the key to the allocator for
+use by others.
+The interface maintains pks_mk_noaccess() (Access Disabled (AD=1)) for all keys
+not currently allocated. Therefore, the user can depend on access being
+disabled when pks_key_alloc() returns a key and the user should remove mappings
+from the domain (remove the pkey from the PTE) prior to calling pks_key_free().
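A rough user-space model of the lifecycle described above: every unallocated key is held at AD=1, so a freshly allocated key starts with no access, and the key is forced back to AD=1 when it is freed. Everything prefixed sim_ is hypothetical. Reserving key 0 for ordinary mappings is an assumption of this sketch, the exclusive-key flag is not modeled, and the real implementation has to keep a per-CPU MSR in sync rather than one global variable.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PKEYS 16
#define PKR_AD_BIT(k) (UINT32_C(1) << (2 * (k)))
#define PKR_WD_BIT(k) (UINT32_C(1) << (2 * (k) + 1))

static uint32_t sim_pkrs;          /* stands in for MSR_IA32_PKRS */
static uint16_t sim_allocated = 1; /* assumption: key 0 is never handed out */

/* Replace key k's AD/WD pair with bits (already shifted for key k). */
static void sim_set(int k, uint32_t bits)
{
	sim_pkrs &= ~(PKR_AD_BIT(k) | PKR_WD_BIT(k));
	sim_pkrs |= bits;
}

/* Establish the invariant: every free key starts at AD=1. */
static void sim_init(void)
{
	for (int k = 1; k < NR_PKEYS; k++)
		sim_set(k, PKR_AD_BIT(k));
}

/* Returns a key in 1..15, or -1 once the key space is exhausted. */
static int sim_key_alloc(void)
{
	for (int k = 1; k < NR_PKEYS; k++) {
		if (!(sim_allocated & (1u << k))) {
			sim_allocated |= (uint16_t)(1u << k);
			return k; /* invariant: this key is already AD=1 */
		}
	}
	return -1;
}

static void sim_mk_readwrite(int k) { sim_set(k, 0); }
static void sim_mk_readonly(int k)  { sim_set(k, PKR_WD_BIT(k)); }
static void sim_mk_noaccess(int k)  { sim_set(k, PKR_AD_BIT(k)); }

/* Restore AD=1 before the key goes back to the pool. */
static void sim_key_free(int k)
{
	assert(k > 0 && k < NR_PKEYS && (sim_allocated & (1u << k)));
	sim_mk_noaccess(k);
	sim_allocated &= (uint16_t)~(1u << k);
}
```

A caller following the documented rules checks sim_key_alloc() for -1 before using any other call, mirroring the requirement that pks_key_alloc() failures be handled.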
+It should be noted that the underlying WRMSR(MSR_IA32_PKRS) is not serializing
+but still maintains ordering properties similar to WRPKRU. Thus it is safe to
+immediately use a mapping when the pks_mk*() functions returns.
return.
+The current SDM section on PKRS needs updating but should be the same as that
+of WRPKRU. So to quote from the WRPKRU text:
- WRPKRU will never execute transiently. Memory accesses
- affected by PKRU register will not execute (even transiently)
- until all prior executions of WRPKRU have completed execution
- and updated the PKRU register.