On 10/9/20 12:42 PM, ira.weiny@intel.com wrote:
From: Ira Weiny <ira.weiny@intel.com>
The PKRS MSR is defined as a per-logical-processor register. This isolates memory access by logical CPU. Unfortunately, the MSR is not managed by XSAVE. Therefore, tasks must save/restore the MSR value on context switch.
Define a saved PKRS value in the task struct, as well as a per-logical-processor cache which mirrors the MSR value currently loaded on that CPU. Initialize all tasks with the default MSR value. Then, on schedule in, compare the incoming task's saved value against the per-cpu cache and write the MSR only when they differ, avoiding the overhead of a redundant MSR write.
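(For reference, the scheduling hooks described above come down to something like the sketch below. The helper names match the diff later in this mail; INIT_PKRS_VALUE and the exact comments are assumptions based on the rest of the series, not a quote of the patch.)

DECLARE_PER_CPU(u32, pkrs_cache);

static inline void pks_init_task(struct task_struct *tsk)
{
        /* Assumed: new tasks start from the series' default value */
        tsk->thread.saved_pkrs = INIT_PKRS_VALUE;
}

static inline void pks_sched_in(void)
{
        /*
         * write_pkrs() compares against the per-cpu cache and skips
         * the WRMSR when the incoming task's value already matches.
         */
        write_pkrs(current->thread.saved_pkrs);
}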
It's probably nice to note how the WRMSR is special here, in addition to the comments below.
 #endif /*_ASM_X86_PKEYS_INTERNAL_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 97143d87994c..da2381136b2d 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -18,6 +18,7 @@ struct vm86;
 #include <asm/cpufeatures.h>
 #include <asm/page.h>
 #include <asm/pgtable_types.h>
+#include <asm/pkeys_common.h>
 #include <asm/percpu.h>
 #include <asm/msr.h>
 #include <asm/desc_defs.h>
@@ -542,6 +543,11 @@ struct thread_struct {
        unsigned int            sig_on_uaccess_err:1;

+#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
+       /* Saved Protection key register for supervisor mappings */
+       u32                     saved_pkrs;
+#endif
Could you take a look around thread_struct and see if there are some other MSRs near which you can stash this? This seems like a bit of a lonely place.
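(Purely illustrative: thread_struct already shadows the FSBASE/GSBASE MSRs, so one option along those lines would be grouping the new field with the existing MSR state rather than after sig_on_uaccess_err. The exact placement is an assumption, not a concrete proposal.)

struct thread_struct {
        /* ... existing fields ... */
#ifdef CONFIG_X86_64
        unsigned long           fsbase;
        unsigned long           gsbase;
#endif
#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
        /* Saved Protection key register for supervisor mappings */
        u32                     saved_pkrs;
#endif
        /* ... existing fields ... */
};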
...
 void flush_thread(void)
 {
        struct task_struct *tsk = current;

@@ -195,6 +212,8 @@ void flush_thread(void)
        memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));

        fpu__clear_all(&tsk->thread.fpu);
+
+       pks_init_task(tsk);
 }

 void disable_TSC(void)
@@ -644,6 +663,8 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
        if ((tifp ^ tifn) & _TIF_SLD)
                switch_to_sld(tifn);
+
+       pks_sched_in();
 }

 /*
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 3cf8f775f36d..30f65dd3d0c5 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -229,3 +229,31 @@ u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags)
        return pk_reg;
 }
+DEFINE_PER_CPU(u32, pkrs_cache);
+
+/**
+ * It should also be noted that the underlying WRMSR(MSR_IA32_PKRS) is not
+ * serializing but still maintains ordering properties similar to WRPKRU.
+ * The current SDM section on PKRS needs updating but should be the same as
+ * that of WRPKRU.  So to quote from the WRPKRU text:
+ *
+ *     WRPKRU will never execute transiently. Memory accesses
+ *     affected by PKRU register will not execute (even transiently)
+ *     until all prior executions of WRPKRU have completed execution
+ *     and updated the PKRU register.
+ */
+void write_pkrs(u32 new_pkrs)
+{
+       u32 *pkrs;
+
+       if (!static_cpu_has(X86_FEATURE_PKS))
+               return;
+
+       pkrs = get_cpu_ptr(&pkrs_cache);
+       if (*pkrs != new_pkrs) {
+               *pkrs = new_pkrs;
+               wrmsrl(MSR_IA32_PKRS, new_pkrs);
+       }
+       put_cpu_ptr(pkrs);
+}
It bugs me a *bit* that this is being called in a preempt-disabled region, but we still bother with the get/put_cpu jazz. Are there other future call-sites for this that aren't in preempt-disabled regions?
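If every caller really is in a preempt-disabled region (as the __switch_to_xtra() path is), the get/put pair could arguably collapse into plain this_cpu ops plus an assertion. A sketch of that alternative, assuming no preemptible call-sites materialize (the lockdep assertion is my suggestion, not part of the patch):

/*
 * Hypothetical variant: require callers to have preemption disabled,
 * so get/put_cpu_ptr() reduces to this_cpu_read()/this_cpu_write().
 */
void write_pkrs(u32 new_pkrs)
{
        if (!static_cpu_has(X86_FEATURE_PKS))
                return;

        /* Document and enforce the calling convention */
        lockdep_assert_preemption_disabled();

        if (this_cpu_read(pkrs_cache) != new_pkrs) {
                this_cpu_write(pkrs_cache, new_pkrs);
                wrmsrl(MSR_IA32_PKRS, new_pkrs);
        }
}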