On Thu, Dec 17 2020 at 15:50, Thomas Gleixner wrote:
On Fri, Nov 06 2020 at 15:29, ira weiny wrote:
+void write_pkrs(u32 new_pkrs) +{
- u32 *pkrs;
- if (!static_cpu_has(X86_FEATURE_PKS))
return;
- pkrs = get_cpu_ptr(&pkrs_cache);
So this is called from various places including schedule and also from the low level entry/exit code. Why do we need to have an extra preempt_disable/enable() there via get/put_cpu_ptr()?
Just because performance in those code paths does not matter?
- if (*pkrs != new_pkrs) {
*pkrs = new_pkrs;
wrmsrl(MSR_IA32_PKRS, new_pkrs);
- }
- put_cpu_ptr(pkrs);
Which made me look at the other branch of your git repo just because I wanted to know about the 'other' storage requirements and I found this gem:
update_global_pkrs() ... /* * If we are preventing access from the old value. Force the * update on all running threads. */ if (((old_val == 0) && protection) || ((old_val & PKR_WD_BIT) && (protection & PKEY_DISABLE_ACCESS))) { int cpu;
for_each_online_cpu(cpu) { u32 *ptr = per_cpu_ptr(&pkrs_cache, cpu); *ptr = update_pkey_val(*ptr, pkey, protection); wrmsrl_on_cpu(cpu, MSR_IA32_PKRS, *ptr); put_cpu_ptr(ptr);
1) per_cpu_ptr() -> put_cpu_ptr() is broken as per_cpu_ptr() is not disabling preemption while put_cpu_ptr() enables it which wreckages the preemption count.
How was that ever tested at all with any debug option enabled?
Answer: Not at all
2) How is that sequence:
ptr = per_cpu_ptr(&pkrs_cache, cpu); *ptr = update_pkey_val(*ptr, pkey, protection); wrmsrl_on_cpu(cpu, MSR_IA32_PKRS, *ptr);
supposed to be correct vs. a concurrent modification of the pkrs_cache of the remote CPU?
Answer: Not at all
Also doing a wrmsrl_on_cpu() on _each_ online CPU is insane at best.
A smp function call on a remote CPU takes ~3-5us when the remote CPU is not idle and can immediately respond. If the remote CPU is deep in idle it can take up to 100us depending on C-State it is in.
Even if the remote CPU is not not idle and just has interrupts disabled for a few dozen of microseconds this adds up.
So on a 256 CPU system depending on the state of the remote CPUs this stalls the CPU doing the update for anything between 1 and 25ms worst case.
Of course that also violates _all_ CPU isolation mechanisms.
What for?
Just for the theoretical chance that _all_ remote CPUs have seen that global permission and have it still active?
You're not serious about that, right?
The only use case for this in your tree is: kmap() and the possible usage of that mapping outside of the thread context which sets it up.
The only hint for doing this at all is:
Some users, such as kmap(), sometimes requires PKS to be global.
'sometime requires' is really _not_ a technical explanation.
Where is the explanation why kmap() usage 'sometimes' requires this global trainwreck in the first place and where is the analysis why this can't be solved differently?
Detailed use case analysis please.
Thanks,
tglx