On Thu, Dec 17 2020 at 23:43, Thomas Gleixner wrote:
> The only use case for this in your tree is: kmap() and the possible usage of that mapping outside of the thread context which sets it up.
>
> The only hint for doing this at all is:
>
>     Some users, such as kmap(), sometimes requires PKS to be global.
>
> 'sometime requires' is really _not_ a technical explanation.
>
> Where is the explanation why kmap() usage 'sometimes' requires this global trainwreck in the first place and where is the analysis why this can't be solved differently?
>
> Detailed use case analysis please.
A lengthy conversation with Dan and Dave over IRC confirmed what I was suspecting.
The approach of this whole PKS thing is to make _all_ existing code magically "work". That means aside from the obvious thread local mappings, the kmap() part is needed to solve the problem of async handling where the mapping is handed to some other context which then uses it and notifies the context which created the mapping when done. That's the principle which was used to make highmem work a long time ago.
IMO that was a mistake back then. The right thing would have been to change the code so that it does not rely on a temporary mapping created by the initiator. Instead let the initiator hand the page over to the other context which then creates a temporary mapping for fiddling with it. Water under the bridge...
Glueing PKS on to that kmap() thing is horrible and global PKS is pretty much the opposite of what PKS wants to achieve. It's disabling protection systemwide for an unspecified amount of time and for all contexts.
So instead of trying to make global PKS "work" we really should go and take a smarter approach.
1) Many kmap() use cases are strictly thread local and the mapped address is never handed to some other context, which means this can be replaced with kmap_local() now, which preserves the mapping across preemption. PKS just works nicely on top of that.
2) Modify kmap() so that it marks the to-be-mapped page as 'globally unprotected' instead of doing this global unprotect PKS dance. kunmap() undoes that. That obviously needs some thought vs. refcounting if there are concurrent users, but that's a solvable problem either as part of struct page itself or stored in some global hash.
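The refcounting in 2) could look roughly like the userspace model below. It's a sketch under assumptions, not a proposal for the real thing: struct page_model, page_global_unprotect_get() and page_global_unprotect_put() are made-up names, and a plain C11 atomic counter stands in for whatever would end up in struct page or in a global hash. page_is_globaly_unprotected() is the query which the #PF sketch in 4) relies on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-page state; not the real struct page. */
struct page_model {
	atomic_int global_unprotect_count;	/* number of active kmap() users */
};

/* kmap() side: one more user wants the page globally unprotected. */
static void page_global_unprotect_get(struct page_model *page)
{
	atomic_fetch_add(&page->global_unprotect_count, 1);
}

/* kunmap() side: drop one user. Returns true when the last user is gone. */
static bool page_global_unprotect_put(struct page_model *page)
{
	int old = atomic_fetch_sub(&page->global_unprotect_count, 1);

	assert(old > 0);	/* catches an unbalanced kunmap() */
	return old == 1;
}

/* What the #PF path would check before granting a temporary unprotect. */
static bool page_is_globaly_unprotected(struct page_model *page)
{
	return atomic_load(&page->global_unprotect_count) > 0;
}
```

With two concurrent kmap() users the page stays marked until the second kunmap(), which is the whole point of counting instead of using a flag.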
3) Have PKS modes:

   - STRICT:  No pardon
   - RELAXED: Warn and unprotect temporarily for the current context
   - SILENT:  Like RELAXED, but w/o warning to make sysadmins happy
   - OFF:     Disable the whole PKS thing

   Default should be RELAXED.
4) Have a smart #PF mechanism which does:
	if (error_code & X86_PF_PK) {
		page = virt_to_page(address);

		if (!page || !page_is_globaly_unprotected(page))
			goto die;

		if (pks_mode == PKS_MODE_STRICT)
			goto die;

		WARN_ONCE(pks_mode == PKS_MODE_RELAXED, "Useful info ...");

		temporary_unprotect(page, regs);
		return;
	}

	temporary_unprotect(page, regs)
	{
		key = page_to_key(page);

		/* Return from #PF will establish this for the faulting context */
		extended_state(regs)->pks &= ~PKS_MASK(key);
	}
This temporary unprotect is undone when the context is left, so depending on the context (thread, interrupt, softirq) the unprotected section might be way wider than actually needed, but that's still orders of magnitude better than having this fully unrestricted global PKS mode which is completely scopeless.
The above is at least restricted to the pages which are in use for a particular operation. Stray pointers during that time are obviously not caught, but that's not any different from that proposed global thingy.
The warning makes it possible to find the non-obvious places so they can be analyzed and worked on.
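To make the mode handling of 3) and the mask arithmetic of temporary_unprotect() checkable outside the kernel, here is a small userspace model. Assumptions: the enum names beyond PKS_MODE_STRICT and PKS_MODE_RELAXED, pks_fault_decide() and pks_temporary_unprotect() are invented for illustration, and PKS_MASK() is taken to cover the two bits per key (access disable, write disable) which the protection key register uses for each of the 16 keys.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum pks_mode {
	PKS_MODE_STRICT,
	PKS_MODE_RELAXED,
	PKS_MODE_SILENT,
	PKS_MODE_OFF,		/* PKS disabled, so no PK fault is expected at all */
};

enum pks_fault_action {
	PKS_FAULT_DIE,
	PKS_FAULT_UNPROTECT_WARN,
	PKS_FAULT_UNPROTECT,
};

/* Two bits per key: bit 2*key is access disable, bit 2*key+1 write disable. */
#define PKS_NR_KEYS	16
#define PKS_MASK(key)	(UINT32_C(3) << ((key) * 2))

/* Decision part of the #PF sketch, expressed as a pure function. */
static enum pks_fault_action pks_fault_decide(enum pks_mode mode,
					      bool page_globally_unprotected)
{
	if (!page_globally_unprotected)
		return PKS_FAULT_DIE;

	if (mode == PKS_MODE_STRICT)
		return PKS_FAULT_DIE;

	/* Only RELAXED warns; SILENT unprotects quietly. */
	return mode == PKS_MODE_RELAXED ? PKS_FAULT_UNPROTECT_WARN
					: PKS_FAULT_UNPROTECT;
}

/* New saved PKS value for the faulting context: drop both bits of @key. */
static uint32_t pks_temporary_unprotect(uint32_t saved_pks, unsigned int key)
{
	assert(key < PKS_NR_KEYS);
	return saved_pks & ~PKS_MASK(key);
}
```

Only the saved value of the faulting context changes here. Every other context keeps its own value, which is what limits the scope compared to the global mode.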
5) The DAX case, which you made "work" with dev_access_enable() and dev_access_disable(), is yet another lazy approach to avoid changing a handful of usage sites.
The use cases are strictly context local which means the global magic is not used at all. Why does it exist in the first place?
Aside from that, this global thing would never work at all because the refcounting is per thread and not global.
So that DAX use case is just a matter of:
	grant/revoke_access(DEV_PKS_KEY, READ/WRITE)
which is effective for the current execution context and really wants to be a distinct READ/WRITE protection and not the magic global thing which just has on/off. All usage sites know whether they want to read or write.
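Roughly what I have in mind, as a userspace model on a plain 32-bit value. Everything here is an assumption for illustration: the value 1 for DEV_PKS_KEY, the enum, the exact signatures and the choice to pass the PKS value in and out instead of touching per-context state. The bit layout is the usual two bits per key, access disable and write disable.

```c
#include <assert.h>
#include <stdint.h>

#define PKS_NR_KEYS	16
#define DEV_PKS_KEY	1	/* hypothetical key number */

/* Bit 2*key disables all access, bit 2*key+1 disables writes only. */
#define PKS_AD_BIT(key)	(UINT32_C(1) << ((key) * 2))
#define PKS_WD_BIT(key)	(UINT32_C(1) << ((key) * 2 + 1))

enum pks_access { PKS_READ, PKS_WRITE };

/* Returns the PKS value to use for the current execution context. */
static uint32_t grant_access(uint32_t pks, unsigned int key,
			     enum pks_access access)
{
	assert(key < PKS_NR_KEYS);

	pks &= ~PKS_AD_BIT(key);		/* reads and writes both need this */
	if (access == PKS_WRITE)
		pks &= ~PKS_WD_BIT(key);	/* writes additionally need this */
	return pks;
}

/* Revoking WRITE keeps read access, revoking READ shuts the key completely. */
static uint32_t revoke_access(uint32_t pks, unsigned int key,
			      enum pks_access access)
{
	assert(key < PKS_NR_KEYS);

	pks |= PKS_WD_BIT(key);
	if (access == PKS_READ)
		pks |= PKS_AD_BIT(key);
	return pks;
}
```

A read-only usage site would then do grant_access(pks, DEV_PKS_KEY, PKS_READ) and a stray write through that mapping still faults, which the on/off variant cannot provide.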
That leaves the question about the refcount. AFAICT, nothing nests in that use case for a given execution context. I'm surely missing something subtle here.
Hmm?
Thanks,
tglx