On Fri, Dec 18, 2020 at 02:57:51PM +0100, Thomas Gleixner wrote:
On Thu, Dec 17 2020 at 23:43, Thomas Gleixner wrote:
The only use case for this in your tree is: kmap() and the possible usage of that mapping outside of the thread context which sets it up.
The only hint for doing this at all is:
Some users, such as kmap(), sometimes requires PKS to be global.
'sometime requires' is really _not_ a technical explanation.
Where is the explanation why kmap() usage 'sometimes' requires this global trainwreck in the first place and where is the analysis why this can't be solved differently?
Detailed use case analysis please.
A lengthy conversation with Dan and Dave over IRC confirmed what I was suspecting.
The approach of this whole PKS thing is to make _all_ existing code magically "work". That means aside of the obvious thread local mappings, the kmap() part is needed to solve the problem of async handling where the mapping is handed to some other context which then uses it and notifies the context which created the mapping when done. That's the principle which was used to make highmem work long time ago.
IMO that was a mistake back then. The right thing would have been to change the code so that it does not rely on a temporary mapping created by the initiator. Instead let the initiator hand the page over to the other context which then creates a temporary mapping for fiddling with it. Water under the bridge...
But maybe not. We are getting rid of a lot of the kmaps and once the bulk are gone perhaps we can change this and remove kmap completely?
Glueing PKS on to that kmap() thing is horrible and global PKS is pretty much the opposite of what PKS wants to achieve. It's disabling protection systemwide for an unspecified amount of time and for all contexts.
I agree. This is why I have been working on converting kmap() call sites to kmap_local_page().[1]
So instead of trying to make global PKS "work" we really should go and take a smarter approach.
- Many kmap() use cases are strictly thread local and the mapped address is never handed to some other context, which means this can be replaced with kmap_local() now, which preserves the mapping accross preemption. PKS just works nicely on top of that.
Yes hence the massive kmap->kmap_thread patch set which is now becoming kmap_local_page().[2]
- Modify kmap() so that it marks the to be mapped page as 'globaly unprotected' instead of doing this global unprotect PKS dance. kunmap() undoes that. That obviously needs some thought vs. refcounting if there are concurrent users, but that's a solvable problem either as part of struct page itself or stored in some global hash.
How would this globally unprotected flag work? I suppose if kmap created a new PTE we could make that PTE non-PKS protected then we don't have to fiddle with the register... I think I like that idea.
Have PKS modes:
- STRICT: No pardon
- RELAXED: Warn and unprotect temporary for the current context - SILENT: Like RELAXED, but w/o warning to make sysadmins happy. Default should be RELAXED. - OFF: Disable the whole PKS thing
I'm not really sure how this solves the global problem but it is probably worth having in general.
Have a smart #PF mechanism which does:
if (error_code & X86_PF_PK) { page = virt_to_page(address);
if (!page || !page_is_globaly_unprotected(page)) goto die; if (pks_mode == PKS_MODE_STRICT) goto die; WARN_ONCE(pks_mode == PKS_MODE_RELAXED, "Useful info ..."); temporary_unprotect(page, regs); return;
}
I feel like this is very similar to what I had in the global patch you found in my git tree with the exception of the RELAXED mode. I simply had globally unprotected or die.
global_pkey_is_enabled() handles the page_is_globaly_unprotected() and temporary_unprotect().[3]
Anyway, I'm sorry (but not sorry) that you found it. I've been trying to get 0-day and other testing on it and my public tree was the easiest way to do that. Anyway...
The patch as a whole needs work. You are 100% correct that if a mapping is handed to another context it is going to suck performance wise. It has had some internal review but not much.
Regardless I think unprotecting a global context is the easy part. The code you had a problem with (and I see is fully broken) was the restriction of access. A failure to update in that direction would only result in a wider window of access. I contemplated not doing a global update at all and just leave the access open until the next context switch. But the code as it stands tries to force an update for a couple of reasons:
1) kmap_local_page() removes most of the need for global pks. So I was thinking that global PKS could be a slow path.
2) kmap()'s that are handed to other contexts they are likely to be 'long term' and should not need to be updated 'too' often. I will admit that I don't know how often 'too often' is.
But IMO these questions are best left to after the kmaps are converted. Thus this patch set was just basic support. Other uses cases beyond pmem such as trusted keys or secret mem don't need a global pks feature and could build on the patch set submitted. I was trying to break the problem down.
temporary_unprotect(page, regs) { key = page_to_key(page);
/* Return from #PF will establish this for the faulting context */ extended_state(regs)->pks &= ~PKS_MASK(key); }
This temporary unprotect is undone when the context is left, so depending on the context (thread, interrupt, softirq) the unprotected section might be way wider than actually needed, but that's still orders of magnitudes better than having this fully unrestricted global PKS mode which is completely scopeless.
I'm not sure I follow you. How would we know when the context is left?
The above is at least restricted to the pages which are in use for a particular operation. Stray pointers during that time are obviously not caught, but that's not any different from that proposed global thingy. The warning allows to find the non-obvious places so they can be analyzed and worked on.
I could add the warning for sure.
The DAX case which you made "work" with dev_access_enable() and dev_access_disable(), i.e. with yet another lazy approach of avoiding to change a handful of usage sites.
The use cases are strictly context local which means the global magic is not used at all. Why does it exist in the first place?
I'm not following. What is 'it'?
Aside of that this global thing would never work at all because the refcounting is per thread and not global. So that DAX use case is just a matter of: grant/revoke_access(DEV_PKS_KEY, READ/WRITE) which is effective for the current execution context and really wants to be a distinct READ/WRITE protection and not the magic global thing which just has on/off. All usage sites know whether they want to read or write.
That leaves the question about the refcount. AFAICT, nothing nests in that use case for a given execution context. I'm surely missing something subtle here.
The refcount is needed for non-global pks as well as global. I've not resolved if anything needs to be done with the refcount on the global update since the following is legal.
kmap() kmap_local_page() kunmap() kunmap_local()
Which would be a problem. But I don't think it is ever actually done.
Another problem would be if the kmap and kunmap happened in different contexts... :-/ I don't think that is done either but I don't know for certain.
Frankly, my main focus before any of this global support has been to get rid of as many kmaps as possible.[1] Once that is done I think more of these questions can be answered better.
Ira
[1] https://lore.kernel.org/lkml/20201210171834.2472353-1-ira.weiny@intel.com/ [2] https://lore.kernel.org/lkml/20201009195033.3208459-1-ira.weiny@intel.com/ [3] Latest untested patch pushed for reference here because I can't find exactly the branch you found. https://github.com/weiny2/linux-kernel/commit/37439e91e141be58c13ccc4462f778...
Hmm?
Thanks,
tglx