On Thu, May 2, 2019 at 11:18 AM Peter Zijlstra <peterz@infradead.org> wrote:
> We could fix this by not using the common exit path on int3; not sure we want to go there, but that is an option.
I don't think it's an option in general, because *some* int3 invocations will need all the usual error return.
But I guess we could make "int3 from kernel space" special.
I'm not sure how much that would help, but it might be worth looking into.
> ARGH; I knew it was too pretty :/ Yes, something like what you suggest will be needed, I'll go look at that once my brain recovers a bit from staring at entry code all day.
Looks like it works based on your other email.
What would it look like with the "int3-from-kernel is special" modification?
Because *if* we can make the "kernel int3" entirely special, that would make the "Eww factor" of this whole thing much smaller.
I forget: is #BP _only_ for the "int3" instruction? I know we have really nasty cases with #DB (int1) because of "pending exceptions happen on the first instruction in kernel space", and that makes it really really nasty to handle with all the stack switch and %cr3 handling etc.
But if "int3 from kernel space" _only_ happens on actual "int3" instructions, then we really could just special-case that case. We'd know that %cr3 has been switched, we'd know that we don't need to do fsgs switching, we'd know we already have a good stack and percpu data etc set up.
So then special casing #BP would actually allow us to have a simple and straightforward kernel-int3-only sequence?
And then having that odd stack setup special case would be *much* more palatable to me.
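To make the idea concrete, here is a rough user-space sketch of the decision I mean, not actual entry code. It assumes the only input is the saved CS selector, whose low two bits give the privilege level the trap came from. The names (struct fake_regs, came_from_kernel, choose_bp_path, the enum values) are all made up for illustration, and the real thing would live in the entry assembly rather than in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down register frame: only what this sketch needs. */
struct fake_regs {
	uint64_t ip;	/* saved instruction pointer */
	uint64_t cs;	/* saved code segment selector */
};

/* Hypothetical labels for the two ways a #BP could be handled. */
enum bp_path {
	BP_PATH_KERNEL_ONLY,	/* int3 hit in kernel space: %cr3, gs, stack already fine */
	BP_PATH_COMMON,		/* anything else: the usual entry and exit work */
};

/* The low two bits of the saved CS selector are the privilege level; 0 is ring 0. */
static bool came_from_kernel(const struct fake_regs *regs)
{
	return (regs->cs & 3) == 0;
}

/* Assumption: #BP from kernel space only ever comes from a real int3 instruction. */
static enum bp_path choose_bp_path(const struct fake_regs *regs)
{
	return came_from_kernel(regs) ? BP_PATH_KERNEL_ONLY : BP_PATH_COMMON;
}
```

With a kernel-style selector such as 0x10 this picks the kernel-only path, and with a user-style selector such as 0x33 it picks the common path. The whole scheme only holds if the "only real int3" assumption is true, which is exactly the question above.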
Linus