On Fri, May 3, 2019 at 10:08 PM Linus Torvalds torvalds@linux-foundation.org wrote:
I'll look at it tomorrow, but I think this actually makes unnecessary changes.
In particular, I think we could keep the existing entry code almost unchanged with this whole approach.
So here's what I *think* should work. Note that I also removed your test-case code, because it really didn't have a chance in hell of working. Doing that
int3_emulate_call(regs, (unsigned long)&int3_magic);
inside of int3_exception_notify() could not possibly be valid, since int3_emulate_call() returns the new pt_regs that need to be used, and throwing it away is clearly wrong.
So you can't use a register_die_notifier() to try to intercept the 'int3' error and then do it manually, it needs to be done by the ftrace_int3_handler() code that actually returns the new regs, and where do_kernel_int3() will then return it to the low-level handler.
End result: I haven't actually tested this code, but I've looked through the patch something like ten times without finding any new errors.
I've also tried *very* hard to make the patch minimal, with the exception of the comments, which I tried to make extensive for any of the subtle cases.
But without testing, it's probably still buggy.
I have to say, I finally like the end result here. Maybe it's because I got to make my mark and pee in the snow, but I will say that
(a) the actual entry code modifications really are minimal now
(b) the instruction emulation really is very simple and straightforward
(c) yes, we play some stack tricks (and yes, we play them differently on x86-64 and x86-32), but the tricks are again at least straightforward, and we never really change the layout of any stack.
So on the whole, I think this is about as good as it gets. Did I get all the details actually right, and it _works_? I guess we'll see.
Linus
On May 4, 2019, at 11:59 AM, Linus Torvalds torvalds@linux-foundation.org wrote:
On Fri, May 3, 2019 at 10:08 PM Linus Torvalds torvalds@linux-foundation.org wrote:
I'll look at it tomorrow, but I think this actually makes unnecessary changes.
In particular, I think we could keep the existing entry code almost unchanged with this whole approach.
So here's what I *think* should work. Note that I also removed your test-case code, because it really didn't have a chance in hell of working. Doing that
int3_emulate_call(regs, (unsigned long)&int3_magic);
inside of int3_exception_notify() could not possibly be valid, since int3_emulate_call() returns the new pt_regs that need to be used, and throwing it away is clearly wrong.
So you can't use a register_die_notifier() to try to intercept the 'int3' error and then do it manually, it needs to be done by the ftrace_int3_handler() code that actually returns the new regs, and where do_kernel_int3() will then return it to the low-level handler.
I hate register_die_notifier(), so I consider this a plus. I’ve occasionally considered removing the ability for the notifiers to skip normal processing, because, as it stands, figuring out what actually happens in the trap handlers is almost impossible.
It generally looks sane to me.
As an aside, is it even *possible* to get #BP from v8086 mode? On a quick SDM read, the INT3 instruction causes #GP if VM=1 and IOPL<3. And, if we allow vm86() to have IOPL=3, we should just remove that ability. It’s nuts.
(We should maybe consider a config option for iopl() that defaults off. We’ve supported ioperm() for a long, long time.)
On Sat, May 4, 2019 at 1:12 PM Andy Lutomirski luto@amacapital.net wrote:
As an aside, is it even *possible* to get #BP from v8086 mode? On a quick SDM read, the INT3 instruction causes #GP if VM=1 and IOPL<3. And, if we allow vm86() to have IOPL=3, we should just remove that ability. It’s nuts.
We've definitely historically allowed IOPL=3 with the whole "iopl()" system call. And yes, afaik it works together with the vm86 system call too. I think we copy the unsafe bits from the original eflags, so if you do iopl(3) followed by vm86(), you will be running in vm86 mode with iopl 3.
(We should maybe consider a config option for iopl() that defaults off. We’ve supported ioperm() for a long, long time.)
It's entirely possible that nobody uses iopl() and we should make it a config option that defaults to off.
But we've already done that with the VM86 support entirely, and I'm not sure modern distros even enable it.
And obviously vm86 mode isn't available at all with a 64-bit kernel, so this is all slowly becoming more or less moot.
Linus
On Sat, May 4, 2019 at 1:12 PM Andy Lutomirski luto@amacapital.net wrote:
As an aside, is it even *possible* to get #BP from v8086 mode? On a quick SDM read, the INT3 instruction causes #GP if VM=1 and IOPL<3. And, if we allow vm86() to have IOPL=3, we should just remove that ability. It’s nuts.
Oh, and I think you mis-read the SDM.
Yes, iopl matters for "int X" (cd xx) instruction in vm86 mode.
But no, iopl does *not* matter for the special "int3/into/int1" (cc/ce/f1) instructions, I think.
Linus
linux-kselftest-mirror@lists.linaro.org