From: Steven Rostedt
Sent: 07 May 2019 15:57
On Tue, 7 May 2019 14:50:26 +0000 David Laight <David.Laight@ACULAB.COM> wrote:
From: Steven Rostedt
Sent: 07 May 2019 14:14
On Tue, 7 May 2019 12:57:15 +0000 David Laight <David.Laight@ACULAB.COM> wrote:
The 'user' (i.e. the kernel code that needs to emulate the call) doesn't write the data to the stack, just to some per-cpu location. (Actually it could be on the stack at the other end of pt_regs.) So you get to the 'register restore and iret' code with the stack unaltered. It is then a SMOP to replace the %flags saved by the int3 with the %ip saved by the int3, the %ip with the address of the function to call, restore the flags (push and popf) and issue a ret.f to remove the %ip and %cs.
How would you handle NMIs doing the same thing? Yes, the NMI handlers have breakpoints that will need to emulate calls as well.
That means you'd have to use a field in the on-stack pt_regs for the 'address to call' rather than a per-cpu variable. Then it would all nest.
...
Actually that means you can do the following in both modes:
	if not emulated_call_address then
		pop %ax; iret
	else
		# assume kernel<->kernel return
		push emulated_call_address;
		push flags_saved_by_int3
		load %ax, return_address_from_iret
		add %ax,#4
		store %ax, first_stack_location_written_by_int3
		load %ax, value_saved_by_int3_entry
		popf
		ret.n
The ret.n discards everything from the %ax to the required return address. So 'n' is the size of the int3 frame, so 12 for i386 and 40 for amd64.
If the register restore (done just before this code) finished with 'add %sp, sizeof *pt_regs' then the emulated_call_address can be loaded in %ax from the other end of pt_regs.
...
This all sounds much more complex and fragile than the proposed solution. Why would we do this over what is being proposed?
It is all complex and fragile however you do it.
The problem I see with converting the 3 register trap frame to a 5 register one is that the entry code and exit code both have to know whether the conversion is necessary or has already been done.
AFAICT it is actually quite hard to tell from the stack which form it is. Although the %sp value might tell you: %ss:%sp might only be pushed when a stack switch happens, so the kernel %sp will then be a known value (for the switched-to stack).
The advantage of converting the frame is that, as pointed out earlier, it does let you have a pt_regs that always contains %ss:%sp.
David