On Mon, 6 May 2019 13:42:12 -0700 Linus Torvalds torvalds@linux-foundation.org wrote:
On Mon, May 6, 2019 at 1:29 PM Steven Rostedt rostedt@goodmis.org wrote:
Because that call to ftrace_stub is also dynamic.
You're missing the point.
We are rewriting a single "cal" instruction to point to something.
The "int3" emulation should call THE SAME THING.
Right now it doesn't.
To do that we would need to rewrite the logic to update each of those 40,000 calls one at a time, or group them together to what gets changed. As each function can call a different trampoline. One of the custom trampolines associated with a ftrace_ops, or the list one when a function has more than one ftrace_ops attached to it.
The ftrace code does this in batches:
1) Add int3 to all functions being affected (could be 40,000 of them) (sync the cores)
(Now all those functions are going through the int3 handler)
2) Modify the address of those call sites (sync the cores)
(Still going through the int3 handlers)
3) Remove the int3
Now each of those functions are calling something, and they may not all be the same thing they are calling.
Part of the code will change it to call the function needed directly.
struct ftrace_ops my_ops { .func = my_handler };
register_ftrace_function(&my_ops);
Will change "call ftrace_stub" into "call my_handler"
But that's not what you're actually *doing*.
Instead, you're now _emulating_ calling ftrace_regs_caller, which will call that ftrace_stub, which in turn will try to update the call site.
The call site gets updated before the loop gets called (at least it should, I have to go and look at the code, but I'm 99% that it does). There should not be a breakpoint on the call to ftrace_stub when we add breakpoints to the functions that call the ftrace_regs_caller.
But that's insane. It's insane because
- it's not even what your call rewriting is doing Why aren't you just
doing the emulation using the *SAME* target that you're rewriting the actual call instruction with?
- even if ftrace_regs_caller ends up being that same function, you'd
be better off just passing the "struct pt_regs" that you *ALREADY HAVE* directly to ftrace_stub in the int3 handler, rather than create *another* pt_regs stack
Are you saying to call the ftrace_ops handlers from int3 directly?
See? In that second case, why don't you just use "int3_emulate_call()" to do the reguired 'struct pt_regs' updates, and then call ftrace_stub() *with* that fixed-up pt_regs thing?
In other words, I think you should always do "int3_emulate_call()" with the *exact* same address that the instruction you are rewriting is using. There's no "three different cases". The only possible cases are "am I rewriting a jump" or "am I rewriting a call".
There is no "am I rewriting a call to one address, and then emulating it with a call to another address" case that makes sense.
What *can* make sense is "Oh, I'm emulating a call, but I know that call will be rewritten, so let me emulate the call and then short-circuit the emulation immediately".
But that is not what the ftrace code is doing. The ftrace code is doing something odd and insane.
And no, your "explanation" makes no sense. Because it doesn't actually touch on the fundamental insanity.
Also, talking with Peter about the ftrace_32.S version of ftrace_regs_caller, makes Peter's patch sound even better.
The "ftrace_regs_caller" was created to allow kprobes to use the function tracer when a probe was added to the start of a function (which is a common occurrence). That is, kprobes uses int3 to inject a handler pretty much anywhere in the code. kprobes can even be used for dynamic trace events.
Now we found that if we use function tracing, it's faster to do the call then to take the int3 hit. So kprobes knows of ftrace, and will register a callback if it happens to be placed on a fentry call site.
Thus, Masami asked me to create a way to have ftrace be able to simulate an int3. As kprobe handlers can do pretty much anything (bpf uses them), I had to make that call from ftrace look like a real int3 just happened.
As ftrace_caller, is optimized for fast function tracing, I needed to make another trampoline for the slower "emulate int3" operation, and that was the birth of ftrace_regs_caller. For x86_64, it was really straight forward and I had that done rather quickly. For i386, it was much more difficult, and that was because of not having regs->sp on the stack. I had to play this game to be able to pass in a pt_regs that would be the same regardless if it was called by ftrace or an int3.
The problem was the call to ftrace_regs_caller would place the return code on the stack, but I had to move it, because the kprobes handlers, expected ®s->sp to point to the location of the stack just before the call was hit! This means, regs->flags was were the return code was.
When we enter ftrace_regs_caller from the function being traced, the top of the stack has the return code. But then we needed to do this:
pushl $__KERNEL_CS pushl 4(%esp) /* Save the return ip */ pushl $0 /* Load 0 into orig_ax */ pushl %gs pushl %fs pushl %es pushl %ds pushl %eax
The above push regs->cs, regs->ip (the return code), then regs->gs... to regs->ax, where now I finally have saved a scratch register to use.
/* Get flags and place them into the return ip slot */ pushf popl %eax movl %eax, 8*4(%esp)
I would then save flags into %eax and move it to where the return address was placed by the call to this trampoline.
At the end, I had to undo this song and dance as well:
/* restore flags */ push 14*4(%esp) popf
/* Move return ip back to its original location */ movl 12*4(%esp), %eax movl %eax, 14*4(%esp)
If we go with Peter's patch, I can make this code much more sane, and not have to worry about having ®s->sp be at the top of the stack. I could simply, just push everything in the order of pt_regs and call the handler.
-- Steve