Interesting, because I sped up the ftrace ring buffer by a substantial amount by adding strategic __always_inline, noinline, likely() and unlikely() annotations throughout the code. What mattered was which code was on the fast path and which was on the slow path, not the size of the function. gcc got it horribly wrong.
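For readers unfamiliar with these annotations, here is a minimal userspace sketch of the general pattern. It is not the ftrace ring buffer code. The struct, the function names and RING_SIZE are hypothetical, and the macros are local stand-ins that assume the usual gcc builtins and attributes behind the kernel's versions.

```c
#include <stddef.h>

/* Local stand-ins for the kernel macros (assumed gcc builtins/attributes). */
#define likely(x)        __builtin_expect(!!(x), 1)
#define unlikely(x)      __builtin_expect(!!(x), 0)
#define my_always_inline inline __attribute__((always_inline))
#define my_noinline      __attribute__((noinline))

#define RING_SIZE 8 /* hypothetical capacity, chosen for illustration */

struct ring {
    unsigned long buf[RING_SIZE];
    size_t head;  /* next slot to write */
    size_t wraps; /* how many times the slow path ran */
};

/* Slow path: rare, so keep it out of line and out of the caller's hot code. */
static my_noinline void ring_wrap(struct ring *r)
{
    r->head = 0;
    r->wraps++;
}

/* Fast path: force inlining and tell the compiler the wrap is the rare case. */
static my_always_inline void ring_write(struct ring *r, unsigned long v)
{
    if (unlikely(r->head == RING_SIZE))
        ring_wrap(r);
    r->buf[r->head++] = v;
}
```

The split here follows how often each path runs rather than how large each function is. Whether it actually helps in a given case is something only profiling can tell.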
And what did the compiler people say when you reported gcc was getting it wrong?
Our assumption is that the compiler is better than a human at deciding this, or at least better than a human who does not spend a long time profiling and tuning. If this assumption is not true, we should probably be trying to figure out why, and improving the compiler where possible. That will benefit everybody.
Andrew