On Mon, May 6, 2019 at 5:10 PM Steven Rostedt rostedt@goodmis.org wrote:
But the CPU that was rewriting instructions does a run_sync() after removing the int3:
static void run_sync(void) { int enable_irqs;
/* No need to sync if there's only one CPU */ if (num_online_cpus() == 1) return; enable_irqs = irqs_disabled(); /* We may be called with interrupts disabled (on bootup). */ if (enable_irqs) local_irq_enable(); on_each_cpu(do_sync_core, NULL, 1); if (enable_irqs) local_irq_disable();
}
Which sends an IPI to all CPUs to make sure they no longer see the int3.
Duh. I have been looking back and forth in that file, and I was mixing ftrace_modify_code_direct() (which only does a local sync) with ftrace_modify_code() (which does run_sync()). The dangers of moving around by searching for function names.
That file is a maze of several functions that are very similarly named and do slightly different things.
But yes, I was looking at the "direct" sequence.
I think you are missing the run_sync() which is the heavy hammer to make sure all CPUs are in sync. And this is done at each stage:
add int3 run_sync(); update call cite outside of int3 run_sync() remove int3 run_sync()
HPA said that the last run_sync() isn't needed, but I kept it because I wanted to make sure. Looks like your analysis shows that it is needed.
Absolutely. I think we could get rid of it, but yes, to then avoid the race we'd need to be a lot more clever.
Yeah, with the three run_sunc() things, the races I thought it had can't happen.
Linus