On Tue, Jul 25, 2023 at 06:49:45AM -0400, Joel Fernandes wrote:
> Interesting series Valentin. Some high-level question/comments on this one:
> > On Jul 20, 2023, at 12:34 PM, Valentin Schneider <vschneid@redhat.com> wrote:
> > text_poke_bp_batch() sends IPIs to all online CPUs to synchronize them vs the newly patched instruction. CPUs that are executing in userspace do not need this synchronization to happen immediately, and this is actually harmful interference for NOHZ_FULL CPUs.
> Does the amount of harm not correspond to practical frequency of text_poke? How often does instruction patching really happen? If it is very infrequent then I am not sure if it is that harmful.
Well, it can happen quite a bit, including from things where people would not typically expect it.
For instance, the moment you create the first per-task perf event we frob some jump-labels (and again a second or so after the last one goes away).
The same goes for a bunch of runtime network configuration changes.
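For illustration only, here is a minimal userspace sketch of why a first-user/last-user event ends up patching text. It assumes a refcounted key whose 0 -> 1 and 1 -> 0 transitions each trigger one patch. All sketch_* names are hypothetical and are not the kernel's static key API, and the delayed disable mentioned above is left out for brevity.

```c
#include <assert.h>

/* Hypothetical stand-in for a jump label; NOT the kernel's struct static_key. */
struct sketch_key {
	int enabled;      /* number of current users of the key */
	int patch_count;  /* how many times the "text" was rewritten */
};

/*
 * Stand-in for the patching step. In the kernel this is roughly where
 * text_poke_bp_batch() would run and send its synchronization IPIs.
 */
static void sketch_patch_text(struct sketch_key *key)
{
	key->patch_count++;
}

/* Take a reference; only the first user flips the branch on. */
static void sketch_key_inc(struct sketch_key *key)
{
	assert(key->enabled >= 0);
	if (key->enabled++ == 0)
		sketch_patch_text(key);
}

/* Drop a reference; only the last user flips the branch back off. */
static void sketch_key_dec(struct sketch_key *key)
{
	assert(key->enabled > 0);
	if (--key->enabled == 0)
		sketch_patch_text(key);
}
```

With this shape, creating and destroying many users in a row costs nothing extra, but every transition between "no users" and "some users" costs one round of patching, which is the part an isolated CPU would notice.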
> > As the synchronization IPIs are sent using a blocking call, returning from text_poke_bp_batch() implies all CPUs will observe the patched instruction(s), and this should be preserved even if the IPI is deferred. In other words, to safely defer this synchronization, any kernel instruction leading to the execution of the deferred instruction sync (ct_work_flush()) must *not* be mutable (patchable) at runtime.
> If it is not infrequent, then are you handling the case where userland spends multiple seconds before entering the kernel, and all this while the blocking call waits? Perhaps in such situation you want the real IPI to be sent out instead of the deferred one?
Please re-read what Valentin wrote -- nobody is waiting on anything.
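To make the "nobody is waiting" point concrete, here is a rough userspace sketch of the deferral idea, not the code from the series. It assumes a per-CPU state word holding an "in userspace" flag plus a pending-work bit. Every sketch_* name is hypothetical, the IPI is simulated by a counter, and the patching CPU is treated like any other CPU for simplicity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS   4
#define SKETCH_IN_USER   0x1u  /* CPU is currently executing userspace */
#define SKETCH_WORK_SYNC 0x2u  /* a deferred instruction sync is pending */

static _Atomic unsigned int sketch_state[SKETCH_NR_CPUS];
static int sketch_ipis_sent;                 /* simulated blocking IPIs */
static int sketch_syncs_run[SKETCH_NR_CPUS]; /* syncs executed per CPU */

/* Queue the sync for @cpu; succeeds only while that CPU is in userspace. */
static bool sketch_defer_sync(int cpu)
{
	unsigned int old = atomic_load(&sketch_state[cpu]);

	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	do {
		if (!(old & SKETCH_IN_USER))
			return false; /* in the kernel: needs a real IPI */
	} while (!atomic_compare_exchange_weak(&sketch_state[cpu], &old,
					       old | SKETCH_WORK_SYNC));
	return true;
}

/* Patcher side: IPI the CPUs in the kernel, queue work for the rest. */
static void sketch_sync_all(void)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		if (sketch_defer_sync(cpu))
			continue;         /* returns at once, no waiting */
		sketch_ipis_sent++;       /* simulated IPI ...            */
		sketch_syncs_run[cpu]++;  /* ... whose handler syncs now  */
	}
}

static void sketch_enter_user(int cpu)
{
	atomic_fetch_or(&sketch_state[cpu], SKETCH_IN_USER);
}

/* Kernel entry: leave user state and run any queued sync in one step. */
static void sketch_enter_kernel(int cpu)
{
	unsigned int old = atomic_exchange(&sketch_state[cpu], 0);

	if (old & SKETCH_WORK_SYNC)
		sketch_syncs_run[cpu]++;  /* plays the role of ct_work_flush() */
}
```

The patcher never blocks on a CPU that sits in userspace; it only sets a bit. The cost moves to that CPU's next kernel entry, which is why the entry path up to the flush must itself never be patched.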