The t->rcu_read_unlock_special union's need_qs bit can be set by the scheduler tick (in rcu_flavor_sched_clock_irq()) to indicate that help is needed from the rcu_read_unlock() path. When that help arrives, we can speed up quiescent-state reporting, which may be quite urgent if rcu_read_unlock_special::need_qs is set. Make use of this information when deciding whether to do heavy-weight softirq raising, where possible.
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org --- kernel/rcu/tree_plugin.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index c588ef98efd3..bff6410fac06 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -622,7 +622,8 @@ static void rcu_read_unlock_special(struct task_struct *t) t->rcu_read_unlock_special.b.exp_hint = false; exp = (t->rcu_blocked_node && t->rcu_blocked_node->exp_tasks) || (rdp->grpmask & rnp->expmask) || - tick_nohz_full_cpu(rdp->cpu); + tick_nohz_full_cpu(rdp->cpu) || + t->rcu_read_unlock_special.b.need_qs; // Need to defer quiescent state until everything is enabled. if (irqs_were_disabled && use_softirq && (in_interrupt() ||
The rcu_preempt_note_context_switch() function tries to handle cases where __rcu_read_unlock() got preempted, with the context-switch path then reporting the quiescent state and clearing any bits in the rcu_read_unlock_special union.
This can be handled by just calling rcu_preempt_deferred_qs(), which was added during the RCU consolidation work and already performs these checks.
Tested RCU config TREE03 for an hour, which succeeds.
Cc: rcu@vger.kernel.org Cc: kernel-team@android.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org --- kernel/rcu/tree_plugin.h | 9 --------- 1 file changed, 9 deletions(-)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index bff6410fac06..ebb4d46a6267 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -313,15 +313,6 @@ void rcu_note_context_switch(bool preempt) ? rnp->gp_seq : rcu_seq_snap(&rnp->gp_seq)); rcu_preempt_ctxt_queue(rnp, rdp); - } else if (t->rcu_read_lock_nesting < 0 && - t->rcu_read_unlock_special.s) { - - /* - * Complete exit from RCU read-side critical section on - * behalf of preempted instance of __rcu_read_unlock(). - */ - rcu_read_unlock_special(t); - rcu_preempt_deferred_qs(t); } else { rcu_preempt_deferred_qs(t); }
On Mon, Jul 01, 2019 at 12:04:14AM -0400, Joel Fernandes (Google) wrote:
The rcu_preempt_note_context_switch() function tries to handle cases where __rcu_read_unlock() got preempted, with the context-switch path then reporting the quiescent state and clearing any bits in the rcu_read_unlock_special union.
This can be handled by just calling rcu_preempt_deferred_qs(), which was added during the RCU consolidation work and already performs these checks.
Tested RCU config TREE03 for an hour, which succeeds.
Cc: rcu@vger.kernel.org Cc: kernel-team@android.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
My first reaction was "that cannot possibly work", but after a bit of digging, it really does appear to work just fine. I therefore expanded the commit log a bit, so please check it to catch any messups on my part.
Very cool, thank you very much! ;-)
Thanx, Paul
------------------------------------------------------------------------
commit ce547cb41ed7662f70d0b07d4c7f7555ba130c61 Author: Joel Fernandes (Google) joel@joelfernandes.org Date: Mon Jul 1 00:04:14 2019 -0400
rcu: Simplify rcu_note_context_switch exit from critical section
Because __rcu_read_unlock() can be preempted just before the call to rcu_read_unlock_special(), it is possible for a task to be preempted just before it would have fully exited its RCU read-side critical section. This would result in a needless extension of that critical section until that task was resumed, which might in turn result in a needlessly long grace period, needless RCU priority boosting, and needless force-quiescent-state actions. Therefore, rcu_note_context_switch() invokes __rcu_read_unlock() followed by rcu_preempt_deferred_qs() when it detects this situation. This action by rcu_note_context_switch() ends the RCU read-side critical section immediately.
Of course, once the task resumes, it will invoke rcu_read_unlock_special() redundantly. This is harmless because the fact that a preemption happened means that interrupts, preemption, and softirqs cannot have been disabled, so there would be no deferred quiescent state. While ->rcu_read_lock_nesting remains less than zero, none of the ->rcu_read_unlock_special.b bits can be set, and they were all zeroed by the call to rcu_note_context_switch() at task-preemption time. Therefore, setting ->rcu_read_unlock_special.b.exp_hint to false has no effect.
Therefore, the extra call to rcu_preempt_deferred_qs_irqrestore() would return immediately. With one possible exception, which is if an expedited grace period started just as the task was being resumed, which could leave ->exp_deferred_qs set. This will cause rcu_preempt_deferred_qs_irqrestore() to invoke rcu_report_exp_rdp(), reporting the quiescent state, just as it should. (Such an expedited grace period won't affect the preemption code path due to interrupts having already been disabled.)
But when rcu_note_context_switch() invokes __rcu_read_unlock(), it is doing so with preemption disabled, hence __rcu_read_unlock() will unconditionally defer the quiescent state, only to immediately invoke rcu_preempt_deferred_qs(), thus immediately reporting the deferred quiescent state. It turns out to be safe (and faster) to instead just invoke rcu_preempt_deferred_qs() without the __rcu_read_unlock() middleman.
Because this is the invocation during the preemption (as opposed to the invocation just after the resume), at least one of the bits in ->rcu_read_unlock_special.b must be set and ->rcu_read_lock_nesting must be negative. This means that rcu_preempt_need_deferred_qs() must return true, avoiding the early exit from rcu_preempt_deferred_qs(). Thus, rcu_preempt_deferred_qs_irqrestore() will be invoked immediately, as required.
This commit therefore simplifies the CONFIG_PREEMPT=y version of rcu_note_context_switch() by removing the "else if" branch of its "if" statement. This change means that all callers that would have invoked rcu_read_unlock_special() followed by rcu_preempt_deferred_qs() will now simply invoke rcu_preempt_deferred_qs(), thus avoiding the rcu_read_unlock_special() middleman when __rcu_read_unlock() is preempted.
Cc: rcu@vger.kernel.org Cc: kernel-team@android.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org Signed-off-by: Paul E. McKenney paulmck@linux.ibm.com
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 187dc076c497..214e4689c29d 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -313,15 +313,6 @@ void rcu_note_context_switch(bool preempt) ? rnp->gp_seq : rcu_seq_snap(&rnp->gp_seq)); rcu_preempt_ctxt_queue(rnp, rdp); - } else if (t->rcu_read_lock_nesting < 0 && - t->rcu_read_unlock_special.s) { - - /* - * Complete exit from RCU read-side critical section on - * behalf of preempted instance of __rcu_read_unlock(). - */ - rcu_read_unlock_special(t); - rcu_preempt_deferred_qs(t); } else { rcu_preempt_deferred_qs(t); }
On Mon, Jul 01, 2019 at 01:03:10PM -0700, Paul E. McKenney wrote:
On Mon, Jul 01, 2019 at 12:04:14AM -0400, Joel Fernandes (Google) wrote:
The rcu_preempt_note_context_switch() function tries to handle cases where __rcu_read_unlock() got preempted, with the context-switch path then reporting the quiescent state and clearing any bits in the rcu_read_unlock_special union.
This can be handled by just calling rcu_preempt_deferred_qs(), which was added during the RCU consolidation work and already performs these checks.
Tested RCU config TREE03 for an hour, which succeeds.
Cc: rcu@vger.kernel.org Cc: kernel-team@android.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
My first reaction was "that cannot possibly work", but after a bit of digging, it really does appear to work just fine. I therefore expanded the commit log a bit, so please check it to catch any messups on my part.
Very cool, thank you very much! ;-)
Awesome! Thanks. I am glad you agree with the change and I agree with your changes to the commit log.
thanks,
- Joel
Thanx, Paul
commit ce547cb41ed7662f70d0b07d4c7f7555ba130c61 Author: Joel Fernandes (Google) joel@joelfernandes.org Date: Mon Jul 1 00:04:14 2019 -0400
rcu: Simplify rcu_note_context_switch exit from critical section
Because __rcu_read_unlock() can be preempted just before the call to rcu_read_unlock_special(), it is possible for a task to be preempted just before it would have fully exited its RCU read-side critical section. This would result in a needless extension of that critical section until that task was resumed, which might in turn result in a needlessly long grace period, needless RCU priority boosting, and needless force-quiescent-state actions. Therefore, rcu_note_context_switch() invokes __rcu_read_unlock() followed by rcu_preempt_deferred_qs() when it detects this situation. This action by rcu_note_context_switch() ends the RCU read-side critical section immediately.

Of course, once the task resumes, it will invoke rcu_read_unlock_special() redundantly. This is harmless because the fact that a preemption happened means that interrupts, preemption, and softirqs cannot have been disabled, so there would be no deferred quiescent state. While ->rcu_read_lock_nesting remains less than zero, none of the ->rcu_read_unlock_special.b bits can be set, and they were all zeroed by the call to rcu_note_context_switch() at task-preemption time. Therefore, setting ->rcu_read_unlock_special.b.exp_hint to false has no effect.

Therefore, the extra call to rcu_preempt_deferred_qs_irqrestore() would return immediately. With one possible exception, which is if an expedited grace period started just as the task was being resumed, which could leave ->exp_deferred_qs set. This will cause rcu_preempt_deferred_qs_irqrestore() to invoke rcu_report_exp_rdp(), reporting the quiescent state, just as it should. (Such an expedited grace period won't affect the preemption code path due to interrupts having already been disabled.)
But when rcu_note_context_switch() invokes __rcu_read_unlock(), it is doing so with preemption disabled, hence __rcu_read_unlock() will unconditionally defer the quiescent state, only to immediately invoke rcu_preempt_deferred_qs(), thus immediately reporting the deferred quiescent state. It turns out to be safe (and faster) to instead just invoke rcu_preempt_deferred_qs() without the __rcu_read_unlock() middleman.

Because this is the invocation during the preemption (as opposed to the invocation just after the resume), at least one of the bits in ->rcu_read_unlock_special.b must be set and ->rcu_read_lock_nesting must be negative. This means that rcu_preempt_need_deferred_qs() must return true, avoiding the early exit from rcu_preempt_deferred_qs(). Thus, rcu_preempt_deferred_qs_irqrestore() will be invoked immediately, as required.

This commit therefore simplifies the CONFIG_PREEMPT=y version of rcu_note_context_switch() by removing the "else if" branch of its "if" statement. This change means that all callers that would have invoked rcu_read_unlock_special() followed by rcu_preempt_deferred_qs() will now simply invoke rcu_preempt_deferred_qs(), thus avoiding the rcu_read_unlock_special() middleman when __rcu_read_unlock() is preempted.

Cc: rcu@vger.kernel.org Cc: kernel-team@android.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org Signed-off-by: Paul E. McKenney paulmck@linux.ibm.com
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 187dc076c497..214e4689c29d 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -313,15 +313,6 @@ void rcu_note_context_switch(bool preempt) ? rnp->gp_seq : rcu_seq_snap(&rnp->gp_seq)); rcu_preempt_ctxt_queue(rnp, rdp); - } else if (t->rcu_read_lock_nesting < 0 && - t->rcu_read_unlock_special.s) { - - /* - * Complete exit from RCU read-side critical section on - * behalf of preempted instance of __rcu_read_unlock(). - */ - rcu_read_unlock_special(t); - rcu_preempt_deferred_qs(t); } else { rcu_preempt_deferred_qs(t); }
On Mon, Jul 01, 2019 at 05:33:28PM -0400, Joel Fernandes wrote:
On Mon, Jul 01, 2019 at 01:03:10PM -0700, Paul E. McKenney wrote:
On Mon, Jul 01, 2019 at 12:04:14AM -0400, Joel Fernandes (Google) wrote:
The rcu_preempt_note_context_switch() function tries to handle cases where __rcu_read_unlock() got preempted, with the context-switch path then reporting the quiescent state and clearing any bits in the rcu_read_unlock_special union.
This can be handled by just calling rcu_preempt_deferred_qs(), which was added during the RCU consolidation work and already performs these checks.
Tested RCU config TREE03 for an hour, which succeeds.
Cc: rcu@vger.kernel.org Cc: kernel-team@android.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
My first reaction was "that cannot possibly work", but after a bit of digging, it really does appear to work just fine. I therefore expanded the commit log a bit, so please check it to catch any messups on my part.
Very cool, thank you very much! ;-)
Awesome! Thanks. I am glad you agree with the change and I agree with your changes to the commit log.
Very good, I will push it to -rcu shortly.
Thanx, Paul
thanks,
- Joel
Thanx, Paul
commit ce547cb41ed7662f70d0b07d4c7f7555ba130c61 Author: Joel Fernandes (Google) joel@joelfernandes.org Date: Mon Jul 1 00:04:14 2019 -0400
rcu: Simplify rcu_note_context_switch exit from critical section
Because __rcu_read_unlock() can be preempted just before the call to rcu_read_unlock_special(), it is possible for a task to be preempted just before it would have fully exited its RCU read-side critical section. This would result in a needless extension of that critical section until that task was resumed, which might in turn result in a needlessly long grace period, needless RCU priority boosting, and needless force-quiescent-state actions. Therefore, rcu_note_context_switch() invokes __rcu_read_unlock() followed by rcu_preempt_deferred_qs() when it detects this situation. This action by rcu_note_context_switch() ends the RCU read-side critical section immediately.

Of course, once the task resumes, it will invoke rcu_read_unlock_special() redundantly. This is harmless because the fact that a preemption happened means that interrupts, preemption, and softirqs cannot have been disabled, so there would be no deferred quiescent state. While ->rcu_read_lock_nesting remains less than zero, none of the ->rcu_read_unlock_special.b bits can be set, and they were all zeroed by the call to rcu_note_context_switch() at task-preemption time. Therefore, setting ->rcu_read_unlock_special.b.exp_hint to false has no effect.

Therefore, the extra call to rcu_preempt_deferred_qs_irqrestore() would return immediately. With one possible exception, which is if an expedited grace period started just as the task was being resumed, which could leave ->exp_deferred_qs set. This will cause rcu_preempt_deferred_qs_irqrestore() to invoke rcu_report_exp_rdp(), reporting the quiescent state, just as it should. (Such an expedited grace period won't affect the preemption code path due to interrupts having already been disabled.)
But when rcu_note_context_switch() invokes __rcu_read_unlock(), it is doing so with preemption disabled, hence __rcu_read_unlock() will unconditionally defer the quiescent state, only to immediately invoke rcu_preempt_deferred_qs(), thus immediately reporting the deferred quiescent state. It turns out to be safe (and faster) to instead just invoke rcu_preempt_deferred_qs() without the __rcu_read_unlock() middleman.

Because this is the invocation during the preemption (as opposed to the invocation just after the resume), at least one of the bits in ->rcu_read_unlock_special.b must be set and ->rcu_read_lock_nesting must be negative. This means that rcu_preempt_need_deferred_qs() must return true, avoiding the early exit from rcu_preempt_deferred_qs(). Thus, rcu_preempt_deferred_qs_irqrestore() will be invoked immediately, as required.

This commit therefore simplifies the CONFIG_PREEMPT=y version of rcu_note_context_switch() by removing the "else if" branch of its "if" statement. This change means that all callers that would have invoked rcu_read_unlock_special() followed by rcu_preempt_deferred_qs() will now simply invoke rcu_preempt_deferred_qs(), thus avoiding the rcu_read_unlock_special() middleman when __rcu_read_unlock() is preempted.

Cc: rcu@vger.kernel.org Cc: kernel-team@android.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org Signed-off-by: Paul E. McKenney paulmck@linux.ibm.com
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 187dc076c497..214e4689c29d 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -313,15 +313,6 @@ void rcu_note_context_switch(bool preempt) ? rnp->gp_seq : rcu_seq_snap(&rnp->gp_seq)); rcu_preempt_ctxt_queue(rnp, rdp); - } else if (t->rcu_read_lock_nesting < 0 && - t->rcu_read_unlock_special.s) { - - /* - * Complete exit from RCU read-side critical section on - * behalf of preempted instance of __rcu_read_unlock(). - */ - rcu_read_unlock_special(t); - rcu_preempt_deferred_qs(t); } else { rcu_preempt_deferred_qs(t); }
This reverts commit a6fda6dab93c2c06ef4b8cb4b9258df6674d2438 which causes kvm.sh to not run on my machines. The qemu-system-x86_64 command runs but does nothing.
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org --- I am OK with dropping this patch, but it is in my tree because without it I can't run the tests.
tools/testing/selftests/rcutorture/bin/functions.sh | 13 +------------ .../selftests/rcutorture/configs/rcu/CFcommon | 3 --- 2 files changed, 1 insertion(+), 15 deletions(-)
diff --git a/tools/testing/selftests/rcutorture/bin/functions.sh b/tools/testing/selftests/rcutorture/bin/functions.sh index c3a49fb4d6f6..6bcb8b5b2ff2 100644 --- a/tools/testing/selftests/rcutorture/bin/functions.sh +++ b/tools/testing/selftests/rcutorture/bin/functions.sh @@ -172,7 +172,7 @@ identify_qemu_append () { local console=ttyS0 case "$1" in qemu-system-x86_64|qemu-system-i386) - echo selinux=0 initcall_debug debug + echo noapic selinux=0 initcall_debug debug ;; qemu-system-aarch64) console=ttyAMA0 @@ -191,19 +191,8 @@ identify_qemu_append () { # Output arguments for qemu arguments based on the TORTURE_QEMU_MAC # and TORTURE_QEMU_INTERACTIVE environment variables. identify_qemu_args () { - local KVM_CPU="" - case "$1" in - qemu-system-x86_64) - KVM_CPU=kvm64 - ;; - qemu-system-i386) - KVM_CPU=kvm32 - ;; - esac case "$1" in qemu-system-x86_64|qemu-system-i386) - echo -machine q35,accel=kvm - echo -cpu ${KVM_CPU} ;; qemu-system-aarch64) echo -machine virt,gic-version=host -cpu host diff --git a/tools/testing/selftests/rcutorture/configs/rcu/CFcommon b/tools/testing/selftests/rcutorture/configs/rcu/CFcommon index e19a444a0684..d2d2a86139db 100644 --- a/tools/testing/selftests/rcutorture/configs/rcu/CFcommon +++ b/tools/testing/selftests/rcutorture/configs/rcu/CFcommon @@ -1,5 +1,2 @@ CONFIG_RCU_TORTURE_TEST=y CONFIG_PRINTK_TIME=y -CONFIG_HYPERVISOR_GUEST=y -CONFIG_PARAVIRT=y -CONFIG_KVM_GUEST=y
On 2019-07-01 00:04:15 [-0400], Joel Fernandes (Google) wrote:
This reverts commit a6fda6dab93c2c06ef4b8cb4b9258df6674d2438 which causes kvm.sh to not run on my machines. The qemu-system-x86_64 command runs but does nothing.
Nope. I would like to know *why* you need 'noapic' to work. Is it a brand new or old qemu-system-x86_64?
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
Sebastian
On Mon, Jul 01, 2019 at 02:23:58PM +0200, Sebastian Andrzej Siewior wrote:
On 2019-07-01 00:04:15 [-0400], Joel Fernandes (Google) wrote:
This reverts commit a6fda6dab93c2c06ef4b8cb4b9258df6674d2438 which causes kvm.sh to not run on my machines. The qemu-system-x86_64 command runs but does nothing.
Nope. I would like to know *why* you need 'noapic' to work. Is it a brand new or old qemu-system-x86_64?
I did not have time to debug yesterday and I posted this particular revert as an 'RFC' just to make aware of this problem.
I spent some more time just now; it looks like this has nothing to do with 'noapic' and appears to be a problem on Debian distros with the e1000e NIC. Maybe this NIC was added to the virtual hardware because of the -machine change in the patch?
Anyway, if I add the following to the qemu command that kvm.sh runs, it works again: -net nic,model=e1000
Without it I get: qemu-system-x86_64: Initialization of device e1000e failed: failed to find romfile "efi-e1000e.rom"
Seems to be mentioned here: https://bugs.launchpad.net/ubuntu/+source/ipxe/+bug/1737211
And in syzkaller as well: https://github.com/google/syzkaller/blob/master/vm/qemu/qemu.go#L88
Adding Dmitry who is syzkaller owner for any thoughts as well.
I'm happy to write a patch to set the nic model as e1000 and send it out if we agree this solution is good enough.
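For concreteness, such a patch would make kvm.sh emit a command line along these lines (a sketch only; the kernel image path, console, and other arguments here are placeholders, not taken from the actual script):

```shell
#!/bin/sh
# Sketch of the proposed workaround: keep the q35/KVM machine from the
# original commit but pin the NIC model to e1000, avoiding the default
# e1000e device and its missing efi-e1000e.rom on affected distros.
QEMU=qemu-system-x86_64
MACHINE_ARGS="-machine q35,accel=kvm"
NIC_WORKAROUND="-net nic,model=e1000"

CMD="$QEMU $MACHINE_ARGS $NIC_WORKAROUND -nographic -kernel /path/to/bzImage"

# Print rather than run, since this is only illustrative.
echo "$CMD"
```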
- Joel
On Mon, Jul 1, 2019 at 4:14 PM Joel Fernandes joel@joelfernandes.org wrote:
On Mon, Jul 01, 2019 at 02:23:58PM +0200, Sebastian Andrzej Siewior wrote:
On 2019-07-01 00:04:15 [-0400], Joel Fernandes (Google) wrote:
This reverts commit a6fda6dab93c2c06ef4b8cb4b9258df6674d2438 which causes kvm.sh to not run on my machines. The qemu-system-x86_64 command runs but does nothing.
Nope. I would like to know *why* you need 'noapic' to work. Is it a brand new or old qemu-system-x86_64?
I did not have time to debug yesterday and I posted this particular revert as an 'RFC' just to make aware of this problem.
I spent some more time just now; it looks like this has nothing to do with 'noapic' and appears to be a problem on Debian distros with the e1000e NIC. Maybe this NIC was added to the virtual hardware because of the -machine change in the patch?
Anyway, if I add the following to the qemu command that kvm.sh runs, it works again: -net nic,model=e1000
Without it I get: qemu-system-x86_64: Initialization of device e1000e failed: failed to find romfile "efi-e1000e.rom"
Seems to be mentioned here: https://bugs.launchpad.net/ubuntu/+source/ipxe/+bug/1737211
And in syzkaller as well: https://github.com/google/syzkaller/blob/master/vm/qemu/qemu.go#L88
Adding Dmitry who is syzkaller owner for any thoughts as well.
I don't have many thoughts on this. That particular error looked like a bug in the package in the particular distro/version.
I'm happy to write a patch to set the nic model as e1000 and send it out if we agree this solution is good enough.
- Joel
On Mon, Jul 01, 2019 at 12:04:13AM -0400, Joel Fernandes (Google) wrote:
The t->rcu_read_unlock_special union's need_qs bit can be set by the scheduler tick (in rcu_flavor_sched_clock_irq()) to indicate that help is needed from the rcu_read_unlock() path. When that help arrives, we can speed up quiescent-state reporting, which may be quite urgent if rcu_read_unlock_special::need_qs is set. Make use of this information when deciding whether to do heavy-weight softirq raising, where possible.
Just fyi, TREE01-06, SRCU-N and SRCU-t passed overnight testing with this series.
On Mon, Jul 01, 2019 at 12:04:13AM -0400, Joel Fernandes (Google) wrote:
The t->rcu_read_unlock_special union's need_qs bit can be set by the scheduler tick (in rcu_flavor_sched_clock_irq()) to indicate that help is needed from the rcu_read_unlock() path. When that help arrives, we can speed up quiescent-state reporting, which may be quite urgent if rcu_read_unlock_special::need_qs is set. Make use of this information when deciding whether to do heavy-weight softirq raising, where possible.
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
Cute thought, but I am going to have to pass on this one. The reason is that by the time that ->rcu_read_unlock_special.b.need_qs gets set, the grace period is already one full second old. At that point, the extra tick of waiting is down in the noise.
Right now, we do the extra work if we really are blocking an expedited grace period (the first two lines of the original condition) or we are running on a nohz_full CPU (which might never execute a scheduling clock tick, thus potentially delaying forever). And expedited grace periods are supposed to complete in tens or maybe hundreds of microseconds, assuming the RCU readers are being cooperative, which is a whole different level of urgent.
Nevertheless, thank you for looking into this!
Thanx, Paul
kernel/rcu/tree_plugin.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index c588ef98efd3..bff6410fac06 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -622,7 +622,8 @@ static void rcu_read_unlock_special(struct task_struct *t) t->rcu_read_unlock_special.b.exp_hint = false; exp = (t->rcu_blocked_node && t->rcu_blocked_node->exp_tasks) || (rdp->grpmask & rnp->expmask) || - tick_nohz_full_cpu(rdp->cpu); + tick_nohz_full_cpu(rdp->cpu) || + t->rcu_read_unlock_special.b.need_qs; // Need to defer quiescent state until everything is enabled. if (irqs_were_disabled && use_softirq && (in_interrupt() ||
-- 2.22.0.410.gd8fdbe21b5-goog
On Mon, Jul 01, 2019 at 08:47:30PM -0700, Paul E. McKenney wrote:
On Mon, Jul 01, 2019 at 12:04:13AM -0400, Joel Fernandes (Google) wrote:
The t->rcu_read_unlock_special union's need_qs bit can be set by the scheduler tick (in rcu_flavor_sched_clock_irq()) to indicate that help is needed from the rcu_read_unlock() path. When that help arrives, we can speed up quiescent-state reporting, which may be quite urgent if rcu_read_unlock_special::need_qs is set. Make use of this information when deciding whether to do heavy-weight softirq raising, where possible.
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
Cute thought, but I am going to have to pass on this one. The reason is that by the time that ->rcu_read_unlock_special.b.need_qs gets set, the grace period is already one full second old. At that point, the extra tick of waiting is down in the noise.
Right now, we do the extra work if we really are blocking an expedited grace period (the first two lines of the original condition) or we are running on a nohz_full CPU (which might never execute a scheduling clock tick, thus potentially delaying forever). And expedited grace periods are supposed to complete in tens or maybe hundreds of microseconds, assuming the RCU readers are being cooperative, which is a whole different level of urgent.
Makes sense, I agree the patch may not be that helpful right now. I mixed up the different levels of urgency. No problem dropping it.
Nevertheless, thank you for looking into this!
My pleasure! Will keep them coming.
- Joel
Thanx, Paul
kernel/rcu/tree_plugin.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index c588ef98efd3..bff6410fac06 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -622,7 +622,8 @@ static void rcu_read_unlock_special(struct task_struct *t) t->rcu_read_unlock_special.b.exp_hint = false; exp = (t->rcu_blocked_node && t->rcu_blocked_node->exp_tasks) || (rdp->grpmask & rnp->expmask) || - tick_nohz_full_cpu(rdp->cpu); + tick_nohz_full_cpu(rdp->cpu) || + t->rcu_read_unlock_special.b.need_qs; // Need to defer quiescent state until everything is enabled. if (irqs_were_disabled && use_softirq && (in_interrupt() ||
-- 2.22.0.410.gd8fdbe21b5-goog