On Thu, Dec 17, 2020 at 04:18:52PM +0530, Naresh Kamboju wrote:
Hi Paul,
Thanks for your inputs.
On Wed, 16 Dec 2020 at 21:33, Paul E. McKenney <paulmck@kernel.org> wrote:
On Wed, Dec 16, 2020 at 03:40:04PM +0530, Naresh Kamboju wrote:
Linux Kernel Functional Testing (LKFT) started running rcu-torture tests on qemu_arm64, qemu_arm, qemu_x86_64, and qemu_i386 from our CI build systems.
The following warning(s) were noticed on qemu_i386 while running the rcu-torture test on the Linux mainline and Linux -next master branches. Since we do not have baseline results, I cannot say whether this is a regression, but compared with the stable-rc 5.4 kernel, this warning is new on mainline and -next.
The rcutorture testing "stutters", that is, it periodically intentionally drops the test load down to zero for a few seconds. The expectation is that with no load, rcutorture will have no trouble finishing any needed grace periods within that zero-load period. If at the end of the stutter period, RCU work remains undone, then this warning is emitted.
This warning can be a false positive in the following situations:
1. The system on which you are running rcutorture is under additional heavy load.
The DUT is running only the test (rcutorture).
2. You are running multiple guest OSes, each of which is running rcutorture, and vCPUs from each of the guest OSes end up sharing a core with a vCPU from one of the other guests. This can cause the zero-load period to not be so unloaded.
3. You built rcutorture into your kernel, so that rcutorture starts immediately at boot time (CONFIG_RCU_TORTURE_TEST=y). If your boot takes long enough, rcutorture can massively overload the single boot CPU, which can in turn result in this warning.
The test was built as a module (CONFIG_RCU_TORTURE_TEST=m).
If you are in situation #1, I suggest disabling stuttering using the rcutorture.stutter=0 kernel boot parameter.
If you are in situation #2, I suggest binding the guest-OS vCPUs to avoid them sharing cores with each other.
If you are in situation #3, I have patches that I expect to submit upstream in the v5.12 merge window that can help. Hey, they work for me! If you would like to test them before then, please let me know.
If something else is going on, please let me know what it is so that I can fix it one way or another.
We were running on qemu_i386 today. I have tested on real hardware and the reported problem has been reproduced.
This warning has been present for quite some time, but I continually make rcutorture more aggressive, and this could well be part of the fallout of additional rcutorture aggression.
And either way, thank you for trying out rcutorture!
We are happy to test :)
Is this reproducible? If so, could you please try bisection?
Thanx, Paul