On 24 April 2014 12:55, Daniel Sangorrin daniel.sangorrin@toshiba.co.jp wrote:
I tried your set of patches for isolating particular CPU cores from unpinned timers. On x86_64 they were working fine, however I found out that on ARM they would fail under the following test:
I am happy that these drew attention from somebody Atleast :)
# mount -t cpuset none /cpuset # cd /cpuset # mkdir rt # cd rt # echo 1 > cpus # echo 1 > cpu_exclusive # cd # taskset 0x2 ./setquiesce.sh <--- contains "echo 1 > /cpuset/rt/quiesce" [ 75.622375] ------------[ cut here ]------------ [ 75.627258] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:2595 __migrate_hrtimers+0x17c/0x1bc() [ 75.636840] DEBUG_LOCKS_WARN_ON(current->hardirq_context) [ 75.636840] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.0-rc1-37710-g23c8f02 #1 [ 75.649627] [<c0014d18>] (unwind_backtrace) from [<c00119e8>] (show_stack+0x10/0x14) [ 75.649627] [<c00119e8>] (show_stack) from [<c065b61c>] (dump_stack+0x78/0x94) [ 75.662689] [<c065b61c>] (dump_stack) from [<c003e9a4>] (warn_slowpath_common+0x60/0x84) [ 75.670410] [<c003e9a4>] (warn_slowpath_common) from [<c003ea24>] (warn_slowpath_fmt+0x30/0x40) [ 75.677673] [<c003ea24>] (warn_slowpath_fmt) from [<c005d7b0>] (__migrate_hrtimers+0x17c/0x1bc) [ 75.677673] [<c005d7b0>] (__migrate_hrtimers) from [<c009e004>] (generic_smp_call_function_single_interrupt+0x8c/0x104) [ 75.699645] [<c009e004>] (generic_smp_call_function_single_interrupt) from [<c00134d0>] (handle_IPI+0xa4/0x16c) [ 75.706970] [<c00134d0>] (handle_IPI) from [<c0008614>] (gic_handle_irq+0x54/0x5c) [ 75.715087] [<c0008614>] (gic_handle_irq) from [<c0012624>] (__irq_svc+0x44/0x5c) [ 75.725311] Exception stack(0xc08a3f58 to 0xc08a3fa0)
I couldn't understand why we went via a interrupt here ? Probably CPU1 was idle and was woken up with a IPI and then this happened. But in that case too, shouldn't the script run from process context instead ?
I also backported your patches to Linux 3.10.y and found the same problem both in ARM and x86_64.
There are very few changes in between 3.10 and latest for timers/hrtimers and so things are expected to be the same.
However, I think I figured out the reason for those errors. Please, could you check the patch below (it applies on the top of your tree, branch isolate-cpusets) and let me know what you think?
Okay, just to let you know, I have also found some issues and they are now pushed in my tree.. Also it is rebased over 3.15-rc2 now.
-------------------------PATCH STARTS HERE--------------------------------- cpuset: quiesce: change irq disable/enable by irq save/restore
The function __migrate_timers can be called under interrupt context or thread context depending on the core where the system call was executed. In case it executes under interrupt context, it
How exactly?
seems a bad idea to leave interrupts enabled after migrating the timers. In fact, this caused kernel errors on the ARM architecture and on the x86_64 architecture with the 3.10 kernel (backported version of the cpuset-quiesce patch).
I can't keep it as a separate patch and so would be required to merge it into my original patch..
Thanks for your inputs :)