On Tue, Feb 11, 2014 at 02:22:43PM +0530, Viresh Kumar wrote:
On 28 January 2014 18:53, Frederic Weisbecker <fweisbec@gmail.com> wrote:
No, when a single task is running on a full dynticks CPU, the tick is supposed to run every second. I'm actually surprised it doesn't happen in your traces. Did you tweak something specific?
Why do we need this 1-second tick currently? And what will happen if I hot-unplug that CPU and bring it back? Would the tick timer move away from the CPU in question? I see that when I change this 1-second limit to 300 seconds. But what would the impact of that be? Will things still work normally?
So the problem resides in the gazillion pieces of accounting maintained in scheduler_tick() and current->sched_class->task_tick().
The scheduler's correctness depends on these being updated regularly. If you deactivate the tick or increase the delay to very high values, the result is unpredictable. Just expect that at least some scheduler features will behave randomly: load balancing, for example, or simply local fairness.
So we have that 1 Hz max, which makes sure that things keep moving forward while keeping a rate that should still be acceptable for HPC workloads. But we certainly want to find a way to remove the need for any tick altogether for extreme real-time workloads, which need guarantees rather than just optimizations.
I see two potential solutions for that:
1) Rework the scheduler accounting so that it is safe against full dynticks. That was the initial plan, but it's scary: the scheduler accounting is a huge maze, and I'm not sure it's actually worth the complication.
2) Offload the accounting. For example, we could imagine the timekeeping CPU handling the task_tick() calls on behalf of the full dynticks CPUs, at a low rate like 1 Hz.