Re: CPU excessively long times between frequency scaling driver calls - bisected

23 Feb 2022

      On Wed, Feb 23, 2022 at 3:49 AM Feng Tang feng.tang@intel.com wrote:
...
Hi Rafael,
On Tue, Feb 22, 2022 at 07:04:32PM +0100, Rafael J. Wysocki wrote:
...
On Tue, Feb 22, 2022 at 8:41 AM Feng Tang feng.tang@intel.com wrote:
...
On Mon, Feb 14, 2022 at 07:17:24AM -0800, srinivas pandruvada wrote:
...
Hi Doug,
I think you use CONFIG_NO_HZ_FULL.
Here we are getting callback from scheduler. Can we check that if
scheduler woke up on those CPUs?
We can run "trace-cmd -e sched" and check in kernel shark if there is
similar gaps in activity.
Srinivas analyzed the scheduler trace data from trace-cmd, and thought is
related with the cpufreq callback is not called timeley from scheduling
events:
"
I mean we ignore the callback when the target CPU is not a local CPU as
we have to do IPI to adjust MSRs.
This will happen many times when sched_wake will wake up a new CPU for
the thread (we will get a callack for the target) but once the remote
thread start executing "sched_switch", we will get a callback on local
CPU, so we will adjust frequencies (provided 10ms interval from the
last call).
...
From the trace file I see the scenario where it took 72sec between two
updates:
CPU 2
34412.597161    busy=78         freq=3232653
34484.450725    busy=63         freq=2606793
There is periodic activity in between, related to active load balancing
in scheduler (since last frequency was higher these small work will
also run at higher frequency). But those threads are not CFS class, so
scheduler callback will not be called for them.
So removing the patch removed a trigger which would have caused a
sched_switch to a CFS task and call a cpufreq/intel_pstate callback.
And so this behavior needs to be restored for the time being which
means reverting the problematic commit for 5.17 if possible.
It is unlikely that we'll get a proper fix before -rc7 and we still
need to test it properly.
Thanks for checking this!
I understand the time limit as we are approaching the close of 5.17,
but still I don't think reverting commit b50db7095fe0 is the best
solution as:

b50db7095fe0 is not just an optimization, but solves some real
problems found in servers from big CSP (Cloud Service Provider)
and data center's server room.

IMHO, b50db7095fe0 itself hasn't done anything wrong.

This problem found by Doug is a rarely happened case, though
it is an expected thing as shown in existing comments of
cfs_rq_util_change():
  /*
   * There are a few boundary cases this might miss but it should
   * get called often enough that that should (hopefully) not be
   * a real problem.
   *
   * It will not get called when we go idle, because the idle
   * thread is a different class (!fair), nor will the utilization
   * number include things like RT tasks.
   *
   * As is, the util number is not freq-invariant (we'd have to
   * implement arch_scale_freq_capacity() for that).
   *
   * See cpu_util_cfs().
   */
  cpufreq_update_util(rq, flags);

As this happens with HWP-disabled case and a very calm system, can we
find a proper solution in 5.17/5.18 or later, and cc stable?
We can.

2024

2023

2022

2021

2020

2019

2018

2017

Re: CPU excessively long times between frequency scaling driver calls - bisected