Hi Sergey,
On 11/21/25 03:55, Sergey Senozhatsky wrote:
> Hi Christian,
>
> On (25/11/20 10:15), Christian Loehle wrote:
>> On 11/20/25 04:45, Sergey Senozhatsky wrote:
>>> Hi,
>>>
>>> We are observing a performance regression on one of our arm64 boards.
>>> We tracked it down to the linux-6.6.y commit ada8d7fa0ad4 ("sched/cpufreq:
>>> Rework schedutil governor performance estimation").
>>>
>>> UI speedometer benchmark:
>>> w/ commit:  395 +/-38
>>> w/o commit: 439 +/-14
>>
>> Hi Sergey,
>> Would be nice to get some details.
>> What board?
>
> It's an MT8196 chromebook.
>
>> What do the OPPs look like?
>
> How do I find that out?
>
>> Does this system use uclamp during the benchmark? How?
>
> How do I find that out?
>
>> Given how large the stddev given by speedometer (version 3?) itself is,
>> can we get the stats of a few runs?
>
> v2.1
>
>   w/o patch    w/ patch
>   440 +/-30    406 +/-11
>   440 +/-14    413 +/-16
>   444 +/-12    403 +/-14
>   442 +/-12    412 +/-15
>
>> Maybe traces of cpu_frequency for both w/ and w/o?
>
> trace-cmd record -e power:cpu_frequency attached.
>
> "base" is with ada8d7fa0ad4
> "revert" is ada8d7fa0ad4 reverted.

I did some analysis based on your trace files. I looked into Speedometer performance issues some time ago, which is why I'm curious about your report here.
I've filtered your trace purely based on cpu7 (the single biggest cpu). Then I cut the data from the 'warm-up' phase in both traces, to have a similar start point (I think).
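
For anyone who wants to repeat that filtering step, here is a minimal sketch. It assumes the trace was converted to text with `trace-cmd report` and that each power:cpu_frequency event prints as `state=<kHz> cpu_id=<n>`. The function name, the sample lines and the cut-off value are made up for illustration; this is not the script used for the numbers in this mail.

```python
import re

# Payload of the power:cpu_frequency tracepoint as printed in text form, e.g.:
#   <idle>-0     [007]   176.051234: cpu_frequency:   state=2000000 cpu_id=7
EVENT_RE = re.compile(
    r"\s(?P<ts>\d+\.\d+):\s+cpu_frequency:\s+state=(?P<khz>\d+)\s+cpu_id=(?P<cpu>\d+)"
)


def parse_cpu_freq(lines, cpu, start_ts=0.0):
    """Return [(timestamp_s, freq_khz), ...] for one CPU.

    Events earlier than start_ts are dropped, which is one way to cut
    off a warm-up phase.
    """
    events = []
    for line in lines:
        m = EVENT_RE.search(line)
        if m is None:
            continue  # not a cpu_frequency event
        if int(m.group("cpu")) != cpu:
            continue  # event belongs to another CPU
        ts = float(m.group("ts"))
        if ts < start_ts:
            continue  # still inside the warm-up phase
        events.append((ts, int(m.group("khz"))))
    return events


# Made-up sample lines, only to show the expected input shape.
sample = [
    "  <idle>-0     [007]   175.900000: cpu_frequency:        state=2000000 cpu_id=7",
    "  <idle>-0     [004]   176.060000: cpu_frequency:        state=1800000 cpu_id=4",
    "  kworker-12   [007]   176.051000: cpu_frequency:        state=3600000 cpu_id=7",
]
events = parse_cpu_freq(sample, cpu=7, start_ts=176.051)
```

Only the cpu7 event at or after the chosen start timestamp survives; the earlier cpu7 event and the cpu4 event are dropped.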
It looks like the 2 traces show a similar 'pattern' for that benchmark, which is good for analysis. If you align the timestamps 176.051s and 972.465s, then both plots (frequency changes over time) look similar.
There are some differences, though:
1. there are more dips in the frequency over time, so more often you would pay the extra penalty for the ramp-up again
2. some of the ramp-up phases are a bit longer: ~100ms instead of ~80ms going from 2GHz to 3.6GHz
3. there are idle phases missing in the trace, so we have to be careful when e.g. comparing avg frequency, because it might not reflect the computation actually delivered and might not explain the gap in the score.
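
One way to put a number on the ramp-up time from point 2 is sketched below. The definition used here is an assumption for illustration, not necessarily how the ~80ms and ~100ms figures were measured: a ramp starts at the first frequency change above a floor frequency and ends at the first change that reaches a target frequency. The function name and the sample events are hypothetical.

```python
def ramp_up_durations(events, low_khz=2_000_000, high_khz=3_600_000):
    """Return the duration (s) of each climb from the floor to the target.

    events is a time-ordered list of (timestamp_s, freq_khz) change events
    for a single CPU.
    """
    durations = []
    at_floor = False   # last seen frequency was <= low_khz
    ramp_start = None  # timestamp of the first step above the floor
    for ts, khz in events:
        if khz <= low_khz:
            # Back at the floor: any ramp in progress is abandoned.
            at_floor, ramp_start = True, None
            continue
        if at_floor:
            # First step off the floor starts the clock.
            at_floor, ramp_start = False, ts
        if ramp_start is not None and khz >= high_khz:
            durations.append(ts - ramp_start)
            ramp_start = None
    return durations


# Made-up events: one gradual ramp, then one direct jump to the target.
example = [
    (0.0, 2_000_000),
    (1.0, 2_400_000),
    (1.25, 3_000_000),
    (1.5, 3_600_000),
    (2.0, 2_000_000),
    (3.0, 3_600_000),
]
ramps = ramp_up_durations(example)
```

A direct jump from the floor to the target counts as a zero-length ramp under this definition, so more dips with slow ramps show up as more and larger entries in the result.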
Here are the stats:

1. revert:
          frequency
count  1.318000e+03
mean   2.932240e+06
std    5.434045e+05
min    2.000000e+06
50%    3.000000e+06
85%    3.600000e+06
90%    3.626000e+06
95%    3.626000e+06
99%    3.626000e+06
max    3.626000e+06

2. base:
          frequency
count  1.551000e+03
mean   2.809391e+06
std    5.369750e+05
min    2.000000e+06
50%    2.800000e+06
85%    3.500000e+06
90%    3.600000e+06
95%    3.626000e+06
99%    3.626000e+06
max    3.626000e+06
A better indication in this case would be a comparison of the frequency residency over time, especially for the max freq:
1. revert: 11.92s
2. base: 9.11s

So the 'revert' run spends 2.8s longer at fmax, even though the 'base' run takes longer to finish the Speedometer 2 test.
Here is some detail about that run*:

+---------------+---------------------+---------------+----------------+
| Trace         | Total Trace         | Time at Max   | % of Total     |
|               | Duration (s)        | Freq (s)      | Time           |
+---------------+---------------------+---------------+----------------+
| Base Trace    | 24.72               | 9.11          | 36.9%          |
| Revert Trace  | 22.88               | 11.92         | 52.1%          |
+---------------+---------------------+---------------+----------------+

*We don't know the idle periods which might happen for those frequencies
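
The residency numbers above can be derived along these lines. This is a rough sketch under the assumption that a frequency stays in effect from one change event until the next one on the same CPU; as noted, idle periods are not visible in this data, so the result is an upper bound on the time actually spent running at each frequency. Function names and the sample events are hypothetical.

```python
from collections import defaultdict


def residency(events, end_ts):
    """Sum the time (s) spent at each frequency between change events.

    events is a time-ordered list of (timestamp_s, freq_khz) for one CPU;
    end_ts closes the interval opened by the last event.
    """
    totals = defaultdict(float)
    closing = events[1:] + [(end_ts, None)]
    for (ts, khz), (next_ts, _) in zip(events, closing):
        totals[khz] += next_ts - ts
    return dict(totals)


def share_at_max(events, end_ts):
    """Return (seconds at the highest seen frequency, percent of the trace)."""
    res = residency(events, end_ts)
    total = end_ts - events[0][0]
    fmax = max(res)
    return res[fmax], 100.0 * res[fmax] / total


# Made-up events: 1s at 2GHz, 2s at 3.626GHz, 1s at 2GHz.
example = [(0.0, 2_000_000), (1.0, 3_626_000), (3.0, 2_000_000)]
seconds_at_max, percent_at_max = share_at_max(example, end_ts=4.0)

# The percentages in the table follow from the reported durations.
base_pct = round(100 * 9.11 / 24.72, 1)
revert_pct = round(100 * 11.92 / 22.88, 1)
```

Unlike the per-event statistics, this weights each frequency by how long it was held, which is why it is less sensitive to the number of frequency changes in each trace.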
I wonder whether you have the fix patch for util_est in your kernel... That fix has recently been backported to 6.6 stable [1].
You might want to try that patch as well, w/ or w/o this revert. IMHO it might be worth having it on top. It might help the main Chrome task ('CrRendererMain') to stay longer on the biggest cpu, since the util_est would be higher. You can read the discussion that I had back then with PeterZ and VincentG [2].
Regards, Lukasz
[1] https://lore.kernel.org/stable/20251121130232.828187990@linuxfoundation.org/
[2] https://lore.kernel.org/lkml/20230912142821.GA22166@noisy.programming.kicks-...