On Mon, Apr 14, 2025 at 11:38:15AM -0400, Chris Mason wrote:
> On 4/14/25 5:08 AM, Peter Zijlstra wrote:
> [ math and such ]
> > The zero_vruntime patch I gave earlier should avoid this particular issue.
> Here's a crash with the zero runtime patch.
And indeed it doesn't have these massive (negative) avg_vruntime values.
> I'm trying to reproduce this outside of prod so we can crank up the iteration speed a bit.
Thanks.
Could you add which pick went boom for the next dump?
I am, however, slightly confused by this output format.
It looks like it dumps the cfs_rq the first time it encounters it, either through curr or through the tree.
So if I read this correctly, the root is something like:
  nr_running = 2
  zero_vruntime = 19194347104893960
  avg_vruntime = 6044054790
  avg_load = 2
  curr = {
    cgroup urgent
    vruntime = 24498183812106172
    weight = 3561684 => 3478
  }
  tasks_timeline = [
    {
      cgroup optional
      vruntime = 19194350126921355
      weight = 1168 => 2
    },
  ]

  group 19194347104893960
  curr 24498183812106172 3561684
  entity 19194350126921355 1168
But if I run those numbers, I get avg_load == 1, seeing how 1168/1024 = 1. But the thing says it should be 2.
Similarly, my avg_vruntime is exactly half of what it says.
  avg_vruntime: 3022027395
  avg_load: 1
(seeing how 19194350126921355-19194347104893960 = 3022027395)
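The arithmetic above can be replayed with a small Python sketch. The helper names and the shift of 10 are illustrative assumptions, not taken from the dump tool. The second scaling function, which floors non-zero weights at 2, is likewise only an assumption; it is included because the dump's own "1168 => 2" suggests the scaled weight is 2 rather than 1, and with that value the sums come out as the dump reports them.

```python
# Sketch: recompute the root cfs_rq sums from the tree entity in the dump.
SHIFT = 10  # assumption: weights are scaled down by 1024


def scale_plain(weight):
    # Plain integer division by 1024, as in "1168/1024 = 1".
    return weight >> SHIFT


def scale_clamped(weight):
    # Hypothetical variant: a non-zero weight never scales below 2.
    return max(2, weight >> SHIFT) if weight else 0


def tree_sums(zero_vruntime, entities, scale):
    """Return (avg_vruntime, avg_load) summed over the tree entities."""
    avg_vruntime = 0
    avg_load = 0
    for vruntime, weight in entities:
        w = scale(weight)
        avg_vruntime += (vruntime - zero_vruntime) * w
        avg_load += w
    return avg_vruntime, avg_load


ROOT_ZERO = 19194347104893960
ROOT_TREE = [(19194350126921355, 1168)]  # cgroup optional

plain = tree_sums(ROOT_ZERO, ROOT_TREE, scale_plain)
clamped = tree_sums(ROOT_ZERO, ROOT_TREE, scale_clamped)
print("plain:  ", plain)
print("clamped:", clamped)
```

The plain variant gives the hand-computed pair; the clamped variant gives exactly double, which is the pair printed in the dump.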
Anyway, with curr being significantly to the right of that, the 0-lag point is well right of where optional sits. So this pick should be fine, and result in 'optional' getting selected (curr is no longer eligible).
All the urgent/* groups have nr_running == 0, so are not interesting, we'll not pick there.
NOTE: I'm inferring curr is on_rq, because nr_running == 2 and the tree only has 1 entity in it.
NOTE: if we ignore curr, then optional sits at exactly the 0-lag point, with either set of numbers, and so should be eligible.
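A minimal sketch of the eligibility reasoning for the root, for both sets of numbers. It assumes an entity is eligible when avg >= key * load, with an on_rq curr folded into both sums first; the function and variable names are made up for illustration. Python integers do not wrap, so this checks the arithmetic as written and says nothing about fixed-width behaviour.

```python
# Sketch: eligibility at the root, with and without curr folded in.
def eligible(key, avg_vruntime, avg_load, curr=None):
    """key is vruntime - zero_vruntime; curr is (key, scaled_weight) or None."""
    avg, load = avg_vruntime, avg_load
    if curr is not None:
        curr_key, curr_weight = curr
        avg += curr_key * curr_weight
        load += curr_weight
    return avg >= key * load


ZERO = 19194347104893960
optional_key = 19194350126921355 - ZERO
curr_key = 24498183812106172 - ZERO
curr = (curr_key, 3478)  # cgroup urgent, scaled weight from the dump

# (avg_vruntime, avg_load) as dumped, and as computed by hand.
number_sets = {"dump": (6044054790, 2), "by hand": (3022027395, 1)}

results = {}
for name, (avg, load) in number_sets.items():
    results[name] = {
        "optional, with curr": eligible(optional_key, avg, load, curr),
        "curr, with curr": eligible(curr_key, avg, load, curr),
        "optional, ignoring curr": eligible(optional_key, avg, load),
    }
    print(name, results[name])
```

With either set of numbers, 'optional' comes out eligible and curr does not.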
This then leaves us the optional/* groups.
  cgroup optional rq = {
    nr_running = 2
    zero_vruntime = 440280059357029
    avg_vruntime = 476
    avg_load = 688
    tasks_timeline = [
      { cgroup optional/-610613050111295488 vruntime = 440280059333960 weight = 291271 => 284 },
      { cgroup optional/-610609318858457088 vruntime = 440280059373247 weight = 413911 => 404 },

  group 440280059357029
  entity 440280059333960 291271
  entity 440280059373247 413911
Which gives:
  avg_vruntime: 476
  avg_load: 688
And that matches.
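Spelled out per entity, as a short Python sketch (variable names are illustrative; the scaled weights 284 and 404 are the ones printed in the dump):

```python
# Sketch: per-entity contributions to avg_vruntime for 'cgroup optional'.
ZERO = 440280059357029
ENTITIES = [
    (440280059333960, 284),  # optional/-610613050111295488
    (440280059373247, 404),  # optional/-610609318858457088
]

# Each term is (vruntime - zero_vruntime) * scaled_weight.
terms = [(vruntime - ZERO) * weight for vruntime, weight in ENTITIES]
avg_vruntime = sum(terms)
avg_load = sum(weight for _, weight in ENTITIES)

print(terms)
print(avg_vruntime, avg_load)
```

The two terms nearly cancel, one entity sitting left of zero_vruntime and the other right of it, which leaves the small positive remainder.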
Next we have:
  cgroup optional/-610613050111295488 rq = {
    nr_running = 5
    zero_vruntime = 65179829005
    avg_vruntime = 0
    avg_load = 75
    tasks_timeline = [
      { task = 261672 (fc0) vruntime = 65189926507 weight = 15360 => 15 },
      { task = 261332 (fc0) vruntime = 65189480962 weight = 15360 => 15 },
      { task = 261329 (enc1:0:vp9_fbv) vruntime = 65165843516 weight = 15360 => 15 },
      { task = 261334 (dec0:0:hevc_fbv) vruntime = 65174065035 weight = 15360 => 15 },
      { task = 261868 (fc0) vruntime = 65179829005 weight = 15360 => 15 },
    ]
  }

  avg_vruntime: 0
  avg_load: 75
This again matches, leaving the bottom 3 tasks eligible.
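The same check for the five tasks, as a Python sketch. It assumes a task is eligible when avg_vruntime >= key * avg_load, with no curr on this runqueue since all five entities are in the tree; the names are illustrative.

```python
# Sketch: sums and eligible set for cgroup optional/-610613050111295488.
ZERO = 65179829005
SHIFT = 10  # 15360 >> 10 == 15, matching the dump

TASKS = [
    (261672, 65189926507, 15360),
    (261332, 65189480962, 15360),
    (261329, 65165843516, 15360),
    (261334, 65174065035, 15360),
    (261868, 65179829005, 15360),
]

avg_vruntime = 0
avg_load = 0
for pid, vruntime, weight in TASKS:
    w = weight >> SHIFT
    avg_vruntime += (vruntime - ZERO) * w
    avg_load += w

# A task is eligible when it sits at or left of the weighted average.
eligible_pids = [
    pid for pid, vruntime, _ in TASKS
    if avg_vruntime >= (vruntime - ZERO) * avg_load
]

print(avg_vruntime, avg_load)
print(eligible_pids)
```

The offsets of the first two tasks are cancelled exactly by the next two, so the average lands on zero_vruntime and the last three entries in the list are the eligible ones.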
And finally:
  cgroup optional/-610609318858457088 rq = {
    nr_running = 1
    zero_vruntime = 22819875784
    avg_vruntime = 0
    avg_load = 15
    tasks_timeline = [
      { task = 273291 (fc0) vruntime = 22819875784 weight = 15360 => 15 },
    ]
  }
Rather boring indeed, but the numbers appear correct.
So I'm not immediately seeing where it would go boom, but seeing how the root group is the one with dodgy numbers, I would suspect that -- but I'm not immediately seeing how... :-(