On Wed, 2025-01-22 at 03:11 -0800, Ron Economos wrote:
On 1/22/25 02:59, Peter Zijlstra wrote:
On Wed, Jan 22, 2025 at 11:56:13AM +0100, Arnd Bergmann wrote:
On Wed, Jan 22, 2025, at 11:04, Naresh Kamboju wrote:
On Tue, 21 Jan 2025 at 23:28, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
0000000000000000
<4>[ 160.712071] Call trace:
<4>[ 160.712597] place_entity (kernel/sched/fair.c:5250 (discriminator 1))
<4>[ 160.713221] reweight_entity (kernel/sched/fair.c:3813)
<4>[ 160.713802] update_cfs_group (kernel/sched/fair.c:3975 (discriminator 1))
<4>[ 160.714277] dequeue_entities (kernel/sched/fair.c:7091)
<4>[ 160.714903] dequeue_task_fair (kernel/sched/fair.c:7144 (discriminator 1))
<4>[ 160.716502] move_queued_task.isra.0 (kernel/sched/core.c:2437 (discriminator 1))
I don't see anything that immediately sticks out as causing this, but I do see five scheduler patches backported in stable-rc on top of v6.12.8. These are the original commits:
66951e4860d3 ("sched/fair: Fix update_cfs_group() vs DELAY_DEQUEUE")
This one reworks reweight_entity(), but I've been running with that on top of 13-rc6 for a week or so and not seen this.
The offending commit is 6d71a9c6160479899ee744d2c6d6602a191deb1f ("sched/fair: Fix EEVDF entity placement bug causing scheduling lag").
It works fine on 6.13, at least on RISC-V (which is the only arch I test).
Seems 6.13 is gripe free thanks to it containing 4423af84b297.
I stumbled upon a reproducer for my x86_64 desktop box: all I need to do is fire up a kvm guest in an enterprise-configured host. That inspires libvirt goop to engage group scheduling, and the splat follows instantly.
Back 4423af84b297 out of 6.13 and it starts griping; add it to a 6.12 tree containing 6d71a9c61604 and it stops doing so.
It's already been reverted and 6.12.11-rc2 has been pushed out.
So stable should perhaps take 4423af84b297 along with 6d71a9c61604?
-Mike