Hi,
I noticed a regression report on Bugzilla [1]. Quoting from it:
Hello, we recently got a few kernel crashes with the following backtrace. It happened on 6.4.12 (and 6.4.11, I think) but did not happen (I think) on 6.4.4.
[293790.928007] ------------[ cut here ]------------
[293790.929905] rq->clock_update_flags & RQCF_ACT_SKIP
[293790.929919] WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
[293790.933694] Modules linked in: xt_owner(E) xt_REDIRECT(E) mptcp_diag(E) xsk_diag(E) raw_diag(E) unix_diag(E) af_packet_diag(E) netlink_diag(E) tcp_diag(E) udp_diag(E) inet_diag(E) rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) nfs(E) lockd(E) grace(E) fscache(E) netfs(E) nbd(E) rbd(E) libceph(E) dns_resolver(E) xt_set(E) ipt_rpfilter(E) ip_set_hash_ip(E) ip_set_hash_net(E) bpf_preload(E) xt_multiport(E) veth(E) wireguard(E) libchacha20poly1305(E) chacha_x86_64(E) poly1305_x86_64(E) ip6_udp_tunnel(E) udp_tunnel(E) curve25519_x86_64(E) libcurve25519_generic(E) libchacha(E) nf_conntrack_netlink(E) xt_nat(E) xt_statistic(E) xt_addrtype(E) ipt_REJECT(E) nf_reject_ipv4(E) ip_set(E) ip_vs_sh(E) ip_vs_wrr(E) ip_vs_rr(E) ip_vs(E) xt_MASQUERADE(E) nft_chain_nat(E) xt_mark(E) xt_conntrack(E) xt_comment(E) nft_compat(E) nf_tables(E) nfnetlink(E) br_netfilter(E) bridge(E) stp(E) llc(E) iptable_nat(E) nf_nat(E) iptable_filter(E) ip_tables(E) overlay(E) dummy(E) sunrpc(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E)
[293790.933738] binfmt_misc(E) tls(E) isofs(E) intel_rapl_msr(E) intel_rapl_common(E) kvm_amd(E) virtio_gpu(E) ccp(E) virtio_dma_buf(E) drm_shmem_helper(E) virtio_net(E) vfat(E) kvm(E) i2c_i801(E) drm_kms_helper(E) net_failover(E) irqbypass(E) syscopyarea(E) fat(E) i2c_smbus(E) failover(E) sysfillrect(E) virtio_balloon(E) sysimgblt(E) drm(E) fuse(E) ext4(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sg(E) ahci(E) libahci(E) crct10dif_pclmul(E) crc32_pclmul(E) polyval_clmulni(E) libata(E) polyval_generic(E) ghash_clmulni_intel(E) sha512_ssse3(E) virtio_blk(E) serio_raw(E) btrfs(E) xor(E) zstd_compress(E) raid6_pq(E) libcrc32c(E) crc32c_intel(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[293790.946262] Unloaded tainted modules: edac_mce_amd(E):1
[293790.956625] CPU: 13 PID: 3837105 Comm: QueryWorker-30f Tainted: G W E 6.4.12-1.gdc.el9.x86_64 #1
[293790.957963] Hardware name: RDO OpenStack Compute/RHEL, BIOS edk2-20230301gitf80f052277c8-2.el9 03/01/2023
[293790.959681] RIP: 0010:__cfsb_csd_unthrottle+0x149/0x160
[293790.960933] Code: 37 fa ff 0f 0b e9 17 ff ff ff 80 3d 41 59 fc 01 00 0f 85 21 ff ff ff 48 c7 c7 98 03 95 9d c6 05 2d 59 fc 01 01 e8 77 37 fa ff <0f> 0b 41 8b 85 88 09 00 00 e9 00 ff ff ff 66 0f 1f 84 00 00 00 00
[293790.964077] RSP: 0000:ffffb708e7217db8 EFLAGS: 00010086
[293790.965160] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
[293790.966340] RDX: ffff905482c5f708 RSI: 0000000000000001 RDI: ffff905482c5f700
[293790.967839] RBP: ffff9029bb0e9e00 R08: 0000000000000000 R09: 00000000ffff7fff
[293790.969496] R10: ffffb708e7217c58 R11: ffffffff9e3e2c88 R12: 00000000000317c0
[293790.970859] R13: ffff903602c317c0 R14: 0000000000000000 R15: ffff905482c726b8
[293790.972085] FS: 00007ff3b66fe640(0000) GS:ffff905482c40000(0000) knlGS:0000000000000000
[293790.973678] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[293790.974663] CR2: 00007f16889036c0 CR3: 0000002072e34004 CR4: 0000000000770ee0
[293790.976108] PKRU: 55555554
[293790.977048] Call Trace:
[293790.978013] <TASK>
[293790.978678] ? __warn+0x80/0x130
[293790.979727] ? __cfsb_csd_unthrottle+0x149/0x160
[293790.980824] ? report_bug+0x195/0x1a0
[293790.981806] ? handle_bug+0x3c/0x70
[293790.982884] ? exc_invalid_op+0x14/0x70
[293790.983837] ? asm_exc_invalid_op+0x16/0x20
[293790.984626] ? __cfsb_csd_unthrottle+0x149/0x160
[293790.985599] ? __cfsb_csd_unthrottle+0x149/0x160
[293790.986583] unregister_fair_sched_group+0x73/0x1d0
[293790.987682] sched_unregister_group_rcu+0x1a/0x40
[293790.988752] rcu_do_batch+0x199/0x4d0
[293790.989643] rcu_core+0x267/0x420
[293790.990418] __do_softirq+0xc8/0x2ab
[293790.991285] __irq_exit_rcu+0xb9/0xf0
[293790.992555] sysvec_apic_timer_interrupt+0x3c/0x90
[293790.993477] asm_sysvec_apic_timer_interrupt+0x16/0x20
[293790.994171] RIP: 0033:0x7ff4dca91f60
[293790.994801] Code: 75 15 49 8b f7 c5 f8 77 49 ba 80 6c bf f3 f4 7f 00 00 41 ff d2 eb 0d 4b 89 7c 13 f8 49 83 c2 f8 4d 89 57 70 48 8b c3 c5 f8 77 <48> 83 c4 50 5d 4d 8b 97 08 01 00 00 41 85 02 c3 49 8d 14 fc 8b 7a
[293790.997256] RSP: 002b:00007ff3b66fd190 EFLAGS: 00000246
[293790.998138] RAX: 0000000655fd6ed0 RBX: 0000000655fd6ed0 RCX: 0000000000000004
[293790.999184] RDX: 0000000000000000 RSI: 000000066cf4939c RDI: 00007ff4f1180eb7
[293791.000220] RBP: 0000000000000004 R08: 000000066cf48530 R09: 000000066cf493a8
[293791.001274] R10: 00000000000007f0 R11: 00007ff3bc00ca80 R12: 0000000000000000
[293791.002222] R13: 000000066cf49390 R14: 00000000cd9e9272 R15: 00007ff39c033800
[293791.002966] </TASK>
[293791.003489] ---[ end trace 0000000000000000 ]---
[293791.004440] ------------[ cut here ]------------
[293791.005479] rq->clock_update_flags < RQCF_ACT_SKIP
[293791.005493] WARNING: CPU: 0 PID: 3920513 at kernel/sched/sched.h:1496 update_curr+0x162/0x1d0
Sadly I don't have more info but hopefully this stacktrace will be enough.
See Bugzilla for the full thread.
Anyway, I'm adding this regression to regzbot:
#regzbot introduced: ebb83d84e49b54 https://bugzilla.kernel.org/show_bug.cgi?id=217843
Thanks.
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217843
On 30/08/2023 07:37, Bagas Sanjaya wrote:
[293790.956625] CPU: 13 PID: 3837105 Comm: QueryWorker-30f Tainted: G W E 6.4.12-1.gdc.el9.x86_64 #1
[293790.957963] Hardware name: RDO OpenStack Compute/RHEL, BIOS edk2-20230301gitf80f052277c8-2.el9 03/01/2023
To Igor: Does vanilla v6.4.12 also exhibit this regression?
Bagas Sanjaya <bagasdotme@gmail.com> writes:
Hi,
I noticed a regression report on Bugzilla [1]. Quoting from it:
Hello, we recently got a few kernel crashes with the following backtrace. It happened on 6.4.12 (and 6.4.11, I think) but did not happen (I think) on 6.4.4.
[293790.928007] ------------[ cut here ]------------
[293790.929905] rq->clock_update_flags & RQCF_ACT_SKIP
[293790.929919] WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
[293790.933694] Modules linked in: [...]
[293790.946262] Unloaded tainted modules: edac_mce_amd(E):1
[293790.956625] CPU: 13 PID: 3837105 Comm: QueryWorker-30f Tainted: G W E 6.4.12-1.gdc.el9.x86_64 #1
[293790.957963] Hardware name: RDO OpenStack Compute/RHEL, BIOS edk2-20230301gitf80f052277c8-2.el9 03/01/2023
[293790.959681] RIP: 0010:__cfsb_csd_unthrottle+0x149/0x160
See Bugzilla for the full thread.
Anyway, I'm adding this regression to regzbot:
#regzbot introduced: ebb83d84e49b54 https://bugzilla.kernel.org/show_bug.cgi?id=217843
Thanks.
The code in question is literally "rq_lock; update_rq_clock; rq_clock_start_loop_update (the warning)", which suggests to me that RQCF_ACT_SKIP is somehow leaking from somewhere else?
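For readers without the tree open, the sequence being described is the top of __cfsb_csd_unthrottle(). Roughly (a simplified sketch of the v6.4-era kernel/sched/fair.c with the loop body elided, not a verbatim copy):

static void __cfsb_csd_unthrottle(void *arg)
{
	struct rq *rq = arg;
	struct rq_flags rf;

	rq_lock(rq, &rf);			/* take the target CPU's rq lock */

	/*
	 * Update the clock once up front so the unthrottle loop below
	 * does not update it again for every throttled cfs_rq.
	 */
	update_rq_clock(rq);
	rq_clock_start_loop_update(rq);		/* SCHED_WARN_ON(flags & RQCF_ACT_SKIP) fires here */

	/* ... walk the per-rq list of throttled cfs_rq's and unthrottle each ... */

	rq_clock_stop_loop_update(rq);
	rq_unlock(rq, &rf);
}

So the warning fires right after this path has itself taken the lock and called update_rq_clock(), which is why a leftover RQCF_ACT_SKIP from some other lock holder is the suspect.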
On 2023/8/31 Benjamin Segall wrote:
Hi,
Bagas Sanjaya <bagasdotme@gmail.com> writes:
Hi,
I noticed a regression report on Bugzilla [1]. Quoting from it:
Hello, we recently got a few kernel crashes with the following backtrace. It happened on 6.4.12 (and 6.4.11, I think) but did not happen (I think) on 6.4.4.
[293790.928007] ------------[ cut here ]------------
[293790.929905] rq->clock_update_flags & RQCF_ACT_SKIP
[293790.929919] WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
[293790.933694] Modules linked in: [...]
[293790.946262] Unloaded tainted modules: edac_mce_amd(E):1
[293790.956625] CPU: 13 PID: 3837105 Comm: QueryWorker-30f Tainted: G W E 6.4.12-1.gdc.el9.x86_64 #1
[293790.957963] Hardware name: RDO OpenStack Compute/RHEL, BIOS edk2-20230301gitf80f052277c8-2.el9 03/01/2023
[293790.959681] RIP: 0010:__cfsb_csd_unthrottle+0x149/0x160
See Bugzilla for the full thread.
Anyway, I'm adding this regression to regzbot:
#regzbot introduced: ebb83d84e49b54 https://bugzilla.kernel.org/show_bug.cgi?id=217843
Thanks.
The code in question is literally "rq_lock; update_rq_clock; rq_clock_start_loop_update (the warning)", which suggests to me that RQCF_ACT_SKIP is somehow leaking from somewhere else?
If I understand correctly, rq->clock_update_flags may be set to RQCF_ACT_SKIP after __schedule() takes the rq lock, and the rq lock may be released briefly within __schedule(), for example in newidle_balance(). At that point another CPU can take this rq lock, and calling rq_clock_start_loop_update() there may trigger this warning.
This warning check might be wrong. We need to add assert_clock_updated() to check that the rq clock has been updated before calling rq_clock_start_loop_update().
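For reference, the flag values involved look roughly like this (paraphrased from kernel/sched/sched.h of that era; treat the comments as a sketch, not a quote):

/*
 * rq->clock_update_flags bits (kernel/sched/sched.h, v6.4-era, paraphrased):
 *
 *  RQCF_REQ_SKIP - a request to skip the clock update on the next
 *                  __schedule(), to avoid back-to-back updates.
 *  RQCF_ACT_SKIP - the skip is currently in effect: update_rq_clock()
 *                  calls are being ignored.
 *  RQCF_UPDATED  - debug flag: update_rq_clock() has been called since
 *                  the rq lock was last pinned.
 */
#define RQCF_REQ_SKIP	0x01
#define RQCF_ACT_SKIP	0x02
#define RQCF_UPDATED	0x04

Early in __schedule(), "rq->clock_update_flags <<= 1;" promotes REQ_SKIP (0x01) into ACT_SKIP (0x02), and the flag is only cleared much later (in context_switch() or on the else branch), so any CPU that briefly takes the rq lock in between, e.g. while newidle_balance() has dropped it, still sees ACT_SKIP set.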
Maybe some things can be like this?
From: Hao Jia <jiahao.os@bytedance.com>
Date: Thu, 31 Aug 2023 11:38:54 +0800
Subject: [PATCH] sched/core: Fix wrong warning check in rq_clock_start_loop_update()
Commit ebb83d84e49b54 ("sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()") added the "rq->clock_update_flags & RQCF_ACT_SKIP" warning in rq_clock_start_loop_update(). But this warning is inaccurate and may be triggered incorrectly in the following situation:
CPU0                                      CPU1

__schedule()
  *rq->clock_update_flags <<= 1;*       unregister_fair_sched_group()
  pick_next_task_fair+0x4a/0x410          destroy_cfs_bandwidth()
    newidle_balance+0x115/0x3e0             for_each_possible_cpu(i) *i=0*
      rq_unpin_lock(this_rq, rf)              __cfsb_csd_unthrottle()
      raw_spin_rq_unlock(this_rq)
                                              rq_lock(*CPU0_rq*, &rf)
                                              rq_clock_start_loop_update()
                                              rq->clock_update_flags & RQCF_ACT_SKIP <--

      raw_spin_rq_lock(this_rq)
So remove this incorrect check. Add assert_clock_updated() to verify that the rq clock has been updated before calling rq_clock_start_loop_update(). Use the new rq_clock_flags argument of rq_clock_start_loop_update() to record the previous state of rq->clock_update_flags, and correspondingly restore rq->clock_update_flags from rq_clock_flags in rq_clock_stop_loop_update() so that its previous contents are not lost.
Fixes: ebb83d84e49b ("sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()")
Cc: stable@vger.kernel.org
Reported-by: Igor Raits <igor.raits@gmail.com>
Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
---
 kernel/sched/fair.c  | 10 ++++++----
 kernel/sched/sched.h | 12 +++++++-----
 2 files changed, 13 insertions(+), 9 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 911d0063763c..0f6557c82a4c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5679,6 +5679,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 #ifdef CONFIG_SMP
 static void __cfsb_csd_unthrottle(void *arg)
 {
+	unsigned int rq_clock_flags;
 	struct cfs_rq *cursor, *tmp;
 	struct rq *rq = arg;
 	struct rq_flags rf;
@@ -5691,7 +5692,7 @@ static void __cfsb_csd_unthrottle(void *arg)
 	 * Do it once and skip the potential next ones.
 	 */
 	update_rq_clock(rq);
-	rq_clock_start_loop_update(rq);
+	rq_clock_start_loop_update(rq, &rq_clock_flags);
 
 	/*
 	 * Since we hold rq lock we're safe from concurrent manipulation of
@@ -5712,7 +5713,7 @@ static void __cfsb_csd_unthrottle(void *arg)
 
 	rcu_read_unlock();
 
-	rq_clock_stop_loop_update(rq);
+	rq_clock_stop_loop_update(rq, &rq_clock_flags);
 	rq_unlock(rq, &rf);
 }
 
@@ -6230,6 +6231,7 @@ static void __maybe_unused update_runtime_enabled(struct rq *rq)
 /* cpu offline callback */
 static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 {
+	unsigned int rq_clock_flags;
 	struct task_group *tg;
 
 	lockdep_assert_rq_held(rq);
@@ -6239,7 +6241,7 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 	 * set_rq_offline(), so we should skip updating
 	 * the rq clock again in unthrottle_cfs_rq().
 	 */
-	rq_clock_start_loop_update(rq);
+	rq_clock_start_loop_update(rq, &rq_clock_flags);
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(tg, &task_groups, list) {
@@ -6264,7 +6266,7 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 	}
 	rcu_read_unlock();
 
-	rq_clock_stop_loop_update(rq);
+	rq_clock_stop_loop_update(rq, &rq_clock_flags);
 }
 
 bool cfs_task_bw_constrained(struct task_struct *p)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 04846272409c..ff2864f202f5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1558,20 +1558,22 @@ static inline void rq_clock_cancel_skipupdate(struct rq *rq)
  * when using list_for_each_entry_*)
  * rq_clock_start_loop_update() can be called after updating the clock
  * once and before iterating over the list to prevent multiple update.
+ * And use @rq_clock_flags to record the previous state of rq->clock_update_flags.
  * After the iterative traversal, we need to call rq_clock_stop_loop_update()
- * to clear RQCF_ACT_SKIP of rq->clock_update_flags.
+ * to restore rq->clock_update_flags through @rq_clock_flags.
  */
-static inline void rq_clock_start_loop_update(struct rq *rq)
+static inline void rq_clock_start_loop_update(struct rq *rq, unsigned int *rq_clock_flags)
 {
 	lockdep_assert_rq_held(rq);
-	SCHED_WARN_ON(rq->clock_update_flags & RQCF_ACT_SKIP);
+	assert_clock_updated(rq);
+	*rq_clock_flags = rq->clock_update_flags;
 	rq->clock_update_flags |= RQCF_ACT_SKIP;
 }
 
-static inline void rq_clock_stop_loop_update(struct rq *rq)
+static inline void rq_clock_stop_loop_update(struct rq *rq, unsigned int *rq_clock_flags)
 {
 	lockdep_assert_rq_held(rq);
-	rq->clock_update_flags &= ~RQCF_ACT_SKIP;
+	rq->clock_update_flags = *rq_clock_flags;
 }
 
 struct rq_flags {
--
2.20.1
On Thu, Aug 31, 2023 at 04:48:29PM +0800, Hao Jia wrote:
If I understand correctly, rq->clock_update_flags may be set to RQCF_ACT_SKIP after __schedule() takes the rq lock, and the rq lock may be released briefly within __schedule(), for example in newidle_balance(). At that point another CPU can take this rq lock, and calling rq_clock_start_loop_update() there may trigger this warning.
This warning check might be wrong. We need to add assert_clock_updated() to check that the rq clock has been updated before calling rq_clock_start_loop_update().
Maybe some things can be like this?
Urgh, aside from it being white space mangled, I think this is entirely going in the wrong direction.
Leaking ACT_SKIP is dodgy as heck.. it's entirely too late to think clearly though, I'll have to try again tomorrow.
On 2023/9/5 Peter Zijlstra wrote:
On Thu, Aug 31, 2023 at 04:48:29PM +0800, Hao Jia wrote:
If I understand correctly, rq->clock_update_flags may be set to RQCF_ACT_SKIP after __schedule() takes the rq lock, and the rq lock may be released briefly within __schedule(), for example in newidle_balance(). At that point another CPU can take this rq lock, and calling rq_clock_start_loop_update() there may trigger this warning.
This warning check might be wrong. We need to add assert_clock_updated() to check that the rq clock has been updated before calling rq_clock_start_loop_update().
Maybe some things can be like this?
Urgh, aside from it being white space mangled, I think this is entirely going in the wrong direction.
Leaking ACT_SKIP is dodgy as heck.. it's entirely too late to think clearly though, I'll have to try again tomorrow.
Hi Peter,
Do you think this fix method is correct? Or should we go back to the beginning and move update_rq_clock() from unthrottle_cfs_rq()?
Thanks, Hao
On Thu, 2023-09-07 at 16:59 +0800, Hao Jia wrote:
On 2023/9/5 Peter Zijlstra wrote:
On Thu, Aug 31, 2023 at 04:48:29PM +0800, Hao Jia wrote:
If I understand correctly, rq->clock_update_flags may be set to RQCF_ACT_SKIP after __schedule() takes the rq lock, and the rq lock may be released briefly within __schedule(), for example in newidle_balance(). At that point another CPU can take this rq lock, and calling rq_clock_start_loop_update() there may trigger this warning.
This warning check might be wrong. We need to add assert_clock_updated() to check that the rq clock has been updated before calling rq_clock_start_loop_update().
Maybe some things can be like this?
Urgh, aside from it being white space mangled, I think this is entirely going in the wrong direction.
Leaking ACT_SKIP is dodgy as heck.. it's entirely too late to think clearly though, I'll have to try again tomorrow.
I am trying to understand why this is an ACT_SKIP leak. Before the call to __cfsb_csd_unthrottle(), is it possible that someone else locked the runqueue, set ACT_SKIP, and released the rq lock? And then that someone never updated the rq clock?
Hi Peter,
Do you think this fix method is correct? Or should we go back to the beginning and move update_rq_clock() from unthrottle_cfs_rq()?
If anyone who locks the runqueue and sets ACT_SKIP also updates the rq clock, I think your change is okay. Otherwise the rq clock could miss an update.
Thanks.
Tim
On 2023/9/8 Tim Chen wrote:
On Thu, 2023-09-07 at 16:59 +0800, Hao Jia wrote:
On 2023/9/5 Peter Zijlstra wrote:
On Thu, Aug 31, 2023 at 04:48:29PM +0800, Hao Jia wrote:
If I understand correctly, rq->clock_update_flags may be set to RQCF_ACT_SKIP after __schedule() takes the rq lock, and the rq lock may be released briefly within __schedule(), for example in newidle_balance(). At that point another CPU can take this rq lock, and calling rq_clock_start_loop_update() there may trigger this warning.
This warning check might be wrong. We need to add assert_clock_updated() to check that the rq clock has been updated before calling rq_clock_start_loop_update().
Maybe some things can be like this?
Urgh, aside from it being white space mangled, I think this is entirely going in the wrong direction.
Leaking ACT_SKIP is dodgy as heck.. it's entirely too late to think clearly though, I'll have to try again tomorrow.
I am trying to understand why this is an ACT_SKIP leak. Before the call to __cfsb_csd_unthrottle(), is it possible that someone else locked the runqueue, set ACT_SKIP, and released the rq lock? And then that someone never updated the rq clock?
Yes, we want to set rq->clock_update_flags to RQCF_ACT_SKIP to avoid updating the rq clock multiple times in __cfsb_csd_unthrottle().
But now we have found an ACT_SKIP leak, so we cannot unconditionally set rq->clock_update_flags to RQCF_ACT_SKIP in rq_clock_start_loop_update().
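For anyone following along, the reason a stray ACT_SKIP matters at all is that update_rq_clock() silently becomes a no-op while the flag is set. A trimmed sketch of the v6.4-era kernel/sched/core.c version (debug-only bits omitted):

void update_rq_clock(struct rq *rq)
{
	s64 delta;

	lockdep_assert_rq_held(rq);

	/* The whole point of RQCF_ACT_SKIP: pretend the clock is fresh. */
	if (rq->clock_update_flags & RQCF_ACT_SKIP)
		return;

	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
	if (delta < 0)
		return;
	rq->clock += delta;
	update_rq_clock_task(rq, delta);
}

So setting ACT_SKIP around the unthrottle loop is exactly what makes the per-cfs_rq update_rq_clock() calls cheap, but if the flag leaks to or from another lock holder, that holder keeps running on a stale clock, which is the risk Tim raised.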
Hi Peter,
Do you think this fix method is correct? Or should we go back to the beginning and move update_rq_clock() from unthrottle_cfs_rq()?
If anyone who locks the runqueue and sets ACT_SKIP also updates the rq clock, I think your change is okay. Otherwise the rq clock could miss an update.
Thanks.
Tim
The following commit has been merged into the sched/core branch of tip:
Commit-ID:     5ebde09d91707a4a9bec1e3d213e3c12ffde348f
Gitweb:        https://git.kernel.org/tip/5ebde09d91707a4a9bec1e3d213e3c12ffde348f
Author:        Hao Jia <jiahao.os@bytedance.com>
AuthorDate:    Thu, 12 Oct 2023 17:00:03 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 24 Oct 2023 10:38:42 +02:00
sched/core: Fix RQCF_ACT_SKIP leak
Igor Raits and Bagas Sanjaya report a RQCF_ACT_SKIP leak warning.
This warning may be triggered in the following situations:
CPU0                                      CPU1

__schedule()
  *rq->clock_update_flags <<= 1;*       unregister_fair_sched_group()
  pick_next_task_fair+0x4a/0x410          destroy_cfs_bandwidth()
    newidle_balance+0x115/0x3e0             for_each_possible_cpu(i) *i=0*
      rq_unpin_lock(this_rq, rf)              __cfsb_csd_unthrottle()
      raw_spin_rq_unlock(this_rq)
                                              rq_lock(*CPU0_rq*, &rf)
                                              rq_clock_start_loop_update()
                                              rq->clock_update_flags & RQCF_ACT_SKIP <--

      raw_spin_rq_lock(this_rq)
The purpose of RQCF_ACT_SKIP is to skip the rq clock update, but the update happens very early in __schedule(), while RQCF_*_SKIP is only cleared very late, causing the flag to span the gap above and trigger this warning.
In __schedule() we can clear the RQCF_*_SKIP flags immediately after update_rq_clock() to avoid this RQCF_ACT_SKIP leak warning, and set rq->clock_update_flags to RQCF_UPDATED to avoid the rq->clock_update_flags < RQCF_ACT_SKIP warning that may be triggered later.
Fixes: ebb83d84e49b ("sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()")
Closes: https://lore.kernel.org/all/20230913082424.73252-1-jiahao.os@bytedance.com
Reported-by: Igor Raits <igor.raits@gmail.com>
Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/a5dd536d-041a-2ce9-f4b7-64d8d85c86dc@gmail.com
---
 kernel/sched/core.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 264c2eb..dc724f5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5361,8 +5361,6 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	/* switch_mm_cid() requires the memory barriers above. */
 	switch_mm_cid(rq, prev, next);
 
-	rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
-
 	prepare_lock_switch(rq, next, rf);
 
 	/* Here we just switch the register state and the stack. */
@@ -6600,6 +6598,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 	/* Promote REQ to ACT */
 	rq->clock_update_flags <<= 1;
 	update_rq_clock(rq);
+	rq->clock_update_flags = RQCF_UPDATED;
 
 	switch_count = &prev->nivcsw;
 
@@ -6679,8 +6678,6 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 		/* Also unlocks the rq: */
 		rq = context_switch(rq, prev, next, &rf);
 	} else {
-		rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
-
 		rq_unpin_lock(rq, &rf);
 		__balance_callbacks(rq);
 		raw_spin_rq_unlock_irq(rq);
[TLDR: This mail in primarily relevant for Linux regression tracking. A change or fix related to the regression discussed in this thread was posted or applied, but it did not use a Closes: tag to point to the report, as Linus and the documentation call for. Things happen, no worries -- but now the regression tracking bot needs to be told manually about the fix. See link in footer if these mails annoy you.]
On 30.08.23 02:37, Bagas Sanjaya wrote:
I noticed a regression report on Bugzilla [1]. Quoting from it:
[...]
#regzbot introduced: ebb83d84e49b54 https://bugzilla.kernel.org/show_bug.cgi?id=217843
#regzbot fix: 5ebde09d91707a4a9bec
#regzbot ignore-activity
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.