From: Kuniyuki Iwashima <kuniyu(a)amazon.com>
commit e8c526f2bdf1845bedaf6a478816a3d06fa78b8f upstream.
Martin KaFai Lau reported a use-after-free [0] in reqsk_timer_handler().
"""
We are seeing a use-after-free from a bpf prog attached to
trace_tcp_retransmit_synack. The program passes the req->sk to the
bpf_sk_storage_get_tracing kernel helper which does check for null
before using it.
"""
Commit 83fccfc3940c ("inet: fix potential deadlock in
reqsk_queue_unlink()") added a timer_pending() check in
reqsk_queue_unlink() so that del_timer_sync() is not called from
reqsk_timer_handler(), but it introduced a small race window.
Before the timer handler is invoked, expire_timers() calls
detach_timer(timer, true), which clears timer->entry.pprev and marks the
timer as not pending.
If reqsk_queue_unlink() checks timer_pending() just after expire_timers()
calls detach_timer(), TCP will miss del_timer_sync(); the reqsk timer will
continue running and send multiple SYN+ACKs until it expires.
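A condensed timeline of the race (sketch; expire_timers(), detach_timer()
and call_timer_fn() live in kernel/time/timer.c):

    CPU A (timer softirq)               CPU B (reqsk_queue_unlink())
    ---------------------               ----------------------------
    expire_timers()
      detach_timer(timer, true)
        // timer_pending() -> false
                                        timer_pending(&req->rsk_timer)
                                          // false, del_timer_sync() skipped
      call_timer_fn()
        reqsk_timer_handler()
          // re-arms itself; a later expiration
          // can touch a req->sk that was close()d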
The reported UAF could happen if req->sk is close()d earlier than the timer
expiration, which is 63s by default.
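(For reference, the 63s default falls out of the reqsk timer's exponential
backoff: TCP_TIMEOUT_INIT of 1s, doubled on each SYN+ACK retransmission,
with the default tcp_synack_retries of 5 gives 1 + 2 + 4 + 8 + 16 + 32 =
63 seconds.)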
The scenario would be:
1. inet_csk_complete_hashdance() calls inet_csk_reqsk_queue_drop(),
   but del_timer_sync() is missed
2. reqsk timer is executed and scheduled again
3. req->sk is accept()ed and reqsk_put() decrements rsk_refcnt, but
   the reqsk timer still holds another reference, and inet_csk_accept()
   does not clear req->sk for non-TFO sockets
4. sk is close()d
5. reqsk timer is executed again, and BPF touches req->sk
Let's avoid relying on timer_pending() by passing the caller context to
__inet_csk_reqsk_queue_drop().
Note that the reqsk timer is pinned, so the issue does not occur in most
use cases. [1]
[0]
BUG: KFENCE: use-after-free read in bpf_sk_storage_get_tracing+0x2e/0x1b0
Use-after-free read at 0x00000000a891fb3a (in kfence-#1):
bpf_sk_storage_get_tracing+0x2e/0x1b0
bpf_prog_5ea3e95db6da0438_tcp_retransmit_synack+0x1d20/0x1dda
bpf_trace_run2+0x4c/0xc0
tcp_rtx_synack+0xf9/0x100
reqsk_timer_handler+0xda/0x3d0
run_timer_softirq+0x292/0x8a0
irq_exit_rcu+0xf5/0x320
sysvec_apic_timer_interrupt+0x6d/0x80
asm_sysvec_apic_timer_interrupt+0x16/0x20
intel_idle_irq+0x5a/0xa0
cpuidle_enter_state+0x94/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb
kfence-#1: 0x00000000a72cc7b6-0x00000000d97616d9, size=2376, cache=TCPv6
allocated by task 0 on cpu 9 at 260507.901592s:
sk_prot_alloc+0x35/0x140
sk_clone_lock+0x1f/0x3f0
inet_csk_clone_lock+0x15/0x160
tcp_create_openreq_child+0x1f/0x410
tcp_v6_syn_recv_sock+0x1da/0x700
tcp_check_req+0x1fb/0x510
tcp_v6_rcv+0x98b/0x1420
ipv6_list_rcv+0x2258/0x26e0
napi_complete_done+0x5b1/0x2990
mlx5e_napi_poll+0x2ae/0x8d0
net_rx_action+0x13e/0x590
irq_exit_rcu+0xf5/0x320
common_interrupt+0x80/0x90
asm_common_interrupt+0x22/0x40
cpuidle_enter_state+0xfb/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb
freed by task 0 on cpu 9 at 260507.927527s:
rcu_core_si+0x4ff/0xf10
irq_exit_rcu+0xf5/0x320
sysvec_apic_timer_interrupt+0x6d/0x80
asm_sysvec_apic_timer_interrupt+0x16/0x20
cpuidle_enter_state+0xfb/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb
Fixes: 83fccfc3940c ("inet: fix potential deadlock in reqsk_queue_unlink()")
Reported-by: Martin KaFai Lau <martin.lau(a)kernel.org>
Closes: https://lore.kernel.org/netdev/eb6684d0-ffd9-4bdc-9196-33f690c25824@linux.d…
Link: https://lore.kernel.org/netdev/b55e2ca0-42f2-4b7c-b445-6ffd87ca74a0@linux.d… [1]
Signed-off-by: Kuniyuki Iwashima <kuniyu(a)amazon.com>
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Reviewed-by: Martin KaFai Lau <martin.lau(a)kernel.org>
Link: https://patch.msgid.link/20241014223312.4254-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
[d.privalov: adapt calling __inet_csk_reqsk_queue_drop()]
Signed-off-by: Dmitriy Privalov <d.privalov(a)omp.ru>
---
Backport fix for CVE-2024-50154
Link: https://nvd.nist.gov/vuln/detail/CVE-2024-50154
---
v3: Return using timer_delete_sync()
v2: Fix patch applying
net/ipv4/inet_connection_sock.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 1dfa561e8f981..d912894ad4adc 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -700,21 +700,31 @@ static bool reqsk_queue_unlink(struct request_sock *req)
found = __sk_nulls_del_node_init_rcu(req_to_sk(req));
spin_unlock(lock);
}
- if (timer_pending(&req->rsk_timer) && del_timer_sync(&req->rsk_timer))
- reqsk_put(req);
+
return found;
}
-bool inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req)
+static bool __inet_csk_reqsk_queue_drop(struct sock *sk,
+ struct request_sock *req,
+ bool from_timer)
{
bool unlinked = reqsk_queue_unlink(req);
+ if (!from_timer && timer_delete_sync(&req->rsk_timer))
+ reqsk_put(req);
+
if (unlinked) {
reqsk_queue_removed(&inet_csk(sk)->icsk_accept_queue, req);
reqsk_put(req);
}
+
return unlinked;
}
+
+bool inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req)
+{
+ return __inet_csk_reqsk_queue_drop(sk, req, false);
+}
EXPORT_SYMBOL(inet_csk_reqsk_queue_drop);
void inet_csk_reqsk_queue_drop_and_put(struct sock *sk, struct request_sock *req)
@@ -781,7 +791,8 @@ static void reqsk_timer_handler(struct timer_list *t)
return;
}
drop:
- inet_csk_reqsk_queue_drop_and_put(sk_listener, req);
+ __inet_csk_reqsk_queue_drop(sk_listener, req, true);
+ reqsk_put(req);
}
static void reqsk_queue_hash_req(struct request_sock *req,
--
2.34.1
One of the intended ways to enable input MTU auto-selection for L2CAP
connections is to pass the special value 0 for it as a socket option.
Commit [1] added such a setting to avdtp. However, it simply wouldn't
work, because the kernel still treats the specified value as invalid and
denies the setting attempt. Recorded BlueZ logs include the
following:
bluetoothd[496]: profiles/audio/avdtp.c:l2cap_connect() setsockopt(L2CAP_OPTIONS): Invalid argument (22)
[1]: https://github.com/bluez/bluez/commit/ae5be371a9f53fed33d2b34748a95a5498fd4…
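For reference, a minimal userspace sketch of the setting this patch makes
valid (assumes BlueZ's <bluetooth/l2cap.h>; error handling trimmed):

    #include <sys/socket.h>
    #include <bluetooth/bluetooth.h>
    #include <bluetooth/l2cap.h>

    int set_auto_imtu(int sk)
    {
            struct l2cap_options opts;
            socklen_t optlen = sizeof(opts);

            if (getsockopt(sk, SOL_L2CAP, L2CAP_OPTIONS, &opts, &optlen) < 0)
                    return -1;

            opts.imtu = 0;  /* 0 = let the kernel auto-tune the input MTU */

            /* before this fix, l2cap_valid_mtu() rejected 0 with EINVAL */
            return setsockopt(sk, SOL_L2CAP, L2CAP_OPTIONS, &opts, sizeof(opts));
    }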
Found by Linux Verification Center (linuxtesting.org).
Fixes: 4b6e228e297b ("Bluetooth: Auto tune if input MTU is set to 0")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
net/bluetooth/l2cap_sock.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index 49f97d4138ea..46ea0bee2259 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -710,12 +710,12 @@ static bool l2cap_valid_mtu(struct l2cap_chan *chan, u16 mtu)
{
switch (chan->scid) {
case L2CAP_CID_ATT:
- if (mtu < L2CAP_LE_MIN_MTU)
+ if (mtu && mtu < L2CAP_LE_MIN_MTU)
return false;
break;
default:
- if (mtu < L2CAP_DEFAULT_MIN_MTU)
+ if (mtu && mtu < L2CAP_DEFAULT_MIN_MTU)
return false;
}
--
2.39.5
The patch titled
Subject: mm/hugetlb: fix hugepage allocation for interleaved memory nodes
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-hugetlb-fix-hugepage-allocation-for-interleaved-memory-nodes.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Ritesh Harjani (IBM)" <ritesh.list(a)gmail.com>
Subject: mm/hugetlb: fix hugepage allocation for interleaved memory nodes
Date: Sat, 11 Jan 2025 16:36:55 +0530
gather_bootmem_prealloc() assumes the start nid is 0 and the size is
num_node_state(N_MEMORY). That means if the memory-attached NUMA nodes
are interleaved, gather_bootmem_prealloc_parallel() will fail to scan
some of them.
Since memory-attached NUMA nodes can be interleaved in any fashion, make
the code check all NUMA node ids (.size = nr_node_ids). Keep max_threads
as num_node_state(N_MEMORY), so that the work for all nr_node_ids is
distributed among that many threads.
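To make the failure concrete, a sketch of the counts for the qemu layout
below, where only nodeid=1 is backed by memory:

    nr_node_ids              == 2   /* node ids 0 and 1 exist */
    num_node_state(N_MEMORY) == 1   /* only node 1 has memory */

    /* pre-patch, the scan covers node ids [0, 1) and never visits
     * node 1, so its pre-allocated 1G pages are never gathered */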
e.g. qemu cmdline
========================
numa_cmd="-numa node,nodeid=1,memdev=mem1,cpus=2-3 -numa node,nodeid=0,cpus=0-1 -numa dist,src=0,dst=1,val=20"
mem_cmd="-object memory-backend-ram,id=mem1,size=16G"
w/o this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
==========================
~ # cat /proc/meminfo |grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 0 kB
with this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
===========================
~ # cat /proc/meminfo |grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 2
HugePages_Free: 2
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 2097152 kB
Link: https://lkml.kernel.org/r/f8d8dad3a5471d284f54185f65d575a6aaab692b.17365925…
Fixes: b78b27d02930 ("hugetlb: parallelize 1G hugetlb initialization")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Reported-by: Pavithra Prakash <pavrampu(a)linux.ibm.com>
Suggested-by: Muchun Song <muchun.song(a)linux.dev>
Tested-by: Sourabh Jain <sourabhjain(a)linux.ibm.com>
Reviewed-by: Luiz Capitulino <luizcap(a)redhat.com>
Cc: Donet Tom <donettom(a)linux.ibm.com>
Cc: Gang Li <gang.li(a)linux.dev>
Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-hugepage-allocation-for-interleaved-memory-nodes
+++ a/mm/hugetlb.c
@@ -3309,7 +3309,7 @@ static void __init gather_bootmem_preall
.thread_fn = gather_bootmem_prealloc_parallel,
.fn_arg = NULL,
.start = 0,
- .size = num_node_state(N_MEMORY),
+ .size = nr_node_ids,
.align = 1,
.min_chunk = 1,
.max_threads = num_node_state(N_MEMORY),
_
Patches currently in -mm which might be from ritesh.list(a)gmail.com are
mm-hugetlb-fix-hugepage-allocation-for-interleaved-memory-nodes.patch
From: K Prateek Nayak <kprateek.nayak(a)amd.com>
commit 6675ce20046d149e1e1ffe7e9577947dee17aad5 upstream.
do_softirq_post_smp_call_flush() on PREEMPT_RT kernels carries a
WARN_ON_ONCE() for any SOFTIRQ being raised from an SMP-call-function.
Since do_softirq_post_smp_call_flush() is called with preempt disabled,
raising a SOFTIRQ during flush_smp_call_function_queue() can lead to
longer preempt disabled sections.
Since commit b2a02fc43a1f ("smp: Optimize
send_call_function_single_ipi()"), IPIs to an idle CPU in
TIF_POLLING_NRFLAG mode can be optimized out by instead setting the
TIF_NEED_RESCHED bit in the idle task's thread_info and relying on
flush_smp_call_function_queue() in the idle-exit path to run the
SMP-call-function.
To trigger idle load balancing, the scheduler queues nohz_csd_function(),
responsible for triggering an idle load balance on a target nohz idle
CPU, and sends an IPI. With the above optimization this IPI can be
elided, and the SMP-call-function is executed from
flush_smp_call_function_queue() in do_idle(), which can raise a
SCHED_SOFTIRQ to trigger the balancing.
So far this went undetected since the need_resched() check in
nohz_csd_function() would make it bail out of idle load balancing early,
as the idle thread does not clear TIF_POLLING_NRFLAG before calling
flush_smp_call_function_queue(). The need_resched() check was added with
the intent to catch a new task wakeup; however, it has recently been
discovered to be unnecessary and will be removed in the subsequent
commit, after which nohz_csd_function() can raise a SCHED_SOFTIRQ from
flush_smp_call_function_queue() to trigger an idle load balance on an
idle target in TIF_POLLING_NRFLAG mode.
nohz_csd_function() bails out early if the idle_cpu() check for the
target CPU fails, and it does not lock the target CPU's rq until the very
end, once it has found tasks to run on the CPU; it therefore will not
inhibit the wakeup, or the running, of a newly woken higher-priority
task. Account for this and prevent a WARN_ON_ONCE() when SCHED_SOFTIRQ is
raised from flush_smp_call_function_queue().
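A sketch of the tolerance this patch adds (values illustrative; the check
becomes WARN_ON_ONCE(was_pending != (is_pending & ~SCHED_SOFTIRQ_MASK))):

    was_pending == 0, is_pending == BIT(SCHED_SOFTIRQ)
            -> no warning, invoke_softirq() flushes it
    was_pending == 0, is_pending == BIT(SCHED_SOFTIRQ) | BIT(NET_RX_SOFTIRQ)
            -> WARN_ON_ONCE(), since NET_RX_SOFTIRQ was not pending
               before the flush and is not exempted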
Signed-off-by: K Prateek Nayak <kprateek.nayak(a)amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lore.kernel.org/r/20241119054432.6405-2-kprateek.nayak@amd.com
Tested-by: Felix Moessbauer <felix.moessbauer(a)siemens.com>
Signed-off-by: Florian Bezdeka <florian.bezdeka(a)siemens.com>
---
Newer stable branches (6.12, 6.6) got this already, 5.10 and lower are
not affected.
The warning triggered for SCHED_SOFTIRQ under high network load while
testing.
kernel/softirq.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index a47396161843..6665f5cd60cb 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -294,17 +294,24 @@ static inline void invoke_softirq(void)
wakeup_softirqd();
}
+#define SCHED_SOFTIRQ_MASK BIT(SCHED_SOFTIRQ)
+
/*
* flush_smp_call_function_queue() can raise a soft interrupt in a function
- * call. On RT kernels this is undesired and the only known functionality
- * in the block layer which does this is disabled on RT. If soft interrupts
- * get raised which haven't been raised before the flush, warn so it can be
+ * call. On RT kernels this is undesired and the only known functionalities
+ * are in the block layer which is disabled on RT, and in the scheduler for
+ * idle load balancing. If soft interrupts get raised which haven't been
+ * raised before the flush, warn if it is not a SCHED_SOFTIRQ so it can be
* investigated.
*/
void do_softirq_post_smp_call_flush(unsigned int was_pending)
{
- if (WARN_ON_ONCE(was_pending != local_softirq_pending()))
+ unsigned int is_pending = local_softirq_pending();
+
+ if (unlikely(was_pending != is_pending)) {
+ WARN_ON_ONCE(was_pending != (is_pending & ~SCHED_SOFTIRQ_MASK));
invoke_softirq();
+ }
}
#else /* CONFIG_PREEMPT_RT */
--
2.39.5
When comparing to the ARM list [1], it appears that several ARM cores
were missing from the lists in spectre_bhb_loop_affected(). Add them.
NOTE: for some of these cores it may not matter since other ways of
clearing the BHB may be used (like the CLRBHB instruction or ECBHB),
but it still seems good to have all the info from ARM's whitepaper
included.
[1] https://developer.arm.com/Arm%20Security%20Center/Spectre-BHB
Fixes: 558c303c9734 ("arm64: Mitigate spectre style branch history side channels")
Cc: stable(a)vger.kernel.org
Signed-off-by: Douglas Anderson <dianders(a)chromium.org>
---
(no changes since v3)
Changes in v3:
- New
arch/arm64/kernel/proton-pack.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/proton-pack.c b/arch/arm64/kernel/proton-pack.c
index 89405be53d8f..0f51fd10b4b0 100644
--- a/arch/arm64/kernel/proton-pack.c
+++ b/arch/arm64/kernel/proton-pack.c
@@ -876,6 +876,14 @@ static u8 spectre_bhb_loop_affected(void)
{
u8 k = 0;
+ static const struct midr_range spectre_bhb_k132_list[] = {
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_X3),
+ MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V2),
+ };
+ static const struct midr_range spectre_bhb_k38_list[] = {
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A715),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A720),
+ };
static const struct midr_range spectre_bhb_k32_list[] = {
MIDR_ALL_VERSIONS(MIDR_CORTEX_A78),
MIDR_ALL_VERSIONS(MIDR_CORTEX_A78AE),
@@ -889,6 +897,7 @@ static u8 spectre_bhb_loop_affected(void)
};
static const struct midr_range spectre_bhb_k24_list[] = {
MIDR_ALL_VERSIONS(MIDR_CORTEX_A76),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A76AE),
MIDR_ALL_VERSIONS(MIDR_CORTEX_A77),
MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1),
MIDR_ALL_VERSIONS(MIDR_QCOM_KRYO_4XX_GOLD),
@@ -904,7 +913,11 @@ static u8 spectre_bhb_loop_affected(void)
{},
};
- if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k32_list))
+ if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k132_list))
+ k = 132;
+ else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k38_list))
+ k = 38;
+ else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k32_list))
k = 32;
else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k24_list))
k = 24;
--
2.47.1.613.gc27f4b7a9f-goog
The following commit has been merged into the locking/urgent branch of tip:
Commit-ID: b9a49520679e98700d3d89689cc91c08a1c88c1d
Gitweb: https://git.kernel.org/tip/b9a49520679e98700d3d89689cc91c08a1c88c1d
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Sun, 19 Jan 2025 00:55:32 +01:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Wed, 29 Jan 2025 15:21:31 +01:00
rcuref: Plug slowpath race in rcuref_put()
Kernel test robot reported an "imbalanced put" in the rcuref_put() slow
path, which turned out to be a false positive. Consider the following race:
ref = 0 (via rcuref_init(ref, 1))

T1                                      T2
rcuref_put(ref)
-> atomic_add_negative_release(-1, ref) # ref -> 0xffffffff
-> rcuref_put_slowpath(ref)
                                        rcuref_get(ref)
                                        -> atomic_add_negative_relaxed(1, &ref->refcnt)
                                        -> return true; # ref -> 0

                                        rcuref_put(ref)
                                        -> atomic_add_negative_release(-1, ref) # ref -> 0xffffffff
                                        -> rcuref_put_slowpath()
                                        -> cnt = atomic_read(&ref->refcnt); # cnt -> 0xffffffff / RCUREF_NOREF
                                        -> atomic_try_cmpxchg_release(&ref->refcnt, &cnt, RCUREF_DEAD) # ref -> 0xe0000000 / RCUREF_DEAD
                                        -> return true

-> cnt = atomic_read(&ref->refcnt);     # cnt -> 0xe0000000 / RCUREF_DEAD
-> if (cnt > RCUREF_RELEASED)           # 0xe0000000 > 0xc0000000
-> WARN_ONCE(cnt >= RCUREF_RELEASED, "rcuref - imbalanced put()")
The problem is the additional read in the slow path (after the count
decremented to RCUREF_NOREF), which can happen after the counter has been
marked RCUREF_DEAD.
Prevent this by reusing the return value of the decrement. Now every "final"
put uses RCUREF_NOREF in the slow path and attempts the final cmpxchg() to
RCUREF_DEAD.
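For following the hex values above, the counter zones as laid out in
include/linux/rcuref.h:

    RCUREF_ONEREF       0x00000000U
    RCUREF_MAXREF       0x7FFFFFFFU
    RCUREF_SATURATED    0xA0000000U
    RCUREF_RELEASED     0xC0000000U
    RCUREF_DEAD         0xE0000000U
    RCUREF_NOREF        0xFFFFFFFFU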
[ bigeasy: Add changelog ]
Fixes: ee1ee6db07795 ("atomics: Provide rcuref - scalable reference counting")
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Debugged-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Closes: https://lore.kernel.org/oe-lkp/202412311453.9d7636a2-lkp@intel.com
---
include/linux/rcuref.h | 9 ++++++---
lib/rcuref.c | 5 ++---
2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/include/linux/rcuref.h b/include/linux/rcuref.h
index 2c8bfd0..6322d8c 100644
--- a/include/linux/rcuref.h
+++ b/include/linux/rcuref.h
@@ -71,27 +71,30 @@ static inline __must_check bool rcuref_get(rcuref_t *ref)
return rcuref_get_slowpath(ref);
}
-extern __must_check bool rcuref_put_slowpath(rcuref_t *ref);
+extern __must_check bool rcuref_put_slowpath(rcuref_t *ref, unsigned int cnt);
/*
* Internal helper. Do not invoke directly.
*/
static __always_inline __must_check bool __rcuref_put(rcuref_t *ref)
{
+ int cnt;
+
RCU_LOCKDEP_WARN(!rcu_read_lock_held() && preemptible(),
"suspicious rcuref_put_rcusafe() usage");
/*
* Unconditionally decrease the reference count. The saturation and
* dead zones provide enough tolerance for this.
*/
- if (likely(!atomic_add_negative_release(-1, &ref->refcnt)))
+ cnt = atomic_sub_return_release(1, &ref->refcnt);
+ if (likely(cnt >= 0))
return false;
/*
* Handle the last reference drop and cases inside the saturation
* and dead zones.
*/
- return rcuref_put_slowpath(ref);
+ return rcuref_put_slowpath(ref, cnt);
}
/**
diff --git a/lib/rcuref.c b/lib/rcuref.c
index 97f300e..5bd726b 100644
--- a/lib/rcuref.c
+++ b/lib/rcuref.c
@@ -220,6 +220,7 @@ EXPORT_SYMBOL_GPL(rcuref_get_slowpath);
/**
* rcuref_put_slowpath - Slowpath of __rcuref_put()
* @ref: Pointer to the reference count
+ * @cnt: The resulting value of the fastpath decrement
*
* Invoked when the reference count is outside of the valid zone.
*
@@ -233,10 +234,8 @@ EXPORT_SYMBOL_GPL(rcuref_get_slowpath);
* with a concurrent get()/put() pair. Caller is not allowed to
* deconstruct the protected object.
*/
-bool rcuref_put_slowpath(rcuref_t *ref)
+bool rcuref_put_slowpath(rcuref_t *ref, unsigned int cnt)
{
- unsigned int cnt = atomic_read(&ref->refcnt);
-
/* Did this drop the last reference? */
if (likely(cnt == RCUREF_NOREF)) {
/*
Hi all, thanks for the replies! Understood the NACK, so let's drop this
backport.
Just not to let Greg's questions go unanswered: this was meant for
linux-6.6.y (sorry, I forgot to mention it), and the additional cast is
required because the type of `ctx->next_offset` is a u32 in the 6.6
version; it was later changed to unsigned long, which made it possible to
cast it to (void *) directly.
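For the record, the shape of the adaptation (illustrative snippet, not
the actual 6.6 code):

    u32 off = ctx->next_offset;             /* u32 in linux-6.6.y */
    void *p = (void *)(unsigned long)off;   /* the extra cast; a direct
                                               (void *)off would warn on
                                               64-bit builds */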
Best,
Andrea
From: Ramesh Thomas <ramesh.thomas(a)intel.com>
[ Upstream commit 2b938e3db335e3670475e31a722c2bee34748c5a ]
Definitions of the ioread64 and iowrite64 macros in asm/io.h, called by
the vfio pci implementations, are enclosed inside a check for
CONFIG_GENERIC_IOMAP: they don't get defined if CONFIG_GENERIC_IOMAP is
defined. Include linux/io-64-nonatomic-lo-hi.h to define the iowrite64
and ioread64 macros when they are not defined. io-64-nonatomic-lo-hi.h
maps the macros to the generic implementation in lib/iomap.c. The generic
implementation does a 64-bit rw if readq/writeq is defined for the
architecture; otherwise it does two back-to-back 32-bit accesses.
Note that there are two versions of the generic implementation, differing
in the order the 32-bit words are written when 64-bit support is not
present. This is not the little/big endian ordering, which is handled
separately. This patch uses the lo-followed-by-hi word ordering, which is
consistent with the current back-to-back implementation in the
vfio/pci code.
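Roughly, the lo-hi fallback has this shape (paraphrased sketch, not the
exact header/lib/iomap.c code):

    static inline void lo_hi_iowrite64(u64 val, void __iomem *addr)
    {
            /* no native 64-bit MMIO op: two 32-bit writes, low word first */
            iowrite32((u32)val, addr);
            iowrite32((u32)(val >> 32), addr + 4);
    }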
Signed-off-by: Ramesh Thomas <ramesh.thomas(a)intel.com>
Reviewed-by: Jason Gunthorpe <jgg(a)nvidia.com>
Link: https://lore.kernel.org/r/20241210131938.303500-2-ramesh.thomas@intel.com
Signed-off-by: Alex Williamson <alex.williamson(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/vfio/pci/vfio_pci_rdwr.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 83f81d24df78e..94e3fb9f42243 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -16,6 +16,7 @@
#include <linux/io.h>
#include <linux/vfio.h>
#include <linux/vgaarb.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
#include "vfio_pci_private.h"
--
2.39.5