From: Kuniyuki Iwashima <kuniyu(a)amazon.com>
commit e8c526f2bdf1845bedaf6a478816a3d06fa78b8f upstream.
Martin KaFai Lau reported a use-after-free [0] in reqsk_timer_handler().
"""
We are seeing a use-after-free from a bpf prog attached to
trace_tcp_retransmit_synack. The program passes the req->sk to the
bpf_sk_storage_get_tracing kernel helper which does check for null
before using it.
"""
Commit 83fccfc3940c ("inet: fix potential deadlock in
reqsk_queue_unlink()") added a timer_pending() check in
reqsk_queue_unlink() so that del_timer_sync() is not called from
reqsk_timer_handler(), but it introduced a small race window.
Before the timer handler is invoked, expire_timers() calls
detach_timer(timer, true), which clears timer->entry.pprev and marks the
timer as not pending.
If reqsk_queue_unlink() checks timer_pending() just after expire_timers()
calls detach_timer(), TCP will miss del_timer_sync(); the reqsk timer will
continue running and send multiple SYN+ACKs until it expires.
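A condensed timeline of the race (sketch; expire_timers(), detach_timer()
and call_timer_fn() live in kernel/time/timer.c):

    CPU A (timer softirq)               CPU B (reqsk_queue_unlink())
    ---------------------               ----------------------------
    expire_timers()
      detach_timer(timer, true)
        // timer_pending() -> false
                                        timer_pending(&req->rsk_timer)
                                          // false, del_timer_sync() skipped
      call_timer_fn()
        reqsk_timer_handler()
          // re-arms itself; a later expiration
          // can touch a req->sk that was close()d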
The reported UAF could happen if req->sk is close()d earlier than the timer
expiration, which is 63s by default.
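(For reference, the 63s default falls out of the reqsk timer's exponential
backoff: TCP_TIMEOUT_INIT of 1s, doubled on each SYN+ACK retransmission,
with the default tcp_synack_retries of 5 gives 1 + 2 + 4 + 8 + 16 + 32 =
63 seconds.)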
The scenario would be:
1. inet_csk_complete_hashdance() calls inet_csk_reqsk_queue_drop(),
   but del_timer_sync() is missed
2. reqsk timer is executed and scheduled again
3. req->sk is accept()ed and reqsk_put() decrements rsk_refcnt, but
   the reqsk timer still holds another reference, and inet_csk_accept()
   does not clear req->sk for non-TFO sockets
4. sk is close()d
5. reqsk timer is executed again, and BPF touches req->sk
Let's avoid relying on timer_pending() by passing the caller context to
__inet_csk_reqsk_queue_drop().
Note that the reqsk timer is pinned, so the issue does not occur in most
use cases. [1]
[0]
BUG: KFENCE: use-after-free read in bpf_sk_storage_get_tracing+0x2e/0x1b0
Use-after-free read at 0x00000000a891fb3a (in kfence-#1):
bpf_sk_storage_get_tracing+0x2e/0x1b0
bpf_prog_5ea3e95db6da0438_tcp_retransmit_synack+0x1d20/0x1dda
bpf_trace_run2+0x4c/0xc0
tcp_rtx_synack+0xf9/0x100
reqsk_timer_handler+0xda/0x3d0
run_timer_softirq+0x292/0x8a0
irq_exit_rcu+0xf5/0x320
sysvec_apic_timer_interrupt+0x6d/0x80
asm_sysvec_apic_timer_interrupt+0x16/0x20
intel_idle_irq+0x5a/0xa0
cpuidle_enter_state+0x94/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb
kfence-#1: 0x00000000a72cc7b6-0x00000000d97616d9, size=2376, cache=TCPv6
allocated by task 0 on cpu 9 at 260507.901592s:
sk_prot_alloc+0x35/0x140
sk_clone_lock+0x1f/0x3f0
inet_csk_clone_lock+0x15/0x160
tcp_create_openreq_child+0x1f/0x410
tcp_v6_syn_recv_sock+0x1da/0x700
tcp_check_req+0x1fb/0x510
tcp_v6_rcv+0x98b/0x1420
ipv6_list_rcv+0x2258/0x26e0
napi_complete_done+0x5b1/0x2990
mlx5e_napi_poll+0x2ae/0x8d0
net_rx_action+0x13e/0x590
irq_exit_rcu+0xf5/0x320
common_interrupt+0x80/0x90
asm_common_interrupt+0x22/0x40
cpuidle_enter_state+0xfb/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb
freed by task 0 on cpu 9 at 260507.927527s:
rcu_core_si+0x4ff/0xf10
irq_exit_rcu+0xf5/0x320
sysvec_apic_timer_interrupt+0x6d/0x80
asm_sysvec_apic_timer_interrupt+0x16/0x20
cpuidle_enter_state+0xfb/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb
Fixes: 83fccfc3940c ("inet: fix potential deadlock in reqsk_queue_unlink()")
Reported-by: Martin KaFai Lau <martin.lau(a)kernel.org>
Closes: https://lore.kernel.org/netdev/eb6684d0-ffd9-4bdc-9196-33f690c25824@linux.d…
Link: https://lore.kernel.org/netdev/b55e2ca0-42f2-4b7c-b445-6ffd87ca74a0@linux.d… [1]
Signed-off-by: Kuniyuki Iwashima <kuniyu(a)amazon.com>
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Reviewed-by: Martin KaFai Lau <martin.lau(a)kernel.org>
Link: https://patch.msgid.link/20241014223312.4254-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
[d.privalov: adapt calling __inet_csk_reqsk_queue_drop()]
Signed-off-by: Dmitriy Privalov <d.privalov(a)omp.ru>
---
Backport fix for CVE-2024-50154
Link: https://nvd.nist.gov/vuln/detail/CVE-2024-50154
---
v3: Return using timer_delete_sync()
v2: Fix patch applying
net/ipv4/inet_connection_sock.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 1dfa561e8f981..d912894ad4adc 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -700,21 +700,31 @@ static bool reqsk_queue_unlink(struct request_sock *req)
found = __sk_nulls_del_node_init_rcu(req_to_sk(req));
spin_unlock(lock);
}
- if (timer_pending(&req->rsk_timer) && del_timer_sync(&req->rsk_timer))
- reqsk_put(req);
+
return found;
}
-bool inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req)
+static bool __inet_csk_reqsk_queue_drop(struct sock *sk,
+ struct request_sock *req,
+ bool from_timer)
{
bool unlinked = reqsk_queue_unlink(req);
+ if (!from_timer && timer_delete_sync(&req->rsk_timer))
+ reqsk_put(req);
+
if (unlinked) {
reqsk_queue_removed(&inet_csk(sk)->icsk_accept_queue, req);
reqsk_put(req);
}
+
return unlinked;
}
+
+bool inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req)
+{
+ return __inet_csk_reqsk_queue_drop(sk, req, false);
+}
EXPORT_SYMBOL(inet_csk_reqsk_queue_drop);
void inet_csk_reqsk_queue_drop_and_put(struct sock *sk, struct request_sock *req)
@@ -781,7 +791,8 @@ static void reqsk_timer_handler(struct timer_list *t)
return;
}
drop:
- inet_csk_reqsk_queue_drop_and_put(sk_listener, req);
+ __inet_csk_reqsk_queue_drop(sk_listener, req, true);
+ reqsk_put(req);
}
static void reqsk_queue_hash_req(struct request_sock *req,
--
2.34.1
One of the intended ways to enable input MTU auto-selection for L2CAP
connections is to pass the special value 0 for it as a socket option.
Commit [1] added such a setting to avdtp. However, it simply wouldn't
work, because the kernel still treats the specified value as invalid and
denies the setting attempt. Recorded BlueZ logs include the
following:
bluetoothd[496]: profiles/audio/avdtp.c:l2cap_connect() setsockopt(L2CAP_OPTIONS): Invalid argument (22)
[1]: https://github.com/bluez/bluez/commit/ae5be371a9f53fed33d2b34748a95a5498fd4…
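For reference, a minimal userspace sketch of the setting this patch makes
valid (assumes BlueZ's <bluetooth/l2cap.h>; error handling trimmed):

    #include <sys/socket.h>
    #include <bluetooth/bluetooth.h>
    #include <bluetooth/l2cap.h>

    int set_auto_imtu(int sk)
    {
            struct l2cap_options opts;
            socklen_t optlen = sizeof(opts);

            if (getsockopt(sk, SOL_L2CAP, L2CAP_OPTIONS, &opts, &optlen) < 0)
                    return -1;

            opts.imtu = 0;  /* 0 = let the kernel auto-tune the input MTU */

            /* before this fix, l2cap_valid_mtu() rejected 0 with EINVAL */
            return setsockopt(sk, SOL_L2CAP, L2CAP_OPTIONS, &opts, sizeof(opts));
    }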
Found by Linux Verification Center (linuxtesting.org).
Fixes: 4b6e228e297b ("Bluetooth: Auto tune if input MTU is set to 0")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
net/bluetooth/l2cap_sock.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index 49f97d4138ea..46ea0bee2259 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -710,12 +710,12 @@ static bool l2cap_valid_mtu(struct l2cap_chan *chan, u16 mtu)
{
switch (chan->scid) {
case L2CAP_CID_ATT:
- if (mtu < L2CAP_LE_MIN_MTU)
+ if (mtu && mtu < L2CAP_LE_MIN_MTU)
return false;
break;
default:
- if (mtu < L2CAP_DEFAULT_MIN_MTU)
+ if (mtu && mtu < L2CAP_DEFAULT_MIN_MTU)
return false;
}
--
2.39.5
The patch titled
Subject: mm/hugetlb: fix hugepage allocation for interleaved memory nodes
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-hugetlb-fix-hugepage-allocation-for-interleaved-memory-nodes.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Ritesh Harjani (IBM)" <ritesh.list(a)gmail.com>
Subject: mm/hugetlb: fix hugepage allocation for interleaved memory nodes
Date: Sat, 11 Jan 2025 16:36:55 +0530
gather_bootmem_prealloc() assumes the start nid is 0 and the size is
num_node_state(N_MEMORY). That means if the memory-attached NUMA nodes
are interleaved, gather_bootmem_prealloc_parallel() will fail to scan
some of them.
Since memory-attached NUMA nodes can be interleaved in any fashion, make
the code check all NUMA node ids (.size = nr_node_ids). Keep max_threads
as num_node_state(N_MEMORY), so that the work for all nr_node_ids is
distributed among that many threads.
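To make the failure concrete, a sketch of the counts for the qemu layout
below, where only nodeid=1 is backed by memory:

    nr_node_ids              == 2   /* node ids 0 and 1 exist */
    num_node_state(N_MEMORY) == 1   /* only node 1 has memory */

    /* pre-patch, the scan covers node ids [0, 1) and never visits
     * node 1, so its pre-allocated 1G pages are never gathered */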
e.g. qemu cmdline
========================
numa_cmd="-numa node,nodeid=1,memdev=mem1,cpus=2-3 -numa node,nodeid=0,cpus=0-1 -numa dist,src=0,dst=1,val=20"
mem_cmd="-object memory-backend-ram,id=mem1,size=16G"
w/o this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
==========================
~ # cat /proc/meminfo |grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 0 kB
with this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
===========================
~ # cat /proc/meminfo |grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 2
HugePages_Free: 2
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 2097152 kB
Link: https://lkml.kernel.org/r/f8d8dad3a5471d284f54185f65d575a6aaab692b.17365925…
Fixes: b78b27d02930 ("hugetlb: parallelize 1G hugetlb initialization")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Reported-by: Pavithra Prakash <pavrampu(a)linux.ibm.com>
Suggested-by: Muchun Song <muchun.song(a)linux.dev>
Tested-by: Sourabh Jain <sourabhjain(a)linux.ibm.com>
Reviewed-by: Luiz Capitulino <luizcap(a)redhat.com>
Cc: Donet Tom <donettom(a)linux.ibm.com>
Cc: Gang Li <gang.li(a)linux.dev>
Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-hugepage-allocation-for-interleaved-memory-nodes
+++ a/mm/hugetlb.c
@@ -3309,7 +3309,7 @@ static void __init gather_bootmem_preall
.thread_fn = gather_bootmem_prealloc_parallel,
.fn_arg = NULL,
.start = 0,
- .size = num_node_state(N_MEMORY),
+ .size = nr_node_ids,
.align = 1,
.min_chunk = 1,
.max_threads = num_node_state(N_MEMORY),
_
Patches currently in -mm which might be from ritesh.list(a)gmail.com are
mm-hugetlb-fix-hugepage-allocation-for-interleaved-memory-nodes.patch
From: K Prateek Nayak <kprateek.nayak(a)amd.com>
commit 6675ce20046d149e1e1ffe7e9577947dee17aad5 upstream.
do_softirq_post_smp_call_flush() on PREEMPT_RT kernels carries a
WARN_ON_ONCE() for any SOFTIRQ being raised from an SMP-call-function.
Since do_softirq_post_smp_call_flush() is called with preempt disabled,
raising a SOFTIRQ during flush_smp_call_function_queue() can lead to
longer preempt disabled sections.
Since commit b2a02fc43a1f ("smp: Optimize
send_call_function_single_ipi()"), IPIs to an idle CPU in
TIF_POLLING_NRFLAG mode can be optimized out by instead setting the
TIF_NEED_RESCHED bit in the idle task's thread_info and relying on
flush_smp_call_function_queue() in the idle-exit path to run the
SMP-call-function.
To trigger idle load balancing, the scheduler queues nohz_csd_function(),
responsible for triggering an idle load balance on a target nohz idle
CPU, and sends an IPI. With the above optimization this IPI can be
elided, and the SMP-call-function is executed from
flush_smp_call_function_queue() in do_idle(), which can raise a
SCHED_SOFTIRQ to trigger the balancing.
So far this went undetected since the need_resched() check in
nohz_csd_function() would make it bail out of idle load balancing early,
as the idle thread does not clear TIF_POLLING_NRFLAG before calling
flush_smp_call_function_queue(). The need_resched() check was added with
the intent to catch a new task wakeup; however, it has recently been
discovered to be unnecessary and will be removed in the subsequent
commit, after which nohz_csd_function() can raise a SCHED_SOFTIRQ from
flush_smp_call_function_queue() to trigger an idle load balance on an
idle target in TIF_POLLING_NRFLAG mode.
nohz_csd_function() bails out early if the idle_cpu() check for the
target CPU fails, and it does not lock the target CPU's rq until the very
end, once it has found tasks to run on the CPU; it therefore will not
inhibit the wakeup, or the running, of a newly woken higher-priority
task. Account for this and prevent a WARN_ON_ONCE() when SCHED_SOFTIRQ is
raised from flush_smp_call_function_queue().
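A sketch of the tolerance this patch adds (values illustrative; the check
becomes WARN_ON_ONCE(was_pending != (is_pending & ~SCHED_SOFTIRQ_MASK))):

    was_pending == 0, is_pending == BIT(SCHED_SOFTIRQ)
            -> no warning, invoke_softirq() flushes it
    was_pending == 0, is_pending == BIT(SCHED_SOFTIRQ) | BIT(NET_RX_SOFTIRQ)
            -> WARN_ON_ONCE(), since NET_RX_SOFTIRQ was not pending
               before the flush and is not exempted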
Signed-off-by: K Prateek Nayak <kprateek.nayak(a)amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lore.kernel.org/r/20241119054432.6405-2-kprateek.nayak@amd.com
Tested-by: Felix Moessbauer <felix.moessbauer(a)siemens.com>
Signed-off-by: Florian Bezdeka <florian.bezdeka(a)siemens.com>
---
Newer stable branches (6.12, 6.6) got this already, 5.10 and lower are
not affected.
The warning triggered for SCHED_SOFTIRQ under high network load while
testing.
kernel/softirq.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index a47396161843..6665f5cd60cb 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -294,17 +294,24 @@ static inline void invoke_softirq(void)
wakeup_softirqd();
}
+#define SCHED_SOFTIRQ_MASK BIT(SCHED_SOFTIRQ)
+
/*
* flush_smp_call_function_queue() can raise a soft interrupt in a function
- * call. On RT kernels this is undesired and the only known functionality
- * in the block layer which does this is disabled on RT. If soft interrupts
- * get raised which haven't been raised before the flush, warn so it can be
+ * call. On RT kernels this is undesired and the only known functionalities
+ * are in the block layer which is disabled on RT, and in the scheduler for
+ * idle load balancing. If soft interrupts get raised which haven't been
+ * raised before the flush, warn if it is not a SCHED_SOFTIRQ so it can be
* investigated.
*/
void do_softirq_post_smp_call_flush(unsigned int was_pending)
{
- if (WARN_ON_ONCE(was_pending != local_softirq_pending()))
+ unsigned int is_pending = local_softirq_pending();
+
+ if (unlikely(was_pending != is_pending)) {
+ WARN_ON_ONCE(was_pending != (is_pending & ~SCHED_SOFTIRQ_MASK));
invoke_softirq();
+ }
}
#else /* CONFIG_PREEMPT_RT */
--
2.39.5
When comparing to the ARM list [1], it appears that several ARM cores
were missing from the lists in spectre_bhb_loop_affected(). Add them.
NOTE: for some of these cores it may not matter since other ways of
clearing the BHB may be used (like the CLRBHB instruction or ECBHB),
but it still seems good to have all the info from ARM's whitepaper
included.
[1] https://developer.arm.com/Arm%20Security%20Center/Spectre-BHB
Fixes: 558c303c9734 ("arm64: Mitigate spectre style branch history side channels")
Cc: stable(a)vger.kernel.org
Signed-off-by: Douglas Anderson <dianders(a)chromium.org>
---
(no changes since v3)
Changes in v3:
- New
arch/arm64/kernel/proton-pack.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/proton-pack.c b/arch/arm64/kernel/proton-pack.c
index 89405be53d8f..0f51fd10b4b0 100644
--- a/arch/arm64/kernel/proton-pack.c
+++ b/arch/arm64/kernel/proton-pack.c
@@ -876,6 +876,14 @@ static u8 spectre_bhb_loop_affected(void)
{
u8 k = 0;
+ static const struct midr_range spectre_bhb_k132_list[] = {
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_X3),
+ MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V2),
+ };
+ static const struct midr_range spectre_bhb_k38_list[] = {
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A715),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A720),
+ };
static const struct midr_range spectre_bhb_k32_list[] = {
MIDR_ALL_VERSIONS(MIDR_CORTEX_A78),
MIDR_ALL_VERSIONS(MIDR_CORTEX_A78AE),
@@ -889,6 +897,7 @@ static u8 spectre_bhb_loop_affected(void)
};
static const struct midr_range spectre_bhb_k24_list[] = {
MIDR_ALL_VERSIONS(MIDR_CORTEX_A76),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A76AE),
MIDR_ALL_VERSIONS(MIDR_CORTEX_A77),
MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1),
MIDR_ALL_VERSIONS(MIDR_QCOM_KRYO_4XX_GOLD),
@@ -904,7 +913,11 @@ static u8 spectre_bhb_loop_affected(void)
{},
};
- if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k32_list))
+ if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k132_list))
+ k = 132;
+ else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k38_list))
+ k = 38;
+ else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k32_list))
k = 32;
else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k24_list))
k = 24;
--
2.47.1.613.gc27f4b7a9f-goog
The following commit has been merged into the locking/urgent branch of tip:
Commit-ID: b9a49520679e98700d3d89689cc91c08a1c88c1d
Gitweb: https://git.kernel.org/tip/b9a49520679e98700d3d89689cc91c08a1c88c1d
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Sun, 19 Jan 2025 00:55:32 +01:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Wed, 29 Jan 2025 15:21:31 +01:00
rcuref: Plug slowpath race in rcuref_put()
Kernel test robot reported an "imbalanced put" in the rcuref_put() slow
path, which turned out to be a false positive. Consider the following race:
ref = 0 (via rcuref_init(ref, 1))

T1                                      T2
rcuref_put(ref)
-> atomic_add_negative_release(-1, ref) # ref -> 0xffffffff
-> rcuref_put_slowpath(ref)
                                        rcuref_get(ref)
                                        -> atomic_add_negative_relaxed(1, &ref->refcnt)
                                        -> return true; # ref -> 0

                                        rcuref_put(ref)
                                        -> atomic_add_negative_release(-1, ref) # ref -> 0xffffffff
                                        -> rcuref_put_slowpath()
                                        -> cnt = atomic_read(&ref->refcnt); # cnt -> 0xffffffff / RCUREF_NOREF
                                        -> atomic_try_cmpxchg_release(&ref->refcnt, &cnt, RCUREF_DEAD) # ref -> 0xe0000000 / RCUREF_DEAD
                                        -> return true

-> cnt = atomic_read(&ref->refcnt);     # cnt -> 0xe0000000 / RCUREF_DEAD
-> if (cnt > RCUREF_RELEASED)           # 0xe0000000 > 0xc0000000
-> WARN_ONCE(cnt >= RCUREF_RELEASED, "rcuref - imbalanced put()")
The problem is the additional read in the slow path (after the count
decremented to RCUREF_NOREF), which can happen after the counter has been
marked RCUREF_DEAD.
Prevent this by reusing the return value of the decrement. Now every "final"
put uses RCUREF_NOREF in the slow path and attempts the final cmpxchg() to
RCUREF_DEAD.
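For following the hex values above, the counter zones as laid out in
include/linux/rcuref.h:

    RCUREF_ONEREF       0x00000000U
    RCUREF_MAXREF       0x7FFFFFFFU
    RCUREF_SATURATED    0xA0000000U
    RCUREF_RELEASED     0xC0000000U
    RCUREF_DEAD         0xE0000000U
    RCUREF_NOREF        0xFFFFFFFFU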
[ bigeasy: Add changelog ]
Fixes: ee1ee6db07795 ("atomics: Provide rcuref - scalable reference counting")
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Debugged-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Closes: https://lore.kernel.org/oe-lkp/202412311453.9d7636a2-lkp@intel.com
---
include/linux/rcuref.h | 9 ++++++---
lib/rcuref.c | 5 ++---
2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/include/linux/rcuref.h b/include/linux/rcuref.h
index 2c8bfd0..6322d8c 100644
--- a/include/linux/rcuref.h
+++ b/include/linux/rcuref.h
@@ -71,27 +71,30 @@ static inline __must_check bool rcuref_get(rcuref_t *ref)
return rcuref_get_slowpath(ref);
}
-extern __must_check bool rcuref_put_slowpath(rcuref_t *ref);
+extern __must_check bool rcuref_put_slowpath(rcuref_t *ref, unsigned int cnt);
/*
* Internal helper. Do not invoke directly.
*/
static __always_inline __must_check bool __rcuref_put(rcuref_t *ref)
{
+ int cnt;
+
RCU_LOCKDEP_WARN(!rcu_read_lock_held() && preemptible(),
"suspicious rcuref_put_rcusafe() usage");
/*
* Unconditionally decrease the reference count. The saturation and
* dead zones provide enough tolerance for this.
*/
- if (likely(!atomic_add_negative_release(-1, &ref->refcnt)))
+ cnt = atomic_sub_return_release(1, &ref->refcnt);
+ if (likely(cnt >= 0))
return false;
/*
* Handle the last reference drop and cases inside the saturation
* and dead zones.
*/
- return rcuref_put_slowpath(ref);
+ return rcuref_put_slowpath(ref, cnt);
}
/**
diff --git a/lib/rcuref.c b/lib/rcuref.c
index 97f300e..5bd726b 100644
--- a/lib/rcuref.c
+++ b/lib/rcuref.c
@@ -220,6 +220,7 @@ EXPORT_SYMBOL_GPL(rcuref_get_slowpath);
/**
* rcuref_put_slowpath - Slowpath of __rcuref_put()
* @ref: Pointer to the reference count
+ * @cnt: The resulting value of the fastpath decrement
*
* Invoked when the reference count is outside of the valid zone.
*
@@ -233,10 +234,8 @@ EXPORT_SYMBOL_GPL(rcuref_get_slowpath);
* with a concurrent get()/put() pair. Caller is not allowed to
* deconstruct the protected object.
*/
-bool rcuref_put_slowpath(rcuref_t *ref)
+bool rcuref_put_slowpath(rcuref_t *ref, unsigned int cnt)
{
- unsigned int cnt = atomic_read(&ref->refcnt);
-
/* Did this drop the last reference? */
if (likely(cnt == RCUREF_NOREF)) {
/*
Hi all, thanks for the replies! Understood the NACK, so let's drop this
backport.
Just not to let Greg's questions go unanswered: this was meant for
linux-6.6.y (sorry, I forgot to mention it), and the additional cast is
required because the type of `ctx->next_offset` is a u32 in the 6.6
version; it was later changed to unsigned long, which made it possible to
cast it to (void *) directly.
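For the record, the shape of the adaptation (illustrative snippet, not
the actual 6.6 code):

    u32 off = ctx->next_offset;             /* u32 in linux-6.6.y */
    void *p = (void *)(unsigned long)off;   /* the extra cast; a direct
                                               (void *)off would warn on
                                               64-bit builds */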
Best,
Andrea
From: Ramesh Thomas <ramesh.thomas(a)intel.com>
[ Upstream commit 2b938e3db335e3670475e31a722c2bee34748c5a ]
Definitions of the ioread64 and iowrite64 macros in asm/io.h, called by
the vfio pci implementations, are enclosed inside a check for
CONFIG_GENERIC_IOMAP: they don't get defined if CONFIG_GENERIC_IOMAP is
defined. Include linux/io-64-nonatomic-lo-hi.h to define the iowrite64
and ioread64 macros when they are not defined. io-64-nonatomic-lo-hi.h
maps the macros to the generic implementation in lib/iomap.c. The generic
implementation does a 64-bit rw if readq/writeq is defined for the
architecture; otherwise it does two back-to-back 32-bit accesses.
Note that there are two versions of the generic implementation, differing
in the order the 32-bit words are written when 64-bit support is not
present. This is not the little/big endian ordering, which is handled
separately. This patch uses the lo-followed-by-hi word ordering, which is
consistent with the current back-to-back implementation in the
vfio/pci code.
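Roughly, the lo-hi fallback has this shape (paraphrased sketch, not the
exact header/lib/iomap.c code):

    static inline void lo_hi_iowrite64(u64 val, void __iomem *addr)
    {
            /* no native 64-bit MMIO op: two 32-bit writes, low word first */
            iowrite32((u32)val, addr);
            iowrite32((u32)(val >> 32), addr + 4);
    }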
Signed-off-by: Ramesh Thomas <ramesh.thomas(a)intel.com>
Reviewed-by: Jason Gunthorpe <jgg(a)nvidia.com>
Link: https://lore.kernel.org/r/20241210131938.303500-2-ramesh.thomas@intel.com
Signed-off-by: Alex Williamson <alex.williamson(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/vfio/pci/vfio_pci_rdwr.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 83f81d24df78e..94e3fb9f42243 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -16,6 +16,7 @@
#include <linux/io.h>
#include <linux/vfio.h>
#include <linux/vgaarb.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
#include "vfio_pci_private.h"
--
2.39.5