Make sure to drop the references taken to the PMC OF node and device by
of_parse_phandle() and of_find_device_by_node() during probe.
Note that holding a reference to the PMC device does not prevent the
PMC regmap from going away (e.g. if the PMC driver is unbound), so there
is no need to keep the reference.
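For reference, the intended lookup/teardown pairing looks roughly like
this (abbreviated sketch, not the literal driver code; the phandle
property name is illustrative):

    struct device_node *np;
    struct platform_device *pdev;
    struct regmap *regmap;

    np = of_parse_phandle(dev->of_node, "nvidia,pmc", 0);   /* takes a node reference */
    if (np) {
        pdev = of_find_device_by_node(np);                   /* takes a device reference */
        of_node_put(np);                                     /* node no longer needed */
        if (pdev) {
            regmap = dev_get_regmap(&pdev->dev, "usb_sleepwalk");
            put_device(&pdev->dev);                          /* drop the device reference */
        }
    }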
Fixes: 2d1021487273 ("phy: tegra: xusb: Add wake/sleepwalk for Tegra210")
Cc: stable(a)vger.kernel.org # 5.14
Cc: JC Kuo <jckuo(a)nvidia.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/phy/tegra/xusb-tegra210.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/phy/tegra/xusb-tegra210.c b/drivers/phy/tegra/xusb-tegra210.c
index ebc8a7e21a31..3409924498e9 100644
--- a/drivers/phy/tegra/xusb-tegra210.c
+++ b/drivers/phy/tegra/xusb-tegra210.c
@@ -3164,18 +3164,22 @@ tegra210_xusb_padctl_probe(struct device *dev,
}
pdev = of_find_device_by_node(np);
+ of_node_put(np);
if (!pdev) {
dev_warn(dev, "PMC device is not available\n");
goto out;
}
- if (!platform_get_drvdata(pdev))
+ if (!platform_get_drvdata(pdev)) {
+ put_device(&pdev->dev);
return ERR_PTR(-EPROBE_DEFER);
+ }
padctl->regmap = dev_get_regmap(&pdev->dev, "usb_sleepwalk");
if (!padctl->regmap)
dev_info(dev, "failed to find PMC regmap\n");
+ put_device(&pdev->dev);
out:
return &padctl->base;
}
--
2.49.1
Recently, we encountered the following hung task:
INFO: task kworker/11:2:2981147 blocked for more than 6266 seconds
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/11:2 D 0 2981147 2 0x80004000
Workqueue: cgroup_destroy css_free_rwork_fn
Call Trace:
__schedule+0x934/0xe10
schedule+0x40/0xb0
wb_wait_for_completion+0x52/0x80
? finish_wait+0x80/0x80
mem_cgroup_css_free+0x3a/0x1b0
css_free_rwork_fn+0x42/0x380
process_one_work+0x1a2/0x360
worker_thread+0x30/0x390
? create_worker+0x1a0/0x1a0
kthread+0x110/0x130
? __kthread_cancel_work+0x40/0x40
ret_from_fork+0x1f/0x30
This is because the writeback thread has been continuously and repeatedly
throttled by wbt, while at the same time writes from another thread
proceed quite smoothly.
After debugging, I believe it is caused by the following.
When thread A is blocked by wbt, the I/O issued by thread B is allowed
a deeper queue depth (rwb->rq_depth.max_depth) because it meets the
wb_recent_wait() condition. Thread B's I/O is therefore issued smoothly,
which keeps the wbt inflight count relatively high.
However, when I/O completes, the high wbt inflight count means the
condition "limit - inflight >= rwb->wb_background / 2"
in wbt_rqw_done() cannot be satisfied, so thread A's I/O is never
woken up.
Some on-site information:
>>> rwb.rq_depth.max_depth
(unsigned int)48
>>> rqw.inflight.counter.value_()
44
>>> rqw.inflight.counter.value_()
35
>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
(unsigned long)3
>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
(unsigned long)2
>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
(unsigned long)20
>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
(unsigned long)12
cat wb_normal
24
cat wb_background
12
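Plugging these numbers in: with limit = wb_normal = 24 and inflight
hovering around 35-44, "limit - inflight" is negative and can never reach
wb_background / 2 = 6, so the waiter is never woken. If limit were
max_depth = 48, matching what get_limit() allows while wb_recent_wait()
is true, then 48 - 35 = 13 >= 6 and the wake-up fires.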
To fix this issue, use max_depth in wbt_rqw_done() when wb_recent_wait()
is true, so that wbt_rqw_done() and get_limit() handle the recent-wait
case consistently, which is more reasonable.
Signed-off-by: Julian Sun <sunjunchao(a)bytedance.com>
Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
---
block/blk-wbt.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index a50d4cd55f41..d6a2782d442f 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -210,6 +210,8 @@ static void wbt_rqw_done(struct rq_wb *rwb, struct rq_wait *rqw,
else if (blk_queue_write_cache(rwb->rqos.disk->queue) &&
!wb_recent_wait(rwb))
limit = 0;
+ else if (wb_recent_wait(rwb))
+ limit = rwb->rq_depth.max_depth;
else
limit = rwb->wb_normal;
--
2.20.1
Using device_find_child() and of_find_device_by_node() to locate
devices can unbalance their reference counts: both helpers call
get_device() to increment the reference count of the found device
before returning the pointer. In mtk_drm_get_all_drm_priv(), these
references are never released through put_device(), resulting in
permanent reference count increments. Additionally, the
for_each_child_of_node() iteration fails to release node references on
all code paths, leaking a device node reference when the loop
terminates before reaching MAX_CRTC. These reference count leaks may
prevent device/node resources from being properly released during
driver unbind operations.
As the comment for device_find_child() says: 'NOTE: you will need to
drop the reference with put_device() after use'.
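A minimal sketch of the required pairing inside the loop (illustrative
only, error paths omitted; identifiers as in the code below):

    pdev = of_find_device_by_node(node);            /* +1 on the platform device */
    drm_dev = device_find_child(&pdev->dev, NULL, mtk_drm_match);  /* +1 on the child */
    temp_drm_priv = dev_get_drvdata(drm_dev);
    ...
    put_device(drm_dev);                            /* drop both references when done */
    put_device(&pdev->dev);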
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 1ef7ed48356c ("drm/mediatek: Modify mediatek-drm for mt8195 multi mmsys support")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/gpu/drm/mediatek/mtk_drm_drv.c | 27 +++++++++++++++++---------
1 file changed, 18 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
index 7c0c12dde488..c78186debd3e 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
@@ -388,19 +388,24 @@ static bool mtk_drm_get_all_drm_priv(struct device *dev)
of_id = of_match_node(mtk_drm_of_ids, node);
if (!of_id)
- continue;
+ goto next;
pdev = of_find_device_by_node(node);
if (!pdev)
- continue;
+ goto next;
drm_dev = device_find_child(&pdev->dev, NULL, mtk_drm_match);
- if (!drm_dev)
- continue;
+ if (!drm_dev) {
+ put_device(&pdev->dev);
+ goto next;
+ }
temp_drm_priv = dev_get_drvdata(drm_dev);
- if (!temp_drm_priv)
- continue;
+ if (!temp_drm_priv) {
+ put_device(drm_dev);
+ put_device(&pdev->dev);
+ goto next;
+ }
if (temp_drm_priv->data->main_len)
all_drm_priv[CRTC_MAIN] = temp_drm_priv;
@@ -412,10 +417,14 @@ static bool mtk_drm_get_all_drm_priv(struct device *dev)
if (temp_drm_priv->mtk_drm_bound)
cnt++;
- if (cnt == MAX_CRTC) {
- of_node_put(node);
+ put_device(drm_dev);
+ put_device(&pdev->dev);
+
+next:
+ of_node_put(node);
+
+ if (cnt == MAX_CRTC)
break;
- }
}
if (drm_priv->data->mmsys_dev_num == cnt) {
--
2.25.1
Ever since commit c2ff29e99a76 ("siw: Inline do_tcp_sendpages()"),
we have been doing this:
static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
size_t size)
[...]
/* Calculate the number of bytes we need to push, for this page
* specifically */
size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);
/* If we can't splice it, then copy it in, as normal */
if (!sendpage_ok(page[i]))
msg.msg_flags &= ~MSG_SPLICE_PAGES;
/* Set the bvec pointing to the page, with len $bytes */
bvec_set_page(&bvec, page[i], bytes, offset);
/* Set the iter to $size, aka the size of the whole sendpages (!!!) */
iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
try_page_again:
lock_sock(sk);
/* Sendmsg with $size size (!!!) */
rv = tcp_sendmsg_locked(sk, &msg, size);
This means we've been sending oversized iov_iters and tcp_sendmsg calls
for a while. This has been a benign bug because sendpage_ok() always
returned true. With the recent slab allocator changes being slowly
introduced into next (that disallow sendpage on large kmalloc
allocations), we have recently hit out-of-bounds crashes, due to slight
differences in iov_iter behavior between the MSG_SPLICE_PAGES and
"regular" copy paths:
(MSG_SPLICE_PAGES)
skb_splice_from_iter
iov_iter_extract_pages
iov_iter_extract_bvec_pages
uses i->nr_segs to correctly stop in its tracks before OoB'ing everywhere
skb_splice_from_iter gets a "short" read
(!MSG_SPLICE_PAGES)
skb_copy_to_page_nocache copy=iov_iter_count
[...]
copy_from_iter
/* this doesn't help */
if (unlikely(iter->count < len))
len = iter->count;
iterate_bvec
... and we run off the bvecs
Fix this by properly setting the iov_iter's byte count and passing the
correct byte count to tcp_sendmsg_locked().
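For example, with a 3-page send (size == 3 * PAGE_SIZE, offset == 0), the
first iteration builds a one-entry bvec covering PAGE_SIZE bytes but
advertises 3 * PAGE_SIZE both in the iov_iter and to tcp_sendmsg_locked();
the !MSG_SPLICE_PAGES copy path then trusts that count and walks past the
single bvec entry.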
Cc: stable(a)vger.kernel.org
Fixes: c2ff29e99a76 ("siw: Inline do_tcp_sendpages()")
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Closes: https://lore.kernel.org/oe-lkp/202507220801.50a7210-lkp@intel.com
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Pedro Falcato <pfalcato(a)suse.de>
---
v2:
- Add David Howells's Rb on the original patch
- Remove the offset increment, since it's dead code
drivers/infiniband/sw/siw/siw_qp_tx.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
index 3a08f57d2211..f7dd32c6e5ba 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -340,18 +340,17 @@ static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
if (!sendpage_ok(page[i]))
msg.msg_flags &= ~MSG_SPLICE_PAGES;
bvec_set_page(&bvec, page[i], bytes, offset);
- iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, bytes);
try_page_again:
lock_sock(sk);
- rv = tcp_sendmsg_locked(sk, &msg, size);
+ rv = tcp_sendmsg_locked(sk, &msg, bytes);
release_sock(sk);
if (rv > 0) {
size -= rv;
sent += rv;
if (rv != bytes) {
- offset += rv;
bytes -= rv;
goto try_page_again;
}
--
2.50.1
Commit 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)")
allowed newer ASICs to mix GTT and VRAM; it also noted that some older
boards, such as Stoney and Carrizo, do not support this.
It appears that at least one additional ASIC, Raven, does not support
this either.
We observed this issue when migrating a device from a 5.4 to 6.6 kernel
and have confirmed that Raven also needs to be excluded from mixing GTT
and VRAM.
Fixes: 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)")
Cc: Luben Tuikov <luben.tuikov(a)amd.com>
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org # 6.1+
Tested-by: Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com>
Signed-off-by: Brian Geffon <bgeffon(a)google.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 73403744331a..5d7f13e25b7c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1545,7 +1545,8 @@ uint32_t amdgpu_bo_get_preferred_domain(struct amdgpu_device *adev,
uint32_t domain)
{
if ((domain == (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT)) &&
- ((adev->asic_type == CHIP_CARRIZO) || (adev->asic_type == CHIP_STONEY))) {
+ ((adev->asic_type == CHIP_CARRIZO) || (adev->asic_type == CHIP_STONEY) ||
+ (adev->asic_type == CHIP_RAVEN))) {
domain = AMDGPU_GEM_DOMAIN_VRAM;
if (adev->gmc.real_vram_size <= AMDGPU_SG_THRESHOLD)
domain = AMDGPU_GEM_DOMAIN_GTT;
--
2.50.0.727.gbf7dc18ff4-goog
When operating on struct vhost_net_ubuf_ref, the following execution
sequence is theoretically possible:
CPU0 is finalizing DMA operation            CPU1 is doing VHOST_NET_SET_BACKEND
                                            // &ubufs->refcount == 2
vhost_net_ubuf_put()                        vhost_net_ubuf_put_wait_and_free(oldubufs)
                                              vhost_net_ubuf_put_and_wait()
                                                vhost_net_ubuf_put()
                                                  int r = atomic_sub_return(1, &ubufs->refcount);
                                                  // r = 1
int r = atomic_sub_return(1, &ubufs->refcount);
// r = 0
                                                wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
                                                // no wait occurs here because condition is already true
                                                kfree(ubufs);
if (unlikely(!r))
    wake_up(&ubufs->wait); // use-after-free
This leads to a use-after-free when ubufs is accessed. It happens because
CPU1 skips waiting for the wake_up() when the refcount is already zero.
To prevent this, use a completion instead of a wait queue as the ubufs
notification mechanism. wait_for_completion() guarantees that a
complete() call has occurred before it returns.
We also need to reinitialize the completion in vhost_net_flush(), because
refcount == 0 does not mean the structure is freed in that case.
Cc: stable(a)vger.kernel.org
Fixes: 0ad8b480d6ee9 ("vhost: fix ref cnt checking deadlock")
Reported-by: Andrey Ryabinin <arbn(a)yandex-team.com>
Suggested-by: Andrey Smetanin <asmetanin(a)yandex-team.ru>
Suggested-by: Hillf Danton <hdanton(a)sina.com>
Tested-by: Lei Yang <leiyang(a)redhat.com> (v1)
Signed-off-by: Nikolay Kuratov <kniv(a)yandex-team.ru>
---
v2:
* move reinit_completion() into vhost_net_flush(), thanks
to Hillf Danton
* add Tested-by: Lei Yang
* check that usages of put_and_wait() are consistent across
LTS kernels
drivers/vhost/net.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7cbfc7d718b3..69e1bfb9627e 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -94,7 +94,7 @@ struct vhost_net_ubuf_ref {
* >1: outstanding ubufs
*/
atomic_t refcount;
- wait_queue_head_t wait;
+ struct completion wait;
struct vhost_virtqueue *vq;
};
@@ -240,7 +240,7 @@ vhost_net_ubuf_alloc(struct vhost_virtqueue *vq, bool zcopy)
if (!ubufs)
return ERR_PTR(-ENOMEM);
atomic_set(&ubufs->refcount, 1);
- init_waitqueue_head(&ubufs->wait);
+ init_completion(&ubufs->wait);
ubufs->vq = vq;
return ubufs;
}
@@ -249,14 +249,14 @@ static int vhost_net_ubuf_put(struct vhost_net_ubuf_ref *ubufs)
{
int r = atomic_sub_return(1, &ubufs->refcount);
if (unlikely(!r))
- wake_up(&ubufs->wait);
+ complete_all(&ubufs->wait);
return r;
}
static void vhost_net_ubuf_put_and_wait(struct vhost_net_ubuf_ref *ubufs)
{
vhost_net_ubuf_put(ubufs);
- wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
+ wait_for_completion(&ubufs->wait);
}
static void vhost_net_ubuf_put_wait_and_free(struct vhost_net_ubuf_ref *ubufs)
@@ -1381,6 +1381,7 @@ static void vhost_net_flush(struct vhost_net *n)
mutex_lock(&n->vqs[VHOST_NET_VQ_TX].vq.mutex);
n->tx_flush = false;
atomic_set(&n->vqs[VHOST_NET_VQ_TX].ubufs->refcount, 1);
+ reinit_completion(&n->vqs[VHOST_NET_VQ_TX].ubufs->wait);
mutex_unlock(&n->vqs[VHOST_NET_VQ_TX].vq.mutex);
}
}
--
2.34.1
The hwprobe vDSO data for some keys, like MISALIGNED_VECTOR_PERF,
is determined by an asynchronous kthread. This can create a race
condition where the kthread finishes after the vDSO data has
already been populated, causing userspace to read stale values.
To fix this race, a new 'ready' flag is added to the vDSO data,
initialized to 'false' during late_initcall. This flag is checked
by both the vDSO's user-space code and the riscv_hwprobe syscall.
The syscall serves as a one-time gate, using a completion to wait
for any pending probes before populating the data and setting the
flag to 'true', thus ensuring userspace reads fresh values on its
first request.
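For context, the affected key is queried from userspace roughly like this
(hypothetical sketch using the raw syscall; the vDSO-accelerated libc
wrapper is what actually consumes the 'ready'-gated data):

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <asm/hwprobe.h>

    int main(void)
    {
        struct riscv_hwprobe pair = {
            .key = RISCV_HWPROBE_KEY_MISALIGNED_VECTOR_PERF,
        };

        /* one pair, all online CPUs (no cpuset), no flags */
        if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0))
            return 1;

        printf("misaligned vector perf class: %llu\n",
               (unsigned long long)pair.value);
        return 0;
    }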
Reported-by: Tsukasa OI <research_trasio(a)irq.a4lg.com>
Closes: https://lore.kernel.org/linux-riscv/760d637b-b13b-4518-b6bf-883d55d44e7f@ir…
Fixes: e7c9d66e313b ("RISC-V: Report vector unaligned access speed hwprobe")
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Cc: Olof Johansson <olof(a)lixom.net>
Cc: stable(a)vger.kernel.org
Co-developed-by: Palmer Dabbelt <palmer(a)dabbelt.com>
Signed-off-by: Jingwei Wang <wangjingwei(a)iscas.ac.cn>
---
Changes in v6:
- Based on Palmer's feedback, reworked the synchronization to be on-demand,
deferring the wait until the first hwprobe syscall via a 'ready' flag.
This avoids the boot-time regression from v5's approach.
Changes in v5:
- Reworked the synchronization logic to a robust "sentinel-count"
pattern based on feedback from Alexandre.
- Fixed a "multiple definition" linker error for nommu builds by changing
the header-file stub functions to `static inline`, as pointed out by Olof.
- Updated the commit message to better explain the rationale for moving
the vDSO initialization to `late_initcall`.
Changes in v4:
- Reworked the synchronization mechanism based on feedback from Palmer
and Alexandre.
- Instead of a post-hoc refresh, this version introduces a robust
completion-based framework using an atomic counter to ensure async
probes are finished before populating the vDSO.
- Moved the vdso data initialization to a late_initcall to avoid
impacting boot time.
Changes in v3:
- Retained existing blank line.
Changes in v2:
- Addressed Yixun's feedback regarding #ifdef CONFIG_MMU usage.
- Updated commit message to provide a high-level summary.
- Added Fixes tag for commit e7c9d66e313b.
v1: https://lore.kernel.org/linux-riscv/20250521052754.185231-1-wangjingwei@isc…
arch/riscv/include/asm/hwprobe.h | 8 ++-
arch/riscv/include/asm/vdso/arch_data.h | 6 ++
arch/riscv/kernel/sys_hwprobe.c | 71 ++++++++++++++++++----
arch/riscv/kernel/unaligned_access_speed.c | 9 ++-
arch/riscv/kernel/vdso/hwprobe.c | 2 +-
5 files changed, 79 insertions(+), 17 deletions(-)
diff --git a/arch/riscv/include/asm/hwprobe.h b/arch/riscv/include/asm/hwprobe.h
index 7fe0a379474ae2c6..3b2888126e659ea1 100644
--- a/arch/riscv/include/asm/hwprobe.h
+++ b/arch/riscv/include/asm/hwprobe.h
@@ -40,5 +40,11 @@ static inline bool riscv_hwprobe_pair_cmp(struct riscv_hwprobe *pair,
return pair->value == other_pair->value;
}
-
+#ifdef CONFIG_MMU
+void riscv_hwprobe_register_async_probe(void);
+void riscv_hwprobe_complete_async_probe(void);
+#else
+static inline void riscv_hwprobe_register_async_probe(void) {}
+static inline void riscv_hwprobe_complete_async_probe(void) {}
+#endif
#endif
diff --git a/arch/riscv/include/asm/vdso/arch_data.h b/arch/riscv/include/asm/vdso/arch_data.h
index da57a3786f7a53c8..88b37af55175129b 100644
--- a/arch/riscv/include/asm/vdso/arch_data.h
+++ b/arch/riscv/include/asm/vdso/arch_data.h
@@ -12,6 +12,12 @@ struct vdso_arch_data {
/* Boolean indicating all CPUs have the same static hwprobe values. */
__u8 homogeneous_cpus;
+
+ /*
+ * A gate to check and see if the hwprobe data is actually ready, as
+ * probing is deferred to avoid boot slowdowns.
+ */
+ __u8 ready;
};
#endif /* __RISCV_ASM_VDSO_ARCH_DATA_H */
diff --git a/arch/riscv/kernel/sys_hwprobe.c b/arch/riscv/kernel/sys_hwprobe.c
index 0b170e18a2beba57..fecb6790fa88e96c 100644
--- a/arch/riscv/kernel/sys_hwprobe.c
+++ b/arch/riscv/kernel/sys_hwprobe.c
@@ -5,6 +5,8 @@
* more details.
*/
#include <linux/syscalls.h>
+#include <linux/completion.h>
+#include <linux/atomic.h>
#include <asm/cacheflush.h>
#include <asm/cpufeature.h>
#include <asm/hwprobe.h>
@@ -452,28 +454,36 @@ static int hwprobe_get_cpus(struct riscv_hwprobe __user *pairs,
return 0;
}
-static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
- size_t pair_count, size_t cpusetsize,
- unsigned long __user *cpus_user,
- unsigned int flags)
-{
- if (flags & RISCV_HWPROBE_WHICH_CPUS)
- return hwprobe_get_cpus(pairs, pair_count, cpusetsize,
- cpus_user, flags);
+#ifdef CONFIG_MMU
- return hwprobe_get_values(pairs, pair_count, cpusetsize,
- cpus_user, flags);
+static DECLARE_COMPLETION(boot_probes_done);
+static atomic_t pending_boot_probes = ATOMIC_INIT(1);
+
+void riscv_hwprobe_register_async_probe(void)
+{
+ atomic_inc(&pending_boot_probes);
}
-#ifdef CONFIG_MMU
+void riscv_hwprobe_complete_async_probe(void)
+{
+ if (atomic_dec_and_test(&pending_boot_probes))
+ complete(&boot_probes_done);
+}
-static int __init init_hwprobe_vdso_data(void)
+static int complete_hwprobe_vdso_data(void)
{
struct vdso_arch_data *avd = vdso_k_arch_data;
u64 id_bitsmash = 0;
struct riscv_hwprobe pair;
int key;
+ /* We've probably already produced these values. */
+ if (likely(avd->ready))
+ return 0;
+
+ if (unlikely(!atomic_dec_and_test(&pending_boot_probes)))
+ wait_for_completion(&boot_probes_done);
+
/*
* Initialize vDSO data with the answers for the "all CPUs" case, to
* save a syscall in the common case.
@@ -501,13 +511,48 @@ static int __init init_hwprobe_vdso_data(void)
* vDSO should defer to the kernel for exotic cpu masks.
*/
avd->homogeneous_cpus = id_bitsmash != 0 && id_bitsmash != -1;
+
+ /*
+ * Make sure all the VDSO values are visible before we look at them.
+ * This pairs with the implicit "no speculatively visible accesses"
+ * barrier in the VDSO hwprobe code.
+ */
+ smp_wmb();
+ avd->ready = true;
+ return 0;
+}
+
+static int __init init_hwprobe_vdso_data(void)
+{
+ struct vdso_arch_data *avd = vdso_k_arch_data;
+
+ /*
+ * Prevent the vDSO cached values from being used, as they're not ready
+ * yet.
+ */
+ avd->ready = false;
return 0;
}
-arch_initcall_sync(init_hwprobe_vdso_data);
+late_initcall(init_hwprobe_vdso_data);
#endif /* CONFIG_MMU */
+static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
+ size_t pair_count, size_t cpusetsize,
+ unsigned long __user *cpus_user,
+ unsigned int flags)
+{
+ complete_hwprobe_vdso_data();
+
+ if (flags & RISCV_HWPROBE_WHICH_CPUS)
+ return hwprobe_get_cpus(pairs, pair_count, cpusetsize,
+ cpus_user, flags);
+
+ return hwprobe_get_values(pairs, pair_count, cpusetsize,
+ cpus_user, flags);
+}
+
SYSCALL_DEFINE5(riscv_hwprobe, struct riscv_hwprobe __user *, pairs,
size_t, pair_count, size_t, cpusetsize, unsigned long __user *,
cpus, unsigned int, flags)
diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index ae2068425fbcd207..4b8ad2673b0f7470 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -379,6 +379,7 @@ static void check_vector_unaligned_access(struct work_struct *work __always_unus
static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __always_unused)
{
schedule_on_each_cpu(check_vector_unaligned_access);
+ riscv_hwprobe_complete_async_probe();
return 0;
}
@@ -473,8 +474,12 @@ static int __init check_unaligned_access_all_cpus(void)
per_cpu(vector_misaligned_access, cpu) = unaligned_vector_speed_param;
} else if (!check_vector_unaligned_access_emulated_all_cpus() &&
IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) {
- kthread_run(vec_check_unaligned_access_speed_all_cpus,
- NULL, "vec_check_unaligned_access_speed_all_cpus");
+ riscv_hwprobe_register_async_probe();
+ if (IS_ERR(kthread_run(vec_check_unaligned_access_speed_all_cpus,
+ NULL, "vec_check_unaligned_access_speed_all_cpus"))) {
+ pr_warn("Failed to create vec_unalign_check kthread\n");
+ riscv_hwprobe_complete_async_probe();
+ }
}
/*
diff --git a/arch/riscv/kernel/vdso/hwprobe.c b/arch/riscv/kernel/vdso/hwprobe.c
index 2ddeba6c68dda09b..bf77b4c1d2d8e803 100644
--- a/arch/riscv/kernel/vdso/hwprobe.c
+++ b/arch/riscv/kernel/vdso/hwprobe.c
@@ -27,7 +27,7 @@ static int riscv_vdso_get_values(struct riscv_hwprobe *pairs, size_t pair_count,
* homogeneous, then this function can handle requests for arbitrary
* masks.
*/
- if ((flags != 0) || (!all_cpus && !avd->homogeneous_cpus))
+ if ((flags != 0) || (!all_cpus && !avd->homogeneous_cpus) || unlikely(!avd->ready))
return riscv_hwprobe(pairs, pair_count, cpusetsize, cpus, flags);
/* This is something we can handle, fill out the pairs. */
--
2.50.1
With a timeout of only 1 second, my RX 5700 XT fails to initialize,
so this increases the timeout to 2 seconds.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3697
Signed-off-by: Xaver Hugl <xaver.hugl(a)kde.org>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 6d34eac0539d..ae6908b57d78 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -275,7 +275,7 @@ static int amdgpu_discovery_read_binary_from_mem(struct amdgpu_device *adev,
int i, ret = 0;
if (!amdgpu_sriov_vf(adev)) {
- /* It can take up to a second for IFWI init to complete on some dGPUs,
+ /* It can take up to two seconds for IFWI init to complete on some dGPUs,
* but generally it should be in the 60-100ms range. Normally this starts
* as soon as the device gets power so by the time the OS loads this has long
* completed. However, when a card is hotplugged via e.g., USB4, we need to
@@ -283,7 +283,7 @@ static int amdgpu_discovery_read_binary_from_mem(struct amdgpu_device *adev,
* continue.
*/
- for (i = 0; i < 1000; i++) {
+ for (i = 0; i < 2000; i++) {
msg = RREG32(mmMP0_SMN_C2PMSG_33);
if (msg & 0x80000000)
break;
--
2.50.1
Ensure that epoll instances can never form a graph deeper than
EP_MAX_NESTS+1 links.
Currently, ep_loop_check_proc() ensures that the graph is loop-free and
does some recursion depth checks, but those recursion depth checks don't
limit the depth of the resulting tree for two reasons:
- They don't look upwards in the tree.
- If there are multiple downwards paths of different lengths, only one of
the paths is actually considered for the depth check since commit
28d82dc1c4ed ("epoll: limit paths").
Essentially, the current recursion depth check in ep_loop_check_proc() just
serves to prevent it from recursing too deeply while checking for loops.
A more thorough check is done in reverse_path_check() after the new graph
edge has already been created; this checks, among other things, that no
paths going upwards from any non-epoll file are longer than 5 edges.
However, that check only constrains paths rooted at non-epoll files, so
chains consisting purely of epoll files are not limited by it.
As a result, it is possible to recurse to a depth of at least roughly 500,
tested on v6.15. (I am unsure if deeper recursion is possible; and this may
have changed with commit 8c44dac8add7 ("eventpoll: Fix priority inversion
problem").)
To fix it:
1. In ep_loop_check_proc(), note the subtree depth of each visited node,
and use subtree depths for the total depth calculation even when a subtree
has already been visited.
2. Add ep_get_upwards_depth_proc() for similarly determining the maximum
depth of an upwards walk.
3. In ep_loop_check(), use these values to limit the total path length
between epoll nodes to EP_MAX_NESTS edges.
Fixes: 22bacca48a17 ("epoll: prevent creating circular epoll structures")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jann Horn <jannh(a)google.com>
---
fs/eventpoll.c | 60 ++++++++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 46 insertions(+), 14 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index d4dbffdedd08..44648cc09250 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -218,6 +218,7 @@ struct eventpoll {
/* used to optimize loop detection check */
u64 gen;
struct hlist_head refs;
+ u8 loop_check_depth;
/*
* usage count, used together with epitem->dying to
@@ -2142,23 +2143,24 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
}
/**
- * ep_loop_check_proc - verify that adding an epoll file inside another
- * epoll structure does not violate the constraints, in
- * terms of closed loops, or too deep chains (which can
- * result in excessive stack usage).
+ * ep_loop_check_proc - verify that adding an epoll file @ep inside another
+ * epoll file does not create closed loops, and
+ * determine the depth of the subtree starting at @ep
*
* @ep: the &struct eventpoll to be currently checked.
* @depth: Current depth of the path being checked.
*
- * Return: %zero if adding the epoll @file inside current epoll
- * structure @ep does not violate the constraints, or %-1 otherwise.
+ * Return: depth of the subtree, or INT_MAX if we found a loop or went too deep.
*/
static int ep_loop_check_proc(struct eventpoll *ep, int depth)
{
- int error = 0;
+ int result = 0;
struct rb_node *rbp;
struct epitem *epi;
+ if (ep->gen == loop_check_gen)
+ return ep->loop_check_depth;
+
mutex_lock_nested(&ep->mtx, depth + 1);
ep->gen = loop_check_gen;
for (rbp = rb_first_cached(&ep->rbr); rbp; rbp = rb_next(rbp)) {
@@ -2166,13 +2168,11 @@ static int ep_loop_check_proc(struct eventpoll *ep, int depth)
if (unlikely(is_file_epoll(epi->ffd.file))) {
struct eventpoll *ep_tovisit;
ep_tovisit = epi->ffd.file->private_data;
- if (ep_tovisit->gen == loop_check_gen)
- continue;
if (ep_tovisit == inserting_into || depth > EP_MAX_NESTS)
- error = -1;
+ result = INT_MAX;
else
- error = ep_loop_check_proc(ep_tovisit, depth + 1);
- if (error != 0)
+ result = max(result, ep_loop_check_proc(ep_tovisit, depth + 1) + 1);
+ if (result > EP_MAX_NESTS)
break;
} else {
/*
@@ -2186,9 +2186,27 @@ static int ep_loop_check_proc(struct eventpoll *ep, int depth)
list_file(epi->ffd.file);
}
}
+ ep->loop_check_depth = result;
mutex_unlock(&ep->mtx);
- return error;
+ return result;
+}
+
+/**
+ * ep_get_upwards_depth_proc - determine depth of @ep when traversed upwards
+ */
+static int ep_get_upwards_depth_proc(struct eventpoll *ep, int depth)
+{
+ int result = 0;
+ struct epitem *epi;
+
+ if (ep->gen == loop_check_gen)
+ return ep->loop_check_depth;
+ hlist_for_each_entry_rcu(epi, &ep->refs, fllink)
+ result = max(result, ep_get_upwards_depth_proc(epi->ep, depth + 1) + 1);
+ ep->gen = loop_check_gen;
+ ep->loop_check_depth = result;
+ return result;
}
/**
@@ -2204,8 +2222,22 @@ static int ep_loop_check_proc(struct eventpoll *ep, int depth)
*/
static int ep_loop_check(struct eventpoll *ep, struct eventpoll *to)
{
+ int depth, upwards_depth;
+
inserting_into = ep;
- return ep_loop_check_proc(to, 0);
+ /*
+ * Check how deep down we can get from @to, and whether it is possible
+ * to loop up to @ep.
+ */
+ depth = ep_loop_check_proc(to, 0);
+ if (depth > EP_MAX_NESTS)
+ return -1;
+ /* Check how far up we can go from @ep. */
+ rcu_read_lock();
+ upwards_depth = ep_get_upwards_depth_proc(ep, 0);
+ rcu_read_unlock();
+
+ return (depth+1+upwards_depth > EP_MAX_NESTS) ? -1 : 0;
}
static void clear_tfile_check_list(void)
---
base-commit: 0ff41df1cb268fc69e703a08a57ee14ae967d0ca
change-id: 20250711-epoll-recursion-fix-fb0e336b2aeb
--
Jann Horn <jannh(a)google.com>