From: Steven Rostedt <rostedt(a)goodmis.org>
The fix to use a per CPU buffer to read user space tested only the writes
to trace_marker. But it appears that the selftests are missing tests for
the trace_marker_raw file. The trace_marker_raw file is used by
applications that write data structures and not strings into the file,
and the tools read the raw ring buffer to process the structures they
write.
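For illustration, a minimal userspace sketch of such a writer (the struct
layout is hypothetical; only the trace_marker_raw tracefs interface is
assumed):

  #include <fcntl.h>
  #include <stdint.h>
  #include <unistd.h>

  /* Hypothetical application-defined record; the kernel treats it as
   * opaque bytes. */
  struct my_event {
      uint32_t id;
      uint64_t value;
  };

  int main(void)
  {
      struct my_event ev = { .id = 1, .value = 42 };
      int fd = open("/sys/kernel/tracing/trace_marker_raw", O_WRONLY);

      if (fd < 0)
          return 1;
      /* One write() per record; the bytes land in the ring buffer as-is. */
      write(fd, &ev, sizeof(ev));
      close(fd);
      return 0;
  }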
The fix that reads into the per CPU buffers passes the new per CPU buffer
to the trace_marker file writes, but the update to the trace_marker_raw
write path read the data from user space into the per CPU buffer and then
still passed the user space address to the function that records the data.
Pass in the per CPU buffer and not the user space address.
TODO: Add a test to better test trace_marker_raw.
Cc: stable(a)vger.kernel.org
Fixes: 64cf7d058a00 ("tracing: Have trace_marker use per-cpu data to read user space")
Reported-by: syzbot+9a2ede1643175f350105(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68e973f5.050a0220.1186a4.0010.GAE@google.com/
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 0fd582651293..bbb89206a891 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7497,12 +7497,12 @@ tracing_mark_raw_write(struct file *filp, const char __user *ubuf,
if (tr == &global_trace) {
guard(rcu)();
list_for_each_entry_rcu(tr, &marker_copies, marker_list) {
- written = write_raw_marker_to_buffer(tr, ubuf, cnt);
+ written = write_raw_marker_to_buffer(tr, buf, cnt);
if (written < 0)
break;
}
} else {
- written = write_raw_marker_to_buffer(tr, ubuf, cnt);
+ written = write_raw_marker_to_buffer(tr, buf, cnt);
}
return written;
--
2.51.0
From: Jani Nurminen <jani.nurminen(a)windriver.com>
When PCIe has been set up by the bootloader, the ecam_size field in the
E_ECAM_CONTROL register already contains a value.
The driver previously programmed it to 0xc (for 16 buses; 16 MB), but
this was bumped to 0x10 (for 256 buses; 256 MB) by commit 2fccd11518f1
("PCI: xilinx-nwl: Modify ECAM size to enable support for 256 buses").
Regardless of what the bootloader has programmed, the driver ORs in the
new maximum value without doing a proper read-modify-write (RMW)
sequence. This can lead to problems.
For example, if the bootloader programs in 0xc and the driver uses 0x10,
the ORed result is 0x1c, which is beyond the ecam_max_size limit of 0x10
(from E_ECAM_CAPABILITIES).
Avoid the problems by doing a proper RMW.
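As a sketch of the difference (field mask, shift, and helper names here
are illustrative, not the driver's actual definitions):

  #include <stdint.h>

  #define ECAM_SIZE_SHIFT 16
  #define ECAM_SIZE_MASK  (0x1fu << ECAM_SIZE_SHIFT)  /* illustrative */

  /* Buggy: OR-only update; stale bits programmed by the bootloader
   * survive, e.g. field value 0xc | 0x10 == 0x1c. */
  static uint32_t set_size_or_only(uint32_t reg, uint32_t size)
  {
      return reg | (size << ECAM_SIZE_SHIFT);
  }

  /* Fixed: proper RMW; clear the field first, then insert the new value. */
  static uint32_t set_size_rmw(uint32_t reg, uint32_t size)
  {
      reg &= ~ECAM_SIZE_MASK;
      reg |= size << ECAM_SIZE_SHIFT;
      return reg;
  }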
Fixes: 2fccd11518f1 ("PCI: xilinx-nwl: Modify ECAM size to enable support for 256 buses")
Signed-off-by: Jani Nurminen <jani.nurminen(a)windriver.com>
[mani: added stable tag]
Signed-off-by: Manivannan Sadhasivam <mani(a)kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: stable(a)vger.kernel.org
Link: https://patch.msgid.link/e83a2af2-af0b-4670-bcf5-ad408571c2b0@windriver.com
---
CR: CR-1250694
Branch: master-next-test
---
drivers/pci/controller/pcie-xilinx-nwl.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/controller/pcie-xilinx-nwl.c b/drivers/pci/controller/pcie-xilinx-nwl.c
index a91eed8812c8..63494b67e42b 100644
--- a/drivers/pci/controller/pcie-xilinx-nwl.c
+++ b/drivers/pci/controller/pcie-xilinx-nwl.c
@@ -665,9 +665,10 @@ static int nwl_pcie_bridge_init(struct nwl_pcie *pcie)
nwl_bridge_writel(pcie, nwl_bridge_readl(pcie, E_ECAM_CONTROL) |
E_ECAM_CR_ENABLE, E_ECAM_CONTROL);
- nwl_bridge_writel(pcie, nwl_bridge_readl(pcie, E_ECAM_CONTROL) |
- (NWL_ECAM_MAX_SIZE << E_ECAM_SIZE_SHIFT),
- E_ECAM_CONTROL);
+ ecam_val = nwl_bridge_readl(pcie, E_ECAM_CONTROL);
+ ecam_val &= ~E_ECAM_SIZE_LOC;
+ ecam_val |= NWL_ECAM_MAX_SIZE << E_ECAM_SIZE_SHIFT;
+ nwl_bridge_writel(pcie, ecam_val, E_ECAM_CONTROL);
nwl_bridge_writel(pcie, lower_32_bits(pcie->phys_ecam_base),
E_ECAM_BASE_LO);
--
2.44.1
Hi Geoffrey,
On 2025/10/9 7:22, Geoffrey Thorpe wrote:
> Any trivial usage of hostfs seems to be broken since commit cd140ce9
> ("hostfs: convert hostfs to use the new mount API") - I bisected it down
> to this commit to make sure.
>
Sorry to trouble you, but could you provide your mount version and
kernel version (use mount -v and uname -ar)?
Thanks,
Hongbo
> Steps to reproduce;
>
> The following assumes that the ARCH=um kernel has already been compiled
> (and the 'vmlinux' executable is in the local directory, as is the case
> when building from the top directory of a source tree). I built mine
> from a fresh clone using 'defconfig'. The uml_run.sh script creates a
> bootable root FS image (from debian, via docker) and then boots it with
> a hostfs mount to demonstrate the regression. This should be observable
> with any other bootable image though, simply pass "hostfs=<hostpath>" to
> the ./vmlinux kernel and then try to mount it from within the booted VM
> ("mount -t hostfs none <guestpath>").
>
> The following 3 text files are used, and as they're small enough for
> copy-n-paste I figured (hoped) it was best to inline them rather than
> post attachments.
>
> uml_run.sh:
> #!/bin/bash
> set -ex
> cat Dockerfile | docker build -t foobar:foobar -
> docker export -o foobar.tar \
> `docker run -d foobar:foobar /bin/true`
> dd if=/dev/zero of=rootfs.img \
> bs=$(expr 2048 \* 1024 \* 1024 / 512) count=512
> mkfs.ext4 rootfs.img
> sudo ./uml_root.sh
> cp rootfs.img temp.img
> dd if=/dev/zero of=swapfile bs=1M count=1024
> chmod 600 swapfile
> mkswap swapfile
> ./vmlinux mem=4G ubd0=temp.img rw ubd1=swapfile \
> hostfs=$(pwd)
>
> uml_root.sh:
> #!/bin/bash
> set -ex
> losetup -D
> LOOPDEVICE=$(losetup -f)
> losetup ${LOOPDEVICE} rootfs.img
> mkdir -p tmpmnt
> mount -t auto ${LOOPDEVICE} tmpmnt/
> (cd tmpmnt && tar xf ../foobar.tar)
> umount tmpmnt
> losetup -D
>
> Dockerfile:
> FROM debian:trixie
> RUN echo 'debconf debconf/frontend select Noninteractive' | \
> debconf-set-selections
> RUN apt-get update
> RUN apt-get install -y apt-utils
> RUN apt-get -y full-upgrade
> RUN echo "US/Eastern" > /etc/timezone
> RUN chmod 644 /etc/timezone
> RUN cd /etc && rm -f localtime && \
> ln -s /usr/share/zoneinfo/$$MYTZ localtime
> RUN apt-get install -y systemd-sysv kmod
> RUN echo "root:root" | chpasswd
> RUN echo "/dev/ubdb swap swap defaults 0 0" >> /etc/fstab
> RUN mkdir /hosthack
> RUN echo "none /hosthack hostfs defaults 0 0" >> /etc/fstab
> RUN systemctl set-default multi-user.target
>
> Execute ./uml_run.sh to build the rootfs image and boot the VM. This
> requires a system with docker, and will also require a sudo password
> when creating the rootfs. The boot log indicates whether the hostfs
> mount succeeds or not - the boot should degrade to emergency mode if the
> mount fails, otherwise a login prompt should indicate success. (Login is
> root:root, e.g. if you prefer to go in and shutdown the VM gracefully.)
>
> Please let me know if I can/should provide anything else.
>
> Cheers,
> Geoff
>
Commit e26ee4efbc79 ("fuse: allocate ff->release_args only if release is
needed") skips allocating ff->release_args if the server does not
implement open. However, in doing so, fuse_prepare_release() now skips
grabbing the reference on the inode, which makes it possible for an
inode to be evicted from the dcache while there are inflight readahead
requests. This causes a deadlock if the server triggers reclaim while
servicing the readahead request and reclaim attempts to evict the inode
of the file being read ahead. Since the folio is locked during
readahead, when reclaim evicts the fuse inode and fuse_evict_inode()
attempts to remove all folios associated with the inode from the page
cache (truncate_inode_pages_range()), reclaim blocks forever waiting on
the folio lock, since readahead cannot relinquish it while itself
blocked in reclaim:
>>> stack_trace(1504735)
folio_wait_bit_common (mm/filemap.c:1308:4)
folio_lock (./include/linux/pagemap.h:1052:3)
truncate_inode_pages_range (mm/truncate.c:336:10)
fuse_evict_inode (fs/fuse/inode.c:161:2)
evict (fs/inode.c:704:3)
dentry_unlink_inode (fs/dcache.c:412:3)
__dentry_kill (fs/dcache.c:615:3)
shrink_kill (fs/dcache.c:1060:12)
shrink_dentry_list (fs/dcache.c:1087:3)
prune_dcache_sb (fs/dcache.c:1168:2)
super_cache_scan (fs/super.c:221:10)
do_shrink_slab (mm/shrinker.c:435:9)
shrink_slab (mm/shrinker.c:626:10)
shrink_node (mm/vmscan.c:5951:2)
shrink_zones (mm/vmscan.c:6195:3)
do_try_to_free_pages (mm/vmscan.c:6257:3)
do_swap_page (mm/memory.c:4136:11)
handle_pte_fault (mm/memory.c:5562:10)
handle_mm_fault (mm/memory.c:5870:9)
do_user_addr_fault (arch/x86/mm/fault.c:1338:10)
handle_page_fault (arch/x86/mm/fault.c:1481:3)
exc_page_fault (arch/x86/mm/fault.c:1539:2)
asm_exc_page_fault+0x22/0x27
Fix this deadlock by allocating ff->release_args and grabbing the
reference on the inode when preparing the file for release even if the
server does not implement open. The inode reference will be dropped when
the last reference on the fuse file is dropped (see fuse_file_put() ->
fuse_release_end()).
Fixes: e26ee4efbc79 ("fuse: allocate ff->release_args only if release is needed")
Cc: stable(a)vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong(a)gmail.com>
Reported-by: Omar Sandoval <osandov(a)fb.com>
---
fs/fuse/file.c | 40 ++++++++++++++++++++++++++--------------
1 file changed, 26 insertions(+), 14 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f1ef77a0be05..654e21ee93fb 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -100,7 +100,7 @@ static void fuse_release_end(struct fuse_mount *fm, struct fuse_args *args,
kfree(ra);
}
-static void fuse_file_put(struct fuse_file *ff, bool sync)
+static void fuse_file_put(struct fuse_file *ff, bool sync, bool isdir)
{
if (refcount_dec_and_test(&ff->count)) {
struct fuse_release_args *ra = &ff->args->release_args;
@@ -110,7 +110,9 @@ static void fuse_file_put(struct fuse_file *ff, bool sync)
fuse_file_io_release(ff, ra->inode);
if (!args) {
- /* Do nothing when server does not implement 'open' */
+ /* Do nothing when server does not implement 'opendir' */
+ } else if (!isdir && ff->fm->fc->no_open) {
+ fuse_release_end(ff->fm, args, 0);
} else if (sync) {
fuse_simple_request(ff->fm, args);
fuse_release_end(ff->fm, args, 0);
@@ -131,8 +133,17 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
struct fuse_file *ff;
int opcode = isdir ? FUSE_OPENDIR : FUSE_OPEN;
bool open = isdir ? !fc->no_opendir : !fc->no_open;
+ bool release = !isdir || open;
- ff = fuse_file_alloc(fm, open);
+ /*
+ * ff->args->release_args still needs to be allocated (so we can hold an
+ * inode reference while there are pending inflight file operations when
+ * ->release() is called, see fuse_prepare_release()) even if
+ * fc->no_open is set else it becomes possible for reclaim to deadlock
+ * if while servicing the readahead request the server triggers reclaim
+ * and reclaim evicts the inode of the file being read ahead.
+ */
+ ff = fuse_file_alloc(fm, release);
if (!ff)
return ERR_PTR(-ENOMEM);
@@ -152,13 +163,14 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
fuse_file_free(ff);
return ERR_PTR(err);
} else {
- /* No release needed */
- kfree(ff->args);
- ff->args = NULL;
- if (isdir)
+ if (isdir) {
+ /* No release needed */
+ kfree(ff->args);
+ ff->args = NULL;
fc->no_opendir = 1;
- else
+ } else {
fc->no_open = 1;
+ }
}
}
@@ -363,7 +375,7 @@ void fuse_file_release(struct inode *inode, struct fuse_file *ff,
* own ref to the file, the IO completion has to drop the ref, which is
* how the fuse server can end up closing its clients' files.
*/
- fuse_file_put(ff, false);
+ fuse_file_put(ff, false, isdir);
}
void fuse_release_common(struct file *file, bool isdir)
@@ -394,7 +406,7 @@ void fuse_sync_release(struct fuse_inode *fi, struct fuse_file *ff,
{
WARN_ON(refcount_read(&ff->count) > 1);
fuse_prepare_release(fi, ff, flags, FUSE_RELEASE, true);
- fuse_file_put(ff, true);
+ fuse_file_put(ff, true, false);
}
EXPORT_SYMBOL_GPL(fuse_sync_release);
@@ -891,7 +903,7 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
folio_put(ap->folios[i]);
}
if (ia->ff)
- fuse_file_put(ia->ff, false);
+ fuse_file_put(ia->ff, false, false);
fuse_io_free(ia);
}
@@ -1815,7 +1827,7 @@ static void fuse_writepage_free(struct fuse_writepage_args *wpa)
if (wpa->bucket)
fuse_sync_bucket_dec(wpa->bucket);
- fuse_file_put(wpa->ia.ff, false);
+ fuse_file_put(wpa->ia.ff, false, false);
kfree(ap->folios);
kfree(wpa);
@@ -1968,7 +1980,7 @@ int fuse_write_inode(struct inode *inode, struct writeback_control *wbc)
ff = __fuse_write_file_get(fi);
err = fuse_flush_times(inode, ff);
if (ff)
- fuse_file_put(ff, false);
+ fuse_file_put(ff, false, false);
return err;
}
@@ -2186,7 +2198,7 @@ static int fuse_iomap_writeback_submit(struct iomap_writepage_ctx *wpc,
}
if (data->ff)
- fuse_file_put(data->ff, false);
+ fuse_file_put(data->ff, false, false);
return error;
}
--
2.47.3
The atomic variable vm_fault_info_updated is used to synchronize access to
adev->gmc.vm_fault_info between the interrupt handler and
get_vm_fault_info().
The default atomic functions like atomic_set() and atomic_read() do not
provide memory barriers. This allows for CPU instruction reordering,
meaning the memory accesses to vm_fault_info and the vm_fault_info_updated
flag are not guaranteed to occur in the intended order. This creates a
race condition that can lead to inconsistent or stale data being used.
The previous implementation, which used an explicit mb(), was incomplete
and inefficient. It failed to account for all potential CPU reorderings,
such as the access of vm_fault_info being reordered before the atomic_read
of the flag. This approach is also more verbose and less performant than
using the proper atomic functions with acquire/release semantics.
Fix this by switching to atomic_set_release() and atomic_read_acquire().
These functions provide the necessary acquire and release semantics,
which act as memory barriers to ensure the correct order of operations.
They are also more efficient and idiomatic than explicit full memory
barriers.
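As a minimal userspace illustration of the pairing (C11 atomics instead
of the kernel API, but the same acquire/release semantics):

  #include <stdatomic.h>

  static int fault_info;        /* stands in for *vm_fault_info */
  static atomic_int updated;    /* stands in for vm_fault_info_updated */

  /* Interrupt-handler role: publish the data, then release the flag. */
  void producer(void)
  {
      fault_info = 42;    /* plain store, ordered before the release */
      atomic_store_explicit(&updated, 1, memory_order_release);
  }

  /* get_vm_fault_info() role: acquire the flag, then read the data. */
  int consumer(void)
  {
      if (atomic_load_explicit(&updated, memory_order_acquire) == 1) {
          int v = fault_info;    /* guaranteed to observe the store above */
          atomic_store_explicit(&updated, 0, memory_order_release);
          return v;
      }
      return -1;
  }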
Fixes: b97dfa27ef3a ("drm/amdgpu: save vm fault information for amdkfd")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gui-Dong Han <hanguidong02(a)gmail.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 5 ++---
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 7 +++----
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 7 +++----
3 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index b16cce7c22c3..ac09bbe51634 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2325,10 +2325,9 @@ void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_mem *mem)
int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct amdgpu_device *adev,
struct kfd_vm_fault_info *mem)
{
- if (atomic_read(&adev->gmc.vm_fault_info_updated) == 1) {
+ if (atomic_read_acquire(&adev->gmc.vm_fault_info_updated) == 1) {
*mem = *adev->gmc.vm_fault_info;
- mb(); /* make sure read happened */
- atomic_set(&adev->gmc.vm_fault_info_updated, 0);
+ atomic_set_release(&adev->gmc.vm_fault_info_updated, 0);
}
return 0;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index a8d5795084fc..cf30d3332050 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -1066,7 +1066,7 @@ static int gmc_v7_0_sw_init(struct amdgpu_ip_block *ip_block)
GFP_KERNEL);
if (!adev->gmc.vm_fault_info)
return -ENOMEM;
- atomic_set(&adev->gmc.vm_fault_info_updated, 0);
+ atomic_set_release(&adev->gmc.vm_fault_info_updated, 0);
return 0;
}
@@ -1288,7 +1288,7 @@ static int gmc_v7_0_process_interrupt(struct amdgpu_device *adev,
vmid = REG_GET_FIELD(status, VM_CONTEXT1_PROTECTION_FAULT_STATUS,
VMID);
if (amdgpu_amdkfd_is_kfd_vmid(adev, vmid)
- && !atomic_read(&adev->gmc.vm_fault_info_updated)) {
+ && !atomic_read_acquire(&adev->gmc.vm_fault_info_updated)) {
struct kfd_vm_fault_info *info = adev->gmc.vm_fault_info;
u32 protections = REG_GET_FIELD(status,
VM_CONTEXT1_PROTECTION_FAULT_STATUS,
@@ -1304,8 +1304,7 @@ static int gmc_v7_0_process_interrupt(struct amdgpu_device *adev,
info->prot_read = protections & 0x8 ? true : false;
info->prot_write = protections & 0x10 ? true : false;
info->prot_exec = protections & 0x20 ? true : false;
- mb();
- atomic_set(&adev->gmc.vm_fault_info_updated, 1);
+ atomic_set_release(&adev->gmc.vm_fault_info_updated, 1);
}
return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index b45fa0cea9d2..0d4c93ff6f74 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -1179,7 +1179,7 @@ static int gmc_v8_0_sw_init(struct amdgpu_ip_block *ip_block)
GFP_KERNEL);
if (!adev->gmc.vm_fault_info)
return -ENOMEM;
- atomic_set(&adev->gmc.vm_fault_info_updated, 0);
+ atomic_set_release(&adev->gmc.vm_fault_info_updated, 0);
return 0;
}
@@ -1474,7 +1474,7 @@ static int gmc_v8_0_process_interrupt(struct amdgpu_device *adev,
vmid = REG_GET_FIELD(status, VM_CONTEXT1_PROTECTION_FAULT_STATUS,
VMID);
if (amdgpu_amdkfd_is_kfd_vmid(adev, vmid)
- && !atomic_read(&adev->gmc.vm_fault_info_updated)) {
+ && !atomic_read_acquire(&adev->gmc.vm_fault_info_updated)) {
struct kfd_vm_fault_info *info = adev->gmc.vm_fault_info;
u32 protections = REG_GET_FIELD(status,
VM_CONTEXT1_PROTECTION_FAULT_STATUS,
@@ -1490,8 +1490,7 @@ static int gmc_v8_0_process_interrupt(struct amdgpu_device *adev,
info->prot_read = protections & 0x8 ? true : false;
info->prot_write = protections & 0x10 ? true : false;
info->prot_exec = protections & 0x20 ? true : false;
- mb();
- atomic_set(&adev->gmc.vm_fault_info_updated, 1);
+ atomic_set_release(&adev->gmc.vm_fault_info_updated, 1);
}
return 0;
--
2.25.1
It turned out that certain clearly invalid values passed in &xdp_desc
from userspace can pass xp_{,un}aligned_validate_desc() and then lead to
undefined behavior or to invalid frames being queued for xmit.
A desc->len close to ``U32_MAX`` combined with a non-zero
pool->tx_metadata_len can cause a positive integer overflow and
wraparound; in the same way, a low enough desc->addr with a non-zero
pool->tx_metadata_len can cause an unsigned underflow. Both scenarios
can then pass the validation successfully.
This doesn't happen with valid XSk applications, but can be used
to perform attacks.
Always promote desc->len to ``u64`` first to exclude positive
overflows of it. Use explicit check_{add,sub}_overflow() when
validating desc->addr (which is ``u64`` already).
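A minimal sketch of that pattern, using the compiler builtins the
kernel's check_{add,sub}_overflow() helpers wrap (values are
hypothetical):

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      uint64_t addr, end;
      uint64_t desc_addr = 8;    /* lower than tx_metadata_len */
      uint64_t meta_len = 16;
      uint64_t len = 64;

      /* desc_addr - meta_len would wrap around to a huge value... */
      if (__builtin_sub_overflow(desc_addr, meta_len, &addr)) {
          puts("rejected: addr underflow");
          return 1;
      }
      /* ...and addr + len could wrap past zero near U64_MAX. */
      if (__builtin_add_overflow(addr, len, &end)) {
          puts("rejected: end overflow");
          return 1;
      }
      printf("valid range: [%llu, %llu)\n",
             (unsigned long long)addr, (unsigned long long)end);
      return 0;
  }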
bloat-o-meter reports a little growth of the code size:
add/remove: 0/0 grow/shrink: 2/1 up/down: 60/-16 (44)
Function                           old     new   delta
xskq_cons_peek_desc                299     330     +31
xsk_tx_peek_release_desc_batch     973    1002     +29
xsk_generic_xmit                  3148    3132     -16
but hopefully this doesn't hurt the performance much.
Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len")
Cc: stable(a)vger.kernel.org # 6.8+
Signed-off-by: Alexander Lobakin <aleksander.lobakin(a)intel.com>
---
net/xdp/xsk_queue.h | 45 +++++++++++++++++++++++++++++++++++----------
1 file changed, 35 insertions(+), 10 deletions(-)
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index f16f390370dc..1eb8d9f8b104 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -143,14 +143,24 @@ static inline bool xp_unused_options_set(u32 options)
static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool,
struct xdp_desc *desc)
{
- u64 addr = desc->addr - pool->tx_metadata_len;
- u64 len = desc->len + pool->tx_metadata_len;
- u64 offset = addr & (pool->chunk_size - 1);
+ u64 len = desc->len;
+ u64 addr, offset;
- if (!desc->len)
+ if (!len)
return false;
- if (offset + len > pool->chunk_size)
+ /* Can overflow if desc->addr < pool->tx_metadata_len */
+ if (check_sub_overflow(desc->addr, pool->tx_metadata_len, &addr))
+ return false;
+
+ offset = addr & (pool->chunk_size - 1);
+
+ /*
+ * Can't overflow: @offset is guaranteed to be < ``U32_MAX``
+ * (pool->chunk_size is ``u32``), @len is guaranteed
+ * to be <= ``U32_MAX``.
+ */
+ if (offset + len + pool->tx_metadata_len > pool->chunk_size)
return false;
if (addr >= pool->addrs_cnt)
@@ -158,27 +168,42 @@ static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool,
if (xp_unused_options_set(desc->options))
return false;
+
return true;
}
static inline bool xp_unaligned_validate_desc(struct xsk_buff_pool *pool,
struct xdp_desc *desc)
{
- u64 addr = xp_unaligned_add_offset_to_addr(desc->addr) - pool->tx_metadata_len;
- u64 len = desc->len + pool->tx_metadata_len;
+ u64 len = desc->len;
+ u64 addr, end;
- if (!desc->len)
+ if (!len)
return false;
+ /* Can't overflow: @len is guaranteed to be <= ``U32_MAX`` */
+ len += pool->tx_metadata_len;
if (len > pool->chunk_size)
return false;
- if (addr >= pool->addrs_cnt || addr + len > pool->addrs_cnt ||
- xp_desc_crosses_non_contig_pg(pool, addr, len))
+ /* Can overflow if desc->addr is close to 0 */
+ if (check_sub_overflow(xp_unaligned_add_offset_to_addr(desc->addr),
+ pool->tx_metadata_len, &addr))
+ return false;
+
+ if (addr >= pool->addrs_cnt)
+ return false;
+
+ /* Can overflow if pool->addrs_cnt is high enough */
+ if (check_add_overflow(addr, len, &end) || end > pool->addrs_cnt)
+ return false;
+
+ if (xp_desc_crosses_non_contig_pg(pool, addr, len))
return false;
if (xp_unused_options_set(desc->options))
return false;
+
return true;
}
--
2.51.0
Assuming the disk layout below,
disk0: 0 --- 0x00035abfff
disk1: 0x00035ac000 --- 0x00037abfff
disk2: 0x00037ac000 --- 0x00037ebfff
and we want to read data at offset=13568 with len=128 across the block
devices, we can illustrate the block addresses as below.
0 .. 0x00037ac000 ------------------- 0x00037ebfff, 0x00037ec000 -------
| ^ ^ ^
| fofs 0 13568 13568+128
| ------------------------------------------------------
| LBA 0x37e8aa9 0x37ebfa9 0x37ec029
--- map 0x3caa9 0x3ffa9
In this example, we should give the relative map of the target block device
ranging from 0x3caa9 to 0x3ffa9 where the length should be calculated by
0x37ebfff + 1 - 0x37ebfa9.
In the equation below, however, map->m_pblk needs to be the original
(absolute) address, not the one already rebased onto the target block
device:
- map->m_len = min(map->m_len, dev->end_blk + 1 - map->m_pblk);
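A minimal arithmetic sketch of why the ordering matters, plugging in the
numbers above (types simplified; only the values come from the example):

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      uint64_t start_blk = 0x00037ac000, end_blk = 0x00037ebfff; /* disk2 */
      uint64_t m_pblk = 0x37ebfa9;    /* absolute LBA from the example */
      uint64_t m_len = 0x80;          /* 128 blocks requested */

      /* Buggy order: m_pblk made device-relative before the clamp. */
      uint64_t rel = m_pblk - start_blk;            /* 0x3ffa9 */
      uint64_t bad_cap = end_blk + 1 - rel;         /* 0x37ac057: bogus cap */
      uint64_t bad_len = m_len < bad_cap ? m_len : bad_cap; /* 0x80: not clamped */

      /* Fixed order: clamp against the absolute address first. */
      uint64_t cap = end_blk + 1 - m_pblk;          /* 0x57 blocks left on disk2 */
      uint64_t good_len = m_len < cap ? m_len : cap;    /* 0x57 */

      printf("bad_len=0x%llx good_len=0x%llx\n",
             (unsigned long long)bad_len, (unsigned long long)good_len);
      return 0;
  }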
Cc: stable(a)vger.kernel.org
Fixes: 71f2c8206202 ("f2fs: multidevice: support direct IO")
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
---
fs/f2fs/data.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index ef38e62cda8f..775aa4f63aa3 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1497,8 +1497,8 @@ static bool f2fs_map_blocks_cached(struct inode *inode,
struct f2fs_dev_info *dev = &sbi->devs[bidx];
map->m_bdev = dev->bdev;
- map->m_pblk -= dev->start_blk;
map->m_len = min(map->m_len, dev->end_blk + 1 - map->m_pblk);
+ map->m_pblk -= dev->start_blk;
} else {
map->m_bdev = inode->i_sb->s_bdev;
}
--
2.51.0.710.ga91ca5db03-goog
From: Paul Aurich <paul(a)darkrain42.org>
commit a9685b409a03b73d2980bbfa53eb47555802d0a9 upstream.
If open_cached_dir() encounters an error parsing the lease from the
server, the error handling may race with receiving a lease break,
resulting in open_cached_dir() freeing the cfid while the queued work is
pending.
Update open_cached_dir() to drop refs rather than directly freeing the
cfid.
Have cached_dir_lease_break(), cfids_laundromat_worker(), and
invalidate_all_cached_dirs() clear has_lease immediately while still
holding cfids->cfid_list_lock, and then use this to also simplify the
reference counting in cfids_laundromat_worker() and
invalidate_all_cached_dirs().
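The underlying discipline is the usual kref pattern: an object that
queued work may still reference must never be freed directly; each path
drops only its own reference and the final put runs the release. A
minimal single-threaded sketch of that pattern (illustrative, not the
actual cifs structures):

  #include <stdio.h>
  #include <stdlib.h>

  struct obj {
      int refcount;    /* stands in for cfid->refcount (a kref) */
  };

  static void release(struct obj *o)    /* smb2_close_cached_fid role */
  {
      printf("released\n");
      free(o);
  }

  static void put(struct obj *o)
  {
      if (--o->refcount == 0)
          release(o);    /* only the last holder frees */
  }

  int main(void)
  {
      struct obj *o = malloc(sizeof(*o));

      if (!o)
          return 1;
      o->refcount = 2;    /* caller + queued lease-break work */
      put(o);             /* error path drops its ref; no free yet */
      put(o);             /* the queued work's put actually frees */
      return 0;
  }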
Fixes this KASAN splat (which manually injects an error and lease break
in open_cached_dir()):
==================================================================
BUG: KASAN: slab-use-after-free in smb2_cached_lease_break+0x27/0xb0
Read of size 8 at addr ffff88811cc24c10 by task kworker/3:1/65
CPU: 3 UID: 0 PID: 65 Comm: kworker/3:1 Not tainted 6.12.0-rc6-g255cf264e6e5-dirty #87
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
Workqueue: cifsiod smb2_cached_lease_break
Call Trace:
<TASK>
dump_stack_lvl+0x77/0xb0
print_report+0xce/0x660
kasan_report+0xd3/0x110
smb2_cached_lease_break+0x27/0xb0
process_one_work+0x50a/0xc50
worker_thread+0x2ba/0x530
kthread+0x17c/0x1c0
ret_from_fork+0x34/0x60
ret_from_fork_asm+0x1a/0x30
</TASK>
Allocated by task 2464:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
__kasan_kmalloc+0xaa/0xb0
open_cached_dir+0xa7d/0x1fb0
smb2_query_path_info+0x43c/0x6e0
cifs_get_fattr+0x346/0xf10
cifs_get_inode_info+0x157/0x210
cifs_revalidate_dentry_attr+0x2d1/0x460
cifs_getattr+0x173/0x470
vfs_statx_path+0x10f/0x160
vfs_statx+0xe9/0x150
vfs_fstatat+0x5e/0xc0
__do_sys_newfstatat+0x91/0xf0
do_syscall_64+0x95/0x1a0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Freed by task 2464:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
__kasan_slab_free+0x51/0x70
kfree+0x174/0x520
open_cached_dir+0x97f/0x1fb0
smb2_query_path_info+0x43c/0x6e0
cifs_get_fattr+0x346/0xf10
cifs_get_inode_info+0x157/0x210
cifs_revalidate_dentry_attr+0x2d1/0x460
cifs_getattr+0x173/0x470
vfs_statx_path+0x10f/0x160
vfs_statx+0xe9/0x150
vfs_fstatat+0x5e/0xc0
__do_sys_newfstatat+0x91/0xf0
do_syscall_64+0x95/0x1a0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Last potentially related work creation:
kasan_save_stack+0x33/0x60
__kasan_record_aux_stack+0xad/0xc0
insert_work+0x32/0x100
__queue_work+0x5c9/0x870
queue_work_on+0x82/0x90
open_cached_dir+0x1369/0x1fb0
smb2_query_path_info+0x43c/0x6e0
cifs_get_fattr+0x346/0xf10
cifs_get_inode_info+0x157/0x210
cifs_revalidate_dentry_attr+0x2d1/0x460
cifs_getattr+0x173/0x470
vfs_statx_path+0x10f/0x160
vfs_statx+0xe9/0x150
vfs_fstatat+0x5e/0xc0
__do_sys_newfstatat+0x91/0xf0
do_syscall_64+0x95/0x1a0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The buggy address belongs to the object at ffff88811cc24c00
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff88811cc24c00, ffff88811cc25000)
Cc: stable(a)vger.kernel.org
Signed-off-by: Paul Aurich <paul(a)darkrain42.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
[ Do not apply the change for cfids_laundromat_worker() since that
  function and the related feature do not exist on 6.1.y. Update
  open_cached_dir() following the method of the upstream patch. ]
Signed-off-by: Cliff Liu <donghua.liu(a)windriver.com>
Signed-off-by: He Zhe <Zhe.He(a)windriver.com>
[Shivani: Modified to apply on 6.1.y]
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
fs/smb/client/cached_dir.c | 39 ++++++++++++++++----------------------
1 file changed, 16 insertions(+), 23 deletions(-)
diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
index 3d028b6a2..23a57a0c8 100644
--- a/fs/smb/client/cached_dir.c
+++ b/fs/smb/client/cached_dir.c
@@ -320,17 +320,13 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
/*
* We are guaranteed to have two references at this point.
* One for the caller and one for a potential lease.
- * Release the Lease-ref so that the directory will be closed
- * when the caller closes the cached handle.
+ * Release one here, and the second below.
*/
kref_put(&cfid->refcount, smb2_close_cached_fid);
}
if (rc) {
- if (cfid->is_open)
- SMB2_close(0, cfid->tcon, cfid->fid.persistent_fid,
- cfid->fid.volatile_fid);
- free_cached_dir(cfid);
- cfid = NULL;
+ cfid->has_lease = false;
+ kref_put(&cfid->refcount, smb2_close_cached_fid);
}
if (rc == 0) {
@@ -462,25 +458,24 @@ void invalidate_all_cached_dirs(struct cifs_tcon *tcon)
cfids->num_entries--;
cfid->is_open = false;
cfid->on_list = false;
- /* To prevent race with smb2_cached_lease_break() */
- kref_get(&cfid->refcount);
+ if (cfid->has_lease) {
+ /*
+ * The lease was never cancelled from the server,
+ * so steal that reference.
+ */
+ cfid->has_lease = false;
+ } else
+ kref_get(&cfid->refcount);
}
spin_unlock(&cfids->cfid_list_lock);
list_for_each_entry_safe(cfid, q, &entry, entry) {
list_del(&cfid->entry);
cancel_work_sync(&cfid->lease_break);
- if (cfid->has_lease) {
- /*
- * We lease was never cancelled from the server so we
- * need to drop the reference.
- */
- spin_lock(&cfids->cfid_list_lock);
- cfid->has_lease = false;
- spin_unlock(&cfids->cfid_list_lock);
- kref_put(&cfid->refcount, smb2_close_cached_fid);
- }
- /* Drop the extra reference opened above*/
+ /*
+ * Drop the ref-count from above, either the lease-ref (if there
+ * was one) or the extra one acquired.
+ */
kref_put(&cfid->refcount, smb2_close_cached_fid);
}
}
@@ -491,9 +486,6 @@ smb2_cached_lease_break(struct work_struct *work)
struct cached_fid *cfid = container_of(work,
struct cached_fid, lease_break);
- spin_lock(&cfid->cfids->cfid_list_lock);
- cfid->has_lease = false;
- spin_unlock(&cfid->cfids->cfid_list_lock);
kref_put(&cfid->refcount, smb2_close_cached_fid);
}
@@ -511,6 +503,7 @@ int cached_dir_lease_break(struct cifs_tcon *tcon, __u8 lease_key[16])
!memcmp(lease_key,
cfid->fid.lease_key,
SMB2_LEASE_KEY_SIZE)) {
+ cfid->has_lease = false;
cfid->time = 0;
/*
* We found a lease remove it from the list
--
2.40.4