From: Steven Rostedt <rostedt(a)goodmis.org>
The fix to use a per CPU buffer to read user space tested only writes
to trace_marker. But it appears that the selftests are missing tests for
the trace_marker_raw file. The trace_marker_raw file is used by
applications that write data structures, not strings, into the file, and
tools read the raw ring buffer to process the structures those
applications write.
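As an illustration, a minimal userspace writer might look like the
sketch below (the struct layout is hypothetical; the tracefs mount point
is assumed to be /sys/kernel/tracing):

	#include <fcntl.h>
	#include <unistd.h>

	/* Application-defined record; any binary layout works. */
	struct my_event {
		int id;
		long value;
	};

	int main(void)
	{
		struct my_event ev = { .id = 1, .value = 42 };
		int fd = open("/sys/kernel/tracing/trace_marker_raw", O_WRONLY);

		if (fd < 0)
			return 1;
		/* Note: a raw struct is written, not a string. */
		write(fd, &ev, sizeof(ev));
		close(fd);
		return 0;
	}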
The fix that reads into the per CPU buffers passes the new per CPU
buffer to the trace_marker file writes. The update to the
trace_marker_raw write also read the data from user space into the per
CPU buffer, but then still passed the user space address to the function
that records the data.
Pass in the per CPU buffer and not the user space address.
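In other words, the write path should look roughly like this sketch
(simplified; sizing, fault handling, and the buffer lookup in the real
tracing_mark_raw_write() are elided):

	/* Copy the user data once into the per CPU buffer ... */
	if (copy_from_user(buf, ubuf, cnt))
		return -EFAULT;
	/* ... and record from the kernel copy, never from ubuf. */
	written = write_raw_marker_to_buffer(tr, buf, cnt);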
TODO: Add a test to better test trace_marker_raw.
Cc: stable(a)vger.kernel.org
Fixes: 64cf7d058a00 ("tracing: Have trace_marker use per-cpu data to read user space")
Reported-by: syzbot+9a2ede1643175f350105(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68e973f5.050a0220.1186a4.0010.GAE@google.com/
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 0fd582651293..bbb89206a891 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7497,12 +7497,12 @@ tracing_mark_raw_write(struct file *filp, const char __user *ubuf,
if (tr == &global_trace) {
guard(rcu)();
list_for_each_entry_rcu(tr, &marker_copies, marker_list) {
- written = write_raw_marker_to_buffer(tr, ubuf, cnt);
+ written = write_raw_marker_to_buffer(tr, buf, cnt);
if (written < 0)
break;
}
} else {
- written = write_raw_marker_to_buffer(tr, ubuf, cnt);
+ written = write_raw_marker_to_buffer(tr, buf, cnt);
}
return written;
--
2.51.0
When fsl_edma_alloc_chan_resources() fails after clk_prepare_enable(),
the error paths only free IRQs and destroy the TCD pool, but forget to
call clk_disable_unprepare(). This causes the channel clock to remain
enabled, leaking power and resources.
Fix it by disabling the channel clock in the error unwind path.
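The unwind then releases in reverse order of acquisition, roughly as in
this sketch (label taken from the diff below; intervening code elided):

	ret = clk_prepare_enable(fsl_chan->clk);	/* acquired early */
	...
err_txirq:
	dma_pool_destroy(fsl_chan->tcd_pool);
	/* Undo clk_prepare_enable(), only on hardware with a channel clock. */
	if (fsl_edma_drvflags(fsl_chan) & FSL_EDMA_DRV_HAS_CHCLK)
		clk_disable_unprepare(fsl_chan->clk);
	return ret;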
Fixes: d8d4355861d8 ("dmaengine: fsl-edma: add i.MX8ULP edma support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zhen Ni <zhen.ni(a)easystack.cn>
---
drivers/dma/fsl-edma-common.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/dma/fsl-edma-common.c b/drivers/dma/fsl-edma-common.c
index 4976d7dde080..bd673f08f610 100644
--- a/drivers/dma/fsl-edma-common.c
+++ b/drivers/dma/fsl-edma-common.c
@@ -852,6 +852,8 @@ int fsl_edma_alloc_chan_resources(struct dma_chan *chan)
free_irq(fsl_chan->txirq, fsl_chan);
err_txirq:
dma_pool_destroy(fsl_chan->tcd_pool);
+ if (fsl_edma_drvflags(fsl_chan) & FSL_EDMA_DRV_HAS_CHCLK)
+ clk_disable_unprepare(fsl_chan->clk);
return ret;
}
--
2.20.1
From: Jani Nurminen <jani.nurminen(a)windriver.com>
When PCIe has been set up by the bootloader, the ecam_size field in the
E_ECAM_CONTROL register already contains a value.
The driver previously programmed it to 0xc (for 16 buses; 16 MB), but
this was bumped to 0x10 (for 256 buses; 256 MB) by commit 2fccd11518f1
("PCI: xilinx-nwl: Modify ECAM size to enable support for 256 buses").
Regardless of what the bootloader has programmed, the driver ORs in the
new maximum value without doing a proper read-modify-write (RMW)
sequence. This can lead to problems.
For example, if the bootloader programs in 0xc and the driver uses 0x10,
the ORed result is 0x1c, which is beyond the ecam_max_size limit of 0x10
(from E_ECAM_CAPABILITIES).
Avoid the problems by doing a proper RMW.
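Concretely, using field values only (the real code shifts by
E_ECAM_SIZE_SHIFT and masks with E_ECAM_SIZE_LOC; ~0x1f below merely
stands in for that mask):

	u32 old = 0x0c;				/* left by the bootloader */
	u32 ored = old | 0x10;			/* == 0x1c: stale bits survive */
	u32 rmwd = (old & ~0x1f) | 0x10;	/* == 0x10: field cleared first */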
Fixes: 2fccd11518f1 ("PCI: xilinx-nwl: Modify ECAM size to enable support for 256 buses")
Signed-off-by: Jani Nurminen <jani.nurminen(a)windriver.com>
[mani: added stable tag]
Signed-off-by: Manivannan Sadhasivam <mani(a)kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: stable(a)vger.kernel.org
Link: https://patch.msgid.link/e83a2af2-af0b-4670-bcf5-ad408571c2b0@windriver.com
---
CR: CR-1250694
Branch: master-next-test
---
drivers/pci/controller/pcie-xilinx-nwl.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/controller/pcie-xilinx-nwl.c b/drivers/pci/controller/pcie-xilinx-nwl.c
index a91eed8812c8..63494b67e42b 100644
--- a/drivers/pci/controller/pcie-xilinx-nwl.c
+++ b/drivers/pci/controller/pcie-xilinx-nwl.c
@@ -665,9 +665,10 @@ static int nwl_pcie_bridge_init(struct nwl_pcie *pcie)
nwl_bridge_writel(pcie, nwl_bridge_readl(pcie, E_ECAM_CONTROL) |
E_ECAM_CR_ENABLE, E_ECAM_CONTROL);
- nwl_bridge_writel(pcie, nwl_bridge_readl(pcie, E_ECAM_CONTROL) |
- (NWL_ECAM_MAX_SIZE << E_ECAM_SIZE_SHIFT),
- E_ECAM_CONTROL);
+ ecam_val = nwl_bridge_readl(pcie, E_ECAM_CONTROL);
+ ecam_val &= ~E_ECAM_SIZE_LOC;
+ ecam_val |= NWL_ECAM_MAX_SIZE << E_ECAM_SIZE_SHIFT;
+ nwl_bridge_writel(pcie, ecam_val, E_ECAM_CONTROL);
nwl_bridge_writel(pcie, lower_32_bits(pcie->phys_ecam_base),
E_ECAM_BASE_LO);
--
2.44.1
Hi Geoffrey,
On 2025/10/9 7:22, Geoffrey Thorpe wrote:
> Any trivial usage of hostfs seems to be broken since commit cd140ce9
> ("hostfs: convert hostfs to use the new mount API") - I bisected it down
> to this commit to make sure.
>
Sorry to trouble you; could you provide your mount and kernel versions
(use mount -V and uname -ar)?
Thanks,
Hongbo
> Steps to reproduce;
>
> The following assumes that the ARCH=um kernel has already been compiled
> (and the 'vmlinux' executable is in the local directory, as is the case
> when building from the top directory of a source tree). I built mine
> from a fresh clone using 'defconfig'. The uml_run.sh script creates a
> bootable root FS image (from debian, via docker) and then boots it with
> a hostfs mount to demonstrate the regression. This should be observable
> with any other bootable image though, simply pass "hostfs=<hostpath>" to
> the ./vmlinux kernel and then try to mount it from within the booted VM
> ("mount -t hostfs none <guestpath>").
>
> The following 3 text files are used, and as they're small enough for
> copy-n-paste I figured (hoped) it was best to inline them rather than
> post attachments.
>
> uml_run.sh:
> #!/bin/bash
> set -ex
> cat Dockerfile | docker build -t foobar:foobar -
> docker export -o foobar.tar \
> `docker run -d foobar:foobar /bin/true`
> dd if=/dev/zero of=rootfs.img \
> bs=$(expr 2048 \* 1024 \* 1024 / 512) count=512
> mkfs.ext4 rootfs.img
> sudo ./uml_root.sh
> cp rootfs.img temp.img
> dd if=/dev/zero of=swapfile bs=1M count=1024
> chmod 600 swapfile
> mkswap swapfile
> ./vmlinux mem=4G ubd0=temp.img rw ubd1=swapfile \
> hostfs=$(pwd)
>
> uml_root.sh:
> #!/bin/bash
> set -ex
> losetup -D
> LOOPDEVICE=$(losetup -f)
> losetup ${LOOPDEVICE} rootfs.img
> mkdir -p tmpmnt
> mount -t auto ${LOOPDEVICE} tmpmnt/
> (cd tmpmnt && tar xf ../foobar.tar)
> umount tmpmnt
> losetup -D
>
> Dockerfile:
> FROM debian:trixie
> RUN echo 'debconf debconf/frontend select Noninteractive' | \
> debconf-set-selections
> RUN apt-get update
> RUN apt-get install -y apt-utils
> RUN apt-get -y full-upgrade
> RUN echo "US/Eastern" > /etc/timezone
> RUN chmod 644 /etc/timezone
> RUN cd /etc && rm -f localtime && \
> ln -s /usr/share/zoneinfo/$$MYTZ localtime
> RUN apt-get install -y systemd-sysv kmod
> RUN echo "root:root" | chpasswd
> RUN echo "/dev/ubdb swap swap defaults 0 0" >> /etc/fstab
> RUN mkdir /hosthack
> RUN echo "none /hosthack hostfs defaults 0 0" >> /etc/fstab
> RUN systemctl set-default multi-user.target
>
> Execute ./uml_run.sh to build the rootfs image and boot the VM. This
> requires a system with docker, and will also require a sudo password
> when creating the rootfs. The boot log indicates whether the hostfs
> mount succeeds or not - the boot should degrade to emergency mode if the
> mount fails, otherwise a login prompt should indicate success. (Login is
> root:root, e.g. if you prefer to go in and shutdown the VM gracefully.)
>
> Please let me know if I can/should provide anything else.
>
> Cheers,
> Geoff
>
Commit e26ee4efbc79 ("fuse: allocate ff->release_args only if release is
needed") skips allocating ff->release_args if the server does not
implement open. However in doing so, fuse_prepare_release() now skips
grabbing the reference on the inode, which makes it possible for an
inode to be evicted from the dcache while there are inflight readahead
requests. This causes a deadlock if the server triggers reclaim while
servicing the readahead request and reclaim attempts to evict the inode
of the file being read ahead. The folio stays locked for the duration of
the readahead, so when reclaim evicts the fuse inode and
fuse_evict_inode() attempts to remove all folios associated with the
inode from the page cache (truncate_inode_pages_range()), reclaim blocks
forever waiting for the folio lock, since readahead cannot relinquish
the lock while it is itself blocked in reclaim:
>>> stack_trace(1504735)
folio_wait_bit_common (mm/filemap.c:1308:4)
folio_lock (./include/linux/pagemap.h:1052:3)
truncate_inode_pages_range (mm/truncate.c:336:10)
fuse_evict_inode (fs/fuse/inode.c:161:2)
evict (fs/inode.c:704:3)
dentry_unlink_inode (fs/dcache.c:412:3)
__dentry_kill (fs/dcache.c:615:3)
shrink_kill (fs/dcache.c:1060:12)
shrink_dentry_list (fs/dcache.c:1087:3)
prune_dcache_sb (fs/dcache.c:1168:2)
super_cache_scan (fs/super.c:221:10)
do_shrink_slab (mm/shrinker.c:435:9)
shrink_slab (mm/shrinker.c:626:10)
shrink_node (mm/vmscan.c:5951:2)
shrink_zones (mm/vmscan.c:6195:3)
do_try_to_free_pages (mm/vmscan.c:6257:3)
do_swap_page (mm/memory.c:4136:11)
handle_pte_fault (mm/memory.c:5562:10)
handle_mm_fault (mm/memory.c:5870:9)
do_user_addr_fault (arch/x86/mm/fault.c:1338:10)
handle_page_fault (arch/x86/mm/fault.c:1481:3)
exc_page_fault (arch/x86/mm/fault.c:1539:2)
asm_exc_page_fault+0x22/0x27
Fix this deadlock by allocating ff->release_args and grabbing the
reference on the inode when preparing the file for release even if the
server does not implement open. The inode reference will be dropped when
the last reference on the fuse file is dropped (see fuse_file_put() ->
fuse_release_end()).
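For reference, a simplified sketch of the pairing this restores (from
fuse_prepare_release() and fuse_release_end() in fs/fuse/file.c; exact
code abbreviated):

	/* fuse_prepare_release(): pin the inode so it cannot be evicted
	 * while the file, and any inflight reads on it, are still live. */
	ra->inode = igrab(&fi->inode);

	/* fuse_release_end(): drop the pin after the last file reference
	 * is gone (see fuse_file_put()). */
	iput(ra->inode);
	kfree(ra);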
Fixes: e26ee4efbc79 ("fuse: allocate ff->release_args only if release is needed")
Cc: stable(a)vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong(a)gmail.com>
Reported-by: Omar Sandoval <osandov(a)fb.com>
---
fs/fuse/file.c | 40 ++++++++++++++++++++++++++--------------
1 file changed, 26 insertions(+), 14 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f1ef77a0be05..654e21ee93fb 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -100,7 +100,7 @@ static void fuse_release_end(struct fuse_mount *fm, struct fuse_args *args,
kfree(ra);
}
-static void fuse_file_put(struct fuse_file *ff, bool sync)
+static void fuse_file_put(struct fuse_file *ff, bool sync, bool isdir)
{
if (refcount_dec_and_test(&ff->count)) {
struct fuse_release_args *ra = &ff->args->release_args;
@@ -110,7 +110,9 @@ static void fuse_file_put(struct fuse_file *ff, bool sync)
fuse_file_io_release(ff, ra->inode);
if (!args) {
- /* Do nothing when server does not implement 'open' */
+ /* Do nothing when server does not implement 'opendir' */
+ } else if (!isdir && ff->fm->fc->no_open) {
+ fuse_release_end(ff->fm, args, 0);
} else if (sync) {
fuse_simple_request(ff->fm, args);
fuse_release_end(ff->fm, args, 0);
@@ -131,8 +133,17 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
struct fuse_file *ff;
int opcode = isdir ? FUSE_OPENDIR : FUSE_OPEN;
bool open = isdir ? !fc->no_opendir : !fc->no_open;
+ bool release = !isdir || open;
- ff = fuse_file_alloc(fm, open);
+ /*
+ * ff->args->release_args still needs to be allocated (so we can hold an
+ * inode reference while there are pending inflight file operations when
+ * ->release() is called, see fuse_prepare_release()) even if
+ * fc->no_open is set else it becomes possible for reclaim to deadlock
+ * if while servicing the readahead request the server triggers reclaim
+ * and reclaim evicts the inode of the file being read ahead.
+ */
+ ff = fuse_file_alloc(fm, release);
if (!ff)
return ERR_PTR(-ENOMEM);
@@ -152,13 +163,14 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
fuse_file_free(ff);
return ERR_PTR(err);
} else {
- /* No release needed */
- kfree(ff->args);
- ff->args = NULL;
- if (isdir)
+ if (isdir) {
+ /* No release needed */
+ kfree(ff->args);
+ ff->args = NULL;
fc->no_opendir = 1;
- else
+ } else {
fc->no_open = 1;
+ }
}
}
@@ -363,7 +375,7 @@ void fuse_file_release(struct inode *inode, struct fuse_file *ff,
* own ref to the file, the IO completion has to drop the ref, which is
* how the fuse server can end up closing its clients' files.
*/
- fuse_file_put(ff, false);
+ fuse_file_put(ff, false, isdir);
}
void fuse_release_common(struct file *file, bool isdir)
@@ -394,7 +406,7 @@ void fuse_sync_release(struct fuse_inode *fi, struct fuse_file *ff,
{
WARN_ON(refcount_read(&ff->count) > 1);
fuse_prepare_release(fi, ff, flags, FUSE_RELEASE, true);
- fuse_file_put(ff, true);
+ fuse_file_put(ff, true, false);
}
EXPORT_SYMBOL_GPL(fuse_sync_release);
@@ -891,7 +903,7 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
folio_put(ap->folios[i]);
}
if (ia->ff)
- fuse_file_put(ia->ff, false);
+ fuse_file_put(ia->ff, false, false);
fuse_io_free(ia);
}
@@ -1815,7 +1827,7 @@ static void fuse_writepage_free(struct fuse_writepage_args *wpa)
if (wpa->bucket)
fuse_sync_bucket_dec(wpa->bucket);
- fuse_file_put(wpa->ia.ff, false);
+ fuse_file_put(wpa->ia.ff, false, false);
kfree(ap->folios);
kfree(wpa);
@@ -1968,7 +1980,7 @@ int fuse_write_inode(struct inode *inode, struct writeback_control *wbc)
ff = __fuse_write_file_get(fi);
err = fuse_flush_times(inode, ff);
if (ff)
- fuse_file_put(ff, false);
+ fuse_file_put(ff, false, false);
return err;
}
@@ -2186,7 +2198,7 @@ static int fuse_iomap_writeback_submit(struct iomap_writepage_ctx *wpc,
}
if (data->ff)
- fuse_file_put(data->ff, false);
+ fuse_file_put(data->ff, false, false);
return error;
}
--
2.47.3
The atomic variable vm_fault_info_updated is used to synchronize access to
adev->gmc.vm_fault_info between the interrupt handler and
get_vm_fault_info().
The default atomic functions like atomic_set() and atomic_read() do not
provide memory barriers. This allows for CPU instruction reordering,
meaning the memory accesses to vm_fault_info and the vm_fault_info_updated
flag are not guaranteed to occur in the intended order. This creates a
race condition that can lead to inconsistent or stale data being used.
The previous implementation, which used an explicit mb(), was
incomplete: it failed to account for all potential reorderings, such as
the reads of vm_fault_info being hoisted above the atomic_read() of the
flag.
Fix this by switching to atomic_set_release() and atomic_read_acquire().
These functions provide the necessary acquire and release semantics,
which act as memory barriers to ensure the correct order of operations.
They are also more efficient and idiomatic than explicit full memory
barriers.
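The pairing, reduced to its two sides (taken from the hunks below):

	/* Interrupt handler: fill the record, then publish it. The
	 * release store orders all writes to *info before the flag. */
	info->vmid = vmid;
	/* ... remaining fields ... */
	atomic_set_release(&adev->gmc.vm_fault_info_updated, 1);

	/* Reader: observe the flag, then consume the record. The acquire
	 * load prevents the reads of *info from moving above it. */
	if (atomic_read_acquire(&adev->gmc.vm_fault_info_updated) == 1) {
		*mem = *adev->gmc.vm_fault_info;
		atomic_set_release(&adev->gmc.vm_fault_info_updated, 0);
	}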
Fixes: b97dfa27ef3a ("drm/amdgpu: save vm fault information for amdkfd")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gui-Dong Han <hanguidong02(a)gmail.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 5 ++---
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 7 +++----
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 7 +++----
3 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index b16cce7c22c3..ac09bbe51634 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2325,10 +2325,9 @@ void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_mem *mem)
int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct amdgpu_device *adev,
struct kfd_vm_fault_info *mem)
{
- if (atomic_read(&adev->gmc.vm_fault_info_updated) == 1) {
+ if (atomic_read_acquire(&adev->gmc.vm_fault_info_updated) == 1) {
*mem = *adev->gmc.vm_fault_info;
- mb(); /* make sure read happened */
- atomic_set(&adev->gmc.vm_fault_info_updated, 0);
+ atomic_set_release(&adev->gmc.vm_fault_info_updated, 0);
}
return 0;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index a8d5795084fc..cf30d3332050 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -1066,7 +1066,7 @@ static int gmc_v7_0_sw_init(struct amdgpu_ip_block *ip_block)
GFP_KERNEL);
if (!adev->gmc.vm_fault_info)
return -ENOMEM;
- atomic_set(&adev->gmc.vm_fault_info_updated, 0);
+ atomic_set_release(&adev->gmc.vm_fault_info_updated, 0);
return 0;
}
@@ -1288,7 +1288,7 @@ static int gmc_v7_0_process_interrupt(struct amdgpu_device *adev,
vmid = REG_GET_FIELD(status, VM_CONTEXT1_PROTECTION_FAULT_STATUS,
VMID);
if (amdgpu_amdkfd_is_kfd_vmid(adev, vmid)
- && !atomic_read(&adev->gmc.vm_fault_info_updated)) {
+ && !atomic_read_acquire(&adev->gmc.vm_fault_info_updated)) {
struct kfd_vm_fault_info *info = adev->gmc.vm_fault_info;
u32 protections = REG_GET_FIELD(status,
VM_CONTEXT1_PROTECTION_FAULT_STATUS,
@@ -1304,8 +1304,7 @@ static int gmc_v7_0_process_interrupt(struct amdgpu_device *adev,
info->prot_read = protections & 0x8 ? true : false;
info->prot_write = protections & 0x10 ? true : false;
info->prot_exec = protections & 0x20 ? true : false;
- mb();
- atomic_set(&adev->gmc.vm_fault_info_updated, 1);
+ atomic_set_release(&adev->gmc.vm_fault_info_updated, 1);
}
return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index b45fa0cea9d2..0d4c93ff6f74 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -1179,7 +1179,7 @@ static int gmc_v8_0_sw_init(struct amdgpu_ip_block *ip_block)
GFP_KERNEL);
if (!adev->gmc.vm_fault_info)
return -ENOMEM;
- atomic_set(&adev->gmc.vm_fault_info_updated, 0);
+ atomic_set_release(&adev->gmc.vm_fault_info_updated, 0);
return 0;
}
@@ -1474,7 +1474,7 @@ static int gmc_v8_0_process_interrupt(struct amdgpu_device *adev,
vmid = REG_GET_FIELD(status, VM_CONTEXT1_PROTECTION_FAULT_STATUS,
VMID);
if (amdgpu_amdkfd_is_kfd_vmid(adev, vmid)
- && !atomic_read(&adev->gmc.vm_fault_info_updated)) {
+ && !atomic_read_acquire(&adev->gmc.vm_fault_info_updated)) {
struct kfd_vm_fault_info *info = adev->gmc.vm_fault_info;
u32 protections = REG_GET_FIELD(status,
VM_CONTEXT1_PROTECTION_FAULT_STATUS,
@@ -1490,8 +1490,7 @@ static int gmc_v8_0_process_interrupt(struct amdgpu_device *adev,
info->prot_read = protections & 0x8 ? true : false;
info->prot_write = protections & 0x10 ? true : false;
info->prot_exec = protections & 0x20 ? true : false;
- mb();
- atomic_set(&adev->gmc.vm_fault_info_updated, 1);
+ atomic_set_release(&adev->gmc.vm_fault_info_updated, 1);
}
return 0;
--
2.25.1