The patch titled
Subject: ocfs2: fix slab-use-after-free due to dangling pointer dqi_priv
has been added to the -mm mm-nonmm-unstable branch. Its filename is
ocfs2-fix-slab-use-after-free-due-to-dangling-pointer-dqi_priv.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Dennis Lam <dennis.lamerice(a)gmail.com>
Subject: ocfs2: fix slab-use-after-free due to dangling pointer dqi_priv
Date: Tue, 17 Dec 2024 21:39:25 -0500
When mounting ocfs2 and then remounting it as read-only, a
slab-use-after-free occurs after the user uses a syscall to
quota_getnextquota. Specifically, sb_dqinfo(sb, type)->dqi_priv is the
dangling pointer.
During the remounting process, the pointer dqi_priv is freed but is never
set as null leaving it to to be accessed. Additionally, the read-only
option for remounting sets the DQUOT_SUSPENDED flag instead of setting the
DQUOT_USAGE_ENABLED flags. Moreover, later in the process of getting the
next quota, the function ocfs2_get_next_id is called and only checks the
quota usage flags and not the quota suspended flags.
To fix this, I set dqi_priv to null when it is freed after remounting with
read-only and put a check for DQUOT_SUSPENDED in ocfs2_get_next_id.
Link: https://lkml.kernel.org/r/20241218023924.22821-2-dennis.lamerice@gmail.com
Fixes: 8f9e8f5fcc05 ("ocfs2: Fix Q_GETNEXTQUOTA for filesystem without quotas")
Signed-off-by: Dennis Lam <dennis.lamerice(a)gmail.com>
Reported-by: syzbot+d173bf8a5a7faeede34c(a)syzkaller.appspotmail.com
Tested-by: syzbot+d173bf8a5a7faeede34c(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6731d26f.050a0220.1fb99c.014b.GAE@google.com/T/
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/quota_global.c | 2 +-
fs/ocfs2/quota_local.c | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
--- a/fs/ocfs2/quota_global.c~ocfs2-fix-slab-use-after-free-due-to-dangling-pointer-dqi_priv
+++ a/fs/ocfs2/quota_global.c
@@ -893,7 +893,7 @@ static int ocfs2_get_next_id(struct supe
int status = 0;
trace_ocfs2_get_next_id(from_kqid(&init_user_ns, *qid), type);
- if (!sb_has_quota_loaded(sb, type)) {
+ if (!sb_has_quota_active(sb, type)){
status = -ESRCH;
goto out;
}
--- a/fs/ocfs2/quota_local.c~ocfs2-fix-slab-use-after-free-due-to-dangling-pointer-dqi_priv
+++ a/fs/ocfs2/quota_local.c
@@ -867,6 +867,7 @@ out:
brelse(oinfo->dqi_libh);
brelse(oinfo->dqi_lqi_bh);
kfree(oinfo);
+ info->dqi_priv = NULL;
return status;
}
_
Patches currently in -mm which might be from dennis.lamerice(a)gmail.com are
ocfs2-fix-slab-use-after-free-due-to-dangling-pointer-dqi_priv.patch
From: Steven Rostedt <rostedt(a)goodmis.org>
The TP_printk() of a TRACE_EVENT() is a generic printf format that any
developer can create for their event. It may include pointers to strings
and such. A boot mapped buffer may contain data from a previous kernel
where the strings addresses are different.
One solution is to copy the event content and update the pointers by the
recorded delta, but a simpler solution (for now) is to just use the
print_fields() function to print these events. The print_fields() function
just iterates the fields and prints them according to what type they are,
and ignores the TP_printk() format from the event itself.
To understand the difference, when printing via TP_printk() the output
looks like this:
4582.696626: kmem_cache_alloc: call_site=getname_flags+0x47/0x1f0 ptr=00000000e70e10e0 bytes_req=4096 bytes_alloc=4096 gfp_flags=GFP_KERNEL node=-1 accounted=false
4582.696629: kmem_cache_alloc: call_site=alloc_empty_file+0x6b/0x110 ptr=0000000095808002 bytes_req=360 bytes_alloc=384 gfp_flags=GFP_KERNEL node=-1 accounted=false
4582.696630: kmem_cache_alloc: call_site=security_file_alloc+0x24/0x100 ptr=00000000576339c3 bytes_req=16 bytes_alloc=16 gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1 accounted=false
4582.696653: kmem_cache_free: call_site=do_sys_openat2+0xa7/0xd0 ptr=00000000e70e10e0 name=names_cache
But when printing via print_fields() (echo 1 > /sys/kernel/tracing/options/fields)
the same event output looks like this:
4582.696626: kmem_cache_alloc: call_site=0xffffffff92d10d97 (-1831793257) ptr=0xffff9e0e8571e000 (-107689771147264) bytes_req=0x1000 (4096) bytes_alloc=0x1000 (4096) gfp_flags=0xcc0 (3264) node=0xffffffff (-1) accounted=(0)
4582.696629: kmem_cache_alloc: call_site=0xffffffff92d0250b (-1831852789) ptr=0xffff9e0e8577f800 (-107689770747904) bytes_req=0x168 (360) bytes_alloc=0x180 (384) gfp_flags=0xcc0 (3264) node=0xffffffff (-1) accounted=(0)
4582.696630: kmem_cache_alloc: call_site=0xffffffff92efca74 (-1829778828) ptr=0xffff9e0e8d35d3b0 (-107689640864848) bytes_req=0x10 (16) bytes_alloc=0x10 (16) gfp_flags=0xdc0 (3520) node=0xffffffff (-1) accounted=(0)
4582.696653: kmem_cache_free: call_site=0xffffffff92cfbea7 (-1831879001) ptr=0xffff9e0e8571e000 (-107689771147264) name=names_cache
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Link: https://lore.kernel.org/20241218141507.28389a1d@gandalf.local.home
Fixes: 07714b4bb3f98 ("tracing: Handle old buffer mappings for event strings and functions")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index be62f0ea1814..6581cb2bc67f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4353,6 +4353,15 @@ static enum print_line_t print_trace_fmt(struct trace_iterator *iter)
if (event) {
if (tr->trace_flags & TRACE_ITER_FIELDS)
return print_event_fields(iter, event);
+ /*
+ * For TRACE_EVENT() events, the print_fmt is not
+ * safe to use if the array has delta offsets
+ * Force printing via the fields.
+ */
+ if ((tr->text_delta || tr->data_delta) &&
+ event->type > __TRACE_LAST_TYPE)
+ return print_event_fields(iter, event);
+
return event->funcs->trace(iter, sym_flags, event);
}
--
2.45.2
Currently, io_uring_unreg_ringfd() (which cleans up registered rings) is
only called on exit, but __io_uring_free (which frees the tctx in which the
registered ring pointers are stored) is also called on execve (via
begin_new_exec -> io_uring_task_cancel -> __io_uring_cancel ->
io_uring_cancel_generic -> __io_uring_free).
This means: A process going through execve while having registered rings
will leak references to the rings' `struct file`.
Fix it by zapping registered rings on execve(). This is implemented by
moving the io_uring_unreg_ringfd() from io_uring_files_cancel() into its
callee __io_uring_cancel(), which is called from io_uring_task_cancel() on
execve.
This could probably be exploited *on 32-bit kernels* by leaking 2^32
references to the same ring, because the file refcount is stored in a
pointer-sized field and get_file() doesn't have protection against
refcount overflow, just a WARN_ONCE(); but on 64-bit it should have no
impact beyond a memory leak.
Cc: stable(a)vger.kernel.org
Fixes: e7a6c00dc77a ("io_uring: add support for registering ring file descriptors")
Signed-off-by: Jann Horn <jannh(a)google.com>
---
testcase:
```
#define _GNU_SOURCE
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <linux/io_uring.h>
#define SYSCHK(x) ({ \
typeof(x) __res = (x); \
if (__res == (typeof(x))-1) \
err(1, "SYSCHK(" #x ")"); \
__res; \
})
#define NUM_SQ_PAGES 4
static int uring_fd;
int main(void) {
int memfd_sq = SYSCHK(memfd_create("", 0));
int memfd_cq = SYSCHK(memfd_create("", 0));
SYSCHK(ftruncate(memfd_sq, NUM_SQ_PAGES * 0x1000));
SYSCHK(ftruncate(memfd_cq, NUM_SQ_PAGES * 0x1000));
// sq
void *sq_data = SYSCHK(mmap(NULL, NUM_SQ_PAGES*0x1000, PROT_READ|PROT_WRITE, MAP_SHARED, memfd_sq, 0));
// cq
void *cq_data = SYSCHK(mmap(NULL, NUM_SQ_PAGES*0x1000, PROT_READ|PROT_WRITE, MAP_SHARED, memfd_cq, 0));
*(volatile unsigned int *)(cq_data+4) = 64 * NUM_SQ_PAGES;
close(memfd_sq);
close(memfd_cq);
// initialize uring
struct io_uring_params params = {
.flags = IORING_SETUP_REGISTERED_FD_ONLY|IORING_SETUP_NO_MMAP|IORING_SETUP_NO_SQARRAY,
.sq_off = { .user_addr = (unsigned long)sq_data },
.cq_off = { .user_addr = (unsigned long)cq_data }
};
uring_fd = SYSCHK(syscall(__NR_io_uring_setup, /*entries=*/10, ¶ms));
// re-execute
execl("/proc/self/exe", NULL);
err(1, "execve");
}
```
Leave it running for a while and monitor `grep ^filp /proc/slabinfo` and
memory usage - without the patch, both will go up slowly but steadily.
---
include/linux/io_uring.h | 4 +---
io_uring/io_uring.c | 1 +
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index e123d5e17b526148054872ee513f665adea80eb6..85fe4e6b275c7de260ea9a8552b8e1c3e7f7e5ec 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -15,10 +15,8 @@ bool io_is_uring_fops(struct file *file);
static inline void io_uring_files_cancel(void)
{
- if (current->io_uring) {
- io_uring_unreg_ringfd();
+ if (current->io_uring)
__io_uring_cancel(false);
- }
}
static inline void io_uring_task_cancel(void)
{
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 06ff41484e29c6e7d8779bd7ff8317ebae003a8d..b1e888999cea137e043bf00372576861182857ac 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3214,6 +3214,7 @@ __cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd)
void __io_uring_cancel(bool cancel_all)
{
+ io_uring_unreg_ringfd();
io_uring_cancel_generic(cancel_all, NULL);
}
---
base-commit: aef25be35d23ec768eed08bfcf7ca3cf9685bc28
change-id: 20241218-uring-reg-ring-cleanup-15dd7343194a
--
Jann Horn <jannh(a)google.com>
commit b022f0c7e404 ("tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols")
avoids checking number_of_same_symbols() for module symbol in
__trace_kprobe_create(), but create_local_trace_kprobe() should avoid this
check too. Doing this check leads to ENOENT for module_name:symbol_name
constructions passed over perf_event_open.
No bug in mainline and 6.12 as those contain more general fix
commit 9d8616034f16 ("tracing/kprobes: Add symbol counting check when module loads")
Link: https://lore.kernel.org/linux-trace-kernel/20240705161030.b3ddb33a8167013b9…
Fixes: b022f0c7e404 ("tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols")
Signed-off-by: Nikolay Kuratov <kniv(a)yandex-team.ru>
---
v1 -> v2:
* Reword commit title and message
* Send for stable instead of mainline
v2 -> v3:
* Specify first good LTS version in commit message
* Remove explicit versions from the subject since 6.1 and 5.10 need fix too
kernel/trace/trace_kprobe.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 12d997bb3e78..94cb09d44115 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1814,7 +1814,7 @@ create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
int ret;
char *event;
- if (func) {
+ if (func && !strchr(func, ':')) {
unsigned int count;
count = number_of_same_symbols(func);
--
2.34.1
As per zynqmp-ipi bindings, zynqmp IPI node can have multiple child nodes.
Current IPI setup function is set only for first child node. If IPI node
has multiple child nodes in the device-tree, then IPI setup fails for
child nodes other than first child node. In such case kernel will crash.
Fix this crash by registering IPI setup function for each available child
node.
Cc: stable(a)vger.kernel.org
Fixes: 41bcf30100c5 (mailbox: zynqmp: Move buffered IPI setup)
Signed-off-by: Tanmay Shah <tanmay.shah(a)amd.com>
Reviewed-by: Michal Simek <michal.simek(a)amd.com>
---
Changes in v2:
- Add Fixes tag
drivers/mailbox/zynqmp-ipi-mailbox.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/mailbox/zynqmp-ipi-mailbox.c b/drivers/mailbox/zynqmp-ipi-mailbox.c
index 521d08b9ab47..815e0492f029 100644
--- a/drivers/mailbox/zynqmp-ipi-mailbox.c
+++ b/drivers/mailbox/zynqmp-ipi-mailbox.c
@@ -940,10 +940,10 @@ static int zynqmp_ipi_probe(struct platform_device *pdev)
pdata->num_mboxes = num_mboxes;
mbox = pdata->ipi_mboxes;
- mbox->setup_ipi_fn = ipi_fn;
-
for_each_available_child_of_node(np, nc) {
mbox->pdata = pdata;
+ mbox->setup_ipi_fn = ipi_fn;
+
ret = zynqmp_ipi_mbox_probe(mbox, nc);
if (ret) {
of_node_put(nc);
base-commit: 28955f4fa2823e39f1ecfb3a37a364563527afbc
--
2.34.1
From: Sumit Gupta <sumitg(a)nvidia.com>
Access to safety cluster engine (SCE) fabric registers was blocked
by firewall after the introduction of Functional Safety Island in
Tegra234. After that, any access by software to SCE registers is
correctly resulting in the internal bus error. However, when CPUs
try accessing the SCE-fabric registers to print error info,
another firewall error occurs as the fabric registers are also
firewall protected. This results in a second error to be printed.
Disable the SCE fabric node to avoid printing the misleading error.
The first error info will be printed by the interrupt from the
fabric causing the actual access.
Cc: stable(a)vger.kernel.org
Fixes: 302e154000ec ("arm64: tegra: Add node for CBB 2.0 on Tegra234")
Signed-off-by: Sumit Gupta <sumitg(a)nvidia.com>
Signed-off-by: Ivy Huang <yijuh(a)nvidia.com>
---
arch/arm64/boot/dts/nvidia/tegra234.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/nvidia/tegra234.dtsi b/arch/arm64/boot/dts/nvidia/tegra234.dtsi
index d08faf6bb505..05a771ab1ed5 100644
--- a/arch/arm64/boot/dts/nvidia/tegra234.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra234.dtsi
@@ -3815,7 +3815,7 @@ sce-fabric@b600000 {
compatible = "nvidia,tegra234-sce-fabric";
reg = <0x0 0xb600000 0x0 0x40000>;
interrupts = <GIC_SPI 173 IRQ_TYPE_LEVEL_HIGH>;
- status = "okay";
+ status = "disabled";
};
rce-fabric@be00000 {
--
2.25.1
From: Sumit Gupta <sumitg(a)nvidia.com>
The compatible string for the Tegra DCE fabric is currently defined as
'nvidia,tegra234-sce-fabric' but this is incorrect because this is the
compatible string for SCE fabric. Update the compatible for the DCE
fabric to correct the compatible string.
This compatible needs to be correct in order for the interconnect
to catch things such as improper data accesses.
Cc: stable(a)vger.kernel.org
Fixes: 302e154000ec ("arm64: tegra: Add node for CBB 2.0 on Tegra234")
Signed-off-by: Sumit Gupta <sumitg(a)nvidia.com>
Signed-off-by: Ivy Huang <yijuh(a)nvidia.com>
---
arch/arm64/boot/dts/nvidia/tegra234.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/nvidia/tegra234.dtsi b/arch/arm64/boot/dts/nvidia/tegra234.dtsi
index 984c85eab41a..d08faf6bb505 100644
--- a/arch/arm64/boot/dts/nvidia/tegra234.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra234.dtsi
@@ -3995,7 +3995,7 @@ bpmp-fabric@d600000 {
};
dce-fabric@de00000 {
- compatible = "nvidia,tegra234-sce-fabric";
+ compatible = "nvidia,tegra234-dce-fabric";
reg = <0x0 0xde00000 0x0 0x40000>;
interrupts = <GIC_SPI 381 IRQ_TYPE_LEVEL_HIGH>;
status = "okay";
--
2.25.1
Hello,
This series contains backports for 6.6 from the 6.11 release. This patchset
has gone through xfs testing and review.
Chen Ni (1):
xfs: convert comma to semicolon
Christoph Hellwig (1):
xfs: fix the contact address for the sysfs ABI documentation
Darrick J. Wong (10):
xfs: verify buffer, inode, and dquot items every tx commit
xfs: use consistent uid/gid when grabbing dquots for inodes
xfs: declare xfs_file.c symbols in xfs_file.h
xfs: create a new helper to return a file's allocation unit
xfs: fix file_path handling in tracepoints
xfs: attr forks require attr, not attr2
xfs: conditionally allow FS_XFLAG_REALTIME changes if S_DAX is set
xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code
xfs: take m_growlock when running growfsrt
xfs: reset rootdir extent size hint after growfsrt
John Garry (2):
xfs: Fix xfs_flush_unmap_range() range for RT
xfs: Fix xfs_prepare_shift() range for RT
Julian Sun (1):
xfs: remove unused parameter in macro XFS_DQUOT_LOGRES
Zizhi Wo (1):
xfs: Fix the owner setting issue for rmap query in xfs fsmap
lei lu (1):
xfs: don't walk off the end of a directory data block
Documentation/ABI/testing/sysfs-fs-xfs | 8 +--
fs/xfs/Kconfig | 12 ++++
fs/xfs/libxfs/xfs_dir2_data.c | 31 ++++++++--
fs/xfs/libxfs/xfs_dir2_priv.h | 7 +++
fs/xfs/libxfs/xfs_quota_defs.h | 2 +-
fs/xfs/libxfs/xfs_trans_resv.c | 28 ++++-----
fs/xfs/scrub/agheader_repair.c | 2 +-
fs/xfs/scrub/bmap.c | 8 ++-
fs/xfs/scrub/trace.h | 10 ++--
fs/xfs/xfs.h | 4 ++
fs/xfs/xfs_bmap_util.c | 22 +++++---
fs/xfs/xfs_buf_item.c | 32 +++++++++++
fs/xfs/xfs_dquot_item.c | 31 ++++++++++
fs/xfs/xfs_file.c | 33 +++++------
fs/xfs/xfs_file.h | 15 +++++
fs/xfs/xfs_fsmap.c | 6 +-
fs/xfs/xfs_inode.c | 29 ++++++++--
fs/xfs/xfs_inode.h | 2 +
fs/xfs/xfs_inode_item.c | 32 +++++++++++
fs/xfs/xfs_ioctl.c | 12 ++++
fs/xfs/xfs_iops.c | 1 +
fs/xfs/xfs_iops.h | 3 -
fs/xfs/xfs_rtalloc.c | 78 +++++++++++++++++++++-----
fs/xfs/xfs_symlink.c | 8 ++-
24 files changed, 328 insertions(+), 88 deletions(-)
create mode 100644 fs/xfs/xfs_file.h
--
2.39.3