Calling inotify_show_fdinfo() on an fd watching an overlayfs inode while
the overlayfs is being unmounted can lead to a NULL pointer dereference.
This issue was found by syzkaller.
Race Condition Diagram:

Thread 1                           Thread 2
--------                           --------

generic_shutdown_super()
 shrink_dcache_for_umount
  sb->s_root = NULL

                    |
                    |             vfs_read()
                    |              inotify_fdinfo()
                    |               * inode get from mark *
                    |               show_mark_fhandle(m, inode)
                    |                exportfs_encode_fid(inode, ..)
                    |                 ovl_encode_fh(inode, ..)
                    |                  ovl_check_encode_origin(inode)
                    |                   * deref i_sb->s_root *
                    |
                    |
                    v
 fsnotify_sb_delete(sb)

Which then leads to:
[ 32.133461] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[ 32.134438] KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
[ 32.135032] CPU: 1 UID: 0 PID: 4468 Comm: systemd-coredum Not tainted 6.17.0-rc6 #22 PREEMPT(none)
<snip registers, unreliable trace>
[ 32.143353] Call Trace:
[ 32.143732] ovl_encode_fh+0xd5/0x170
[ 32.144031] exportfs_encode_inode_fh+0x12f/0x300
[ 32.144425] show_mark_fhandle+0xbe/0x1f0
[ 32.145805] inotify_fdinfo+0x226/0x2d0
[ 32.146442] inotify_show_fdinfo+0x1c5/0x350
[ 32.147168] seq_show+0x530/0x6f0
[ 32.147449] seq_read_iter+0x503/0x12a0
[ 32.148419] seq_read+0x31f/0x410
[ 32.150714] vfs_read+0x1f0/0x9e0
[ 32.152297] ksys_read+0x125/0x240
In other words, ovl_check_encode_origin() dereferences
inode->i_sb->s_root after it was set to NULL in the unmount path.
Minimize the window of opportunity by adding an explicit check.
Fixes: c45beebfde34 ("ovl: support encoding fid from inode with no alias")
Signed-off-by: Jakub Acs <acsjakub(a)amazon.de>
Cc: Miklos Szeredi <miklos(a)szeredi.hu>
Cc: Amir Goldstein <amir73il(a)gmail.com>
Cc: linux-unionfs(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
---
I'm happy to take suggestions for a better fix. I looked at taking
s_umount for reading, but it wasn't clear to me how long the fdinfo
path would need to hold it. Hence the most primitive suggestion in this
v1.
I'm also not sure whether ENOENT or EBUSY is better, or even something else.
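For illustration only, the shape of the added check can be modelled in userspace as below. This is a minimal sketch, not the kernel code: the `*_model` types and `check_encode_origin_model()` are hypothetical stand-ins for struct super_block, struct inode and ovl_check_encode_origin(), and the sketch assumes a single thread.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct dentry_model { int unused; };
struct sb_model { struct dentry_model *s_root; };
struct inode_model { struct sb_model *i_sb; };

/*
 * Mirrors the early bail-out added by the patch: if the superblock's
 * root dentry is already gone, refuse to encode instead of
 * dereferencing a NULL pointer later on.
 */
static int check_encode_origin_model(const struct inode_model *inode)
{
	if (!inode->i_sb->s_root)
		return -ENOENT;

	/* The real function would go on to inspect the root dentry here. */
	return 0;
}
```

Because the check and the later use are not serialized against the unmount path, this only narrows the race window, as noted above; it does not close it.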
fs/overlayfs/export.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 83f80fdb1567..424c73188e06 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -195,6 +195,8 @@ static int ovl_check_encode_origin(struct inode *inode)
if (!ovl_inode_lower(inode))
return 0;
+ if (!inode->i_sb->s_root)
+ return -ENOENT;
/*
* Root is never indexed, so if there's an upper layer, encode upper for
* root.
--
2.47.3
Forbid USB runtime PM (autosuspend) for AX88772* in bind.
usbnet enables runtime PM by default in probe, so disabling it via the
usb_driver flag is ineffective. For AX88772B, autosuspend shows no
measurable power saving in my tests (no link partner, admin up/down).
The ~0.453 W -> ~0.248 W reduction on 6.1 comes from phylib powering
the PHY off on admin-down, not from USB autosuspend.
With autosuspend active, resume paths may require calling phylink/phylib
(caller must hold RTNL) and doing MDIO I/O. Taking RTNL from a USB PM
resume can deadlock (RTNL may already be held), and MDIO can attempt a
runtime-wake while the USB PM lock is held. Given the lack of benefit
and poor test coverage (autosuspend is usually disabled by default in
distros), forbid runtime PM here to avoid these hazards.
This affects only AX88772* devices (per-interface in bind). System
sleep/resume is unchanged.
Fixes: 4a2c7217cd5a ("net: usb: asix: ax88772: manage PHY PM from MAC")
Reported-by: Hubert Wiśniewski <hubert.wisniewski.25632(a)gmail.com>
Closes: https://lore.kernel.org/all/20220622141638.GE930160@montezuma.acc.umu.se
Reported-by: Marek Szyprowski <m.szyprowski(a)samsung.com>
Closes: https://lore.kernel.org/all/b5ea8296-f981-445d-a09a-2f389d7f6fdd@samsung.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
---
Link to the measurement results:
https://lore.kernel.org/all/aMkPMa650kfKfmF4@pengutronix.de/
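To illustrate why the forbid in bind is balanced by an allow in unbind, here is a simplified userspace model of the bookkeeping. It is a sketch under assumptions, not the PM core: `struct rpm_model` and the `rpm_model_*()` helpers are hypothetical names, and the model only tracks a usage counter and an "auto" flag.

```c
#include <stdbool.h>

/* Hypothetical model of the per-device runtime PM state. */
struct rpm_model {
	int usage_count;   /* device may not autosuspend while this is > 0 */
	bool runtime_auto; /* false while runtime PM is forbidden */
};

/* Modelled after pm_runtime_forbid(): idempotent, takes one reference. */
static void rpm_model_forbid(struct rpm_model *d)
{
	if (!d->runtime_auto)
		return;
	d->runtime_auto = false;
	d->usage_count++;
}

/* Modelled after pm_runtime_allow(): idempotent, drops that reference. */
static void rpm_model_allow(struct rpm_model *d)
{
	if (d->runtime_auto)
		return;
	d->runtime_auto = true;
	d->usage_count--;
}

/* Autosuspend is possible only when nothing pins the device. */
static bool rpm_model_may_autosuspend(const struct rpm_model *d)
{
	return d->runtime_auto && d->usage_count == 0;
}
```

In this model a forbid without a matching allow leaves the counter elevated, which is the imbalance the unbind hunk avoids.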
---
drivers/net/usb/asix_devices.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c
index 792ddda1ad49..0d341d7e6154 100644
--- a/drivers/net/usb/asix_devices.c
+++ b/drivers/net/usb/asix_devices.c
@@ -625,6 +625,22 @@ static void ax88772_suspend(struct usbnet *dev)
asix_read_medium_status(dev, 1));
}
+/*
+ * Notes on PM callbacks and locking context:
+ *
+ * - asix_suspend()/asix_resume() are invoked for both runtime PM and
+ * system-wide suspend/resume. For struct usb_driver the ->resume()
+ * callback does not receive pm_message_t, so the resume type cannot
+ * be distinguished here.
+ *
+ * - The MAC driver must hold RTNL when calling phylink interfaces such as
+ * phylink_suspend()/resume(). Those calls will also perform MDIO I/O.
+ *
+ * - Taking RTNL and doing MDIO from a runtime-PM resume callback (while
+ * the USB PM lock is held) is fragile. Since autosuspend brings no
+ * measurable power saving for this device with current driver version, it is
+ * disabled below.
+ */
static int asix_suspend(struct usb_interface *intf, pm_message_t message)
{
struct usbnet *dev = usb_get_intfdata(intf);
@@ -919,6 +935,16 @@ static int ax88772_bind(struct usbnet *dev, struct usb_interface *intf)
if (ret)
goto initphy_err;
+ /* Disable USB runtime PM (autosuspend) for this interface.
+ * Rationale:
+ * - No measurable power saving from autosuspend for this device.
+ * - phylink/phylib calls require caller-held RTNL and do MDIO I/O,
+ * which is unsafe from USB PM resume paths (possible RTNL already
+ * held, USB PM lock held).
+ * System suspend/resume is unaffected.
+ */
+ pm_runtime_forbid(&intf->dev);
+
return 0;
initphy_err:
@@ -948,6 +974,10 @@ static void ax88772_unbind(struct usbnet *dev, struct usb_interface *intf)
phylink_destroy(priv->phylink);
ax88772_mdio_unregister(priv);
asix_rx_fixup_common_free(dev->driver_priv);
+ /* Re-allow runtime PM on disconnect for tidiness. The interface
+ * goes away anyway, but this balances forbid for debug sanity.
+ */
+ pm_runtime_allow(&intf->dev);
}
static void ax88178_unbind(struct usbnet *dev, struct usb_interface *intf)
@@ -1600,6 +1630,10 @@ static struct usb_driver asix_driver = {
.resume = asix_resume,
.reset_resume = asix_resume,
.disconnect = usbnet_disconnect,
+ /* usbnet will force supports_autosuspend=1; we explicitly forbid RPM
+ * per-interface in bind to keep autosuspend disabled for this driver
+ * by using pm_runtime_forbid().
+ */
.supports_autosuspend = 1,
.disable_hub_initiated_lpm = 1,
};
--
2.47.3
On Wed, Sep 17, 2025 at 10:03 AM Andrei Vagin <avagin(a)google.com> wrote:
>
> On Wed, Sep 17, 2025 at 8:59 AM Eric Dumazet <edumazet(a)google.com> wrote:
> >
> > On Wed, Sep 17, 2025 at 8:39 AM Andrei Vagin <avagin(a)google.com> wrote:
> > >
> > > On Wed, Sep 17, 2025 at 6:53 AM Eric Dumazet <edumazet(a)google.com> wrote:
> > > >
> > > > Andrei Vagin reported that blamed commit broke CRIU.
> > > >
> > > > Indeed, while we want to keep sk_uid unchanged when a socket
> > > > is cloned, we want to clear sk->sk_ino.
> > > >
> > > > Otherwise, sock_diag might report multiple sockets sharing
> > > > the same inode number.
> > > >
> > > > Move the clearing part from sock_orphan() to sk_set_socket(sk, NULL),
> > > > called both from sock_orphan() and sk_clone_lock().
> > > >
> > > > Fixes: 5d6b58c932ec ("net: lockless sock_i_ino()")
> > > > Closes: https://lore.kernel.org/netdev/aMhX-VnXkYDpKd9V@google.com/
> > > > Closes: https://github.com/checkpoint-restore/criu/issues/2744
> > > > Reported-by: Andrei Vagin <avagin(a)google.com>
> > > > Signed-off-by: Eric Dumazet <edumazet(a)google.com>
> > >
> > > Acked-by: Andrei Vagin <avagin(a)google.com>
> > > I think we need to add `Cc: stable(a)vger.kernel.org`.
> >
> > I never do this. Note that the prior patch had no such CC.
>
> The original patch has been ported to the v6.16 kernels. According to the
> kernel documentation
> (https://www.kernel.org/doc/html/v6.5/process/stable-kernel-rules.html),
> adding Cc: stable(a)vger.kernel.org is required for automatic porting into
> stable trees. Without this tag, someone will likely need to manually request
> that this patch be ported. This is my understanding of how the stable
> branch process works, sorry if I missed something.
Andrei, I think I know pretty well what I am doing. You do not have to
explain to me anything.
Thank you.
This patch series enables a future version of tune2fs to modify
certain parts of the ext4 superblock without writing to the block
device.
The first patch fixes a potential buffer over-read caused by a
maliciously modified superblock. The second patch adds support for
32-bit uids and gids that can access the reserved blocks pool.
The last patch adds the ioctls which will be used by tune2fs.
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
---
Changes in v2:
- fix bugs that were detected using sparse
- remove the (unsafe) ability to clear certain compat features
- add the ability to set the encoding and encoding flags for case folding
- Link to v1: https://lore.kernel.org/r/20250908-tune2fs-v1-0-e3a6929f3355@mit.edu
---
Theodore Ts'o (3):
ext4: avoid potential buffer over-read in parse_apply_sb_mount_options()
ext4: add support for 32-bit default reserved uid and gid values
ext4: implemet new ioctls to set and get superblock parameters
fs/ext4/ext4.h | 16 +++-
fs/ext4/ioctl.c | 312 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
fs/ext4/super.c | 25 +++----
include/uapi/linux/ext4.h | 53 +++++++++++++
4 files changed, 382 insertions(+), 24 deletions(-)
---
base-commit: b320789d6883cc00ac78ce83bccbfe7ed58afcf0
change-id: 20250830-tune2fs-3376beb72403
Best regards,
--
Theodore Ts'o <tytso(a)mit.edu>
Running sha224_kunit on a KMSAN-enabled kernel results in a crash in
kmsan_internal_set_shadow_origin():
BUG: unable to handle page fault for address: ffffbc3840291000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 1810067 P4D 1810067 PUD 192d067 PMD 3c17067 PTE 0
Oops: 0000 [#1] SMP NOPTI
CPU: 0 UID: 0 PID: 81 Comm: kunit_try_catch Tainted: G N 6.17.0-rc3 #10 PREEMPT(voluntary)
Tainted: [N]=TEST
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
RIP: 0010:kmsan_internal_set_shadow_origin+0x91/0x100
[...]
Call Trace:
<TASK>
__msan_memset+0xee/0x1a0
sha224_final+0x9e/0x350
test_hash_buffer_overruns+0x46f/0x5f0
? kmsan_get_shadow_origin_ptr+0x46/0xa0
? __pfx_test_hash_buffer_overruns+0x10/0x10
kunit_try_run_case+0x198/0xa00
This occurs when memset() is called on a buffer that is not 4-byte
aligned and extends to the end of a guard page, i.e. the next page is
unmapped.
The bug is that the loop at the end of
kmsan_internal_set_shadow_origin() accesses the wrong shadow memory
bytes when the address is not 4-byte aligned. Since each 4 bytes are
associated with an origin, it rounds the address and size so that it can
access all the origins that contain the buffer. However, when it checks
the corresponding shadow bytes for a particular origin, it incorrectly
uses the original unrounded shadow address. This results in reads from
shadow memory beyond the end of the buffer's shadow memory, which
crashes when that memory is not mapped.
To fix this, correctly align the shadow address before accessing the 4
shadow bytes corresponding to each origin.
Fixes: 2ef3cec44c60 ("kmsan: do not wipe out origin when doing partial unpoisoning")
Cc: stable(a)vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)kernel.org>
---
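A worked example of the index arithmetic, as a userspace sketch rather than the kernel code. It assumes that a shadow address has the same low bits as the buffer address it describes and that size is at least 1; the function names are hypothetical. Each function returns the address of the last shadow byte the final loop would read.

```c
#include <stddef.h>
#include <stdint.h>

#define ORIGIN_SIZE 4u /* stands in for KMSAN_ORIGIN_SIZE */

/* Number of 4-byte origin slots covering [addr, addr + size). */
static size_t origin_slots(uintptr_t addr, size_t size)
{
	size_t pad = addr % ORIGIN_SIZE;

	return (size + pad + ORIGIN_SIZE - 1) / ORIGIN_SIZE;
}

/* Old behaviour: the loop indexes 4-byte words from the unaligned start. */
static uintptr_t last_shadow_byte_old(uintptr_t addr, size_t size)
{
	return addr + origin_slots(addr, size) * ORIGIN_SIZE - 1;
}

/* Fixed behaviour: the loop indexes from the start rounded down by pad. */
static uintptr_t last_shadow_byte_fixed(uintptr_t addr, size_t size)
{
	uintptr_t aligned = addr - addr % ORIGIN_SIZE;

	return aligned + origin_slots(addr, size) * ORIGIN_SIZE - 1;
}
```

For a 7-byte buffer at 0xff9 that ends exactly at a page boundary (0x1000), pad is 1 and two origin slots are needed. The old indexing reads up to 0x1000, one byte into the next page, while the fixed indexing stops at 0xfff.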
mm/kmsan/core.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/mm/kmsan/core.c b/mm/kmsan/core.c
index 1ea711786c522..8bca7fece47f0 100644
--- a/mm/kmsan/core.c
+++ b/mm/kmsan/core.c
@@ -193,11 +193,12 @@ depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id)
void kmsan_internal_set_shadow_origin(void *addr, size_t size, int b,
u32 origin, bool checked)
{
u64 address = (u64)addr;
- u32 *shadow_start, *origin_start;
+ void *shadow_start;
+ u32 *aligned_shadow, *origin_start;
size_t pad = 0;
KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(addr, size));
shadow_start = kmsan_get_metadata(addr, KMSAN_META_SHADOW);
if (!shadow_start) {
@@ -212,13 +213,16 @@ void kmsan_internal_set_shadow_origin(void *addr, size_t size, int b,
}
return;
}
__memset(shadow_start, b, size);
- if (!IS_ALIGNED(address, KMSAN_ORIGIN_SIZE)) {
+ if (IS_ALIGNED(address, KMSAN_ORIGIN_SIZE)) {
+ aligned_shadow = shadow_start;
+ } else {
pad = address % KMSAN_ORIGIN_SIZE;
address -= pad;
+ aligned_shadow = shadow_start - pad;
size += pad;
}
size = ALIGN(size, KMSAN_ORIGIN_SIZE);
origin_start =
(u32 *)kmsan_get_metadata((void *)address, KMSAN_META_ORIGIN);
@@ -228,11 +232,11 @@ void kmsan_internal_set_shadow_origin(void *addr, size_t size, int b,
* and unconditionally overwrite the old origin slot.
* If the new origin is zero, overwrite the old origin slot iff the
* corresponding shadow slot is zero.
*/
for (int i = 0; i < size / KMSAN_ORIGIN_SIZE; i++) {
- if (origin || !shadow_start[i])
+ if (origin || !aligned_shadow[i])
origin_start[i] = origin;
}
}
struct page *kmsan_vmalloc_to_page_or_null(void *vaddr)
base-commit: 1b237f190eb3d36f52dffe07a40b5eb210280e00
--
2.50.1
On Wed 17-09-25 11:18:50, Eric Hagberg wrote:
> I stumbled across a problem where the 6.6.103 kernel will fail when
> running the ioctl_loop06 test from the LTP test suite... and worse
> than failing the test, it leaves the system in a state where you can't
> run "losetup -a" again because the /dev/loopN device that the test
> created and failed the test on... hangs in a LOOP_GET_STATUS64 ioctl.
>
> It also leaves the system in a state where you can't re-kexec into a
> copy of the kernel as it gets completely hung at the point where it
> says "starting Reboot via kexec"...
Thanks for the report! Please report issues with stable kernels to
stable(a)vger.kernel.org (CCed now) because they can act on them.
> If I revert just that patch from 6.6.103 (or newer) kernels, then the
> test succeeds and doesn't leave the host in a bad state. The patch
> applied to 6.12 doesn't cause this problem, but I also see that there
> are quite a few other changes to the loop subsystem in 6.12 that never
> made it to 6.6.
>
> For now, I'll probably just revert your patch in my 6.6 kernel builds,
> but I wouldn't be surprised if others stumble across this issue as
> well, so maybe it should be reverted or fixed some other way.
Yes, I think a revert from the 6.6 stable kernel is warranted (unless
somebody has time to figure out what else is missing to make the patch
work with that stable branch).
Honza
--
Jan Kara <jack(a)suse.com>
SUSE Labs, CR