Currently pivot_root() doesn't work on the real rootfs because it
cannot be unmounted. Userspace has to do a recursive removal of the
initramfs contents manually before continuing the boot.
Really all we want from the real rootfs is to serve as the parent mount
for anything that is actually useful such as the tmpfs or ramfs for
initramfs unpacking or the rootfs itself. There's no need for the real
rootfs to actually be anything meaningful or useful. Add a immutable
rootfs called "nullfs" that can be selected via the "nullfs_rootfs"
kernel command line option.
The kernel will mount a tmpfs/ramfs on top of it, unpack the initramfs
and fire up userspace which mounts the rootfs and can then just do:
chdir(rootfs);
pivot_root(".", ".");
umount2(".", MNT_DETACH);
and be done with it. (Ofc, userspace can also choose to retain the
initramfs contents by using something like pivot_root(".", "/initramfs")
without unmounting it.)
Technically this also means that the rootfs mount in unprivileged
namespaces doesn't need to become MNT_LOCKED anymore as it's guaranteed
that the immutable rootfs remains permanently empty so there cannot be
anything revealed by unmounting the covering mount.
In the future this will also allow us to create completely empty mount
namespaces without risking to leak anything.
systemd already handles this all correctly as it tries to pivot_root()
first and falls back to MS_MOVE only when that fails.
This goes back to various discussion in previous years and a LPC 2024
presentation about this very topic.
Now in vfs-7.0.nullfs.
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
---
Changes in v2:
- Rename to "nullfs".
- Update documentation.
- Link to v1: https://patch.msgid.link/20260102-work-immutable-rootfs-v1-0-f2073b2d1602@k…
---
Christian Brauner (4):
fs: ensure that internal tmpfs mount gets mount id zero
fs: add init_pivot_root()
fs: add immutable rootfs
docs: mention nullfs
.../filesystems/ramfs-rootfs-initramfs.rst | 32 +++-
fs/Makefile | 2 +-
fs/init.c | 17 ++
fs/internal.h | 1 +
fs/mount.h | 1 +
fs/namespace.c | 181 ++++++++++++++-------
fs/nullfs.c | 70 ++++++++
include/linux/init_syscalls.h | 1 +
include/uapi/linux/magic.h | 1 +
init/do_mounts.c | 14 ++
init/do_mounts.h | 1 +
11 files changed, 254 insertions(+), 67 deletions(-)
---
base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
change-id: 20260102-work-immutable-rootfs-b5f23e0f5a27
The patch titled
Subject: mm/vmalloc: prevent RCU stalls in kasan_release_vmalloc_node
has been added to the -mm mm-new branch. Its filename is
mm-vmalloc-prevent-rcu-stalls-in-kasan_release_vmalloc_node.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
The mm-new branch of mm.git is not included in linux-next
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: Deepanshu Kartikey <kartikey406(a)gmail.com>
Subject: mm/vmalloc: prevent RCU stalls in kasan_release_vmalloc_node
Date: Mon, 12 Jan 2026 16:06:12 +0530
When CONFIG_PAGE_OWNER is enabled, freeing KASAN shadow pages during
vmalloc cleanup triggers expensive stack unwinding that acquires RCU read
locks. Processing a large purge_list without rescheduling can cause the
task to hold CPU for extended periods (10+ seconds), leading to RCU stalls
and potential OOM conditions.
The issue manifests in purge_vmap_node() -> kasan_release_vmalloc_node()
where iterating through hundreds or thousands of vmap_area entries and
freeing their associated shadow pages causes:
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P6229/1:b..l
...
task:kworker/0:17 state:R running task stack:28840 pid:6229
...
kasan_release_vmalloc_node+0x1ba/0xad0 mm/vmalloc.c:2299
purge_vmap_node+0x1ba/0xad0 mm/vmalloc.c:2299
Each call to kasan_release_vmalloc() can free many pages, and with
page_owner tracking, each free triggers save_stack() which performs stack
unwinding under RCU read lock. Without yielding, this creates an
unbounded RCU critical section.
Add periodic cond_resched() calls within the loop to allow:
- RCU grace periods to complete
- Other tasks to run
- Scheduler to preempt when needed
The fix uses need_resched() for immediate response under load, with a
batch count of 32 as a guaranteed upper bound to prevent worst-case stalls
even under light load.
Link: https://lkml.kernel.org/r/20260112103612.627247-1-kartikey406@gmail.com
Signed-off-by: Deepanshu Kartikey <kartikey406(a)gmail.com>
Reported-by: syzbot+d8d4c31d40f868eaea30(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d8d4c31d40f868eaea30
Link: https://lore.kernel.org/all/20260112084723.622910-1-kartikey406@gmail.com/T/ [v1]
Suggested-by: Uladzislau Rezki <urezki(a)gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmalloc.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/mm/vmalloc.c~mm-vmalloc-prevent-rcu-stalls-in-kasan_release_vmalloc_node
+++ a/mm/vmalloc.c
@@ -2273,11 +2273,14 @@ decay_va_pool_node(struct vmap_node *vn,
reclaim_list_global(&decay_list);
}
+#define KASAN_RELEASE_BATCH_SIZE 32
+
static void
kasan_release_vmalloc_node(struct vmap_node *vn)
{
struct vmap_area *va;
unsigned long start, end;
+ unsigned int batch_count = 0;
start = list_first_entry(&vn->purge_list, struct vmap_area, list)->va_start;
end = list_last_entry(&vn->purge_list, struct vmap_area, list)->va_end;
@@ -2287,6 +2290,11 @@ kasan_release_vmalloc_node(struct vmap_n
kasan_release_vmalloc(va->va_start, va->va_end,
va->va_start, va->va_end,
KASAN_VMALLOC_PAGE_RANGE);
+
+ if (need_resched() || (++batch_count >= KASAN_RELEASE_BATCH_SIZE)) {
+ cond_resched();
+ batch_count = 0;
+ }
}
kasan_release_vmalloc(start, end, start, end, KASAN_VMALLOC_TLB_FLUSH);
_
Patches currently in -mm which might be from kartikey406(a)gmail.com are
mm-swap_cgroup-fix-kernel-bug-in-swap_cgroup_record.patch
mm-vmalloc-prevent-rcu-stalls-in-kasan_release_vmalloc_node.patch
ocfs2-validate-i_refcount_loc-when-refcount-flag-is-set.patch
ocfs2-validate-inline-data-i_size-during-inode-read.patch
ocfs2-add-check-for-free-bits-before-allocation-in-ocfs2_move_extent.patch
Add two missing goto statements to exit ecryptfs_read_metadata() when an
error occurs.
The first goto is required; otherwise ECRYPTFS_METADATA_IN_XATTR may be
set when xattr metadata is enabled even though parsing the metadata
failed. The second goto is not strictly necessary, but it makes the
error path explicit instead of relying on falling through to 'out'.
Cc: stable(a)vger.kernel.org
Fixes: dd2a3b7ad98f ("[PATCH] eCryptfs: Generalize metadata read/write")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
---
fs/ecryptfs/crypto.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c
index 260f8a4938b0..d49cdf7292ab 100644
--- a/fs/ecryptfs/crypto.c
+++ b/fs/ecryptfs/crypto.c
@@ -1328,6 +1328,7 @@ int ecryptfs_read_metadata(struct dentry *ecryptfs_dentry)
"file xattr region either, inode %lu\n",
ecryptfs_inode->i_ino);
rc = -EINVAL;
+ goto out;
}
if (crypt_stat->mount_crypt_stat->flags
& ECRYPTFS_XATTR_METADATA_ENABLED) {
@@ -1340,6 +1341,7 @@ int ecryptfs_read_metadata(struct dentry *ecryptfs_dentry)
"this like an encrypted file, inode %lu\n",
ecryptfs_inode->i_ino);
rc = -EINVAL;
+ goto out;
}
}
out:
--
Thorsten Blum <thorsten.blum(a)linux.dev>
GPG: 1D60 735E 8AEF 3BE4 73B6 9D84 7336 78FD 8DFE EAD4
On Mon, Jan 12, 2026 at 12:23:45PM -0500, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> bcache: fix improper use of bi_end_io
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> bcache-fix-improper-use-of-bi_end_io.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Yeah, this is broken.
Coly, please revert this.
>
>
>
> commit 81e7e43a810e8f40e163928d441de02d2816b073
> Author: Shida Zhang <zhangshida(a)kylinos.cn>
> Date: Tue Dec 9 17:01:56 2025 +0800
>
> bcache: fix improper use of bi_end_io
>
> [ Upstream commit 53280e398471f0bddbb17b798a63d41264651325 ]
>
> Don't call bio->bi_end_io() directly. Use the bio_endio() helper
> function instead, which handles completion more safely and uniformly.
>
> Suggested-by: Christoph Hellwig <hch(a)infradead.org>
> Reviewed-by: Christoph Hellwig <hch(a)lst.de>
> Signed-off-by: Shida Zhang <zhangshida(a)kylinos.cn>
> Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index a9b1f3896249b..b4059d2daa326 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -1090,7 +1090,7 @@ static void detached_dev_end_io(struct bio *bio)
> }
>
> kfree(ddip);
> - bio->bi_end_io(bio);
> + bio_endio(bio);
> }
>
> static void detached_dev_do_request(struct bcache_device *d, struct bio *bio,
> @@ -1107,7 +1107,7 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio,
> ddip = kzalloc(sizeof(struct detached_dev_io_private), GFP_NOIO);
> if (!ddip) {
> bio->bi_status = BLK_STS_RESOURCE;
> - bio->bi_end_io(bio);
> + bio_endio(bio);
> return;
> }
>
> @@ -1122,7 +1122,7 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio,
>
> if ((bio_op(bio) == REQ_OP_DISCARD) &&
> !bdev_max_discard_sectors(dc->bdev))
> - bio->bi_end_io(bio);
> + detached_dev_end_io(bio);
> else
> submit_bio_noacct(bio);
> }
The patch below does not apply to the 6.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.18.y
git checkout FETCH_HEAD
git cherry-pick -x 1e876e5a0875e71e34148c9feb2eedd3bf6b2b43
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011202-scheming-operating-3cbb@gregkh' --subject-prefix 'PATCH 6.18.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e876e5a0875e71e34148c9feb2eedd3bf6b2b43 Mon Sep 17 00:00:00 2001
From: Abdun Nihaal <nihaal(a)cse.iitm.ac.in>
Date: Fri, 26 Dec 2025 11:34:10 +0530
Subject: [PATCH] gpio: mpsse: fix reference leak in gpio_mpsse_probe() error
paths
The reference obtained by calling usb_get_dev() is not released in the
gpio_mpsse_probe() error paths. Fix that by using device managed helper
functions. Also remove the usb_put_dev() call in the disconnect function
since now it will be released automatically.
Cc: stable(a)vger.kernel.org
Fixes: c46a74ff05c0 ("gpio: add support for FTDI's MPSSE as GPIO")
Signed-off-by: Abdun Nihaal <nihaal(a)cse.iitm.ac.in>
Link: https://lore.kernel.org/r/20251226060414.20785-1-nihaal@cse.iitm.ac.in
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)oss.qualcomm.com>
diff --git a/drivers/gpio/gpio-mpsse.c b/drivers/gpio/gpio-mpsse.c
index ace652ba4df1..12191aeb6566 100644
--- a/drivers/gpio/gpio-mpsse.c
+++ b/drivers/gpio/gpio-mpsse.c
@@ -548,6 +548,13 @@ static void gpio_mpsse_ida_remove(void *data)
ida_free(&gpio_mpsse_ida, priv->id);
}
+static void gpio_mpsse_usb_put_dev(void *data)
+{
+ struct mpsse_priv *priv = data;
+
+ usb_put_dev(priv->udev);
+}
+
static int mpsse_init_valid_mask(struct gpio_chip *chip,
unsigned long *valid_mask,
unsigned int ngpios)
@@ -592,6 +599,10 @@ static int gpio_mpsse_probe(struct usb_interface *interface,
INIT_LIST_HEAD(&priv->workers);
priv->udev = usb_get_dev(interface_to_usbdev(interface));
+ err = devm_add_action_or_reset(dev, gpio_mpsse_usb_put_dev, priv);
+ if (err)
+ return err;
+
priv->intf = interface;
priv->intf_id = interface->cur_altsetting->desc.bInterfaceNumber;
@@ -713,7 +724,6 @@ static void gpio_mpsse_disconnect(struct usb_interface *intf)
priv->intf = NULL;
usb_set_intfdata(intf, NULL);
- usb_put_dev(priv->udev);
}
static struct usb_driver gpio_mpsse_driver = {
On Mon, Jan 12, 2026 at 12:23:45PM -0500, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> bcache: fix improper use of bi_end_io
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> bcache-fix-improper-use-of-bi_end_io.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Has this code been tested?
bio_endio() is not equivalent to calling bi_end_io; if the code is
swapping out bi_end_io to pass it down the stack, then you have two
completions on the same bio - you cannot call bio_endio() twice on the
same bio.
>
>
>
> commit 81e7e43a810e8f40e163928d441de02d2816b073
> Author: Shida Zhang <zhangshida(a)kylinos.cn>
> Date: Tue Dec 9 17:01:56 2025 +0800
>
> bcache: fix improper use of bi_end_io
>
> [ Upstream commit 53280e398471f0bddbb17b798a63d41264651325 ]
>
> Don't call bio->bi_end_io() directly. Use the bio_endio() helper
> function instead, which handles completion more safely and uniformly.
>
> Suggested-by: Christoph Hellwig <hch(a)infradead.org>
> Reviewed-by: Christoph Hellwig <hch(a)lst.de>
> Signed-off-by: Shida Zhang <zhangshida(a)kylinos.cn>
> Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index a9b1f3896249b..b4059d2daa326 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -1090,7 +1090,7 @@ static void detached_dev_end_io(struct bio *bio)
> }
>
> kfree(ddip);
> - bio->bi_end_io(bio);
> + bio_endio(bio);
> }
>
> static void detached_dev_do_request(struct bcache_device *d, struct bio *bio,
> @@ -1107,7 +1107,7 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio,
> ddip = kzalloc(sizeof(struct detached_dev_io_private), GFP_NOIO);
> if (!ddip) {
> bio->bi_status = BLK_STS_RESOURCE;
> - bio->bi_end_io(bio);
> + bio_endio(bio);
> return;
> }
>
> @@ -1122,7 +1122,7 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio,
>
> if ((bio_op(bio) == REQ_OP_DISCARD) &&
> !bdev_max_discard_sectors(dc->bdev))
> - bio->bi_end_io(bio);
> + detached_dev_end_io(bio);
> else
> submit_bio_noacct(bio);
> }
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 830988b6cf197e6dcffdfe2008c5738e6c6c3c0f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011239-rockband-chewable-857b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 830988b6cf197e6dcffdfe2008c5738e6c6c3c0f Mon Sep 17 00:00:00 2001
From: Haoxiang Li <lihaoxiang(a)isrc.iscas.ac.cn>
Date: Sat, 20 Dec 2025 00:28:45 +0800
Subject: [PATCH] ALSA: ac97: fix a double free in
snd_ac97_controller_register()
If ac97_add_adapter() fails, put_device() is the correct way to drop
the device reference. kfree() is not required.
Add kfree() if idr_alloc() fails and in ac97_adapter_release() to do
the cleanup.
Found by code review.
Fixes: 74426fbff66e ("ALSA: ac97: add an ac97 bus")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haoxiang Li <lihaoxiang(a)isrc.iscas.ac.cn>
Link: https://patch.msgid.link/20251219162845.657525-1-lihaoxiang@isrc.iscas.ac.cn
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/ac97/bus.c b/sound/ac97/bus.c
index f4254703d29f..bb9b795e0226 100644
--- a/sound/ac97/bus.c
+++ b/sound/ac97/bus.c
@@ -298,6 +298,7 @@ static void ac97_adapter_release(struct device *dev)
idr_remove(&ac97_adapter_idr, ac97_ctrl->nr);
dev_dbg(&ac97_ctrl->adap, "adapter unregistered by %s\n",
dev_name(ac97_ctrl->parent));
+ kfree(ac97_ctrl);
}
static const struct device_type ac97_adapter_type = {
@@ -319,7 +320,9 @@ static int ac97_add_adapter(struct ac97_controller *ac97_ctrl)
ret = device_register(&ac97_ctrl->adap);
if (ret)
put_device(&ac97_ctrl->adap);
- }
+ } else
+ kfree(ac97_ctrl);
+
if (!ret) {
list_add(&ac97_ctrl->controllers, &ac97_controllers);
dev_dbg(&ac97_ctrl->adap, "adapter registered by %s\n",
@@ -361,14 +364,11 @@ struct ac97_controller *snd_ac97_controller_register(
ret = ac97_add_adapter(ac97_ctrl);
if (ret)
- goto err;
+ return ERR_PTR(ret);
ac97_bus_reset(ac97_ctrl);
ac97_bus_scan(ac97_ctrl);
return ac97_ctrl;
-err:
- kfree(ac97_ctrl);
- return ERR_PTR(ret);
}
EXPORT_SYMBOL_GPL(snd_ac97_controller_register);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 23f9485510c338476b9735d516c1d4aacb810d46
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011244-unbaked-pajamas-9c74@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 23f9485510c338476b9735d516c1d4aacb810d46 Mon Sep 17 00:00:00 2001
From: Alexander Sverdlin <alexander.sverdlin(a)gmail.com>
Date: Tue, 18 Nov 2025 09:35:48 +0100
Subject: [PATCH] counter: interrupt-cnt: Drop IRQF_NO_THREAD flag
An IRQ handler can either be IRQF_NO_THREAD or acquire spinlock_t, as
CONFIG_PROVE_RAW_LOCK_NESTING warns:
=============================
[ BUG: Invalid wait context ]
6.18.0-rc1+git... #1
-----------------------------
some-user-space-process/1251 is trying to lock:
(&counter->events_list_lock){....}-{3:3}, at: counter_push_event [counter]
other info that might help us debug this:
context-{2:2}
no locks held by some-user-space-process/....
stack backtrace:
CPU: 0 UID: 0 PID: 1251 Comm: some-user-space-process 6.18.0-rc1+git... #1 PREEMPT
Call trace:
show_stack (C)
dump_stack_lvl
dump_stack
__lock_acquire
lock_acquire
_raw_spin_lock_irqsave
counter_push_event [counter]
interrupt_cnt_isr [interrupt_cnt]
__handle_irq_event_percpu
handle_irq_event
handle_simple_irq
handle_irq_desc
generic_handle_domain_irq
gpio_irq_handler
handle_irq_desc
generic_handle_domain_irq
gic_handle_irq
call_on_irq_stack
do_interrupt_handler
el0_interrupt
__el0_irq_handler_common
el0t_64_irq_handler
el0t_64_irq
... and Sebastian correctly points out. Remove IRQF_NO_THREAD as an
alternative to switching to raw_spinlock_t, because the latter would limit
all potential nested locks to raw_spinlock_t only.
Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20251117151314.xwLAZrWY@linutronix.de/
Fixes: a55ebd47f21f ("counter: add IRQ or GPIO based counter")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Reviewed-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
Link: https://lore.kernel.org/r/20251118083603.778626-1-alexander.sverdlin@siemen…
Signed-off-by: William Breathitt Gray <wbg(a)kernel.org>
diff --git a/drivers/counter/interrupt-cnt.c b/drivers/counter/interrupt-cnt.c
index 6c0c1d2d7027..e6100b5fb082 100644
--- a/drivers/counter/interrupt-cnt.c
+++ b/drivers/counter/interrupt-cnt.c
@@ -229,8 +229,7 @@ static int interrupt_cnt_probe(struct platform_device *pdev)
irq_set_status_flags(priv->irq, IRQ_NOAUTOEN);
ret = devm_request_irq(dev, priv->irq, interrupt_cnt_isr,
- IRQF_TRIGGER_RISING | IRQF_NO_THREAD,
- dev_name(dev), counter);
+ IRQF_TRIGGER_RISING, dev_name(dev), counter);
if (ret)
return ret;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 830988b6cf197e6dcffdfe2008c5738e6c6c3c0f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011239-sustained-underpass-b703@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 830988b6cf197e6dcffdfe2008c5738e6c6c3c0f Mon Sep 17 00:00:00 2001
From: Haoxiang Li <lihaoxiang(a)isrc.iscas.ac.cn>
Date: Sat, 20 Dec 2025 00:28:45 +0800
Subject: [PATCH] ALSA: ac97: fix a double free in
snd_ac97_controller_register()
If ac97_add_adapter() fails, put_device() is the correct way to drop
the device reference. kfree() is not required.
Add kfree() if idr_alloc() fails and in ac97_adapter_release() to do
the cleanup.
Found by code review.
Fixes: 74426fbff66e ("ALSA: ac97: add an ac97 bus")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haoxiang Li <lihaoxiang(a)isrc.iscas.ac.cn>
Link: https://patch.msgid.link/20251219162845.657525-1-lihaoxiang@isrc.iscas.ac.cn
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/ac97/bus.c b/sound/ac97/bus.c
index f4254703d29f..bb9b795e0226 100644
--- a/sound/ac97/bus.c
+++ b/sound/ac97/bus.c
@@ -298,6 +298,7 @@ static void ac97_adapter_release(struct device *dev)
idr_remove(&ac97_adapter_idr, ac97_ctrl->nr);
dev_dbg(&ac97_ctrl->adap, "adapter unregistered by %s\n",
dev_name(ac97_ctrl->parent));
+ kfree(ac97_ctrl);
}
static const struct device_type ac97_adapter_type = {
@@ -319,7 +320,9 @@ static int ac97_add_adapter(struct ac97_controller *ac97_ctrl)
ret = device_register(&ac97_ctrl->adap);
if (ret)
put_device(&ac97_ctrl->adap);
- }
+ } else
+ kfree(ac97_ctrl);
+
if (!ret) {
list_add(&ac97_ctrl->controllers, &ac97_controllers);
dev_dbg(&ac97_ctrl->adap, "adapter registered by %s\n",
@@ -361,14 +364,11 @@ struct ac97_controller *snd_ac97_controller_register(
ret = ac97_add_adapter(ac97_ctrl);
if (ret)
- goto err;
+ return ERR_PTR(ret);
ac97_bus_reset(ac97_ctrl);
ac97_bus_scan(ac97_ctrl);
return ac97_ctrl;
-err:
- kfree(ac97_ctrl);
- return ERR_PTR(ret);
}
EXPORT_SYMBOL_GPL(snd_ac97_controller_register);
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 830988b6cf197e6dcffdfe2008c5738e6c6c3c0f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011238-subduing-rural-1bbc@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 830988b6cf197e6dcffdfe2008c5738e6c6c3c0f Mon Sep 17 00:00:00 2001
From: Haoxiang Li <lihaoxiang(a)isrc.iscas.ac.cn>
Date: Sat, 20 Dec 2025 00:28:45 +0800
Subject: [PATCH] ALSA: ac97: fix a double free in
snd_ac97_controller_register()
If ac97_add_adapter() fails, put_device() is the correct way to drop
the device reference. kfree() is not required.
Add kfree() if idr_alloc() fails and in ac97_adapter_release() to do
the cleanup.
Found by code review.
Fixes: 74426fbff66e ("ALSA: ac97: add an ac97 bus")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haoxiang Li <lihaoxiang(a)isrc.iscas.ac.cn>
Link: https://patch.msgid.link/20251219162845.657525-1-lihaoxiang@isrc.iscas.ac.cn
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/ac97/bus.c b/sound/ac97/bus.c
index f4254703d29f..bb9b795e0226 100644
--- a/sound/ac97/bus.c
+++ b/sound/ac97/bus.c
@@ -298,6 +298,7 @@ static void ac97_adapter_release(struct device *dev)
idr_remove(&ac97_adapter_idr, ac97_ctrl->nr);
dev_dbg(&ac97_ctrl->adap, "adapter unregistered by %s\n",
dev_name(ac97_ctrl->parent));
+ kfree(ac97_ctrl);
}
static const struct device_type ac97_adapter_type = {
@@ -319,7 +320,9 @@ static int ac97_add_adapter(struct ac97_controller *ac97_ctrl)
ret = device_register(&ac97_ctrl->adap);
if (ret)
put_device(&ac97_ctrl->adap);
- }
+ } else
+ kfree(ac97_ctrl);
+
if (!ret) {
list_add(&ac97_ctrl->controllers, &ac97_controllers);
dev_dbg(&ac97_ctrl->adap, "adapter registered by %s\n",
@@ -361,14 +364,11 @@ struct ac97_controller *snd_ac97_controller_register(
ret = ac97_add_adapter(ac97_ctrl);
if (ret)
- goto err;
+ return ERR_PTR(ret);
ac97_bus_reset(ac97_ctrl);
ac97_bus_scan(ac97_ctrl);
return ac97_ctrl;
-err:
- kfree(ac97_ctrl);
- return ERR_PTR(ret);
}
EXPORT_SYMBOL_GPL(snd_ac97_controller_register);
Commit 53326135d0e0 ("i2c: riic: Add suspend/resume support") added
suspend support for the Renesas I2C driver and following this change
on RZ/G3E the following WARNING is seen on entering suspend ...
[ 134.275704] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 134.285536] ------------[ cut here ]------------
[ 134.290298] i2c i2c-2: Transfer while suspended
[ 134.295174] WARNING: drivers/i2c/i2c-core.h:56 at __i2c_smbus_xfer+0x1e4/0x214, CPU#0: systemd-sleep/388
[ 134.365507] Tainted: [W]=WARN
[ 134.368485] Hardware name: Renesas SMARC EVK version 2 based on r9a09g047e57 (DT)
[ 134.375961] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 134.382935] pc : __i2c_smbus_xfer+0x1e4/0x214
[ 134.387329] lr : __i2c_smbus_xfer+0x1e4/0x214
[ 134.391717] sp : ffff800083f23860
[ 134.395040] x29: ffff800083f23860 x28: 0000000000000000 x27: ffff800082ed5d60
[ 134.402226] x26: 0000001f4395fd74 x25: 0000000000000007 x24: 0000000000000001
[ 134.409408] x23: 0000000000000000 x22: 000000000000006f x21: ffff800083f23936
[ 134.416589] x20: ffff0000c090e140 x19: ffff0000c090e0d0 x18: 0000000000000006
[ 134.423771] x17: 6f63657320313030 x16: 2e30206465737061 x15: ffff800083f23280
[ 134.430953] x14: 0000000000000000 x13: ffff800082b16ce8 x12: 0000000000000f09
[ 134.438134] x11: 0000000000000503 x10: ffff800082b6ece8 x9 : ffff800082b16ce8
[ 134.445315] x8 : 00000000ffffefff x7 : ffff800082b6ece8 x6 : 80000000fffff000
[ 134.452495] x5 : 0000000000000504 x4 : 0000000000000000 x3 : 0000000000000000
[ 134.459672] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c9ee9e80
[ 134.466851] Call trace:
[ 134.469311] __i2c_smbus_xfer+0x1e4/0x214 (P)
[ 134.473715] i2c_smbus_xfer+0xbc/0x120
[ 134.477507] i2c_smbus_read_byte_data+0x4c/0x84
[ 134.482077] isl1208_i2c_read_time+0x44/0x178 [rtc_isl1208]
[ 134.487703] isl1208_rtc_read_time+0x14/0x20 [rtc_isl1208]
[ 134.493226] __rtc_read_time+0x44/0x88
[ 134.497012] rtc_read_time+0x3c/0x68
[ 134.500622] rtc_suspend+0x9c/0x170
The warning is triggered because I2C transfers can still be attempted
while the controller is already suspended, due to inappropriate ordering
of the system sleep callbacks.
If the controller is autosuspended, there is no way to wake it up once
runtime PM disabled (in suspend_late()). During system resume, the I2C
controller will be available only after runtime PM is re-enabled
(in resume_early()). However, this may be too late for some devices.
Wake up the controller in the suspend() callback while runtime PM is
still enabled. The I2C controller will remain available until the
suspend_noirq() callback (pm_runtime_force_suspend()) is called. During
resume, the I2C controller can be restored by the resume_noirq() callback
(pm_runtime_force_resume()). Finally, the resume() callback re-enables
autosuspend. As a result, the I2C controller can remain available until
the system enters suspend_noirq() and from resume_noirq().
Cc: stable(a)vger.kernel.org
Fixes: 53326135d0e0 ("i2c: riic: Add suspend/resume support")
Signed-off-by: Tommaso Merciai <tommaso.merciai.xr(a)bp.renesas.com>
---
v1->v2:
- Taking as reference commit:
4262df2a69c3 ("i2c: imx-lpi2c: make controller available until the system
enters suspend_noirq() and from resume_noirq().") reworked the patch with
a similar approach. Updated commit body accordingly.
drivers/i2c/busses/i2c-riic.c | 46 +++++++++++++++++++++++++++++------
1 file changed, 39 insertions(+), 7 deletions(-)
diff --git a/drivers/i2c/busses/i2c-riic.c b/drivers/i2c/busses/i2c-riic.c
index 3e8f126cb7f7..9e3595b3623e 100644
--- a/drivers/i2c/busses/i2c-riic.c
+++ b/drivers/i2c/busses/i2c-riic.c
@@ -670,12 +670,39 @@ static const struct riic_of_data riic_rz_t2h_info = {
static int riic_i2c_suspend(struct device *dev)
{
- struct riic_dev *riic = dev_get_drvdata(dev);
- int ret;
+ /*
+ * Some I2C devices may need the I2C controller to remain active
+ * during resume_noirq() or suspend_noirq(). If the controller is
+ * autosuspended, there is no way to wake it up once runtime PM is
+ * disabled (in suspend_late()).
+ *
+ * During system resume, the I2C controller will be available only
+ * after runtime PM is re-enabled (in resume_early()). However, this
+ * may be too late for some devices.
+ *
+ * Wake up the controller in the suspend() callback while runtime PM
+ * is still enabled. The I2C controller will remain available until
+ * the suspend_noirq() callback (pm_runtime_force_suspend()) is
+ * called. During resume, the I2C controller can be restored by the
+ * resume_noirq() callback (pm_runtime_force_resume()).
+ *
+ * Finally, the resume() callback re-enables autosuspend, ensuring
+ * the I2C controller remains available until the system enters
+ * suspend_noirq() and from resume_noirq().
+ */
+ return pm_runtime_resume_and_get(dev);
+}
- ret = pm_runtime_resume_and_get(dev);
- if (ret)
- return ret;
+static int riic_i2c_resume(struct device *dev)
+{
+ pm_runtime_put_autosuspend(dev);
+
+ return 0;
+}
+
+static int riic_i2c_suspend_noirq(struct device *dev)
+{
+ struct riic_dev *riic = dev_get_drvdata(dev);
i2c_mark_adapter_suspended(&riic->adapter);
@@ -683,12 +710,12 @@ static int riic_i2c_suspend(struct device *dev)
riic_clear_set_bit(riic, ICCR1_ICE, 0, RIIC_ICCR1);
pm_runtime_mark_last_busy(dev);
- pm_runtime_put_sync(dev);
+ pm_runtime_force_suspend(dev);
return reset_control_assert(riic->rstc);
}
-static int riic_i2c_resume(struct device *dev)
+static int riic_i2c_resume_noirq(struct device *dev)
{
struct riic_dev *riic = dev_get_drvdata(dev);
int ret;
@@ -697,6 +724,10 @@ static int riic_i2c_resume(struct device *dev)
if (ret)
return ret;
+ ret = pm_runtime_force_resume(dev);
+ if (ret)
+ return ret;
+
ret = riic_init_hw(riic);
if (ret) {
/*
@@ -714,6 +745,7 @@ static int riic_i2c_resume(struct device *dev)
}
static const struct dev_pm_ops riic_i2c_pm_ops = {
+ NOIRQ_SYSTEM_SLEEP_PM_OPS(riic_i2c_suspend_noirq, riic_i2c_resume_noirq)
SYSTEM_SLEEP_PM_OPS(riic_i2c_suspend, riic_i2c_resume)
};
--
2.43.0
Some of the hardware registers of the DMM-32-AT board are multiplexed,
using the least significant two bits of the Miscellaneous Control
register to select the function of registers at offsets 12 to 15:
00 => 8254 timer/counter registers are accessible
01 => 8255 digital I/O registers are accessible
10 => Reserved
11 => Calibration registers are accessible
The interrupt service routine (`dmm32at_isr()`) clobbers the bottom two
bits of the register with value 00, which would interfere with access to
the 8255 registers by the `dm32at_8255_io()` function (used for Comedi
instruction handling on the digital I/O subdevice).
Make use of the generic Comedi device spin-lock `dev->spinlock` (which
is otherwise unused by this driver) to serialize access to the
miscellaneous control register and paged registers.
Fixes: 3c501880ac44 ("Staging: comedi: add dmm32at driver")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
---
drivers/comedi/drivers/dmm32at.c | 32 ++++++++++++++++++++++++++++++--
1 file changed, 30 insertions(+), 2 deletions(-)
diff --git a/drivers/comedi/drivers/dmm32at.c b/drivers/comedi/drivers/dmm32at.c
index 644e3b643c79..910cd24b1bed 100644
--- a/drivers/comedi/drivers/dmm32at.c
+++ b/drivers/comedi/drivers/dmm32at.c
@@ -330,6 +330,7 @@ static int dmm32at_ai_cmdtest(struct comedi_device *dev,
static void dmm32at_setaitimer(struct comedi_device *dev, unsigned int nansec)
{
+ unsigned long irq_flags;
unsigned char lo1, lo2, hi2;
unsigned short both2;
@@ -342,6 +343,9 @@ static void dmm32at_setaitimer(struct comedi_device *dev, unsigned int nansec)
/* set counter clocks to 10MHz, disable all aux dio */
outb(0, dev->iobase + DMM32AT_CTRDIO_CFG_REG);
+ /* serialize access to control register and paged registers */
+ spin_lock_irqsave(&dev->spinlock, irq_flags);
+
/* get access to the clock regs */
outb(DMM32AT_CTRL_PAGE_8254, dev->iobase + DMM32AT_CTRL_REG);
@@ -354,6 +358,8 @@ static void dmm32at_setaitimer(struct comedi_device *dev, unsigned int nansec)
outb(lo2, dev->iobase + DMM32AT_CLK2);
outb(hi2, dev->iobase + DMM32AT_CLK2);
+ spin_unlock_irqrestore(&dev->spinlock, irq_flags);
+
/* enable the ai conversion interrupt and the clock to start scans */
outb(DMM32AT_INTCLK_ADINT |
DMM32AT_INTCLK_CLKEN | DMM32AT_INTCLK_CLKSEL,
@@ -363,13 +369,19 @@ static void dmm32at_setaitimer(struct comedi_device *dev, unsigned int nansec)
static int dmm32at_ai_cmd(struct comedi_device *dev, struct comedi_subdevice *s)
{
struct comedi_cmd *cmd = &s->async->cmd;
+ unsigned long irq_flags;
int ret;
dmm32at_ai_set_chanspec(dev, s, cmd->chanlist[0], cmd->chanlist_len);
+ /* serialize access to control register and paged registers */
+ spin_lock_irqsave(&dev->spinlock, irq_flags);
+
/* reset the interrupt just in case */
outb(DMM32AT_CTRL_INTRST, dev->iobase + DMM32AT_CTRL_REG);
+ spin_unlock_irqrestore(&dev->spinlock, irq_flags);
+
/*
* wait for circuit to settle
* we don't have the 'insn' here but it's not needed
@@ -429,8 +441,13 @@ static irqreturn_t dmm32at_isr(int irq, void *d)
comedi_handle_events(dev, s);
}
+ /* serialize access to control register and paged registers */
+ spin_lock(&dev->spinlock);
+
/* reset the interrupt */
outb(DMM32AT_CTRL_INTRST, dev->iobase + DMM32AT_CTRL_REG);
+
+ spin_unlock(&dev->spinlock);
return IRQ_HANDLED;
}
@@ -481,14 +498,25 @@ static int dmm32at_ao_insn_write(struct comedi_device *dev,
static int dmm32at_8255_io(struct comedi_device *dev,
int dir, int port, int data, unsigned long regbase)
{
+ unsigned long irq_flags;
+ int ret;
+
+ /* serialize access to control register and paged registers */
+ spin_lock_irqsave(&dev->spinlock, irq_flags);
+
/* get access to the DIO regs */
outb(DMM32AT_CTRL_PAGE_8255, dev->iobase + DMM32AT_CTRL_REG);
if (dir) {
outb(data, dev->iobase + regbase + port);
- return 0;
+ ret = 0;
+ } else {
+ ret = inb(dev->iobase + regbase + port);
}
- return inb(dev->iobase + regbase + port);
+
+ spin_unlock_irqrestore(&dev->spinlock, irq_flags);
+
+ return ret;
}
/* Make sure the board is there and put it to a known state */
--
2.51.0
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 2857bd59feb63fcf40fe4baf55401baea6b4feb4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011224-thesaurus-unmixed-7797@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2857bd59feb63fcf40fe4baf55401baea6b4feb4 Mon Sep 17 00:00:00 2001
From: NeilBrown <neil(a)brown.name>
Date: Sat, 13 Dec 2025 13:41:59 -0500
Subject: [PATCH] nfsd: provide locking for v4_end_grace
Writing to v4_end_grace can race with server shutdown and result in
memory being accessed after it was freed - reclaim_str_hashtbl in
particularly.
We cannot hold nfsd_mutex across the nfsd4_end_grace() call as that is
held while client_tracking_op->init() is called and that can wait for
an upcall to nfsdcltrack which can write to v4_end_grace, resulting in a
deadlock.
nfsd4_end_grace() is also called by the landromat work queue and this
doesn't require locking as server shutdown will stop the work and wait
for it before freeing anything that nfsd4_end_grace() might access.
However, we must be sure that writing to v4_end_grace doesn't restart
the work item after shutdown has already waited for it. For this we
add a new flag protected with nn->client_lock. It is set only while it
is safe to make client tracking calls, and v4_end_grace only schedules
work while the flag is set with the spinlock held.
So this patch adds a nfsd_net field "client_tracking_active" which is
set as described. Another field "grace_end_forced", is set when
v4_end_grace is written. After this is set, and providing
client_tracking_active is set, the laundromat is scheduled.
This "grace_end_forced" field bypasses other checks for whether the
grace period has finished.
This resolves a race which can result in use-after-free.
Reported-by: Li Lingfeng <lilingfeng3(a)huawei.com>
Closes: https://lore.kernel.org/linux-nfs/20250623030015.2353515-1-neil@brown.name/…
Fixes: 7f5ef2e900d9 ("nfsd: add a v4_end_grace file to /proc/fs/nfsd")
Cc: stable(a)vger.kernel.org
Signed-off-by: NeilBrown <neil(a)brown.name>
Tested-by: Li Lingfeng <lilingfeng3(a)huawei.com>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 3e2d0fde80a7..fe8338735e7c 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -66,6 +66,8 @@ struct nfsd_net {
struct lock_manager nfsd4_manager;
bool grace_ended;
+ bool grace_end_forced;
+ bool client_tracking_active;
time64_t boot_time;
struct dentry *nfsd_client_dir;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 5b83cb33bf83..a1dccce8b99c 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -84,7 +84,7 @@ static u64 current_sessionid = 1;
/* forward declarations */
static bool check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner);
static void nfs4_free_ol_stateid(struct nfs4_stid *stid);
-void nfsd4_end_grace(struct nfsd_net *nn);
+static void nfsd4_end_grace(struct nfsd_net *nn);
static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps);
static void nfsd4_file_hash_remove(struct nfs4_file *fi);
static void deleg_reaper(struct nfsd_net *nn);
@@ -6570,7 +6570,7 @@ nfsd4_renew(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
return nfs_ok;
}
-void
+static void
nfsd4_end_grace(struct nfsd_net *nn)
{
/* do nothing if grace period already ended */
@@ -6603,6 +6603,33 @@ nfsd4_end_grace(struct nfsd_net *nn)
*/
}
+/**
+ * nfsd4_force_end_grace - forcibly end the NFSv4 grace period
+ * @nn: network namespace for the server instance to be updated
+ *
+ * Forces bypass of normal grace period completion, then schedules
+ * the laundromat to end the grace period immediately. Does not wait
+ * for the grace period to fully terminate before returning.
+ *
+ * Return values:
+ * %true: Grace termination schedule
+ * %false: No action was taken
+ */
+bool nfsd4_force_end_grace(struct nfsd_net *nn)
+{
+ if (!nn->client_tracking_ops)
+ return false;
+ spin_lock(&nn->client_lock);
+ if (nn->grace_ended || !nn->client_tracking_active) {
+ spin_unlock(&nn->client_lock);
+ return false;
+ }
+ WRITE_ONCE(nn->grace_end_forced, true);
+ mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
+ spin_unlock(&nn->client_lock);
+ return true;
+}
+
/*
* If we've waited a lease period but there are still clients trying to
* reclaim, wait a little longer to give them a chance to finish.
@@ -6612,6 +6639,8 @@ static bool clients_still_reclaiming(struct nfsd_net *nn)
time64_t double_grace_period_end = nn->boot_time +
2 * nn->nfsd4_lease;
+ if (READ_ONCE(nn->grace_end_forced))
+ return false;
if (nn->track_reclaim_completes &&
atomic_read(&nn->nr_reclaim_complete) ==
nn->reclaim_str_hashtbl_size)
@@ -8931,6 +8960,8 @@ static int nfs4_state_create_net(struct net *net)
nn->unconf_name_tree = RB_ROOT;
nn->boot_time = ktime_get_real_seconds();
nn->grace_ended = false;
+ nn->grace_end_forced = false;
+ nn->client_tracking_active = false;
nn->nfsd4_manager.block_opens = true;
INIT_LIST_HEAD(&nn->nfsd4_manager.list);
INIT_LIST_HEAD(&nn->client_lru);
@@ -9011,6 +9042,10 @@ nfs4_state_start_net(struct net *net)
return ret;
locks_start_grace(net, &nn->nfsd4_manager);
nfsd4_client_tracking_init(net);
+ /* safe for laundromat to run now */
+ spin_lock(&nn->client_lock);
+ nn->client_tracking_active = true;
+ spin_unlock(&nn->client_lock);
if (nn->track_reclaim_completes && nn->reclaim_str_hashtbl_size == 0)
goto skip_grace;
printk(KERN_INFO "NFSD: starting %lld-second grace period (net %x)\n",
@@ -9059,6 +9094,9 @@ nfs4_state_shutdown_net(struct net *net)
shrinker_free(nn->nfsd_client_shrinker);
cancel_work_sync(&nn->nfsd_shrinker_work);
+ spin_lock(&nn->client_lock);
+ nn->client_tracking_active = false;
+ spin_unlock(&nn->client_lock);
cancel_delayed_work_sync(&nn->laundromat_work);
locks_end_grace(&nn->nfsd4_manager);
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 2b79129703d5..36ce3ca97d97 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1082,10 +1082,9 @@ static ssize_t write_v4_end_grace(struct file *file, char *buf, size_t size)
case 'Y':
case 'y':
case '1':
- if (!nn->nfsd_serv)
+ if (!nfsd4_force_end_grace(nn))
return -EBUSY;
trace_nfsd_end_grace(netns(file));
- nfsd4_end_grace(nn);
break;
default:
return -EINVAL;
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 1e736f402426..50d2b2963390 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -849,7 +849,7 @@ static inline void nfsd4_revoke_states(struct net *net, struct super_block *sb)
#endif
/* grace period management */
-void nfsd4_end_grace(struct nfsd_net *nn);
+bool nfsd4_force_end_grace(struct nfsd_net *nn);
/* nfs4recover operations */
extern int nfsd4_client_tracking_init(struct net *net);
The arm64 kernel doesn't boot with annotated branches
(PROFILE_ANNOTATED_BRANCHES) enabled and CONFIG_DEBUG_VIRTUAL together.
Bisecting it, I found that disabling branch profiling in arch/arm64/mm
solved the problem. Narrowing down a bit further, I found that
physaddr.c is the file that needs to have branch profiling disabled to
get the machine to boot.
I suspect that it might invoke some ftrace helper very early in the boot
process and ftrace is still not enabled(!?).
Disable branch profiling for physaddr.o to allow booting an arm64
machine with CONFIG_PROFILE_ANNOTATED_BRANCHES and
CONFIG_DEBUG_VIRTUAL together.
Cc: stable(a)vger.kernel.org
Fixes: ec6d06efb0bac ("arm64: Add support for CONFIG_DEBUG_VIRTUAL")
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Another approach is to disable profiling on all arch/arm64 code, similarly to
x86, where DISABLE_BRANCH_PROFILING is called for all arch/x86 code. See
commit 2cbb20b008dba ("tracing: Disable branch profiling in noinstr
code").
---
arch/arm64/mm/Makefile | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index c26489cf96cd..8bfe2451ea26 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -14,5 +14,10 @@ obj-$(CONFIG_ARM64_MTE) += mteswap.o
obj-$(CONFIG_ARM64_GCS) += gcs.o
KASAN_SANITIZE_physaddr.o += n
+# Branch profiling isn't noinstr-safe
+ifdef CONFIG_TRACE_BRANCH_PROFILING
+CFLAGS_physaddr.o += -DDISABLE_BRANCH_PROFILING
+endif
+
obj-$(CONFIG_KASAN) += kasan_init.o
KASAN_SANITIZE_kasan_init.o := n
---
base-commit: c8ebd433459bcbf068682b09544e830acd7ed222
change-id: 20251231-annotated-75de3f33cd7b
Best regards,
--
Breno Leitao <leitao(a)debian.org>
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 2857bd59feb63fcf40fe4baf55401baea6b4feb4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011223-account-preteen-f696@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2857bd59feb63fcf40fe4baf55401baea6b4feb4 Mon Sep 17 00:00:00 2001
From: NeilBrown <neil(a)brown.name>
Date: Sat, 13 Dec 2025 13:41:59 -0500
Subject: [PATCH] nfsd: provide locking for v4_end_grace
Writing to v4_end_grace can race with server shutdown and result in
memory being accessed after it was freed - reclaim_str_hashtbl in
particularly.
We cannot hold nfsd_mutex across the nfsd4_end_grace() call as that is
held while client_tracking_op->init() is called and that can wait for
an upcall to nfsdcltrack which can write to v4_end_grace, resulting in a
deadlock.
nfsd4_end_grace() is also called by the landromat work queue and this
doesn't require locking as server shutdown will stop the work and wait
for it before freeing anything that nfsd4_end_grace() might access.
However, we must be sure that writing to v4_end_grace doesn't restart
the work item after shutdown has already waited for it. For this we
add a new flag protected with nn->client_lock. It is set only while it
is safe to make client tracking calls, and v4_end_grace only schedules
work while the flag is set with the spinlock held.
So this patch adds a nfsd_net field "client_tracking_active" which is
set as described. Another field "grace_end_forced", is set when
v4_end_grace is written. After this is set, and providing
client_tracking_active is set, the laundromat is scheduled.
This "grace_end_forced" field bypasses other checks for whether the
grace period has finished.
This resolves a race which can result in use-after-free.
Reported-by: Li Lingfeng <lilingfeng3(a)huawei.com>
Closes: https://lore.kernel.org/linux-nfs/20250623030015.2353515-1-neil@brown.name/…
Fixes: 7f5ef2e900d9 ("nfsd: add a v4_end_grace file to /proc/fs/nfsd")
Cc: stable(a)vger.kernel.org
Signed-off-by: NeilBrown <neil(a)brown.name>
Tested-by: Li Lingfeng <lilingfeng3(a)huawei.com>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 3e2d0fde80a7..fe8338735e7c 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -66,6 +66,8 @@ struct nfsd_net {
struct lock_manager nfsd4_manager;
bool grace_ended;
+ bool grace_end_forced;
+ bool client_tracking_active;
time64_t boot_time;
struct dentry *nfsd_client_dir;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 5b83cb33bf83..a1dccce8b99c 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -84,7 +84,7 @@ static u64 current_sessionid = 1;
/* forward declarations */
static bool check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner);
static void nfs4_free_ol_stateid(struct nfs4_stid *stid);
-void nfsd4_end_grace(struct nfsd_net *nn);
+static void nfsd4_end_grace(struct nfsd_net *nn);
static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps);
static void nfsd4_file_hash_remove(struct nfs4_file *fi);
static void deleg_reaper(struct nfsd_net *nn);
@@ -6570,7 +6570,7 @@ nfsd4_renew(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
return nfs_ok;
}
-void
+static void
nfsd4_end_grace(struct nfsd_net *nn)
{
/* do nothing if grace period already ended */
@@ -6603,6 +6603,33 @@ nfsd4_end_grace(struct nfsd_net *nn)
*/
}
+/**
+ * nfsd4_force_end_grace - forcibly end the NFSv4 grace period
+ * @nn: network namespace for the server instance to be updated
+ *
+ * Forces bypass of normal grace period completion, then schedules
+ * the laundromat to end the grace period immediately. Does not wait
+ * for the grace period to fully terminate before returning.
+ *
+ * Return values:
+ * %true: Grace termination schedule
+ * %false: No action was taken
+ */
+bool nfsd4_force_end_grace(struct nfsd_net *nn)
+{
+ if (!nn->client_tracking_ops)
+ return false;
+ spin_lock(&nn->client_lock);
+ if (nn->grace_ended || !nn->client_tracking_active) {
+ spin_unlock(&nn->client_lock);
+ return false;
+ }
+ WRITE_ONCE(nn->grace_end_forced, true);
+ mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
+ spin_unlock(&nn->client_lock);
+ return true;
+}
+
/*
* If we've waited a lease period but there are still clients trying to
* reclaim, wait a little longer to give them a chance to finish.
@@ -6612,6 +6639,8 @@ static bool clients_still_reclaiming(struct nfsd_net *nn)
time64_t double_grace_period_end = nn->boot_time +
2 * nn->nfsd4_lease;
+ if (READ_ONCE(nn->grace_end_forced))
+ return false;
if (nn->track_reclaim_completes &&
atomic_read(&nn->nr_reclaim_complete) ==
nn->reclaim_str_hashtbl_size)
@@ -8931,6 +8960,8 @@ static int nfs4_state_create_net(struct net *net)
nn->unconf_name_tree = RB_ROOT;
nn->boot_time = ktime_get_real_seconds();
nn->grace_ended = false;
+ nn->grace_end_forced = false;
+ nn->client_tracking_active = false;
nn->nfsd4_manager.block_opens = true;
INIT_LIST_HEAD(&nn->nfsd4_manager.list);
INIT_LIST_HEAD(&nn->client_lru);
@@ -9011,6 +9042,10 @@ nfs4_state_start_net(struct net *net)
return ret;
locks_start_grace(net, &nn->nfsd4_manager);
nfsd4_client_tracking_init(net);
+ /* safe for laundromat to run now */
+ spin_lock(&nn->client_lock);
+ nn->client_tracking_active = true;
+ spin_unlock(&nn->client_lock);
if (nn->track_reclaim_completes && nn->reclaim_str_hashtbl_size == 0)
goto skip_grace;
printk(KERN_INFO "NFSD: starting %lld-second grace period (net %x)\n",
@@ -9059,6 +9094,9 @@ nfs4_state_shutdown_net(struct net *net)
shrinker_free(nn->nfsd_client_shrinker);
cancel_work_sync(&nn->nfsd_shrinker_work);
+ spin_lock(&nn->client_lock);
+ nn->client_tracking_active = false;
+ spin_unlock(&nn->client_lock);
cancel_delayed_work_sync(&nn->laundromat_work);
locks_end_grace(&nn->nfsd4_manager);
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 2b79129703d5..36ce3ca97d97 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1082,10 +1082,9 @@ static ssize_t write_v4_end_grace(struct file *file, char *buf, size_t size)
case 'Y':
case 'y':
case '1':
- if (!nn->nfsd_serv)
+ if (!nfsd4_force_end_grace(nn))
return -EBUSY;
trace_nfsd_end_grace(netns(file));
- nfsd4_end_grace(nn);
break;
default:
return -EINVAL;
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 1e736f402426..50d2b2963390 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -849,7 +849,7 @@ static inline void nfsd4_revoke_states(struct net *net, struct super_block *sb)
#endif
/* grace period management */
-void nfsd4_end_grace(struct nfsd_net *nn);
+bool nfsd4_force_end_grace(struct nfsd_net *nn);
/* nfs4recover operations */
extern int nfsd4_client_tracking_init(struct net *net);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 2857bd59feb63fcf40fe4baf55401baea6b4feb4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011222-abdomen-balsamic-f41c@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2857bd59feb63fcf40fe4baf55401baea6b4feb4 Mon Sep 17 00:00:00 2001
From: NeilBrown <neil(a)brown.name>
Date: Sat, 13 Dec 2025 13:41:59 -0500
Subject: [PATCH] nfsd: provide locking for v4_end_grace
Writing to v4_end_grace can race with server shutdown and result in
memory being accessed after it was freed - reclaim_str_hashtbl in
particularly.
We cannot hold nfsd_mutex across the nfsd4_end_grace() call as that is
held while client_tracking_op->init() is called and that can wait for
an upcall to nfsdcltrack which can write to v4_end_grace, resulting in a
deadlock.
nfsd4_end_grace() is also called by the landromat work queue and this
doesn't require locking as server shutdown will stop the work and wait
for it before freeing anything that nfsd4_end_grace() might access.
However, we must be sure that writing to v4_end_grace doesn't restart
the work item after shutdown has already waited for it. For this we
add a new flag protected with nn->client_lock. It is set only while it
is safe to make client tracking calls, and v4_end_grace only schedules
work while the flag is set with the spinlock held.
So this patch adds a nfsd_net field "client_tracking_active" which is
set as described. Another field "grace_end_forced", is set when
v4_end_grace is written. After this is set, and providing
client_tracking_active is set, the laundromat is scheduled.
This "grace_end_forced" field bypasses other checks for whether the
grace period has finished.
This resolves a race which can result in use-after-free.
Reported-by: Li Lingfeng <lilingfeng3(a)huawei.com>
Closes: https://lore.kernel.org/linux-nfs/20250623030015.2353515-1-neil@brown.name/…
Fixes: 7f5ef2e900d9 ("nfsd: add a v4_end_grace file to /proc/fs/nfsd")
Cc: stable(a)vger.kernel.org
Signed-off-by: NeilBrown <neil(a)brown.name>
Tested-by: Li Lingfeng <lilingfeng3(a)huawei.com>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 3e2d0fde80a7..fe8338735e7c 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -66,6 +66,8 @@ struct nfsd_net {
struct lock_manager nfsd4_manager;
bool grace_ended;
+ bool grace_end_forced;
+ bool client_tracking_active;
time64_t boot_time;
struct dentry *nfsd_client_dir;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 5b83cb33bf83..a1dccce8b99c 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -84,7 +84,7 @@ static u64 current_sessionid = 1;
/* forward declarations */
static bool check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner);
static void nfs4_free_ol_stateid(struct nfs4_stid *stid);
-void nfsd4_end_grace(struct nfsd_net *nn);
+static void nfsd4_end_grace(struct nfsd_net *nn);
static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps);
static void nfsd4_file_hash_remove(struct nfs4_file *fi);
static void deleg_reaper(struct nfsd_net *nn);
@@ -6570,7 +6570,7 @@ nfsd4_renew(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
return nfs_ok;
}
-void
+static void
nfsd4_end_grace(struct nfsd_net *nn)
{
/* do nothing if grace period already ended */
@@ -6603,6 +6603,33 @@ nfsd4_end_grace(struct nfsd_net *nn)
*/
}
+/**
+ * nfsd4_force_end_grace - forcibly end the NFSv4 grace period
+ * @nn: network namespace for the server instance to be updated
+ *
+ * Forces bypass of normal grace period completion, then schedules
+ * the laundromat to end the grace period immediately. Does not wait
+ * for the grace period to fully terminate before returning.
+ *
+ * Return values:
+ * %true: Grace termination schedule
+ * %false: No action was taken
+ */
+bool nfsd4_force_end_grace(struct nfsd_net *nn)
+{
+ if (!nn->client_tracking_ops)
+ return false;
+ spin_lock(&nn->client_lock);
+ if (nn->grace_ended || !nn->client_tracking_active) {
+ spin_unlock(&nn->client_lock);
+ return false;
+ }
+ WRITE_ONCE(nn->grace_end_forced, true);
+ mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
+ spin_unlock(&nn->client_lock);
+ return true;
+}
+
/*
* If we've waited a lease period but there are still clients trying to
* reclaim, wait a little longer to give them a chance to finish.
@@ -6612,6 +6639,8 @@ static bool clients_still_reclaiming(struct nfsd_net *nn)
time64_t double_grace_period_end = nn->boot_time +
2 * nn->nfsd4_lease;
+ if (READ_ONCE(nn->grace_end_forced))
+ return false;
if (nn->track_reclaim_completes &&
atomic_read(&nn->nr_reclaim_complete) ==
nn->reclaim_str_hashtbl_size)
@@ -8931,6 +8960,8 @@ static int nfs4_state_create_net(struct net *net)
nn->unconf_name_tree = RB_ROOT;
nn->boot_time = ktime_get_real_seconds();
nn->grace_ended = false;
+ nn->grace_end_forced = false;
+ nn->client_tracking_active = false;
nn->nfsd4_manager.block_opens = true;
INIT_LIST_HEAD(&nn->nfsd4_manager.list);
INIT_LIST_HEAD(&nn->client_lru);
@@ -9011,6 +9042,10 @@ nfs4_state_start_net(struct net *net)
return ret;
locks_start_grace(net, &nn->nfsd4_manager);
nfsd4_client_tracking_init(net);
+ /* safe for laundromat to run now */
+ spin_lock(&nn->client_lock);
+ nn->client_tracking_active = true;
+ spin_unlock(&nn->client_lock);
if (nn->track_reclaim_completes && nn->reclaim_str_hashtbl_size == 0)
goto skip_grace;
printk(KERN_INFO "NFSD: starting %lld-second grace period (net %x)\n",
@@ -9059,6 +9094,9 @@ nfs4_state_shutdown_net(struct net *net)
shrinker_free(nn->nfsd_client_shrinker);
cancel_work_sync(&nn->nfsd_shrinker_work);
+ spin_lock(&nn->client_lock);
+ nn->client_tracking_active = false;
+ spin_unlock(&nn->client_lock);
cancel_delayed_work_sync(&nn->laundromat_work);
locks_end_grace(&nn->nfsd4_manager);
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 2b79129703d5..36ce3ca97d97 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1082,10 +1082,9 @@ static ssize_t write_v4_end_grace(struct file *file, char *buf, size_t size)
case 'Y':
case 'y':
case '1':
- if (!nn->nfsd_serv)
+ if (!nfsd4_force_end_grace(nn))
return -EBUSY;
trace_nfsd_end_grace(netns(file));
- nfsd4_end_grace(nn);
break;
default:
return -EINVAL;
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 1e736f402426..50d2b2963390 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -849,7 +849,7 @@ static inline void nfsd4_revoke_states(struct net *net, struct super_block *sb)
#endif
/* grace period management */
-void nfsd4_end_grace(struct nfsd_net *nn);
+bool nfsd4_force_end_grace(struct nfsd_net *nn);
/* nfs4recover operations */
extern int nfsd4_client_tracking_init(struct net *net);
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x e9e3b22ddfa760762b696ac6417c8d6edd182e49
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011242-empirical-gullible-4683@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e9e3b22ddfa760762b696ac6417c8d6edd182e49 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Thu, 11 Dec 2025 12:45:17 +1030
Subject: [PATCH] btrfs: fix beyond-EOF write handling
[BUG]
For the following write sequence with 64K page size and 4K fs block size,
it will lead to file extent items to be inserted without any data
checksum:
mkfs.btrfs -s 4k -f $dev > /dev/null
mount $dev $mnt
xfs_io -f -c "pwrite 0 16k" -c "pwrite 32k 4k" -c pwrite "60k 64K" \
-c "truncate 16k" $mnt/foobar
umount $mnt
This will result the following 2 file extent items to be inserted (extra
trace point added to insert_ordered_extent_file_extent()):
btrfs_finish_one_ordered: root=5 ino=257 file_off=61440 num_bytes=4096 csum_bytes=0
btrfs_finish_one_ordered: root=5 ino=257 file_off=0 num_bytes=16384 csum_bytes=16384
Note for file offset 60K, we're inserting a file extent without any
data checksum.
Also note that range [32K, 36K) didn't reach
insert_ordered_extent_file_extent(), which is the correct behavior as
that OE is fully truncated, should not result any file extent.
Although file extent at 60K will be later dropped by btrfs_truncate(),
if the transaction got committed after file extent inserted but before
the file extent dropping, we will have a small window where we have a
file extent beyond EOF and without any data checksum.
That will cause "btrfs check" to report error.
[CAUSE]
The sequence happens like this:
- Buffered write dirtied the page cache and updated isize
Now the inode size is 64K, with the following page cache layout:
0 16K 32K 48K 64K
|/////////////| |//| |//|
- Truncate the inode to 16K
Which will trigger writeback through:
btrfs_setsize()
|- truncate_setsize()
| Now the inode size is set to 16K
|
|- btrfs_truncate()
|- btrfs_wait_ordered_range() for [16K, u64(-1)]
|- btrfs_fdatawrite_range() for [16K, u64(-1)}
|- extent_writepage() for folio 0
|- writepage_delalloc()
| Generated OE for [0, 16K), [32K, 36K] and [60K, 64K)
|
|- extent_writepage_io()
Then inside extent_writepage_io(), the dirty fs blocks are handled
differently:
- Submit write for range [0, 16K)
As they are still inside the inode size (16K).
- Mark OE [32K, 36K) as truncated
Since we only call btrfs_lookup_first_ordered_range() once, which
returned the first OE after file offset 16K.
- Mark all OEs inside range [16K, 64K) as finished
Which will mark OE ranges [32K, 36K) and [60K, 64K) as finished.
For OE [32K, 36K) since it's already marked as truncated, and its
truncated length is 0, no file extent will be inserted.
For OE [60K, 64K) it has never been submitted thus has no data
checksum, and we insert the file extent as usual.
This is the root cause of file extent at 60K to be inserted without
any data checksum.
- Clear dirty flags for range [16K, 64K)
It is the function btrfs_folio_clear_dirty() which searches and clears
any dirty blocks inside that range.
[FIX]
The bug itself was introduced a long time ago, way before subpage and
large folio support.
At that time, fs block size must match page size, thus the range
[cur, end) is just one fs block.
But later with subpage and large folios, the same range [cur, end)
can have multiple blocks and ordered extents.
Later commit 18de34daa7c6 ("btrfs: truncate ordered extent when skipping
writeback past i_size") was fixing a bug related to subpage/large
folios, but it's still utilizing the old range [cur, end), meaning only
the first OE will be marked as truncated.
The proper fix here is to make EOF handling block-by-block, not trying
to handle the whole range to @end.
By this we always locate and truncate the OE for every dirty block.
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2d32dfc34ae3..97748d0d54d9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1728,7 +1728,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
struct btrfs_ordered_extent *ordered;
ordered = btrfs_lookup_first_ordered_range(inode, cur,
- folio_end - cur);
+ fs_info->sectorsize);
/*
* We have just run delalloc before getting here, so
* there must be an ordered extent.
@@ -1742,7 +1742,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
btrfs_put_ordered_extent(ordered);
btrfs_mark_ordered_io_finished(inode, folio, cur,
- end - cur, true);
+ fs_info->sectorsize, true);
/*
* This range is beyond i_size, thus we don't need to
* bother writing back.
@@ -1751,8 +1751,8 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
* writeback the sectors with subpage dirty bits,
* causing writeback without ordered extent.
*/
- btrfs_folio_clear_dirty(fs_info, folio, cur, end - cur);
- break;
+ btrfs_folio_clear_dirty(fs_info, folio, cur, fs_info->sectorsize);
+ continue;
}
ret = submit_one_sector(inode, folio, cur, bio_ctrl, i_size);
if (unlikely(ret < 0)) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x c6c209ceb87f64a6ceebe61761951dcbbf4a0baa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011223-capitol-diploma-75b9@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c6c209ceb87f64a6ceebe61761951dcbbf4a0baa Mon Sep 17 00:00:00 2001
From: Chuck Lever <chuck.lever(a)oracle.com>
Date: Tue, 9 Dec 2025 19:28:49 -0500
Subject: [PATCH] NFSD: Remove NFSERR_EAGAIN
I haven't found an NFSERR_EAGAIN in RFCs 1094, 1813, 7530, or 8881.
None of these RFCs have an NFS status code that match the numeric
value "11".
Based on the meaning of the EAGAIN errno, I presume the use of this
status in NFSD means NFS4ERR_DELAY. So replace the one usage of
nfserr_eagain, and remove it from NFSD's NFS status conversion
tables.
As far as I can tell, NFSERR_EAGAIN has existed since the pre-git
era, but was not actually used by any code until commit f4e44b393389
("NFSD: delay unmount source's export after inter-server copy
completed."), at which time it become possible for NFSD to return
a status code of 11 (which is not valid NFS protocol).
Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.")
Cc: stable(a)vger.kernel.org
Reviewed-by: NeilBrown <neil(a)brown.name>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
diff --git a/fs/nfs_common/common.c b/fs/nfs_common/common.c
index af09aed09fd2..0778743ae2c2 100644
--- a/fs/nfs_common/common.c
+++ b/fs/nfs_common/common.c
@@ -17,7 +17,6 @@ static const struct {
{ NFSERR_NOENT, -ENOENT },
{ NFSERR_IO, -EIO },
{ NFSERR_NXIO, -ENXIO },
-/* { NFSERR_EAGAIN, -EAGAIN }, */
{ NFSERR_ACCES, -EACCES },
{ NFSERR_EXIST, -EEXIST },
{ NFSERR_XDEV, -EXDEV },
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 7f7e6bb23a90..42a6b914c0fe 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1506,7 +1506,7 @@ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr,
(schedule_timeout(20*HZ) == 0)) {
finish_wait(&nn->nfsd_ssc_waitq, &wait);
kfree(work);
- return nfserr_eagain;
+ return nfserr_jukebox;
}
finish_wait(&nn->nfsd_ssc_waitq, &wait);
goto try_again;
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 50be785f1d2c..b0283213a8f5 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -233,7 +233,6 @@ void nfsd_lockd_shutdown(void);
#define nfserr_noent cpu_to_be32(NFSERR_NOENT)
#define nfserr_io cpu_to_be32(NFSERR_IO)
#define nfserr_nxio cpu_to_be32(NFSERR_NXIO)
-#define nfserr_eagain cpu_to_be32(NFSERR_EAGAIN)
#define nfserr_acces cpu_to_be32(NFSERR_ACCES)
#define nfserr_exist cpu_to_be32(NFSERR_EXIST)
#define nfserr_xdev cpu_to_be32(NFSERR_XDEV)
diff --git a/include/trace/misc/nfs.h b/include/trace/misc/nfs.h
index c82233e950ac..a394b4d38e18 100644
--- a/include/trace/misc/nfs.h
+++ b/include/trace/misc/nfs.h
@@ -16,7 +16,6 @@ TRACE_DEFINE_ENUM(NFSERR_PERM);
TRACE_DEFINE_ENUM(NFSERR_NOENT);
TRACE_DEFINE_ENUM(NFSERR_IO);
TRACE_DEFINE_ENUM(NFSERR_NXIO);
-TRACE_DEFINE_ENUM(NFSERR_EAGAIN);
TRACE_DEFINE_ENUM(NFSERR_ACCES);
TRACE_DEFINE_ENUM(NFSERR_EXIST);
TRACE_DEFINE_ENUM(NFSERR_XDEV);
@@ -52,7 +51,6 @@ TRACE_DEFINE_ENUM(NFSERR_JUKEBOX);
{ NFSERR_NXIO, "NXIO" }, \
{ ECHILD, "CHILD" }, \
{ ETIMEDOUT, "TIMEDOUT" }, \
- { NFSERR_EAGAIN, "AGAIN" }, \
{ NFSERR_ACCES, "ACCES" }, \
{ NFSERR_EXIST, "EXIST" }, \
{ NFSERR_XDEV, "XDEV" }, \
diff --git a/include/uapi/linux/nfs.h b/include/uapi/linux/nfs.h
index f356f2ba3814..71c7196d3281 100644
--- a/include/uapi/linux/nfs.h
+++ b/include/uapi/linux/nfs.h
@@ -49,7 +49,6 @@
NFSERR_NOENT = 2, /* v2 v3 v4 */
NFSERR_IO = 5, /* v2 v3 v4 */
NFSERR_NXIO = 6, /* v2 v3 v4 */
- NFSERR_EAGAIN = 11, /* v2 v3 */
NFSERR_ACCES = 13, /* v2 v3 v4 */
NFSERR_EXIST = 17, /* v2 v3 v4 */
NFSERR_XDEV = 18, /* v3 v4 */
Commit 9beeee6584b9aa4f ("USB: EHCI: log a warning if ehci-hcd is not
loaded first") said that ehci-hcd should be loaded before ohci-hcd and
uhci-hcd. However, commit 05c92da0c52494ca ("usb: ohci/uhci - add soft
dependencies on ehci_pci") only makes ohci-pci/uhci-pci depend on ehci-
pci, which is not enough and we may still see the warnings in boot log.
To eliminate the warnings we should make ohci-hcd/uhci-hcd depend on
ehci-hcd. But Alan said that the warning introduced by 9beeee6584b9aa4f
is bogus, we only need the soft dependencies in the PCI level rather
than the HCD level.
However, there is really another neccessary soft dependencies between
ohci-platform/uhci-platform and ehci-platform, which is added by this
patch. The boot logs are below.
1. ohci-platform loaded before ehci-platform:
ohci-platform 1f058000.usb: Generic Platform OHCI controller
ohci-platform 1f058000.usb: new USB bus registered, assigned bus number 1
ohci-platform 1f058000.usb: irq 28, io mem 0x1f058000
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 4 ports detected
Warning! ehci_hcd should always be loaded before uhci_hcd and ohci_hcd, not after
usb 1-4: new low-speed USB device number 2 using ohci-platform
ehci-platform 1f050000.usb: EHCI Host Controller
ehci-platform 1f050000.usb: new USB bus registered, assigned bus number 2
ehci-platform 1f050000.usb: irq 29, io mem 0x1f050000
ehci-platform 1f050000.usb: USB 2.0 started, EHCI 1.00
usb 1-4: device descriptor read/all, error -62
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 4 ports detected
usb 1-4: new low-speed USB device number 3 using ohci-platform
input: YSPRINGTECH USB OPTICAL MOUSE as /devices/platform/bus@10000000/1f058000.usb/usb1/1-4/1-4:1.0/0003:10C4:8105.0001/input/input0
hid-generic 0003:10C4:8105.0001: input,hidraw0: USB HID v1.11 Mouse [YSPRINGTECH USB OPTICAL MOUSE] on usb-1f058000.usb-4/input0
2. ehci-platform loaded before ohci-platform:
ehci-platform 1f050000.usb: EHCI Host Controller
ehci-platform 1f050000.usb: new USB bus registered, assigned bus number 1
ehci-platform 1f050000.usb: irq 28, io mem 0x1f050000
ehci-platform 1f050000.usb: USB 2.0 started, EHCI 1.00
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 4 ports detected
ohci-platform 1f058000.usb: Generic Platform OHCI controller
ohci-platform 1f058000.usb: new USB bus registered, assigned bus number 2
ohci-platform 1f058000.usb: irq 29, io mem 0x1f058000
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 4 ports detected
usb 2-4: new low-speed USB device number 2 using ohci-platform
input: YSPRINGTECH USB OPTICAL MOUSE as /devices/platform/bus@10000000/1f058000.usb/usb2/2-4/2-4:1.0/0003:10C4:8105.0001/input/input0
hid-generic 0003:10C4:8105.0001: input,hidraw0: USB HID v1.11 Mouse [YSPRINGTECH USB OPTICAL MOUSE] on usb-1f058000.usb-4/input0
In the later case, there is no re-connection for USB-1.0/1.1 devices,
which is expected.
Cc: stable(a)vger.kernel.org
Reported-by: Shengwen Xiao <atzlinux(a)sina.com>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
drivers/usb/host/ohci-platform.c | 1 +
drivers/usb/host/uhci-platform.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/drivers/usb/host/ohci-platform.c b/drivers/usb/host/ohci-platform.c
index 2e4bb5cc2165..c801527d5bd2 100644
--- a/drivers/usb/host/ohci-platform.c
+++ b/drivers/usb/host/ohci-platform.c
@@ -392,3 +392,4 @@ MODULE_DESCRIPTION(DRIVER_DESC);
MODULE_AUTHOR("Hauke Mehrtens");
MODULE_AUTHOR("Alan Stern");
MODULE_LICENSE("GPL");
+MODULE_SOFTDEP("pre: ehci_platform");
diff --git a/drivers/usb/host/uhci-platform.c b/drivers/usb/host/uhci-platform.c
index 5e02f2ceafb6..f4419d4526c4 100644
--- a/drivers/usb/host/uhci-platform.c
+++ b/drivers/usb/host/uhci-platform.c
@@ -211,3 +211,4 @@ static struct platform_driver uhci_platform_driver = {
.of_match_table = platform_uhci_ids,
},
};
+MODULE_SOFTDEP("pre: ehci_platform");
--
2.47.3