A race condition during gadget teardown can lead to a use-after-free
in usb_gadget_state_work(), as reported by KASAN:
BUG: KASAN: invalid-access in sysfs_notify+0_x_2c/0_x_d0
Workqueue: events usb_gadget_state_work
The fundamental race occurs because a concurrent event (e.g., an
interrupt) can call usb_gadget_set_state() and schedule gadget->work
at any time during the cleanup process in usb_del_gadget().
Commit 399a45e5237c ("usb: gadget: core: flush gadget workqueue after
device removal") attempted to fix this by moving flush_work() to after
device_del(). However, this does not fully solve the race, as a new
work item can still be scheduled *after* flush_work() completes but
before the gadget's memory is freed, leading to the same use-after-free.
This patch fixes the race condition robustly by introducing a 'teardown'
flag and a 'state_lock' spinlock to the usb_gadget struct. The flag is
set during cleanup in usb_del_gadget() *before* calling flush_work() to
prevent any new work from being scheduled once cleanup has commenced.
The scheduling site, usb_gadget_set_state(), now checks this flag under
the lock before queueing the work, thus safely closing the race window.
Fixes: 5702f75375aa9 ("usb: gadget: udc-core: move sysfs_notify() to a workqueue")
Signed-off-by: Jimmy Hu <hhhuuu(a)google.com>
Cc: stable(a)vger.kernel.org
---
drivers/usb/gadget/udc/core.c | 18 +++++++++++++++++-
include/linux/usb/gadget.h | 6 ++++++
2 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c
index d709e24c1fd4..c4268b76d747 100644
--- a/drivers/usb/gadget/udc/core.c
+++ b/drivers/usb/gadget/udc/core.c
@@ -1123,8 +1123,13 @@ static void usb_gadget_state_work(struct work_struct *work)
void usb_gadget_set_state(struct usb_gadget *gadget,
enum usb_device_state state)
{
+ unsigned long flags;
+
+ spin_lock_irqsave(&gadget->state_lock, flags);
gadget->state = state;
- schedule_work(&gadget->work);
+ if (!gadget->teardown)
+ schedule_work(&gadget->work);
+ spin_unlock_irqrestore(&gadget->state_lock, flags);
}
EXPORT_SYMBOL_GPL(usb_gadget_set_state);
@@ -1357,6 +1362,9 @@ static void usb_udc_nop_release(struct device *dev)
void usb_initialize_gadget(struct device *parent, struct usb_gadget *gadget,
void (*release)(struct device *dev))
{
+ /* For race-free teardown */
+ spin_lock_init(&gadget->state_lock);
+ gadget->teardown = false;
INIT_WORK(&gadget->work, usb_gadget_state_work);
gadget->dev.parent = parent;
@@ -1531,6 +1539,7 @@ EXPORT_SYMBOL_GPL(usb_add_gadget_udc);
void usb_del_gadget(struct usb_gadget *gadget)
{
struct usb_udc *udc = gadget->udc;
+ unsigned long flags;
if (!udc)
return;
@@ -1544,6 +1553,13 @@ void usb_del_gadget(struct usb_gadget *gadget)
kobject_uevent(&udc->dev.kobj, KOBJ_REMOVE);
sysfs_remove_link(&udc->dev.kobj, "gadget");
device_del(&gadget->dev);
+ /*
+ * Set the teardown flag before flushing the work to prevent new work
+ * from being scheduled while we are cleaning up.
+ */
+ spin_lock_irqsave(&gadget->state_lock, flags);
+ gadget->teardown = true;
+ spin_unlock_irqrestore(&gadget->state_lock, flags);
flush_work(&gadget->work);
ida_free(&gadget_id_numbers, gadget->id_number);
cancel_work_sync(&udc->vbus_work);
diff --git a/include/linux/usb/gadget.h b/include/linux/usb/gadget.h
index 0f28c5512fcb..8302aeaea82e 100644
--- a/include/linux/usb/gadget.h
+++ b/include/linux/usb/gadget.h
@@ -351,6 +351,9 @@ struct usb_gadget_ops {
* can handle. The UDC must support this and all slower speeds and lower
* number of lanes.
* @state: the state we are now (attached, suspended, configured, etc)
+ * @state_lock: Spinlock protecting the `state` and `teardown` members.
+ * @teardown: True if the device is undergoing teardown, used to prevent
+ * new work from being scheduled during cleanup.
* @name: Identifies the controller hardware type. Used in diagnostics
* and sometimes configuration.
* @dev: Driver model state for this abstract device.
@@ -426,6 +429,9 @@ struct usb_gadget {
enum usb_ssp_rate max_ssp_rate;
enum usb_device_state state;
+ /* For race-free teardown and state management */
+ spinlock_t state_lock;
+ bool teardown;
const char *name;
struct device dev;
unsigned isoch_delay;
--
2.51.0.618.g983fd99d29-goog
The patch titled
Subject: mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
has been added to the -mm mm-new branch. Its filename is
mm-huge_memory-add-pmd-folio-to-ds_queue-in-do_huge_zero_wp_pmd.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Wei Yang <richard.weiyang(a)gmail.com>
Subject: mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
Date: Wed, 8 Oct 2025 09:54:52 +0000
We add pmd folio into ds_queue on the first page fault in
__do_huge_pmd_anonymous_page(), so that we can split it in case of memory
pressure. This should be the same for a pmd folio during wp page fault.
Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") miss to
add it to ds_queue, which means system may not reclaim enough memory in
case of memory pressure even the pmd folio is under used.
Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd
folio installation consistent.
Link: https://lkml.kernel.org/r/20251008095453.18772-2-richard.weiyang@gmail.com
Fixes: 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault")
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Lance Yang <lance.yang(a)linux.dev>
Reviewed-by: Dev Jain <dev.jain(a)arm.com>
Acked-by: Usama Arif <usamaarif642(a)gmail.com>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/huge_memory.c~mm-huge_memory-add-pmd-folio-to-ds_queue-in-do_huge_zero_wp_pmd
+++ a/mm/huge_memory.c
@@ -1317,6 +1317,7 @@ static void map_anon_folio_pmd(struct fo
count_vm_event(THP_FAULT_ALLOC);
count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
+ deferred_split_folio(folio, false);
}
static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
@@ -1357,7 +1358,6 @@ static vm_fault_t __do_huge_pmd_anonymou
pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
map_anon_folio_pmd(folio, vmf->pmd, vma, haddr);
mm_inc_nr_ptes(vma->vm_mm);
- deferred_split_folio(folio, false);
spin_unlock(vmf->ptl);
}
_
Patches currently in -mm which might be from richard.weiyang(a)gmail.com are
mm-compaction-check-the-range-to-pageblock_pfn_to_page-is-within-the-zone-first.patch
mm-compaction-fix-the-range-to-pageblock_pfn_to_page.patch
mm-huge_memory-add-pmd-folio-to-ds_queue-in-do_huge_zero_wp_pmd.patch
mm-khugepaged-unify-pmd-folio-installation-with-map_anon_folio_pmd.patch
This series backports commit [52f1783ff414 ("drm/amd/display: Fix potential null dereference")]
to stable branch 5.10.y. However to apply this i had to backport commit
[3beac533b8da ("drm/amd/display: Remove redundant safeguards for dmub-srv destroy()")] first.
Igor Artemiev (1):
drm/amd/display: Fix potential null dereference
Roman Li (1):
drm/amd/display: Remove redundant safeguards for dmub-srv destroy()
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
--
2.43.0
From: Leon Yen <leon.yen(a)mediatek.com>
The buf_len is used to limit the iterations for retrieving the country
power setting and may underflow under certain conditions due to changes
in the power table in CLC.
This underflow leads to an almost infinite loop or an invalid power
setting resulting in driver initialization failure.
Cc: stable(a)vger.kernel.org
Fixes: fa6ad88e023d ("wifi: mt76: mt7921: fix country count limitation for CLC")
Signed-off-by: Leon Yen <leon.yen(a)mediatek.com>
Signed-off-by: Ming Yen Hsieh <mingyen.hsieh(a)mediatek.com>
---
drivers/net/wireless/mediatek/mt76/mt7921/mcu.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/mcu.c b/drivers/net/wireless/mediatek/mt76/mt7921/mcu.c
index 86bd33b916a9..80ccd56409b3 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7921/mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/mcu.c
@@ -1353,6 +1353,9 @@ int __mt7921_mcu_set_clc(struct mt792x_dev *dev, u8 *alpha2,
u16 len = le16_to_cpu(rule->len);
u16 offset = len + sizeof(*rule);
+ if (buf_len < offset)
+ break;
+
pos += offset;
buf_len -= offset;
if (rule->alpha2[0] != alpha2[0] ||
--
2.34.1
Any trivial usage of hostfs seems to be broken since commit cd140ce9
("hostfs: convert hostfs to use the new mount API") - that's what it
bisected down to.
Steps to reproduce;
The following assumes that the ARCH=um kernel has already been compiled
(and the 'vmlinux' executable is in the local directory, as is the case
when building from the top directory of a source tree). I built mine
from a fresh clone using 'defconfig'. The uml_run.sh script creates a
bootable root FS image (from debian, via docker) and then boots it with
a hostfs mount to demonstrate the regression. This should be observable
with any other bootable image though, simply pass "hostfs=<hostpath>" to
the ./vmlinux kernel and then try to mount it from within the booted VM
("mount -t hostfs none <guestpath>").
The following 3 text files are used, and as they're small enough for
copy-n-paste I figured (hoped) it was best to inline them rather than
post attachments.
uml_run.sh:
#!/bin/bash
set -ex
cat Dockerfile | docker build -t foobar:foobar -
docker export -o foobar.tar \
`docker run -d foobar:foobar /bin/true`
dd if=/dev/zero of=rootfs.img \
bs=$(expr 2048 \* 1024 \* 1024 / 512) count=512
mkfs.ext4 rootfs.img
sudo ./uml_root.sh
cp rootfs.img temp.img
dd if=/dev/zero of=swapfile bs=1M count=1024
chmod 600 swapfile
mkswap swapfile
./vmlinux mem=4G ubd0=temp.img rw ubd1=swapfile \
hostfs=$(pwd)
uml_root.sh:
#!/bin/bash
set -ex
losetup -D
LOOPDEVICE=$(losetup -f)
losetup ${LOOPDEVICE} rootfs.img
mkdir -p tmpmnt
mount -t auto ${LOOPDEVICE} tmpmnt/
(cd tmpmnt && tar xf ../foobar.tar)
umount tmpmnt
losetup -D
Dockerfile:
FROM debian:trixie
RUN echo 'debconf debconf/frontend select Noninteractive' | \
debconf-set-selections
RUN apt-get update
RUN apt-get install -y apt-utils
RUN apt-get -y full-upgrade
RUN echo "US/Eastern" > /etc/timezone
RUN chmod 644 /etc/timezone
RUN cd /etc && rm -f localtime && \
ln -s /usr/share/zoneinfo/US/Eastern localtime
RUN apt-get install -y systemd-sysv kmod
RUN echo "root:root" | chpasswd
RUN echo "/dev/ubdb swap swap defaults 0 0" >> /etc/fstab
RUN mkdir /hosthack
RUN echo "none /hosthack hostfs defaults 0 0" >> /etc/fstab
RUN systemctl set-default multi-user.target
Execute ./uml_run.sh to build the rootfs image and boot the VM. This
requires a system with docker, and will also require a sudo password
when creating the rootfs. The boot output indicates whether the hostfs
mount succeeds or not - the boot should degrade to emergency mode if the
mount fails, otherwise a login prompt indicates success. (Login is
root:root, e.g. if you prefer to go in and shutdown the VM gracefully.)
Please let me know if I can/should provide anything else.
Cheers,
Geoff
If pgshift is 63 then BITS_PER_TYPE(*bitmap->bitmap) * pgsize will overflow
to 0 and this triggers divide by 0.
In this case the index should just be 0, so reorganize things to divide
by shift and avoid hitting any overflows.
Cc: stable(a)vger.kernel.org
Fixes: 58ccf0190d19 ("vfio: Add an IOVA bitmap support")
Reported-by: syzbot+093a8a8b859472e6c257(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=093a8a8b859472e6c257
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
---
drivers/iommu/iommufd/iova_bitmap.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/iommufd/iova_bitmap.c b/drivers/iommu/iommufd/iova_bitmap.c
index 4514575818fc07..b5b67a9d3fb35e 100644
--- a/drivers/iommu/iommufd/iova_bitmap.c
+++ b/drivers/iommu/iommufd/iova_bitmap.c
@@ -130,9 +130,8 @@ struct iova_bitmap {
static unsigned long iova_bitmap_offset_to_index(struct iova_bitmap *bitmap,
unsigned long iova)
{
- unsigned long pgsize = 1UL << bitmap->mapped.pgshift;
-
- return iova / (BITS_PER_TYPE(*bitmap->bitmap) * pgsize);
+ return (iova >> bitmap->mapped.pgshift) /
+ BITS_PER_TYPE(*bitmap->bitmap);
}
/*
base-commit: 2a918911ed3d0841923525ed0fe707762ee78844
--
2.43.0
In setups where the same codec DAI is reused across multiple DAI
links, mute controls via `snd_soc_dai_digital_mute()` is skipped for
non-dynamic links. The trigger operations are not invoked when
`dai_link->dynamic == 0`, and mute controls is currently conditioned
only on `snd_soc_dai_mute_is_ctrled_at_trigger()`. This patch ensures
that mute and unmute is applied explicitly for non-dynamic links.
Fixes: f0220575e65a ("ASoC: soc-dai: add flag to mute and unmute stream during trigger")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mohammad Rafi Shaik <mohammad.rafi.shaik(a)oss.qualcomm.com>
---
sound/soc/soc-pcm.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
index 2c21fd528afd..4ed829b49bc2 100644
--- a/sound/soc/soc-pcm.c
+++ b/sound/soc/soc-pcm.c
@@ -949,7 +949,7 @@ static int __soc_pcm_prepare(struct snd_soc_pcm_runtime *rtd,
SND_SOC_DAPM_STREAM_START);
for_each_rtd_dais(rtd, i, dai) {
- if (!snd_soc_dai_mute_is_ctrled_at_trigger(dai))
+ if (!snd_soc_dai_mute_is_ctrled_at_trigger(dai) || !rtd->dai_link->dynamic)
snd_soc_dai_digital_mute(dai, 0, substream->stream);
}
@@ -1007,7 +1007,7 @@ static int soc_pcm_hw_clean(struct snd_soc_pcm_runtime *rtd,
soc_pcm_set_dai_params(dai, NULL);
if (snd_soc_dai_stream_active(dai, substream->stream) == 1) {
- if (!snd_soc_dai_mute_is_ctrled_at_trigger(dai))
+ if (!snd_soc_dai_mute_is_ctrled_at_trigger(dai) || !rtd->dai_link->dynamic)
snd_soc_dai_digital_mute(dai, 1, substream->stream);
}
}
--
2.34.1
This series backports 19 patches to update minmax.h in the 5.15.y branch,
aligning it with v6.17-rc7.
The ultimate goal is to synchronize all longterm branches so that they
include the full set of minmax.h changes.
6.12.y was already backported and changes are part of v6.12.49.
6.6.y was already backported and changes are part of v6.6.109.
6.1.y was already backported and changes are currently in the 6.1-stable
tree.
The key motivation is to bring in commit d03eba99f5bf ("minmax: allow
min()/max()/clamp() if the arguments have the same signedness"), which
is missing in kernel 5.10.y.
In mainline, this change enables min()/max()/clamp() to accept mixed
argument types, provided both have the same signedness. Without it,
backported patches that use these forms may trigger compiler warnings,
which escalate to build failures when -Werror is enabled.
Changes in v3:
- Fix fs/erofs/zdata.h in patch 06/19 to use MIN_T instead of min_t to
fix build on the following patch (07/19):
In file included from ./include/linux/kernel.h:16,
from ./include/linux/list.h:9,
from ./include/linux/wait.h:7,
from ./include/linux/wait_bit.h:8,
from ./include/linux/fs.h:6,
from fs/erofs/internal.h:10,
from fs/erofs/zdata.h:9,
from fs/erofs/zdata.c:6:
fs/erofs/zdata.c: In function ‘z_erofs_decompress_pcluster’:
fs/erofs/zdata.h:185:61: error: ISO C90 forbids variable length array ‘pages_onstack’ [-Werror=vla]
185 | min_t(unsigned int, THREAD_SIZE / 8 / sizeof(struct page *), 96U)
| ^~~~
./include/linux/minmax.h:49:23: note: in definition of macro ‘__cmp_once_unique’
49 | ({ type ux = (x); type uy = (y); __cmp(op, ux, uy); })
| ^
./include/linux/minmax.h:164:27: note: in expansion of macro ‘__cmp_once’
164 | #define min_t(type, x, y) __cmp_once(min, type, x, y)
| ^~~~~~~~~~
fs/erofs/zdata.h:185:9: note: in expansion of macro ‘min_t’
185 | min_t(unsigned int, THREAD_SIZE / 8 / sizeof(struct page *), 96U)
| ^~~~~
fs/erofs/zdata.c:847:36: note: in expansion of macro ‘Z_EROFS_VMAP_ONSTACK_PAGES’
847 | struct page *pages_onstack[Z_EROFS_VMAP_ONSTACK_PAGES];
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
- Increase test coverage using `make allyesconfig` and
`make allmodconfig` for arm64, arm, x86_64 and i386 architectures.
Changes in v2:
- Fix the order of patches 6 - 10 according to order in mainline branch.
- Use same style of [ Upstream commit <HASH> ] in all patches.
Andy Shevchenko (1):
minmax: deduplicate __unconst_integer_typeof()
David Laight (8):
minmax: fix indentation of __cmp_once() and __clamp_once()
minmax.h: add whitespace around operators and after commas
minmax.h: update some comments
minmax.h: reduce the #define expansion of min(), max() and clamp()
minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()
minmax.h: move all the clamp() definitions after the min/max() ones
minmax.h: simplify the variants of clamp()
minmax.h: remove some #defines that are only expanded once
Herve Codina (1):
minmax: Introduce {min,max}_array()
Linus Torvalds (8):
minmax: avoid overly complicated constant expressions in VM code
minmax: add a few more MIN_T/MAX_T users
minmax: simplify and clarify min_t()/max_t() implementation
minmax: make generic MIN() and MAX() macros available everywhere
minmax: don't use max() in situations that want a C constant
expression
minmax: simplify min()/max()/clamp() implementation
minmax: improve macro expansion and type checking
minmax: fix up min3() and max3() too
Matthew Wilcox (Oracle) (1):
minmax: add in_range() macro
arch/arm/mm/pageattr.c | 6 +-
arch/um/drivers/mconsole_user.c | 2 +
arch/x86/mm/pgtable.c | 2 +-
drivers/edac/sb_edac.c | 4 +-
drivers/edac/skx_common.h | 1 -
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +
.../drm/amd/display/modules/hdcp/hdcp_ddc.c | 2 +
.../drm/amd/pm/powerplay/hwmgr/ppevvmath.h | 14 +-
.../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 2 +
.../drm/arm/display/include/malidp_utils.h | 2 +-
.../display/komeda/komeda_pipeline_state.c | 24 +-
drivers/gpu/drm/drm_color_mgmt.c | 2 +-
drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 6 -
drivers/gpu/drm/radeon/evergreen_cs.c | 2 +
drivers/hwmon/adt7475.c | 24 +-
drivers/input/touchscreen/cyttsp4_core.c | 2 +-
drivers/irqchip/irq-sun6i-r.c | 2 +-
drivers/md/dm-integrity.c | 4 +-
drivers/media/dvb-frontends/stv0367_priv.h | 3 +
.../net/ethernet/chelsio/cxgb3/cxgb3_main.c | 18 +-
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
drivers/net/fjes/fjes_main.c | 4 +-
drivers/nfc/pn544/i2c.c | 2 -
drivers/platform/x86/sony-laptop.c | 1 -
drivers/scsi/isci/init.c | 6 +-
.../pci/hive_isp_css_include/math_support.h | 5 -
drivers/virt/acrn/ioreq.c | 4 +-
fs/btrfs/misc.h | 2 -
fs/btrfs/tree-checker.c | 2 +-
fs/erofs/zdata.h | 2 +-
fs/ext2/balloc.c | 2 -
fs/ext4/ext4.h | 2 -
fs/ufs/util.h | 6 -
include/linux/compiler.h | 9 +
include/linux/minmax.h | 264 +++++++++++++-----
kernel/trace/preemptirq_delay_test.c | 2 -
lib/btree.c | 1 -
lib/decompress_unlzma.c | 2 +
lib/logic_pio.c | 3 -
lib/vsprintf.c | 2 +-
lib/zstd/zstd_internal.h | 2 -
mm/zsmalloc.c | 1 -
net/ipv4/proc.c | 2 +-
net/ipv6/proc.c | 2 +-
net/netfilter/nf_nat_core.c | 6 +-
net/tipc/core.h | 2 +-
net/tipc/link.c | 10 +-
tools/testing/selftests/vm/mremap_test.c | 2 +
48 files changed, 290 insertions(+), 184 deletions(-)
--
2.47.3
From: NeilBrown <neil(a)brown.name>
nfsd exports a "pseudo root filesystem" which is used by NFSv4 to find
the various exported filesystems using LOOKUP requests from a known root
filehandle. NFSv3 uses the MOUNT protocol to find those exported
filesystems and so is not given access to the pseudo root filesystem.
If a v3 (or v2) client uses a filehandle from that filesystem,
nfsd_set_fh_dentry() will report an error, but still stores the export
in "struct svc_fh" even though it also drops the reference (exp_put()).
This means that when fh_put() is called an extra reference will be dropped
which can lead to use-after-free and possible denial of service.
Normal NFS usage will not provide a pseudo-root filehandle to a v3
client. This bug can only be triggered by the client synthesising an
incorrect filehandle.
To fix this we move the assignments to the svc_fh later, after all
possible error cases have been detected.
Reported-and-tested-by: tianshuo han <hantianshuo233(a)gmail.com>
Fixes: ef7f6c4904d0 ("nfsd: move V4ROOT version check to nfsd_set_fh_dentry()")
Signed-off-by: NeilBrown <neil(a)brown.name>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
---
fs/nfsd/nfsfh.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 3eb724ec9566..ed85dd43da18 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -269,9 +269,6 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct net *net,
dentry);
}
- fhp->fh_dentry = dentry;
- fhp->fh_export = exp;
-
switch (fhp->fh_maxsize) {
case NFS4_FHSIZE:
if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOATOMIC_ATTR)
@@ -293,6 +290,9 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct net *net,
goto out;
}
+ fhp->fh_dentry = dentry;
+ fhp->fh_export = exp;
+
return 0;
out:
exp_put(exp);
--
2.51.0
Commits 8f4dc4e54eed4 (6.1.y) and 23249dade24e6 (5.15.y) (maybe other
stable kernels as well) deadlock the host kernel (presumably a
recursive spinlock):
queued_spin_lock_slowpath+0x274/0x358
raw_spin_rq_lock_nested+0x2c/0x48
_raw_spin_rq_lock_irqsave+0x30/0x4c
run_rebalance_domains+0x808/0x2e18
__do_softirq+0x104/0x550
irq_exit+0x88/0xe0
handle_domain_irq+0x7c/0xb0
gic_handle_irq+0x1cc/0x420
call_on_irq_stack+0x20/0x48
do_interrupt_handler+0x3c/0x50
el1_interrupt+0x30/0x58
el1h_64_irq_handler+0x18/0x24
el1h_64_irq+0x7c/0x80
kvm_arch_vcpu_ioctl_run+0x24c/0x49c
kvm_vcpu_ioctl+0xc4/0x614
We found out a similar report at [1], but it doesn't seem like a formal
patch was ever posted. Will, can you please send a formal patch so that
stable kernels can run VMs again?
[1] https://lists.linaro.org/archives/list/linux-stable-mirror@lists.linaro.org…