The patch titled
Subject: mm/hugetlb: fix avoid_reserve to allow taking folio from subpool
has been added to the -mm mm-unstable branch. Its filename is
mm-hugetlb-fix-avoid_reserve-to-allow-taking-folio-from-subpool.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/hugetlb: fix avoid_reserve to allow taking folio from subpool
Date: Tue, 7 Jan 2025 15:39:56 -0500
Patch series "mm/hugetlb: Refactor hugetlb allocation resv accounting",
v2.
This is a follow up on Ackerley's series here as replacement:
https://lore.kernel.org/r/cover.1728684491.git.ackerleytng@google.com
The goal of this series is to cleanup hugetlb resv accounting, especially
during folio allocation, to decouple a few things:
- Hugetlb folios v.s. Hugetlbfs: IOW, the hope is in the future hugetlb
folios can be allocated completely without hugetlbfs.
- Decouple VMA v.s. hugetlb folio allocations: allocating a hugetlb folio
should not always require a hugetlbfs VMA. For example, either it got
allocated from the inode level (see hugetlbfs_fallocate() where it used
a pesudo VMA for allocation), or it can be allocated by other kernel
subsystems.
It paves way for other users to allocate hugetlb folios out of either
system reservations, or subpools (instead of hugetlbfs, as a file system).
For longer term, this prepares hugetlb as a separate concept versus
hugetlbfs, so that hugetlb folios can be allocated by not only hugetlbfs
and other things.
Tests I've done:
- I had a reproducer in patch 1 for the bug I found, this will start to
work after patch 1 or the whole set applied.
- Hugetlb regression tests (on x86_64 2MBs), includes:
- All vmtests on hugetlbfs
- libhugetlbfs test suite (which may fail some tests, but no new failures
will be introduced by this series, so all such failures happen before
this series so shouldn't be relevant).
This patch (of 7):
Since commit 04f2cbe35699 ("hugetlb: guarantee that COW faults for a
process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed"),
avoid_reserve was introduced for a special case of CoW on hugetlb private
mappings, and only if the owner VMA is trying to allocate yet another
hugetlb folio that is not reserved within the private vma reserved map.
Later on, in commit d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle
areas hole punched by fallocate"), alloc_huge_page() enforced to not
consume any global reservation as long as avoid_reserve=true. This
operation doesn't look correct, because even if it will enforce the
allocation to not use global reservation at all, it will still try to take
one reservation from the spool (if the subpool existed). Then since the
spool reserved pages take from global reservation, it'll also take one
reservation globally.
Logically it can cause global reservation to go wrong.
I wrote a reproducer below, trigger this special path, and every run of
such program will cause global reservation count to increment by one, until
it hits the number of free pages:
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/mman.h>
#define MSIZE (2UL << 20)
int main(int argc, char *argv[])
{
const char *path;
int *buf;
int fd, ret;
pid_t child;
if (argc < 2) {
printf("usage: %s <hugetlb_file>\n", argv[0]);
return -1;
}
path = argv[1];
fd = open(path, O_RDWR | O_CREAT, 0666);
if (fd < 0) {
perror("open failed");
return -1;
}
ret = fallocate(fd, 0, 0, MSIZE);
if (ret != 0) {
perror("fallocate");
return -1;
}
buf = mmap(NULL, MSIZE, PROT_READ|PROT_WRITE,
MAP_PRIVATE, fd, 0);
if (buf == MAP_FAILED) {
perror("mmap() failed");
return -1;
}
/* Allocate a page */
*buf = 1;
child = fork();
if (child == 0) {
/* child doesn't need to do anything */
exit(0);
}
/* Trigger CoW from owner */
*buf = 2;
munmap(buf, MSIZE);
close(fd);
unlink(path);
return 0;
}
It can only reproduce with a sub-mount when there're reserved pages on the
spool, like:
# sysctl vm.nr_hugepages=128
# mkdir ./hugetlb-pool
# mount -t hugetlbfs -o min_size=8M,pagesize=2M none ./hugetlb-pool
Then run the reproducer on the mountpoint:
# ./reproducer ./hugetlb-pool/test
Fix it by taking the reservation from spool if available. In general,
avoid_reserve is IMHO more about "avoid vma resv map", not spool's.
I copied stable, however I have no intention for backporting if it's not a
clean cherry-pick, because private hugetlb mapping, and then fork() on top
is too rare to hit.
Link: https://lkml.kernel.org/r/20250107204002.2683356-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20250107204002.2683356-2-peterx@redhat.com
Fixes: d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Ackerley Tng <ackerleytng(a)google.com>
Tested-by: Ackerley Tng <ackerleytng(a)google.com>
Cc: Breno Leitao <leitao(a)debian.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 22 +++-------------------
1 file changed, 3 insertions(+), 19 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-avoid_reserve-to-allow-taking-folio-from-subpool
+++ a/mm/hugetlb.c
@@ -1394,8 +1394,7 @@ static unsigned long available_huge_page
static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
struct vm_area_struct *vma,
- unsigned long address, int avoid_reserve,
- long chg)
+ unsigned long address, long chg)
{
struct folio *folio = NULL;
struct mempolicy *mpol;
@@ -1411,10 +1410,6 @@ static struct folio *dequeue_hugetlb_fol
if (!vma_has_reserves(vma, chg) && !available_huge_pages(h))
goto err;
- /* If reserves cannot be used, ensure enough pages are in the pool */
- if (avoid_reserve && !available_huge_pages(h))
- goto err;
-
gfp_mask = htlb_alloc_mask(h);
nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
@@ -1430,7 +1425,7 @@ static struct folio *dequeue_hugetlb_fol
folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
nid, nodemask);
- if (folio && !avoid_reserve && vma_has_reserves(vma, chg)) {
+ if (folio && vma_has_reserves(vma, chg)) {
folio_set_hugetlb_restore_reserve(folio);
h->resv_huge_pages--;
}
@@ -3047,17 +3042,6 @@ struct folio *alloc_hugetlb_folio(struct
gbl_chg = hugepage_subpool_get_pages(spool, 1);
if (gbl_chg < 0)
goto out_end_reservation;
-
- /*
- * Even though there was no reservation in the region/reserve
- * map, there could be reservations associated with the
- * subpool that can be used. This would be indicated if the
- * return value of hugepage_subpool_get_pages() is zero.
- * However, if avoid_reserve is specified we still avoid even
- * the subpool reservations.
- */
- if (avoid_reserve)
- gbl_chg = 1;
}
/* If this allocation is not consuming a reservation, charge it now.
@@ -3080,7 +3064,7 @@ struct folio *alloc_hugetlb_folio(struct
* from the global free pool (global change). gbl_chg == 0 indicates
* a reservation exists for the allocation.
*/
- folio = dequeue_hugetlb_folio_vma(h, vma, addr, avoid_reserve, gbl_chg);
+ folio = dequeue_hugetlb_folio_vma(h, vma, addr, gbl_chg);
if (!folio) {
spin_unlock_irq(&hugetlb_lock);
folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-hugetlb-fix-avoid_reserve-to-allow-taking-folio-from-subpool.patch
mm-hugetlb-stop-using-avoid_reserve-flag-in-fork.patch
mm-hugetlb-rename-avoid_reserve-to-cow_from_owner.patch
mm-hugetlb-clean-up-map-global-resv-accounting-when-allocate.patch
mm-hugetlb-simplify-vma_has_reserves.patch
mm-hugetlb-drop-vma_has_reserves.patch
mm-hugetlb-unify-restore-reserve-accounting-for-new-allocations.patch
When device_register(&child->dev) failed, calling put_device() to
explicitly release child->dev. Otherwise, it could cause double free
problem.
device_register() includes device_add(). As comment of device_add()
says, 'if device_add() succeeds, you should call device_del() when you
want to get rid of it. If device_add() has not succeeded, use only
put_device() to drop the reference count'.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 4f535093cf8f ("PCI: Put pci_dev in device tree as early as possible")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v2:
- added the bug description about the comment of device_add();
- fixed the patch as suggestions;
- added Cc and Fixes table.
---
drivers/pci/probe.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 2e81ab0f5a25..51b78fcda4eb 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1174,7 +1174,10 @@ static struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent,
add_dev:
pci_set_bus_msi_domain(child);
ret = device_register(&child->dev);
- WARN_ON(ret < 0);
+ if (WARN_ON(ret < 0)) {
+ put_device(&child->dev);
+ return NULL;
+ }
pcibios_add_bus(child);
--
2.25.1
This is the start of the stable review cycle for the 5.4.286 release.
There are 66 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 17 Nov 2024 06:37:07 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.286-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.286-rc1
Linus Torvalds <torvalds(a)linux-foundation.org>
9p: fix slab cache name creation for real
Christoph Hellwig <hch(a)lst.de>
mm: add remap_pfn_range_notrack
Alex Zhang <zhangalex(a)google.com>
mm/memory.c: make remap_pfn_range() reject unaligned addr
chenqiwu <chenqiwu(a)xiaomi.com>
mm: fix ambiguous comments for better code readability
WANG Wenhu <wenhu.wang(a)vivo.com>
mm: clarify a confusing comment for remap_pfn_range()
Li Nan <linan122(a)huawei.com>
md/raid10: improve code of mrdev in raid10_sync_request
Reinhard Speyerer <rspmn(a)arcor.de>
net: usb: qmi_wwan: add Fibocom FG132 0x0112 composition
Alessandro Zanni <alessandro.zanni87(a)gmail.com>
fs: Fix uninitialized value issue in from_kuid and from_kgid
Michael Ellerman <mpe(a)ellerman.id.au>
powerpc/powernv: Free name on error in opal_event_init()
Julian Vetter <jvetter(a)kalrayinc.com>
sound: Make CONFIG_SND depend on INDIRECT_IOMEM instead of UML
Rik van Riel <riel(a)surriel.com>
bpf: use kvzmalloc to allocate BPF verifier environment
WangYuli <wangyuli(a)uniontech.com>
HID: multitouch: Add quirk for HONOR MagicBook Art 14 touchpad
Pedro Falcato <pedro.falcato(a)gmail.com>
9p: Avoid creating multiple slab caches with the same name
Jan Schär <jan(a)jschaer.ch>
ALSA: usb-audio: Add endianness annotations
Hyunwoo Kim <v4bel(a)theori.io>
vsock/virtio: Initialization of the dangling pointer occurring in vsk->trans
Hyunwoo Kim <v4bel(a)theori.io>
hv_sock: Initializing vsk->trans to NULL to prevent a dangling pointer
Zheng Yejian <zhengyejian1(a)huawei.com>
ftrace: Fix possible use-after-free issue in ftrace_location()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix NFSv4's PUTPUBFH operation
Jan Schär <jan(a)jschaer.ch>
ALSA: usb-audio: Add quirks for Dell WD19 dock
Jan Schär <jan(a)jschaer.ch>
ALSA: usb-audio: Support jack detection on Dell dock
Andrew Kanner <andrew.kanner(a)gmail.com>
ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove()
Marc Zyngier <maz(a)kernel.org>
irqchip/gic-v3: Force propagation of the active state with a read-back
Benoît Monin <benoit.monin(a)gmx.fr>
USB: serial: option: add Quectel RG650V
Reinhard Speyerer <rspmn(a)arcor.de>
USB: serial: option: add Fibocom FG132 0x0112 composition
Jack Wu <wojackbb(a)gmail.com>
USB: serial: qcserial: add support for Sierra Wireless EM86xx
Dan Carpenter <dan.carpenter(a)linaro.org>
USB: serial: io_edgeport: fix use after free in debug printk
Zijun Hu <quic_zijuhu(a)quicinc.com>
usb: musb: sunxi: Fix accessing an released usb phy
Qi Xi <xiqi2(a)huawei.com>
fs/proc: fix compile warning about variable 'vmcore_mmap_ops'
Benoit Sevens <bsevens(a)google.com>
media: uvcvideo: Skip parsing frames of type UVC_VS_UNDEFINED in uvc_parse_format
Nikolay Aleksandrov <razor(a)blackwall.org>
net: bridge: xmit: make sure we have at least eth header len bytes
Michael Walle <michael(a)walle.cc>
spi: fix use-after-free of the add_lock mutex
Mark Brown <broonie(a)kernel.org>
spi: Fix deadlock when adding SPI controllers on SPI buses
Sean Nyekjaer <sean(a)geanix.com>
mtd: rawnand: protect access to rawnand devices while in suspend
Filipe Manana <fdmanana(a)suse.com>
btrfs: reinitialize delayed ref list after deleting it from the list
Roberto Sassu <roberto.sassu(a)huawei.com>
nfs: Fix KMSAN warning in decode_getfattr_attrs()
Zichen Xie <zichenxie0106(a)gmail.com>
dm-unstriped: cast an operand to sector_t to prevent potential uint32_t overflow
Ming-Hung Tsai <mtsai(a)redhat.com>
dm cache: fix potential out-of-bounds access on the first resume
Ming-Hung Tsai <mtsai(a)redhat.com>
dm cache: optimize dirty bit checking with find_next_bit when resizing
Ming-Hung Tsai <mtsai(a)redhat.com>
dm cache: fix out-of-bounds access to the dirty bitset when resizing
Ming-Hung Tsai <mtsai(a)redhat.com>
dm cache: correct the number of origin blocks to match the target length
Antonio Quartulli <antonio(a)mandelbit.com>
drm/amdgpu: prevent NULL pointer dereference if ATIF is not supported
Alex Deucher <alexander.deucher(a)amd.com>
drm/amdgpu: add missing size check in amdgpu_debugfs_gprwave_read()
Erik Schumacher <erik.schumacher(a)iris-sensing.com>
pwm: imx-tpm: Use correct MODULO value for EPWM mode
Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
media: v4l2-tpg: prevent the risk of a division by zero
Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
media: cx24116: prevent overflows on SNR calculus
Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
media: s5p-jpeg: prevent buffer overflows
Murad Masimov <m.masimov(a)maxima.ru>
ALSA: firewire-lib: fix return value on fail in amdtp_tscm_init()
Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
media: adv7604: prevent underflow condition when reporting colorspace
Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
media: dvb_frontend: don't play tricks with underflow values
Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
media: dvbdev: prevent the risk of out of memory access
Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
media: stb0899_algo: initialize cfr before using it
Peiyang Wang <wangpeiyang1(a)huawei.com>
net: hns3: fix kernel crash when uninstalling driver
Dario Binacchi <dario.binacchi(a)amarulasolutions.com>
can: c_can: fix {rx,tx}_errors statistics
Xin Long <lucien.xin(a)gmail.com>
sctp: properly validate chunk size in sctp_sf_ootb()
Wei Fang <wei.fang(a)nxp.com>
net: enetc: set MAC address to the VF net_device
Qinglang Miao <miaoqinglang(a)huawei.com>
enetc: simplify the return expression of enetc_vf_set_mac_addr()
Chen Ridong <chenridong(a)huawei.com>
security/keys: fix slab-out-of-bounds in key_task_permission
Jiri Kosina <jkosina(a)suse.com>
HID: core: zero-initialize the report buffer
Heiko Stuebner <heiko(a)sntech.de>
ARM: dts: rockchip: Fix the realtek audio codec on rk3036-kylin
Heiko Stuebner <heiko(a)sntech.de>
ARM: dts: rockchip: Fix the spi controller on rk3036
Heiko Stuebner <heiko(a)sntech.de>
ARM: dts: rockchip: drop grf reference from rk3036 hdmi
Heiko Stuebner <heiko(a)sntech.de>
ARM: dts: rockchip: fix rk3036 acodec node
Heiko Stuebner <heiko(a)sntech.de>
arm64: dts: rockchip: Remove #cooling-cells from fan on Theobroma lion
Heiko Stuebner <heiko(a)sntech.de>
arm64: dts: rockchip: Fix bluetooth properties on Rock960 boards
Diederik de Haas <didi.debian(a)cknow.org>
arm64: dts: rockchip: Remove hdmi's 2nd interrupt on rk3328
Geert Uytterhoeven <geert+renesas(a)glider.be>
arm64: dts: rockchip: Fix rt5651 compatible value on rk3399-sapphire-excavator
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/rk3036-kylin.dts | 4 +-
arch/arm/boot/dts/rk3036.dtsi | 14 +-
arch/arm64/boot/dts/rockchip/rk3328.dtsi | 3 +-
arch/arm64/boot/dts/rockchip/rk3368-lion.dtsi | 1 -
arch/arm64/boot/dts/rockchip/rk3399-rock960.dtsi | 2 +-
.../dts/rockchip/rk3399-sapphire-excavator.dts | 2 +-
arch/powerpc/platforms/powernv/opal-irqchip.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 4 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
drivers/hid/hid-core.c | 2 +-
drivers/hid/hid-multitouch.c | 5 +
drivers/irqchip/irq-gic-v3.c | 7 +
drivers/md/dm-cache-target.c | 35 ++---
drivers/md/dm-unstripe.c | 4 +-
drivers/md/raid10.c | 23 +--
drivers/media/common/v4l2-tpg/v4l2-tpg-core.c | 3 +
drivers/media/dvb-core/dvb_frontend.c | 4 +-
drivers/media/dvb-core/dvbdev.c | 17 ++-
drivers/media/dvb-frontends/cx24116.c | 7 +-
drivers/media/dvb-frontends/stb0899_algo.c | 2 +-
drivers/media/i2c/adv7604.c | 26 ++--
drivers/media/platform/s5p-jpeg/jpeg-core.c | 17 ++-
drivers/media/usb/uvc/uvc_driver.c | 2 +-
drivers/mtd/nand/raw/nand_base.c | 44 +++---
drivers/net/can/c_can/c_can.c | 7 +-
drivers/net/ethernet/freescale/enetc/enetc_vf.c | 2 +
drivers/net/ethernet/hisilicon/hns3/hnae3.c | 5 +-
drivers/net/usb/qmi_wwan.c | 1 +
drivers/pwm/pwm-imx-tpm.c | 4 +-
drivers/spi/spi.c | 27 ++--
drivers/usb/musb/sunxi.c | 2 -
drivers/usb/serial/io_edgeport.c | 8 +-
drivers/usb/serial/option.c | 6 +
drivers/usb/serial/qcserial.c | 2 +
fs/btrfs/delayed-ref.c | 2 +-
fs/nfs/inode.c | 1 +
fs/nfsd/nfs4xdr.c | 10 +-
fs/ocfs2/file.c | 9 +-
fs/ocfs2/xattr.c | 3 +-
fs/proc/vmcore.c | 9 +-
include/linux/mm.h | 2 +
include/linux/mm_types.h | 4 +-
include/linux/mtd/rawnand.h | 2 +
include/linux/spi/spi.h | 3 +
kernel/bpf/verifier.c | 4 +-
kernel/trace/ftrace.c | 30 ++--
mm/memory.c | 56 ++++---
net/9p/client.c | 12 +-
net/bridge/br_device.c | 5 +
net/sctp/sm_statefuns.c | 2 +-
net/vmw_vsock/hyperv_transport.c | 1 +
net/vmw_vsock/virtio_transport_common.c | 1 +
security/keys/keyring.c | 7 +-
sound/Kconfig | 2 +-
sound/firewire/tascam/amdtp-tascam.c | 2 +-
sound/usb/mixer_quirks.c | 170 +++++++++++++++++++++
57 files changed, 453 insertions(+), 183 deletions(-)
Allowing the usb_2 controller GDSC to be turned off during system suspend
renders the controller unable to resume.
So use PWRSTS_RET_ON instead in order to make sure this the GDSC doesn't
go down.
Fixes: 161b7c401f4b ("clk: qcom: Add Global Clock controller (GCC) driver for X1E80100")
Cc: stable(a)vger.kernel.org # 6.8
Signed-off-by: Abel Vesa <abel.vesa(a)linaro.org>
---
drivers/clk/qcom/gcc-x1e80100.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/clk/qcom/gcc-x1e80100.c b/drivers/clk/qcom/gcc-x1e80100.c
index 8ea25aa25dff043ab4a81fee78b6173139f871b6..7288af845434d824eb91489ab97be25d665cad3a 100644
--- a/drivers/clk/qcom/gcc-x1e80100.c
+++ b/drivers/clk/qcom/gcc-x1e80100.c
@@ -6083,7 +6083,7 @@ static struct gdsc gcc_usb20_prim_gdsc = {
.pd = {
.name = "gcc_usb20_prim_gdsc",
},
- .pwrsts = PWRSTS_OFF_ON,
+ .pwrsts = PWRSTS_RET_ON,
.flags = POLL_CFG_GDSCR | RETAIN_FF_ENABLE,
};
---
base-commit: 7b4b9bf203da94fbeac75ed3116c84aa03e74578
change-id: 20250107-x1e80100-clk-gcc-fix-usb2-gdsc-pwrsts-a8eae668c7d2
Best regards,
--
Abel Vesa <abel.vesa(a)linaro.org>
Back when the CRD support was brought up, the usb_2 controller didn't
have anything connected to it in order to test it properly, so it was
never enabled.
On the Lenovo ThinkPad T14s, the usb_2 controller has the fingerprint
controller connected to it. So enabling it, proved that the interrupts
lines were wrong from the start.
Fix both the pwr_event and the DWC ctrl_irq lines, according to
documentation.
Fixes: 4af46b7bd66f ("arm64: dts: qcom: x1e80100: Add USB nodes")
Cc: stable(a)vger.kernel.org # 6.9
Signed-off-by: Abel Vesa <abel.vesa(a)linaro.org>
---
arch/arm64/boot/dts/qcom/x1e80100.dtsi | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100.dtsi b/arch/arm64/boot/dts/qcom/x1e80100.dtsi
index e2f1873296ec7b7ffdb4c57b5c9d5b09368de168..1c3ad5ae0a41ea235cb176095cd49de7fa89ae4a 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100.dtsi
+++ b/arch/arm64/boot/dts/qcom/x1e80100.dtsi
@@ -4631,7 +4631,7 @@ usb_2: usb@a2f8800 {
<&gcc GCC_USB20_MASTER_CLK>;
assigned-clock-rates = <19200000>, <200000000>;
- interrupts-extended = <&intc GIC_SPI 240 IRQ_TYPE_LEVEL_HIGH>,
+ interrupts-extended = <&intc GIC_SPI 245 IRQ_TYPE_LEVEL_HIGH>,
<&pdc 50 IRQ_TYPE_EDGE_BOTH>,
<&pdc 49 IRQ_TYPE_EDGE_BOTH>;
interrupt-names = "pwr_event",
@@ -4657,7 +4657,7 @@ &mc_virt SLAVE_EBI1 QCOM_ICC_TAG_ALWAYS>,
usb_2_dwc3: usb@a200000 {
compatible = "snps,dwc3";
reg = <0 0x0a200000 0 0xcd00>;
- interrupts = <GIC_SPI 241 IRQ_TYPE_LEVEL_HIGH>;
+ interrupts = <GIC_SPI 240 IRQ_TYPE_LEVEL_HIGH>;
iommus = <&apps_smmu 0x14e0 0x0>;
phys = <&usb_2_hsphy>;
phy-names = "usb2-phy";
---
base-commit: 7b4b9bf203da94fbeac75ed3116c84aa03e74578
change-id: 20250107-x1e80100-fix-usb2-controller-irqs-b226a747f73a
Best regards,
--
Abel Vesa <abel.vesa(a)linaro.org>
When the system begins to enter suspend mode, dwc3_suspend() is called
by PM suspend. There is a problem that if someone interrupt the system
suspend process between dwc3_suspend() and pm_suspend() of its parent
device, PM suspend will be canceled and attempt to resume suspended
devices so that dwc3_resume() will be called. However, dwc3 and its
parent device (like the power domain or glue driver) may already be
suspended by runtime PM in fact. If this sutiation happened, the
pm_runtime_set_active() in dwc3_resume() will return an error since
parent device was suspended. This can lead to unexpected behavior if
DWC3 proceeds to execute dwc3_resume_common().
EX.
RPM suspend: ... -> dwc3_runtime_suspend()
-> rpm_suspend() of parent device
...
PM suspend: ... -> dwc3_suspend() -> pm_suspend of parent device
^ interrupt, so resume suspended device
... <- dwc3_resume() <-/
^ pm_runtime_set_active() returns error
To prevent the problem, this commit will skip dwc3_resume_common() and
return the error if pm_runtime_set_active() fails.
Fixes: 68c26fe58182 ("usb: dwc3: set pm runtime active before resume common")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ray Chi <raychi(a)google.com>
---
drivers/usb/dwc3/core.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index c22b8678e02e..7578c5133568 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -2609,12 +2609,15 @@ static int dwc3_resume(struct device *dev)
pinctrl_pm_select_default_state(dev);
pm_runtime_disable(dev);
- pm_runtime_set_active(dev);
+ ret = pm_runtime_set_active(dev);
+ if (ret)
+ goto out;
ret = dwc3_resume_common(dwc, PMSG_RESUME);
if (ret)
pm_runtime_set_suspended(dev);
+out:
pm_runtime_enable(dev);
return ret;
--
2.47.1.613.gc27f4b7a9f-goog
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: a9d9c33132d49329ada647e4514d210d15e31d81
Gitweb: https://git.kernel.org/tip/a9d9c33132d49329ada647e4514d210d15e31d81
Author: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
AuthorDate: Tue, 07 Jan 2025 15:30:56 -08:00
Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Tue, 07 Jan 2025 15:55:51 -08:00
x86/fpu: Ensure shadow stack is active before "getting" registers
The x86 shadow stack support has its own set of registers. Those registers
are XSAVE-managed, but they are "supervisor state components" which means
that userspace can not touch them with XSAVE/XRSTOR. It also means that
they are not accessible from the existing ptrace ABI for XSAVE state.
Thus, there is a new ptrace get/set interface for it.
The regset code that ptrace uses provides an ->active() handler in
addition to the get/set ones. For shadow stack this ->active() handler
verifies that shadow stack is enabled via the ARCH_SHSTK_SHSTK bit in the
thread struct. The ->active() handler is checked from some call sites of
the regset get/set handlers, but not the ptrace ones. This was not
understood when shadow stack support was put in place.
As a result, both the set/get handlers can be called with
XFEATURE_CET_USER in its init state, which would cause get_xsave_addr() to
return NULL and trigger a WARN_ON(). The ssp_set() handler luckily has an
ssp_active() check to avoid surprising the kernel with shadow stack
behavior when the kernel is not ready for it (ARCH_SHSTK_SHSTK==0). That
check just happened to avoid the warning.
But the ->get() side wasn't so lucky. It can be called with shadow stacks
disabled, triggering the warning in practice, as reported by Christina
Schimpe:
WARNING: CPU: 5 PID: 1773 at arch/x86/kernel/fpu/regset.c:198 ssp_get+0x89/0xa0
[...]
Call Trace:
<TASK>
? show_regs+0x6e/0x80
? ssp_get+0x89/0xa0
? __warn+0x91/0x150
? ssp_get+0x89/0xa0
? report_bug+0x19d/0x1b0
? handle_bug+0x46/0x80
? exc_invalid_op+0x1d/0x80
? asm_exc_invalid_op+0x1f/0x30
? __pfx_ssp_get+0x10/0x10
? ssp_get+0x89/0xa0
? ssp_get+0x52/0xa0
__regset_get+0xad/0xf0
copy_regset_to_user+0x52/0xc0
ptrace_regset+0x119/0x140
ptrace_request+0x13c/0x850
? wait_task_inactive+0x142/0x1d0
? do_syscall_64+0x6d/0x90
arch_ptrace+0x102/0x300
[...]
Ensure that shadow stacks are active in a thread before looking them up
in the XSAVE buffer. Since ARCH_SHSTK_SHSTK and user_ssp[SHSTK_EN] are
set at the same time, the active check ensures that there will be
something to find in the XSAVE buffer.
[ dhansen: changelog/subject tweaks ]
Fixes: 2fab02b25ae7 ("x86: Add PTRACE interface for shadow stack")
Reported-by: Christina Schimpe <christina.schimpe(a)intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Tested-by: Christina Schimpe <christina.schimpe(a)intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250107233056.235536-1-rick.p.edgecombe%40inte…
---
arch/x86/kernel/fpu/regset.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 6bc1eb2..887b0b8 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -190,7 +190,8 @@ int ssp_get(struct task_struct *target, const struct user_regset *regset,
struct fpu *fpu = &target->thread.fpu;
struct cet_user_state *cetregs;
- if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK))
+ if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) ||
+ !ssp_active(target, regset))
return -ENODEV;
sync_fpstate(fpu);
The x86 shadow stack support has its own set of registers. Those registers
are XSAVE-managed, but they are "supervisor state components" which means
that userspace can't touch them with XSAVE/XRSTOR. It also means that
they are not accessible from the existing ptrace ABI like the FPU register
or GPRs. Thus, there is a new ptrace get/set interface for it.
The regset code that ptrace uses provides an ->active() handler in
addition to the get/set ones. For shadow stack this ->active() handler
verifies that shadow stack is enabled via the ARCH_SHSTK_SHSTK bit in the
thread struct. The ->active() handler is checked from some callsites of
the regset get/set handlers, but not the ptrace ones. This was not
understood when shadow stack support was put in place.
As a result, both the set/get handlers can be called with
XFEATURE_CET_USER in its init state, which would cause get_xsave_addr() to
return NULL and trigger a WARN_ON(). The ssp_set() handler luckily has an
ssp_active() check to avoid surprising the kernel with shadow stack
behavior when the kernel is not read for it (ARCH_SHSTK_SHSTK==0). That
check just happened to avoid the warning.
But the ->get() side wasn't so lucky. It can be called with shadow stacks
disabled, triggering the warning in practice, as reported by Christina
Schimpe:
WARNING: CPU: 5 PID: 1773 at arch/x86/kernel/fpu/regset.c:198 ssp_get+0x89/0xa0
[...]
Call Trace:
<TASK>
? show_regs+0x6e/0x80
? ssp_get+0x89/0xa0
? __warn+0x91/0x150
? ssp_get+0x89/0xa0
? report_bug+0x19d/0x1b0
? handle_bug+0x46/0x80
? exc_invalid_op+0x1d/0x80
? asm_exc_invalid_op+0x1f/0x30
? __pfx_ssp_get+0x10/0x10
? ssp_get+0x89/0xa0
? ssp_get+0x52/0xa0
__regset_get+0xad/0xf0
copy_regset_to_user+0x52/0xc0
ptrace_regset+0x119/0x140
ptrace_request+0x13c/0x850
? wait_task_inactive+0x142/0x1d0
? do_syscall_64+0x6d/0x90
arch_ptrace+0x102/0x300
[...]
Ensure that shadow stacks are active in a thread before looking them up
in the XSAVE buffer. Since ARCH_SHSTK_SHSTK and user_ssp[SHSTK_EN] are
set at the same time, the active check ensures that there will be
something to find in the XSAVE buffer.
Fixes: 2fab02b25ae7 ("x86: Add PTRACE interface for shadow stack")
Reported-by: Christina Schimpe <christina.schimpe(a)intel.com>
Tested-by: Christina Schimpe <christina.schimpe(a)intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Cc: stable(a)vger.kernel.org
---
v2:
- Incorporate log feedback from Dave here:
https://lore.kernel.org/lkml/81d3af8f-bad8-4559-8a0f-3271dd7f0abc@intel.com/
arch/x86/kernel/fpu/regset.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 6bc1eb2a21bd..887b0b8e21e3 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -190,7 +190,8 @@ int ssp_get(struct task_struct *target, const struct user_regset *regset,
struct fpu *fpu = &target->thread.fpu;
struct cet_user_state *cetregs;
- if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK))
+ if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) ||
+ !ssp_active(target, regset))
return -ENODEV;
sync_fpstate(fpu);
--
2.47.1