Hello,
New build issue found on stable-rc/linux-6.13.y:
---
format specifies type 'unsigned long' but the argument has type
'size_t' (aka 'unsigned int') [-Werror,-Wformat] in
drivers/gpu/drm/xe/xe_guc_ct.o (drivers/gpu/drm/xe/xe_guc_ct.c)
[logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://d.kernelci.org/i/maestro:654a7499ce48b09c6748b92c5ad90055057a67a1
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: c025445840beab62deab3af5afa8429f63e8f186
Log excerpt:
=====================================================
drivers/gpu/drm/xe/xe_guc_ct.c:1703:43: error: format specifies type
'unsigned long' but the argument has type 'size_t' (aka 'unsigned
int') [-Werror,-Wformat]
1703 | drm_printf(p, "[CTB].length: 0x%lx\n",
snapshot->ctb_size);
| ~~~
^~~~~~~~~~~~~~~~~~
| %zx
1 error generated.
=====================================================
# Builds where the incident occurred:
## i386_defconfig+allmodconfig+CONFIG_FRAME_WARN=2048 on (i386):
- compiler: clang-17
- dashboard: https://d.kernelci.org/build/maestro:67d98d1028b1441c081c5d19
#kernelci issue maestro:654a7499ce48b09c6748b92c5ad90055057a67a1
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
Hello,
New build issue found on stable-rc/linux-5.4.y:
---
clang: error: linker command failed with exit code 1 (use -v to see
invocation) in samples/seccomp/bpf-fancy (scripts/Makefile.host:116)
[logspec:kbuild,kbuild.other]
---
- dashboard: https://d.kernelci.org/i/maestro:9b282409ffe9399386349927812ed439dcc91837
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: bbf51067f14852464d75398076a963edbe1e5cc0
Log excerpt:
=====================================================
.o
Generating X.509 key generation config
/usr/bin/ld: cannot find crtbeginS.o: No such file or directory
/usr/bin/ld: cannot find -lgcc: No such file or directory
/usr/bin/ld: cannot find -lgcc_s: No such file or directory
clang: error: linker command failed with exit code 1 (use -v to see invocation)
=====================================================
# Builds where the incident occurred:
## i386_defconfig+allmodconfig+CONFIG_FRAME_WARN=2048 on (i386):
- compiler: clang-17
- dashboard: https://d.kernelci.org/build/maestro:67d9892328b1441c081c5044
#kernelci issue maestro:9b282409ffe9399386349927812ed439dcc91837
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
Hello,
New build issue found on stable-rc/linux-6.13.y:
---
format specifies type 'unsigned long' but the argument has type
'size_t' (aka 'unsigned int') [-Werror,-Wformat] in
drivers/gpu/drm/xe/xe_guc_ct.o (drivers/gpu/drm/xe/xe_guc_ct.c)
[logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://d.kernelci.org/i/maestro:654a7499ce48b09c6748b92c5ad90055057a67a1
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: c025445840beab62deab3af5afa8429f63e8f186
Log excerpt:
=====================================================
drivers/gpu/drm/xe/xe_guc_ct.c:1703:43: error: format specifies type
'unsigned long' but the argument has type 'size_t' (aka 'unsigned
int') [-Werror,-Wformat]
1703 | drm_printf(p, "[CTB].length: 0x%lx\n",
snapshot->ctb_size);
| ~~~
^~~~~~~~~~~~~~~~~~
| %zx
1 error generated.
=====================================================
# Builds where the incident occurred:
## i386_defconfig+allmodconfig+CONFIG_FRAME_WARN=2048 on (i386):
- compiler: clang-17
- dashboard: https://d.kernelci.org/build/maestro:67d98d1028b1441c081c5d19
#kernelci issue maestro:654a7499ce48b09c6748b92c5ad90055057a67a1
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x a18a69bbec083093c3bfebaec13ce0b4c6b2af7e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031846-septic-french-4efb@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a18a69bbec083093c3bfebaec13ce0b4c6b2af7e Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch(a)lst.de>
Date: Fri, 30 Aug 2024 15:37:00 -0700
Subject: [PATCH] xfs: use the recalculated transaction reservation in
xfs_growfs_rt_bmblock
After going great length to calculate the transaction reservation for
the new geometry, we should also use it to allocate the transaction it
was calculated for.
Fixes: 578bd4ce7100 ("xfs: recompute growfsrtfree transaction reservation while growing rt volume")
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Darrick J. Wong <djwong(a)kernel.org>
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index d290749b0304..a9f08d96f1fe 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -730,10 +730,12 @@ xfs_growfs_rt_bmblock(
xfs_rtsummary_blockcount(mp, nmp->m_rsumlevels,
nmp->m_sb.sb_rbmblocks));
- /* recompute growfsrt reservation from new rsumsize */
+ /*
+ * Recompute the growfsrt reservation from the new rsumsize, so that the
+ * transaction below use the new, potentially larger value.
+ * */
xfs_trans_resv_calc(nmp, &nmp->m_resv);
-
- error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growrtfree, 0, 0, 0,
+ error = xfs_trans_alloc(mp, &M_RES(nmp)->tr_growrtfree, 0, 0, 0,
&args.tp);
if (error)
goto out_free;
Hello,
New build issue found on stable-rc/linux-6.1.y:
---
implicit declaration of function ‘vunmap’; did you mean ‘kunmap’?
[-Werror=implicit-function-declaration] in io_uring/io_uring.o
(io_uring/io_uring.c) [logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://d.kernelci.org/i/maestro:cbcc52388974e489070975f640bce79475aeb50a
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: 463226a52e45030747cdf2c689bf719e1ab21055
Log excerpt:
=====================================================
io_uring/io_uring.c:2540:17: error: implicit declaration of function
‘vunmap’; did you mean ‘kunmap’?
[-Werror=implicit-function-declaration]
2540 | vunmap(ptr);
| ^~~~~~
| kunmap
io_uring/io_uring.c: In function ‘io_mem_alloc_single’:
io_uring/io_uring.c:2588:15: error: implicit declaration of function
‘vmap’; did you mean ‘kmap’? [-Werror=implicit-function-declaration]
2588 | ret = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
| ^~~~
| kmap
io_uring/io_uring.c:2588:37: error: ‘VM_MAP’ undeclared (first use in
this function); did you mean ‘VM_MTE’?
2588 | ret = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
| ^~~~~~
| VM_MTE
io_uring/io_uring.c:2588:37: note: each undeclared identifier is
reported only once for each function it appears in
CC fs/read_write.o
CC kernel/bpf/core.o
cc1: some warnings being treated as errors
=====================================================
# Builds where the incident occurred:
## 32r2el_defconfig on (mips):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67d98b2228b1441c081c54ba
#kernelci issue maestro:cbcc52388974e489070975f640bce79475aeb50a
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
Hello,
New build issue found on stable-rc/linux-6.6.y:
---
‘fb_center_logo’ undeclared (first use in this function); did you
mean ‘fb_prepare_logo’? in drivers/video/fbdev/core/fbcon.o
(drivers/video/fbdev/core/fbcon.c)
[logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://d.kernelci.org/i/maestro:e495b4ec34845e228a5f1639d0ec805fdbb6e7c0
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: 52baa369b052eae3278dda3062d63a3058eb9cfe
Log excerpt:
=====================================================
drivers/video/fbdev/core/fbcon.c:478:33: error: ‘fb_center_logo’
undeclared (first use in this function); did you mean
‘fb_prepare_logo’?
478 | fb_center_logo = true;
| ^~~~~~~~~~~~~~
| fb_prepare_logo
drivers/video/fbdev/core/fbcon.c:478:33: note: each undeclared
identifier is reported only once for each function it appears in
CC fs/ubifs/budget.o
drivers/video/fbdev/core/fbcon.c:485:33: error: ‘fb_logo_count’
undeclared (first use in this function); did you mean ‘file_count’?
485 | fb_logo_count =
simple_strtol(options, &options, 0);
| ^~~~~~~~~~~~~
| file_count
=====================================================
# Builds where the incident occurred:
## cros://chromeos-6.6/arm64/chromiumos-mediatek.flavour.config+lab-setup+arm64-chromebook+CONFIG_MODULE_COMPRESS=n+CONFIG_MODULE_COMPRESS_NONE=y
on (arm64):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67d98b4f28b1441c081c5566
## cros://chromeos-6.6/x86_64/chromeos-amd-stoneyridge.flavour.config+lab-setup+x86-board+CONFIG_MODULE_COMPRESS=n+CONFIG_MODULE_COMPRESS_NONE=y
on (x86_64):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67d98b5428b1441c081c558c
## cros://chromeos-6.6/x86_64/chromeos-intel-pineview.flavour.config+lab-setup+x86-board+CONFIG_MODULE_COMPRESS=n+CONFIG_MODULE_COMPRESS_NONE=y
on (x86_64):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67d98b5928b1441c081c558f
## multi_v5_defconfig on (arm):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67d98b8f28b1441c081c5622
## multi_v7_defconfig on (arm):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67d98b8028b1441c081c55fe
## multi_v7_defconfig+kselftest on (arm):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67d98b8a28b1441c081c561f
#kernelci issue maestro:e495b4ec34845e228a5f1639d0ec805fdbb6e7c0
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
Hello,
New build issue found on stable-rc/linux-6.6.y:
---
implicit declaration of function ‘vunmap’; did you mean ‘kunmap’?
[-Werror=implicit-function-declaration] in io_uring/io_uring.o
(io_uring/io_uring.c) [logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://d.kernelci.org/i/maestro:160797c1391e9c7479eace7259b46a47c35c7db7
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: 52baa369b052eae3278dda3062d63a3058eb9cfe
Log excerpt:
=====================================================
io_uring/io_uring.c:2708:17: error: implicit declaration of function
‘vunmap’; did you mean ‘kunmap’?
[-Werror=implicit-function-declaration]
2708 | vunmap(ptr);
| ^~~~~~
| kunmap
io_uring/io_uring.c: In function ‘__io_uaddr_map’:
io_uring/io_uring.c:2784:21: error: implicit declaration of function
‘vmap’; did you mean ‘kmap’? [-Werror=implicit-function-declaration]
2784 | page_addr = vmap(page_array, nr_pages, VM_MAP, PAGE_KERNEL);
| ^~~~
| kmap
io_uring/io_uring.c:2784:48: error: ‘VM_MAP’ undeclared (first use in
this function); did you mean ‘VM_MTE’?
2784 | page_addr = vmap(page_array, nr_pages, VM_MAP, PAGE_KERNEL);
| ^~~~~~
| VM_MTE
io_uring/io_uring.c:2784:48: note: each undeclared identifier is
reported only once for each function it appears in
io_uring/io_uring.c: In function ‘io_mem_alloc_single’:
io_uring/io_uring.c:2863:37: error: ‘VM_MAP’ undeclared (first use in
this function); did you mean ‘VM_MTE’?
2863 | ret = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
| ^~~~~~
| VM_MTE
CC crypto/sha256_generic.o
cc1: some warnings being treated as errors
=====================================================
# Builds where the incident occurred:
## 32r2el_defconfig on (mips):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67d98bca28b1441c081c56f5
#kernelci issue maestro:160797c1391e9c7479eace7259b46a47c35c7db7
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
The max_bad_eraseblocks_per_lun member of nand_device obviously
describes a number of *maximum* number of bad eraseblocks per LUN.
Fix this obvious typo.
Fixes: 377e517b5fa5 ("mtd: nand: Add max_bad_eraseblocks_per_lun info to memorg")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
include/linux/mtd/nand.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
index 0e2f228e8b4a..07486168d104 100644
--- a/include/linux/mtd/nand.h
+++ b/include/linux/mtd/nand.h
@@ -21,7 +21,7 @@ struct nand_device;
* @oobsize: OOB area size
* @pages_per_eraseblock: number of pages per eraseblock
* @eraseblocks_per_lun: number of eraseblocks per LUN (Logical Unit Number)
- * @max_bad_eraseblocks_per_lun: maximum number of eraseblocks per LUN
+ * @max_bad_eraseblocks_per_lun: maximum number of bad eraseblocks per LUN
* @planes_per_lun: number of planes per LUN
* @luns_per_target: number of LUN per target (target is a synonym for die)
* @ntargets: total number of targets exposed by the NAND device
--
2.48.1
Potentially include errata for Landlock ABI v5 (Linux 6.10) and v6
(Linux 6.12). That will be useful for the following signal scoping
erratum.
As explained in errata.h, this commit should be backportable without
conflict down to ABI v5. It must then not include the errata/abi-6.h
file.
Fixes: 54a6e6bbf3be ("landlock: Add signal scoping")
Cc: Günther Noack <gnoack(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Mickaël Salaün <mic(a)digikod.net>
Link: https://lore.kernel.org/r/20250318161443.279194-5-mic@digikod.net
---
Changes since v1:
- New patch.
---
security/landlock/errata.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/security/landlock/errata.h b/security/landlock/errata.h
index f26b28b9873d..8e626accac10 100644
--- a/security/landlock/errata.h
+++ b/security/landlock/errata.h
@@ -63,6 +63,18 @@ static const struct landlock_erratum landlock_errata_init[] __initconst = {
#endif
#undef LANDLOCK_ERRATA_ABI
+#define LANDLOCK_ERRATA_ABI 5
+#if __has_include("errata/abi-5.h")
+#include "errata/abi-5.h"
+#endif
+#undef LANDLOCK_ERRATA_ABI
+
+#define LANDLOCK_ERRATA_ABI 6
+#if __has_include("errata/abi-6.h")
+#include "errata/abi-6.h"
+#endif
+#undef LANDLOCK_ERRATA_ABI
+
/*
* For each new erratum, we need to include all the ABI files up to the impacted
* ABI to make all potential future intermediate errata easy to backport.
--
2.48.1
Hello,
New build issue found on stable-rc/linux-6.6.y:
---
implicit declaration of function ‘vunmap’; did you mean ‘kunmap’?
[-Werror=implicit-function-declaration] in io_uring/io_uring.o
(io_uring/io_uring.c) [logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://d.kernelci.org/i/maestro:160797c1391e9c7479eace7259b46a47c35c7db7
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: 52baa369b052eae3278dda3062d63a3058eb9cfe
Log excerpt:
=====================================================
io_uring/io_uring.c:2708:17: error: implicit declaration of function
‘vunmap’; did you mean ‘kunmap’?
[-Werror=implicit-function-declaration]
2708 | vunmap(ptr);
| ^~~~~~
| kunmap
io_uring/io_uring.c: In function ‘__io_uaddr_map’:
io_uring/io_uring.c:2784:21: error: implicit declaration of function
‘vmap’; did you mean ‘kmap’? [-Werror=implicit-function-declaration]
2784 | page_addr = vmap(page_array, nr_pages, VM_MAP, PAGE_KERNEL);
| ^~~~
| kmap
io_uring/io_uring.c:2784:48: error: ‘VM_MAP’ undeclared (first use in
this function); did you mean ‘VM_MTE’?
2784 | page_addr = vmap(page_array, nr_pages, VM_MAP, PAGE_KERNEL);
| ^~~~~~
| VM_MTE
io_uring/io_uring.c:2784:48: note: each undeclared identifier is
reported only once for each function it appears in
io_uring/io_uring.c: In function ‘io_mem_alloc_single’:
io_uring/io_uring.c:2863:37: error: ‘VM_MAP’ undeclared (first use in
this function); did you mean ‘VM_MTE’?
2863 | ret = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
| ^~~~~~
| VM_MTE
CC crypto/sha256_generic.o
cc1: some warnings being treated as errors
=====================================================
# Builds where the incident occurred:
## 32r2el_defconfig on (mips):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67d98bca28b1441c081c56f5
#kernelci issue maestro:160797c1391e9c7479eace7259b46a47c35c7db7
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
Backport this series to 6.1&6.6 because we get build errors with GCC14
and OpenSSL3 (or later):
certs/extract-cert.c: In function 'main':
certs/extract-cert.c:124:17: error: implicit declaration of function 'ENGINE_load_builtin_engines' [-Wimplicit-function-declaration]
124 | ENGINE_load_builtin_engines();
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
certs/extract-cert.c:126:21: error: implicit declaration of function 'ENGINE_by_id' [-Wimplicit-function-declaration]
126 | e = ENGINE_by_id("pkcs11");
| ^~~~~~~~~~~~
certs/extract-cert.c:126:19: error: assignment to 'ENGINE *' {aka 'struct engine_st *'} from 'int' makes pointer from integer without a cast [-Wint-conversion]
126 | e = ENGINE_by_id("pkcs11");
| ^
certs/extract-cert.c:128:21: error: implicit declaration of function 'ENGINE_init' [-Wimplicit-function-declaration]
128 | if (ENGINE_init(e))
| ^~~~~~~~~~~
certs/extract-cert.c:133:30: error: implicit declaration of function 'ENGINE_ctrl_cmd_string' [-Wimplicit-function-declaration]
133 | ERR(!ENGINE_ctrl_cmd_string(e, "PIN", key_pass, 0), "Set PKCS#11 PIN");
| ^~~~~~~~~~~~~~~~~~~~~~
certs/extract-cert.c:64:32: note: in definition of macro 'ERR'
64 | bool __cond = (cond); \
| ^~~~
certs/extract-cert.c:134:17: error: implicit declaration of function 'ENGINE_ctrl_cmd' [-Wimplicit-function-declaration]
134 | ENGINE_ctrl_cmd(e, "LOAD_CERT_CTRL", 0, &parms, NULL, 1);
| ^~~~~~~~~~~~~~~
In theory 5.4&5.10&5.15 also need this, but they need more efforts because
file paths are different.
The ENGINE interface has its limitations and it has been superseded
by the PROVIDER API, it is deprecated in OpenSSL version 3.0.
Some distros have started removing it from header files.
Update sign-file and extract-cert to use PROVIDER API for OpenSSL Major >= 3.
Tested on F39 with openssl-3.1.1, pkcs11-provider-0.5-2, openssl-pkcs11-0.4.12-4
and softhsm-2.6.1-5 by using same key/cert as PEM and PKCS11 and comparing that
the result is identical.
Jan Stancek (3):
sign-file,extract-cert: move common SSL helper functions to a header
sign-file,extract-cert: avoid using deprecated ERR_get_error_line()
sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3
Signed-off-by: Jan Stancek <jstancek(a)redhat.com>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
MAINTAINERS | 1 +
certs/Makefile | 2 +-
certs/extract-cert.c | 138 +++++++++++++++++++++++--------------------
scripts/sign-file.c | 134 +++++++++++++++++++++--------------------
scripts/ssl-common.h | 32 ++++++++++
5 files changed, 178 insertions(+), 129 deletions(-)
create mode 100644 scripts/ssl-common.h
---
2.27.0
From: Kang Yang <quic_kangyang(a)quicinc.com>
[ Upstream commit 95c38953cb1ecf40399a676a1f85dfe2b5780a9a ]
When running 'rmmod ath10k', ath10k_sdio_remove() will free sdio
workqueue by destroy_workqueue(). But if CONFIG_INIT_ON_FREE_DEFAULT_ON
is set to yes, kernel panic will happen:
Call trace:
destroy_workqueue+0x1c/0x258
ath10k_sdio_remove+0x84/0x94
sdio_bus_remove+0x50/0x16c
device_release_driver_internal+0x188/0x25c
device_driver_detach+0x20/0x2c
This is because during 'rmmod ath10k', ath10k_sdio_remove() will call
ath10k_core_destroy() before destroy_workqueue(). wiphy_dev_release()
will finally be called in ath10k_core_destroy(). This function will free
struct cfg80211_registered_device *rdev and all its members, including
wiphy, dev and the pointer of sdio workqueue. Then the pointer of sdio
workqueue will be set to NULL due to CONFIG_INIT_ON_FREE_DEFAULT_ON.
After device release, destroy_workqueue() will use NULL pointer then the
kernel panic happen.
Call trace:
ath10k_sdio_remove
->ath10k_core_unregister
……
->ath10k_core_stop
->ath10k_hif_stop
->ath10k_sdio_irq_disable
->ath10k_hif_power_down
->del_timer_sync(&ar_sdio->sleep_timer)
->ath10k_core_destroy
->ath10k_mac_destroy
->ieee80211_free_hw
->wiphy_free
……
->wiphy_dev_release
->destroy_workqueue
Need to call destroy_workqueue() before ath10k_core_destroy(), free
the work queue buffer first and then free pointer of work queue by
ath10k_core_destroy(). This order matches the error path order in
ath10k_sdio_probe().
No work will be queued on sdio workqueue between it is destroyed and
ath10k_core_destroy() is called. Based on the call_stack above, the
reason is:
Only ath10k_sdio_sleep_timer_handler(), ath10k_sdio_hif_tx_sg() and
ath10k_sdio_irq_disable() will queue work on sdio workqueue.
Sleep timer will be deleted before ath10k_core_destroy() in
ath10k_hif_power_down().
ath10k_sdio_irq_disable() only be called in ath10k_hif_stop().
ath10k_core_unregister() will call ath10k_hif_power_down() to stop hif
bus, so ath10k_sdio_hif_tx_sg() won't be called anymore.
Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00189
Signed-off-by: Kang Yang <quic_kangyang(a)quicinc.com>
Tested-by: David Ruth <druth(a)chromium.org>
Reviewed-by: David Ruth <druth(a)chromium.org>
Link: https://patch.msgid.link/20241008022246.1010-1-quic_kangyang@quicinc.com
Signed-off-by: Jeff Johnson <quic_jjohnson(a)quicinc.com>
Signed-off-by: Alva Lan <alvalan9(a)foxmail.com>
---
drivers/net/wireless/ath/ath10k/sdio.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/ath/ath10k/sdio.c b/drivers/net/wireless/ath/ath10k/sdio.c
index 63e1c2d783c5..b2e0abb5b2b5 100644
--- a/drivers/net/wireless/ath/ath10k/sdio.c
+++ b/drivers/net/wireless/ath/ath10k/sdio.c
@@ -3,6 +3,7 @@
* Copyright (c) 2004-2011 Atheros Communications Inc.
* Copyright (c) 2011-2012,2017 Qualcomm Atheros, Inc.
* Copyright (c) 2016-2017 Erik Stromdahl <erik.stromdahl(a)gmail.com>
+ * Copyright (c) 2022-2024 Qualcomm Innovation Center, Inc. All rights reserved.
*/
#include <linux/module.h>
@@ -2648,9 +2649,9 @@ static void ath10k_sdio_remove(struct sdio_func *func)
netif_napi_del(&ar->napi);
- ath10k_core_destroy(ar);
-
destroy_workqueue(ar_sdio->workqueue);
+
+ ath10k_core_destroy(ar);
}
static const struct sdio_device_id ath10k_sdio_devices[] = {
--
2.34.1
From: Hillf Danton <hdanton(a)sina.com>
Once a key's reference count has been reduced to 0, the garbage collector
thread may destroy it at any time and so key_put() is not allowed to touch
the key after that point. The most key_put() is normally allowed to do is
to touch key_gc_work as that's a static global variable.
However, in an effort to speed up the reclamation of quota, this is now
done in key_put() once the key's usage is reduced to 0 - but now the code
is looking at the key after the deadline, which is forbidden.
Fix this on an expedited basis[*] by taking a ref on the key->user struct
and caching the key length before dropping the refcount so that we can
reduce the quota afterwards if we dropped the last ref.
[*] This is going to hurt key_put() performance, so a better way is
probably necessary, such as sticking the dead key onto a queue for the
garbage collector to pick up rather than having it scan the serial number
registry.
Fixes: 9578e327b2b4 ("keys: update key quotas in key_put()")
Reported-by: syzbot+6105ffc1ded71d194d6d(a)syzkaller.appspotmail.com
Tested-by: syzbot+6105ffc1ded71d194d6d(a)syzkaller.appspotmail.com
Suggested-by: Hillf Danton <hdanton(a)sina.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
cc: Jarkko Sakkinen <jarkko(a)kernel.org>
cc: Kees Cook <kees(a)kernel.org>
cc: keyrings(a)vger.kernel.org
cc: stable(a)vger.kernel.org # v6.10+
---
security/keys/key.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/security/keys/key.c b/security/keys/key.c
index 3d7d185019d3..1e6028492355 100644
--- a/security/keys/key.c
+++ b/security/keys/key.c
@@ -645,21 +645,30 @@ EXPORT_SYMBOL(key_reject_and_link);
*/
void key_put(struct key *key)
{
+ int quota_flag;
+ unsigned short len;
+ struct key_user *user;
+
if (key) {
key_check(key);
+ quota_flag = test_bit(KEY_FLAG_IN_QUOTA, &key->flags);
+ len = key->quotalen;
+ user = key->user;
+ refcount_inc(&user->usage);
if (refcount_dec_and_test(&key->usage)) {
unsigned long flags;
/* deal with the user's key tracking and quota */
- if (test_bit(KEY_FLAG_IN_QUOTA, &key->flags)) {
- spin_lock_irqsave(&key->user->lock, flags);
- key->user->qnkeys--;
- key->user->qnbytes -= key->quotalen;
- spin_unlock_irqrestore(&key->user->lock, flags);
+ if (quota_flag) {
+ spin_lock_irqsave(&user->lock, flags);
+ user->qnkeys--;
+ user->qnbytes -= len;
+ spin_unlock_irqrestore(&user->lock, flags);
}
schedule_work(&key_gc_work);
}
+ key_user_put(user);
}
}
EXPORT_SYMBOL(key_put);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x df27cef153603b18a7d094b53cc3d5264ff32797
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031635-resent-sniff-676f@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From df27cef153603b18a7d094b53cc3d5264ff32797 Mon Sep 17 00:00:00 2001
From: Benno Lossin <benno.lossin(a)proton.me>
Date: Wed, 5 Mar 2025 13:29:01 +0000
Subject: [PATCH] rust: init: fix `Zeroable` implementation for
`Option<NonNull<T>>` and `Option<KBox<T>>`
According to [1], `NonNull<T>` and `#[repr(transparent)]` wrapper types
such as our custom `KBox<T>` have the null pointer optimization only if
`T: Sized`. Thus remove the `Zeroable` implementation for the unsized
case.
Link: https://doc.rust-lang.org/stable/std/option/index.html#representation [1]
Reported-by: Alice Ryhl <aliceryhl(a)google.com>
Closes: https://lore.kernel.org/rust-for-linux/CAH5fLghL+qzrD8KiCF1V3vf2YcC6aWySzkm…
Cc: stable(a)vger.kernel.org # v6.12+ (a custom patch will be needed for 6.6.y)
Fixes: 38cde0bd7b67 ("rust: init: add `Zeroable` trait and `init::zeroed` function")
Signed-off-by: Benno Lossin <benno.lossin(a)proton.me>
Reviewed-by: Alice Ryhl <aliceryhl(a)google.com>
Reviewed-by: Andreas Hindborg <a.hindborg(a)kernel.org>
Link: https://lore.kernel.org/r/20250305132836.2145476-1-benno.lossin@proton.me
[ Added Closes tag and moved up the Reported-by one. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
diff --git a/rust/kernel/init.rs b/rust/kernel/init.rs
index 7fd1ea8265a5..8bbd5e3398fc 100644
--- a/rust/kernel/init.rs
+++ b/rust/kernel/init.rs
@@ -1418,17 +1418,14 @@ macro_rules! impl_zeroable {
// SAFETY: `T: Zeroable` and `UnsafeCell` is `repr(transparent)`.
{<T: ?Sized + Zeroable>} UnsafeCell<T>,
- // SAFETY: All zeros is equivalent to `None` (option layout optimization guarantee).
+ // SAFETY: All zeros is equivalent to `None` (option layout optimization guarantee:
+ // https://doc.rust-lang.org/stable/std/option/index.html#representation).
Option<NonZeroU8>, Option<NonZeroU16>, Option<NonZeroU32>, Option<NonZeroU64>,
Option<NonZeroU128>, Option<NonZeroUsize>,
Option<NonZeroI8>, Option<NonZeroI16>, Option<NonZeroI32>, Option<NonZeroI64>,
Option<NonZeroI128>, Option<NonZeroIsize>,
-
- // SAFETY: All zeros is equivalent to `None` (option layout optimization guarantee).
- //
- // In this case we are allowed to use `T: ?Sized`, since all zeros is the `None` variant.
- {<T: ?Sized>} Option<NonNull<T>>,
- {<T: ?Sized>} Option<KBox<T>>,
+ {<T>} Option<NonNull<T>>,
+ {<T>} Option<KBox<T>>,
// SAFETY: `null` pointer is valid.
//
Once device_register() failed, we should call put_device() to
decrement reference count for cleanup. Or it could cause memory leak.
And move callback function before put_device().
As comment of device_register() says, 'NOTE: _Never_ directly free
@dev after calling this function, even if it returned an error! Always
use put_device() to give up the reference initialized in this function
instead.'
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: baa057e29b58 ("media: v4l2-dev: use pr_foo() for printing messages")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v2:
- modified the patch as no callback function before put_device().
---
drivers/media/v4l2-core/v4l2-dev.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/media/v4l2-core/v4l2-dev.c b/drivers/media/v4l2-core/v4l2-dev.c
index 5bcaeeba4d09..4a8fdf8115c0 100644
--- a/drivers/media/v4l2-core/v4l2-dev.c
+++ b/drivers/media/v4l2-core/v4l2-dev.c
@@ -1054,17 +1054,16 @@ int __video_register_device(struct video_device *vdev,
vdev->dev.class = &video_class;
vdev->dev.devt = MKDEV(VIDEO_MAJOR, vdev->minor);
vdev->dev.parent = vdev->dev_parent;
+ vdev->dev.release = v4l2_device_release;
dev_set_name(&vdev->dev, "%s%d", name_base, vdev->num);
mutex_lock(&videodev_lock);
ret = device_register(&vdev->dev);
if (ret < 0) {
mutex_unlock(&videodev_lock);
pr_err("%s: device_register failed\n", __func__);
- goto cleanup;
+ put_device(&vdev->dev);
+ return ret;
}
- /* Register the release callback that will be called when the last
- reference to the device goes away. */
- vdev->dev.release = v4l2_device_release;
if (nr != -1 && nr != vdev->num && warn_if_nr_in_use)
pr_warn("%s: requested %s%d, got %s\n", __func__,
--
2.25.1
From: Vincent Mailhol <mailhol.vincent(a)wanadoo.fr>
Commit 7fdaf8966aae ("can: ucan: use strscpy() to instead of strncpy()")
unintentionally introduced a one byte out of bound read on strscpy()'s
source argument (which is kind of ironic knowing that strscpy() is meant
to be a more secure alternative :)).
Let's consider below buffers:
dest[len + 1]; /* will be NUL terminated */
src[len]; /* may not be NUL terminated */
When doing:
strncpy(dest, src, len);
dest[len] = '\0';
strncpy() will read up to len bytes from src.
On the other hand:
strscpy(dest, src, len + 1);
will read up to len + 1 bytes from src, that is to say, an out of bound
read of one byte will occur on src if it is not NUL terminated. Note
that the src[len] byte is never copied, but strscpy() still needs to
read it to check whether a truncation occurred or not.
This exact pattern happened in ucan.
The root cause is that the source is not NUL terminated. Instead of
doing a copy in a local buffer, directly NUL terminate it as soon as
usb_control_msg() returns. With this, the local firmware_str[] variable
can be removed.
On top of this do a couple refactors:
- ucan_ctl_payload->raw is only used for the firmware string, so
rename it to ucan_ctl_payload->fw_str and change its type from u8 to
char.
- ucan_device_request_in() is only used to retrieve the firmware
string, so rename it to ucan_get_fw_str() and refactor it to make it
directly handle all the string termination logic.
Reported-by: syzbot+d7d8c418e8317899e88c(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-can/67b323a4.050a0220.173698.002b.GAE@google.…
Fixes: 7fdaf8966aae ("can: ucan: use strscpy() to instead of strncpy()")
Signed-off-by: Vincent Mailhol <mailhol.vincent(a)wanadoo.fr>
Link: https://patch.msgid.link/20250218143515.627682-2-mailhol.vincent@wanadoo.fr
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/usb/ucan.c | 43 ++++++++++++++++----------------------
1 file changed, 18 insertions(+), 25 deletions(-)
diff --git a/drivers/net/can/usb/ucan.c b/drivers/net/can/usb/ucan.c
index 39a63b7313a4..07406daf7c88 100644
--- a/drivers/net/can/usb/ucan.c
+++ b/drivers/net/can/usb/ucan.c
@@ -186,7 +186,7 @@ union ucan_ctl_payload {
*/
struct ucan_ctl_cmd_get_protocol_version cmd_get_protocol_version;
- u8 raw[128];
+ u8 fw_str[128];
} __packed;
enum {
@@ -424,18 +424,20 @@ static int ucan_ctrl_command_out(struct ucan_priv *up,
UCAN_USB_CTL_PIPE_TIMEOUT);
}
-static int ucan_device_request_in(struct ucan_priv *up,
- u8 cmd, u16 subcmd, u16 datalen)
+static void ucan_get_fw_str(struct ucan_priv *up, char *fw_str, size_t size)
{
- return usb_control_msg(up->udev,
- usb_rcvctrlpipe(up->udev, 0),
- cmd,
- USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_DEVICE,
- subcmd,
- 0,
- up->ctl_msg_buffer,
- datalen,
- UCAN_USB_CTL_PIPE_TIMEOUT);
+ int ret;
+
+ ret = usb_control_msg(up->udev, usb_rcvctrlpipe(up->udev, 0),
+ UCAN_DEVICE_GET_FW_STRING,
+ USB_DIR_IN | USB_TYPE_VENDOR |
+ USB_RECIP_DEVICE,
+ 0, 0, fw_str, size - 1,
+ UCAN_USB_CTL_PIPE_TIMEOUT);
+ if (ret > 0)
+ fw_str[ret] = '\0';
+ else
+ strscpy(fw_str, "unknown", size);
}
/* Parse the device information structure reported by the device and
@@ -1314,7 +1316,6 @@ static int ucan_probe(struct usb_interface *intf,
u8 in_ep_addr;
u8 out_ep_addr;
union ucan_ctl_payload *ctl_msg_buffer;
- char firmware_str[sizeof(union ucan_ctl_payload) + 1];
udev = interface_to_usbdev(intf);
@@ -1527,17 +1528,6 @@ static int ucan_probe(struct usb_interface *intf,
*/
ucan_parse_device_info(up, &ctl_msg_buffer->cmd_get_device_info);
- /* just print some device information - if available */
- ret = ucan_device_request_in(up, UCAN_DEVICE_GET_FW_STRING, 0,
- sizeof(union ucan_ctl_payload));
- if (ret > 0) {
- /* copy string while ensuring zero termination */
- strscpy(firmware_str, up->ctl_msg_buffer->raw,
- sizeof(union ucan_ctl_payload) + 1);
- } else {
- strcpy(firmware_str, "unknown");
- }
-
/* device is compatible, reset it */
ret = ucan_ctrl_command_out(up, UCAN_COMMAND_RESET, 0, 0);
if (ret < 0)
@@ -1555,7 +1545,10 @@ static int ucan_probe(struct usb_interface *intf,
/* initialisation complete, log device info */
netdev_info(up->netdev, "registered device\n");
- netdev_info(up->netdev, "firmware string: %s\n", firmware_str);
+ ucan_get_fw_str(up, up->ctl_msg_buffer->fw_str,
+ sizeof(up->ctl_msg_buffer->fw_str));
+ netdev_info(up->netdev, "firmware string: %s\n",
+ up->ctl_msg_buffer->fw_str);
/* success */
return 0;
base-commit: 4003c9e78778e93188a09d6043a74f7154449d43
--
2.47.2
The second parameter of memblock_set_node() is size instead of end.
Since it iterates from lower address to higher address, finally the node
id is correct. But during the process, some of them are wrong.
Pass size instead of end.
Fixes: 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()")
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
CC: Mike Rapoport <rppt(a)kernel.org>
CC: Yajun Deng <yajun.deng(a)linux.dev>
CC: <stable(a)vger.kernel.org>
---
mm/memblock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/memblock.c b/mm/memblock.c
index 64ae678cd1d1..85442f1b7f14 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2192,7 +2192,7 @@ static void __init memmap_init_reserved_pages(void)
if (memblock_is_nomap(region))
reserve_bootmem_region(start, end, nid);
- memblock_set_node(start, end, &memblock.reserved, nid);
+ memblock_set_node(start, region->size, &memblock.reserved, nid);
}
/*
--
2.34.1
Add device awake calls in case of rproc boot and rproc shutdown path.
Currently, device awake call is only present in the recovery path
of remoteproc. If a user stops and starts rproc by using the sysfs
interface, then on pm suspension the firmware loading fails. Keep the
device awake in such a case just like it is done for the recovery path.
Fixes: a781e5aa59110 ("remoteproc: core: Prevent system suspend during remoteproc recovery")
Signed-off-by: Souradeep Chowdhury <quic_schowdhu(a)quicinc.com>
Cc: stable(a)vger.kernel.org
---
drivers/remoteproc/remoteproc_core.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index c2cf0d277729..5d6c4e694b4c 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1917,6 +1917,7 @@ int rproc_boot(struct rproc *rproc)
return -EINVAL;
}
+ pm_stay_awake(rproc->dev.parent);
dev = &rproc->dev;
ret = mutex_lock_interruptible(&rproc->lock);
@@ -1961,6 +1962,7 @@ int rproc_boot(struct rproc *rproc)
atomic_dec(&rproc->power);
unlock_mutex:
mutex_unlock(&rproc->lock);
+ pm_relax(rproc->dev.parent);
return ret;
}
EXPORT_SYMBOL(rproc_boot);
@@ -1991,6 +1993,7 @@ int rproc_shutdown(struct rproc *rproc)
struct device *dev = &rproc->dev;
int ret = 0;
+ pm_stay_awake(rproc->dev.parent);
ret = mutex_lock_interruptible(&rproc->lock);
if (ret) {
dev_err(dev, "can't lock rproc %s: %d\n", rproc->name, ret);
@@ -2027,6 +2030,7 @@ int rproc_shutdown(struct rproc *rproc)
rproc->table_ptr = NULL;
out:
mutex_unlock(&rproc->lock);
+ pm_relax(rproc->dev.parent);
return ret;
}
EXPORT_SYMBOL(rproc_shutdown);
--
2.34.1
Once cdev_device_add() failed, we should use put_device() to decrement
reference count for cleanup. Or it could cause memory leak. Although
operations in err_free_ida are similar to the operations in callback
function fsi_slave_release(), put_device() is a correct handling
operation as comments require when cdev_device_add() fails.
As comment of device_add() says, 'if device_add() succeeds, you should
call device_del() when you want to get rid of it. If device_add() has
not succeeded, use only put_device() to drop the reference count'.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 371975b0b075 ("fsi/core: Fix error paths on CFAM init")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/fsi/fsi-core.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/fsi/fsi-core.c b/drivers/fsi/fsi-core.c
index e2e1e9df6115..1373e05e3659 100644
--- a/drivers/fsi/fsi-core.c
+++ b/drivers/fsi/fsi-core.c
@@ -1084,7 +1084,8 @@ static int fsi_slave_init(struct fsi_master *master, int link, uint8_t id)
rc = cdev_device_add(&slave->cdev, &slave->dev);
if (rc) {
dev_err(&slave->dev, "Error %d creating slave device\n", rc);
- goto err_free_ida;
+ put_device(&slave->dev);
+ return rc;
}
/* Now that we have the cdev registered with the core, any fatal
@@ -1110,8 +1111,6 @@ static int fsi_slave_init(struct fsi_master *master, int link, uint8_t id)
return 0;
-err_free_ida:
- fsi_free_minor(slave->dev.devt);
err_free:
of_node_put(slave->dev.of_node);
kfree(slave);
--
2.25.1
If kvm_arch_vcpu_create() fails to share the vCPU page with the
hypervisor, we propagate the error back to the ioctl but leave the
vGIC vCPU data initialised. Note only does this leak the corresponding
memory when the vCPU is destroyed but it can also lead to use-after-free
if the redistributor device handling tries to walk into the vCPU.
Add the missing cleanup to kvm_arch_vcpu_create(), ensuring that the
vGIC vCPU structures are destroyed on error.
Cc: <stable(a)vger.kernel.org>
Cc: Marc Zyngier <maz(a)kernel.org>
Cc: Oliver Upton <oliver.upton(a)linux.dev>
Cc: Quentin Perret <qperret(a)google.com>
Signed-off-by: Will Deacon <will(a)kernel.org>
---
It's hard to come up with a "Fixes:" tag for this. Prior to 3f868e142c0b
("KVM: arm64: Introduce kvm_share_hyp()"), create_hyp_mappings() could
still have failed, although if you go back before 66c57edd3bc7 ("KVM:
arm64: Restrict EL2 stage-1 changes in protected mode") then it's
vanishingly unlikely.
arch/arm64/kvm/arm.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index b8e55a441282..fa71cee02faa 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -466,7 +466,11 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
if (err)
return err;
- return kvm_share_hyp(vcpu, vcpu + 1);
+ err = kvm_share_hyp(vcpu, vcpu + 1);
+ if (err)
+ kvm_vgic_vcpu_destroy(vcpu);
+
+ return err;
}
void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
--
2.49.0.rc1.451.g8f38331e32-goog
On our Marvell OCTEON CN96XX board, we observed the following panic on
the latest kernel:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000080
CPU: 22 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc6 #20
Hardware name: Marvell OcteonTX CN96XX board (DT)
pc : of_pci_add_properties+0x278/0x4c8
Call trace:
of_pci_add_properties+0x278/0x4c8 (P)
of_pci_make_dev_node+0xe0/0x158
pci_bus_add_device+0x158/0x228
pci_bus_add_devices+0x40/0x98
pci_host_probe+0x94/0x118
pci_host_common_probe+0x130/0x1b0
platform_probe+0x70/0xf0
The dmesg logs indicated that the PCI bridge was scanning with an invalid bus range:
pci-host-generic 878020000000.pci: PCI host bridge to bus 0002:00
pci_bus 0002:00: root bus resource [bus 00-ff]
pci 0002:00:00.0: scanning [bus f9-f9] behind bridge, pass 0
pci 0002:00:01.0: scanning [bus fa-fa] behind bridge, pass 0
pci 0002:00:02.0: scanning [bus fb-fb] behind bridge, pass 0
pci 0002:00:03.0: scanning [bus fc-fc] behind bridge, pass 0
pci 0002:00:04.0: scanning [bus fd-fd] behind bridge, pass 0
pci 0002:00:05.0: scanning [bus fe-fe] behind bridge, pass 0
pci 0002:00:06.0: scanning [bus ff-ff] behind bridge, pass 0
pci 0002:00:07.0: scanning [bus 00-00] behind bridge, pass 0
pci 0002:00:07.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0002:00:08.0: scanning [bus 01-01] behind bridge, pass 0
pci 0002:00:09.0: scanning [bus 02-02] behind bridge, pass 0
pci 0002:00:0a.0: scanning [bus 03-03] behind bridge, pass 0
pci 0002:00:0b.0: scanning [bus 04-04] behind bridge, pass 0
pci 0002:00:0c.0: scanning [bus 05-05] behind bridge, pass 0
pci 0002:00:0d.0: scanning [bus 06-06] behind bridge, pass 0
pci 0002:00:0e.0: scanning [bus 07-07] behind bridge, pass 0
pci 0002:00:0f.0: scanning [bus 08-08] behind bridge, pass 0
This regression was introduced by commit 7246a4520b4b ("PCI: Use
preserve_config in place of pci_flags"). On our board, the 0002:00:07.0
bridge is misconfigured by the bootloader. Both its secondary and
subordinate bus numbers are initialized to 0, while its fixed secondary
bus number is set to 8. However, bus number 8 is also assigned to another
bridge (0002:00:0f.0). Although this is a bootloader issue, before the
change in commit 7246a4520b4b, the PCI_REASSIGN_ALL_BUS flag was set
by default when PCI_PROBE_ONLY was not enabled, ensuing that all the
bus number for these bridges were reassigned, avoiding any conflicts.
After the change introduced in commit 7246a4520b4b, the bus numbers
assigned by the bootloader are reused by all other bridges, except
the misconfigured 0002:00:07.0 bridge. The kernel attempt to reconfigure
0002:00:07.0 by reusing the fixed secondary bus number 8 assigned by
bootloader. However, since a pci_bus has already been allocated for
bus 8 due to the probe of 0002:00:0f.0, no new pci_bus allocated for
0002:00:07.0. This results in a pci bridge device without a pci_bus
attached (pdev->subordinate == NULL). Consequently, accessing
pdev->subordinate in of_pci_prop_bus_range() leads to a NULL pointer
dereference.
To summarize, we need to set the PCI_REASSIGN_ALL_BUS flag when
PCI_PROBE_ONLY is not enabled in order to work around issue like the
one described above.
Cc: stable(a)vger.kernel.org
Fixes: 7246a4520b4b ("PCI: Use preserve_config in place of pci_flags")
Signed-off-by: Bo Sun <Bo.Sun.CN(a)windriver.com>
---
Changes in v2:
- Added explicit comment about the quirk, as requested by Mani.
- Made commit message more clear, as requested by Bjorn.
drivers/pci/quirks.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 82b21e34c545..cec58c7479e1 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -6181,6 +6181,23 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1536, rom_bar_overlap_defect);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1537, rom_bar_overlap_defect);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1538, rom_bar_overlap_defect);
+/*
+ * Quirk for Marvell CN96XX/CN10XXX boards:
+ *
+ * Adds PCI_REASSIGN_ALL_BUS unless PCI_PROBE_ONLY is set, forcing bus number
+ * reassignment to avoid conflicts caused by bootloader misconfigured PCI bridges.
+ *
+ * This resolves a regression introduced by commit 7246a4520b4b ("PCI: Use
+ * preserve_config in place of pci_flags"), which removed this behavior.
+ */
+static void quirk_marvell_cn96xx_cn10xxx_reassign_all_busnr(struct pci_dev *dev)
+{
+ if (!pci_has_flag(PCI_PROBE_ONLY))
+ pci_add_flags(PCI_REASSIGN_ALL_BUS);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_CAVIUM, 0xa002,
+ quirk_marvell_cn96xx_cn10xxx_reassign_all_busnr);
+
#ifdef CONFIG_PCIEASPM
/*
* Several Intel DG2 graphics devices advertise that they can only tolerate
--
2.48.1
The quilt patch titled
Subject: mm/hwpoison: do not send SIGBUS to processes with recovered clean pages
has been removed from the -mm tree. Its filename was
mm-hwpoison-do-not-send-sigbus-to-processes-with-recovered-clean-pages.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Shuai Xue <xueshuai(a)linux.alibaba.com>
Subject: mm/hwpoison: do not send SIGBUS to processes with recovered clean pages
Date: Wed, 12 Mar 2025 19:28:51 +0800
When an uncorrected memory error is consumed there is a race between the
CMCI from the memory controller reporting an uncorrected error with a UCNA
signature, and the core reporting and SRAR signature machine check when
the data is about to be consumed.
- Background: why *UN*corrected errors tied to *C*MCI in Intel platform [1]
Prior to Icelake memory controllers reported patrol scrub events that
detected a previously unseen uncorrected error in memory by signaling a
broadcast machine check with an SRAO (Software Recoverable Action
Optional) signature in the machine check bank. This was overkill because
it's not an urgent problem that no core is on the verge of consuming that
bad data. It's also found that multi SRAO UCE may cause nested MCE
interrupts and finally become an IERR.
Hence, Intel downgrades the machine check bank signature of patrol scrub
from SRAO to UCNA (Uncorrected, No Action required), and signal changed to
#CMCI. Just to add to the confusion, Linux does take an action (in
uc_decode_notifier()) to try to offline the page despite the UC*NA*
signature name.
- Background: why #CMCI and #MCE race when poison is consuming in Intel platform [1]
Having decided that CMCI/UCNA is the best action for patrol scrub errors,
the memory controller uses it for reads too. But the memory controller is
executing asynchronously from the core, and can't tell the difference
between a "real" read and a speculative read. So it will do CMCI/UCNA if
an error is found in any read.
Thus:
1) Core is clever and thinks address A is needed soon, issues a speculative read.
2) Core finds it is going to use address A soon after sending the read request
3) The CMCI from the memory controller is in a race with MCE from the core
that will soon try to retire the load from address A.
Quite often (because speculation has got better) the CMCI from the memory
controller is delivered before the core is committed to the instruction
reading address A, so the interrupt is taken, and Linux offlines the page
(marking it as poison).
- Why user process is killed for instr case
Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported
"not recovered"") tries to fix noise message "Memory error not recovered"
and skips duplicate SIGBUSs due to the race. But it also introduced a bug
that kill_accessing_process() return -EHWPOISON for instr case, as result,
kill_me_maybe() send a SIGBUS to user process.
If the CMCI wins that race, the page is marked poisoned when
uc_decode_notifier() calls memory_failure(). For dirty pages,
memory_failure() invokes try_to_unmap() with the TTU_HWPOISON flag,
converting the PTE to a hwpoison entry. As a result,
kill_accessing_process():
- call walk_page_range() and return 1 regardless of whether
try_to_unmap() succeeds or fails,
- call kill_proc() to make sure a SIGBUS is sent
- return -EHWPOISON to indicate that SIGBUS is already sent to the
process and kill_me_maybe() doesn't have to send it again.
However, for clean pages, the TTU_HWPOISON flag is cleared, leaving the
PTE unchanged and not converted to a hwpoison entry. Conversely, for
clean pages where PTE entries are not marked as hwpoison,
kill_accessing_process() returns -EFAULT, causing kill_me_maybe() to send
a SIGBUS.
Console log looks like this:
Memory failure: 0x827ca68: corrupted page was clean: dropped without side effects
Memory failure: 0x827ca68: recovery action for clean LRU page: Recovered
Memory failure: 0x827ca68: already hardware poisoned
mce: Memory error not recovered
To fix it, return 0 for "corrupted page was clean", preventing an
unnecessary SIGBUS to user process.
[1] https://lore.kernel.org/lkml/20250217063335.22257-1-xueshuai@linux.alibaba.…
Link: https://lkml.kernel.org/r/20250312112852.82415-3-xueshuai@linux.alibaba.com
Fixes: 046545a661af ("mm/hwpoison: fix error page recovered but reported "not recovered"")
Signed-off-by: Shuai Xue <xueshuai(a)linux.alibaba.com>
Tested-by: Tony Luck <tony.luck(a)intel.com>
Acked-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Borislav Betkov <bp(a)alien8.de>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jane Chu <jane.chu(a)oracle.com>
Cc: Jarkko Sakkinen <jarkko(a)kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Cc: Josh Poimboeuf <jpoimboe(a)kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Ruidong Tian <tianruidong(a)linux.alibaba.com>
Cc: Thomas Gleinxer <tglx(a)linutronix.de>
Cc: Yazen Ghannam <yazen.ghannam(a)amd.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
--- a/mm/memory-failure.c~mm-hwpoison-do-not-send-sigbus-to-processes-with-recovered-clean-pages
+++ a/mm/memory-failure.c
@@ -881,12 +881,17 @@ static int kill_accessing_process(struct
mmap_read_lock(p->mm);
ret = walk_page_range(p->mm, 0, TASK_SIZE, &hwpoison_walk_ops,
(void *)&priv);
+ /*
+ * ret = 1 when CMCI wins, regardless of whether try_to_unmap()
+ * succeeds or fails, then kill the process with SIGBUS.
+ * ret = 0 when poison page is a clean page and it's dropped, no
+ * SIGBUS is needed.
+ */
if (ret == 1 && priv.tk.addr)
kill_proc(&priv.tk, pfn, flags);
- else
- ret = 0;
mmap_read_unlock(p->mm);
- return ret > 0 ? -EHWPOISON : -EFAULT;
+
+ return ret > 0 ? -EHWPOISON : 0;
}
/*
_
Patches currently in -mm which might be from xueshuai(a)linux.alibaba.com are
The quilt patch titled
Subject: x86/mce: use is_copy_from_user() to determine copy-from-user context
has been removed from the -mm tree. Its filename was
x86-mce-use-is_copy_from_user-to-determine-copy-from-user-context.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Shuai Xue <xueshuai(a)linux.alibaba.com>
Subject: x86/mce: use is_copy_from_user() to determine copy-from-user context
Date: Wed, 12 Mar 2025 19:28:50 +0800
Patch series "mm/hwpoison: Fix regressions in memory failure handling",
v4.
## 1. What am I trying to do:
This patchset resolves two critical regressions related to memory failure
handling that have appeared in the upstream kernel since version 5.17, as
compared to 5.10 LTS.
- copyin case: poison found in user page while kernel copying from user space
- instr case: poison found while instruction fetching in user space
## 2. What is the expected outcome and why
- For copyin case:
Kernel can recover from poison found where kernel is doing get_user() or
copy_from_user() if those places get an error return and the kernel return
-EFAULT to the process instead of crashing. More specifily, MCE handler
checks the fixup handler type to decide whether an in kernel #MC can be
recovered. When EX_TYPE_UACCESS is found, the PC jumps to recovery code
specified in _ASM_EXTABLE_FAULT() and return a -EFAULT to user space.
- For instr case:
If a poison found while instruction fetching in user space, full recovery
is possible. User process takes #PF, Linux allocates a new page and fills
by reading from storage.
## 3. What actually happens and why
- For copyin case: kernel panic since v5.17
Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new
extable fixup type, EX_TYPE_EFAULT_REG, and later patches updated the
extable fixup type for copy-from-user operations, changing it from
EX_TYPE_UACCESS to EX_TYPE_EFAULT_REG. It breaks previous EX_TYPE_UACCESS
handling when posion found in get_user() or copy_from_user().
- For instr case: user process is killed by a SIGBUS signal due to #CMCI
and #MCE race
When an uncorrected memory error is consumed there is a race between the
CMCI from the memory controller reporting an uncorrected error with a UCNA
signature, and the core reporting and SRAR signature machine check when
the data is about to be consumed.
### Background: why *UN*corrected errors tied to *C*MCI in Intel platform [1]
Prior to Icelake memory controllers reported patrol scrub events that
detected a previously unseen uncorrected error in memory by signaling a
broadcast machine check with an SRAO (Software Recoverable Action
Optional) signature in the machine check bank. This was overkill because
it's not an urgent problem that no core is on the verge of consuming that
bad data. It's also found that multi SRAO UCE may cause nested MCE
interrupts and finally become an IERR.
Hence, Intel downgrades the machine check bank signature of patrol scrub
from SRAO to UCNA (Uncorrected, No Action required), and signal changed to
#CMCI. Just to add to the confusion, Linux does take an action (in
uc_decode_notifier()) to try to offline the page despite the UC*NA*
signature name.
### Background: why #CMCI and #MCE race when poison is consuming in
Intel platform [1]
Having decided that CMCI/UCNA is the best action for patrol scrub errors,
the memory controller uses it for reads too. But the memory controller is
executing asynchronously from the core, and can't tell the difference
between a "real" read and a speculative read. So it will do CMCI/UCNA if
an error is found in any read.
Thus:
1) Core is clever and thinks address A is needed soon, issues a
speculative read.
2) Core finds it is going to use address A soon after sending the read
request
3) The CMCI from the memory controller is in a race with MCE from the
core that will soon try to retire the load from address A.
Quite often (because speculation has got better) the CMCI from the memory
controller is delivered before the core is committed to the instruction
reading address A, so the interrupt is taken, and Linux offlines the page
(marking it as poison).
## Why user process is killed for instr case
Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported
"not recovered"") tries to fix noise message "Memory error not recovered"
and skips duplicate SIGBUSs due to the race. But it also introduced a bug
that kill_accessing_process() return -EHWPOISON for instr case, as result,
kill_me_maybe() send a SIGBUS to user process.
# 4. The fix, in my opinion, should be:
- For copyin case:
The key point is whether the error context is in a read from user memory.
We do not care about the ex-type if we know its a MOV reading from
userspace.
is_copy_from_user() return true when both of the following two checks are
true:
- the current instruction is copy
- source address is user memory
If copy_user is true, we set
m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;
Then do_machine_check() will try fixup_exception() first.
- For instr case: let kill_accessing_process() return 0 to prevent a SIGBUS.
- For patch 3:
The return value of memory_failure() is quite important while discussed
instr case regression with Tony and Miaohe for patch 2, so add comment
about the return value.
This patch (of 3):
Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new
extable fixup type, EX_TYPE_EFAULT_REG, and commit 4c132d1d844a
("x86/futex: Remove .fixup usage") updated the extable fixup type for
copy-from-user operations, changing it from EX_TYPE_UACCESS to
EX_TYPE_EFAULT_REG. The error context for copy-from-user operations no
longer functions as an in-kernel recovery context. Consequently, the
error context for copy-from-user operations no longer functions as an
in-kernel recovery context, resulting in kernel panics with the message:
"Machine check: Data load in unrecoverable area of kernel."
To address this, it is crucial to identify if an error context involves a
read operation from user memory. The function is_copy_from_user() can be
utilized to determine:
- the current operation is copy
- when reading user memory
When these conditions are met, is_copy_from_user() will return true,
confirming that it is indeed a direct copy from user memory. This check
is essential for correctly handling the context of errors in these
operations without relying on the extable fixup types that previously
allowed for in-kernel recovery.
So, use is_copy_from_user() to determine if a context is copy user directly.
Link: https://lkml.kernel.org/r/20250312112852.82415-1-xueshuai@linux.alibaba.com
Link: https://lkml.kernel.org/r/20250312112852.82415-2-xueshuai@linux.alibaba.com
Fixes: 4c132d1d844a ("x86/futex: Remove .fixup usage")
Signed-off-by: Shuai Xue <xueshuai(a)linux.alibaba.com>
Suggested-by: Peter Zijlstra <peterz(a)infradead.org>
Acked-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Tested-by: Tony Luck <tony.luck(a)intel.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Borislav Betkov <bp(a)alien8.de>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Josh Poimboeuf <jpoimboe(a)kernel.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Ruidong Tian <tianruidong(a)linux.alibaba.com>
Cc: Thomas Gleinxer <tglx(a)linutronix.de>
Cc: Yazen Ghannam <yazen.ghannam(a)amd.com>
Cc: Jane Chu <jane.chu(a)oracle.com>
Cc: Jarkko Sakkinen <jarkko(a)kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/kernel/cpu/mce/severity.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
--- a/arch/x86/kernel/cpu/mce/severity.c~x86-mce-use-is_copy_from_user-to-determine-copy-from-user-context
+++ a/arch/x86/kernel/cpu/mce/severity.c
@@ -300,13 +300,12 @@ static noinstr int error_context(struct
copy_user = is_copy_from_user(regs);
instrumentation_end();
- switch (fixup_type) {
- case EX_TYPE_UACCESS:
- if (!copy_user)
- return IN_KERNEL;
- m->kflags |= MCE_IN_KERNEL_COPYIN;
- fallthrough;
+ if (copy_user) {
+ m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;
+ return IN_KERNEL_RECOV;
+ }
+ switch (fixup_type) {
case EX_TYPE_FAULT_MCE_SAFE:
case EX_TYPE_DEFAULT_MCE_SAFE:
m->kflags |= MCE_IN_KERNEL_RECOV;
_
Patches currently in -mm which might be from xueshuai(a)linux.alibaba.com are
The quilt patch titled
Subject: mm: add missing release barrier on PGDAT_RECLAIM_LOCKED unlock
has been removed from the -mm tree. Its filename was
mm-add-missing-release-barrier-on-pgdat_reclaim_locked-unlock.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Subject: mm: add missing release barrier on PGDAT_RECLAIM_LOCKED unlock
Date: Wed, 12 Mar 2025 10:10:13 -0400
The PGDAT_RECLAIM_LOCKED bit is used to provide mutual exclusion of node
reclaim for struct pglist_data using a single bit.
It is "locked" with a test_and_set_bit (similarly to a try lock) which
provides full ordering with respect to loads and stores done within
__node_reclaim().
It is "unlocked" with clear_bit(), which does not provide any ordering
with respect to loads and stores done before clearing the bit.
The lack of clear_bit() memory ordering with respect to stores within
__node_reclaim() can cause a subsequent CPU to fail to observe stores from
a prior node reclaim. This is not an issue in practice on TSO (e.g.
x86), but it is an issue on weakly-ordered architectures (e.g. arm64).
Fix this by using clear_bit_unlock rather than clear_bit to clear
PGDAT_RECLAIM_LOCKED with a release memory ordering semantic.
This provides stronger memory ordering (release rather than relaxed).
Link: https://lkml.kernel.org/r/20250312141014.129725-1-mathieu.desnoyers@efficio…
Fixes: d773ed6b856a ("mm: test and set zone reclaim lock before starting reclaim")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea(a)gmail.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Jade Alglave <j.alglave(a)ucl.ac.uk>
Cc: Luc Maranget <luc.maranget(a)inria.fr>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/vmscan.c~mm-add-missing-release-barrier-on-pgdat_reclaim_locked-unlock
+++ a/mm/vmscan.c
@@ -7581,7 +7581,7 @@ int node_reclaim(struct pglist_data *pgd
return NODE_RECLAIM_NOSCAN;
ret = __node_reclaim(pgdat, gfp_mask, order);
- clear_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags);
+ clear_bit_unlock(PGDAT_RECLAIM_LOCKED, &pgdat->flags);
if (ret)
count_vm_event(PGSCAN_ZONE_RECLAIM_SUCCESS);
_
Patches currently in -mm which might be from mathieu.desnoyers(a)efficios.com are
The quilt patch titled
Subject: mm/mremap: correctly handle partial mremap() of VMA starting at 0
has been removed from the -mm tree. Its filename was
mm-mremap-correctly-handle-partial-mremap-of-vma-starting-at-0.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Subject: mm/mremap: correctly handle partial mremap() of VMA starting at 0
Date: Mon, 10 Mar 2025 20:50:34 +0000
Patch series "refactor mremap and fix bug", v3.
The existing mremap() logic has grown organically over a very long period
of time, resulting in code that is in many parts, very difficult to follow
and full of subtleties and sources of confusion.
In addition, it is difficult to thread state through the operation
correctly, as function arguments have expanded, some parameters are
expected to be temporarily altered during the operation, others are
intended to remain static and some can be overridden.
This series completely refactors the mremap implementation, sensibly
separating functions, adding comments to explain the more subtle aspects
of the implementation and making use of small structs to thread state
through everything.
The reason for doing so is to lay the groundwork for planned future
changes to the mremap logic, changes which require the ability to easily
pass around state.
Additionally, it would be unhelpful to add yet more logic to code that is
already difficult to follow without first refactoring it like this.
The first patch in this series additionally fixes a bug when a VMA with
start address zero is partially remapped.
Tested on real hardware under heavy workload and all self tests are
passing.
This patch (of 3):
Consider the case of a partial mremap() (that results in a VMA split) of
an accountable VMA (i.e. which has the VM_ACCOUNT flag set) whose start
address is zero, with the MREMAP_MAYMOVE flag specified and a scenario
where a move does in fact occur:
addr end
| |
v v
|-------------|
| vma |
|-------------|
0
This move is affected by unmapping the range [addr, end). In order to
prevent an incorrect decrement of accounted memory which has already been
determined, the mremap() code in move_vma() clears VM_ACCOUNT from the VMA
prior to doing so, before reestablishing it in each of the VMAs
post-split:
addr end
| |
v v
|---| |---|
| A | | B |
|---| |---|
Commit 6b73cff239e5 ("mm: change munmap splitting order and move_vma()")
changed this logic such as to determine whether there is a need to do so
by establishing account_start and account_end and, in the instance where
such an operation is required, assigning them to vma->vm_start and
vma->vm_end.
Later the code checks if the operation is required for 'A' referenced
above thusly:
if (account_start) {
...
}
However, if the VMA described above has vma->vm_start == 0, which is now
assigned to account_start, this branch will not be executed.
As a result, the VMA 'A' above will remain stripped of its VM_ACCOUNT
flag, incorrectly.
The fix is to simply convert these variables to booleans and set them as
required.
Link: https://lkml.kernel.org/r/cover.1741639347.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/dc55cb6db25d97c3d9e460de4986a323fa959676.17416393…
Fixes: 6b73cff239e5 ("mm: change munmap splitting order and move_vma()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reviewed-by: Harry Yoo <harry.yoo(a)oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mremap.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
--- a/mm/mremap.c~mm-mremap-correctly-handle-partial-mremap-of-vma-starting-at-0
+++ a/mm/mremap.c
@@ -705,8 +705,8 @@ static unsigned long move_vma(struct vm_
unsigned long vm_flags = vma->vm_flags;
unsigned long new_pgoff;
unsigned long moved_len;
- unsigned long account_start = 0;
- unsigned long account_end = 0;
+ bool account_start = false;
+ bool account_end = false;
unsigned long hiwater_vm;
int err = 0;
bool need_rmap_locks;
@@ -790,9 +790,9 @@ static unsigned long move_vma(struct vm_
if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP)) {
vm_flags_clear(vma, VM_ACCOUNT);
if (vma->vm_start < old_addr)
- account_start = vma->vm_start;
+ account_start = true;
if (vma->vm_end > old_addr + old_len)
- account_end = vma->vm_end;
+ account_end = true;
}
/*
@@ -832,7 +832,7 @@ static unsigned long move_vma(struct vm_
/* OOM: unable to split vma, just get accounts right */
if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP))
vm_acct_memory(old_len >> PAGE_SHIFT);
- account_start = account_end = 0;
+ account_start = account_end = false;
}
if (vm_flags & VM_LOCKED) {
_
Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are
Variable "bridge" is allocated by agp_alloc_bridge() and
have to be released by agp_put_bridge() if something goes
wrong. In this patch, add the missing call of agp_put_bridge()
in agp_amdk7_probe() to prevent potential memory leak bug.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com>
---
drivers/char/agp/amd-k7-agp.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/char/agp/amd-k7-agp.c b/drivers/char/agp/amd-k7-agp.c
index 795c8c9ff680..40e1fc462dca 100644
--- a/drivers/char/agp/amd-k7-agp.c
+++ b/drivers/char/agp/amd-k7-agp.c
@@ -441,6 +441,7 @@ static int agp_amdk7_probe(struct pci_dev *pdev,
gfxcard = pci_get_class(PCI_CLASS_DISPLAY_VGA<<8, gfxcard);
if (!gfxcard) {
dev_info(&pdev->dev, "no AGP VGA controller\n");
+ agp_put_bridge(bridge);
return -ENODEV;
}
cap_ptr = pci_find_capability(gfxcard, PCI_CAP_ID_AGP);
--
2.25.1
Commit 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") introduce
a way to set nid to all reserved region.
But there is a corner case it will leave some region with invalid nid.
When memblock_set_node() doubles the array of memblock.reserved, it may
lead to a new reserved region before current position. The new region
will be left with an invalid node id.
Repeat the process when detecting it.
Fixes: 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()")
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
CC: Mike Rapoport <rppt(a)kernel.org>
CC: Yajun Deng <yajun.deng(a)linux.dev>
CC: <stable(a)vger.kernel.org>
---
mm/memblock.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/mm/memblock.c b/mm/memblock.c
index 85442f1b7f14..302dd7bc622d 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2184,7 +2184,10 @@ static void __init memmap_init_reserved_pages(void)
* set nid on all reserved pages and also treat struct
* pages for the NOMAP regions as PageReserved
*/
+repeat:
for_each_mem_region(region) {
+ unsigned long max = memblock.reserved.max;
+
nid = memblock_get_region_node(region);
start = region->base;
end = start + region->size;
@@ -2193,6 +2196,15 @@ static void __init memmap_init_reserved_pages(void)
reserve_bootmem_region(start, end, nid);
memblock_set_node(start, region->size, &memblock.reserved, nid);
+
+ /*
+ * 'max' is changed means memblock.reserved has been doubled
+ * its array, which may result a new reserved region before
+ * current 'start'. Now we should repeat the procedure to set
+ * its node id.
+ */
+ if (max != memblock.reserved.max)
+ goto repeat;
}
/*
--
2.34.1
When long lines are sent to the debug console on the framebuffer, the
right-most column is lost. Fix this by subtracting 1 from the column
count before comparing it with console_struct_cur_column, as the latter
counts from zero.
Linewrap is handled with a recursive call to console_putc, but this
alters the console_struct_cur_row global. Store the old value before
calling console_putc, so the right-most character gets rendered on the
correct line.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org
Signed-off-by: Finn Thain <fthain(a)linux-m68k.org>
---
arch/m68k/kernel/head.S | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/m68k/kernel/head.S b/arch/m68k/kernel/head.S
index 852255cf60de..9c60047764d0 100644
--- a/arch/m68k/kernel/head.S
+++ b/arch/m68k/kernel/head.S
@@ -3583,11 +3583,16 @@ L(console_not_home):
movel %a0@(Lconsole_struct_cur_column),%d0
addql #1,%a0@(Lconsole_struct_cur_column)
movel %a0@(Lconsole_struct_num_columns),%d1
+ subil #1,%d1
cmpl %d1,%d0
jcs 1f
- console_putc #'\n' /* recursion is OK! */
+ /* recursion will alter console_struct so load d1 register first */
+ movel %a0@(Lconsole_struct_cur_row),%d1
+ console_putc #'\n'
+ jmp 2f
1:
movel %a0@(Lconsole_struct_cur_row),%d1
+2:
/*
* At this point we make a shift in register usage
--
2.45.3
Prepare vPMC registers for user-initiated changes after first run. This
is important specifically for debugging Windows on QEMU with GDB; QEMU
tries to write back all visible registers when resuming the VM execution
with GDB, corrupting the PMU state. Windows always uses the PMU so this
can cause adverse effects on that particular OS.
This series also contains patch "KVM: arm64: PMU: Set raw values from
user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}", which reverts semantic
changes made for the mentioned registers in the past. It is necessary
to migrate the PMU state properly on Firecracker, QEMU, and crosvm.
Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com>
---
Changes in v5:
- Rebased.
- Link to v4: https://lore.kernel.org/r/20250313-pmc-v4-0-2c976827118c@daynix.com
Changes in v4:
- Reverted changes for functions implementing ioctls in patch
"KVM: arm64: PMU: Assume PMU presence in pmu-emul.c".
- Removed kvm_pmu_vcpu_reset().
- Reordered function calls in kvm_vcpu_reload_pmu() for better style.
- Link to v3: https://lore.kernel.org/r/20250312-pmc-v3-0-0411cab5dc3d@daynix.com
Changes in v3:
- Added patch "KVM: arm64: PMU: Assume PMU presence in pmu-emul.c".
- Added an explanation of this path series' motivation to each patch.
- Explained why userspace register writes and register reset should be
covered in patch "KVM: arm64: PMU: Reload when user modifies
registers".
- Marked patch "KVM: arm64: PMU: Set raw values from user to
PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}" for stable.
- Reoreded so that patch "KVM: arm64: PMU: Set raw values from user to
PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}" would come first.
- Added patch "KVM: arm64: PMU: Call kvm_pmu_handle_pmcr() after masking
PMCNTENSET_EL0".
- Added patch "KVM: arm64: Reload PMCNTENSET_EL0".
- Link to v2: https://lore.kernel.org/r/20250307-pmc-v2-0-6c3375a5f1e4@daynix.com
Changes in v2:
- Changed to utilize KVM_REQ_RELOAD_PMU as suggested by Oliver Upton.
- Added patch "KVM: arm64: PMU: Reload when user modifies registers"
to cover more registers.
- Added patch "KVM: arm64: PMU: Set raw values from user to
PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}".
- Link to v1: https://lore.kernel.org/r/20250302-pmc-v1-1-caff989093dc@daynix.com
---
Akihiko Odaki (5):
KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
KVM: arm64: PMU: Assume PMU presence in pmu-emul.c
KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs
KVM: arm64: PMU: Reload when user modifies registers
KVM: arm64: PMU: Reload when resetting
arch/arm64/kvm/arm.c | 17 ++++++++-----
arch/arm64/kvm/emulate-nested.c | 6 +++--
arch/arm64/kvm/pmu-emul.c | 56 +++++++++++------------------------------
arch/arm64/kvm/reset.c | 3 ---
arch/arm64/kvm/sys_regs.c | 52 ++++++++++++++++++++++----------------
include/kvm/arm_pmu.h | 4 +--
6 files changed, 62 insertions(+), 76 deletions(-)
---
base-commit: 80e54e84911a923c40d7bee33a34c1b4be148d7a
change-id: 20250302-pmc-b90a86af945c
Best regards,
--
Akihiko Odaki <akihiko.odaki(a)daynix.com>
pciehp_reset_slot() disables PDCE (Presence Detect Changed Enable) and
DLLSCE (Data Link Layer State Changed Enable) for the duration of reset
and clears the related status bits PDC and DLLSC from the Slot Status
register after the reset to avoid hotplug incorrectly assuming the card
was removed.
However, hotplug shares interrupt with PME and BW notifications both of
which can make pciehp_isr() to run despite PDCE and DLLSCE bits being
off. pciehp_isr() then picks PDC or DLLSC bits from the Slot Status
register due to the events that occur during reset and caches them into
->pending_events. Later, the IRQ thread in pciehp_ist() will process
the ->pending_events and will assume the Link went Down due to a card
change (in pciehp_handle_presence_or_link_change()).
Change pciehp_reset_slot() to also clear HPIE (Hot-Plug Interrupt
Enable) as pciehp_isr() will first check HPIE to see if the interrupt
is not for it. Then synchronize with the IRQ handling to ensure no
events are pending, before invoking the reset.
Similarly, if the poll mode is in use, park the poll thread over the
duration of the reset to stop handling events.
In order to not race irq_syncronize()/kthread_{,un}park() with the irq
/ poll_thread freeing from pciehp_remove(), take reset_lock in
pciehp_free_irq() and check the irq / poll_thread variable validity in
pciehp_reset_slot().
Fixes: 06a8d89af551 ("PCI: pciehp: Disable link notification across slot reset")
Fixes: 720d6a671a6e ("PCI: pciehp: Do not handle events if interrupts are masked")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219765
Suggested-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
drivers/pci/hotplug/pciehp_hpc.c | 28 ++++++++++++++++++++++++----
1 file changed, 24 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index bb5a8d9f03ad..c487e274b282 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -77,10 +77,15 @@ static inline int pciehp_request_irq(struct controller *ctrl)
static inline void pciehp_free_irq(struct controller *ctrl)
{
- if (pciehp_poll_mode)
+ down_read_nested(&ctrl->reset_lock, ctrl->depth);
+ if (pciehp_poll_mode) {
kthread_stop(ctrl->poll_thread);
- else
+ ctrl->poll_thread = NULL;
+ } else {
free_irq(ctrl->pcie->irq, ctrl);
+ ctrl->pcie->irq = IRQ_NOTCONNECTED;
+ }
+ up_read(&ctrl->reset_lock);
}
static int pcie_poll_cmd(struct controller *ctrl, int timeout)
@@ -766,8 +771,9 @@ static int pciehp_poll(void *data)
while (!kthread_should_stop()) {
/* poll for interrupt events or user requests */
- while (pciehp_isr(IRQ_NOTCONNECTED, ctrl) == IRQ_WAKE_THREAD ||
- atomic_read(&ctrl->pending_events))
+ while (!kthread_should_park() &&
+ (pciehp_isr(IRQ_NOTCONNECTED, ctrl) == IRQ_WAKE_THREAD ||
+ atomic_read(&ctrl->pending_events)))
pciehp_ist(IRQ_NOTCONNECTED, ctrl);
if (pciehp_poll_time <= 0 || pciehp_poll_time > 60)
@@ -907,6 +913,8 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, bool probe)
down_write_nested(&ctrl->reset_lock, ctrl->depth);
+ if (!pciehp_poll_mode)
+ ctrl_mask |= PCI_EXP_SLTCTL_HPIE;
if (!ATTN_BUTTN(ctrl)) {
ctrl_mask |= PCI_EXP_SLTCTL_PDCE;
stat_mask |= PCI_EXP_SLTSTA_PDC;
@@ -918,9 +926,21 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, bool probe)
ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, 0);
+ /* Make sure HPIE is no longer seen by the interrupt handler. */
+ if (pciehp_poll_mode) {
+ if (ctrl->poll_thread)
+ kthread_park(ctrl->poll_thread);
+ } else {
+ if (ctrl->pcie->irq != IRQ_NOTCONNECTED)
+ synchronize_irq(ctrl->pcie->irq);
+ }
+
rc = pci_bridge_secondary_bus_reset(ctrl->pcie->port);
pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, stat_mask);
+ if (pciehp_poll_mode && ctrl->poll_thread)
+ kthread_unpark(ctrl->poll_thread);
+
pcie_write_cmd_nowait(ctrl, ctrl_mask, ctrl_mask);
ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, ctrl_mask);
--
2.39.5
Hi there,
Would you like an updated contact list of Real Estate CRM Software users and customers?
We can also provide contact lists for users and customers of the following companies:
* AppFolio
* RealPage
* Buildium
* CoreLogic
* Yardi Voyager
* Kissflow CRM
* Pipedrive
* HubSpot CRM Service and more...
Please specify the target technology users and geographical areas of interest, so we can provide relevant information accordingly.
Kind regards,
Kristina Williams
Lead Specialist
To unsubscribe from future emails, simply reply with Stop.
This patch series contains some missing openvswitch port output fixes
for the stable 5.15 kernel.
Felix Huettner (1):
net: openvswitch: fix race on port output
Ilya Maximets (1):
openvswitch: fix lockup on tx to unregistering netdev with carrier
net/core/dev.c | 1 +
net/openvswitch/actions.c | 4 +++-
2 files changed, 4 insertions(+), 1 deletion(-)
--
2.34.1
This patch series contains some missing openvswitch port output fixes
for the stable 5.4 kernel.
Felix Huettner (1):
net: openvswitch: fix race on port output
Ilya Maximets (1):
openvswitch: fix lockup on tx to unregistering netdev with carrier
net/core/dev.c | 1 +
net/openvswitch/actions.c | 4 +++-
2 files changed, 4 insertions(+), 1 deletion(-)
--
2.34.1
This patch series contains some missing openvswitch port output fixes
for the stable 5.10 kernel.
Felix Huettner (1):
net: openvswitch: fix race on port output
Ilya Maximets (1):
openvswitch: fix lockup on tx to unregistering netdev with carrier
net/core/dev.c | 1 +
net/openvswitch/actions.c | 4 +++-
2 files changed, 4 insertions(+), 1 deletion(-)
--
2.34.1
[ Upstream commit 5ac9b4e935dfc6af41eee2ddc21deb5c36507a9f ]
>From memfd_secret(2) manpage:
The memory areas backing the file created with memfd_secret(2) are
visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables and only the
page tables of the processes holding the file descriptor map the
corresponding physical memory. (Thus, the pages in the region can't be
accessed by the kernel itself, so that, for example, pointers to the
region can't be passed to system calls.)
We need to handle this special case gracefully in build ID fetching
code. Return -EFAULT whenever secretmem file is passed to build_id_parse()
family of APIs. Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
Reported-by: Yi Lai <yi1.lai(a)intel.com>
Suggested-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com
Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org
[ Chen Linxuan: backport same logic without folio-based changes ]
Cc: stable(a)vger.kernel.org
Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event")
Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org>
---
v1 -> v2: use vma_is_secretmem() instead of directly checking
vma->vm_file->f_op == &secretmem_fops
v2 -> v3: keep original comment
---
lib/buildid.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index 9fc46366597e..8d839ff5548e 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
#include <linux/elf.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
+#include <linux/secretmem.h>
#define BUILD_ID 3
@@ -157,6 +158,10 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
+ /* reject secretmem folios created with memfd_secret() */
+ if (vma_is_secretmem(vma))
+ return -EFAULT;
+
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
--
2.48.1
From: Eric Dumazet <edumazet(a)google.com>
tcp_abort() has the same issue than the one fixed in the prior patch
in tcp_write_err().
commit 5ce4645c23cf5f048eb8e9ce49e514bababdee85 upstream.
To apply commit bac76cf89816bff06c4ec2f3df97dc34e150a1c4,
this patch must be applied first.
In order to get consistent results from tcp_poll(), we must call
sk_error_report() after tcp_done().
We can use tcp_done_with_error() to centralize this logic.
Fixes: c1e64e298b8c ("net: diag: Support destroying TCP sockets.")
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Acked-by: Neal Cardwell <ncardwell(a)google.com>
Link: https://lore.kernel.org/r/20240528125253.1966136-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
[youngmin: Resolved minor conflict in net/ipv4/tcp.c]
Signed-off-by: Youngmin Nam <youngmin.nam(a)samsung.com>
---
net/ipv4/tcp.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3c85ecab1445..c1e624ca6a25 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4514,13 +4514,9 @@ int tcp_abort(struct sock *sk, int err)
bh_lock_sock(sk);
if (!sock_flag(sk, SOCK_DEAD)) {
- WRITE_ONCE(sk->sk_err, err);
- /* This barrier is coupled with smp_rmb() in tcp_poll() */
- smp_wmb();
- sk_error_report(sk);
if (tcp_need_reset(sk->sk_state))
tcp_send_active_reset(sk, GFP_ATOMIC);
- tcp_done(sk);
+ tcp_done_with_error(sk, err);
}
bh_unlock_sock(sk);
--
2.39.2
[ Upstream commit 5ac9b4e935dfc6af41eee2ddc21deb5c36507a9f ]
>From memfd_secret(2) manpage:
The memory areas backing the file created with memfd_secret(2) are
visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables and only the
page tables of the processes holding the file descriptor map the
corresponding physical memory. (Thus, the pages in the region can't be
accessed by the kernel itself, so that, for example, pointers to the
region can't be passed to system calls.)
We need to handle this special case gracefully in build ID fetching
code. Return -EFAULT whenever secretmem file is passed to build_id_parse()
family of APIs. Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
Reported-by: Yi Lai <yi1.lai(a)intel.com>
Suggested-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com
Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org
[ Chen Linxuan: backport same logic without folio-based changes ]
Cc: stable(a)vger.kernel.org
Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event")
Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org>
---
v1 -> v2: use vma_is_secretmem() instead of directly checking
vma->vm_file->f_op == &secretmem_fops
---
lib/buildid.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index 9fc46366597e..34315d09b544 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
#include <linux/elf.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
+#include <linux/secretmem.h>
#define BUILD_ID 3
@@ -157,6 +158,10 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
+ /* reject secretmem */
+ if (vma_is_secretmem(vma))
+ return -EFAULT;
+
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
--
2.48.1
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
[ Upstream commit 58a039e679fe72bd0efa8b2abe669a7914bb4429 ]
Commit ea7e2d5e49c0 ("mm: call the security_mmap_file() LSM hook in
remap_file_pages()") fixed a security issue, it added an LSM check when
trying to remap file pages, so that LSMs have the opportunity to evaluate
such action like for other memory operations such as mmap() and
mprotect().
However, that commit called security_mmap_file() inside the mmap_lock
lock, while the other calls do it before taking the lock, after commit
8b3ec6814c83 ("take security_mmap_file() outside of ->mmap_sem").
This caused lock inversion issue with IMA which was taking the mmap_lock
and i_mutex lock in the opposite way when the remap_file_pages() system
call was called.
Solve the issue by splitting the critical region in remap_file_pages() in
two regions: the first takes a read lock of mmap_lock, retrieves the VMA
and the file descriptor associated, and calculates the 'prot' and 'flags'
variables; the second takes a write lock on mmap_lock, checks that the VMA
flags and the VMA file descriptor are the same as the ones obtained in the
first critical region (otherwise the system call fails), and calls
do_mmap().
In between, after releasing the read lock and before taking the write
lock, call security_mmap_file(), and solve the lock inversion issue.
Link: https://lkml.kernel.org/r/20241018161415.3845146-1-roberto.sassu@huaweiclou…
Fixes: ea7e2d5e49c0 ("mm: call the security_mmap_file() LSM hook in remap_file_pages()")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
Reported-by: syzbot+1cd571a672400ef3a930(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-security-module/66f7b10e.050a0220.46d20.0036.…
Tested-by: Roberto Sassu <roberto.sassu(a)huawei.com>
Reviewed-by: Roberto Sassu <roberto.sassu(a)huawei.com>
Reviewed-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Reviewed-by: Paul Moore <paul(a)paul-moore.com>
Tested-by: syzbot+1cd571a672400ef3a930(a)syzkaller.appspotmail.com
Cc: Jarkko Sakkinen <jarkko(a)kernel.org>
Cc: Dmitry Kasatkin <dmitry.kasatkin(a)gmail.com>
Cc: Eric Snowberg <eric.snowberg(a)oracle.com>
Cc: James Morris <jmorris(a)namei.org>
Cc: Mimi Zohar <zohar(a)linux.ibm.com>
Cc: "Serge E. Hallyn" <serge(a)hallyn.com>
Cc: Shu Han <ebpqwerty472123(a)gmail.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Verified the build test
---
mm/mmap.c | 69 +++++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 52 insertions(+), 17 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index e4dfeaef668a..03a24cb3951d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2981,6 +2981,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
unsigned long populate = 0;
unsigned long ret = -EINVAL;
struct file *file;
+ vm_flags_t vm_flags;
pr_warn_once("%s (%d) uses deprecated remap_file_pages() syscall. See Documentation/mm/remap_file_pages.rst.\n",
current->comm, current->pid);
@@ -2997,12 +2998,60 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
if (pgoff + (size >> PAGE_SHIFT) < pgoff)
return ret;
- if (mmap_write_lock_killable(mm))
+ if (mmap_read_lock_killable(mm))
+ return -EINTR;
+
+ /*
+ * Look up VMA under read lock first so we can perform the security
+ * without holding locks (which can be problematic). We reacquire a
+ * write lock later and check nothing changed underneath us.
+ */
+ vma = vma_lookup(mm, start);
+
+ if (!vma || !(vma->vm_flags & VM_SHARED)) {
+ mmap_read_unlock(mm);
+ return -EINVAL;
+ }
+
+ prot |= vma->vm_flags & VM_READ ? PROT_READ : 0;
+ prot |= vma->vm_flags & VM_WRITE ? PROT_WRITE : 0;
+ prot |= vma->vm_flags & VM_EXEC ? PROT_EXEC : 0;
+
+ flags &= MAP_NONBLOCK;
+ flags |= MAP_SHARED | MAP_FIXED | MAP_POPULATE;
+ if (vma->vm_flags & VM_LOCKED)
+ flags |= MAP_LOCKED;
+
+ /* Save vm_flags used to calculate prot and flags, and recheck later. */
+ vm_flags = vma->vm_flags;
+ file = get_file(vma->vm_file);
+
+ mmap_read_unlock(mm);
+
+ /* Call outside mmap_lock to be consistent with other callers. */
+ ret = security_mmap_file(file, prot, flags);
+ if (ret) {
+ fput(file);
+ return ret;
+ }
+
+ ret = -EINVAL;
+
+ /* OK security check passed, take write lock + let it rip. */
+ if (mmap_write_lock_killable(mm)) {
+ fput(file);
return -EINTR;
+ }
vma = vma_lookup(mm, start);
- if (!vma || !(vma->vm_flags & VM_SHARED))
+ if (!vma)
+ goto out;
+
+ /* Make sure things didn't change under us. */
+ if (vma->vm_flags != vm_flags)
+ goto out;
+ if (vma->vm_file != file)
goto out;
if (start + size > vma->vm_end) {
@@ -3030,25 +3079,11 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
goto out;
}
- prot |= vma->vm_flags & VM_READ ? PROT_READ : 0;
- prot |= vma->vm_flags & VM_WRITE ? PROT_WRITE : 0;
- prot |= vma->vm_flags & VM_EXEC ? PROT_EXEC : 0;
-
- flags &= MAP_NONBLOCK;
- flags |= MAP_SHARED | MAP_FIXED | MAP_POPULATE;
- if (vma->vm_flags & VM_LOCKED)
- flags |= MAP_LOCKED;
-
- file = get_file(vma->vm_file);
- ret = security_mmap_file(vma->vm_file, prot, flags);
- if (ret)
- goto out_fput;
ret = do_mmap(vma->vm_file, start, size,
prot, flags, 0, pgoff, &populate, NULL);
-out_fput:
- fput(file);
out:
mmap_write_unlock(mm);
+ fput(file);
if (populate)
mm_populate(ret, populate);
if (!IS_ERR_VALUE(ret))
--
2.25.1
From: Waiman Long <longman(a)redhat.com>
[ Upstream commit 85b2b9c16d053364e2004883140538e73b333cdb ]
A circular lock dependency splat has been seen involving down_trylock():
======================================================
WARNING: possible circular locking dependency detected
6.12.0-41.el10.s390x+debug
------------------------------------------------------
dd/32479 is trying to acquire lock:
0015a20accd0d4f8 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x26/0x90
but task is already holding lock:
000000017e461698 (&zone->lock){-.-.}-{2:2}, at: rmqueue_bulk+0xac/0x8f0
the existing dependency chain (in reverse order) is:
-> #4 (&zone->lock){-.-.}-{2:2}:
-> #3 (hrtimer_bases.lock){-.-.}-{2:2}:
-> #2 (&rq->__lock){-.-.}-{2:2}:
-> #1 (&p->pi_lock){-.-.}-{2:2}:
-> #0 ((console_sem).lock){-.-.}-{2:2}:
The console_sem -> pi_lock dependency is due to calling try_to_wake_up()
while holding the console_sem raw_spinlock. This dependency can be broken
by using wake_q to do the wakeup instead of calling try_to_wake_up()
under the console_sem lock. This will also make the semaphore's
raw_spinlock become a terminal lock without taking any further locks
underneath it.
The hrtimer_bases.lock is a raw_spinlock while zone->lock is a
spinlock. The hrtimer_bases.lock -> zone->lock dependency happens via
the debug_objects_fill_pool() helper function in the debugobjects code.
-> #4 (&zone->lock){-.-.}-{2:2}:
__lock_acquire+0xe86/0x1cc0
lock_acquire.part.0+0x258/0x630
lock_acquire+0xb8/0xe0
_raw_spin_lock_irqsave+0xb4/0x120
rmqueue_bulk+0xac/0x8f0
__rmqueue_pcplist+0x580/0x830
rmqueue_pcplist+0xfc/0x470
rmqueue.isra.0+0xdec/0x11b0
get_page_from_freelist+0x2ee/0xeb0
__alloc_pages_noprof+0x2c2/0x520
alloc_pages_mpol_noprof+0x1fc/0x4d0
alloc_pages_noprof+0x8c/0xe0
allocate_slab+0x320/0x460
___slab_alloc+0xa58/0x12b0
__slab_alloc.isra.0+0x42/0x60
kmem_cache_alloc_noprof+0x304/0x350
fill_pool+0xf6/0x450
debug_object_activate+0xfe/0x360
enqueue_hrtimer+0x34/0x190
__run_hrtimer+0x3c8/0x4c0
__hrtimer_run_queues+0x1b2/0x260
hrtimer_interrupt+0x316/0x760
do_IRQ+0x9a/0xe0
do_irq_async+0xf6/0x160
Normally a raw_spinlock to spinlock dependency is not legitimate
and will be warned if CONFIG_PROVE_RAW_LOCK_NESTING is enabled,
but debug_objects_fill_pool() is an exception as it explicitly
allows this dependency for non-PREEMPT_RT kernel without causing
PROVE_RAW_LOCK_NESTING lockdep splat. As a result, this dependency is
legitimate and not a bug.
Anyway, semaphore is the only locking primitive left that is still
using try_to_wake_up() to do wakeup inside critical section, all the
other locking primitives had been migrated to use wake_q to do wakeup
outside of the critical section. It is also possible that there are
other circular locking dependencies involving printk/console_sem or
other existing/new semaphores lurking somewhere which may show up in
the future. Let just do the migration now to wake_q to avoid headache
like this.
Reported-by: yzbot+ed801a886dfdbfe7136d(a)syzkaller.appspotmail.com
Signed-off-by: Waiman Long <longman(a)redhat.com>
Signed-off-by: Boqun Feng <boqun.feng(a)gmail.com>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Link: https://lore.kernel.org/r/20250307232717.1759087-3-boqun.feng@gmail.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/locking/semaphore.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index d9dd94defc0a9..19389fdbfdfb1 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -29,6 +29,7 @@
#include <linux/export.h>
#include <linux/sched.h>
#include <linux/sched/debug.h>
+#include <linux/sched/wake_q.h>
#include <linux/semaphore.h>
#include <linux/spinlock.h>
#include <linux/ftrace.h>
@@ -37,7 +38,7 @@ static noinline void __down(struct semaphore *sem);
static noinline int __down_interruptible(struct semaphore *sem);
static noinline int __down_killable(struct semaphore *sem);
static noinline int __down_timeout(struct semaphore *sem, long timeout);
-static noinline void __up(struct semaphore *sem);
+static noinline void __up(struct semaphore *sem, struct wake_q_head *wake_q);
/**
* down - acquire the semaphore
@@ -178,13 +179,16 @@ EXPORT_SYMBOL(down_timeout);
void up(struct semaphore *sem)
{
unsigned long flags;
+ DEFINE_WAKE_Q(wake_q);
raw_spin_lock_irqsave(&sem->lock, flags);
if (likely(list_empty(&sem->wait_list)))
sem->count++;
else
- __up(sem);
+ __up(sem, &wake_q);
raw_spin_unlock_irqrestore(&sem->lock, flags);
+ if (!wake_q_empty(&wake_q))
+ wake_up_q(&wake_q);
}
EXPORT_SYMBOL(up);
@@ -252,11 +256,12 @@ static noinline int __sched __down_timeout(struct semaphore *sem, long timeout)
return __down_common(sem, TASK_UNINTERRUPTIBLE, timeout);
}
-static noinline void __sched __up(struct semaphore *sem)
+static noinline void __sched __up(struct semaphore *sem,
+ struct wake_q_head *wake_q)
{
struct semaphore_waiter *waiter = list_first_entry(&sem->wait_list,
struct semaphore_waiter, list);
list_del(&waiter->list);
waiter->up = true;
- wake_up_process(waiter->task);
+ wake_q_add(wake_q, waiter->task);
}
--
2.39.5
[ Upstream commit 5ac9b4e935dfc6af41eee2ddc21deb5c36507a9f ]
>From memfd_secret(2) manpage:
The memory areas backing the file created with memfd_secret(2) are
visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables and only the
page tables of the processes holding the file descriptor map the
corresponding physical memory. (Thus, the pages in the region can't be
accessed by the kernel itself, so that, for example, pointers to the
region can't be passed to system calls.)
We need to handle this special case gracefully in build ID fetching
code. Return -EFAULT whenever secretmem file is passed to build_id_parse()
family of APIs. Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
Reported-by: Yi Lai <yi1.lai(a)intel.com>
Suggested-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com
Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org
[ Chen Linxuan: backport same logic without folio-based changes ]
Cc: stable(a)vger.kernel.org
Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event")
Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org>
---
v1 -> v2: use vma_is_secretmem() instead of directly checking
vma->vm_file->f_op == &secretmem_fops
---
lib/buildid.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index 9fc46366597e..34315d09b544 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
#include <linux/elf.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
+#include <linux/secretmem.h>
#define BUILD_ID 3
@@ -157,6 +158,10 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
+ /* reject secretmem */
+ if (vma_is_secretmem(vma))
+ return -EFAULT;
+
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
--
2.48.1
From: Waiman Long <longman(a)redhat.com>
[ Upstream commit 85b2b9c16d053364e2004883140538e73b333cdb ]
A circular lock dependency splat has been seen involving down_trylock():
======================================================
WARNING: possible circular locking dependency detected
6.12.0-41.el10.s390x+debug
------------------------------------------------------
dd/32479 is trying to acquire lock:
0015a20accd0d4f8 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x26/0x90
but task is already holding lock:
000000017e461698 (&zone->lock){-.-.}-{2:2}, at: rmqueue_bulk+0xac/0x8f0
the existing dependency chain (in reverse order) is:
-> #4 (&zone->lock){-.-.}-{2:2}:
-> #3 (hrtimer_bases.lock){-.-.}-{2:2}:
-> #2 (&rq->__lock){-.-.}-{2:2}:
-> #1 (&p->pi_lock){-.-.}-{2:2}:
-> #0 ((console_sem).lock){-.-.}-{2:2}:
The console_sem -> pi_lock dependency is due to calling try_to_wake_up()
while holding the console_sem raw_spinlock. This dependency can be broken
by using wake_q to do the wakeup instead of calling try_to_wake_up()
under the console_sem lock. This will also make the semaphore's
raw_spinlock become a terminal lock without taking any further locks
underneath it.
The hrtimer_bases.lock is a raw_spinlock while zone->lock is a
spinlock. The hrtimer_bases.lock -> zone->lock dependency happens via
the debug_objects_fill_pool() helper function in the debugobjects code.
-> #4 (&zone->lock){-.-.}-{2:2}:
__lock_acquire+0xe86/0x1cc0
lock_acquire.part.0+0x258/0x630
lock_acquire+0xb8/0xe0
_raw_spin_lock_irqsave+0xb4/0x120
rmqueue_bulk+0xac/0x8f0
__rmqueue_pcplist+0x580/0x830
rmqueue_pcplist+0xfc/0x470
rmqueue.isra.0+0xdec/0x11b0
get_page_from_freelist+0x2ee/0xeb0
__alloc_pages_noprof+0x2c2/0x520
alloc_pages_mpol_noprof+0x1fc/0x4d0
alloc_pages_noprof+0x8c/0xe0
allocate_slab+0x320/0x460
___slab_alloc+0xa58/0x12b0
__slab_alloc.isra.0+0x42/0x60
kmem_cache_alloc_noprof+0x304/0x350
fill_pool+0xf6/0x450
debug_object_activate+0xfe/0x360
enqueue_hrtimer+0x34/0x190
__run_hrtimer+0x3c8/0x4c0
__hrtimer_run_queues+0x1b2/0x260
hrtimer_interrupt+0x316/0x760
do_IRQ+0x9a/0xe0
do_irq_async+0xf6/0x160
Normally a raw_spinlock to spinlock dependency is not legitimate
and will be warned if CONFIG_PROVE_RAW_LOCK_NESTING is enabled,
but debug_objects_fill_pool() is an exception as it explicitly
allows this dependency for non-PREEMPT_RT kernel without causing
PROVE_RAW_LOCK_NESTING lockdep splat. As a result, this dependency is
legitimate and not a bug.
Anyway, semaphore is the only locking primitive left that is still
using try_to_wake_up() to do wakeup inside critical section, all the
other locking primitives had been migrated to use wake_q to do wakeup
outside of the critical section. It is also possible that there are
other circular locking dependencies involving printk/console_sem or
other existing/new semaphores lurking somewhere which may show up in
the future. Let just do the migration now to wake_q to avoid headache
like this.
Reported-by: yzbot+ed801a886dfdbfe7136d(a)syzkaller.appspotmail.com
Signed-off-by: Waiman Long <longman(a)redhat.com>
Signed-off-by: Boqun Feng <boqun.feng(a)gmail.com>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Link: https://lore.kernel.org/r/20250307232717.1759087-3-boqun.feng@gmail.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/locking/semaphore.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index 9aa855a96c4ae..aadde65402913 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -29,6 +29,7 @@
#include <linux/export.h>
#include <linux/sched.h>
#include <linux/sched/debug.h>
+#include <linux/sched/wake_q.h>
#include <linux/semaphore.h>
#include <linux/spinlock.h>
#include <linux/ftrace.h>
@@ -37,7 +38,7 @@ static noinline void __down(struct semaphore *sem);
static noinline int __down_interruptible(struct semaphore *sem);
static noinline int __down_killable(struct semaphore *sem);
static noinline int __down_timeout(struct semaphore *sem, long timeout);
-static noinline void __up(struct semaphore *sem);
+static noinline void __up(struct semaphore *sem, struct wake_q_head *wake_q);
/**
* down - acquire the semaphore
@@ -178,13 +179,16 @@ EXPORT_SYMBOL(down_timeout);
void up(struct semaphore *sem)
{
unsigned long flags;
+ DEFINE_WAKE_Q(wake_q);
raw_spin_lock_irqsave(&sem->lock, flags);
if (likely(list_empty(&sem->wait_list)))
sem->count++;
else
- __up(sem);
+ __up(sem, &wake_q);
raw_spin_unlock_irqrestore(&sem->lock, flags);
+ if (!wake_q_empty(&wake_q))
+ wake_up_q(&wake_q);
}
EXPORT_SYMBOL(up);
@@ -252,11 +256,12 @@ static noinline int __sched __down_timeout(struct semaphore *sem, long timeout)
return __down_common(sem, TASK_UNINTERRUPTIBLE, timeout);
}
-static noinline void __sched __up(struct semaphore *sem)
+static noinline void __sched __up(struct semaphore *sem,
+ struct wake_q_head *wake_q)
{
struct semaphore_waiter *waiter = list_first_entry(&sem->wait_list,
struct semaphore_waiter, list);
list_del(&waiter->list);
waiter->up = true;
- wake_up_process(waiter->task);
+ wake_q_add(wake_q, waiter->task);
}
--
2.39.5
From: Eric Dumazet <edumazet(a)google.com>
tcp_abort() has the same issue than the one fixed in the prior patch
in tcp_write_err().
commit 5ce4645c23cf5f048eb8e9ce49e514bababdee85 upstream.
To apply commit bac76cf89816bff06c4ec2f3df97dc34e150a1c4,
this patch must be applied first.
In order to get consistent results from tcp_poll(), we must call
sk_error_report() after tcp_done().
We can use tcp_done_with_error() to centralize this logic.
Fixes: c1e64e298b8c ("net: diag: Support destroying TCP sockets.")
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Acked-by: Neal Cardwell <ncardwell(a)google.com>
Link: https://lore.kernel.org/r/20240528125253.1966136-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
[youngmin: Resolved minor conflict in net/ipv4/tcp.c]
Signed-off-by: Youngmin Nam <youngmin.nam(a)samsung.com>
---
net/ipv4/tcp.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 7d591a0cf0c7..1ad3a20eb9b7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4755,13 +4755,9 @@ int tcp_abort(struct sock *sk, int err)
bh_lock_sock(sk);
if (!sock_flag(sk, SOCK_DEAD)) {
- WRITE_ONCE(sk->sk_err, err);
- /* This barrier is coupled with smp_rmb() in tcp_poll() */
- smp_wmb();
- sk_error_report(sk);
if (tcp_need_reset(sk->sk_state))
tcp_send_active_reset(sk, GFP_ATOMIC);
- tcp_done(sk);
+ tcp_done_with_error(sk, err);
}
bh_unlock_sock(sk);
--
2.39.2
[ Upstream commit 5ac9b4e935dfc6af41eee2ddc21deb5c36507a9f ]
>From memfd_secret(2) manpage:
The memory areas backing the file created with memfd_secret(2) are
visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables and only the
page tables of the processes holding the file descriptor map the
corresponding physical memory. (Thus, the pages in the region can't be
accessed by the kernel itself, so that, for example, pointers to the
region can't be passed to system calls.)
We need to handle this special case gracefully in build ID fetching
code. Return -EFAULT whenever secretmem file is passed to build_id_parse()
family of APIs. Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
Reported-by: Yi Lai <yi1.lai(a)intel.com>
Suggested-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com
Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org
[ Chen Linxuan: backport same logic without folio-based changes ]
Cc: stable(a)vger.kernel.org
Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event")
Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org>
---
v1 -> v2: use vma_is_secretmem() instead of directly checking
vma->vm_file->f_op == &secretmem_fops
v2 -> v3: keep original comment
---
lib/buildid.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index 9fc46366597e..8d839ff5548e 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
#include <linux/elf.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
+#include <linux/secretmem.h>
#define BUILD_ID 3
@@ -157,6 +158,10 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
+ /* reject secretmem folios created with memfd_secret() */
+ if (vma_is_secretmem(vma))
+ return -EFAULT;
+
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
--
2.48.1
[ Upstream commit 5ac9b4e935dfc6af41eee2ddc21deb5c36507a9f ]
>From memfd_secret(2) manpage:
The memory areas backing the file created with memfd_secret(2) are
visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables and only the
page tables of the processes holding the file descriptor map the
corresponding physical memory. (Thus, the pages in the region can't be
accessed by the kernel itself, so that, for example, pointers to the
region can't be passed to system calls.)
We need to handle this special case gracefully in build ID fetching
code. Return -EFAULT whenever secretmem file is passed to build_id_parse()
family of APIs. Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
Reported-by: Yi Lai <yi1.lai(a)intel.com>
Suggested-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com
Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org
[ Chen Linxuan: backport same logic without folio-based changes ]
Cc: stable(a)vger.kernel.org
Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event")
Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org>
---
v1 -> v2: use vma_is_secretmem() instead of directly checking
vma->vm_file->f_op == &secretmem_fops
---
lib/buildid.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index 9fc46366597e..34315d09b544 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
#include <linux/elf.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
+#include <linux/secretmem.h>
#define BUILD_ID 3
@@ -157,6 +158,10 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
+ /* reject secretmem */
+ if (vma_is_secretmem(vma))
+ return -EFAULT;
+
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
--
2.48.1
From: Waiman Long <longman(a)redhat.com>
[ Upstream commit 85b2b9c16d053364e2004883140538e73b333cdb ]
A circular lock dependency splat has been seen involving down_trylock():
======================================================
WARNING: possible circular locking dependency detected
6.12.0-41.el10.s390x+debug
------------------------------------------------------
dd/32479 is trying to acquire lock:
0015a20accd0d4f8 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x26/0x90
but task is already holding lock:
000000017e461698 (&zone->lock){-.-.}-{2:2}, at: rmqueue_bulk+0xac/0x8f0
the existing dependency chain (in reverse order) is:
-> #4 (&zone->lock){-.-.}-{2:2}:
-> #3 (hrtimer_bases.lock){-.-.}-{2:2}:
-> #2 (&rq->__lock){-.-.}-{2:2}:
-> #1 (&p->pi_lock){-.-.}-{2:2}:
-> #0 ((console_sem).lock){-.-.}-{2:2}:
The console_sem -> pi_lock dependency is due to calling try_to_wake_up()
while holding the console_sem raw_spinlock. This dependency can be broken
by using wake_q to do the wakeup instead of calling try_to_wake_up()
under the console_sem lock. This will also make the semaphore's
raw_spinlock become a terminal lock without taking any further locks
underneath it.
The hrtimer_bases.lock is a raw_spinlock while zone->lock is a
spinlock. The hrtimer_bases.lock -> zone->lock dependency happens via
the debug_objects_fill_pool() helper function in the debugobjects code.
-> #4 (&zone->lock){-.-.}-{2:2}:
__lock_acquire+0xe86/0x1cc0
lock_acquire.part.0+0x258/0x630
lock_acquire+0xb8/0xe0
_raw_spin_lock_irqsave+0xb4/0x120
rmqueue_bulk+0xac/0x8f0
__rmqueue_pcplist+0x580/0x830
rmqueue_pcplist+0xfc/0x470
rmqueue.isra.0+0xdec/0x11b0
get_page_from_freelist+0x2ee/0xeb0
__alloc_pages_noprof+0x2c2/0x520
alloc_pages_mpol_noprof+0x1fc/0x4d0
alloc_pages_noprof+0x8c/0xe0
allocate_slab+0x320/0x460
___slab_alloc+0xa58/0x12b0
__slab_alloc.isra.0+0x42/0x60
kmem_cache_alloc_noprof+0x304/0x350
fill_pool+0xf6/0x450
debug_object_activate+0xfe/0x360
enqueue_hrtimer+0x34/0x190
__run_hrtimer+0x3c8/0x4c0
__hrtimer_run_queues+0x1b2/0x260
hrtimer_interrupt+0x316/0x760
do_IRQ+0x9a/0xe0
do_irq_async+0xf6/0x160
Normally a raw_spinlock to spinlock dependency is not legitimate
and will be warned if CONFIG_PROVE_RAW_LOCK_NESTING is enabled,
but debug_objects_fill_pool() is an exception as it explicitly
allows this dependency for non-PREEMPT_RT kernel without causing
PROVE_RAW_LOCK_NESTING lockdep splat. As a result, this dependency is
legitimate and not a bug.
Anyway, semaphore is the only locking primitive left that is still
using try_to_wake_up() to do wakeup inside critical section, all the
other locking primitives had been migrated to use wake_q to do wakeup
outside of the critical section. It is also possible that there are
other circular locking dependencies involving printk/console_sem or
other existing/new semaphores lurking somewhere which may show up in
the future. Let just do the migration now to wake_q to avoid headache
like this.
Reported-by: yzbot+ed801a886dfdbfe7136d(a)syzkaller.appspotmail.com
Signed-off-by: Waiman Long <longman(a)redhat.com>
Signed-off-by: Boqun Feng <boqun.feng(a)gmail.com>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Link: https://lore.kernel.org/r/20250307232717.1759087-3-boqun.feng@gmail.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/locking/semaphore.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index 9ee381e4d2a4d..a26c915430ba0 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -29,6 +29,7 @@
#include <linux/export.h>
#include <linux/sched.h>
#include <linux/sched/debug.h>
+#include <linux/sched/wake_q.h>
#include <linux/semaphore.h>
#include <linux/spinlock.h>
#include <linux/ftrace.h>
@@ -37,7 +38,7 @@ static noinline void __down(struct semaphore *sem);
static noinline int __down_interruptible(struct semaphore *sem);
static noinline int __down_killable(struct semaphore *sem);
static noinline int __down_timeout(struct semaphore *sem, long timeout);
-static noinline void __up(struct semaphore *sem);
+static noinline void __up(struct semaphore *sem, struct wake_q_head *wake_q);
/**
* down - acquire the semaphore
@@ -182,13 +183,16 @@ EXPORT_SYMBOL(down_timeout);
void up(struct semaphore *sem)
{
unsigned long flags;
+ DEFINE_WAKE_Q(wake_q);
raw_spin_lock_irqsave(&sem->lock, flags);
if (likely(list_empty(&sem->wait_list)))
sem->count++;
else
- __up(sem);
+ __up(sem, &wake_q);
raw_spin_unlock_irqrestore(&sem->lock, flags);
+ if (!wake_q_empty(&wake_q))
+ wake_up_q(&wake_q);
}
EXPORT_SYMBOL(up);
@@ -256,11 +260,12 @@ static noinline int __sched __down_timeout(struct semaphore *sem, long timeout)
return __down_common(sem, TASK_UNINTERRUPTIBLE, timeout);
}
-static noinline void __sched __up(struct semaphore *sem)
+static noinline void __sched __up(struct semaphore *sem,
+ struct wake_q_head *wake_q)
{
struct semaphore_waiter *waiter = list_first_entry(&sem->wait_list,
struct semaphore_waiter, list);
list_del(&waiter->list);
waiter->up = true;
- wake_up_process(waiter->task);
+ wake_q_add(wake_q, waiter->task);
}
--
2.39.5
[ Upstream commit 5ac9b4e935dfc6af41eee2ddc21deb5c36507a9f ]
>From memfd_secret(2) manpage:
The memory areas backing the file created with memfd_secret(2) are
visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables and only the
page tables of the processes holding the file descriptor map the
corresponding physical memory. (Thus, the pages in the region can't be
accessed by the kernel itself, so that, for example, pointers to the
region can't be passed to system calls.)
We need to handle this special case gracefully in build ID fetching
code. Return -EFAULT whenever secretmem file is passed to build_id_parse()
family of APIs. Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
Reported-by: Yi Lai <yi1.lai(a)intel.com>
Suggested-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com
Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org
[ Chen Linxuan: backport same logic without folio-based changes ]
Cc: stable(a)vger.kernel.org
Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event")
Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org>
---
v1 -> v2: use vma_is_secretmem() instead of directly checking
vma->vm_file->f_op == &secretmem_fops
v2 -> v3: keep original comment
---
lib/buildid.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index 9fc46366597e..8d839ff5548e 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
#include <linux/elf.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
+#include <linux/secretmem.h>
#define BUILD_ID 3
@@ -157,6 +158,10 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
+ /* reject secretmem folios created with memfd_secret() */
+ if (vma_is_secretmem(vma))
+ return -EFAULT;
+
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
--
2.48.1
From: Waiman Long <longman(a)redhat.com>
[ Upstream commit 85b2b9c16d053364e2004883140538e73b333cdb ]
A circular lock dependency splat has been seen involving down_trylock():
======================================================
WARNING: possible circular locking dependency detected
6.12.0-41.el10.s390x+debug
------------------------------------------------------
dd/32479 is trying to acquire lock:
0015a20accd0d4f8 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x26/0x90
but task is already holding lock:
000000017e461698 (&zone->lock){-.-.}-{2:2}, at: rmqueue_bulk+0xac/0x8f0
the existing dependency chain (in reverse order) is:
-> #4 (&zone->lock){-.-.}-{2:2}:
-> #3 (hrtimer_bases.lock){-.-.}-{2:2}:
-> #2 (&rq->__lock){-.-.}-{2:2}:
-> #1 (&p->pi_lock){-.-.}-{2:2}:
-> #0 ((console_sem).lock){-.-.}-{2:2}:
The console_sem -> pi_lock dependency is due to calling try_to_wake_up()
while holding the console_sem raw_spinlock. This dependency can be broken
by using wake_q to do the wakeup instead of calling try_to_wake_up()
under the console_sem lock. This will also make the semaphore's
raw_spinlock become a terminal lock without taking any further locks
underneath it.
The hrtimer_bases.lock is a raw_spinlock while zone->lock is a
spinlock. The hrtimer_bases.lock -> zone->lock dependency happens via
the debug_objects_fill_pool() helper function in the debugobjects code.
-> #4 (&zone->lock){-.-.}-{2:2}:
__lock_acquire+0xe86/0x1cc0
lock_acquire.part.0+0x258/0x630
lock_acquire+0xb8/0xe0
_raw_spin_lock_irqsave+0xb4/0x120
rmqueue_bulk+0xac/0x8f0
__rmqueue_pcplist+0x580/0x830
rmqueue_pcplist+0xfc/0x470
rmqueue.isra.0+0xdec/0x11b0
get_page_from_freelist+0x2ee/0xeb0
__alloc_pages_noprof+0x2c2/0x520
alloc_pages_mpol_noprof+0x1fc/0x4d0
alloc_pages_noprof+0x8c/0xe0
allocate_slab+0x320/0x460
___slab_alloc+0xa58/0x12b0
__slab_alloc.isra.0+0x42/0x60
kmem_cache_alloc_noprof+0x304/0x350
fill_pool+0xf6/0x450
debug_object_activate+0xfe/0x360
enqueue_hrtimer+0x34/0x190
__run_hrtimer+0x3c8/0x4c0
__hrtimer_run_queues+0x1b2/0x260
hrtimer_interrupt+0x316/0x760
do_IRQ+0x9a/0xe0
do_irq_async+0xf6/0x160
Normally a raw_spinlock to spinlock dependency is not legitimate
and will be warned if CONFIG_PROVE_RAW_LOCK_NESTING is enabled,
but debug_objects_fill_pool() is an exception as it explicitly
allows this dependency for non-PREEMPT_RT kernel without causing
PROVE_RAW_LOCK_NESTING lockdep splat. As a result, this dependency is
legitimate and not a bug.
Anyway, semaphore is the only locking primitive left that is still
using try_to_wake_up() to do wakeup inside critical section, all the
other locking primitives had been migrated to use wake_q to do wakeup
outside of the critical section. It is also possible that there are
other circular locking dependencies involving printk/console_sem or
other existing/new semaphores lurking somewhere which may show up in
the future. Let just do the migration now to wake_q to avoid headache
like this.
Reported-by: yzbot+ed801a886dfdbfe7136d(a)syzkaller.appspotmail.com
Signed-off-by: Waiman Long <longman(a)redhat.com>
Signed-off-by: Boqun Feng <boqun.feng(a)gmail.com>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Link: https://lore.kernel.org/r/20250307232717.1759087-3-boqun.feng@gmail.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/locking/semaphore.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index 34bfae72f2952..de9117c0e671e 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -29,6 +29,7 @@
#include <linux/export.h>
#include <linux/sched.h>
#include <linux/sched/debug.h>
+#include <linux/sched/wake_q.h>
#include <linux/semaphore.h>
#include <linux/spinlock.h>
#include <linux/ftrace.h>
@@ -38,7 +39,7 @@ static noinline void __down(struct semaphore *sem);
static noinline int __down_interruptible(struct semaphore *sem);
static noinline int __down_killable(struct semaphore *sem);
static noinline int __down_timeout(struct semaphore *sem, long timeout);
-static noinline void __up(struct semaphore *sem);
+static noinline void __up(struct semaphore *sem, struct wake_q_head *wake_q);
/**
* down - acquire the semaphore
@@ -183,13 +184,16 @@ EXPORT_SYMBOL(down_timeout);
void __sched up(struct semaphore *sem)
{
unsigned long flags;
+ DEFINE_WAKE_Q(wake_q);
raw_spin_lock_irqsave(&sem->lock, flags);
if (likely(list_empty(&sem->wait_list)))
sem->count++;
else
- __up(sem);
+ __up(sem, &wake_q);
raw_spin_unlock_irqrestore(&sem->lock, flags);
+ if (!wake_q_empty(&wake_q))
+ wake_up_q(&wake_q);
}
EXPORT_SYMBOL(up);
@@ -269,11 +273,12 @@ static noinline int __sched __down_timeout(struct semaphore *sem, long timeout)
return __down_common(sem, TASK_UNINTERRUPTIBLE, timeout);
}
-static noinline void __sched __up(struct semaphore *sem)
+static noinline void __sched __up(struct semaphore *sem,
+ struct wake_q_head *wake_q)
{
struct semaphore_waiter *waiter = list_first_entry(&sem->wait_list,
struct semaphore_waiter, list);
list_del(&waiter->list);
waiter->up = true;
- wake_up_process(waiter->task);
+ wake_q_add(wake_q, waiter->task);
}
--
2.39.5
ufs-exynos driver enables the shareability option for gs101 which
means the descriptors need to be allocated as cacheable.
Fix the DT node and update bindings to add the dma-coherent property.
This fixes the UFS stability issues we have seen with the upstream
UFS driver.
Note this DT fix can go in independently of the other UFS fixes series
I sent recently [1], as the bootloader already leaves the sharability
bits enabled.
regards,
Peter
[1] https://lore.kernel.org/linux-scsi/20250226220414.343659-1-peter.griffin@li…
To: André Draszik <andre.draszik(a)linaro.org>
To: Tudor Ambarus <tudor.ambarus(a)linaro.org>
To: Rob Herring <robh(a)kernel.org>
To: Krzysztof Kozlowski <krzk+dt(a)kernel.org>
To: Conor Dooley <conor+dt(a)kernel.org>
To: Alim Akhtar <alim.akhtar(a)samsung.com>
To: Avri Altman <avri.altman(a)wdc.com>
To: Bart Van Assche <bvanassche(a)acm.org>
To: Martin K. Petersen <martin.petersen(a)oracle.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: linux-samsung-soc(a)vger.kernel.org
Cc: devicetree(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-scsi(a)vger.kernel.org
Cc: kernel-team(a)android.com
Cc: willmcvicker(a)google.com
Signed-off-by: Peter Griffin <peter.griffin(a)linaro.org>
---
Peter Griffin (2):
arm64: dts: exynos: gs101: ufs: add dma-coherent property
scsi: ufs: dt-bindings: exynos: add dma-coherent property for gs101
Documentation/devicetree/bindings/ufs/samsung,exynos-ufs.yaml | 2 ++
arch/arm64/boot/dts/exynos/google/gs101.dtsi | 1 +
2 files changed, 3 insertions(+)
---
base-commit: b323d8e7bc03d27dec646bfdccb7d1a92411f189
change-id: 20250314-ufs-dma-coherent-980f2467690d
Best regards,
--
Peter Griffin <peter.griffin(a)linaro.org>
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 5daa0c35a1f0e7a6c3b8ba9cb721e7d1ace6e619
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031621-july-parkway-796f@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5daa0c35a1f0e7a6c3b8ba9cb721e7d1ace6e619 Mon Sep 17 00:00:00 2001
From: Matthew Maurer <mmaurer(a)google.com>
Date: Wed, 8 Jan 2025 23:35:08 +0000
Subject: [PATCH] rust: Disallow BTF generation with Rust + LTO
The kernel cannot currently self-parse BTF containing Rust debug
information. pahole uses the language of the CU to determine whether to
filter out debug information when generating the BTF. When LTO is
enabled, Rust code can cross CU boundaries, resulting in Rust debug
information in CUs labeled as C. This results in a system which cannot
parse its own BTF.
Signed-off-by: Matthew Maurer <mmaurer(a)google.com>
Cc: stable(a)vger.kernel.org
Fixes: c1177979af9c ("btf, scripts: Exclude Rust CUs with pahole")
Link: https://lore.kernel.org/r/20250108-rust-btf-lto-incompat-v1-1-60243ff6d820@…
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
diff --git a/init/Kconfig b/init/Kconfig
index d0d021b3fa3b..324c2886b2ea 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1973,7 +1973,7 @@ config RUST
depends on !MODVERSIONS || GENDWARFKSYMS
depends on !GCC_PLUGIN_RANDSTRUCT
depends on !RANDSTRUCT
- depends on !DEBUG_INFO_BTF || PAHOLE_HAS_LANG_EXCLUDE
+ depends on !DEBUG_INFO_BTF || (PAHOLE_HAS_LANG_EXCLUDE && !LTO)
depends on !CFI_CLANG || HAVE_CFI_ICALL_NORMALIZE_INTEGERS_RUSTC
select CFI_ICALL_NORMALIZE_INTEGERS if CFI_CLANG
depends on !CALL_PADDING || RUSTC_VERSION >= 108100
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 5daa0c35a1f0e7a6c3b8ba9cb721e7d1ace6e619
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031622-rocker-narrow-b1e3@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5daa0c35a1f0e7a6c3b8ba9cb721e7d1ace6e619 Mon Sep 17 00:00:00 2001
From: Matthew Maurer <mmaurer(a)google.com>
Date: Wed, 8 Jan 2025 23:35:08 +0000
Subject: [PATCH] rust: Disallow BTF generation with Rust + LTO
The kernel cannot currently self-parse BTF containing Rust debug
information. pahole uses the language of the CU to determine whether to
filter out debug information when generating the BTF. When LTO is
enabled, Rust code can cross CU boundaries, resulting in Rust debug
information in CUs labeled as C. This results in a system which cannot
parse its own BTF.
Signed-off-by: Matthew Maurer <mmaurer(a)google.com>
Cc: stable(a)vger.kernel.org
Fixes: c1177979af9c ("btf, scripts: Exclude Rust CUs with pahole")
Link: https://lore.kernel.org/r/20250108-rust-btf-lto-incompat-v1-1-60243ff6d820@…
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
diff --git a/init/Kconfig b/init/Kconfig
index d0d021b3fa3b..324c2886b2ea 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1973,7 +1973,7 @@ config RUST
depends on !MODVERSIONS || GENDWARFKSYMS
depends on !GCC_PLUGIN_RANDSTRUCT
depends on !RANDSTRUCT
- depends on !DEBUG_INFO_BTF || PAHOLE_HAS_LANG_EXCLUDE
+ depends on !DEBUG_INFO_BTF || (PAHOLE_HAS_LANG_EXCLUDE && !LTO)
depends on !CFI_CLANG || HAVE_CFI_ICALL_NORMALIZE_INTEGERS_RUSTC
select CFI_ICALL_NORMALIZE_INTEGERS if CFI_CLANG
depends on !CALL_PADDING || RUSTC_VERSION >= 108100
The patch below does not apply to the 6.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.13.y
git checkout FETCH_HEAD
git cherry-pick -x 9360dfe4cbd62ff1eb8217b815964931523b75b3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031611-flatly-revolving-8c7b@gregkh' --subject-prefix 'PATCH 6.13.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9360dfe4cbd62ff1eb8217b815964931523b75b3 Mon Sep 17 00:00:00 2001
From: Andrea Righi <arighi(a)nvidia.com>
Date: Mon, 3 Mar 2025 18:51:59 +0100
Subject: [PATCH] sched_ext: Validate prev_cpu in scx_bpf_select_cpu_dfl()
If a BPF scheduler provides an invalid CPU (outside the nr_cpu_ids
range) as prev_cpu to scx_bpf_select_cpu_dfl() it can cause a kernel
crash.
To prevent this, validate prev_cpu in scx_bpf_select_cpu_dfl() and
trigger an scx error if an invalid CPU is specified.
Fixes: f0e1a0643a59b ("sched_ext: Implement BPF extensible scheduler class")
Cc: stable(a)vger.kernel.org # v6.12+
Signed-off-by: Andrea Righi <arighi(a)nvidia.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 0f1da199cfc7..7b9dfee858e7 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -6422,6 +6422,9 @@ static bool check_builtin_idle_enabled(void)
__bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
u64 wake_flags, bool *is_idle)
{
+ if (!ops_cpu_valid(prev_cpu, NULL))
+ goto prev_cpu;
+
if (!check_builtin_idle_enabled())
goto prev_cpu;
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 9360dfe4cbd62ff1eb8217b815964931523b75b3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031614-broiler-overgrown-4bd4@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9360dfe4cbd62ff1eb8217b815964931523b75b3 Mon Sep 17 00:00:00 2001
From: Andrea Righi <arighi(a)nvidia.com>
Date: Mon, 3 Mar 2025 18:51:59 +0100
Subject: [PATCH] sched_ext: Validate prev_cpu in scx_bpf_select_cpu_dfl()
If a BPF scheduler provides an invalid CPU (outside the nr_cpu_ids
range) as prev_cpu to scx_bpf_select_cpu_dfl() it can cause a kernel
crash.
To prevent this, validate prev_cpu in scx_bpf_select_cpu_dfl() and
trigger an scx error if an invalid CPU is specified.
Fixes: f0e1a0643a59b ("sched_ext: Implement BPF extensible scheduler class")
Cc: stable(a)vger.kernel.org # v6.12+
Signed-off-by: Andrea Righi <arighi(a)nvidia.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 0f1da199cfc7..7b9dfee858e7 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -6422,6 +6422,9 @@ static bool check_builtin_idle_enabled(void)
__bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
u64 wake_flags, bool *is_idle)
{
+ if (!ops_cpu_valid(prev_cpu, NULL))
+ goto prev_cpu;
+
if (!check_builtin_idle_enabled())
goto prev_cpu;
Set the MMC_CAP_NEED_RSP_BUSY capability for the sdhci-pxav3 host to
prevent conversion of R1B responses to R1. Without this, the eMMC card
in the samsung,coreprimevelte smartphone using the Marvell PXA1908 SoC
with this mmc host doesn't probe with the ETIMEDOUT error originating in
__mmc_poll_for_busy.
Note that the other issues reported for this phone and host, namely
floods of "Tuning failed, falling back to fixed sampling clock" dmesg
messages for the eMMC and unstable SDIO are not mitigated by this
change.
Link: https://lore.kernel.org/r/20200310153340.5593-1-ulf.hansson@linaro.org/
Link: https://lore.kernel.org/r/D7204PWIGQGI.1FRFQPPIEE2P9@matfyz.cz/
Link: https://lore.kernel.org/r/20250115-pxa1908-lkml-v14-0-847d24f3665a@skole.hr/
Cc: Duje Mihanović <duje.mihanovic(a)skole.hr>
Cc: stable(a)vger.kernel.org
Signed-off-by: Karel Balej <balejk(a)matfyz.cz>
---
drivers/mmc/host/sdhci-pxav3.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/mmc/host/sdhci-pxav3.c b/drivers/mmc/host/sdhci-pxav3.c
index 990723a008ae..3fb56face3d8 100644
--- a/drivers/mmc/host/sdhci-pxav3.c
+++ b/drivers/mmc/host/sdhci-pxav3.c
@@ -399,6 +399,7 @@ static int sdhci_pxav3_probe(struct platform_device *pdev)
if (!IS_ERR(pxa->clk_core))
clk_prepare_enable(pxa->clk_core);
+ host->mmc->caps |= MMC_CAP_NEED_RSP_BUSY;
/* enable 1/8V DDR capable */
host->mmc->caps |= MMC_CAP_1_8V_DDR;
--
2.48.1
Add device awake calls in case of rproc boot and rproc shutdown path.
Currently, device awake call is only present in the recovery path
of remoteproc. If a user stops and starts rproc by using the sysfs
interface, then on pm suspension the firmware loading fails. Keep the
device awake in such a case just like it is done for the recovery path.
Fixes: a781e5aa59110 ("remoteproc: core: Prevent system suspend during remoteproc recovery")
Signed-off-by: Souradeep Chowdhury <quic_schowdhu(a)quicinc.com>
Cc: stable(a)vger.kernel.org
---
drivers/remoteproc/remoteproc_core.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index c2cf0d277729..908a7b8f6c7e 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1916,7 +1916,8 @@ int rproc_boot(struct rproc *rproc)
pr_err("invalid rproc handle\n");
return -EINVAL;
}
-
+
+ pm_stay_awake(rproc->dev.parent);
dev = &rproc->dev;
ret = mutex_lock_interruptible(&rproc->lock);
@@ -1961,6 +1962,7 @@ int rproc_boot(struct rproc *rproc)
atomic_dec(&rproc->power);
unlock_mutex:
mutex_unlock(&rproc->lock);
+ pm_relax(rproc->dev.parent);
return ret;
}
EXPORT_SYMBOL(rproc_boot);
@@ -1991,6 +1993,7 @@ int rproc_shutdown(struct rproc *rproc)
struct device *dev = &rproc->dev;
int ret = 0;
+ pm_stay_awake(rproc->dev.parent);
ret = mutex_lock_interruptible(&rproc->lock);
if (ret) {
dev_err(dev, "can't lock rproc %s: %d\n", rproc->name, ret);
@@ -2027,6 +2030,7 @@ int rproc_shutdown(struct rproc *rproc)
rproc->table_ptr = NULL;
out:
mutex_unlock(&rproc->lock);
+ pm_relax(rproc->dev.parent);
return ret;
}
EXPORT_SYMBOL(rproc_shutdown);
--
2.34.1
Add device awake calls in case of rproc boot and rproc shutdown path.
Currently, device awake call is only present in the recovery path
of remoteproc. If a user stops and starts rproc by using the sysfs
interface, then on pm suspension the firmware loading fails. Keep the
device awake in such a case just like it is done for the recovery path.
Signed-off-by: Souradeep Chowdhury <quic_schowdhu(a)quicinc.com>
---
drivers/remoteproc/remoteproc_core.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index c2cf0d277729..908a7b8f6c7e 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1916,7 +1916,8 @@ int rproc_boot(struct rproc *rproc)
pr_err("invalid rproc handle\n");
return -EINVAL;
}
-
+
+ pm_stay_awake(rproc->dev.parent);
dev = &rproc->dev;
ret = mutex_lock_interruptible(&rproc->lock);
@@ -1961,6 +1962,7 @@ int rproc_boot(struct rproc *rproc)
atomic_dec(&rproc->power);
unlock_mutex:
mutex_unlock(&rproc->lock);
+ pm_relax(rproc->dev.parent);
return ret;
}
EXPORT_SYMBOL(rproc_boot);
@@ -1991,6 +1993,7 @@ int rproc_shutdown(struct rproc *rproc)
struct device *dev = &rproc->dev;
int ret = 0;
+ pm_stay_awake(rproc->dev.parent);
ret = mutex_lock_interruptible(&rproc->lock);
if (ret) {
dev_err(dev, "can't lock rproc %s: %d\n", rproc->name, ret);
@@ -2027,6 +2030,7 @@ int rproc_shutdown(struct rproc *rproc)
rproc->table_ptr = NULL;
out:
mutex_unlock(&rproc->lock);
+ pm_relax(rproc->dev.parent);
return ret;
}
EXPORT_SYMBOL(rproc_shutdown);
--
2.34.1
Add device awake calls in case of rproc boot and rproc shutdown path.
Currently, device awake call is only present in the recovery path
of remoteproc. If a user stops and starts rproc by using the sysfs
interface, then on pm suspension the firmware loading fails. Keep the
device awake in such a case just like it is done for the recovery path.
Signed-off-by: Souradeep Chowdhury <quic_schowdhu(a)quicinc.com>
Cc: stable(a)vger.kernel.org
---
drivers/remoteproc/remoteproc_core.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index c2cf0d277729..908a7b8f6c7e 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1916,7 +1916,8 @@ int rproc_boot(struct rproc *rproc)
pr_err("invalid rproc handle\n");
return -EINVAL;
}
-
+
+ pm_stay_awake(rproc->dev.parent);
dev = &rproc->dev;
ret = mutex_lock_interruptible(&rproc->lock);
@@ -1961,6 +1962,7 @@ int rproc_boot(struct rproc *rproc)
atomic_dec(&rproc->power);
unlock_mutex:
mutex_unlock(&rproc->lock);
+ pm_relax(rproc->dev.parent);
return ret;
}
EXPORT_SYMBOL(rproc_boot);
@@ -1991,6 +1993,7 @@ int rproc_shutdown(struct rproc *rproc)
struct device *dev = &rproc->dev;
int ret = 0;
+ pm_stay_awake(rproc->dev.parent);
ret = mutex_lock_interruptible(&rproc->lock);
if (ret) {
dev_err(dev, "can't lock rproc %s: %d\n", rproc->name, ret);
@@ -2027,6 +2030,7 @@ int rproc_shutdown(struct rproc *rproc)
rproc->table_ptr = NULL;
out:
mutex_unlock(&rproc->lock);
+ pm_relax(rproc->dev.parent);
return ret;
}
EXPORT_SYMBOL(rproc_shutdown);
--
2.34.1
The quilt patch titled
Subject: sparc/mm: avoid calling arch_enter/leave_lazy_mmu() in set_ptes
has been removed from the -mm tree. Its filename was
sparc-mm-avoid-calling-arch_enter-leave_lazy_mmu-in-set_ptes.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: sparc/mm: avoid calling arch_enter/leave_lazy_mmu() in set_ptes
Date: Mon, 3 Mar 2025 14:15:38 +0000
With commit 1a10a44dfc1d ("sparc64: implement the new page table range
API") set_ptes was added to the sparc architecture. The implementation
included calling arch_enter/leave_lazy_mmu() calls.
The patch removes the usage of arch_enter/leave_lazy_mmu() since this
implies nesting of lazy mmu regions which is not supported. Without this
fix, lazy mmu mode is effectively disabled because we exit the mode after
the first set_ptes:
remap_pte_range()
-> arch_enter_lazy_mmu()
-> set_ptes()
-> arch_enter_lazy_mmu()
-> arch_leave_lazy_mmu()
-> arch_leave_lazy_mmu()
Powerpc suffered the same problem and fixed it in a corresponding way with
commit 47b8def9358c ("powerpc/mm: Avoid calling
arch_enter/leave_lazy_mmu() in set_ptes").
Link: https://lkml.kernel.org/r/20250303141542.3371656-5-ryan.roberts@arm.com
Fixes: 1a10a44dfc1d ("sparc64: implement the new page table range API")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Andreas Larsson <andreas(a)gaisler.com>
Acked-by: Juergen Gross <jgross(a)suse.com>
Cc: Borislav Betkov <bp(a)alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Juegren Gross <jgross(a)suse.com>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: Thomas Gleinxer <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/sparc/include/asm/pgtable_64.h | 2 --
1 file changed, 2 deletions(-)
--- a/arch/sparc/include/asm/pgtable_64.h~sparc-mm-avoid-calling-arch_enter-leave_lazy_mmu-in-set_ptes
+++ a/arch/sparc/include/asm/pgtable_64.h
@@ -936,7 +936,6 @@ static inline void __set_pte_at(struct m
static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, unsigned int nr)
{
- arch_enter_lazy_mmu_mode();
for (;;) {
__set_pte_at(mm, addr, ptep, pte, 0);
if (--nr == 0)
@@ -945,7 +944,6 @@ static inline void set_ptes(struct mm_st
pte_val(pte) += PAGE_SIZE;
addr += PAGE_SIZE;
}
- arch_leave_lazy_mmu_mode();
}
#define set_ptes set_ptes
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
mm-use-ptep_get-instead-of-directly-dereferencing-pte_t.patch
The quilt patch titled
Subject: sparc/mm: disable preemption in lazy mmu mode
has been removed from the -mm tree. Its filename was
sparc-mm-disable-preemption-in-lazy-mmu-mode.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: sparc/mm: disable preemption in lazy mmu mode
Date: Mon, 3 Mar 2025 14:15:37 +0000
Since commit 38e0edb15bd0 ("mm/apply_to_range: call pte function with lazy
updates") it's been possible for arch_[enter|leave]_lazy_mmu_mode() to be
called without holding a page table lock (for the kernel mappings case),
and therefore it is possible that preemption may occur while in the lazy
mmu mode. The Sparc lazy mmu implementation is not robust to preemption
since it stores the lazy mode state in a per-cpu structure and does not
attempt to manage that state on task switch.
Powerpc had the same issue and fixed it by explicitly disabling preemption
in arch_enter_lazy_mmu_mode() and re-enabling in
arch_leave_lazy_mmu_mode(). See commit b9ef323ea168 ("powerpc/64s:
Disable preemption in hash lazy mmu mode").
Given Sparc's lazy mmu mode is based on powerpc's, let's fix it in the
same way here.
Link: https://lkml.kernel.org/r/20250303141542.3371656-4-ryan.roberts@arm.com
Fixes: 38e0edb15bd0 ("mm/apply_to_range: call pte function with lazy updates")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Andreas Larsson <andreas(a)gaisler.com>
Acked-by: Juergen Gross <jgross(a)suse.com>
Cc: Borislav Betkov <bp(a)alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Juegren Gross <jgross(a)suse.com>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: Thomas Gleinxer <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/sparc/mm/tlb.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/arch/sparc/mm/tlb.c~sparc-mm-disable-preemption-in-lazy-mmu-mode
+++ a/arch/sparc/mm/tlb.c
@@ -52,8 +52,10 @@ out:
void arch_enter_lazy_mmu_mode(void)
{
- struct tlb_batch *tb = this_cpu_ptr(&tlb_batch);
+ struct tlb_batch *tb;
+ preempt_disable();
+ tb = this_cpu_ptr(&tlb_batch);
tb->active = 1;
}
@@ -64,6 +66,7 @@ void arch_leave_lazy_mmu_mode(void)
if (tb->tlb_nr)
flush_tlb_pending();
tb->active = 0;
+ preempt_enable();
}
static void tlb_batch_add_one(struct mm_struct *mm, unsigned long vaddr,
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
mm-use-ptep_get-instead-of-directly-dereferencing-pte_t.patch
The quilt patch titled
Subject: mm: fix lazy mmu docs and usage
has been removed from the -mm tree. Its filename was
mm-fix-lazy-mmu-docs-and-usage.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: mm: fix lazy mmu docs and usage
Date: Mon, 3 Mar 2025 14:15:35 +0000
Patch series "Fix lazy mmu mode", v2.
I'm planning to implement lazy mmu mode for arm64 to optimize vmalloc. As
part of that, I will extend lazy mmu mode to cover kernel mappings in
vmalloc table walkers. While lazy mmu mode is already used for kernel
mappings in a few places, this will extend it's use significantly.
Having reviewed the existing lazy mmu implementations in powerpc, sparc
and x86, it looks like there are a bunch of bugs, some of which may be
more likely to trigger once I extend the use of lazy mmu. So this series
attempts to clarify the requirements and fix all the bugs in advance of
that series. See patch #1 commit log for all the details.
This patch (of 5):
The docs, implementations and use of arch_[enter|leave]_lazy_mmu_mode() is
a bit of a mess (to put it politely). There are a number of issues
related to nesting of lazy mmu regions and confusion over whether the
task, when in a lazy mmu region, is preemptible or not. Fix all the
issues relating to the core-mm. Follow up commits will fix the
arch-specific implementations. 3 arches implement lazy mmu; powerpc,
sparc and x86.
When arch_[enter|leave]_lazy_mmu_mode() was first introduced by commit
6606c3e0da53 ("[PATCH] paravirt: lazy mmu mode hooks.patch"), it was
expected that lazy mmu regions would never nest and that the appropriate
page table lock(s) would be held while in the region, thus ensuring the
region is non-preemptible. Additionally lazy mmu regions were only used
during manipulation of user mappings.
Commit 38e0edb15bd0 ("mm/apply_to_range: call pte function with lazy
updates") started invoking the lazy mmu mode in apply_to_pte_range(),
which is used for both user and kernel mappings. For kernel mappings the
region is no longer protected by any lock so there is no longer any
guarantee about non-preemptibility. Additionally, for RT configs, the
holding the PTL only implies no CPU migration, it doesn't prevent
preemption.
Commit bcc6cc832573 ("mm: add default definition of set_ptes()") added
arch_[enter|leave]_lazy_mmu_mode() to the default implementation of
set_ptes(), used by x86. So after this commit, lazy mmu regions can be
nested. Additionally commit 1a10a44dfc1d ("sparc64: implement the new
page table range API") and commit 9fee28baa601 ("powerpc: implement the
new page table range API") did the same for the sparc and powerpc
set_ptes() overrides.
powerpc couldn't deal with preemption so avoids it in commit b9ef323ea168
("powerpc/64s: Disable preemption in hash lazy mmu mode"), which
explicitly disables preemption for the whole region in its implementation.
x86 can support preemption (or at least it could until it tried to add
support nesting; more on this below). Sparc looks to be totally broken in
the face of preemption, as far as I can tell.
powerpc can't deal with nesting, so avoids it in commit 47b8def9358c
("powerpc/mm: Avoid calling arch_enter/leave_lazy_mmu() in set_ptes"),
which removes the lazy mmu calls from its implementation of set_ptes().
x86 attempted to support nesting in commit 49147beb0ccb ("x86/xen: allow
nesting of same lazy mode") but as far as I can tell, this breaks its
support for preemption.
In short, it's all a mess; the semantics for
arch_[enter|leave]_lazy_mmu_mode() are not clearly defined and as a result
the implementations all have different expectations, sticking plasters and
bugs.
arm64 is aiming to start using these hooks, so let's clean everything up
before adding an arm64 implementation. Update the documentation to state
that lazy mmu regions can never be nested, must not be called in interrupt
context and preemption may or may not be enabled for the duration of the
region. And fix the generic implementation of set_ptes() to avoid
nesting.
arch-specific fixes to conform to the new spec will proceed this one.
These issues were spotted by code review and I have no evidence of issues
being reported in the wild.
Link: https://lkml.kernel.org/r/20250303141542.3371656-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20250303141542.3371656-2-ryan.roberts@arm.com
Fixes: bcc6cc832573 ("mm: add default definition of set_ptes()")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Juergen Gross <jgross(a)suse.com>
Cc: Andreas Larsson <andreas(a)gaisler.com>
Cc: Borislav Betkov <bp(a)alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Juegren Gross <jgross(a)suse.com>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: Thomas Gleinxer <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/pgtable.h | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
--- a/include/linux/pgtable.h~mm-fix-lazy-mmu-docs-and-usage
+++ a/include/linux/pgtable.h
@@ -222,10 +222,14 @@ static inline int pmd_dirty(pmd_t pmd)
* hazard could result in the direct mode hypervisor case, since the actual
* write to the page tables may not yet have taken place, so reads though
* a raw PTE pointer after it has been modified are not guaranteed to be
- * up to date. This mode can only be entered and left under the protection of
- * the page table locks for all page tables which may be modified. In the UP
- * case, this is required so that preemption is disabled, and in the SMP case,
- * it must synchronize the delayed page table writes properly on other CPUs.
+ * up to date.
+ *
+ * In the general case, no lock is guaranteed to be held between entry and exit
+ * of the lazy mode. So the implementation must assume preemption may be enabled
+ * and cpu migration is possible; it must take steps to be robust against this.
+ * (In practice, for user PTE updates, the appropriate page table lock(s) are
+ * held, but for kernel PTE updates, no lock is held). Nesting is not permitted
+ * and the mode cannot be used in interrupt context.
*/
#ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE
#define arch_enter_lazy_mmu_mode() do {} while (0)
@@ -287,7 +291,6 @@ static inline void set_ptes(struct mm_st
{
page_table_check_ptes_set(mm, ptep, pte, nr);
- arch_enter_lazy_mmu_mode();
for (;;) {
set_pte(ptep, pte);
if (--nr == 0)
@@ -295,7 +298,6 @@ static inline void set_ptes(struct mm_st
ptep++;
pte = pte_next_pfn(pte);
}
- arch_leave_lazy_mmu_mode();
}
#endif
#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
mm-use-ptep_get-instead-of-directly-dereferencing-pte_t.patch
The quilt patch titled
Subject: mm: make page_mapped_in_vma() hugetlb walk aware
has been removed from the -mm tree. Its filename was
mm-make-page_mapped_in_vma-hugetlb-walk-aware.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jane Chu <jane.chu(a)oracle.com>
Subject: mm: make page_mapped_in_vma() hugetlb walk aware
Date: Mon, 24 Feb 2025 14:14:45 -0700
When a process consumes a UE in a page, the memory failure handler
attempts to collect information for a potential SIGBUS. If the page is an
anonymous page, page_mapped_in_vma(page, vma) is invoked in order to
1. retrieve the vaddr from the process' address space,
2. verify that the vaddr is indeed mapped to the poisoned page,
where 'page' is the precise small page with UE.
It's been observed that when injecting poison to a non-head subpage of an
anonymous hugetlb page, no SIGBUS shows up, while injecting to the head
page produces a SIGBUS. The cause is that, though hugetlb_walk() returns
a valid pmd entry (on x86), but check_pte() detects mismatch between the
head page per the pmd and the input subpage. Thus the vaddr is considered
not mapped to the subpage and the process is not collected for SIGBUS
purpose. This is the calling stack:
collect_procs_anon
page_mapped_in_vma
page_vma_mapped_walk
hugetlb_walk
huge_pte_lock
check_pte
check_pte() header says that it
"check if [pvmw->pfn, @pvmw->pfn + @pvmw->nr_pages) is mapped at the @pvmw->pte"
but practically works only if pvmw->pfn is the head page pfn at pvmw->pte.
Hindsight acknowledging that some pvmw->pte could point to a hugepage of
some sort such that it makes sense to make check_pte() work for hugepage.
Link: https://lkml.kernel.org/r/20250224211445.2663312-1-jane.chu@oracle.com
Signed-off-by: Jane Chu <jane.chu(a)oracle.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Kirill A. Shuemov <kirill.shutemov(a)linux.intel.com>
Cc: linmiaohe <linmiaohe(a)huawei.com>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_vma_mapped.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
--- a/mm/page_vma_mapped.c~mm-make-page_mapped_in_vma-hugetlb-walk-aware
+++ a/mm/page_vma_mapped.c
@@ -84,6 +84,7 @@ again:
* mapped at the @pvmw->pte
* @pvmw: page_vma_mapped_walk struct, includes a pair pte and pfn range
* for checking
+ * @pte_nr: the number of small pages described by @pvmw->pte.
*
* page_vma_mapped_walk() found a place where pfn range is *potentially*
* mapped. check_pte() has to validate this.
@@ -100,7 +101,7 @@ again:
* Otherwise, return false.
*
*/
-static bool check_pte(struct page_vma_mapped_walk *pvmw)
+static bool check_pte(struct page_vma_mapped_walk *pvmw, unsigned long pte_nr)
{
unsigned long pfn;
pte_t ptent = ptep_get(pvmw->pte);
@@ -132,7 +133,11 @@ static bool check_pte(struct page_vma_ma
pfn = pte_pfn(ptent);
}
- return (pfn - pvmw->pfn) < pvmw->nr_pages;
+ if ((pfn + pte_nr - 1) < pvmw->pfn)
+ return false;
+ if (pfn > (pvmw->pfn + pvmw->nr_pages - 1))
+ return false;
+ return true;
}
/* Returns true if the two ranges overlap. Careful to not overflow. */
@@ -207,7 +212,7 @@ bool page_vma_mapped_walk(struct page_vm
return false;
pvmw->ptl = huge_pte_lock(hstate, mm, pvmw->pte);
- if (!check_pte(pvmw))
+ if (!check_pte(pvmw, pages_per_huge_page(hstate)))
return not_found(pvmw);
return true;
}
@@ -290,7 +295,7 @@ restart:
goto next_pte;
}
this_pte:
- if (check_pte(pvmw))
+ if (check_pte(pvmw, 1))
return true;
next_pte:
do {
_
Patches currently in -mm which might be from jane.chu(a)oracle.com are
The quilt patch titled
Subject: mm/damon: avoid applying DAMOS action to same entity multiple times
has been removed from the -mm tree. Its filename was
mm-damon-avoid-applying-damos-action-to-same-entity-multiple-times.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon: avoid applying DAMOS action to same entity multiple times
Date: Fri, 7 Feb 2025 13:20:33 -0800
'paddr' DAMON operations set can apply a DAMOS scheme's action to a large
folio multiple times in single DAMOS-regions-walk if the folio is laid on
multiple DAMON regions. Add a field for DAMOS scheme object that can be
used by the underlying ops to know what was the last entity that the
scheme's action has applied. The core layer unsets the field when each
DAMOS-regions-walk is done for the given scheme. And update 'paddr' ops
to use the infrastructure to avoid the problem.
Link: https://lkml.kernel.org/r/20250207212033.45269-3-sj@kernel.org
Fixes: 57223ac29584 ("mm/damon/paddr: support the pageout scheme")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Reported-by: Usama Arif <usamaarif642(a)gmail.com>
Closes: https://lore.kernel.org/20250203225604.44742-3-usamaarif642@gmail.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/damon.h | 11 +++++++++++
mm/damon/core.c | 1 +
mm/damon/paddr.c | 39 +++++++++++++++++++++++++++------------
3 files changed, 39 insertions(+), 12 deletions(-)
--- a/include/linux/damon.h~mm-damon-avoid-applying-damos-action-to-same-entity-multiple-times
+++ a/include/linux/damon.h
@@ -432,6 +432,7 @@ struct damos_access_pattern {
* @wmarks: Watermarks for automated (in)activation of this scheme.
* @target_nid: Destination node if @action is "migrate_{hot,cold}".
* @filters: Additional set of &struct damos_filter for &action.
+ * @last_applied: Last @action applied ops-managing entity.
* @stat: Statistics of this scheme.
* @list: List head for siblings.
*
@@ -454,6 +455,15 @@ struct damos_access_pattern {
* implementation could check pages of the region and skip &action to respect
* &filters
*
+ * The minimum entity that @action can be applied depends on the underlying
+ * &struct damon_operations. Since it may not be aligned with the core layer
+ * abstract, namely &struct damon_region, &struct damon_operations could apply
+ * @action to same entity multiple times. Large folios that underlying on
+ * multiple &struct damon region objects could be such examples. The &struct
+ * damon_operations can use @last_applied to avoid that. DAMOS core logic
+ * unsets @last_applied when each regions walking for applying the scheme is
+ * finished.
+ *
* After applying the &action to each region, &stat_count and &stat_sz is
* updated to reflect the number of regions and total size of regions that the
* &action is applied.
@@ -482,6 +492,7 @@ struct damos {
int target_nid;
};
struct list_head filters;
+ void *last_applied;
struct damos_stat stat;
struct list_head list;
};
--- a/mm/damon/core.c~mm-damon-avoid-applying-damos-action-to-same-entity-multiple-times
+++ a/mm/damon/core.c
@@ -1856,6 +1856,7 @@ static void kdamond_apply_schemes(struct
s->next_apply_sis = c->passed_sample_intervals +
(s->apply_interval_us ? s->apply_interval_us :
c->attrs.aggr_interval) / sample_interval;
+ s->last_applied = NULL;
}
}
--- a/mm/damon/paddr.c~mm-damon-avoid-applying-damos-action-to-same-entity-multiple-times
+++ a/mm/damon/paddr.c
@@ -254,6 +254,17 @@ static bool damos_pa_filter_out(struct d
return false;
}
+static bool damon_pa_invalid_damos_folio(struct folio *folio, struct damos *s)
+{
+ if (!folio)
+ return true;
+ if (folio == s->last_applied) {
+ folio_put(folio);
+ return true;
+ }
+ return false;
+}
+
static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s,
unsigned long *sz_filter_passed)
{
@@ -261,6 +272,7 @@ static unsigned long damon_pa_pageout(st
LIST_HEAD(folio_list);
bool install_young_filter = true;
struct damos_filter *filter;
+ struct folio *folio;
/* check access in page level again by default */
damos_for_each_filter(filter, s) {
@@ -279,9 +291,8 @@ static unsigned long damon_pa_pageout(st
addr = r->ar.start;
while (addr < r->ar.end) {
- struct folio *folio = damon_get_folio(PHYS_PFN(addr));
-
- if (!folio) {
+ folio = damon_get_folio(PHYS_PFN(addr));
+ if (damon_pa_invalid_damos_folio(folio, s)) {
addr += PAGE_SIZE;
continue;
}
@@ -307,6 +318,7 @@ put_folio:
damos_destroy_filter(filter);
applied = reclaim_pages(&folio_list);
cond_resched();
+ s->last_applied = folio;
return applied * PAGE_SIZE;
}
@@ -315,12 +327,12 @@ static inline unsigned long damon_pa_mar
unsigned long *sz_filter_passed)
{
unsigned long addr, applied = 0;
+ struct folio *folio;
addr = r->ar.start;
while (addr < r->ar.end) {
- struct folio *folio = damon_get_folio(PHYS_PFN(addr));
-
- if (!folio) {
+ folio = damon_get_folio(PHYS_PFN(addr));
+ if (damon_pa_invalid_damos_folio(folio, s)) {
addr += PAGE_SIZE;
continue;
}
@@ -339,6 +351,7 @@ put_folio:
addr += folio_size(folio);
folio_put(folio);
}
+ s->last_applied = folio;
return applied * PAGE_SIZE;
}
@@ -482,12 +495,12 @@ static unsigned long damon_pa_migrate(st
{
unsigned long addr, applied;
LIST_HEAD(folio_list);
+ struct folio *folio;
addr = r->ar.start;
while (addr < r->ar.end) {
- struct folio *folio = damon_get_folio(PHYS_PFN(addr));
-
- if (!folio) {
+ folio = damon_get_folio(PHYS_PFN(addr));
+ if (damon_pa_invalid_damos_folio(folio, s)) {
addr += PAGE_SIZE;
continue;
}
@@ -506,6 +519,7 @@ put_folio:
}
applied = damon_pa_migrate_pages(&folio_list, s->target_nid);
cond_resched();
+ s->last_applied = folio;
return applied * PAGE_SIZE;
}
@@ -523,15 +537,15 @@ static unsigned long damon_pa_stat(struc
{
unsigned long addr;
LIST_HEAD(folio_list);
+ struct folio *folio;
if (!damon_pa_scheme_has_filter(s))
return 0;
addr = r->ar.start;
while (addr < r->ar.end) {
- struct folio *folio = damon_get_folio(PHYS_PFN(addr));
-
- if (!folio) {
+ folio = damon_get_folio(PHYS_PFN(addr));
+ if (damon_pa_invalid_damos_folio(folio, s)) {
addr += PAGE_SIZE;
continue;
}
@@ -541,6 +555,7 @@ static unsigned long damon_pa_stat(struc
addr += folio_size(folio);
folio_put(folio);
}
+ s->last_applied = folio;
return 0;
}
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-sysfs-schemes-let-damon_sysfs_scheme_set_filters-be-used-for-different-named-directories.patch
mm-damon-sysfs-schemes-implement-core_filters-and-ops_filters-directories.patch
mm-damon-sysfs-schemes-commit-filters-in-coreops_filters-directories.patch
mm-damon-core-expose-damos_filter_for_ops-to-damon-kernel-api-callers.patch
mm-damon-sysfs-schemes-record-filters-of-which-layer-should-be-added-to-the-given-filters-directory.patch
mm-damon-sysfs-schemes-return-error-when-for-attempts-to-install-filters-on-wrong-sysfs-directory.patch
docs-abi-damon-document-coreops_filters-directories.patch
docs-admin-guide-mm-damon-usage-update-for-coreops_filters-directories.patch
mm-damon-sysfs-validate-user-inputs-from-damon_sysfs_commit_input.patch
mm-damon-core-invoke-kdamond_call-after-merging-is-done-if-possible.patch
mm-damon-core-make-damon_set_attrs-be-safe-to-be-called-from-damon_call.patch
mm-damon-sysfs-handle-commit-command-using-damon_call.patch
mm-damon-sysfs-remove-damon_sysfs_cmd_request-code-from-damon_sysfs_handle_cmd.patch
mm-damon-sysfs-remove-damon_sysfs_cmd_request_callback-and-its-callers.patch
mm-damon-sysfs-remove-damon_sysfs_cmd_request-and-its-readers.patch
mm-damon-sysfs-schemes-remove-obsolete-comment-for-damon_sysfs_schemes_clear_regions.patch
mm-damon-remove-damon_callback-private.patch
mm-damon-remove-before_start-of-damon_callback.patch
mm-damon-remove-damon_callback-after_sampling.patch
mm-damon-remove-damon_callback-before_damos_apply.patch
mm-damon-remove-damon_operations-reset_aggregated.patch
mm-damon-sysfs-schemes-avoid-wformat-security-warning-on-damon_sysfs_access_pattern_add_range_dir.patch
mm-madvise-use-is_memory_failure-from-madvise_do_behavior.patch
mm-madvise-split-out-populate-behavior-check-logic.patch
mm-madvise-deduplicate-madvise_do_behavior-skip-case-handlings.patch
mm-madvise-remove-len-parameter-of-madvise_do_behavior.patch
The quilt patch titled
Subject: mm/damon/ops: have damon_get_folio return folio even for tail pages
has been removed from the -mm tree. Its filename was
mm-damon-ops-have-damon_get_folio-return-folio-even-for-tail-pages.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Usama Arif <usamaarif642(a)gmail.com>
Subject: mm/damon/ops: have damon_get_folio return folio even for tail pages
Date: Fri, 7 Feb 2025 13:20:32 -0800
Patch series "mm/damon/paddr: fix large folios access and schemes handling".
DAMON operations set for physical address space, namely 'paddr', treats
tail pages as unaccessed always. It can also apply DAMOS action to a
large folio multiple times within single DAMOS' regions walking. As a
result, the monitoring output has poor quality and DAMOS works in
unexpected ways when large folios are being used. Fix those.
The patches were parts of Usama's hugepage_size DAMOS filter patch
series[1]. The first fix has collected from there with a slight commit
message change for the subject prefix. The second fix is re-written by SJ
and posted as an RFC before this series. The second one also got a slight
commit message change for the subject prefix.
[1] https://lore.kernel.org/20250203225604.44742-1-usamaarif642@gmail.com
[2] https://lore.kernel.org/20250206231103.38298-1-sj@kernel.org
This patch (of 2):
This effectively adds support for large folios in damon for paddr, as
damon_pa_mkold/young won't get a null folio from this function and won't
ignore it, hence access will be checked and reported. This also means
that larger folios will be considered for different DAMOS actions like
pageout, prioritization and migration. As these DAMOS actions will
consider larger folios, iterate through the region at folio_size and not
PAGE_SIZE intervals. This should not have an affect on vaddr, as
damon_young_pmd_entry considers pmd entries.
Link: https://lkml.kernel.org/r/20250207212033.45269-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250207212033.45269-2-sj@kernel.org
Fixes: a28397beb55b ("mm/damon: implement primitives for physical address space monitoring")
Signed-off-by: Usama Arif <usamaarif642(a)gmail.com>
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/ops-common.c | 2 +-
mm/damon/paddr.c | 24 ++++++++++++++++++------
2 files changed, 19 insertions(+), 7 deletions(-)
--- a/mm/damon/ops-common.c~mm-damon-ops-have-damon_get_folio-return-folio-even-for-tail-pages
+++ a/mm/damon/ops-common.c
@@ -26,7 +26,7 @@ struct folio *damon_get_folio(unsigned l
struct page *page = pfn_to_online_page(pfn);
struct folio *folio;
- if (!page || PageTail(page))
+ if (!page)
return NULL;
folio = page_folio(page);
--- a/mm/damon/paddr.c~mm-damon-ops-have-damon_get_folio-return-folio-even-for-tail-pages
+++ a/mm/damon/paddr.c
@@ -277,11 +277,14 @@ static unsigned long damon_pa_pageout(st
damos_add_filter(s, filter);
}
- for (addr = r->ar.start; addr < r->ar.end; addr += PAGE_SIZE) {
+ addr = r->ar.start;
+ while (addr < r->ar.end) {
struct folio *folio = damon_get_folio(PHYS_PFN(addr));
- if (!folio)
+ if (!folio) {
+ addr += PAGE_SIZE;
continue;
+ }
if (damos_pa_filter_out(s, folio))
goto put_folio;
@@ -297,6 +300,7 @@ static unsigned long damon_pa_pageout(st
else
list_add(&folio->lru, &folio_list);
put_folio:
+ addr += folio_size(folio);
folio_put(folio);
}
if (install_young_filter)
@@ -312,11 +316,14 @@ static inline unsigned long damon_pa_mar
{
unsigned long addr, applied = 0;
- for (addr = r->ar.start; addr < r->ar.end; addr += PAGE_SIZE) {
+ addr = r->ar.start;
+ while (addr < r->ar.end) {
struct folio *folio = damon_get_folio(PHYS_PFN(addr));
- if (!folio)
+ if (!folio) {
+ addr += PAGE_SIZE;
continue;
+ }
if (damos_pa_filter_out(s, folio))
goto put_folio;
@@ -329,6 +336,7 @@ static inline unsigned long damon_pa_mar
folio_deactivate(folio);
applied += folio_nr_pages(folio);
put_folio:
+ addr += folio_size(folio);
folio_put(folio);
}
return applied * PAGE_SIZE;
@@ -475,11 +483,14 @@ static unsigned long damon_pa_migrate(st
unsigned long addr, applied;
LIST_HEAD(folio_list);
- for (addr = r->ar.start; addr < r->ar.end; addr += PAGE_SIZE) {
+ addr = r->ar.start;
+ while (addr < r->ar.end) {
struct folio *folio = damon_get_folio(PHYS_PFN(addr));
- if (!folio)
+ if (!folio) {
+ addr += PAGE_SIZE;
continue;
+ }
if (damos_pa_filter_out(s, folio))
goto put_folio;
@@ -490,6 +501,7 @@ static unsigned long damon_pa_migrate(st
goto put_folio;
list_add(&folio->lru, &folio_list);
put_folio:
+ addr += folio_size(folio);
folio_put(folio);
}
applied = damon_pa_migrate_pages(&folio_list, s->target_nid);
_
Patches currently in -mm which might be from usamaarif642(a)gmail.com are
The quilt patch titled
Subject: mm/rmap: reject hugetlb folios in folio_make_device_exclusive()
has been removed from the -mm tree. Its filename was
mm-rmap-reject-hugetlb-folios-in-folio_make_device_exclusive.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/rmap: reject hugetlb folios in folio_make_device_exclusive()
Date: Mon, 10 Feb 2025 20:37:44 +0100
Even though FOLL_SPLIT_PMD on hugetlb now always fails with -EOPNOTSUPP,
let's add a safety net in case FOLL_SPLIT_PMD usage would ever be
reworked.
In particular, before commit 9cb28da54643 ("mm/gup: handle hugetlb in the
generic follow_page_mask code"), GUP(FOLL_SPLIT_PMD) would just have
returned a page. In particular, hugetlb folios that are not PMD-sized
would never have been prone to FOLL_SPLIT_PMD.
hugetlb folios can be anonymous, and page_make_device_exclusive_one() is
not really prepared for handling them at all. So let's spell that out.
Link: https://lkml.kernel.org/r/20250210193801.781278-3-david@redhat.com
Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Alistair Popple <apopple(a)nvidia.com>
Tested-by: Alistair Popple <apopple(a)nvidia.com>
Cc: Alex Shi <alexs(a)kernel.org>
Cc: Danilo Krummrich <dakr(a)kernel.org>
Cc: Dave Airlie <airlied(a)gmail.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Karol Herbst <kherbst(a)redhat.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Lyude <lyude(a)redhat.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat(a)kernel.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: SeongJae Park <sj(a)kernel.org>
Cc: Simona Vetter <simona.vetter(a)ffwll.ch>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Yanteng Si <si.yanteng(a)linux.dev>
Cc: Barry Song <v-songbaohua(a)oppo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/rmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/rmap.c~mm-rmap-reject-hugetlb-folios-in-folio_make_device_exclusive
+++ a/mm/rmap.c
@@ -2499,7 +2499,7 @@ static bool folio_make_device_exclusive(
* Restrict to anonymous folios for now to avoid potential writeback
* issues.
*/
- if (!folio_test_anon(folio))
+ if (!folio_test_anon(folio) || folio_test_hugetlb(folio))
return false;
rmap_walk(folio, &rwc);
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-factor-out-large-folio-handling-from-folio_order-into-folio_large_order.patch
mm-factor-out-large-folio-handling-from-folio_nr_pages-into-folio_large_nr_pages.patch
mm-let-_folio_nr_pages-overlay-memcg_data-in-first-tail-page.patch
mm-let-_folio_nr_pages-overlay-memcg_data-in-first-tail-page-fix.patch
mm-move-hugetlb-specific-things-in-folio-to-page.patch
mm-move-_pincount-in-folio-to-page-on-32bit.patch
mm-move-_entire_mapcount-in-folio-to-page-on-32bit.patch
mm-rmap-pass-dst_vma-to-folio_dup_file_rmap_pte-and-friends.patch
mm-rmap-pass-vma-to-__folio_add_rmap.patch
mm-rmap-abstract-large-mapcount-operations-for-large-folios-hugetlb.patch
bit_spinlock-__always_inline-unlock-functions.patch
mm-rmap-use-folio_large_nr_pages-in-add-remove-functions.patch
mm-rmap-basic-mm-owner-tracking-for-large-folios-hugetlb.patch
mm-copy-on-write-cow-reuse-support-for-pte-mapped-thp.patch
mm-convert-folio_likely_mapped_shared-to-folio_maybe_mapped_shared.patch
mm-config_no_page_mapcount-to-prepare-for-not-maintain-per-page-mapcounts-in-large-folios.patch
fs-proc-page-remove-per-page-mapcount-dependency-for-proc-kpagecount-config_no_page_mapcount.patch
fs-proc-task_mmu-remove-per-page-mapcount-dependency-for-pm_mmap_exclusive-config_no_page_mapcount.patch
fs-proc-task_mmu-remove-per-page-mapcount-dependency-for-mapmax-config_no_page_mapcount.patch
fs-proc-task_mmu-remove-per-page-mapcount-dependency-for-smaps-smaps_rollup-config_no_page_mapcount.patch
mm-stop-maintaining-the-per-page-mapcount-of-large-folios-config_no_page_mapcount.patch
The quilt patch titled
Subject: mm/gup: reject FOLL_SPLIT_PMD with hugetlb VMAs
has been removed from the -mm tree. Its filename was
mm-gup-reject-foll_split_pmd-with-hugetlb-vmas.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/gup: reject FOLL_SPLIT_PMD with hugetlb VMAs
Date: Mon, 10 Feb 2025 20:37:43 +0100
Patch series "mm: fixes for device-exclusive entries (hmm)", v2.
Discussing the PageTail() call in make_device_exclusive_range() with
Willy, I recently discovered [1] that device-exclusive handling does not
properly work with THP, making the hmm-tests selftests fail if THPs are
enabled on the system.
Looking into more details, I found that hugetlb is not properly fenced,
and I realized that something that was bugging me for longer -- how
device-exclusive entries interact with mapcounts -- completely breaks
migration/swapout/split/hwpoison handling of these folios while they have
device-exclusive PTEs.
The program below can be used to allocate 1 GiB worth of pages and making
them device-exclusive on a kernel with CONFIG_TEST_HMM.
Once they are device-exclusive, these folios cannot get swapped out
(proc$pid/smaps_rollup will always indicate 1 GiB RSS no matter how much
one forces memory reclaim), and when having a memory block onlined to
ZONE_MOVABLE, trying to offline it will loop forever and complain about
failed migration of a page that should be movable.
# echo offline > /sys/devices/system/memory/memory136/state
# echo online_movable > /sys/devices/system/memory/memory136/state
# ./hmm-swap &
... wait until everything is device-exclusive
# echo offline > /sys/devices/system/memory/memory136/state
[ 285.193431][T14882] page: refcount:2 mapcount:0 mapping:0000000000000000
index:0x7f20671f7 pfn:0x442b6a
[ 285.196618][T14882] memcg:ffff888179298000
[ 285.198085][T14882] anon flags: 0x5fff0000002091c(referenced|uptodate|
dirty|active|owner_2|swapbacked|node=1|zone=3|lastcpupid=0x7ff)
[ 285.201734][T14882] raw: ...
[ 285.204464][T14882] raw: ...
[ 285.207196][T14882] page dumped because: migration failure
[ 285.209072][T14882] page_owner tracks the page as allocated
[ 285.210915][T14882] page last allocated via order 0, migratetype
Movable, gfp_mask 0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO),
id 14926, tgid 14926 (hmm-swap), ts 254506295376, free_ts 227402023774
[ 285.216765][T14882] post_alloc_hook+0x197/0x1b0
[ 285.218874][T14882] get_page_from_freelist+0x76e/0x3280
[ 285.220864][T14882] __alloc_frozen_pages_noprof+0x38e/0x2740
[ 285.223302][T14882] alloc_pages_mpol+0x1fc/0x540
[ 285.225130][T14882] folio_alloc_mpol_noprof+0x36/0x340
[ 285.227222][T14882] vma_alloc_folio_noprof+0xee/0x1a0
[ 285.229074][T14882] __handle_mm_fault+0x2b38/0x56a0
[ 285.230822][T14882] handle_mm_fault+0x368/0x9f0
...
This series fixes all issues I found so far. There is no easy way to fix
without a bigger rework/cleanup. I have a bunch of cleanups on top (some
previous sent, some the result of the discussion in v1) that I will send
out separately once this landed and I get to it.
I wish we could just use some special present PROT_NONE PTEs instead of
these (non-present, non-none) fake-swap entries; but that just results in
the same problem we keep having (lack of spare PTE bits), and staring at
other similar fake-swap entries, that ship has sailed.
With this series, make_device_exclusive() doesn't actually belong into
mm/rmap.c anymore, but I'll leave moving that for another day.
I only tested this series with the hmm-tests selftests due to lack of HW,
so I'd appreciate some testing, especially if the interaction between two
GPUs wanting a device-exclusive entry works as expected.
<program>
#include <stdio.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/ioctl.h>
#define HMM_DMIRROR_EXCLUSIVE _IOWR('H', 0x05, struct hmm_dmirror_cmd)
struct hmm_dmirror_cmd {
__u64 addr;
__u64 ptr;
__u64 npages;
__u64 cpages;
__u64 faults;
};
const size_t size = 1 * 1024 * 1024 * 1024ul;
const size_t chunk_size = 2 * 1024 * 1024ul;
int main(void)
{
struct hmm_dmirror_cmd cmd;
size_t cur_size;
int fd, ret;
char *addr, *mirror;
fd = open("/dev/hmm_dmirror1", O_RDWR, 0);
if (fd < 0) {
perror("open failed\n");
exit(1);
}
addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (addr == MAP_FAILED) {
perror("mmap failed\n");
exit(1);
}
madvise(addr, size, MADV_NOHUGEPAGE);
memset(addr, 1, size);
mirror = malloc(chunk_size);
for (cur_size = 0; cur_size < size; cur_size += chunk_size) {
cmd.addr = (uintptr_t)addr + cur_size;
cmd.ptr = (uintptr_t)mirror;
cmd.npages = chunk_size / getpagesize();
ret = ioctl(fd, HMM_DMIRROR_EXCLUSIVE, &cmd);
if (ret) {
perror("ioctl failed\n");
exit(1);
}
}
pause();
return 0;
}
</program>
[1] https://lkml.kernel.org/r/25e02685-4f1d-47fa-be5b-01ff85bb0ce2@redhat.com
This patch (of 17):
We only have two FOLL_SPLIT_PMD users. While uprobe refuses hugetlb
early, make_device_exclusive_range() can end up getting called on hugetlb
VMAs.
Right now, this means that with a PMD-sized hugetlb page, we can end up
calling split_huge_pmd(), because pmd_trans_huge() also succeeds with
hugetlb PMDs.
For example, using a modified hmm-test selftest one can trigger:
[ 207.017134][T14945] ------------[ cut here ]------------
[ 207.018614][T14945] kernel BUG at mm/page_table_check.c:87!
[ 207.019716][T14945] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 207.021072][T14945] CPU: 3 UID: 0 PID: ...
[ 207.023036][T14945] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[ 207.024834][T14945] RIP: 0010:page_table_check_clear.part.0+0x488/0x510
[ 207.026128][T14945] Code: ...
[ 207.029965][T14945] RSP: 0018:ffffc9000cb8f348 EFLAGS: 00010293
[ 207.031139][T14945] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: ffffffff8249a0cd
[ 207.032649][T14945] RDX: ffff88811e883c80 RSI: ffffffff8249a357 RDI: ffff88811e883c80
[ 207.034183][T14945] RBP: ffff888105c0a050 R08: 0000000000000005 R09: 0000000000000000
[ 207.035688][T14945] R10: 00000000ffffffff R11: 0000000000000003 R12: 0000000000000001
[ 207.037203][T14945] R13: 0000000000000200 R14: 0000000000000001 R15: dffffc0000000000
[ 207.038711][T14945] FS: 00007f2783275740(0000) GS:ffff8881f4980000(0000) knlGS:0000000000000000
[ 207.040407][T14945] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 207.041660][T14945] CR2: 00007f2782c00000 CR3: 0000000132356000 CR4: 0000000000750ef0
[ 207.043196][T14945] PKRU: 55555554
[ 207.043880][T14945] Call Trace:
[ 207.044506][T14945] <TASK>
[ 207.045086][T14945] ? __die+0x51/0x92
[ 207.045864][T14945] ? die+0x29/0x50
[ 207.046596][T14945] ? do_trap+0x250/0x320
[ 207.047430][T14945] ? do_error_trap+0xe7/0x220
[ 207.048346][T14945] ? page_table_check_clear.part.0+0x488/0x510
[ 207.049535][T14945] ? handle_invalid_op+0x34/0x40
[ 207.050494][T14945] ? page_table_check_clear.part.0+0x488/0x510
[ 207.051681][T14945] ? exc_invalid_op+0x2e/0x50
[ 207.052589][T14945] ? asm_exc_invalid_op+0x1a/0x20
[ 207.053596][T14945] ? page_table_check_clear.part.0+0x1fd/0x510
[ 207.054790][T14945] ? page_table_check_clear.part.0+0x487/0x510
[ 207.055993][T14945] ? page_table_check_clear.part.0+0x488/0x510
[ 207.057195][T14945] ? page_table_check_clear.part.0+0x487/0x510
[ 207.058384][T14945] __page_table_check_pmd_clear+0x34b/0x5a0
[ 207.059524][T14945] ? __pfx___page_table_check_pmd_clear+0x10/0x10
[ 207.060775][T14945] ? __pfx___mutex_unlock_slowpath+0x10/0x10
[ 207.061940][T14945] ? __pfx___lock_acquire+0x10/0x10
[ 207.062967][T14945] pmdp_huge_clear_flush+0x279/0x360
[ 207.064024][T14945] split_huge_pmd_locked+0x82b/0x3750
...
Before commit 9cb28da54643 ("mm/gup: handle hugetlb in the generic
follow_page_mask code"), we would have ignored the flag; instead, let's
simply refuse the combination completely in check_vma_flags(): the caller
is likely not prepared to handle any hugetlb folios.
We'll teach make_device_exclusive_range() separately to ignore any hugetlb
folios as a future-proof safety net.
Link: https://lkml.kernel.org/r/20250210193801.781278-1-david@redhat.com
Link: https://lkml.kernel.org/r/20250210193801.781278-2-david@redhat.com
Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Reviewed-by: Alistair Popple <apopple(a)nvidia.com>
Tested-by: Alistair Popple <apopple(a)nvidia.com>
Cc: Alex Shi <alexs(a)kernel.org>
Cc: Danilo Krummrich <dakr(a)kernel.org>
Cc: Dave Airlie <airlied(a)gmail.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Karol Herbst <kherbst(a)redhat.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Lyude <lyude(a)redhat.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat(a)kernel.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: SeongJae Park <sj(a)kernel.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Yanteng Si <si.yanteng(a)linux.dev>
Cc: Simona Vetter <simona.vetter(a)ffwll.ch>
Cc: Barry Song <v-songbaohua(a)oppo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 3 +++
1 file changed, 3 insertions(+)
--- a/mm/gup.c~mm-gup-reject-foll_split_pmd-with-hugetlb-vmas
+++ a/mm/gup.c
@@ -1283,6 +1283,9 @@ static int check_vma_flags(struct vm_are
if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
return -EOPNOTSUPP;
+ if ((gup_flags & FOLL_SPLIT_PMD) && is_vm_hugetlb_page(vma))
+ return -EOPNOTSUPP;
+
if (vma_is_secretmem(vma))
return -EFAULT;
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-factor-out-large-folio-handling-from-folio_order-into-folio_large_order.patch
mm-factor-out-large-folio-handling-from-folio_nr_pages-into-folio_large_nr_pages.patch
mm-let-_folio_nr_pages-overlay-memcg_data-in-first-tail-page.patch
mm-let-_folio_nr_pages-overlay-memcg_data-in-first-tail-page-fix.patch
mm-move-hugetlb-specific-things-in-folio-to-page.patch
mm-move-_pincount-in-folio-to-page-on-32bit.patch
mm-move-_entire_mapcount-in-folio-to-page-on-32bit.patch
mm-rmap-pass-dst_vma-to-folio_dup_file_rmap_pte-and-friends.patch
mm-rmap-pass-vma-to-__folio_add_rmap.patch
mm-rmap-abstract-large-mapcount-operations-for-large-folios-hugetlb.patch
bit_spinlock-__always_inline-unlock-functions.patch
mm-rmap-use-folio_large_nr_pages-in-add-remove-functions.patch
mm-rmap-basic-mm-owner-tracking-for-large-folios-hugetlb.patch
mm-copy-on-write-cow-reuse-support-for-pte-mapped-thp.patch
mm-convert-folio_likely_mapped_shared-to-folio_maybe_mapped_shared.patch
mm-config_no_page_mapcount-to-prepare-for-not-maintain-per-page-mapcounts-in-large-folios.patch
fs-proc-page-remove-per-page-mapcount-dependency-for-proc-kpagecount-config_no_page_mapcount.patch
fs-proc-task_mmu-remove-per-page-mapcount-dependency-for-pm_mmap_exclusive-config_no_page_mapcount.patch
fs-proc-task_mmu-remove-per-page-mapcount-dependency-for-mapmax-config_no_page_mapcount.patch
fs-proc-task_mmu-remove-per-page-mapcount-dependency-for-smaps-smaps_rollup-config_no_page_mapcount.patch
mm-stop-maintaining-the-per-page-mapcount-of-large-folios-config_no_page_mapcount.patch
From: Eric Dumazet <edumazet(a)google.com>
tcp_abort() has the same issue than the one fixed in the prior patch
in tcp_write_err().
commit 5ce4645c23cf5f048eb8e9ce49e514bababdee85 upstream.
To apply commit bac76cf89816bff06c4ec2f3df97dc34e150a1c4,
this patch must be applied first.
In order to get consistent results from tcp_poll(), we must call
sk_error_report() after tcp_done().
We can use tcp_done_with_error() to centralize this logic.
Fixes: c1e64e298b8c ("net: diag: Support destroying TCP sockets.")
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Acked-by: Neal Cardwell <ncardwell(a)google.com>
Link: https://lore.kernel.org/r/20240528125253.1966136-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Cc: <stable(a)vger.kernel.org> # v5.10+
[youngmin: Resolved minor conflict in net/ipv4/tcp.c]
Signed-off-by: Youngmin Nam <youngmin.nam(a)samsung.com>
---
net/ipv4/tcp.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 7ad82be40f34..9fe164aa185c 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4630,13 +4630,9 @@ int tcp_abort(struct sock *sk, int err)
bh_lock_sock(sk);
if (!sock_flag(sk, SOCK_DEAD)) {
- WRITE_ONCE(sk->sk_err, err);
- /* This barrier is coupled with smp_rmb() in tcp_poll() */
- smp_wmb();
- sk_error_report(sk);
if (tcp_need_reset(sk->sk_state))
tcp_send_active_reset(sk, GFP_ATOMIC);
- tcp_done(sk);
+ tcp_done_with_error(sk, err);
}
bh_unlock_sock(sk);
--
2.39.2
This fix is the deadline version of the change made to the rt scheduler
here:
https://lore.kernel.org/lkml/20250225180553.167995-1-harshit@nutanix.com/
Please go through the original change for more details on the issue.
In this fix we bail out or retry in the push_dl_task, if the task is no
longer at the head of pushable tasks list because this list changed
while trying to lock the runqueue of the other CPU.
Signed-off-by: Harshit Agarwal <harshit(a)nutanix.com>
Cc: stable(a)vger.kernel.org
---
kernel/sched/deadline.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 38e4537790af..c5048969c640 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2704,6 +2704,7 @@ static int push_dl_task(struct rq *rq)
{
struct task_struct *next_task;
struct rq *later_rq;
+ struct task_struct *task;
int ret = 0;
next_task = pick_next_pushable_dl_task(rq);
@@ -2734,15 +2735,30 @@ static int push_dl_task(struct rq *rq)
/* Will lock the rq it'll find */
later_rq = find_lock_later_rq(next_task, rq);
- if (!later_rq) {
- struct task_struct *task;
+ task = pick_next_pushable_dl_task(rq);
+ if (later_rq && (!task || task != next_task)) {
+ /*
+ * We must check all this again, since
+ * find_lock_later_rq releases rq->lock and it is
+ * then possible that next_task has migrated and
+ * is no longer at the head of the pushable list.
+ */
+ double_unlock_balance(rq, later_rq);
+ if (!task) {
+ /* No more tasks */
+ goto out;
+ }
+ put_task_struct(next_task);
+ next_task = task;
+ goto retry;
+ }
+ if (!later_rq) {
/*
* We must check all this again, since
* find_lock_later_rq releases rq->lock and it is
* then possible that next_task has migrated.
*/
- task = pick_next_pushable_dl_task(rq);
if (task == next_task) {
/*
* The task is still there. We don't try
@@ -2751,9 +2767,10 @@ static int push_dl_task(struct rq *rq)
goto out;
}
- if (!task)
+ if (!task) {
/* No more tasks */
goto out;
+ }
put_task_struct(next_task);
next_task = task;
--
2.22.3
Upon encountering errors during the HSIC pinctrl handling section the
regulator should be disabled.
Use devm_add_action_or_reset() to let the regulator-disabling routine be
handled by device resource management stack.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 4d6141288c33 ("usb: chipidea: imx: pinctrl for HSIC is optional")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
v2: simplify the patch taking advantage of devm-helper (Frank Li)
It looks like devm_regulator_get_enable_optional() exists for this very
use case utilized in the driver but it's not present in old supported
stable kernels and I may say those dependency-patches wouldn't apply there
cleanly. So fix the problem first, further code simplification is a
subject to cleanup patch.
drivers/usb/chipidea/ci_hdrc_imx.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
index 619779eef333..d942b3c72640 100644
--- a/drivers/usb/chipidea/ci_hdrc_imx.c
+++ b/drivers/usb/chipidea/ci_hdrc_imx.c
@@ -336,6 +336,13 @@ static int ci_hdrc_imx_notify_event(struct ci_hdrc *ci, unsigned int event)
return ret;
}
+static void ci_hdrc_imx_disable_regulator(void *arg)
+{
+ struct ci_hdrc_imx_data *data = arg;
+
+ regulator_disable(data->hsic_pad_regulator);
+}
+
static int ci_hdrc_imx_probe(struct platform_device *pdev)
{
struct ci_hdrc_imx_data *data;
@@ -394,6 +401,13 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
"Failed to enable HSIC pad regulator\n");
goto err_put;
}
+ ret = devm_add_action_or_reset(dev,
+ ci_hdrc_imx_disable_regulator, data);
+ if (ret) {
+ dev_err(dev,
+ "Failed to add regulator devm action\n");
+ goto err_put;
+ }
}
}
@@ -432,11 +446,11 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
ret = imx_get_clks(dev);
if (ret)
- goto disable_hsic_regulator;
+ goto qos_remove_request;
ret = imx_prepare_enable_clks(dev);
if (ret)
- goto disable_hsic_regulator;
+ goto qos_remove_request;
ret = clk_prepare_enable(data->clk_wakeup);
if (ret)
@@ -526,10 +540,7 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
clk_disable_unprepare(data->clk_wakeup);
err_wakeup_clk:
imx_disable_unprepare_clks(dev);
-disable_hsic_regulator:
- if (data->hsic_pad_regulator)
- /* don't overwrite original ret (cf. EPROBE_DEFER) */
- regulator_disable(data->hsic_pad_regulator);
+qos_remove_request:
if (pdata.flags & CI_HDRC_PMQOS)
cpu_latency_qos_remove_request(&data->pm_qos_req);
data->ci_pdev = NULL;
@@ -557,8 +568,6 @@ static void ci_hdrc_imx_remove(struct platform_device *pdev)
clk_disable_unprepare(data->clk_wakeup);
if (data->plat_data->flags & CI_HDRC_PMQOS)
cpu_latency_qos_remove_request(&data->pm_qos_req);
- if (data->hsic_pad_regulator)
- regulator_disable(data->hsic_pad_regulator);
}
if (data->usbmisc_data)
put_device(data->usbmisc_data->dev);
--
2.48.1
usbmisc is an optional device property so it is totally valid for the
corresponding data->usbmisc_data to have a NULL value.
Check that before dereferencing the pointer.
Found by Linux Verification Center (linuxtesting.org) with Svace static
analysis tool.
Fixes: 74adad500346 ("usb: chipidea: ci_hdrc_imx: decrement device's refcount in .remove() and in the error path of .probe()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
drivers/usb/chipidea/ci_hdrc_imx.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
index 1a7fc638213e..619779eef333 100644
--- a/drivers/usb/chipidea/ci_hdrc_imx.c
+++ b/drivers/usb/chipidea/ci_hdrc_imx.c
@@ -534,7 +534,8 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
cpu_latency_qos_remove_request(&data->pm_qos_req);
data->ci_pdev = NULL;
err_put:
- put_device(data->usbmisc_data->dev);
+ if (data->usbmisc_data)
+ put_device(data->usbmisc_data->dev);
return ret;
}
@@ -559,7 +560,8 @@ static void ci_hdrc_imx_remove(struct platform_device *pdev)
if (data->hsic_pad_regulator)
regulator_disable(data->hsic_pad_regulator);
}
- put_device(data->usbmisc_data->dev);
+ if (data->usbmisc_data)
+ put_device(data->usbmisc_data->dev);
}
static void ci_hdrc_imx_shutdown(struct platform_device *pdev)
--
2.48.1
The quilt patch titled
Subject: mm/page_alloc: fix memory accept before watermarks gets initialized
has been removed from the -mm tree. Its filename was
mm-page_alloc-fix-memory-accept-before-watermarks-gets-initialized.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm/page_alloc: fix memory accept before watermarks gets initialized
Date: Mon, 10 Mar 2025 10:28:55 +0200
Watermarks are initialized during the postcore initcall. Until then, all
watermarks are set to zero. This causes cond_accept_memory() to
incorrectly skip memory acceptance because a watermark of 0 is always met.
This can lead to a premature OOM on boot.
To ensure progress, accept one MAX_ORDER page if the watermark is zero.
Link: https://lkml.kernel.org/r/20250310082855.2587122-1-kirill.shutemov@linux.in…
Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Tested-by: Farrah Chen <farrah.chen(a)intel.com>
Reported-by: Farrah Chen <farrah.chen(a)intel.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Reviewed-by: Pankaj Gupta <pankaj.gupta(a)amd.com>
Cc: Ashish Kalra <ashish.kalra(a)amd.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe(a)intel.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: "Mike Rapoport (IBM)" <rppt(a)kernel.org>
Cc: Thomas Lendacky <thomas.lendacky(a)amd.com>
Cc: <stable(a)vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-fix-memory-accept-before-watermarks-gets-initialized
+++ a/mm/page_alloc.c
@@ -7004,7 +7004,7 @@ static inline bool has_unaccepted_memory
static bool cond_accept_memory(struct zone *zone, unsigned int order)
{
- long to_accept;
+ long to_accept, wmark;
bool ret = false;
if (!has_unaccepted_memory())
@@ -7013,8 +7013,18 @@ static bool cond_accept_memory(struct zo
if (list_empty(&zone->unaccepted_pages))
return false;
+ wmark = promo_wmark_pages(zone);
+
+ /*
+ * Watermarks have not been initialized yet.
+ *
+ * Accepting one MAX_ORDER page to ensure progress.
+ */
+ if (!wmark)
+ return try_to_accept_memory_one(zone);
+
/* How much to accept to get to promo watermark? */
- to_accept = promo_wmark_pages(zone) -
+ to_accept = wmark -
(zone_page_state(zone, NR_FREE_PAGES) -
__zone_watermark_unusable_free(zone, order, 0) -
zone_page_state(zone, NR_UNACCEPTED));
_
Patches currently in -mm which might be from kirill.shutemov(a)linux.intel.com are
The quilt patch titled
Subject: memcg: drain obj stock on cpu hotplug teardown
has been removed from the -mm tree. Its filename was
memcg-drain-obj-stock-on-cpu-hotplug-teardown.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Shakeel Butt <shakeel.butt(a)linux.dev>
Subject: memcg: drain obj stock on cpu hotplug teardown
Date: Mon, 10 Mar 2025 16:09:34 -0700
Currently on cpu hotplug teardown, only memcg stock is drained but we
need to drain the obj stock as well otherwise we will miss the stats
accumulated on the target cpu as well as the nr_bytes cached. The stats
include MEMCG_KMEM, NR_SLAB_RECLAIMABLE_B & NR_SLAB_UNRECLAIMABLE_B. In
addition we are leaking reference to struct obj_cgroup object.
Link: https://lkml.kernel.org/r/20250310230934.2913113-1-shakeel.butt@linux.dev
Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 9 +++++++++
1 file changed, 9 insertions(+)
--- a/mm/memcontrol.c~memcg-drain-obj-stock-on-cpu-hotplug-teardown
+++ a/mm/memcontrol.c
@@ -1921,9 +1921,18 @@ void drain_all_stock(struct mem_cgroup *
static int memcg_hotplug_cpu_dead(unsigned int cpu)
{
struct memcg_stock_pcp *stock;
+ struct obj_cgroup *old;
+ unsigned long flags;
stock = &per_cpu(memcg_stock, cpu);
+
+ /* drain_obj_stock requires stock_lock */
+ local_lock_irqsave(&memcg_stock.stock_lock, flags);
+ old = drain_obj_stock(stock);
+ local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+
drain_stock(stock);
+ obj_cgroup_put(old);
return 0;
}
_
Patches currently in -mm which might be from shakeel.butt(a)linux.dev are
memcg-add-hierarchical-effective-limits-for-v2.patch
memcg-dont-call-propagate_protected_usage-for-v1.patch
page_counter-track-failcnt-only-for-legacy-cgroups.patch
page_counter-reduce-struct-page_counter-size.patch
memcg-bypass-root-memcg-check-for-skmem-charging.patch
memcg-avoid-refill_stock-for-root-memcg.patch
memcg-move-do_memsw_account-to-config_memcg_v1.patch
The quilt patch titled
Subject: mm/huge_memory: drop beyond-EOF folios with the right number of refs
has been removed from the -mm tree. Its filename was
mm-huge_memory-drop-beyond-eof-folios-with-the-right-number-of-refs.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Zi Yan <ziy(a)nvidia.com>
Subject: mm/huge_memory: drop beyond-EOF folios with the right number of refs
Date: Mon, 10 Mar 2025 11:57:27 -0400
When an after-split folio is large and needs to be dropped due to EOF,
folio_put_refs(folio, folio_nr_pages(folio)) should be used to drop all
page cache refs. Otherwise, the folio will not be freed, causing memory
leak.
This leak would happen on a filesystem with blocksize > page_size and a
truncate is performed, where the blocksize makes folios split to >0 order
ones, causing truncated folios not being freed.
Link: https://lkml.kernel.org/r/20250310155727.472846-1-ziy@nvidia.com
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Reported-by: Hugh Dickins <hughd(a)google.com>
Closes: https://lore.kernel.org/all/fcbadb7f-dd3e-21df-f9a7-2853b53183c4@google.com/
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Kirill A. Shuemov <kirill.shutemov(a)linux.intel.com>
Cc: Luis Chamberalin <mcgrof(a)kernel.org>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Pankaj Raghav <p.raghav(a)samsung.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Yang Shi <yang(a)os.amperecomputing.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/huge_memory.c~mm-huge_memory-drop-beyond-eof-folios-with-the-right-number-of-refs
+++ a/mm/huge_memory.c
@@ -3304,7 +3304,7 @@ static void __split_huge_page(struct pag
folio_account_cleaned(tail,
inode_to_wb(folio->mapping->host));
__filemap_remove_folio(tail, NULL);
- folio_put(tail);
+ folio_put_refs(tail, folio_nr_pages(tail));
} else if (!folio_test_anon(folio)) {
__xa_store(&folio->mapping->i_pages, tail->index,
tail, 0);
_
Patches currently in -mm which might be from ziy(a)nvidia.com are
selftests-mm-make-file-backed-thp-split-work-by-writing-pmd-size-data.patch
mm-huge_memory-allow-split-shmem-large-folio-to-any-lower-order.patch
selftests-mm-test-splitting-file-backed-thp-to-any-lower-order.patch
xarray-add-xas_try_split-to-split-a-multi-index-entry.patch
mm-huge_memory-add-two-new-not-yet-used-functions-for-folio_split.patch
mm-huge_memory-add-two-new-not-yet-used-functions-for-folio_split-fix.patch
mm-huge_memory-add-two-new-not-yet-used-functions-for-folio_split-fix-2.patch
mm-huge_memory-move-folio-split-common-code-to-__folio_split.patch
mm-huge_memory-add-buddy-allocator-like-non-uniform-folio_split.patch
mm-huge_memory-remove-the-old-unused-__split_huge_page.patch
mm-huge_memory-add-folio_split-to-debugfs-testing-interface.patch
mm-truncate-use-folio_split-in-truncate-operation.patch
selftests-mm-add-tests-for-folio_split-buddy-allocator-like-split.patch
mm-filemap-use-xas_try_split-in-__filemap_add_folio.patch
mm-shmem-use-xas_try_split-in-shmem_split_large_entry.patch
The quilt patch titled
Subject: selftests/mm: run_vmtests.sh: fix half_ufd_size_MB calculation
has been removed from the -mm tree. Its filename was
selftests-mm-run_vmtestssh-fix-half_ufd_size_mb-calculation.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Rafael Aquini <raquini(a)redhat.com>
Subject: selftests/mm: run_vmtests.sh: fix half_ufd_size_MB calculation
Date: Tue, 18 Feb 2025 14:22:51 -0500
We noticed that uffd-stress test was always failing to run when invoked
for the hugetlb profiles on x86_64 systems with a processor count of 64 or
bigger:
...
# ------------------------------------
# running ./uffd-stress hugetlb 128 32
# ------------------------------------
# ERROR: invalid MiB (errno=9, @uffd-stress.c:459)
...
# [FAIL]
not ok 3 uffd-stress hugetlb 128 32 # exit=1
...
The problem boils down to how run_vmtests.sh (mis)calculates the size of
the region it feeds to uffd-stress. The latter expects to see an amount
of MiB while the former is just giving out the number of free hugepages
halved down. This measurement discrepancy ends up violating uffd-stress'
assertion on number of hugetlb pages allocated per CPU, causing it to bail
out with the error above.
This commit fixes that issue by adjusting run_vmtests.sh's
half_ufd_size_MB calculation so it properly renders the region size in
MiB, as expected, while maintaining all of its original constraints in
place.
Link: https://lkml.kernel.org/r/20250218192251.53243-1-aquini@redhat.com
Fixes: 2e47a445d7b3 ("selftests/mm: run_vmtests.sh: fix hugetlb mem size calculation")
Signed-off-by: Rafael Aquini <raquini(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/run_vmtests.sh | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/tools/testing/selftests/mm/run_vmtests.sh~selftests-mm-run_vmtestssh-fix-half_ufd_size_mb-calculation
+++ a/tools/testing/selftests/mm/run_vmtests.sh
@@ -304,7 +304,9 @@ uffd_stress_bin=./uffd-stress
CATEGORY="userfaultfd" run_test ${uffd_stress_bin} anon 20 16
# Hugetlb tests require source and destination huge pages. Pass in half
# the size of the free pages we have, which is used for *each*.
-half_ufd_size_MB=$((freepgs / 2))
+# uffd-stress expects a region expressed in MiB, so we adjust
+# half_ufd_size_MB accordingly.
+half_ufd_size_MB=$(((freepgs * hpgsize_KB) / 1024 / 2))
CATEGORY="userfaultfd" run_test ${uffd_stress_bin} hugetlb "$half_ufd_size_MB" 32
CATEGORY="userfaultfd" run_test ${uffd_stress_bin} hugetlb-private "$half_ufd_size_MB" 32
CATEGORY="userfaultfd" run_test ${uffd_stress_bin} shmem 20 16
_
Patches currently in -mm which might be from raquini(a)redhat.com are
The quilt patch titled
Subject: mm: fix error handling in __filemap_get_folio() with FGP_NOWAIT
has been removed from the -mm tree. Its filename was
mm-fix-error-handling-in-__filemap_get_folio-with-fgp_nowait.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Raphael S. Carvalho" <raphaelsc(a)scylladb.com>
Subject: mm: fix error handling in __filemap_get_folio() with FGP_NOWAIT
Date: Mon, 24 Feb 2025 11:37:00 -0300
original report:
https://lore.kernel.org/all/CAKhLTr1UL3ePTpYjXOx2AJfNk8Ku2EdcEfu+CH1sf3Asr=…
When doing buffered writes with FGP_NOWAIT, under memory pressure, the
system returned ENOMEM despite there being plenty of available memory, to
be reclaimed from page cache. The user space used io_uring interface,
which in turn submits I/O with FGP_NOWAIT (the fast path).
retsnoop pointed to iomap_get_folio:
00:34:16.180612 -> 00:34:16.180651 TID/PID 253786/253721
(reactor-1/combined_tests):
entry_SYSCALL_64_after_hwframe+0x76
do_syscall_64+0x82
__do_sys_io_uring_enter+0x265
io_submit_sqes+0x209
io_issue_sqe+0x5b
io_write+0xdd
xfs_file_buffered_write+0x84
iomap_file_buffered_write+0x1a6
32us [-ENOMEM] iomap_write_begin+0x408
iter=&{.inode=0xffff8c67aa031138,.len=4096,.flags=33,.iomap={.addr=0xffffffffffffffff,.length=4096,.type=1,.flags=3,.bdev=0x���
pos=0 len=4096 foliop=0xffffb32c296b7b80
! 4us [-ENOMEM] iomap_get_folio
iter=&{.inode=0xffff8c67aa031138,.len=4096,.flags=33,.iomap={.addr=0xffffffffffffffff,.length=4096,.type=1,.flags=3,.bdev=0x���
pos=0 len=4096
This is likely a regression caused by 66dabbb65d67 ("mm: return an ERR_PTR
from __filemap_get_folio"), which moved error handling from
io_map_get_folio() to __filemap_get_folio(), but broke FGP_NOWAIT
handling, so ENOMEM is being escaped to user space. Had it correctly
returned -EAGAIN with NOWAIT, either io_uring or user space itself would
be able to retry the request.
It's not enough to patch io_uring since the iomap interface is the one
responsible for it, and pwritev2(RWF_NOWAIT) and AIO interfaces must
return the proper error too.
The patch was tested with scylladb test suite (its original reproducer),
and the tests all pass now when memory is pressured.
Link: https://lkml.kernel.org/r/20250224143700.23035-1-raphaelsc@scylladb.com
Fixes: 66dabbb65d67 ("mm: return an ERR_PTR from __filemap_get_folio")
Signed-off-by: Raphael S. Carvalho <raphaelsc(a)scylladb.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Dave Chinner <dchinner(a)redhat.com>
Cc: "Darrick J. Wong" <djwong(a)kernel.org>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/filemap.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
--- a/mm/filemap.c~mm-fix-error-handling-in-__filemap_get_folio-with-fgp_nowait
+++ a/mm/filemap.c
@@ -1986,8 +1986,19 @@ no_page:
if (err == -EEXIST)
goto repeat;
- if (err)
+ if (err) {
+ /*
+ * When NOWAIT I/O fails to allocate folios this could
+ * be due to a nonblocking memory allocation and not
+ * because the system actually is out of memory.
+ * Return -EAGAIN so that there caller retries in a
+ * blocking fashion instead of propagating -ENOMEM
+ * to the application.
+ */
+ if ((fgp_flags & FGP_NOWAIT) && err == -ENOMEM)
+ err = -EAGAIN;
return ERR_PTR(err);
+ }
/*
* filemap_add_folio locks the page, and for mmap
* we expect an unlocked page.
_
Patches currently in -mm which might be from raphaelsc(a)scylladb.com are
The quilt patch titled
Subject: mm/migrate: fix shmem xarray update during migration
has been removed from the -mm tree. Its filename was
mm-migrate-fix-shmem-xarray-update-during-migration.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Zi Yan <ziy(a)nvidia.com>
Subject: mm/migrate: fix shmem xarray update during migration
Date: Wed, 5 Mar 2025 15:04:03 -0500
A shmem folio can be either in page cache or in swap cache, but not at the
same time. Namely, once it is in swap cache, folio->mapping should be
NULL, and the folio is no longer in a shmem mapping.
In __folio_migrate_mapping(), to determine the number of xarray entries to
update, folio_test_swapbacked() is used, but that conflates shmem in page
cache case and shmem in swap cache case. It leads to xarray multi-index
entry corruption, since it turns a sibling entry to a normal entry during
xas_store() (see [1] for a userspace reproduction). Fix it by only using
folio_test_swapcache() to determine whether xarray is storing swap cache
entries or not to choose the right number of xarray entries to update.
[1] https://lore.kernel.org/linux-mm/Z8idPCkaJW1IChjT@casper.infradead.org/
Note:
In __split_huge_page(), folio_test_anon() && folio_test_swapcache() is
used to get swap_cache address space, but that ignores the shmem folio in
swap cache case. It could lead to NULL pointer dereferencing when a
in-swap-cache shmem folio is split at __xa_store(), since
!folio_test_anon() is true and folio->mapping is NULL. But fortunately,
its caller split_huge_page_to_list_to_order() bails out early with EBUSY
when folio->mapping is NULL. So no need to take care of it here.
Link: https://lkml.kernel.org/r/20250305200403.2822855-1-ziy@nvidia.com
Fixes: fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Reported-by: Liu Shixin <liushixin2(a)huawei.com>
Closes: https://lore.kernel.org/all/28546fb4-5210-bf75-16d6-43e1f8646080@huawei.com/
Suggested-by: Hugh Dickins <hughd(a)google.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Charan Teja Kalla <quic_charante(a)quicinc.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Lance Yang <ioworker0(a)gmail.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
--- a/mm/migrate.c~mm-migrate-fix-shmem-xarray-update-during-migration
+++ a/mm/migrate.c
@@ -518,15 +518,13 @@ static int __folio_migrate_mapping(struc
if (folio_test_anon(folio) && folio_test_large(folio))
mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1);
folio_ref_add(newfolio, nr); /* add cache reference */
- if (folio_test_swapbacked(folio)) {
+ if (folio_test_swapbacked(folio))
__folio_set_swapbacked(newfolio);
- if (folio_test_swapcache(folio)) {
- folio_set_swapcache(newfolio);
- newfolio->private = folio_get_private(folio);
- }
+ if (folio_test_swapcache(folio)) {
+ folio_set_swapcache(newfolio);
+ newfolio->private = folio_get_private(folio);
entries = nr;
} else {
- VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
entries = 1;
}
_
Patches currently in -mm which might be from ziy(a)nvidia.com are
selftests-mm-make-file-backed-thp-split-work-by-writing-pmd-size-data.patch
mm-huge_memory-allow-split-shmem-large-folio-to-any-lower-order.patch
selftests-mm-test-splitting-file-backed-thp-to-any-lower-order.patch
xarray-add-xas_try_split-to-split-a-multi-index-entry.patch
mm-huge_memory-add-two-new-not-yet-used-functions-for-folio_split.patch
mm-huge_memory-add-two-new-not-yet-used-functions-for-folio_split-fix.patch
mm-huge_memory-add-two-new-not-yet-used-functions-for-folio_split-fix-2.patch
mm-huge_memory-move-folio-split-common-code-to-__folio_split.patch
mm-huge_memory-add-buddy-allocator-like-non-uniform-folio_split.patch
mm-huge_memory-remove-the-old-unused-__split_huge_page.patch
mm-huge_memory-add-folio_split-to-debugfs-testing-interface.patch
mm-truncate-use-folio_split-in-truncate-operation.patch
selftests-mm-add-tests-for-folio_split-buddy-allocator-like-split.patch
mm-filemap-use-xas_try_split-in-__filemap_add_folio.patch
mm-shmem-use-xas_try_split-in-shmem_split_large_entry.patch
fpga_mgr_test_img_load_sgt() allocates memory for sgt using
kunit_kzalloc() however it does not check if the allocation failed.
It then passes sgt to sg_alloc_table(), which passes it to
__sg_alloc_table(). This function calls memset() on sgt in an attempt to
zero it out. If the allocation fails then sgt will be NULL and the
memset will trigger a NULL pointer dereference.
Fix this by checking the allocation with KUNIT_ASSERT_NOT_ERR_OR_NULL().
Fixes: ccbc1c302115 ("fpga: add an initial KUnit suite for the FPGA Manager")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qasim Ijaz <qasdev00(a)gmail.com>
---
drivers/fpga/tests/fpga-mgr-test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/fpga/tests/fpga-mgr-test.c b/drivers/fpga/tests/fpga-mgr-test.c
index 9cb37aefbac4..1902ebf5a298 100644
--- a/drivers/fpga/tests/fpga-mgr-test.c
+++ b/drivers/fpga/tests/fpga-mgr-test.c
@@ -263,6 +263,7 @@ static void fpga_mgr_test_img_load_sgt(struct kunit *test)
img_buf = init_test_buffer(test, IMAGE_SIZE);
sgt = kunit_kzalloc(test, sizeof(*sgt), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, sgt);
ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
KUNIT_ASSERT_EQ(test, ret, 0);
sg_init_one(sgt->sgl, img_buf, IMAGE_SIZE);
--
2.39.5
Commit 60e3318e3e900 ("cifs: use fs_context for automounts") was
released in v6.1.54 and broke the failover when one of the servers
inside DFS becomes unavailable. We reproduced the problem on the EC2
instances of different types. Reverting aforementioned commint on top of
the latest stable verison v6.1.94 helps to resolve the problem.
Earliest working version is v6.2-rc1. There were two big merges of CIFS fixes:
[1] and [2]. We would like to ask for the help to investigate this problem and
if some of those patches need to be backported. Also, is it safe to just revert
problematic commit until proper fixes/backports will be available?
We will help to do testing and confirm if fix works, but let me also list the
steps we used to reproduce the problem if it will help to identify the problem:
1. Create Active Directory domain eg. 'corp.fsxtest.local' in AWS Directory
Service with:
- three AWS FSX file systems filesystem1..filesystem3
- three Windows servers; They have DFS installed as per
https://learn.microsoft.com/en-us/windows-server/storage/dfs-namespaces/dfs…:
- dfs-srv1: EC2AMAZ-2EGTM59
- dfs-srv2: EC2AMAZ-1N36PRD
- dfs-srv3: EC2AMAZ-0PAUH2U
2. Create DFS namespace eg. 'dfs-namespace' in Windows server 2008 mode
and three folders targets in it:
- referral-a mapped to filesystem1.corp.local
- referral-b mapped to filesystem2.corp.local
- referral-c mapped to filesystem3.corp.local
- local folders dfs-srv1..dfs-srv3 in C:\DFSRoots\dfs-namespace of every
Windows server. This helps to quickly define underlying server when
DFS is mounted.
3. Enabled cifs debug logs:
```
echo 'module cifs +p' > /sys/kernel/debug/dynamic_debug/control
echo 'file fs/cifs/* +p' > /sys/kernel/debug/dynamic_debug/control
echo 7 > /proc/fs/cifs/cifsFYI
```
4. Mount DFS namespace on Amazon Linux 2023 instance running any vanilla
kernel v6.1.54+:
```
dmesg -c &>/dev/null
cd /mnt
mount -t cifs -o cred=/mnt/creds,echo_interval=5 \
//corp.fsxtest.local/dfs-namespace \
./dfs-namespace
```
5. List DFS root, it's also required to avoid recursive mounts that happen
during regular 'ls' run:
```
sh -c 'ls dfs-namespace'
dfs-srv2 referral-a referral-b
```
The DFS server is EC2AMAZ-1N36PRD, it's also listed in mount:
```
[root@ip-172-31-2-82 mnt]# mount | grep dfs
//corp.fsxtest.local/dfs-namespace on /mnt/dfs-namespace type cifs (rw,relatime,vers=3.1.1,cache=strict,username=Admin,domain=corp.fsxtest.local,uid=0,noforceuid,gid=0,noforcegid,addr=172.31.11.26,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=5,actimeo=1,closetimeo=1)
//EC2AMAZ-1N36PRD.corp.fsxtest.local/dfs-namespace/referral-a on /mnt/dfs-namespace/referral-a type cifs (rw,relatime,vers=3.1.1,cache=strict,username=Admin,domain=corp.fsxtest.local,uid=0,noforceuid,gid=0,noforcegid,addr=172.31.12.80,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=5,actimeo=1,closetimeo=1)
```
List files in first folder:
```
sh -c 'ls dfs-namespace/referral-a'
filea.txt.txt
```
6. Shutdown DFS server-2.
List DFS root again, server changed from dfs-srv2 to dfs-srv1 EC2AMAZ-2EGTM59:
```
sh -c 'ls dfs-namespace'
dfs-srv1 referral-a referral-b
```
7. Try to list files in another folder, this causes ls to fail with error:
```
sh -c 'ls dfs-namespace/referral-b'
ls: cannot access 'dfs-namespace/referral-b': No route to host```
Sometimes it's also 'Operation now in progress' error.
mount shows the same output:
```
//corp.fsxtest.local/dfs-namespace on /mnt/dfs-namespace type cifs (rw,relatime,vers=3.1.1,cache=strict,username=Admin,domain=corp.fsxtest.local,uid=0,noforceuid,gid=0,noforcegid,addr=172.31.11.26,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=5,actimeo=1,closetimeo=1)
//EC2AMAZ-1N36PRD.corp.fsxtest.local/dfs-namespace/referral-a on /mnt/dfs-namespace/referral-a type cifs (rw,relatime,vers=3.1.1,cache=strict,username=Admin,domain=corp.fsxtest.local,uid=0,noforceuid,gid=0,noforcegid,addr=172.31.12.80,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=5,actimeo=1,closetimeo=1)
```
I also attached kernel debug logs from this test.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
Reported-by: Andrei Paniakin <apanyaki(a)amazon.com>
Bisected-by: Simba Bonga <simbarb(a)amazon.com>
---
#regzbot introduced: v6.1.54..v6.2-rc1
Currently, RCU boost testing in rcutorture is broken because it relies on
having RT throttling disabled. This means the test will always pass (or
rarely fail). This occurs because recently, RT throttling was replaced
by DL server which boosts CFS tasks even when rcutorture tried to
disable throttling (see rcu_torture_disable_rt_throttle()). However, the
systctl_sched_rt_runtime variable is not considered thus still allowing
RT tasks to be preempted by CFS tasks.
Therefore this patch prevents DL server from starting when RCU torture
sets the sysctl_sched_rt_runtime to -1.
With this patch, boosting in TREE09 fails reliably if RCU_BOOST=n.
Steven also mentioned that this could fix RT usecases where users do not
want DL server to be interfering.
Cc: stable(a)vger.kernel.org
Cc: Paul E. McKenney <paulmck(a)kernel.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Fixes: cea5a3472ac4 ("sched/fair: Cleanup fair_server")
Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com>
---
v1->v2:
Updated Fixes tag (Steven)
Moved the stoppage of DL server to fair (Juri)
kernel/sched/fair.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1c0ef435a7aa..d7ba333393f2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1242,7 +1242,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
* against fair_server such that it can account for this time
* and possibly avoid running this period.
*/
- if (dl_server_active(&rq->fair_server))
+ if (dl_server_active(&rq->fair_server) && rt_bandwidth_enabled())
dl_server_update(&rq->fair_server, delta_exec);
}
@@ -5957,7 +5957,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
sub_nr_running(rq, queued_delta);
/* Stop the fair server if throttling resulted in no runnable tasks */
- if (rq_h_nr_queued && !rq->cfs.h_nr_queued)
+ if (rq_h_nr_queued && !rq->cfs.h_nr_queued && dl_server_active(&rq->fair_server))
dl_server_stop(&rq->fair_server);
done:
/*
@@ -6056,7 +6056,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
}
/* Start the fair server if un-throttling resulted in new runnable tasks */
- if (!rq_h_nr_queued && rq->cfs.h_nr_queued)
+ if (!rq_h_nr_queued && rq->cfs.h_nr_queued && rt_bandwidth_enabled())
dl_server_start(&rq->fair_server);
/* At this point se is NULL and we are at root level*/
@@ -7005,9 +7005,11 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
if (!rq_h_nr_queued && rq->cfs.h_nr_queued) {
/* Account for idle runtime */
- if (!rq->nr_running)
+ if (!rq->nr_running && rt_bandwidth_enabled())
dl_server_update_idle_time(rq, rq->curr);
- dl_server_start(&rq->fair_server);
+
+ if (rt_bandwidth_enabled())
+ dl_server_start(&rq->fair_server);
}
/* At this point se is NULL and we are at root level*/
@@ -7134,7 +7136,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
sub_nr_running(rq, h_nr_queued);
- if (rq_h_nr_queued && !rq->cfs.h_nr_queued)
+ if (rq_h_nr_queued && !rq->cfs.h_nr_queued && dl_server_active(&rq->fair_server))
dl_server_stop(&rq->fair_server);
/* balance early to pull high priority tasks */
--
2.43.0
Hi,
As an exhibiting company at The Intec International Trade Fair for Machine Tools, Manufacturing and Automation (INTEC 2025) we have access to the fully updated attendee list, including last-minute registrants who attended the show.
This exclusive list can help you:
✅ Connect with high-intent leads
✅ Follow up with attendees who visited (or missed) your booth
✅ Maximize your ROI from the event
If you're Interested, I’d love to share more details—and I can also provide pricing information.
Looking forward to your thoughts.
Regards,
Jennifer Martin
Sr. Marketing Manager
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x d4234d131b0a3f9e65973f1cdc71bb3560f5d14b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031635-spider-hurricane-4475@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d4234d131b0a3f9e65973f1cdc71bb3560f5d14b Mon Sep 17 00:00:00 2001
From: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Date: Tue, 4 Mar 2025 15:27:00 +0800
Subject: [PATCH] arm64: mm: Populate vmemmap at the page level if not section
aligned
On the arm64 platform with 4K base page config, SECTION_SIZE_BITS is set
to 27, making one section 128M. The related page struct which vmemmap
points to is 2M then.
Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the
vmemmap to populate at the PMD section level which was suitable
initially since hot plug granule is always one section(128M). However,
commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
introduced a 2M(SUBSECTION_SIZE) hot plug granule, which disrupted the
existing arm64 assumptions.
The first problem is that if start or end is not aligned to a section
boundary, such as when a subsection is hot added, populating the entire
section is wasteful.
The next problem is if we hotplug something that spans part of 128 MiB
section (subsections, let's call it memblock1), and then hotplug something
that spans another part of a 128 MiB section(subsections, let's call it
memblock2), and subsequently unplug memblock1, vmemmap_free() will clear
the entire PMD entry which also supports memblock2 even though memblock2
is still active.
Assuming hotplug/unplug sizes are guaranteed to be symmetric. Do the
fix similar to x86-64: populate to pages levels if start/end is not aligned
with section boundary.
Cc: stable(a)vger.kernel.org # v5.4+
Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Acked-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com>
Link: https://lore.kernel.org/r/20250304072700.3405036-1-quic_zhenhuah@quicinc.com
Signed-off-by: Will Deacon <will(a)kernel.org>
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b4df5bc5b1b8..1dfe1a8efdbe 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1177,8 +1177,11 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
struct vmem_altmap *altmap)
{
WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
+ /* [start, end] should be within one section */
+ WARN_ON_ONCE(end - start > PAGES_PER_SECTION * sizeof(struct page));
- if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
+ if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
+ (end - start < PAGES_PER_SECTION * sizeof(struct page)))
return vmemmap_populate_basepages(start, end, node, altmap);
else
return vmemmap_populate_hugepages(start, end, node, altmap);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x d4234d131b0a3f9e65973f1cdc71bb3560f5d14b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031628-percolate-sulphate-3806@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d4234d131b0a3f9e65973f1cdc71bb3560f5d14b Mon Sep 17 00:00:00 2001
From: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Date: Tue, 4 Mar 2025 15:27:00 +0800
Subject: [PATCH] arm64: mm: Populate vmemmap at the page level if not section
aligned
On the arm64 platform with 4K base page config, SECTION_SIZE_BITS is set
to 27, making one section 128M. The related page struct which vmemmap
points to is 2M then.
Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the
vmemmap to populate at the PMD section level which was suitable
initially since hot plug granule is always one section(128M). However,
commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
introduced a 2M(SUBSECTION_SIZE) hot plug granule, which disrupted the
existing arm64 assumptions.
The first problem is that if start or end is not aligned to a section
boundary, such as when a subsection is hot added, populating the entire
section is wasteful.
The next problem is if we hotplug something that spans part of 128 MiB
section (subsections, let's call it memblock1), and then hotplug something
that spans another part of a 128 MiB section(subsections, let's call it
memblock2), and subsequently unplug memblock1, vmemmap_free() will clear
the entire PMD entry which also supports memblock2 even though memblock2
is still active.
Assuming hotplug/unplug sizes are guaranteed to be symmetric. Do the
fix similar to x86-64: populate to pages levels if start/end is not aligned
with section boundary.
Cc: stable(a)vger.kernel.org # v5.4+
Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Acked-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com>
Link: https://lore.kernel.org/r/20250304072700.3405036-1-quic_zhenhuah@quicinc.com
Signed-off-by: Will Deacon <will(a)kernel.org>
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b4df5bc5b1b8..1dfe1a8efdbe 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1177,8 +1177,11 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
struct vmem_altmap *altmap)
{
WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
+ /* [start, end] should be within one section */
+ WARN_ON_ONCE(end - start > PAGES_PER_SECTION * sizeof(struct page));
- if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
+ if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
+ (end - start < PAGES_PER_SECTION * sizeof(struct page)))
return vmemmap_populate_basepages(start, end, node, altmap);
else
return vmemmap_populate_hugepages(start, end, node, altmap);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x d4234d131b0a3f9e65973f1cdc71bb3560f5d14b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031627-capture-pouring-5da4@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d4234d131b0a3f9e65973f1cdc71bb3560f5d14b Mon Sep 17 00:00:00 2001
From: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Date: Tue, 4 Mar 2025 15:27:00 +0800
Subject: [PATCH] arm64: mm: Populate vmemmap at the page level if not section
aligned
On the arm64 platform with 4K base page config, SECTION_SIZE_BITS is set
to 27, making one section 128M. The related page struct which vmemmap
points to is 2M then.
Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the
vmemmap to populate at the PMD section level which was suitable
initially since hot plug granule is always one section(128M). However,
commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
introduced a 2M(SUBSECTION_SIZE) hot plug granule, which disrupted the
existing arm64 assumptions.
The first problem is that if start or end is not aligned to a section
boundary, such as when a subsection is hot added, populating the entire
section is wasteful.
The next problem is if we hotplug something that spans part of 128 MiB
section (subsections, let's call it memblock1), and then hotplug something
that spans another part of a 128 MiB section(subsections, let's call it
memblock2), and subsequently unplug memblock1, vmemmap_free() will clear
the entire PMD entry which also supports memblock2 even though memblock2
is still active.
Assuming hotplug/unplug sizes are guaranteed to be symmetric. Do the
fix similar to x86-64: populate to pages levels if start/end is not aligned
with section boundary.
Cc: stable(a)vger.kernel.org # v5.4+
Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Acked-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com>
Link: https://lore.kernel.org/r/20250304072700.3405036-1-quic_zhenhuah@quicinc.com
Signed-off-by: Will Deacon <will(a)kernel.org>
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b4df5bc5b1b8..1dfe1a8efdbe 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1177,8 +1177,11 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
struct vmem_altmap *altmap)
{
WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
+ /* [start, end] should be within one section */
+ WARN_ON_ONCE(end - start > PAGES_PER_SECTION * sizeof(struct page));
- if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
+ if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
+ (end - start < PAGES_PER_SECTION * sizeof(struct page)))
return vmemmap_populate_basepages(start, end, node, altmap);
else
return vmemmap_populate_hugepages(start, end, node, altmap);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x d4234d131b0a3f9e65973f1cdc71bb3560f5d14b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031627-theorize-manned-7263@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d4234d131b0a3f9e65973f1cdc71bb3560f5d14b Mon Sep 17 00:00:00 2001
From: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Date: Tue, 4 Mar 2025 15:27:00 +0800
Subject: [PATCH] arm64: mm: Populate vmemmap at the page level if not section
aligned
On the arm64 platform with 4K base page config, SECTION_SIZE_BITS is set
to 27, making one section 128M. The related page struct which vmemmap
points to is 2M then.
Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the
vmemmap to populate at the PMD section level which was suitable
initially since hot plug granule is always one section(128M). However,
commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
introduced a 2M(SUBSECTION_SIZE) hot plug granule, which disrupted the
existing arm64 assumptions.
The first problem is that if start or end is not aligned to a section
boundary, such as when a subsection is hot added, populating the entire
section is wasteful.
The next problem is if we hotplug something that spans part of 128 MiB
section (subsections, let's call it memblock1), and then hotplug something
that spans another part of a 128 MiB section(subsections, let's call it
memblock2), and subsequently unplug memblock1, vmemmap_free() will clear
the entire PMD entry which also supports memblock2 even though memblock2
is still active.
Assuming hotplug/unplug sizes are guaranteed to be symmetric. Do the
fix similar to x86-64: populate to pages levels if start/end is not aligned
with section boundary.
Cc: stable(a)vger.kernel.org # v5.4+
Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Acked-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com>
Link: https://lore.kernel.org/r/20250304072700.3405036-1-quic_zhenhuah@quicinc.com
Signed-off-by: Will Deacon <will(a)kernel.org>
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b4df5bc5b1b8..1dfe1a8efdbe 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1177,8 +1177,11 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
struct vmem_altmap *altmap)
{
WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
+ /* [start, end] should be within one section */
+ WARN_ON_ONCE(end - start > PAGES_PER_SECTION * sizeof(struct page));
- if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
+ if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
+ (end - start < PAGES_PER_SECTION * sizeof(struct page)))
return vmemmap_populate_basepages(start, end, node, altmap);
else
return vmemmap_populate_hugepages(start, end, node, altmap);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 91cf42c63f2d8a9c1bcdfe923218e079b32e1a69
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031631-sensuous-cataract-f3f3@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 91cf42c63f2d8a9c1bcdfe923218e079b32e1a69 Mon Sep 17 00:00:00 2001
From: Conor Dooley <conor.dooley(a)microchip.com>
Date: Mon, 3 Mar 2025 10:47:40 +0000
Subject: [PATCH] spi: microchip-core: prevent RX overflows when transmit size
> FIFO size
When the size of a transfer exceeds the size of the FIFO (32 bytes), RX
overflows will be generated and receive data will be corrupted and
warnings will be produced. For example, here's an error generated by a
transfer of 36 bytes:
spi_master spi0: mchp_corespi_interrupt: RX OVERFLOW: rxlen: 4, txlen: 0
The driver is currently split between handling receiving in the
interrupt handler, and sending outside of it. Move all handling out of
the interrupt handling, and explicitly link the number of bytes read of
of the RX FIFO to the number written into the TX one. This both resolves
the overflow problems as well as simplifying the flow of the driver.
CC: stable(a)vger.kernel.org
Fixes: 9ac8d17694b6 ("spi: add support for microchip fpga spi controllers")
Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com>
Link: https://patch.msgid.link/20250303-veal-snooper-712c1dfad336@wendy
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/drivers/spi/spi-microchip-core.c b/drivers/spi/spi-microchip-core.c
index 5b6af55855ef..62ba0bd9cbb7 100644
--- a/drivers/spi/spi-microchip-core.c
+++ b/drivers/spi/spi-microchip-core.c
@@ -70,8 +70,7 @@
#define INT_RX_CHANNEL_OVERFLOW BIT(2)
#define INT_TX_CHANNEL_UNDERRUN BIT(3)
-#define INT_ENABLE_MASK (CONTROL_RX_DATA_INT | CONTROL_TX_DATA_INT | \
- CONTROL_RX_OVER_INT | CONTROL_TX_UNDER_INT)
+#define INT_ENABLE_MASK (CONTROL_RX_OVER_INT | CONTROL_TX_UNDER_INT)
#define REG_CONTROL (0x00)
#define REG_FRAME_SIZE (0x04)
@@ -133,10 +132,15 @@ static inline void mchp_corespi_disable(struct mchp_corespi *spi)
mchp_corespi_write(spi, REG_CONTROL, control);
}
-static inline void mchp_corespi_read_fifo(struct mchp_corespi *spi)
+static inline void mchp_corespi_read_fifo(struct mchp_corespi *spi, int fifo_max)
{
- while (spi->rx_len >= spi->n_bytes && !(mchp_corespi_read(spi, REG_STATUS) & STATUS_RXFIFO_EMPTY)) {
- u32 data = mchp_corespi_read(spi, REG_RX_DATA);
+ for (int i = 0; i < fifo_max; i++) {
+ u32 data;
+
+ while (mchp_corespi_read(spi, REG_STATUS) & STATUS_RXFIFO_EMPTY)
+ ;
+
+ data = mchp_corespi_read(spi, REG_RX_DATA);
spi->rx_len -= spi->n_bytes;
@@ -211,11 +215,10 @@ static inline void mchp_corespi_set_xfer_size(struct mchp_corespi *spi, int len)
mchp_corespi_write(spi, REG_FRAMESUP, len);
}
-static inline void mchp_corespi_write_fifo(struct mchp_corespi *spi)
+static inline void mchp_corespi_write_fifo(struct mchp_corespi *spi, int fifo_max)
{
- int fifo_max, i = 0;
+ int i = 0;
- fifo_max = DIV_ROUND_UP(min(spi->tx_len, FIFO_DEPTH), spi->n_bytes);
mchp_corespi_set_xfer_size(spi, fifo_max);
while ((i < fifo_max) && !(mchp_corespi_read(spi, REG_STATUS) & STATUS_TXFIFO_FULL)) {
@@ -413,19 +416,6 @@ static irqreturn_t mchp_corespi_interrupt(int irq, void *dev_id)
if (intfield == 0)
return IRQ_NONE;
- if (intfield & INT_TXDONE)
- mchp_corespi_write(spi, REG_INT_CLEAR, INT_TXDONE);
-
- if (intfield & INT_RXRDY) {
- mchp_corespi_write(spi, REG_INT_CLEAR, INT_RXRDY);
-
- if (spi->rx_len)
- mchp_corespi_read_fifo(spi);
- }
-
- if (!spi->rx_len && !spi->tx_len)
- finalise = true;
-
if (intfield & INT_RX_CHANNEL_OVERFLOW) {
mchp_corespi_write(spi, REG_INT_CLEAR, INT_RX_CHANNEL_OVERFLOW);
finalise = true;
@@ -512,9 +502,14 @@ static int mchp_corespi_transfer_one(struct spi_controller *host,
mchp_corespi_write(spi, REG_SLAVE_SELECT, spi->pending_slave_select);
- while (spi->tx_len)
- mchp_corespi_write_fifo(spi);
+ while (spi->tx_len) {
+ int fifo_max = DIV_ROUND_UP(min(spi->tx_len, FIFO_DEPTH), spi->n_bytes);
+ mchp_corespi_write_fifo(spi, fifo_max);
+ mchp_corespi_read_fifo(spi, fifo_max);
+ }
+
+ spi_finalize_current_transfer(host);
return 1;
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 91cf42c63f2d8a9c1bcdfe923218e079b32e1a69
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031631-outskirts-catfight-a453@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 91cf42c63f2d8a9c1bcdfe923218e079b32e1a69 Mon Sep 17 00:00:00 2001
From: Conor Dooley <conor.dooley(a)microchip.com>
Date: Mon, 3 Mar 2025 10:47:40 +0000
Subject: [PATCH] spi: microchip-core: prevent RX overflows when transmit size
> FIFO size
When the size of a transfer exceeds the size of the FIFO (32 bytes), RX
overflows will be generated and receive data will be corrupted and
warnings will be produced. For example, here's an error generated by a
transfer of 36 bytes:
spi_master spi0: mchp_corespi_interrupt: RX OVERFLOW: rxlen: 4, txlen: 0
The driver is currently split between handling receiving in the
interrupt handler, and sending outside of it. Move all handling out of
the interrupt handling, and explicitly link the number of bytes read of
of the RX FIFO to the number written into the TX one. This both resolves
the overflow problems as well as simplifying the flow of the driver.
CC: stable(a)vger.kernel.org
Fixes: 9ac8d17694b6 ("spi: add support for microchip fpga spi controllers")
Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com>
Link: https://patch.msgid.link/20250303-veal-snooper-712c1dfad336@wendy
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/drivers/spi/spi-microchip-core.c b/drivers/spi/spi-microchip-core.c
index 5b6af55855ef..62ba0bd9cbb7 100644
--- a/drivers/spi/spi-microchip-core.c
+++ b/drivers/spi/spi-microchip-core.c
@@ -70,8 +70,7 @@
#define INT_RX_CHANNEL_OVERFLOW BIT(2)
#define INT_TX_CHANNEL_UNDERRUN BIT(3)
-#define INT_ENABLE_MASK (CONTROL_RX_DATA_INT | CONTROL_TX_DATA_INT | \
- CONTROL_RX_OVER_INT | CONTROL_TX_UNDER_INT)
+#define INT_ENABLE_MASK (CONTROL_RX_OVER_INT | CONTROL_TX_UNDER_INT)
#define REG_CONTROL (0x00)
#define REG_FRAME_SIZE (0x04)
@@ -133,10 +132,15 @@ static inline void mchp_corespi_disable(struct mchp_corespi *spi)
mchp_corespi_write(spi, REG_CONTROL, control);
}
-static inline void mchp_corespi_read_fifo(struct mchp_corespi *spi)
+static inline void mchp_corespi_read_fifo(struct mchp_corespi *spi, int fifo_max)
{
- while (spi->rx_len >= spi->n_bytes && !(mchp_corespi_read(spi, REG_STATUS) & STATUS_RXFIFO_EMPTY)) {
- u32 data = mchp_corespi_read(spi, REG_RX_DATA);
+ for (int i = 0; i < fifo_max; i++) {
+ u32 data;
+
+ while (mchp_corespi_read(spi, REG_STATUS) & STATUS_RXFIFO_EMPTY)
+ ;
+
+ data = mchp_corespi_read(spi, REG_RX_DATA);
spi->rx_len -= spi->n_bytes;
@@ -211,11 +215,10 @@ static inline void mchp_corespi_set_xfer_size(struct mchp_corespi *spi, int len)
mchp_corespi_write(spi, REG_FRAMESUP, len);
}
-static inline void mchp_corespi_write_fifo(struct mchp_corespi *spi)
+static inline void mchp_corespi_write_fifo(struct mchp_corespi *spi, int fifo_max)
{
- int fifo_max, i = 0;
+ int i = 0;
- fifo_max = DIV_ROUND_UP(min(spi->tx_len, FIFO_DEPTH), spi->n_bytes);
mchp_corespi_set_xfer_size(spi, fifo_max);
while ((i < fifo_max) && !(mchp_corespi_read(spi, REG_STATUS) & STATUS_TXFIFO_FULL)) {
@@ -413,19 +416,6 @@ static irqreturn_t mchp_corespi_interrupt(int irq, void *dev_id)
if (intfield == 0)
return IRQ_NONE;
- if (intfield & INT_TXDONE)
- mchp_corespi_write(spi, REG_INT_CLEAR, INT_TXDONE);
-
- if (intfield & INT_RXRDY) {
- mchp_corespi_write(spi, REG_INT_CLEAR, INT_RXRDY);
-
- if (spi->rx_len)
- mchp_corespi_read_fifo(spi);
- }
-
- if (!spi->rx_len && !spi->tx_len)
- finalise = true;
-
if (intfield & INT_RX_CHANNEL_OVERFLOW) {
mchp_corespi_write(spi, REG_INT_CLEAR, INT_RX_CHANNEL_OVERFLOW);
finalise = true;
@@ -512,9 +502,14 @@ static int mchp_corespi_transfer_one(struct spi_controller *host,
mchp_corespi_write(spi, REG_SLAVE_SELECT, spi->pending_slave_select);
- while (spi->tx_len)
- mchp_corespi_write_fifo(spi);
+ while (spi->tx_len) {
+ int fifo_max = DIV_ROUND_UP(min(spi->tx_len, FIFO_DEPTH), spi->n_bytes);
+ mchp_corespi_write_fifo(spi, fifo_max);
+ mchp_corespi_read_fifo(spi, fifo_max);
+ }
+
+ spi_finalize_current_transfer(host);
return 1;
}
On Sat, Mar 15, 2025 at 07:57:55PM +0100, proxy0(a)tutamail.com wrote:
> Mar 15, 2025, 9:28 PM by lukas(a)wunner.de:
> > After dwelling on this for a while, I'm thinking that it may re-introduce
> > the issue fixed by commit f5eff5591b8f ("PCI: pciehp: Fix AB-BA deadlock
> > between reset_lock and device_lock"):
> >
> > Looking at the second and third stack trace in its commit message,
> > down_write(reset_lock) in pciehp_reset_slot() is basically equivalent
> > to synchronize_irq() and we're holding device_lock() at that point,
> > hindering progress of pciehp_ist().
> >
> > So I think I have guided you in the wrong direction and I apologize
> > for that.
> >
> > However it seems to me that this should be solvable with the small
> > patch below. Am I missing something?
> >
> > @Joel Mathew Thomas, could you give the below patch a spin and see
> > if it helps?
>
> I've tested the patch series along with the additional patch provided.
>
> Kernel: 6.14.0-rc6-00043-g3571e8b091f4-dirty-pci-hotplug-reset-fixes-eventmask-fix
>
> Patches applied:
> - [PATCH 1/4] PCI/hotplug: Disable HPIE over reset
> - [PATCH 2/4] PCI/hotplug: Clearing HPIE for the duration of reset is enough
> - [PATCH 3/4] PCI/hotplug: reset_lock is not required synchronizing with irq thread
> - [PATCH 4/4] PCI/hotplug: Don't enable HPIE in poll mode
> - The latest patch from you:
> + /* Ignore events masked by pciehp_reset_slot(). */
> + events &= ctrl->slot_ctrl;
> + if (!events)
> + return IRQ_HANDLED;
Could you test *only* the quoted diff, i.e. without patches [1/4] - [4/4],
on top of a recent kernel?
Sorry for not having been clear about this.
I believe that patch [1/4] will re-introduce a deadlock we've
already fixed two years ago, so the small diff above seeks to
replace it with a simpler approach that will hopefully avoid
the issue as well.
Thanks,
Lukas
Hi,
I prepared this series about 6 months ago, but never got around to
sending it in. In mainline, we got rid of remap_pfn_range() on the
io_uring side, and this just backports it to 6.6-stable as well.
This eliminates issues with fragmented memory, and hence it'd
be nice to have it in 6.6 stable as well. 6.6-stable also has
mmap support for provided ring buffers, and since we've had an
issue related to remap_pfn_range() there, let's bring it to 6.6
stable as well as a precaution.
--
Jens Axboe
From: Andrew Davis <afd(a)ti.com>
[ Upstream commit 5ab90f40121a9f6a9b368274cd92d0f435dc7cfa ]
The syscon helper device_node_to_regmap() is used to fetch a regmap
registered to a device node. It also currently creates this regmap
if the node did not already have a regmap associated with it. This
should only be used on "syscon" nodes. This driver is not such a
device and instead uses device_node_to_regmap() on its own node as
a hacky way to create a regmap for itself.
This will not work going forward and so we should create our regmap
the normal way by defining our regmap_config, fetching our memory
resource, then using the normal regmap_init_mmio() function.
Signed-off-by: Andrew Davis <afd(a)ti.com>
Tested-by: Nishanth Menon <nm(a)ti.com>
Link: https://lore.kernel.org/r/20250123182234.597665-1-afd@ti.com
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/phy/ti/phy-gmii-sel.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/phy/ti/phy-gmii-sel.c b/drivers/phy/ti/phy-gmii-sel.c
index e0ca59ae31531..ff5d5e29629fa 100644
--- a/drivers/phy/ti/phy-gmii-sel.c
+++ b/drivers/phy/ti/phy-gmii-sel.c
@@ -424,6 +424,12 @@ static int phy_gmii_sel_init_ports(struct phy_gmii_sel_priv *priv)
return 0;
}
+static const struct regmap_config phy_gmii_sel_regmap_cfg = {
+ .reg_bits = 32,
+ .val_bits = 32,
+ .reg_stride = 4,
+};
+
static int phy_gmii_sel_probe(struct platform_device *pdev)
{
struct device *dev = &pdev->dev;
@@ -468,7 +474,14 @@ static int phy_gmii_sel_probe(struct platform_device *pdev)
priv->regmap = syscon_node_to_regmap(node->parent);
if (IS_ERR(priv->regmap)) {
- priv->regmap = device_node_to_regmap(node);
+ void __iomem *base;
+
+ base = devm_platform_ioremap_resource(pdev, 0);
+ if (IS_ERR(base))
+ return dev_err_probe(dev, PTR_ERR(base),
+ "failed to get base memory resource\n");
+
+ priv->regmap = regmap_init_mmio(dev, base, &phy_gmii_sel_regmap_cfg);
if (IS_ERR(priv->regmap))
return dev_err_probe(dev, PTR_ERR(priv->regmap),
"Failed to get syscon\n");
--
2.39.5
From: Zhang Lixu <lixu.zhang(a)intel.com>
[ Upstream commit 4b54ae69197b9f416baa0fceadff7e89075f8454 ]
The timestamps in the Firmware log and HID sensor samples are incorrect.
They show 1970-01-01 because the current IPC driver only uses the first
8 bytes of bootup time when synchronizing time with the firmware. The
firmware converts the bootup time to UTC time, which results in the
display of 1970-01-01.
In write_ipc_from_queue(), when sending the MNG_SYNC_FW_CLOCK message,
the clock is updated according to the definition of ipc_time_update_msg.
However, in _ish_sync_fw_clock(), the message length is specified as the
size of uint64_t when building the doorbell. As a result, the firmware
only receives the first 8 bytes of struct ipc_time_update_msg.
This patch corrects the length in the doorbell to ensure the entire
ipc_time_update_msg is sent, fixing the timestamp issue.
Signed-off-by: Zhang Lixu <lixu.zhang(a)intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina(a)suse.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/hid/intel-ish-hid/ipc/ipc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/hid/intel-ish-hid/ipc/ipc.c b/drivers/hid/intel-ish-hid/ipc/ipc.c
index dd5fc60874ba1..b1a41c90c5741 100644
--- a/drivers/hid/intel-ish-hid/ipc/ipc.c
+++ b/drivers/hid/intel-ish-hid/ipc/ipc.c
@@ -577,14 +577,14 @@ static void fw_reset_work_fn(struct work_struct *unused)
static void _ish_sync_fw_clock(struct ishtp_device *dev)
{
static unsigned long prev_sync;
- uint64_t usec;
+ struct ipc_time_update_msg time = {};
if (prev_sync && time_before(jiffies, prev_sync + 20 * HZ))
return;
prev_sync = jiffies;
- usec = ktime_to_us(ktime_get_boottime());
- ipc_send_mng_msg(dev, MNG_SYNC_FW_CLOCK, &usec, sizeof(uint64_t));
+ /* The fields of time would be updated while sending message */
+ ipc_send_mng_msg(dev, MNG_SYNC_FW_CLOCK, &time, sizeof(time));
}
/**
--
2.39.5
[ Upstream commit 5ac9b4e935dfc6af41eee2ddc21deb5c36507a9f ]
>From memfd_secret(2) manpage:
The memory areas backing the file created with memfd_secret(2) are
visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables and only the
page tables of the processes holding the file descriptor map the
corresponding physical memory. (Thus, the pages in the region can't be
accessed by the kernel itself, so that, for example, pointers to the
region can't be passed to system calls.)
We need to handle this special case gracefully in build ID fetching
code. Return -EFAULT whenever secretmem file is passed to build_id_parse()
family of APIs. Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
Reported-by: Yi Lai <yi1.lai(a)intel.com>
Suggested-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com
Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org
[ Linxuan: perform an equivalent direct check without folio-based changes ]
Cc: stable(a)vger.kernel.org
Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event")
Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org>
---
Some previous discussions can be found in the following links:
https://lore.kernel.org/stable/05D0A9F7DE394601+20250311100555.310788-2-che…
---
lib/buildid.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index 9fc46366597e..6249bd47fb0b 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
#include <linux/elf.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
+#include <linux/secretmem.h>
#define BUILD_ID 3
@@ -157,6 +158,12 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
+#ifdef CONFIG_SECRETMEM
+ /* reject secretmem folios created with memfd_secret() */
+ if (vma->vm_file->f_mapping->a_ops == &secretmem_aops)
+ return -EFAULT;
+#endif
+
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
--
2.48.1
[ Upstream commit 5ac9b4e935dfc6af41eee2ddc21deb5c36507a9f ]
>From memfd_secret(2) manpage:
The memory areas backing the file created with memfd_secret(2) are
visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables and only the
page tables of the processes holding the file descriptor map the
corresponding physical memory. (Thus, the pages in the region can't be
accessed by the kernel itself, so that, for example, pointers to the
region can't be passed to system calls.)
We need to handle this special case gracefully in build ID fetching
code. Return -EFAULT whenever secretmem file is passed to build_id_parse()
family of APIs. Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
Reported-by: Yi Lai <yi1.lai(a)intel.com>
Suggested-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com
Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org
[ Linxuan: perform an equivalent direct check without folio-based changes ]
Cc: stable(a)vger.kernel.org
Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event")
Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org>
---
Some previous discussions can be found in the following links:
https://lore.kernel.org/stable/05D0A9F7DE394601+20250311100555.310788-2-che…
---
lib/buildid.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index 9fc46366597e..6249bd47fb0b 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
#include <linux/elf.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
+#include <linux/secretmem.h>
#define BUILD_ID 3
@@ -157,6 +158,12 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
+#ifdef CONFIG_SECRETMEM
+ /* reject secretmem folios created with memfd_secret() */
+ if (vma->vm_file->f_mapping->a_ops == &secretmem_aops)
+ return -EFAULT;
+#endif
+
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
--
2.48.1
[ Upstream commit 5ac9b4e935dfc6af41eee2ddc21deb5c36507a9f ]
>From memfd_secret(2) manpage:
The memory areas backing the file created with memfd_secret(2) are
visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables and only the
page tables of the processes holding the file descriptor map the
corresponding physical memory. (Thus, the pages in the region can't be
accessed by the kernel itself, so that, for example, pointers to the
region can't be passed to system calls.)
We need to handle this special case gracefully in build ID fetching
code. Return -EFAULT whenever secretmem file is passed to build_id_parse()
family of APIs. Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
Reported-by: Yi Lai <yi1.lai(a)intel.com>
Suggested-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com
Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org
[ Linxuan: perform an equivalent direct check without folio-based changes ]
Cc: stable(a)vger.kernel.org
Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event")
Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org>
---
Some previous discussions can be found in the following links:
https://lore.kernel.org/stable/05D0A9F7DE394601+20250311100555.310788-2-che…
---
lib/buildid.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index 9fc46366597e..6249bd47fb0b 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
#include <linux/elf.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
+#include <linux/secretmem.h>
#define BUILD_ID 3
@@ -157,6 +158,12 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
+#ifdef CONFIG_SECRETMEM
+ /* reject secretmem folios created with memfd_secret() */
+ if (vma->vm_file->f_mapping->a_ops == &secretmem_aops)
+ return -EFAULT;
+#endif
+
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
--
2.48.1
Fix the order of the freq-table-hz property, then convert to OPP tables
and add interconnect support for UFS for the SM6350 SoC.
Signed-off-by: Luca Weiss <luca.weiss(a)fairphone.com>
---
Luca Weiss (3):
arm64: dts: qcom: sm6350: Fix wrong order of freq-table-hz for UFS
arm64: dts: qcom: sm6350: Add OPP table support to UFSHC
arm64: dts: qcom: sm6350: Add interconnect support to UFS
arch/arm64/boot/dts/qcom/sm6350.dtsi | 49 ++++++++++++++++++++++++++++--------
1 file changed, 39 insertions(+), 10 deletions(-)
---
base-commit: eea255893718268e1ab852fb52f70c613d109b99
change-id: 20250314-sm6350-ufs-things-53c5de9fec5e
Best regards,
--
Luca Weiss <luca.weiss(a)fairphone.com>
The retain_ff bit should be updated for a GDSC when it is under SW
control and ON. The current sequence needs to be fixed as the GDSC
needs to update retention and is moved to HW control which does not
guarantee the GDSC to be in enabled state.
During the GDSC FSM state, the GDSC hardware waits for an ACK and the
timeout for the ACK is 2000us as per design requirements.
Signed-off-by: Taniya Das <quic_tdas(a)quicinc.com>
---
Taniya Das (2):
clk: qcom: gdsc: Set retain_ff before moving to HW CTRL
clk: qcom: gdsc: Update the status poll timeout for GDSC
drivers/clk/qcom/gdsc.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
---
base-commit: c674aa7c289e51659e40dda0f954886ef7f80042
change-id: 20250212-gdsc_fixes-77e8b8e27e2f
Best regards,
--
Taniya Das <quic_tdas(a)quicinc.com>
From: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
DSP expects the periods to be aligned to fragment sizes, currently
setting up to hw constriants on periods bytes is not going to work
correctly as we can endup with periods sizes aligned to 32 bytes however
not aligned to fragment size.
Update the constriants to use fragment size, and also set at step of
10ms for period size to accommodate DSP requirements of 10ms latency.
Fixes: 9b4fe0f1cd79 ("ASoC: qdsp6: audioreach: add q6apm-dai support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Tested-by: Johan Hovold <johan+linaro(a)kernel.org>
---
sound/soc/qcom/qdsp6/q6apm-dai.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/sound/soc/qcom/qdsp6/q6apm-dai.c b/sound/soc/qcom/qdsp6/q6apm-dai.c
index 90cb24947f31..180ff24041bf 100644
--- a/sound/soc/qcom/qdsp6/q6apm-dai.c
+++ b/sound/soc/qcom/qdsp6/q6apm-dai.c
@@ -385,13 +385,14 @@ static int q6apm_dai_open(struct snd_soc_component *component,
}
}
- ret = snd_pcm_hw_constraint_step(runtime, 0, SNDRV_PCM_HW_PARAM_PERIOD_BYTES, 32);
+ /* setup 10ms latency to accommodate DSP restrictions */
+ ret = snd_pcm_hw_constraint_step(runtime, 0, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, 480);
if (ret < 0) {
dev_err(dev, "constraint for period bytes step ret = %d\n", ret);
goto err;
}
- ret = snd_pcm_hw_constraint_step(runtime, 0, SNDRV_PCM_HW_PARAM_BUFFER_BYTES, 32);
+ ret = snd_pcm_hw_constraint_step(runtime, 0, SNDRV_PCM_HW_PARAM_BUFFER_SIZE, 480);
if (ret < 0) {
dev_err(dev, "constraint for buffer bytes step ret = %d\n", ret);
goto err;
--
2.39.5
Hi there,
I wanted to check in to see if you've had a chance to review my previous message.
Your feedback would be greatly appreciated.
Best regards,
Kevin
________________________________
From: Kevin Martin
Sent: 12 March 2025 08:08
To: linux-stable-mirror(a)lists.linaro.org<mailto:linux-stable-mirror@lists.linaro.org>
Subject: Get premium contact lists
Hi there,
Hope you're having a great day!
Would you be interested in a recently verified list of NetApp clients to support your outreach?
Let me know, and I'll be happy to share the details.
Best regards,
Kevin Martin
Demand Consultant
If you wish to stop receiving emails, reply with Abolish.
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Fixes: bd50b1f5b6f3 ("arm64: dts: qcom: x1e80100: Add Compute Reference Device")
Cc: stable(a)vger.kernel.org # 6.8
Cc: Abel Vesa <abel.vesa(a)linaro.org>
Cc: Rajendra Nayak <quic_rjendra(a)quicinc.com>
Cc: Sibi Sankar <quic_sibis(a)quicinc.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1-crd.dtsi | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1-crd.dtsi b/arch/arm64/boot/dts/qcom/x1-crd.dtsi
index 9e587dc57532..22ebcbe54e24 100644
--- a/arch/arm64/boot/dts/qcom/x1-crd.dtsi
+++ b/arch/arm64/boot/dts/qcom/x1-crd.dtsi
@@ -610,6 +610,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l13b_3p0: ldo13 {
@@ -631,6 +632,7 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l16b_2p9: ldo16 {
--
2.48.1
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Fixes: af16b00578a7 ("arm64: dts: qcom: Add base X1E80100 dtsi and the QCP dts")
Cc: stable(a)vger.kernel.org # 6.8
Cc: Rajendra Nayak <quic_rjendra(a)quicinc.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e80100-qcp.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100-qcp.dts b/arch/arm64/boot/dts/qcom/x1e80100-qcp.dts
index ec594628304a..8f366bf61bbd 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100-qcp.dts
+++ b/arch/arm64/boot/dts/qcom/x1e80100-qcp.dts
@@ -437,6 +437,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l13b_3p0: ldo13 {
@@ -458,6 +459,7 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l16b_2p9: ldo16 {
--
2.48.1
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Fixes: 45247fe17db2 ("arm64: dts: qcom: x1e80100: add Lenovo Thinkpad Yoga slim 7x devicetree")
Cc: stable(a)vger.kernel.org # 6.11
Cc: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e80100-lenovo-yoga-slim7x.dts | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100-lenovo-yoga-slim7x.dts b/arch/arm64/boot/dts/qcom/x1e80100-lenovo-yoga-slim7x.dts
index a3d53f2ba2c3..9d4ba9728355 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100-lenovo-yoga-slim7x.dts
+++ b/arch/arm64/boot/dts/qcom/x1e80100-lenovo-yoga-slim7x.dts
@@ -290,6 +290,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l14b_3p0: ldo14 {
@@ -304,8 +305,8 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
-
};
regulators-1 {
--
2.48.1
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Fixes: 6f18b8d4142c ("arm64: dts: qcom: x1e80100-hp-x14: dt for HP Omnibook X Laptop 14")
Cc: stable(a)vger.kernel.org # 6.14
Cc: Jens Glathe <jens.glathe(a)oldschoolsolutions.biz>
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e80100-hp-omnibook-x14.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100-hp-omnibook-x14.dts b/arch/arm64/boot/dts/qcom/x1e80100-hp-omnibook-x14.dts
index cd860a246c45..ab5addb33b7a 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100-hp-omnibook-x14.dts
+++ b/arch/arm64/boot/dts/qcom/x1e80100-hp-omnibook-x14.dts
@@ -633,6 +633,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l13b_3p0: ldo13 {
@@ -654,6 +655,7 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l16b_2p9: ldo16 {
--
2.48.1
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Note that these supplies currently have no consumers described in
mainline.
Fixes: f5b788d0e8cd ("arm64: dts: qcom: Add support for X1-based Dell XPS 13 9345")
Cc: stable(a)vger.kernel.org # 6.13
Reviewed-by: Aleksandrs Vinarskis <alex.vinarskis(a)gmail.com>
Tested-by: Aleksandrs Vinarskis <alex.vinarskis(a)gmail.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e80100-dell-xps13-9345.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100-dell-xps13-9345.dts b/arch/arm64/boot/dts/qcom/x1e80100-dell-xps13-9345.dts
index 86e87f03b0ec..90f588ed7d63 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100-dell-xps13-9345.dts
+++ b/arch/arm64/boot/dts/qcom/x1e80100-dell-xps13-9345.dts
@@ -359,6 +359,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l13b_3p0: ldo13 {
@@ -380,6 +381,7 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l17b_2p5: ldo17 {
--
2.48.1
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Fixes: 7b8a31e82b87 ("arm64: dts: qcom: Add X1E001DE Snapdragon Devkit for Windows")
Cc: stable(a)vger.kernel.org # 6.14
Cc: Sibi Sankar <quic_sibis(a)quicinc.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e001de-devkit.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e001de-devkit.dts b/arch/arm64/boot/dts/qcom/x1e001de-devkit.dts
index 5e3970b26e2f..f92bda2d34f2 100644
--- a/arch/arm64/boot/dts/qcom/x1e001de-devkit.dts
+++ b/arch/arm64/boot/dts/qcom/x1e001de-devkit.dts
@@ -507,6 +507,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l13b_3p0: ldo13 {
@@ -528,6 +529,7 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l16b_2p9: ldo16 {
--
2.48.1
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Fixes: 7d1cbe2f4985 ("arm64: dts: qcom: Add X1E78100 ThinkPad T14s Gen 6")
Cc: stable(a)vger.kernel.org # 6.12
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e78100-lenovo-thinkpad-t14s.dtsi | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e78100-lenovo-thinkpad-t14s.dtsi b/arch/arm64/boot/dts/qcom/x1e78100-lenovo-thinkpad-t14s.dtsi
index eff0e73640bc..160c052db1ec 100644
--- a/arch/arm64/boot/dts/qcom/x1e78100-lenovo-thinkpad-t14s.dtsi
+++ b/arch/arm64/boot/dts/qcom/x1e78100-lenovo-thinkpad-t14s.dtsi
@@ -456,6 +456,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l13b_3p0: ldo13 {
@@ -477,6 +478,7 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l17b_2p5: ldo17 {
--
2.48.1
From: Haibo Chen <haibo.chen(a)nxp.com>
After a suspend/resume cycle on a down interface, it will come up as
ERROR-ACTIVE.
$ ip -details -s -s a s dev flexcan0
3: flexcan0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN group default qlen 10
link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0
can state STOPPED (berr-counter tx 0 rx 0) restart-ms 1000
$ sudo systemctl suspend
$ ip -details -s -s a s dev flexcan0
3: flexcan0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN group default qlen 10
link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0
can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 1000
And only set CAN state to CAN_STATE_ERROR_ACTIVE when resume process
has no issue, otherwise keep in CAN_STATE_SLEEPING as suspend did.
Fixes: 4de349e786a3 ("can: flexcan: fix resume function")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haibo Chen <haibo.chen(a)nxp.com>
Link: https://patch.msgid.link/20250314110145.899179-1-haibo.chen@nxp.com
Reported-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
Closes: https://lore.kernel.org/all/20250314-married-polar-elephant-b15594-mkl@peng…
[mkl: add newlines]
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/flexcan/flexcan-core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/can/flexcan/flexcan-core.c b/drivers/net/can/flexcan/flexcan-core.c
index ac1a860986df..3a71fd235722 100644
--- a/drivers/net/can/flexcan/flexcan-core.c
+++ b/drivers/net/can/flexcan/flexcan-core.c
@@ -2266,8 +2266,9 @@ static int __maybe_unused flexcan_suspend(struct device *device)
}
netif_stop_queue(dev);
netif_device_detach(dev);
+
+ priv->can.state = CAN_STATE_SLEEPING;
}
- priv->can.state = CAN_STATE_SLEEPING;
return 0;
}
@@ -2278,7 +2279,6 @@ static int __maybe_unused flexcan_resume(struct device *device)
struct flexcan_priv *priv = netdev_priv(dev);
int err;
- priv->can.state = CAN_STATE_ERROR_ACTIVE;
if (netif_running(dev)) {
netif_device_attach(dev);
netif_start_queue(dev);
@@ -2298,6 +2298,8 @@ static int __maybe_unused flexcan_resume(struct device *device)
flexcan_chip_interrupts_enable(dev);
}
+
+ priv->can.state = CAN_STATE_ERROR_ACTIVE;
}
return 0;
--
2.47.2
From: Haibo Chen <haibo.chen(a)nxp.com>
After a suspend/resume cycle on a down interface, it will come up as
ERROR-ACTIVE.
$ ip -details -s -s a s dev flexcan0
3: flexcan0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN group default qlen 10
link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0
can state STOPPED (berr-counter tx 0 rx 0) restart-ms 1000
$ sudo systemctl suspend
$ ip -details -s -s a s dev flexcan0
3: flexcan0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN group default qlen 10
link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0
can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 1000
And only set CAN state to CAN_STATE_ERROR_ACTIVE when resume process
has no issue, otherwise keep in CAN_STATE_SLEEPING as suspend did.
Fixes: 4de349e786a3 ("can: flexcan: fix resume function")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haibo Chen <haibo.chen(a)nxp.com>
---
Changes for v3:
- only handle priv->can.state when netif_running(dev) return true in PM.
---
drivers/net/can/flexcan/flexcan-core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/can/flexcan/flexcan-core.c b/drivers/net/can/flexcan/flexcan-core.c
index b347a1c93536..d4d342d8f490 100644
--- a/drivers/net/can/flexcan/flexcan-core.c
+++ b/drivers/net/can/flexcan/flexcan-core.c
@@ -2299,8 +2299,8 @@ static int __maybe_unused flexcan_suspend(struct device *device)
}
netif_stop_queue(dev);
netif_device_detach(dev);
+ priv->can.state = CAN_STATE_SLEEPING;
}
- priv->can.state = CAN_STATE_SLEEPING;
return 0;
}
@@ -2311,7 +2311,6 @@ static int __maybe_unused flexcan_resume(struct device *device)
struct flexcan_priv *priv = netdev_priv(dev);
int err;
- priv->can.state = CAN_STATE_ERROR_ACTIVE;
if (netif_running(dev)) {
netif_device_attach(dev);
netif_start_queue(dev);
@@ -2331,6 +2330,7 @@ static int __maybe_unused flexcan_resume(struct device *device)
flexcan_chip_interrupts_enable(dev);
}
+ priv->can.state = CAN_STATE_ERROR_ACTIVE;
}
return 0;
--
2.34.1
This patch series addresses 2 issues
1) Fix typo in pattern properties for R-Car V4M.
2) Fix page entries in the AFL list.
v2->v3:
* Collected tags.
* Dropped unused variables cfg and start from
rcar_canfd_configure_afl_rules().
v1->v2:
* Split fixes patches as separate series.
* Added Rb tag from Geert for binding patch.
* Added the tag Cc:stable@vger.kernel.org
Biju Das (2):
dt-bindings: can: renesas,rcar-canfd: Fix typo in pattern properties
for R-Car V4M
can: rcar_canfd: Fix page entries in the AFL list
.../bindings/net/can/renesas,rcar-canfd.yaml | 2 +-
drivers/net/can/rcar/rcar_canfd.c | 28 ++++++++-----------
2 files changed, 12 insertions(+), 18 deletions(-)
--
2.43.0
Prepare vPMC registers for user-initiated changes after first run. This
is important specifically for debugging Windows on QEMU with GDB; QEMU
tries to write back all visible registers when resuming the VM execution
with GDB, corrupting the PMU state. Windows always uses the PMU so this
can cause adverse effects on that particular OS.
This series also contains patch "KVM: arm64: PMU: Set raw values from
user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}", which reverts semantic
changes made for the mentioned registers in the past. It is necessary
to migrate the PMU state properly on Firecracker, QEMU, and crosvm.
Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com>
---
Changes in v4:
- Reverted changes for functions implementing ioctls in patch
"KVM: arm64: PMU: Assume PMU presence in pmu-emul.c".
- Removed kvm_pmu_vcpu_reset().
- Reordered function calls in kvm_vcpu_reload_pmu() for better style.
- Link to v3: https://lore.kernel.org/r/20250312-pmc-v3-0-0411cab5dc3d@daynix.com
Changes in v3:
- Added patch "KVM: arm64: PMU: Assume PMU presence in pmu-emul.c".
- Added an explanation of this path series' motivation to each patch.
- Explained why userspace register writes and register reset should be
covered in patch "KVM: arm64: PMU: Reload when user modifies
registers".
- Marked patch "KVM: arm64: PMU: Set raw values from user to
PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}" for stable.
- Reoreded so that patch "KVM: arm64: PMU: Set raw values from user to
PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}" would come first.
- Added patch "KVM: arm64: PMU: Call kvm_pmu_handle_pmcr() after masking
PMCNTENSET_EL0".
- Added patch "KVM: arm64: Reload PMCNTENSET_EL0".
- Link to v2: https://lore.kernel.org/r/20250307-pmc-v2-0-6c3375a5f1e4@daynix.com
Changes in v2:
- Changed to utilize KVM_REQ_RELOAD_PMU as suggested by Oliver Upton.
- Added patch "KVM: arm64: PMU: Reload when user modifies registers"
to cover more registers.
- Added patch "KVM: arm64: PMU: Set raw values from user to
PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}".
- Link to v1: https://lore.kernel.org/r/20250302-pmc-v1-1-caff989093dc@daynix.com
---
Akihiko Odaki (7):
KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
KVM: arm64: PMU: Assume PMU presence in pmu-emul.c
KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs
KVM: arm64: PMU: Reload when user modifies registers
KVM: arm64: PMU: Call kvm_pmu_handle_pmcr() after masking PMCNTENSET_EL0
KVM: arm64: PMU: Reload PMCNTENSET_EL0
KVM: arm64: PMU: Reload when resetting
arch/arm64/kvm/arm.c | 8 ++++---
arch/arm64/kvm/pmu-emul.c | 60 ++++++++++++++---------------------------------
arch/arm64/kvm/reset.c | 3 ---
arch/arm64/kvm/sys_regs.c | 53 +++++++++++++++++++++++------------------
include/kvm/arm_pmu.h | 3 +--
5 files changed, 53 insertions(+), 74 deletions(-)
---
base-commit: da2f480cb24d39d480b1e235eda0dd2d01f8765b
change-id: 20250302-pmc-b90a86af945c
Best regards,
--
Akihiko Odaki <akihiko.odaki(a)daynix.com>
This is a note to let you know that I've just added the patch titled
iio: adc: ad7768-1: Fix conversion result sign
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 8236644f5ecb180e80ad92d691c22bc509b747bb Mon Sep 17 00:00:00 2001
From: Sergiu Cuciurean <sergiu.cuciurean(a)analog.com>
Date: Thu, 6 Mar 2025 18:00:29 -0300
Subject: iio: adc: ad7768-1: Fix conversion result sign
The ad7768-1 ADC output code is two's complement, meaning that the voltage
conversion result is a signed value.. Since the value is a 24 bit one,
stored in a 32 bit variable, the sign should be extended in order to get
the correct representation.
Also the channel description has been updated to signed representation,
to match the ADC specifications.
Fixes: a5f8c7da3dbe ("iio: adc: Add AD7768-1 ADC basic support")
Reviewed-by: David Lechner <dlechner(a)baylibre.com>
Reviewed-by: Marcelo Schmitt <marcelo.schmitt(a)analog.com>
Signed-off-by: Sergiu Cuciurean <sergiu.cuciurean(a)analog.com>
Signed-off-by: Jonathan Santos <Jonathan.Santos(a)analog.com>
Cc: <Stable(a)vger.kernel.org>
Link: https://patch.msgid.link/505994d3b71c2aa38ba714d909a68e021f12124c.174126812…
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/ad7768-1.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/iio/adc/ad7768-1.c b/drivers/iio/adc/ad7768-1.c
index ea829c51e80b..09e7cccfd51c 100644
--- a/drivers/iio/adc/ad7768-1.c
+++ b/drivers/iio/adc/ad7768-1.c
@@ -142,7 +142,7 @@ static const struct iio_chan_spec ad7768_channels[] = {
.channel = 0,
.scan_index = 0,
.scan_type = {
- .sign = 'u',
+ .sign = 's',
.realbits = 24,
.storagebits = 32,
.shift = 8,
@@ -373,7 +373,7 @@ static int ad7768_read_raw(struct iio_dev *indio_dev,
iio_device_release_direct(indio_dev);
if (ret < 0)
return ret;
- *val = ret;
+ *val = sign_extend32(ret, chan->scan_type.realbits - 1);
return IIO_VAL_INT;
--
2.48.1
This is a note to let you know that I've just added the patch titled
iio: adc: ad7768-1: Fix conversion result sign
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From 8236644f5ecb180e80ad92d691c22bc509b747bb Mon Sep 17 00:00:00 2001
From: Sergiu Cuciurean <sergiu.cuciurean(a)analog.com>
Date: Thu, 6 Mar 2025 18:00:29 -0300
Subject: iio: adc: ad7768-1: Fix conversion result sign
The ad7768-1 ADC output code is two's complement, meaning that the voltage
conversion result is a signed value.. Since the value is a 24 bit one,
stored in a 32 bit variable, the sign should be extended in order to get
the correct representation.
Also the channel description has been updated to signed representation,
to match the ADC specifications.
Fixes: a5f8c7da3dbe ("iio: adc: Add AD7768-1 ADC basic support")
Reviewed-by: David Lechner <dlechner(a)baylibre.com>
Reviewed-by: Marcelo Schmitt <marcelo.schmitt(a)analog.com>
Signed-off-by: Sergiu Cuciurean <sergiu.cuciurean(a)analog.com>
Signed-off-by: Jonathan Santos <Jonathan.Santos(a)analog.com>
Cc: <Stable(a)vger.kernel.org>
Link: https://patch.msgid.link/505994d3b71c2aa38ba714d909a68e021f12124c.174126812…
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/ad7768-1.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/iio/adc/ad7768-1.c b/drivers/iio/adc/ad7768-1.c
index ea829c51e80b..09e7cccfd51c 100644
--- a/drivers/iio/adc/ad7768-1.c
+++ b/drivers/iio/adc/ad7768-1.c
@@ -142,7 +142,7 @@ static const struct iio_chan_spec ad7768_channels[] = {
.channel = 0,
.scan_index = 0,
.scan_type = {
- .sign = 'u',
+ .sign = 's',
.realbits = 24,
.storagebits = 32,
.shift = 8,
@@ -373,7 +373,7 @@ static int ad7768_read_raw(struct iio_dev *indio_dev,
iio_device_release_direct(indio_dev);
if (ret < 0)
return ret;
- *val = ret;
+ *val = sign_extend32(ret, chan->scan_type.realbits - 1);
return IIO_VAL_INT;
--
2.48.1
Currently on book3s-hv, the capability KVM_CAP_SPAPR_TCE_VFIO is only
available for KVM Guests running on PowerNV and not for the KVM guests
running on pSeries hypervisors. This prevents a pSeries L2 guest from
leveraging the in-kernel acceleration for H_PUT_TCE_INDIRECT and
H_STUFF_TCE hcalls that results in slow startup times for large memory
guests.
Support for VFIO on pSeries was restored in commit f431a8cde7f1
("powerpc/iommu: Reimplement the iommu_table_group_ops for pSeries"),
making it possible to re-enable this capability on pSeries hosts.
This change enables KVM_CAP_SPAPR_TCE_VFIO for nested PAPR guests on
pSeries, while maintaining the existing behavior on PowerNV. Booting an
L2 guest with 128GB of memory shows an average 11% improvement in
startup time.
Fixes: f431a8cde7f1 ("powerpc/iommu: Reimplement the iommu_table_group_ops for pSeries")
Cc: stable(a)vger.kernel.org
Reviewed-by: Vaibhav Jain <vaibhav(a)linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Signed-off-by: Amit Machhiwal <amachhiw(a)linux.ibm.com>
---
Changes since v2:
* Updated the patch description
* v2: https://lore.kernel.org/all/20250129094033.2265211-1-amachhiw@linux.ibm.com/
Changes since v1:
* Addressed review comments from Ritesh
* v1: https://lore.kernel.org/all/20250109132053.158436-1-amachhiw@linux.ibm.com/
arch/powerpc/kvm/powerpc.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index ce1d91eed231..a7138eb18d59 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -543,26 +543,23 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = !hv_enabled;
break;
#ifdef CONFIG_KVM_MPIC
case KVM_CAP_IRQ_MPIC:
r = 1;
break;
#endif
#ifdef CONFIG_PPC_BOOK3S_64
case KVM_CAP_SPAPR_TCE:
+ fallthrough;
case KVM_CAP_SPAPR_TCE_64:
- r = 1;
- break;
case KVM_CAP_SPAPR_TCE_VFIO:
- r = !!cpu_has_feature(CPU_FTR_HVMODE);
- break;
case KVM_CAP_PPC_RTAS:
case KVM_CAP_PPC_FIXUP_HCALL:
case KVM_CAP_PPC_ENABLE_HCALL:
#ifdef CONFIG_KVM_XICS
case KVM_CAP_IRQ_XICS:
#endif
case KVM_CAP_PPC_GET_CPU_CHAR:
r = 1;
break;
#ifdef CONFIG_KVM_XIVE
base-commit: 6537cfb395f352782918d8ee7b7f10ba2cc3cbf2
--
2.48.1
From: Thomas Weißschuh <linux(a)weissschuh.net>
commit fd53aa40e65f518453115b6f56183b0c201db26b upstream.
The ioctl and sysfs handlers unconditionally call the ->enable callback.
Not all drivers implement that callback, leading to NULL dereferences.
Example of affected drivers: ptp_s390.c, ptp_vclock.c and ptp_mock.c.
Instead use a dummy callback if no better was specified by the driver.
Fixes: d94ba80ebbea ("ptp: Added a brand new class driver for ptp clocks.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
Acked-by: Richard Cochran <richardcochran(a)gmail.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski(a)linux.intel.com>
Link: https://patch.msgid.link/20250123-ptp-enable-v1-1-b015834d3a47@weissschuh.n…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[Conflict due to
42704b26b0f1 ("ptp: Add cycles support for virtual clocks")
not in the tree]
Signed-off-by: Abdelkareem Abdelsaamad <kareemem(a)amazon.com>
---
drivers/ptp/ptp_clock.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index 4d775cd8ee3c..c895e26b1f17 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -188,6 +188,11 @@ static void ptp_clock_release(struct device *dev)
kfree(ptp);
}
+static int ptp_enable(struct ptp_clock_info *ptp, struct ptp_clock_request *request, int on)
+{
+ return -EOPNOTSUPP;
+}
+
static void ptp_aux_kworker(struct kthread_work *work)
{
struct ptp_clock *ptp = container_of(work, struct ptp_clock,
@@ -233,6 +238,9 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
mutex_init(&ptp->pincfg_mux);
init_waitqueue_head(&ptp->tsev_wq);
+ if (!ptp->info->enable)
+ ptp->info->enable = ptp_enable;
+
if (ptp->info->do_aux_work) {
kthread_init_delayed_work(&ptp->aux_work, ptp_aux_kworker);
ptp->kworker = kthread_create_worker(0, "ptp%d", ptp->index);
--
2.47.1
This patch series addresses a shutdown issue reported in [1].
This problem has been fixed on kernel 6.12 and later, kernel 6.6 is
the last kernel these upstream patches should go to because the Realtek
RTL8852BE chip supported by kernel since v6.2 is the only chip known to
have this problem.
[1] https://github.com/lwfinger/rtw89/issues/372
Zenm Chen (2):
wifi: rtw89: pci: add pre_deinit to be called after probe complete
wifi: rtw89: pci: disable PCIE wake bit when PCIE deinit
drivers/net/wireless/realtek/rtw89/core.c | 2 ++
drivers/net/wireless/realtek/rtw89/core.h | 6 ++++++
drivers/net/wireless/realtek/rtw89/pci.c | 10 ++++++++++
3 files changed, 18 insertions(+)
--
2.48.1
This series addresses GPU reset issues reported in [1], where running a
long compute job would trigger repeated GPU resets, leading to a UI
freeze.
Patches #1 and #2 prevent the same faulty job from being resubmitted in a
loop, mitigating the first cause of the issue.
However, the issue isn't entirely solved. Even with only a single GPU
reset, the UI still freezes on the Raspberry Pi 5, indicating a GPU hang.
Patches #3 to #6 address this by properly configuring the V3D_SMS
registers, which are required for power management and resets in V3D 7.1.
Patch #7 updates the DT maintainership, replacing Emma with the current
v3d driver maintainer.
[1] https://github.com/raspberrypi/linux/issues/6660
Best Regards,
- Maíra
---
v1 -> v2:
- [1/6, 2/6, 5/6] Add Iago's R-b (Iago Toral)
- [3/6] Use V3D_GEN_* macros consistently throughout the driver (Phil Elwell)
- [3/6] Don't add Iago's R-b in 3/6 due to changes in the patch
- [4/6] Add per-compatible restrictions to enforce per‐SoC register rules (Conor Dooley)
- [6/6] Add Emma's A-b, collected through IRC (Emma Anholt)
- [6/6] Add Rob's A-b (Rob Herring)
- Link to v1: https://lore.kernel.org/r/20250226-v3d-gpu-reset-fixes-v1-0-83a969fdd9c1@ig…
v2 -> v3:
- [3/7] Add Iago's R-b (Iago Toral)
- [4/7, 5/7] Separate the patches to ease the reviewing process -> Now,
PATCH 4/7 only adds the per-compatible rules and PATCH 5/7 adds the
SMS registers
- [4/7] `allOf` goes above `additionalProperties` (Krzysztof Kozlowski)
- [4/7, 5/7] Sync `reg` and `reg-names` items (Krzysztof Kozlowski)
- Link to v2: https://lore.kernel.org/r/20250308-v3d-gpu-reset-fixes-v2-0-2939c30f0cc4@ig…
v3 -> v4:
- [4/7] BCM2712 has an external reset controller, therefore the "bridge"
register is not needed (Krzysztof Kozlowski)
- [4/7] Remove the word "required" from the reg descriptions (Rob Herring)
- [5/7] Improve commit message (Rob Herring)
- Link to v3: https://lore.kernel.org/r/20250311-v3d-gpu-reset-fixes-v3-0-64f7a4247ec0@ig…
---
Maíra Canal (7):
drm/v3d: Don't run jobs that have errors flagged in its fence
drm/v3d: Set job pointer to NULL when the job's fence has an error
drm/v3d: Associate a V3D tech revision to all supported devices
dt-bindings: gpu: v3d: Add per-compatible register restrictions
dt-bindings: gpu: v3d: Add SMS register to BCM2712 compatible
drm/v3d: Use V3D_SMS registers for power on/off and reset on V3D 7.x
dt-bindings: gpu: Add V3D driver maintainer as DT maintainer
.../devicetree/bindings/gpu/brcm,bcm-v3d.yaml | 77 ++++++++++---
drivers/gpu/drm/v3d/v3d_debugfs.c | 126 ++++++++++-----------
drivers/gpu/drm/v3d/v3d_drv.c | 62 +++++++++-
drivers/gpu/drm/v3d/v3d_drv.h | 22 +++-
drivers/gpu/drm/v3d/v3d_gem.c | 27 ++++-
drivers/gpu/drm/v3d/v3d_irq.c | 6 +-
drivers/gpu/drm/v3d/v3d_perfmon.c | 4 +-
drivers/gpu/drm/v3d/v3d_regs.h | 26 +++++
drivers/gpu/drm/v3d/v3d_sched.c | 29 ++++-
9 files changed, 279 insertions(+), 100 deletions(-)
---
base-commit: 10646ddac2917b31c985ceff0e4982c42a9c924b
change-id: 20250224-v3d-gpu-reset-fixes-2d21fc70711d
Smatch noticed that inode_getblk() can return 1 on successful mapping of
a block instead of expected 0 after commit b405c1e58b73 ("udf: refactor
udf_next_aext() to handle error"). This could confuse some of the
callers and lead to strange failures (although the one reported by
Smatch in udf_mkdir() is impossible to trigger in practice). Fix the
return value of inode_getblk().
Link: https://lore.kernel.org/all/cb514af7-bbe0-435b-934f-dd1d7a16d2cd@stanley.mo…
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Fixes: b405c1e58b73 ("udf: refactor udf_next_aext() to handle error")
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/udf/inode.c | 1 +
1 file changed, 1 insertion(+)
I plan to merge this patch through my tree.
diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index 70c907fe8af9..4386dd845e40 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -810,6 +810,7 @@ static int inode_getblk(struct inode *inode, struct udf_map_rq *map)
}
map->oflags = UDF_BLK_MAPPED;
map->pblk = udf_get_lb_pblock(inode->i_sb, &eloc, offset);
+ ret = 0;
goto out_free;
}
--
2.43.0
There are two variables that indicate the interrupt type to be used
in the next test execution, "irq_type" as global and test->irq_type.
The global is referenced from pci_endpoint_test_get_irq() to preserve
the current type for ioctl(PCITEST_GET_IRQTYPE).
The type set in this function isn't reflected in the global "irq_type",
so ioctl(PCITEST_GET_IRQTYPE) returns the previous type.
As a result, the wrong type is displayed in old "pcitest" as follows:
# pcitest -i 0
SET IRQ TYPE TO LEGACY: OKAY
# pcitest -I
GET IRQ TYPE: MSI
And new "pcitest" in kselftest results in an error as follows:
# RUN pci_ep_basic.LEGACY_IRQ_TEST ...
# pci_endpoint_test.c:104:LEGACY_IRQ_TEST:Expected 0 (0) == ret (1)
# pci_endpoint_test.c:104:LEGACY_IRQ_TEST:Can't get Legacy IRQ type
Fix this issue by propagating the current type to the global "irq_type".
Cc: stable(a)vger.kernel.org
Fixes: b2ba9225e031 ("misc: pci_endpoint_test: Avoid using module parameter to determine irqtype")
Reviewed-by: Niklas Cassel <cassel(a)kernel.org>
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko(a)socionext.com>
---
drivers/misc/pci_endpoint_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index acf3d8dab131..896392c428de 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -833,6 +833,7 @@ static int pci_endpoint_test_set_irq(struct pci_endpoint_test *test,
return ret;
}
+ irq_type = test->irq_type;
return 0;
}
--
2.25.1
From: Sven Eckelmann <sven(a)narfation.org>
An OGMv1 and OGMv2 packet receive processing were not only limited by the
number of bytes in the received packet but also by the nodes maximum
aggregation packet size limit. But this limit is relevant for TX and not
for RX. It must not be enforced by batadv_(i)v_ogm_aggr_packet to avoid
loss of information in case of a different limit for sender and receiver.
This has a minor side effect for B.A.T.M.A.N. IV because the
batadv_iv_ogm_aggr_packet is also used for the preprocessing for the TX.
But since the aggregation code itself will not allow more than
BATADV_MAX_AGGREGATION_BYTES bytes, this check was never triggering (in
this context) prior of removing it.
Cc: stable(a)vger.kernel.org
Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
Fixes: 9323158ef9f4 ("batman-adv: OGMv2 - implement originators logic")
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
Signed-off-by: Simon Wunderlich <sw(a)simonwunderlich.de>
---
net/batman-adv/bat_iv_ogm.c | 3 +--
net/batman-adv/bat_v_ogm.c | 3 +--
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index 07ae5dd1f150..b12645949ae5 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -325,8 +325,7 @@ batadv_iv_ogm_aggr_packet(int buff_pos, int packet_len,
/* check if there is enough space for the optional TVLV */
next_buff_pos += ntohs(ogm_packet->tvlv_len);
- return (next_buff_pos <= packet_len) &&
- (next_buff_pos <= BATADV_MAX_AGGREGATION_BYTES);
+ return next_buff_pos <= packet_len;
}
/* send a batman ogm to a given interface */
diff --git a/net/batman-adv/bat_v_ogm.c b/net/batman-adv/bat_v_ogm.c
index e503ee0d896b..8f89ffe6020c 100644
--- a/net/batman-adv/bat_v_ogm.c
+++ b/net/batman-adv/bat_v_ogm.c
@@ -839,8 +839,7 @@ batadv_v_ogm_aggr_packet(int buff_pos, int packet_len,
/* check if there is enough space for the optional TVLV */
next_buff_pos += ntohs(ogm2_packet->tvlv_len);
- return (next_buff_pos <= packet_len) &&
- (next_buff_pos <= BATADV_MAX_AGGREGATION_BYTES);
+ return next_buff_pos <= packet_len;
}
/**
--
2.39.5
From: Remi Pommarel <repk(a)triplefau.lt>
Since commit 4436df478860 ("batman-adv: Add flex array to struct
batadv_tvlv_tt_data"), the introduction of batadv_tvlv_tt_data's flex
array member in batadv_tt_tvlv_ogm_handler_v1() put tt_changes at
invalid offset. Those TT changes are supposed to be filled from the end
of batadv_tvlv_tt_data structure (including vlan_data flexible array),
but only the flex array size is taken into account missing completely
the size of the fixed part of the structure itself.
Fix the tt_change offset computation by using struct_size() instead of
flex_array_size() so both flex array member and its container structure
sizes are taken into account.
Cc: stable(a)vger.kernel.org
Fixes: 4436df478860 ("batman-adv: Add flex array to struct batadv_tvlv_tt_data")
Signed-off-by: Remi Pommarel <repk(a)triplefau.lt>
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
Signed-off-by: Simon Wunderlich <sw(a)simonwunderlich.de>
---
net/batman-adv/translation-table.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index 760d51fdbdf6..7d5de4cbb814 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -3959,23 +3959,21 @@ static void batadv_tt_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv,
struct batadv_tvlv_tt_change *tt_change;
struct batadv_tvlv_tt_data *tt_data;
u16 num_entries, num_vlan;
- size_t flex_size;
+ size_t tt_data_sz;
if (tvlv_value_len < sizeof(*tt_data))
return;
tt_data = tvlv_value;
- tvlv_value_len -= sizeof(*tt_data);
-
num_vlan = ntohs(tt_data->num_vlan);
- flex_size = flex_array_size(tt_data, vlan_data, num_vlan);
- if (tvlv_value_len < flex_size)
+ tt_data_sz = struct_size(tt_data, vlan_data, num_vlan);
+ if (tvlv_value_len < tt_data_sz)
return;
tt_change = (struct batadv_tvlv_tt_change *)((void *)tt_data
- + flex_size);
- tvlv_value_len -= flex_size;
+ + tt_data_sz);
+ tvlv_value_len -= tt_data_sz;
num_entries = batadv_tt_entries(tvlv_value_len);
--
2.39.5
From: Sven Eckelmann <sven(a)narfation.org>
The ELP worker needs to calculate new metric values for all neighbors
"reachable" over an interface. Some of the used metric sources require
locks which might need to sleep. This sleep is incompatible with the RCU
list iterator used for the recorded neighbors. The initial approach to work
around of this problem was to queue another work item per neighbor and then
run this in a new context.
Even when this solved the RCU vs might_sleep() conflict, it has a major
problems: Nothing was stopping the work item in case it is not needed
anymore - for example because one of the related interfaces was removed or
the batman-adv module was unloaded - resulting in potential invalid memory
accesses.
Directly canceling the metric worker also has various problems:
* cancel_work_sync for a to-be-deactivated interface is called with
rtnl_lock held. But the code in the ELP metric worker also tries to use
rtnl_lock() - which will never return in this case. This also means that
cancel_work_sync would never return because it is waiting for the worker
to finish.
* iterating over the neighbor list for the to-be-deactivated interface is
currently done using the RCU specific methods. Which means that it is
possible to miss items when iterating over it without the associated
spinlock - a behaviour which is acceptable for a periodic metric check
but not for a cleanup routine (which must "stop" all still running
workers)
The better approch is to get rid of the per interface neighbor metric
worker and handle everything in the interface worker. The original problems
are solved by:
* creating a list of neighbors which require new metric information inside
the RCU protected context, gathering the metric according to the new list
outside the RCU protected context
* only use rcu_trylock inside metric gathering code to avoid a deadlock
when the cancel_delayed_work_sync is called in the interface removal code
(which is called with the rtnl_lock held)
Cc: stable(a)vger.kernel.org
Fixes: c833484e5f38 ("batman-adv: ELP - compute the metric based on the estimated throughput")
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
Signed-off-by: Simon Wunderlich <sw(a)simonwunderlich.de>
---
net/batman-adv/bat_v.c | 2 --
net/batman-adv/bat_v_elp.c | 71 ++++++++++++++++++++++++++------------
net/batman-adv/bat_v_elp.h | 2 --
net/batman-adv/types.h | 3 --
4 files changed, 48 insertions(+), 30 deletions(-)
diff --git a/net/batman-adv/bat_v.c b/net/batman-adv/bat_v.c
index ac11f1f08db0..d35479c465e2 100644
--- a/net/batman-adv/bat_v.c
+++ b/net/batman-adv/bat_v.c
@@ -113,8 +113,6 @@ static void
batadv_v_hardif_neigh_init(struct batadv_hardif_neigh_node *hardif_neigh)
{
ewma_throughput_init(&hardif_neigh->bat_v.throughput);
- INIT_WORK(&hardif_neigh->bat_v.metric_work,
- batadv_v_elp_throughput_metric_update);
}
/**
diff --git a/net/batman-adv/bat_v_elp.c b/net/batman-adv/bat_v_elp.c
index 65e52de52bcd..b065578b4436 100644
--- a/net/batman-adv/bat_v_elp.c
+++ b/net/batman-adv/bat_v_elp.c
@@ -18,6 +18,7 @@
#include <linux/if_ether.h>
#include <linux/jiffies.h>
#include <linux/kref.h>
+#include <linux/list.h>
#include <linux/minmax.h>
#include <linux/netdevice.h>
#include <linux/nl80211.h>
@@ -26,6 +27,7 @@
#include <linux/rcupdate.h>
#include <linux/rtnetlink.h>
#include <linux/skbuff.h>
+#include <linux/slab.h>
#include <linux/stddef.h>
#include <linux/string.h>
#include <linux/types.h>
@@ -41,6 +43,18 @@
#include "routing.h"
#include "send.h"
+/**
+ * struct batadv_v_metric_queue_entry - list of hardif neighbors which require
+ * and metric update
+ */
+struct batadv_v_metric_queue_entry {
+ /** @hardif_neigh: hardif neighbor scheduled for metric update */
+ struct batadv_hardif_neigh_node *hardif_neigh;
+
+ /** @list: list node for metric_queue */
+ struct list_head list;
+};
+
/**
* batadv_v_elp_start_timer() - restart timer for ELP periodic work
* @hard_iface: the interface for which the timer has to be reset
@@ -137,10 +151,17 @@ static bool batadv_v_elp_get_throughput(struct batadv_hardif_neigh_node *neigh,
goto default_throughput;
}
+ /* only use rtnl_trylock because the elp worker will be cancelled while
+ * the rntl_lock is held. the cancel_delayed_work_sync() would otherwise
+ * wait forever when the elp work_item was started and it is then also
+ * trying to rtnl_lock
+ */
+ if (!rtnl_trylock())
+ return false;
+
/* if not a wifi interface, check if this device provides data via
* ethtool (e.g. an Ethernet adapter)
*/
- rtnl_lock();
ret = __ethtool_get_link_ksettings(hard_iface->net_dev, &link_settings);
rtnl_unlock();
if (ret == 0) {
@@ -175,31 +196,19 @@ static bool batadv_v_elp_get_throughput(struct batadv_hardif_neigh_node *neigh,
/**
* batadv_v_elp_throughput_metric_update() - worker updating the throughput
* metric of a single hop neighbour
- * @work: the work queue item
+ * @neigh: the neighbour to probe
*/
-void batadv_v_elp_throughput_metric_update(struct work_struct *work)
+static void
+batadv_v_elp_throughput_metric_update(struct batadv_hardif_neigh_node *neigh)
{
- struct batadv_hardif_neigh_node_bat_v *neigh_bat_v;
- struct batadv_hardif_neigh_node *neigh;
u32 throughput;
bool valid;
- neigh_bat_v = container_of(work, struct batadv_hardif_neigh_node_bat_v,
- metric_work);
- neigh = container_of(neigh_bat_v, struct batadv_hardif_neigh_node,
- bat_v);
-
valid = batadv_v_elp_get_throughput(neigh, &throughput);
if (!valid)
- goto put_neigh;
+ return;
ewma_throughput_add(&neigh->bat_v.throughput, throughput);
-
-put_neigh:
- /* decrement refcounter to balance increment performed before scheduling
- * this task
- */
- batadv_hardif_neigh_put(neigh);
}
/**
@@ -273,14 +282,16 @@ batadv_v_elp_wifi_neigh_probe(struct batadv_hardif_neigh_node *neigh)
*/
static void batadv_v_elp_periodic_work(struct work_struct *work)
{
+ struct batadv_v_metric_queue_entry *metric_entry;
+ struct batadv_v_metric_queue_entry *metric_safe;
struct batadv_hardif_neigh_node *hardif_neigh;
struct batadv_hard_iface *hard_iface;
struct batadv_hard_iface_bat_v *bat_v;
struct batadv_elp_packet *elp_packet;
+ struct list_head metric_queue;
struct batadv_priv *bat_priv;
struct sk_buff *skb;
u32 elp_interval;
- bool ret;
bat_v = container_of(work, struct batadv_hard_iface_bat_v, elp_wq.work);
hard_iface = container_of(bat_v, struct batadv_hard_iface, bat_v);
@@ -316,6 +327,8 @@ static void batadv_v_elp_periodic_work(struct work_struct *work)
atomic_inc(&hard_iface->bat_v.elp_seqno);
+ INIT_LIST_HEAD(&metric_queue);
+
/* The throughput metric is updated on each sent packet. This way, if a
* node is dead and no longer sends packets, batman-adv is still able to
* react timely to its death.
@@ -340,16 +353,28 @@ static void batadv_v_elp_periodic_work(struct work_struct *work)
/* Reading the estimated throughput from cfg80211 is a task that
* may sleep and that is not allowed in an rcu protected
- * context. Therefore schedule a task for that.
+ * context. Therefore add it to metric_queue and process it
+ * outside rcu protected context.
*/
- ret = queue_work(batadv_event_workqueue,
- &hardif_neigh->bat_v.metric_work);
-
- if (!ret)
+ metric_entry = kzalloc(sizeof(*metric_entry), GFP_ATOMIC);
+ if (!metric_entry) {
batadv_hardif_neigh_put(hardif_neigh);
+ continue;
+ }
+
+ metric_entry->hardif_neigh = hardif_neigh;
+ list_add(&metric_entry->list, &metric_queue);
}
rcu_read_unlock();
+ list_for_each_entry_safe(metric_entry, metric_safe, &metric_queue, list) {
+ batadv_v_elp_throughput_metric_update(metric_entry->hardif_neigh);
+
+ batadv_hardif_neigh_put(metric_entry->hardif_neigh);
+ list_del(&metric_entry->list);
+ kfree(metric_entry);
+ }
+
restart_timer:
batadv_v_elp_start_timer(hard_iface);
out:
diff --git a/net/batman-adv/bat_v_elp.h b/net/batman-adv/bat_v_elp.h
index 9e2740195fa2..c9cb0a307100 100644
--- a/net/batman-adv/bat_v_elp.h
+++ b/net/batman-adv/bat_v_elp.h
@@ -10,7 +10,6 @@
#include "main.h"
#include <linux/skbuff.h>
-#include <linux/workqueue.h>
int batadv_v_elp_iface_enable(struct batadv_hard_iface *hard_iface);
void batadv_v_elp_iface_disable(struct batadv_hard_iface *hard_iface);
@@ -19,6 +18,5 @@ void batadv_v_elp_iface_activate(struct batadv_hard_iface *primary_iface,
void batadv_v_elp_primary_iface_set(struct batadv_hard_iface *primary_iface);
int batadv_v_elp_packet_recv(struct sk_buff *skb,
struct batadv_hard_iface *if_incoming);
-void batadv_v_elp_throughput_metric_update(struct work_struct *work);
#endif /* _NET_BATMAN_ADV_BAT_V_ELP_H_ */
diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h
index 04f6398b3a40..85a50096f5b2 100644
--- a/net/batman-adv/types.h
+++ b/net/batman-adv/types.h
@@ -596,9 +596,6 @@ struct batadv_hardif_neigh_node_bat_v {
* neighbor
*/
unsigned long last_unicast_tx;
-
- /** @metric_work: work queue callback item for metric update */
- struct work_struct metric_work;
};
/**
--
2.39.5
From: Andy Strohman <andrew(a)andrewstrohman.com>
Reference counting is used to ensure that
batadv_hardif_neigh_node and batadv_hard_iface
are not freed before/during
batadv_v_elp_throughput_metric_update work is
finished.
But there isn't a guarantee that the hard if will
remain associated with a soft interface up until
the work is finished.
This fixes a crash triggered by reboot that looks
like this:
Call trace:
batadv_v_mesh_free+0xd0/0x4dc [batman_adv]
batadv_v_elp_throughput_metric_update+0x1c/0xa4
process_one_work+0x178/0x398
worker_thread+0x2e8/0x4d0
kthread+0xd8/0xdc
ret_from_fork+0x10/0x20
(the batadv_v_mesh_free call is misleading,
and does not actually happen)
I was able to make the issue happen more reliably
by changing hardif_neigh->bat_v.metric_work work
to be delayed work. This allowed me to track down
and confirm the fix.
Cc: stable(a)vger.kernel.org
Fixes: c833484e5f38 ("batman-adv: ELP - compute the metric based on the estimated throughput")
Signed-off-by: Andy Strohman <andrew(a)andrewstrohman.com>
[sven(a)narfation.org: prevent entering batadv_v_elp_get_throughput without
soft_iface]
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
Signed-off-by: Simon Wunderlich <sw(a)simonwunderlich.de>
---
net/batman-adv/bat_v_elp.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/net/batman-adv/bat_v_elp.c b/net/batman-adv/bat_v_elp.c
index 1d704574e6bf..fbf499bcc671 100644
--- a/net/batman-adv/bat_v_elp.c
+++ b/net/batman-adv/bat_v_elp.c
@@ -66,12 +66,19 @@ static void batadv_v_elp_start_timer(struct batadv_hard_iface *hard_iface)
static u32 batadv_v_elp_get_throughput(struct batadv_hardif_neigh_node *neigh)
{
struct batadv_hard_iface *hard_iface = neigh->if_incoming;
+ struct net_device *soft_iface = hard_iface->soft_iface;
struct ethtool_link_ksettings link_settings;
struct net_device *real_netdev;
struct station_info sinfo;
u32 throughput;
int ret;
+ /* don't query throughput when no longer associated with any
+ * batman-adv interface
+ */
+ if (!soft_iface)
+ return BATADV_THROUGHPUT_DEFAULT_VALUE;
+
/* if the user specified a customised value for this interface, then
* return it directly
*/
@@ -141,7 +148,7 @@ static u32 batadv_v_elp_get_throughput(struct batadv_hardif_neigh_node *neigh)
default_throughput:
if (!(hard_iface->bat_v.flags & BATADV_WARNING_DEFAULT)) {
- batadv_info(hard_iface->soft_iface,
+ batadv_info(soft_iface,
"WiFi driver or ethtool info does not provide information about link speeds on interface %s, therefore defaulting to hardcoded throughput values of %u.%1u Mbps. Consider overriding the throughput manually or checking your driver.\n",
hard_iface->net_dev->name,
BATADV_THROUGHPUT_DEFAULT_VALUE / 10,
--
2.39.5
Backport of a similar change from commit 5ac9b4e935df ("lib/buildid:
Handle memfd_secret() files in build_id_parse()") to address an issue
where accessing secret memfd contents through build_id_parse() would
trigger faults.
Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
This repro will cause BUG: unable to handle kernel paging request in
build_id_parse in 5.15/6.1/6.6.
Some other discussions can be found in [1].
[1] https://lore.kernel.org/bpf/20241104175256.2327164-1-jolsa@kernel.org/T/#u
Cc: stable(a)vger.kernel.org
Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event")
Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org>
---
lib/buildid.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index 9fc46366597e..9db35305f257 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
#include <linux/elf.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
+#include <linux/secretmem.h>
#define BUILD_ID 3
@@ -157,6 +158,12 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
+#ifdef CONFIG_SECRETMEM
+ /* reject secretmem folios created with memfd_secret() */
+ if (vma->vm_file->f_mapping->a_ops == &secretmem_aops)
+ return -EFAULT;
+#endif
+
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
--
2.48.1
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 927e926d72d9155fde3264459fe9bfd7b5e40d28
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030948-playhouse-strongman-c9c3@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 927e926d72d9155fde3264459fe9bfd7b5e40d28 Mon Sep 17 00:00:00 2001
From: Suren Baghdasaryan <surenb(a)google.com>
Date: Wed, 26 Feb 2025 10:55:09 -0800
Subject: [PATCH] userfaultfd: fix PTE unmapping stack-allocated PTE copies
Current implementation of move_pages_pte() copies source and destination
PTEs in order to detect concurrent changes to PTEs involved in the move.
However these copies are also used to unmap the PTEs, which will fail if
CONFIG_HIGHPTE is enabled because the copies are allocated on the stack.
Fix this by using the actual PTEs which were kmap()ed.
Link: https://lkml.kernel.org/r/20250226185510.2732648-3-surenb@google.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Barry Song <21cnbao(a)gmail.com>
Cc: Barry Song <v-songbaohua(a)oppo.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index f5c6b3454f76..d06453fa8aba 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1290,8 +1290,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
spin_unlock(src_ptl);
if (!locked) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
/* now we can block and wait */
folio_lock(src_folio);
@@ -1307,8 +1307,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
/* at this point we have src_folio locked */
if (folio_test_large(src_folio)) {
/* split_folio() can block */
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
err = split_folio(src_folio);
if (err)
@@ -1333,8 +1333,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
goto out;
}
if (!anon_vma_trylock_write(src_anon_vma)) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
/* now we can block and wait */
anon_vma_lock_write(src_anon_vma);
@@ -1352,8 +1352,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
entry = pte_to_swp_entry(orig_src_pte);
if (non_swap_entry(entry)) {
if (is_migration_entry(entry)) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
migration_entry_wait(mm, src_pmd, src_addr);
err = -EAGAIN;
@@ -1396,8 +1396,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
src_folio = folio;
src_folio_pte = orig_src_pte;
if (!folio_trylock(src_folio)) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
put_swap_device(si);
si = NULL;
The patch below does not apply to the 6.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.13.y
git checkout FETCH_HEAD
git cherry-pick -x 927e926d72d9155fde3264459fe9bfd7b5e40d28
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030947-disloyal-bust-0d23@gregkh' --subject-prefix 'PATCH 6.13.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 927e926d72d9155fde3264459fe9bfd7b5e40d28 Mon Sep 17 00:00:00 2001
From: Suren Baghdasaryan <surenb(a)google.com>
Date: Wed, 26 Feb 2025 10:55:09 -0800
Subject: [PATCH] userfaultfd: fix PTE unmapping stack-allocated PTE copies
Current implementation of move_pages_pte() copies source and destination
PTEs in order to detect concurrent changes to PTEs involved in the move.
However these copies are also used to unmap the PTEs, which will fail if
CONFIG_HIGHPTE is enabled because the copies are allocated on the stack.
Fix this by using the actual PTEs which were kmap()ed.
Link: https://lkml.kernel.org/r/20250226185510.2732648-3-surenb@google.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Barry Song <21cnbao(a)gmail.com>
Cc: Barry Song <v-songbaohua(a)oppo.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index f5c6b3454f76..d06453fa8aba 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1290,8 +1290,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
spin_unlock(src_ptl);
if (!locked) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
/* now we can block and wait */
folio_lock(src_folio);
@@ -1307,8 +1307,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
/* at this point we have src_folio locked */
if (folio_test_large(src_folio)) {
/* split_folio() can block */
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
err = split_folio(src_folio);
if (err)
@@ -1333,8 +1333,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
goto out;
}
if (!anon_vma_trylock_write(src_anon_vma)) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
/* now we can block and wait */
anon_vma_lock_write(src_anon_vma);
@@ -1352,8 +1352,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
entry = pte_to_swp_entry(orig_src_pte);
if (non_swap_entry(entry)) {
if (is_migration_entry(entry)) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
migration_entry_wait(mm, src_pmd, src_addr);
err = -EAGAIN;
@@ -1396,8 +1396,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
src_folio = folio;
src_folio_pte = orig_src_pte;
if (!folio_trylock(src_folio)) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
put_swap_device(si);
si = NULL;
On Thu, Mar 13, 2025 at 03:20:34PM +0100, Christian König wrote:
> Am 11.03.25 um 20:01 schrieb Qasim Ijaz:
> > In the ttm_bo_unreserve_bulk() test function, resv is allocated
> > using kunit_kzalloc(), but the subsequent assertion mistakenly
> > verifies the ttm_dev pointer instead of checking the resv pointer.
> > This mistake means that if allocation for resv fails, the error will
> > go undetected, resv will be NULL and a call to dma_resv_init(resv)
>
> The description here is correct, but the subject line is a bit misleading.
>
> Please use something like this instead "drm/ttm/tests: incorrect assert in ttm_bo_unreserve_bulk()".
>
> > will dereference a NULL pointer.
>
> That irrelevant, an allocation failure will result in a NULL pointer deref anyway. This is just an unit test.
>
> >
> > Fix the assertion to properly verify the resv pointer.
> >
> > Fixes: 588c4c8d58c4 ("drm/ttm/tests: Fix a warning in ttm_bo_unreserve_bulk")
> > Cc: stable(a)vger.kernel.org
>
> Please drop those tags. This is just an unit test, not relevant for stability and therefore shouldn't be backported.
>
> Regards,
> Christian.
>
Thank you for the feedback Christian, I will resend a new patch with the
changes you described.
Thanks,
Qasim.
> > Signed-off-by: Qasim Ijaz <qasdev00(a)gmail.com>
> > ---
> > drivers/gpu/drm/ttm/tests/ttm_bo_test.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/ttm/tests/ttm_bo_test.c b/drivers/gpu/drm/ttm/tests/ttm_bo_test.c
> > index f8f20d2f6174..e08e5a138420 100644
> > --- a/drivers/gpu/drm/ttm/tests/ttm_bo_test.c
> > +++ b/drivers/gpu/drm/ttm/tests/ttm_bo_test.c
> > @@ -340,7 +340,7 @@ static void ttm_bo_unreserve_bulk(struct kunit *test)
> > KUNIT_ASSERT_NOT_NULL(test, ttm_dev);
> >
> > resv = kunit_kzalloc(test, sizeof(*resv), GFP_KERNEL);
> > - KUNIT_ASSERT_NOT_NULL(test, ttm_dev);
> > + KUNIT_ASSERT_NOT_NULL(test, resv);
> >
> > err = ttm_device_kunit_init(priv, ttm_dev, false, false);
> > KUNIT_ASSERT_EQ(test, err, 0);
>
Commit 5afd032961e8 "perf cs-etm: Don't flush when packet_queue fills
up" uses i as a loop counter in cs_etm__process_queues(). It was
backported to the 5.4 and 5.10 stable branches, but the i variable
doesn't exist there as it was only added in 5.15.
Declare i with the expected type.
Fixes: 1ed167325c32 ("perf cs-etm: Don't flush when packet_queue fills up")
Fixes: 26db806fa23e ("perf cs-etm: Don't flush when packet_queue fills up")
Signed-off-by: Ben Hutchings <benh(a)debian.org>
---
tools/perf/util/cs-etm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index e3fa32b83367..2055d582a8a4 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -2171,7 +2171,7 @@ static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
static int cs_etm__process_queues(struct cs_etm_auxtrace *etm)
{
int ret = 0;
- unsigned int cs_queue_nr, queue_nr;
+ unsigned int cs_queue_nr, queue_nr, i;
u8 trace_chan_id;
u64 timestamp;
struct auxtrace_queue *queue;
In gfx_v12_0_cp_gfx_load_me_microcode_rs64(), gfx_v12_0_pfp_fini() is
incorrectly used to free 'me' field of 'gfx', since gfx_v12_0_pfp_fini()
can only release 'pfp' field of 'gfx'. The release function of 'me' field
should be gfx_v12_0_me_fini().
Fixes: 52cb80c12e8a ("drm/amdgpu: Add gfx v12_0 ip block support (v6)")
Cc: stable(a)vger.kernel.org # 6.11+
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
---
drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index da327ab48a57..02bc2eddf0c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -2413,7 +2413,7 @@ static int gfx_v12_0_cp_gfx_load_me_microcode_rs64(struct amdgpu_device *adev)
(void **)&adev->gfx.me.me_fw_data_ptr);
if (r) {
dev_err(adev->dev, "(%d) failed to create me data bo\n", r);
- gfx_v12_0_pfp_fini(adev);
+ gfx_v12_0_me_fini(adev);
return r;
}
--
2.42.0.windows.2
In the ttm_bo_unreserve_bulk() test function, resv is allocated
using kunit_kzalloc(), but the subsequent assertion mistakenly
verifies the ttm_dev pointer instead of checking the resv pointer.
This mistake means that if allocation for resv fails, the error will
go undetected, resv will be NULL and a call to dma_resv_init(resv)
will dereference a NULL pointer.
Fix the assertion to properly verify the resv pointer.
Fixes: 588c4c8d58c4 ("drm/ttm/tests: Fix a warning in ttm_bo_unreserve_bulk")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qasim Ijaz <qasdev00(a)gmail.com>
---
drivers/gpu/drm/ttm/tests/ttm_bo_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/ttm/tests/ttm_bo_test.c b/drivers/gpu/drm/ttm/tests/ttm_bo_test.c
index f8f20d2f6174..e08e5a138420 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_bo_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_bo_test.c
@@ -340,7 +340,7 @@ static void ttm_bo_unreserve_bulk(struct kunit *test)
KUNIT_ASSERT_NOT_NULL(test, ttm_dev);
resv = kunit_kzalloc(test, sizeof(*resv), GFP_KERNEL);
- KUNIT_ASSERT_NOT_NULL(test, ttm_dev);
+ KUNIT_ASSERT_NOT_NULL(test, resv);
err = ttm_device_kunit_init(priv, ttm_dev, false, false);
KUNIT_ASSERT_EQ(test, err, 0);
--
2.39.5
The PWM allow configuring the PWM resolution from 8 bits PWM
values up to 15 bits values, for the Hi-Res PWMs, and then either
6-bit or 9-bit for the normal PWMs. The current implementation loops
through all possible resolutions (PWM sizes), for the PWM subtype, on top
of the already existing process of determining the prediv, exponent and
refclk.
The first and second issues are related to capping the computed PWM
value.
The third issue is that it uses the wrong maximum possible PWM
value for determining the best matched period.
Fix all of them.
Signed-off-by: Abel Vesa <abel.vesa(a)linaro.org>
---
Changes in v4:
- Rebased on next-20250305
- Re-worded the commit messages for the first two patches to include an
example that should better explain what the issue that is being fixed
is. As per Uwe's request.
- Link to v3: https://lore.kernel.org/r/20250303-leds-qcom-lpg-fix-max-pwm-on-hi-res-v3-0…
Changes in v3:
- Added a new patch that fixes the normal PWMs, since they now support
6-bit resolution as well. Added it as first patch.
- Re-worded the second patch. Included Bjorn's suggestion and R-b tag.
- Link to v2: https://lore.kernel.org/r/20250226-leds-qcom-lpg-fix-max-pwm-on-hi-res-v2-0…
Changes in v2:
- Re-worded the commit to drop the details that are not important
w.r.t. what the patch is fixing.
- Added another patch which fixes the resolution used for determining
best matched period and PWM config.
- Link to v1: https://lore.kernel.org/r/20250220-leds-qcom-lpg-fix-max-pwm-on-hi-res-v1-1…
---
Abel Vesa (3):
leds: rgb: leds-qcom-lpg: Fix pwm resolution max for normal PWMs
leds: rgb: leds-qcom-lpg: Fix pwm resolution max for Hi-Res PWMs
leds: rgb: leds-qcom-lpg: Fix calculation of best period Hi-Res PWMs
drivers/leds/rgb/leds-qcom-lpg.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
---
base-commit: 7ec162622e66a4ff886f8f28712ea1b13069e1aa
change-id: 20250220-leds-qcom-lpg-fix-max-pwm-on-hi-res-067e8782a79b
Best regards,
--
Abel Vesa <abel.vesa(a)linaro.org>
The patch below does not apply to the 6.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.13.y
git checkout FETCH_HEAD
git cherry-pick -x 3e385c0d6ce88ac9916dcf84267bd5855d830748
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030957-magnetism-lustily-55d9@gregkh' --subject-prefix 'PATCH 6.13.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3e385c0d6ce88ac9916dcf84267bd5855d830748 Mon Sep 17 00:00:00 2001
From: Alexey Kardashevskiy <aik(a)amd.com>
Date: Fri, 7 Mar 2025 12:37:00 +1100
Subject: [PATCH] virt: sev-guest: Move SNP Guest Request data pages handling
under snp_cmd_mutex
Compared to the SNP Guest Request, the "Extended" version adds data pages for
receiving certificates. If not enough pages provided, the HV can report to the
VM how much is needed so the VM can reallocate and repeat.
Commit
ae596615d93d ("virt: sev-guest: Reduce the scope of SNP command mutex")
moved handling of the allocated/desired pages number out of scope of said
mutex and create a possibility for a race (multiple instances trying to
trigger Extended request in a VM) as there is just one instance of
snp_msg_desc per /dev/sev-guest and no locking other than snp_cmd_mutex.
Fix the issue by moving the data blob/size and the GHCB input struct
(snp_req_data) into snp_guest_req which is allocated on stack now and accessed
by the GHCB caller under that mutex.
Stop allocating SEV_FW_BLOB_MAX_SIZE in snp_msg_alloc() as only one of four
callers needs it. Free the received blob in get_ext_report() right after it is
copied to the userspace. Possible future users of snp_send_guest_request() are
likely to have different ideas about the buffer size anyways.
Fixes: ae596615d93d ("virt: sev-guest: Reduce the scope of SNP command mutex")
Signed-off-by: Alexey Kardashevskiy <aik(a)amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Reviewed-by: Nikunj A Dadhania <nikunj(a)amd.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20250307013700.437505-3-aik@amd.com
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index 82492efc5d94..96c7bc698e6b 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -2853,19 +2853,8 @@ struct snp_msg_desc *snp_msg_alloc(void)
if (!mdesc->response)
goto e_free_request;
- mdesc->certs_data = alloc_shared_pages(SEV_FW_BLOB_MAX_SIZE);
- if (!mdesc->certs_data)
- goto e_free_response;
-
- /* initial the input address for guest request */
- mdesc->input.req_gpa = __pa(mdesc->request);
- mdesc->input.resp_gpa = __pa(mdesc->response);
- mdesc->input.data_gpa = __pa(mdesc->certs_data);
-
return mdesc;
-e_free_response:
- free_shared_pages(mdesc->response, sizeof(struct snp_guest_msg));
e_free_request:
free_shared_pages(mdesc->request, sizeof(struct snp_guest_msg));
e_unmap:
@@ -2885,7 +2874,6 @@ void snp_msg_free(struct snp_msg_desc *mdesc)
kfree(mdesc->ctx);
free_shared_pages(mdesc->response, sizeof(struct snp_guest_msg));
free_shared_pages(mdesc->request, sizeof(struct snp_guest_msg));
- free_shared_pages(mdesc->certs_data, SEV_FW_BLOB_MAX_SIZE);
iounmap((__force void __iomem *)mdesc->secrets);
memset(mdesc, 0, sizeof(*mdesc));
@@ -3054,7 +3042,7 @@ static int __handle_guest_request(struct snp_msg_desc *mdesc, struct snp_guest_r
* sequence number must be incremented or the VMPCK must be deleted to
* prevent reuse of the IV.
*/
- rc = snp_issue_guest_request(req, &mdesc->input, rio);
+ rc = snp_issue_guest_request(req, &req->input, rio);
switch (rc) {
case -ENOSPC:
/*
@@ -3064,7 +3052,7 @@ static int __handle_guest_request(struct snp_msg_desc *mdesc, struct snp_guest_r
* order to increment the sequence number and thus avoid
* IV reuse.
*/
- override_npages = mdesc->input.data_npages;
+ override_npages = req->input.data_npages;
req->exit_code = SVM_VMGEXIT_GUEST_REQUEST;
/*
@@ -3120,7 +3108,7 @@ static int __handle_guest_request(struct snp_msg_desc *mdesc, struct snp_guest_r
}
if (override_npages)
- mdesc->input.data_npages = override_npages;
+ req->input.data_npages = override_npages;
return rc;
}
@@ -3158,6 +3146,11 @@ int snp_send_guest_request(struct snp_msg_desc *mdesc, struct snp_guest_req *req
*/
memcpy(mdesc->request, &mdesc->secret_request, sizeof(mdesc->secret_request));
+ /* Initialize the input address for guest request */
+ req->input.req_gpa = __pa(mdesc->request);
+ req->input.resp_gpa = __pa(mdesc->response);
+ req->input.data_gpa = req->certs_data ? __pa(req->certs_data) : 0;
+
rc = __handle_guest_request(mdesc, req, rio);
if (rc) {
if (rc == -EIO &&
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 1581246491b5..ba7999f66abe 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -203,6 +203,9 @@ struct snp_guest_req {
unsigned int vmpck_id;
u8 msg_version;
u8 msg_type;
+
+ struct snp_req_data input;
+ void *certs_data;
};
/*
@@ -263,9 +266,6 @@ struct snp_msg_desc {
struct snp_guest_msg secret_request, secret_response;
struct snp_secrets_page *secrets;
- struct snp_req_data input;
-
- void *certs_data;
struct aesgcm_ctx *ctx;
diff --git a/drivers/virt/coco/sev-guest/sev-guest.c b/drivers/virt/coco/sev-guest/sev-guest.c
index 23ac177472be..70fbc9a3e703 100644
--- a/drivers/virt/coco/sev-guest/sev-guest.c
+++ b/drivers/virt/coco/sev-guest/sev-guest.c
@@ -176,6 +176,7 @@ static int get_ext_report(struct snp_guest_dev *snp_dev, struct snp_guest_reques
struct snp_guest_req req = {};
int ret, npages = 0, resp_len;
sockptr_t certs_address;
+ struct page *page;
if (sockptr_is_null(io->req_data) || sockptr_is_null(io->resp_data))
return -EINVAL;
@@ -209,8 +210,20 @@ static int get_ext_report(struct snp_guest_dev *snp_dev, struct snp_guest_reques
* the host. If host does not supply any certs in it, then copy
* zeros to indicate that certificate data was not provided.
*/
- memset(mdesc->certs_data, 0, report_req->certs_len);
npages = report_req->certs_len >> PAGE_SHIFT;
+ page = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO,
+ get_order(report_req->certs_len));
+ if (!page)
+ return -ENOMEM;
+
+ req.certs_data = page_address(page);
+ ret = set_memory_decrypted((unsigned long)req.certs_data, npages);
+ if (ret) {
+ pr_err("failed to mark page shared, ret=%d\n", ret);
+ __free_pages(page, get_order(report_req->certs_len));
+ return -EFAULT;
+ }
+
cmd:
/*
* The intermediate response buffer is used while decrypting the
@@ -219,10 +232,12 @@ static int get_ext_report(struct snp_guest_dev *snp_dev, struct snp_guest_reques
*/
resp_len = sizeof(report_resp->data) + mdesc->ctx->authsize;
report_resp = kzalloc(resp_len, GFP_KERNEL_ACCOUNT);
- if (!report_resp)
- return -ENOMEM;
+ if (!report_resp) {
+ ret = -ENOMEM;
+ goto e_free_data;
+ }
- mdesc->input.data_npages = npages;
+ req.input.data_npages = npages;
req.msg_version = arg->msg_version;
req.msg_type = SNP_MSG_REPORT_REQ;
@@ -237,7 +252,7 @@ static int get_ext_report(struct snp_guest_dev *snp_dev, struct snp_guest_reques
/* If certs length is invalid then copy the returned length */
if (arg->vmm_error == SNP_GUEST_VMM_ERR_INVALID_LEN) {
- report_req->certs_len = mdesc->input.data_npages << PAGE_SHIFT;
+ report_req->certs_len = req.input.data_npages << PAGE_SHIFT;
if (copy_to_sockptr(io->req_data, report_req, sizeof(*report_req)))
ret = -EFAULT;
@@ -246,7 +261,7 @@ static int get_ext_report(struct snp_guest_dev *snp_dev, struct snp_guest_reques
if (ret)
goto e_free;
- if (npages && copy_to_sockptr(certs_address, mdesc->certs_data, report_req->certs_len)) {
+ if (npages && copy_to_sockptr(certs_address, req.certs_data, report_req->certs_len)) {
ret = -EFAULT;
goto e_free;
}
@@ -256,6 +271,13 @@ static int get_ext_report(struct snp_guest_dev *snp_dev, struct snp_guest_reques
e_free:
kfree(report_resp);
+e_free_data:
+ if (npages) {
+ if (set_memory_encrypted((unsigned long)req.certs_data, npages))
+ WARN_ONCE(ret, "failed to restore encryption mask (leak it)\n");
+ else
+ __free_pages(page, get_order(report_req->certs_len));
+ }
return ret;
}
From: Sean Christopherson <seanjc(a)google.com>
commit a8de7f100bb5989d9c3627d3a223ee1c863f3b69 upstream.
Advertise support for Hyper-V's SEND_IPI and SEND_IPI_EX hypercalls if and
only if the local API is emulated/virtualized by KVM, and explicitly reject
said hypercalls if the local APIC is emulated in userspace, i.e. don't rely
on userspace to opt-in to KVM_CAP_HYPERV_ENFORCE_CPUID.
Rejecting SEND_IPI and SEND_IPI_EX fixes a NULL-pointer dereference if
Hyper-V enlightenments are exposed to the guest without an in-kernel local
APIC:
dump_stack+0xbe/0xfd
__kasan_report.cold+0x34/0x84
kasan_report+0x3a/0x50
__apic_accept_irq+0x3a/0x5c0
kvm_hv_send_ipi.isra.0+0x34e/0x820
kvm_hv_hypercall+0x8d9/0x9d0
kvm_emulate_hypercall+0x506/0x7e0
__vmx_handle_exit+0x283/0xb60
vmx_handle_exit+0x1d/0xd0
vcpu_enter_guest+0x16b0/0x24c0
vcpu_run+0xc0/0x550
kvm_arch_vcpu_ioctl_run+0x170/0x6d0
kvm_vcpu_ioctl+0x413/0xb20
__se_sys_ioctl+0x111/0x160
do_syscal1_64+0x30/0x40
entry_SYSCALL_64_after_hwframe+0x67/0xd1
Note, checking the sending vCPU is sufficient, as the per-VM irqchip_mode
can't be modified after vCPUs are created, i.e. if one vCPU has an
in-kernel local APIC, then all vCPUs have an in-kernel local APIC.
Reported-by: Dongjie Zou <zoudongjie(a)huawei.com>
Fixes: 214ff83d4473 ("KVM: x86: hyperv: implement PV IPI send hypercalls")
Fixes: 2bc39970e932 ("x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID")
Cc: stable(a)vger.kernel.org
Reviewed-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Link: https://lore.kernel.org/r/20250118003454.2619573-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[Conflict due to
72167a9d7da2 ("KVM: x86: hyper-v: Stop shadowing global 'current_vcpu'
variable")
not in the tree]
Signed-off-by: Abdelkareem Abdelsaamad <kareemem(a)amazon.com>
---
arch/x86/kvm/hyperv.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 20eb8f55e1f1..e097faf12c82 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1618,6 +1618,9 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *current_vcpu, u64 ingpa, u64 outgpa,
u32 vector;
bool all_cpus;
+ if (!lapic_in_kernel(current_vcpu))
+ return HV_STATUS_INVALID_HYPERCALL_INPUT;
+
if (!ex) {
if (!fast) {
if (unlikely(kvm_read_guest(kvm, ingpa, &send_ipi,
@@ -2060,7 +2063,8 @@ int kvm_vcpu_ioctl_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
ent->eax |= HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED;
ent->eax |= HV_X64_APIC_ACCESS_RECOMMENDED;
ent->eax |= HV_X64_RELAXED_TIMING_RECOMMENDED;
- ent->eax |= HV_X64_CLUSTER_IPI_RECOMMENDED;
+ if (!vcpu || lapic_in_kernel(vcpu))
+ ent->eax |= HV_X64_CLUSTER_IPI_RECOMMENDED;
ent->eax |= HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED;
if (evmcs_ver)
ent->eax |= HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
--
2.47.1
From: Baoquan He <bhe(a)redhat.com>
commit b3e34a47f98974d0844444c5121aaff123004e57 upstream.
This is reported by kmemleak detector:
unreferenced object 0xffffc900002a9000 (size 4096):
comm "kexec", pid 14950, jiffies 4295110793 (age 373.951s)
hex dump (first 32 bytes):
7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 .ELF............
04 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00 ..>.............
backtrace:
[<0000000016a8ef9f>] __vmalloc_node_range+0x101/0x170
[<000000002b66b6c0>] __vmalloc_node+0xb4/0x160
[<00000000ad40107d>] crash_prepare_elf64_headers+0x8e/0xcd0
[<0000000019afff23>] crash_load_segments+0x260/0x470
[<0000000019ebe95c>] bzImage64_load+0x814/0xad0
[<0000000093e16b05>] arch_kexec_kernel_image_load+0x1be/0x2a0
[<000000009ef2fc88>] kimage_file_alloc_init+0x2ec/0x5a0
[<0000000038f5a97a>] __do_sys_kexec_file_load+0x28d/0x530
[<0000000087c19992>] do_syscall_64+0x3b/0x90
[<0000000066e063a4>] entry_SYSCALL_64_after_hwframe+0x44/0xae
In crash_prepare_elf64_headers(), a buffer is allocated via vmalloc() to
store elf headers. While it's not freed back to system correctly when
kdump kernel is reloaded or unloaded. Then memory leak is caused. Fix it
by introducing x86 specific function arch_kimage_file_post_load_cleanup(),
and freeing the buffer there.
And also remove the incorrect elf header buffer freeing code. Before
calling arch specific kexec_file loading function, the image instance has
been initialized. So 'image->elf_headers' must be NULL. It doesn't make
sense to free the elf header buffer in the place.
Three different people have reported three bugs about the memory leak on
x86_64 inside Redhat.
Link: https://lkml.kernel.org/r/20220223113225.63106-2-bhe@redhat.com
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Acked-by: Dave Young <dyoung(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[Conflict due to
179350f00e06 ("x86: Use ELF fields defined in 'struct kimage'")
not in the tree]
Signed-off-by: Abdelkareem Abdelsaamad <kareemem(a)amazon.com>
---
arch/x86/kernel/machine_kexec_64.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index a29a44a98e5b..19f6aafd595a 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -402,9 +402,6 @@ void machine_kexec(struct kimage *image)
#ifdef CONFIG_KEXEC_FILE
void *arch_kexec_kernel_image_load(struct kimage *image)
{
- vfree(image->arch.elf_headers);
- image->arch.elf_headers = NULL;
-
if (!image->fops || !image->fops->load)
return ERR_PTR(-ENOEXEC);
@@ -540,6 +537,15 @@ int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
(int)ELF64_R_TYPE(rel[i].r_info), value);
return -ENOEXEC;
}
+
+int arch_kimage_file_post_load_cleanup(struct kimage *image)
+{
+ vfree(image->arch.elf_headers);
+ image->arch.elf_headers = NULL;
+ image->arch.elf_headers_sz = 0;
+
+ return kexec_image_post_load_cleanup_default(image);
+}
#endif /* CONFIG_KEXEC_FILE */
static int
--
2.47.1