6.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
[ Upstream commit 40ad64ac25bb736740f895d99a4aebbda9b80991 ]
Propagate fwnode of the ACPI device to the SPI controller Linux device.
Currently only OF case propagates fwnode to the controller.
While at it, replace several calls to dev_fwnode() with a single one
cached in a local variable, and unify checks for fwnode type by using
is_*_node() APIs.
Fixes: 55ab8487e01d ("spi: spi-nxp-fspi: Add ACPI support")
Signed-off-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Reviewed-by: Haibo Chen <haibo.chen(a)nxp.com>
Link: https://patch.msgid.link/20251126202501.2319679-1-andriy.shevchenko@linux.i…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/spi/spi-nxp-fspi.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/spi/spi-nxp-fspi.c b/drivers/spi/spi-nxp-fspi.c
index fcf10be66d391..f96638cae1d94 100644
--- a/drivers/spi/spi-nxp-fspi.c
+++ b/drivers/spi/spi-nxp-fspi.c
@@ -1270,7 +1270,7 @@ static int nxp_fspi_probe(struct platform_device *pdev)
{
struct spi_controller *ctlr;
struct device *dev = &pdev->dev;
- struct device_node *np = dev->of_node;
+ struct fwnode_handle *fwnode = dev_fwnode(dev);
struct resource *res;
struct nxp_fspi *f;
int ret, irq;
@@ -1292,7 +1292,7 @@ static int nxp_fspi_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, f);
/* find the resources - configuration register address space */
- if (is_acpi_node(dev_fwnode(f->dev)))
+ if (is_acpi_node(fwnode))
f->iobase = devm_platform_ioremap_resource(pdev, 0);
else
f->iobase = devm_platform_ioremap_resource_byname(pdev, "fspi_base");
@@ -1300,7 +1300,7 @@ static int nxp_fspi_probe(struct platform_device *pdev)
return PTR_ERR(f->iobase);
/* find the resources - controller memory mapped space */
- if (is_acpi_node(dev_fwnode(f->dev)))
+ if (is_acpi_node(fwnode))
res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
else
res = platform_get_resource_byname(pdev,
@@ -1313,7 +1313,7 @@ static int nxp_fspi_probe(struct platform_device *pdev)
f->memmap_phy_size = resource_size(res);
/* find the clocks */
- if (dev_of_node(&pdev->dev)) {
+ if (is_of_node(fwnode)) {
f->clk_en = devm_clk_get(dev, "fspi_en");
if (IS_ERR(f->clk_en))
return PTR_ERR(f->clk_en);
@@ -1366,7 +1366,7 @@ static int nxp_fspi_probe(struct platform_device *pdev)
else
ctlr->mem_caps = &nxp_fspi_mem_caps;
- ctlr->dev.of_node = np;
+ device_set_node(&ctlr->dev, fwnode);
ret = devm_add_action_or_reset(dev, nxp_fspi_cleanup, f);
if (ret)
--
2.51.0
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuang Wang <nashuiliang(a)gmail.com>
commit ac1499fcd40fe06479e9b933347b837ccabc2a40 upstream.
The sit driver's packet transmission path calls: sit_tunnel_xmit() ->
update_or_create_fnhe(), which lead to fnhe_remove_oldest() being called
to delete entries exceeding FNHE_RECLAIM_DEPTH+random.
The race window is between fnhe_remove_oldest() selecting fnheX for
deletion and the subsequent kfree_rcu(). During this time, the
concurrent path's __mkroute_output() -> find_exception() can fetch the
soon-to-be-deleted fnheX, and rt_bind_exception() then binds it with a
new dst using a dst_hold(). When the original fnheX is freed via RCU,
the dst reference remains permanently leaked.
CPU 0 CPU 1
__mkroute_output()
find_exception() [fnheX]
update_or_create_fnhe()
fnhe_remove_oldest() [fnheX]
rt_bind_exception() [bind dst]
RCU callback [fnheX freed, dst leak]
This issue manifests as a device reference count leak and a warning in
dmesg when unregistering the net device:
unregister_netdevice: waiting for sitX to become free. Usage count = N
Ido Schimmel provided the simple test validation method [1].
The fix clears 'oldest->fnhe_daddr' before calling fnhe_flush_routes().
Since rt_bind_exception() checks this field, setting it to zero prevents
the stale fnhe from being reused and bound to a new dst just before it
is freed.
[1]
ip netns add ns1
ip -n ns1 link set dev lo up
ip -n ns1 address add 192.0.2.1/32 dev lo
ip -n ns1 link add name dummy1 up type dummy
ip -n ns1 route add 192.0.2.2/32 dev dummy1
ip -n ns1 link add name gretap1 up arp off type gretap \
local 192.0.2.1 remote 192.0.2.2
ip -n ns1 route add 198.51.0.0/16 dev gretap1
taskset -c 0 ip netns exec ns1 mausezahn gretap1 \
-A 198.51.100.1 -B 198.51.0.0/16 -t udp -p 1000 -c 0 -q &
taskset -c 2 ip netns exec ns1 mausezahn gretap1 \
-A 198.51.100.1 -B 198.51.0.0/16 -t udp -p 1000 -c 0 -q &
sleep 10
ip netns pids ns1 | xargs kill
ip netns del ns1
Cc: stable(a)vger.kernel.org
Fixes: 67d6d681e15b ("ipv4: make exception cache less predictible")
Signed-off-by: Chuang Wang <nashuiliang(a)gmail.com>
Reviewed-by: Ido Schimmel <idosch(a)nvidia.com>
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Link: https://patch.msgid.link/20251111064328.24440-1-nashuiliang@gmail.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv4/route.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -624,6 +624,11 @@ static void fnhe_remove_oldest(struct fn
oldest_p = fnhe_p;
}
}
+
+ /* Clear oldest->fnhe_daddr to prevent this fnhe from being
+ * rebound with new dsts in rt_bind_exception().
+ */
+ oldest->fnhe_daddr = 0;
fnhe_flush_routes(oldest);
*oldest_p = oldest->fnhe_next;
kfree_rcu(oldest, rcu);
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nate Karstens <nate.karstens(a)garmin.com>
commit 4da4e4bde1c453ac5cc2dce5def81d504ae257ee upstream.
The `len` member of the sk_buff is an unsigned int. This is cast to
`ssize_t` (a signed type) for the first sk_buff in the comparison,
but not the second sk_buff. On 32-bit systems, this can result in
an integer underflow for certain values because unsigned arithmetic
is being used.
This appears to be an oversight: if the intention was to use unsigned
arithmetic, then the first cast would have been omitted. The change
ensures both len values are cast to `ssize_t`.
The underflow causes an issue with ktls when multiple TLS PDUs are
included in a single TCP segment. The mainline kernel does not use
strparser for ktls anymore, but this is still useful for other
features that still use strparser, and for backporting.
Signed-off-by: Nate Karstens <nate.karstens(a)garmin.com>
Cc: stable(a)vger.kernel.org
Fixes: 43a0c6751a32 ("strparser: Stream parser for messages")
Reviewed-by: Jacob Keller <jacob.e.keller(a)intel.com>
Reviewed-by: Sabrina Dubroca <sd(a)queasysnail.net>
Link: https://patch.msgid.link/20251106222835.1871628-1-nate.karstens@garmin.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/strparser/strparser.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/strparser/strparser.c
+++ b/net/strparser/strparser.c
@@ -238,7 +238,7 @@ static int __strp_recv(read_descriptor_t
strp_parser_err(strp, -EMSGSIZE, desc);
break;
} else if (len <= (ssize_t)head->len -
- skb->len - stm->strp.offset) {
+ (ssize_t)skb->len - stm->strp.offset) {
/* Length must be into new skb (and also
* greater than zero)
*/
6.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown <neil(a)brown.name>
[ Upstream commit e9c70084a64e51b65bb68f810692a03dc8bedffa ]
As well as checking that the parent hasn't changed after getting the
lock we need to check that the dentry hasn't been unhashed.
Otherwise we might try to rename something that has been removed.
Reported-by: syzbot+bfc9a0ccf0de47d04e8c(a)syzkaller.appspotmail.com
Fixes: d2c995581c7c ("ovl: Call ovl_create_temp() without lock held.")
Signed-off-by: NeilBrown <neil(a)brown.name>
Link: https://patch.msgid.link/176429295510.634289.1552337113663461690@noble.neil…
Tested-by: syzbot+bfc9a0ccf0de47d04e8c(a)syzkaller.appspotmail.com
Reviewed-by: Amir Goldstein <amir73il(a)gmail.com>
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/overlayfs/util.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 41033bac96cbb..ab652164ffc90 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -1234,9 +1234,9 @@ int ovl_lock_rename_workdir(struct dentry *workdir, struct dentry *work,
goto err;
if (trap)
goto err_unlock;
- if (work && work->d_parent != workdir)
+ if (work && (work->d_parent != workdir || d_unhashed(work)))
goto err_unlock;
- if (upper && upper->d_parent != upperdir)
+ if (upper && (upper->d_parent != upperdir || d_unhashed(upper)))
goto err_unlock;
return 0;
--
2.51.0
6.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells <dhowells(a)redhat.com>
[ Upstream commit d27c71257825dced46104eefe42e4d9964bd032e ]
The allocation of a cell's anonymous key is done in a background thread
along with other cell setup such as doing a DNS upcall. In the reported
bug, this is triggered by afs_parse_source() parsing the device name given
to mount() and calling afs_lookup_cell() with the name of the cell.
The normal key lookup then tries to use the key description on the
anonymous authentication key as the reference for request_key() - but it
may not yet be set and so an oops can happen.
This has been made more likely to happen by the fix for dynamic lookup
failure.
Fix this by firstly allocating a reference name and attaching it to the
afs_cell record when the record is created. It can share the memory
allocation with the cell name (unfortunately it can't just overlap the cell
name by prepending it with "afs@" as the cell name already has a '.'
prepended for other purposes). This reference name is then passed to
request_key().
Secondly, the anon key is now allocated on demand at the point a key is
requested in afs_request_key() if it is not already allocated. A mutex is
used to prevent multiple allocation for a cell.
Thirdly, make afs_request_key_rcu() return NULL if the anonymous key isn't
yet allocated (if we need it) and then the caller can return -ECHILD to
drop out of RCU-mode and afs_request_key() can be called.
Note that the anonymous key is kind of necessary to make the key lookup
cache work as that doesn't currently cache a negative lookup, but it's
probably worth some investigation to see if NULL can be used instead.
Fixes: 330e2c514823 ("afs: Fix dynamic lookup to fail on cell lookup failure")
Reported-by: syzbot+41c68824eefb67cdf00c(a)syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells(a)redhat.com>
Link: https://patch.msgid.link/800328.1764325145@warthog.procyon.org.uk
cc: Marc Dionne <marc.dionne(a)auristor.com>
cc: linux-afs(a)lists.infradead.org
cc: linux-fsdevel(a)vger.kernel.org
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/afs/cell.c | 43 ++++++++----------------------------------
fs/afs/internal.h | 1 +
fs/afs/security.c | 48 +++++++++++++++++++++++++++++++++++++++--------
3 files changed, 49 insertions(+), 43 deletions(-)
diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index d9b6fa1088b7b..71c10a05cebe5 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -140,7 +140,9 @@ static struct afs_cell *afs_alloc_cell(struct afs_net *net,
return ERR_PTR(-ENOMEM);
}
- cell->name = kmalloc(1 + namelen + 1, GFP_KERNEL);
+ /* Allocate the cell name and the key name in one go. */
+ cell->name = kmalloc(1 + namelen + 1 +
+ 4 + namelen + 1, GFP_KERNEL);
if (!cell->name) {
kfree(cell);
return ERR_PTR(-ENOMEM);
@@ -151,7 +153,11 @@ static struct afs_cell *afs_alloc_cell(struct afs_net *net,
cell->name_len = namelen;
for (i = 0; i < namelen; i++)
cell->name[i] = tolower(name[i]);
- cell->name[i] = 0;
+ cell->name[i++] = 0;
+
+ cell->key_desc = cell->name + i;
+ memcpy(cell->key_desc, "afs@", 4);
+ memcpy(cell->key_desc + 4, cell->name, cell->name_len + 1);
cell->net = net;
refcount_set(&cell->ref, 1);
@@ -710,33 +716,6 @@ void afs_set_cell_timer(struct afs_cell *cell, unsigned int delay_secs)
timer_reduce(&cell->management_timer, jiffies + delay_secs * HZ);
}
-/*
- * Allocate a key to use as a placeholder for anonymous user security.
- */
-static int afs_alloc_anon_key(struct afs_cell *cell)
-{
- struct key *key;
- char keyname[4 + AFS_MAXCELLNAME + 1], *cp, *dp;
-
- /* Create a key to represent an anonymous user. */
- memcpy(keyname, "afs@", 4);
- dp = keyname + 4;
- cp = cell->name;
- do {
- *dp++ = tolower(*cp);
- } while (*cp++);
-
- key = rxrpc_get_null_key(keyname);
- if (IS_ERR(key))
- return PTR_ERR(key);
-
- cell->anonymous_key = key;
-
- _debug("anon key %p{%x}",
- cell->anonymous_key, key_serial(cell->anonymous_key));
- return 0;
-}
-
/*
* Activate a cell.
*/
@@ -746,12 +725,6 @@ static int afs_activate_cell(struct afs_net *net, struct afs_cell *cell)
struct afs_cell *pcell;
int ret;
- if (!cell->anonymous_key) {
- ret = afs_alloc_anon_key(cell);
- if (ret < 0)
- return ret;
- }
-
ret = afs_proc_cell_setup(cell);
if (ret < 0)
return ret;
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 87828d685293f..470e6eef8bd49 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -413,6 +413,7 @@ struct afs_cell {
u8 name_len; /* Length of name */
char *name; /* Cell name, case-flattened and NUL-padded */
+ char *key_desc; /* Authentication key description */
};
/*
diff --git a/fs/afs/security.c b/fs/afs/security.c
index 6a7744c9e2a2d..ff8830e6982fb 100644
--- a/fs/afs/security.c
+++ b/fs/afs/security.c
@@ -16,6 +16,30 @@
static DEFINE_HASHTABLE(afs_permits_cache, 10);
static DEFINE_SPINLOCK(afs_permits_lock);
+static DEFINE_MUTEX(afs_key_lock);
+
+/*
+ * Allocate a key to use as a placeholder for anonymous user security.
+ */
+static int afs_alloc_anon_key(struct afs_cell *cell)
+{
+ struct key *key;
+
+ mutex_lock(&afs_key_lock);
+ if (!cell->anonymous_key) {
+ key = rxrpc_get_null_key(cell->key_desc);
+ if (!IS_ERR(key))
+ cell->anonymous_key = key;
+ }
+ mutex_unlock(&afs_key_lock);
+
+ if (IS_ERR(key))
+ return PTR_ERR(key);
+
+ _debug("anon key %p{%x}",
+ cell->anonymous_key, key_serial(cell->anonymous_key));
+ return 0;
+}
/*
* get a key
@@ -23,11 +47,12 @@ static DEFINE_SPINLOCK(afs_permits_lock);
struct key *afs_request_key(struct afs_cell *cell)
{
struct key *key;
+ int ret;
- _enter("{%x}", key_serial(cell->anonymous_key));
+ _enter("{%s}", cell->key_desc);
- _debug("key %s", cell->anonymous_key->description);
- key = request_key_net(&key_type_rxrpc, cell->anonymous_key->description,
+ _debug("key %s", cell->key_desc);
+ key = request_key_net(&key_type_rxrpc, cell->key_desc,
cell->net->net, NULL);
if (IS_ERR(key)) {
if (PTR_ERR(key) != -ENOKEY) {
@@ -35,6 +60,12 @@ struct key *afs_request_key(struct afs_cell *cell)
return key;
}
+ if (!cell->anonymous_key) {
+ ret = afs_alloc_anon_key(cell);
+ if (ret < 0)
+ return ERR_PTR(ret);
+ }
+
/* act as anonymous user */
_leave(" = {%x} [anon]", key_serial(cell->anonymous_key));
return key_get(cell->anonymous_key);
@@ -52,11 +83,10 @@ struct key *afs_request_key_rcu(struct afs_cell *cell)
{
struct key *key;
- _enter("{%x}", key_serial(cell->anonymous_key));
+ _enter("{%s}", cell->key_desc);
- _debug("key %s", cell->anonymous_key->description);
- key = request_key_net_rcu(&key_type_rxrpc,
- cell->anonymous_key->description,
+ _debug("key %s", cell->key_desc);
+ key = request_key_net_rcu(&key_type_rxrpc, cell->key_desc,
cell->net->net);
if (IS_ERR(key)) {
if (PTR_ERR(key) != -ENOKEY) {
@@ -65,6 +95,8 @@ struct key *afs_request_key_rcu(struct afs_cell *cell)
}
/* act as anonymous user */
+ if (!cell->anonymous_key)
+ return NULL; /* Need to allocate */
_leave(" = {%x} [anon]", key_serial(cell->anonymous_key));
return key_get(cell->anonymous_key);
} else {
@@ -408,7 +440,7 @@ int afs_permission(struct mnt_idmap *idmap, struct inode *inode,
if (mask & MAY_NOT_BLOCK) {
key = afs_request_key_rcu(vnode->volume->cell);
- if (IS_ERR(key))
+ if (IS_ERR_OR_NULL(key))
return -ECHILD;
ret = -ECHILD;
--
2.51.0
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter <dan.carpenter(a)linaro.org>
[ Upstream commit 97315e7c901a1de60e8ca9b11e0e96d0f9253e18 ]
This was supposed to pass "onenand" instead of "&onenand" with the
ampersand. Passing a random stack address which will be gone when the
function ends makes no sense. However the good thing is that the pointer
is never used, so this doesn't cause a problem at run time.
Fixes: e23abf4b7743 ("mtd: OneNAND: S5PC110: Implement DMA interrupt method")
Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/mtd/nand/onenand/onenand_samsung.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/onenand/onenand_samsung.c b/drivers/mtd/nand/onenand/onenand_samsung.c
index b64895573515e..48608632280c5 100644
--- a/drivers/mtd/nand/onenand/onenand_samsung.c
+++ b/drivers/mtd/nand/onenand/onenand_samsung.c
@@ -909,7 +909,7 @@ static int s3c_onenand_probe(struct platform_device *pdev)
err = devm_request_irq(&pdev->dev, r->start,
s5pc110_onenand_irq,
IRQF_SHARED, "onenand",
- &onenand);
+ onenand);
if (err) {
dev_err(&pdev->dev, "failed to get irq\n");
return err;
--
2.51.0
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Biggers <ebiggers(a)kernel.org>
commit 44e8241c51f762aafa50ed116da68fd6ecdcc954 upstream.
On big endian arm kernels, the arm optimized Curve25519 code produces
incorrect outputs and fails the Curve25519 test. This has been true
ever since this code was added.
It seems that hardly anyone (or even no one?) actually uses big endian
arm kernels. But as long as they're ostensibly supported, we should
disable this code on them so that it's not accidentally used.
Note: for future-proofing, use !CPU_BIG_ENDIAN instead of
CPU_LITTLE_ENDIAN. Both of these are arch-specific options that could
get removed in the future if big endian support gets dropped.
Fixes: d8f1308a025f ("crypto: arm/curve25519 - wire up NEON implementation")
Cc: stable(a)vger.kernel.org
Acked-by: Ard Biesheuvel <ardb(a)kernel.org>
Link: https://lore.kernel.org/r/20251104054906.716914-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arm/crypto/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 149a5bd6b88c1..d3d318df0e389 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -166,7 +166,7 @@ config CRYPTO_NHPOLY1305_NEON
config CRYPTO_CURVE25519_NEON
tristate "NEON accelerated Curve25519 scalar multiplication library"
- depends on KERNEL_MODE_NEON
+ depends on KERNEL_MODE_NEON && !CPU_BIG_ENDIAN
select CRYPTO_LIB_CURVE25519_GENERIC
select CRYPTO_ARCH_HAVE_LIB_CURVE25519
--
2.51.0
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Acs <acsjakub(a)amazon.de>
[ Upstream commit f04aad36a07cc17b7a5d5b9a2d386ce6fae63e93 ]
syzkaller discovered the following crash: (kernel BUG)
[ 44.607039] ------------[ cut here ]------------
[ 44.607422] kernel BUG at mm/userfaultfd.c:2067!
[ 44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[ 44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
[ 44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
<snip other registers, drop unreliable trace>
[ 44.617726] Call Trace:
[ 44.617926] <TASK>
[ 44.619284] userfaultfd_release+0xef/0x1b0
[ 44.620976] __fput+0x3f9/0xb60
[ 44.621240] fput_close_sync+0x110/0x210
[ 44.622222] __x64_sys_close+0x8f/0x120
[ 44.622530] do_syscall_64+0x5b/0x2f0
[ 44.622840] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 44.623244] RIP: 0033:0x7f365bb3f227
Kernel panics because it detects UFFD inconsistency during
userfaultfd_release_all(). Specifically, a VMA which has a valid pointer
to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.
The inconsistency is caused in ksm_madvise(): when user calls madvise()
with MADV_UNMEARGEABLE on a VMA that is registered for UFFD in MINOR mode,
it accidentally clears all flags stored in the upper 32 bits of
vma->vm_flags.
Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int and
int are 32-bit wide. This setup causes the following mishap during the &=
~VM_MERGEABLE assignment.
VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
promoted to unsigned long before the & operation. This promotion fills
upper 32 bits with leading 0s, as we're doing unsigned conversion (and
even for a signed conversion, this wouldn't help as the leading bit is 0).
& operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
the upper 32-bits of its value.
Fix it by changing `VM_MERGEABLE` constant to unsigned long, using the
BIT() macro.
Note: other VM_* flags are not affected: This only happens to the
VM_MERGEABLE flag, as the other VM_* flags are all constants of type int
and after ~ operation, they end up with leading 1 and are thus converted
to unsigned long with leading 1s.
Note 2:
After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
no longer a kernel BUG, but a WARNING at the same place:
[ 45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
but the root-cause (flag-drop) remains the same.
[akpm(a)linux-foundation.org: rust bindgen wasn't able to handle BIT(), from Miguel]
Link: https://lore.kernel.org/oe-kbuild-all/202510030449.VfSaAjvd-lkp@intel.com/
Link: https://lkml.kernel.org/r/20251001090353.57523-2-acsjakub@amazon.de
Fixes: 7677f7fd8be7 ("userfaultfd: add minor fault registration mode")
Signed-off-by: Jakub Acs <acsjakub(a)amazon.de>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis(a)gmail.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: SeongJae Park <sj(a)kernel.org>
Tested-by: Alice Ryhl <aliceryhl(a)google.com>
Tested-by: Miguel Ojeda <miguel.ojeda.sandonis(a)gmail.com>
Cc: Xu Xin <xu.xin16(a)zte.com.cn>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[ acsjakub: drop rust-compatibility change (no rust in 5.15) ]
Signed-off-by: Jakub Acs <acsjakub(a)amazon.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
include/linux/mm.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3598925561b13..071dd864a7b2b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -310,7 +310,7 @@ extern unsigned int kobjsize(const void *objp);
#define VM_MIXEDMAP 0x10000000 /* Can contain "struct page" and pure PFN pages */
#define VM_HUGEPAGE 0x20000000 /* MADV_HUGEPAGE marked this vma */
#define VM_NOHUGEPAGE 0x40000000 /* MADV_NOHUGEPAGE marked this vma */
-#define VM_MERGEABLE 0x80000000 /* KSM may merge identical pages */
+#define VM_MERGEABLE BIT(31) /* KSM may merge identical pages */
#ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
#define VM_HIGH_ARCH_BIT_0 32 /* bit only usable on 64-bit architectures */
--
2.51.0