The quilt patch titled
Subject: mm/ksm: fix exec/fork inheritance support for prctl
has been removed from the -mm tree. Its filename was
mm-ksm-fix-exec-fork-inheritance-support-for-prctl.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: xu xin <xu.xin16(a)zte.com.cn>
Subject: mm/ksm: fix exec/fork inheritance support for prctl
Date: Tue, 7 Oct 2025 18:28:21 +0800 (CST)
Patch series "ksm: fix exec/fork inheritance", v2.
This series fixes exec/fork inheritance. See the detailed description of
the issue below.
This patch (of 2):
Background
==========
commit d7597f59d1d33 ("mm: add new api to enable ksm per process")
introduced MMF_VM_MERGE_ANY for mm->flags, and allowed user to set it by
prctl() so that the process's VMAs are forcibly scanned by ksmd.
Subsequently, the 3c6f33b7273a ("mm/ksm: support fork/exec for prctl")
supported inheriting the MMF_VM_MERGE_ANY flag when a task calls execve().
Finally, commit 3a9e567ca45fb ("mm/ksm: fix ksm exec support for prctl")
fixed the issue that ksmd doesn't scan the mm_struct with MMF_VM_MERGE_ANY
by adding the mm_slot to ksm_mm_head in __bprm_mm_init().
Problem
=======
In some extreme scenarios, however, this inheritance of MMF_VM_MERGE_ANY
during exec/fork can fail. For example, when the scanning frequency of
ksmd is tuned extremely high, a process carrying MMF_VM_MERGE_ANY may
still fail to pass it to the newly exec'd process. This happens because
ksm_execve() is executed too early in the do_execve flow (prematurely
adding the new mm_struct to the ksm_mm_slot list).
As a result, before do_execve completes, ksmd may have already performed a
scan and found that this new mm_struct has no VM_MERGEABLE VMAs, thus
clearing its MMF_VM_MERGE_ANY flag. Consequently, when the new program
executes, the flag MMF_VM_MERGE_ANY inheritance missed.
Root reason
===========
commit d7597f59d1d33 ("mm: add new api to enable ksm per process") clear
the flag MMF_VM_MERGE_ANY when ksmd found no VM_MERGEABLE VMAs.
Solution
========
Firstly, Don't clear MMF_VM_MERGE_ANY when ksmd found no VM_MERGEABLE
VMAs, because perhaps their mm_struct has just been added to ksm_mm_slot
list, and its process has not yet officially started running or has not
yet performed mmap/brk to allocate anonymous VMAS.
Secondly, recheck MMF_VM_MERGEABLE again if a process takes
MMF_VM_MERGE_ANY, and create a mm_slot and join it into ksm_scan_list
again.
Link: https://lkml.kernel.org/r/20251007182504440BJgK8VXRHh8TD7IGSUIY4@zte.com.cn
Link: https://lkml.kernel.org/r/20251007182821572h_SoFqYZXEP1mvWI4n9VL@zte.com.cn
Fixes: 3c6f33b7273a ("mm/ksm: support fork/exec for prctl")
Fixes: d7597f59d1d3 ("mm: add new api to enable ksm per process")
Signed-off-by: xu xin <xu.xin16(a)zte.com.cn>
Cc: Stefan Roesch <shr(a)devkernel.io>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jinjiang Tu <tujinjiang(a)huawei.com>
Cc: Wang Yaxin <wang.yaxin(a)zte.com.cn>
Cc: Yang Yang <yang.yang29(a)zte.com.cn>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/ksm.h | 4 ++--
mm/ksm.c | 20 +++++++++++++++++---
2 files changed, 19 insertions(+), 5 deletions(-)
--- a/include/linux/ksm.h~mm-ksm-fix-exec-fork-inheritance-support-for-prctl
+++ a/include/linux/ksm.h
@@ -17,7 +17,7 @@
#ifdef CONFIG_KSM
int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
unsigned long end, int advice, vm_flags_t *vm_flags);
-vm_flags_t ksm_vma_flags(const struct mm_struct *mm, const struct file *file,
+vm_flags_t ksm_vma_flags(struct mm_struct *mm, const struct file *file,
vm_flags_t vm_flags);
int ksm_enable_merge_any(struct mm_struct *mm);
int ksm_disable_merge_any(struct mm_struct *mm);
@@ -103,7 +103,7 @@ bool ksm_process_mergeable(struct mm_str
#else /* !CONFIG_KSM */
-static inline vm_flags_t ksm_vma_flags(const struct mm_struct *mm,
+static inline vm_flags_t ksm_vma_flags(struct mm_struct *mm,
const struct file *file, vm_flags_t vm_flags)
{
return vm_flags;
--- a/mm/ksm.c~mm-ksm-fix-exec-fork-inheritance-support-for-prctl
+++ a/mm/ksm.c
@@ -2712,8 +2712,14 @@ no_vmas:
spin_unlock(&ksm_mmlist_lock);
mm_slot_free(mm_slot_cache, mm_slot);
+ /*
+ * Only clear MMF_VM_MERGEABLE. We must not clear
+ * MMF_VM_MERGE_ANY, because for those MMF_VM_MERGE_ANY process,
+ * perhaps their mm_struct has just been added to ksm_mm_slot
+ * list, and its process has not yet officially started running
+ * or has not yet performed mmap/brk to allocate anonymous VMAS.
+ */
mm_flags_clear(MMF_VM_MERGEABLE, mm);
- mm_flags_clear(MMF_VM_MERGE_ANY, mm);
mmap_read_unlock(mm);
mmdrop(mm);
} else {
@@ -2831,12 +2837,20 @@ static int __ksm_del_vma(struct vm_area_
*
* Returns: @vm_flags possibly updated to mark mergeable.
*/
-vm_flags_t ksm_vma_flags(const struct mm_struct *mm, const struct file *file,
+vm_flags_t ksm_vma_flags(struct mm_struct *mm, const struct file *file,
vm_flags_t vm_flags)
{
if (mm_flags_test(MMF_VM_MERGE_ANY, mm) &&
- __ksm_should_add_vma(file, vm_flags))
+ __ksm_should_add_vma(file, vm_flags)) {
vm_flags |= VM_MERGEABLE;
+ /*
+ * Generally, the flags here always include MMF_VM_MERGEABLE.
+ * However, in rare cases, this flag may be cleared by ksmd who
+ * scans a cycle without finding any mergeable vma.
+ */
+ if (unlikely(!mm_flags_test(MMF_VM_MERGEABLE, mm)))
+ __ksm_enter(mm);
+ }
return vm_flags;
}
_
Patches currently in -mm which might be from xu.xin16(a)zte.com.cn are
commit 44e8241c51f762aafa50ed116da68fd6ecdcc954 upstream.
On big endian arm kernels, the arm optimized Curve25519 code produces
incorrect outputs and fails the Curve25519 test. This has been true
ever since this code was added.
It seems that hardly anyone (or even no one?) actually uses big endian
arm kernels. But as long as they're ostensibly supported, we should
disable this code on them so that it's not accidentally used.
Note: for future-proofing, use !CPU_BIG_ENDIAN instead of
CPU_LITTLE_ENDIAN. Both of these are arch-specific options that could
get removed in the future if big endian support gets dropped.
Fixes: d8f1308a025f ("crypto: arm/curve25519 - wire up NEON implementation")
Cc: stable(a)vger.kernel.org
Acked-by: Ard Biesheuvel <ardb(a)kernel.org>
Link: https://lore.kernel.org/r/20251104054906.716914-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)kernel.org>
---
arch/arm/crypto/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 847b7a003356..1f684e29cff2 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -2,11 +2,11 @@
menu "Accelerated Cryptographic Algorithms for CPU (arm)"
config CRYPTO_CURVE25519_NEON
tristate "Public key crypto: Curve25519 (NEON)"
- depends on KERNEL_MODE_NEON
+ depends on KERNEL_MODE_NEON && !CPU_BIG_ENDIAN
select CRYPTO_LIB_CURVE25519_GENERIC
select CRYPTO_ARCH_HAVE_LIB_CURVE25519
help
Curve25519 algorithm
base-commit: 0a805b6ea8cda0caa268b396a2e5117f3772d849
--
2.51.2
From: Vladimir Oltean <vladimir.oltean(a)nxp.com>
[ Upstream commit 6c24a03a61a245fe34d47582898331fa034b6ccd ]
Alexander Sverdlin presents 2 problems during shutdown with the
lan9303 driver. One is specific to lan9303 and the other just happens
to reproduce there.
The first problem is that lan9303 is unique among DSA drivers in that it
calls dev_get_drvdata() at "arbitrary runtime" (not probe, not shutdown,
not remove):
phy_state_machine()
-> ...
-> dsa_user_phy_read()
-> ds->ops->phy_read()
-> lan9303_phy_read()
-> chip->ops->phy_read()
-> lan9303_mdio_phy_read()
-> dev_get_drvdata()
But we never stop the phy_state_machine(), so it may continue to run
after dsa_switch_shutdown(). Our common pattern in all DSA drivers is
to set drvdata to NULL to suppress the remove() method that may come
afterwards. But in this case it will result in an NPD.
The second problem is that the way in which we set
dp->master->dsa_ptr = NULL; is concurrent with receive packet
processing. dsa_switch_rcv() checks once whether dev->dsa_ptr is NULL,
but afterwards, rather than continuing to use that non-NULL value,
dev->dsa_ptr is dereferenced again and again without NULL checks:
dsa_master_find_slave() and many other places. In between dereferences,
there is no locking to ensure that what was valid once continues to be
valid.
Both problems have the common aspect that closing the master interface
solves them.
In the first case, dev_close(master) triggers the NETDEV_GOING_DOWN
event in dsa_slave_netdevice_event() which closes slave ports as well.
dsa_port_disable_rt() calls phylink_stop(), which synchronously stops
the phylink state machine, and ds->ops->phy_read() will thus no longer
call into the driver after this point.
In the second case, dev_close(master) should do this, as per
Documentation/networking/driver.rst:
| Quiescence
| ----------
|
| After the ndo_stop routine has been called, the hardware must
| not receive or transmit any data. All in flight packets must
| be aborted. If necessary, poll or wait for completion of
| any reset commands.
So it should be sufficient to ensure that later, when we zeroize
master->dsa_ptr, there will be no concurrent dsa_switch_rcv() call
on this master.
The addition of the netif_device_detach() function is to ensure that
ioctls, rtnetlinks and ethtool requests on the slave ports no longer
propagate down to the driver - we're no longer prepared to handle them.
The race condition actually did not exist when commit 0650bf52b31f
("net: dsa: be compatible with masters which unregister on shutdown")
first introduced dsa_switch_shutdown(). It was created later, when we
stopped unregistering the slave interfaces from a bad spot, and we just
replaced that sequence with a racy zeroization of master->dsa_ptr
(one which doesn't ensure that the interfaces aren't up).
Reported-by: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>
Closes: https://lore.kernel.org/netdev/2d2e3bba17203c14a5ffdabc174e3b6bbb9ad438.cam…
Closes: https://lore.kernel.org/netdev/c1bf4de54e829111e0e4a70e7bd1cf523c9550ff.cam…
Fixes: ee534378f005 ("net: dsa: fix panic when DSA master device unbinds on shutdown")
Reviewed-by: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>
Tested-by: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
Link: https://patch.msgid.link/20240913203549.3081071-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
[ Modification: Using dp->master and dp->slave instead of dp->conduit and dp->user ]
Signed-off-by: Rajani Kantha <681739313(a)139.com>
---
net/dsa/dsa.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 07736edc8b6a..c9bf1a9a6c99 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -1613,6 +1613,7 @@ EXPORT_SYMBOL_GPL(dsa_unregister_switch);
void dsa_switch_shutdown(struct dsa_switch *ds)
{
struct net_device *master, *slave_dev;
+ LIST_HEAD(close_list);
struct dsa_port *dp;
mutex_lock(&dsa2_mutex);
@@ -1622,10 +1623,16 @@ void dsa_switch_shutdown(struct dsa_switch *ds)
rtnl_lock();
+ dsa_switch_for_each_cpu_port(dp, ds)
+ list_add(&dp->master->close_list, &close_list);
+
+ dev_close_many(&close_list, true);
+
dsa_switch_for_each_user_port(dp, ds) {
master = dsa_port_to_master(dp);
slave_dev = dp->slave;
+ netif_device_detach(slave_dev);
netdev_upper_dev_unlink(master, slave_dev);
}
--
2.17.1
In hi3670_pcie_get_resources_from_pcie(), the reference obtained via
bus_find_device_by_of_node() is never released with put_device(). Each
call to this function increments the reference count of the PCIe
device without decrementing it, preventing proper device cleanup. And
the device node obtained via of_get_child_by_name() is never released
with of_node_put(). This could cause a leak of the node reference.
Add proper resource cleanup using goto labels to ensure all acquired
references are released before function return, regardless of the exit
path.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 73075011ffff ("phy: HiSilicon: Add driver for Kirin 970 PCIe PHY")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/phy/hisilicon/phy-hi3670-pcie.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/phy/hisilicon/phy-hi3670-pcie.c b/drivers/phy/hisilicon/phy-hi3670-pcie.c
index dbc7dcce682b..4c67a7613261 100644
--- a/drivers/phy/hisilicon/phy-hi3670-pcie.c
+++ b/drivers/phy/hisilicon/phy-hi3670-pcie.c
@@ -560,7 +560,8 @@ static int hi3670_pcie_get_resources_from_pcie(struct hi3670_pcie_phy *phy)
{
struct device_node *pcie_port;
struct device *dev = phy->dev;
- struct device *pcie_dev;
+ struct device *pcie_dev = NULL;
+ int ret = 0;
pcie_port = of_get_child_by_name(dev->parent->of_node, "pcie");
if (!pcie_port) {
@@ -572,7 +573,8 @@ static int hi3670_pcie_get_resources_from_pcie(struct hi3670_pcie_phy *phy)
pcie_dev = bus_find_device_by_of_node(&platform_bus_type, pcie_port);
if (!pcie_dev) {
dev_err(dev, "Didn't find pcie device\n");
- return -ENODEV;
+ ret = -ENODEV;
+ goto out_put_node;
}
/*
@@ -586,9 +588,14 @@ static int hi3670_pcie_get_resources_from_pcie(struct hi3670_pcie_phy *phy)
phy->apb = dev_get_regmap(pcie_dev, "kirin_pcie_apb");
if (!phy->apb) {
dev_err(dev, "Failed to get APB regmap\n");
- return -ENODEV;
+ ret = -ENODEV;
+ goto out_put_device;
}
+out_put_device:
+ put_device(pcie_dev);
+out_put_node:
+ of_node_put(pcie_port);
return 0;
}
--
2.17.1
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: 845ed7e04d9ae0146d5e003a5defd90eb95535fc
Gitweb: https://git.kernel.org/tip/845ed7e04d9ae0146d5e003a5defd90eb95535fc
Author: Yazen Ghannam <yazen.ghannam(a)amd.com>
AuthorDate: Tue, 11 Nov 2025 14:53:57
Committer: Borislav Petkov (AMD) <bp(a)alien8.de>
CommitterDate: Sun, 16 Nov 2025 21:00:02 +01:00
x86/acpi/boot: Correct acpi_is_processor_usable() check again
ACPI v6.3 defined a new "Online Capable" MADT LAPIC flag. This bit is
used in conjunction with the "Enabled" MADT LAPIC flag to determine if
a CPU can be enabled/hotplugged by the OS after boot.
Before the new bit was defined, the "Enabled" bit was explicitly
described like this (ACPI v6.0 wording provided):
"If zero, this processor is unusable, and the operating system
support will not attempt to use it"
This means that CPU hotplug (based on MADT) is not possible. Many BIOS
implementations follow this guidance. They may include LAPIC entries in
MADT for unavailable CPUs, but since these entries are marked with
"Enabled=0" it is expected that the OS will completely ignore these
entries.
However, QEMU will do the same (include entries with "Enabled=0") for
the purpose of allowing CPU hotplug within the guest.
Comment from QEMU function pc_madt_cpu_entry():
/* ACPI spec says that LAPIC entry for non present
* CPU may be omitted from MADT or it must be marked
* as disabled. However omitting non present CPU from
* MADT breaks hotplug on linux. So possible CPUs
* should be put in MADT but kept disabled.
*/
Recent Linux topology changes broke the QEMU use case. A following fix
for the QEMU use case broke bare metal topology enumeration.
Rework the Linux MADT LAPIC flags check to allow the QEMU use case only
for guests and to maintain the ACPI spec behavior for bare metal.
Remove an unnecessary check added to fix a bare metal case introduced by
the QEMU "fix".
[ bp: Change logic as Michal suggested. ]
Fixes: fed8d8773b8e ("x86/acpi/boot: Correct acpi_is_processor_usable() check")
Fixes: f0551af02130 ("x86/topology: Ignore non-present APIC IDs in a present package")
Closes: https://lore.kernel.org/r/20251024204658.3da9bf3f.michal.pecio@gmail.com
Reported-by: Michal Pecio <michal.pecio(a)gmail.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam(a)amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Tested-by: Michal Pecio <michal.pecio(a)gmail.com>
Tested-by: Ricardo Neri <ricardo.neri-calderon(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/20251111145357.4031846-1-yazen.ghannam@amd.com
---
arch/x86/kernel/acpi/boot.c | 12 ++++++++----
arch/x86/kernel/cpu/topology.c | 15 ---------------
2 files changed, 8 insertions(+), 19 deletions(-)
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 9fa321a..d6138b2 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -35,6 +35,7 @@
#include <asm/smp.h>
#include <asm/i8259.h>
#include <asm/setup.h>
+#include <asm/hypervisor.h>
#include "sleep.h" /* To include x86_acpi_suspend_lowlevel */
static int __initdata acpi_force = 0;
@@ -164,11 +165,14 @@ static bool __init acpi_is_processor_usable(u32 lapic_flags)
if (lapic_flags & ACPI_MADT_ENABLED)
return true;
- if (!acpi_support_online_capable ||
- (lapic_flags & ACPI_MADT_ONLINE_CAPABLE))
- return true;
+ if (acpi_support_online_capable)
+ return lapic_flags & ACPI_MADT_ONLINE_CAPABLE;
- return false;
+ /*
+ * QEMU expects legacy "Enabled=0" LAPIC entries to be counted as usable
+ * in order to support CPU hotplug in guests.
+ */
+ return !hypervisor_is_type(X86_HYPER_NATIVE);
}
static int __init
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 6073a16..425404e 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -27,7 +27,6 @@
#include <xen/xen.h>
#include <asm/apic.h>
-#include <asm/hypervisor.h>
#include <asm/io_apic.h>
#include <asm/mpspec.h>
#include <asm/msr.h>
@@ -240,20 +239,6 @@ static __init void topo_register_apic(u32 apic_id, u32 acpi_id, bool present)
cpuid_to_apicid[cpu] = apic_id;
topo_set_cpuids(cpu, apic_id, acpi_id);
} else {
- u32 pkgid = topo_apicid(apic_id, TOPO_PKG_DOMAIN);
-
- /*
- * Check for present APICs in the same package when running
- * on bare metal. Allow the bogosity in a guest.
- */
- if (hypervisor_is_type(X86_HYPER_NATIVE) &&
- topo_unit_count(pkgid, TOPO_PKG_DOMAIN, phys_cpu_present_map)) {
- pr_info_once("Ignoring hot-pluggable APIC ID %x in present package.\n",
- apic_id);
- topo_info.nr_rejected_cpus++;
- return;
- }
-
topo_info.nr_disabled_cpus++;
}
From: Paolo Abeni <pabeni(a)redhat.com>
[ Upstream commit 14ad6ed30a10afbe91b0749d6378285f4225d482 ]
Sabrina reported the following splat:
WARNING: CPU: 0 PID: 1 at net/core/dev.c:6935 netif_napi_add_weight_locked+0x8f2/0xba0
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc1-net-00092-g011b03359038 #996
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
RIP: 0010:netif_napi_add_weight_locked+0x8f2/0xba0
Code: e8 c3 e6 6a fe 48 83 c4 28 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc c7 44 24 10 ff ff ff ff e9 8f fb ff ff e8 9e e6 6a fe <0f> 0b e9 d3 fe ff ff e8 92 e6 6a fe 48 8b 04 24 be ff ff ff ff 48
RSP: 0000:ffffc9000001fc60 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88806ce48128 RCX: 1ffff11001664b9e
RDX: ffff888008f00040 RSI: ffffffff8317ca42 RDI: ffff88800b325cb6
RBP: ffff88800b325c40 R08: 0000000000000001 R09: ffffed100167502c
R10: ffff88800b3a8163 R11: 0000000000000000 R12: ffff88800ac1c168
R13: ffff88800ac1c168 R14: ffff88800ac1c168 R15: 0000000000000007
FS: 0000000000000000(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff888008201000 CR3: 0000000004c94001 CR4: 0000000000370ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
gro_cells_init+0x1ba/0x270
xfrm_input_init+0x4b/0x2a0
xfrm_init+0x38/0x50
ip_rt_init+0x2d7/0x350
ip_init+0xf/0x20
inet_init+0x406/0x590
do_one_initcall+0x9d/0x2e0
do_initcalls+0x23b/0x280
kernel_init_freeable+0x445/0x490
kernel_init+0x20/0x1d0
ret_from_fork+0x46/0x80
ret_from_fork_asm+0x1a/0x30
</TASK>
irq event stamp: 584330
hardirqs last enabled at (584338): [<ffffffff8168bf87>] __up_console_sem+0x77/0xb0
hardirqs last disabled at (584345): [<ffffffff8168bf6c>] __up_console_sem+0x5c/0xb0
softirqs last enabled at (583242): [<ffffffff833ee96d>] netlink_insert+0x14d/0x470
softirqs last disabled at (583754): [<ffffffff8317c8cd>] netif_napi_add_weight_locked+0x77d/0xba0
on kernel built with MAX_SKB_FRAGS=45, where SKB_WITH_OVERHEAD(1024)
is smaller than GRO_MAX_HEAD.
Such built additionally contains the revert of the single page frag cache
so that napi_get_frags() ends up using the page frag allocator, triggering
the splat.
Note that the underlying issue is independent from the mentioned
revert; address it ensuring that the small head cache will fit either TCP
and GRO allocation and updating napi_alloc_skb() and __netdev_alloc_skb()
to select kmalloc() usage for any allocation fitting such cache.
Reported-by: Sabrina Dubroca <sd(a)queasysnail.net>
Suggested-by: Eric Dumazet <edumazet(a)google.com>
Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS")
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
[ Minor context change fixed. ]
Signed-off-by: Wenshan Lan <jetlan9(a)163.com>
---
include/net/gro.h | 3 +++
net/core/gro.c | 3 ---
net/core/skbuff.c | 10 +++++++---
3 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/include/net/gro.h b/include/net/gro.h
index 018343254c90..9260ed367c91 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -10,6 +10,9 @@
#include <linux/skbuff.h>
#include <net/udp.h>
+/* This should be increased if a protocol with a bigger head is added. */
+#define GRO_MAX_HEAD (MAX_HEADER + 128)
+
struct napi_gro_cb {
union {
struct {
diff --git a/net/core/gro.c b/net/core/gro.c
index 397cf5984250..b8cc44406e69 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -6,9 +6,6 @@
#define MAX_GRO_SKBS 8
-/* This should be increased if a protocol with a bigger head is added. */
-#define GRO_MAX_HEAD (MAX_HEADER + 128)
-
static DEFINE_SPINLOCK(offload_lock);
struct list_head offload_base __read_mostly = LIST_HEAD_INIT(offload_base);
/* Maximum number of GRO_NORMAL skbs to batch up for list-RX */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 867832f8bbae..073e2c527407 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -67,6 +67,7 @@
#include <net/dst.h>
#include <net/sock.h>
#include <net/checksum.h>
+#include <net/gro.h>
#include <net/gso.h>
#include <net/ip6_checksum.h>
#include <net/xfrm.h>
@@ -96,7 +97,9 @@ static struct kmem_cache *skbuff_ext_cache __ro_after_init;
static struct kmem_cache *skb_small_head_cache __ro_after_init;
-#define SKB_SMALL_HEAD_SIZE SKB_HEAD_ALIGN(MAX_TCP_HEADER)
+#define GRO_MAX_HEAD_PAD (GRO_MAX_HEAD + NET_SKB_PAD + NET_IP_ALIGN)
+#define SKB_SMALL_HEAD_SIZE SKB_HEAD_ALIGN(max(MAX_TCP_HEADER, \
+ GRO_MAX_HEAD_PAD))
/* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two.
* This should ensure that SKB_SMALL_HEAD_HEADROOM is a unique
@@ -708,7 +711,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
/* If requested length is either too small or too big,
* we use kmalloc() for skb->head allocation.
*/
- if (len <= SKB_WITH_OVERHEAD(1024) ||
+ if (len <= SKB_WITH_OVERHEAD(SKB_SMALL_HEAD_CACHE_SIZE) ||
len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
(gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
@@ -785,7 +788,8 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
* When the small frag allocator is available, prefer it over kmalloc
* for small fragments
*/
- if ((!NAPI_HAS_SMALL_PAGE_FRAG && len <= SKB_WITH_OVERHEAD(1024)) ||
+ if ((!NAPI_HAS_SMALL_PAGE_FRAG &&
+ len <= SKB_WITH_OVERHEAD(SKB_SMALL_HEAD_CACHE_SIZE)) ||
len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
(gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX | SKB_ALLOC_NAPI,
--
2.43.0
Reading the interrupt register `SUN4I_REG_INT_ADDR` causes all of its bits
to be reset. If we ever reach the condition of handling more than
`SUN4I_CAN_MAX_IRQ` IRQs, we will have read the register and reset all its
bits but without actually handling the interrupt inside of the loop body.
This may, among other issues, cause us to never `netif_wake_queue()` again
after a transmission interrupt.
Fixes: 0738eff14d81 ("can: Allwinner A10/A20 CAN Controller support - Kernel module")
Cc: stable(a)vger.kernel.org
Co-developed-by: Thomas Mühlbacher <tmuehlbacher(a)posteo.net>
Signed-off-by: Thomas Mühlbacher <tmuehlbacher(a)posteo.net>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
I've ported the fix from the sja1000 driver to the sun4i_can, which based
on the sja1000 driver.
---
drivers/net/can/sun4i_can.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/can/sun4i_can.c b/drivers/net/can/sun4i_can.c
index 53bfd873de9b..0a7ba0942839 100644
--- a/drivers/net/can/sun4i_can.c
+++ b/drivers/net/can/sun4i_can.c
@@ -657,8 +657,8 @@ static irqreturn_t sun4i_can_interrupt(int irq, void *dev_id)
u8 isrc, status;
int n = 0;
- while ((isrc = readl(priv->base + SUN4I_REG_INT_ADDR)) &&
- (n < SUN4I_CAN_MAX_IRQ)) {
+ while ((n < SUN4I_CAN_MAX_IRQ) &&
+ (isrc = readl(priv->base + SUN4I_REG_INT_ADDR))) {
n++;
status = readl(priv->base + SUN4I_REG_STA_ADDR);
---
base-commit: 5442a9da69789741bfda39f34ee7f69552bf0c56
change-id: 20251116-sun4i-fix-loop-f265621b6a99
Best regards,
--
Marc Kleine-Budde <mkl(a)pengutronix.de>