From: Antoine Tenart <atenart(a)kernel.org>
[ Upstream commit 3a0a3ff6593d670af2451ec363ccb7b18aec0c0a ]
Upstream fix ac888d58869b ("net: do not delay dst_entries_add() in
dst_release()") moved decrementing the dst count from dst_destroy to
dst_release to avoid accessing already freed data in case of netns
dismantle. However in case CONFIG_DST_CACHE is enabled and OvS+tunnels
are used, this fix is incomplete as the same issue will be seen for
cached dsts:
Unable to handle kernel paging request at virtual address ffff5aabf6b5c000
Call trace:
percpu_counter_add_batch+0x3c/0x160 (P)
dst_release+0xec/0x108
dst_cache_destroy+0x68/0xd8
dst_destroy+0x13c/0x168
dst_destroy_rcu+0x1c/0xb0
rcu_do_batch+0x18c/0x7d0
rcu_core+0x174/0x378
rcu_core_si+0x18/0x30
Fix this by invalidating the cache, and thus decrementing cached dst
counters, in dst_release too.
Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel")
Signed-off-by: Antoine Tenart <atenart(a)kernel.org>
Link: https://patch.msgid.link/20250326173634.31096-1-atenart@kernel.org
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
[Minor conflict resolved due to code context change.]
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Verified the build test
---
net/core/dst.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/net/core/dst.c b/net/core/dst.c
index 6d74b4663085..c1ea331c4bfd 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -173,6 +173,14 @@ void dst_release(struct dst_entry *dst)
net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
__func__, dst, newrefcnt);
if (!newrefcnt){
+#ifdef CONFIG_DST_CACHE
+ if (dst->flags & DST_METADATA) {
+ struct metadata_dst *md_dst = (struct metadata_dst *)dst;
+
+ if (md_dst->type == METADATA_IP_TUNNEL)
+ dst_cache_reset_now(&md_dst->u.tun_info.dst_cache);
+ }
+#endif
dst_count_dec(dst);
call_rcu(&dst->rcu_head, dst_destroy_rcu);
}
--
2.34.1
Hi
In Debian Roland Clobus reported a regression with setting up loop
devices from a backing squashfs file lying on read-only mounted target
directory from a iso.
The original report is at:
https://bugs.debian.org/1106070
Quoting the report:
On Mon, May 19, 2025 at 12:15:10PM +0200, Roland Clobus wrote:
> Package: linux-image-6.12.29-amd64
> Version: 6.12.29-1
> Severity: important
> X-Debbugs-Cc: debian-amd64(a)lists.debian.org
> User: debian-amd64(a)lists.debian.org
> Usertags: amd64
> X-Debbugs-Cc: phil(a)hands.com
> User: debian-qa(a)lists.debian.org
> Usertags: openqa
> X-Debbugs-Cc: debian-boot
>
> Hello maintainers of the kernel,
>
> The new kernel (6.12.29) has a modified behaviour (compared to 6.12.27) for
> the loop device.
>
> This causes the Debian live images (for sid) to fail to boot.
>
> The change happened between 20250518T201633Z and 20250519T021902Z, which
> matches the upload of 6.12.29 (https://tracker.debian.org/news/1646619/accepted-linux-signed-amd64-612291-…)
> at 20250518T230426Z.
>
> To reproduce:
> * Download the daily live image from https://openqa.debian.net/tests/396941/asset/iso/smallest-build_sid_2025051…
> * Boot into the live image (the first boot option)
> * Result: an initramfs shell (instead of a live system) -> FAIL
> * Try: `losetup -r /dev/loop1 /run/live/medium/live/filesystem.squashfs`
> * Result: `failed to set up loop device: invalid argument` -> FAIL
> * Try: `cp /run/live/medium/live/filesystem.squashfs /`
> * Try: `losetup -r /dev/loop2 /filesystem.squashfs`
> * Result: `loop2: detected capacity change from 0 to 1460312` -> PASS
>
> It appears that the loopback device cannot be used any more with the mount
> /run/live/medium (which is on /dev/sr0).
>
> I've verified: the md5sum of the squashfs file is OK.
>
> The newer kernel is not in trixie yet.
>
> With kind regards,
> Roland Clobus
A short reproducer is as follows:
iso="netinst.iso"
url="https://openqa.debian.net/tests/396941/asset/iso/smallest-build_sid_2025051…"
if [ ! -e "${iso}" ]; then
wget "${url}" -O "${iso}"
fi
mountdir="$(mktemp -d)"
mount -v "./${iso}" "${mountdir}"
losetup -v -r -f "${mountdir}/live/filesystem.squashfs"
loosetup -l
resulting in:
mount: /tmp/tmp.HgbNe7ek3h: WARNING: source write-protected, mounted read-only.
mount: /dev/loop0 mounted on /tmp/tmp.HgbNe7ek3h.
losetup: /tmp/tmp.HgbNe7ek3h/live/filesystem.squashfs: failed to set up loop device: Invalid argument
NAME SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE DIO LOG-SEC
/dev/loop0 0 0 1 0 /root/netinst.iso 0 512
Reverting 184b147b9f7f ("loop: Add sanity check for read/write_iter")
on top of 6.12.29 fixes the issue:
mount: /tmp/tmp.ACkkdCdYvB: WARNING: source write-protected, mounted read-only.
mount: /dev/loop0 mounted on /tmp/tmp.ACkkdCdYvB.
NAME SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE DIO LOG-SEC
/dev/loop1 0 0 0 1 /tmp/tmp.ACkkdCdYvB/live/filesystem.squashfs 0 512
/dev/loop0 0 0 1 0 /root/netinst.iso 0 512
For completeness, netinst.iso is a iso9660 fstype with mount options
"ro,relatime,nojoliet,check=s,map=n,blocksize=2048,iocharset=utf8".
#regzbot introduced: 184b147b9f7f
#regzbot link: https://bugs.debian.org/1106070
Regards,
Salvatore
From: Daniel Gomez <da.gomez(a)samsung.com>
[ Upstream commit a26fe287eed112b4e21e854f173c8918a6a8596d ]
The scripts/kconfig/merge_config.sh script requires an existing
$INITFILE (or the $1 argument) as a base file for merging Kconfig
fragments. However, an empty $INITFILE can serve as an initial starting
point, later referenced by the KCONFIG_ALLCONFIG Makefile variable
if -m is not used. This variable can point to any configuration file
containing preset config symbols (the merged output) as stated in
Documentation/kbuild/kconfig.rst. When -m is used $INITFILE will
contain just the merge output requiring the user to run make (i.e.
KCONFIG_ALLCONFIG=<$INITFILE> make <allnoconfig/alldefconfig> or make
olddefconfig).
Instead of failing when `$INITFILE` is missing, create an empty file and
use it as the starting point for merges.
Signed-off-by: Daniel Gomez <da.gomez(a)samsung.com>
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
scripts/kconfig/merge_config.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/scripts/kconfig/merge_config.sh b/scripts/kconfig/merge_config.sh
index 0b7952471c18f..79c09b378be81 100755
--- a/scripts/kconfig/merge_config.sh
+++ b/scripts/kconfig/merge_config.sh
@@ -112,8 +112,8 @@ INITFILE=$1
shift;
if [ ! -r "$INITFILE" ]; then
- echo "The base file '$INITFILE' does not exist. Exit." >&2
- exit 1
+ echo "The base file '$INITFILE' does not exist. Creating one..." >&2
+ touch "$INITFILE"
fi
MERGE_LIST=$*
--
2.39.5
From: Daniel Gomez <da.gomez(a)samsung.com>
[ Upstream commit a26fe287eed112b4e21e854f173c8918a6a8596d ]
The scripts/kconfig/merge_config.sh script requires an existing
$INITFILE (or the $1 argument) as a base file for merging Kconfig
fragments. However, an empty $INITFILE can serve as an initial starting
point, later referenced by the KCONFIG_ALLCONFIG Makefile variable
if -m is not used. This variable can point to any configuration file
containing preset config symbols (the merged output) as stated in
Documentation/kbuild/kconfig.rst. When -m is used $INITFILE will
contain just the merge output requiring the user to run make (i.e.
KCONFIG_ALLCONFIG=<$INITFILE> make <allnoconfig/alldefconfig> or make
olddefconfig).
Instead of failing when `$INITFILE` is missing, create an empty file and
use it as the starting point for merges.
Signed-off-by: Daniel Gomez <da.gomez(a)samsung.com>
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
scripts/kconfig/merge_config.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/scripts/kconfig/merge_config.sh b/scripts/kconfig/merge_config.sh
index 72da3b8d6f307..151f9938abaa7 100755
--- a/scripts/kconfig/merge_config.sh
+++ b/scripts/kconfig/merge_config.sh
@@ -105,8 +105,8 @@ INITFILE=$1
shift;
if [ ! -r "$INITFILE" ]; then
- echo "The base file '$INITFILE' does not exist. Exit." >&2
- exit 1
+ echo "The base file '$INITFILE' does not exist. Creating one..." >&2
+ touch "$INITFILE"
fi
MERGE_LIST=$*
--
2.39.5
From: Daniel Gomez <da.gomez(a)samsung.com>
[ Upstream commit a26fe287eed112b4e21e854f173c8918a6a8596d ]
The scripts/kconfig/merge_config.sh script requires an existing
$INITFILE (or the $1 argument) as a base file for merging Kconfig
fragments. However, an empty $INITFILE can serve as an initial starting
point, later referenced by the KCONFIG_ALLCONFIG Makefile variable
if -m is not used. This variable can point to any configuration file
containing preset config symbols (the merged output) as stated in
Documentation/kbuild/kconfig.rst. When -m is used $INITFILE will
contain just the merge output requiring the user to run make (i.e.
KCONFIG_ALLCONFIG=<$INITFILE> make <allnoconfig/alldefconfig> or make
olddefconfig).
Instead of failing when `$INITFILE` is missing, create an empty file and
use it as the starting point for merges.
Signed-off-by: Daniel Gomez <da.gomez(a)samsung.com>
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
scripts/kconfig/merge_config.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/scripts/kconfig/merge_config.sh b/scripts/kconfig/merge_config.sh
index d7d5c58b8b6aa..557f37f481fdf 100755
--- a/scripts/kconfig/merge_config.sh
+++ b/scripts/kconfig/merge_config.sh
@@ -98,8 +98,8 @@ INITFILE=$1
shift;
if [ ! -r "$INITFILE" ]; then
- echo "The base file '$INITFILE' does not exist. Exit." >&2
- exit 1
+ echo "The base file '$INITFILE' does not exist. Creating one..." >&2
+ touch "$INITFILE"
fi
MERGE_LIST=$*
--
2.39.5
blk_mq_freeze_queue() never terminates if one or more bios are on the plug
list and if the block device driver defines a .submit_bio() method.
This is the case for device mapper drivers. The deadlock happens because
blk_mq_freeze_queue() waits for q_usage_counter to drop to zero, because
a queue reference is held by bios on the plug list and because the
__bio_queue_enter() call in __submit_bio() waits for the queue to be
unfrozen.
This patch fixes the following deadlock:
Workqueue: dm-51_zwplugs blk_zone_wplug_bio_work
Call trace:
__schedule+0xb08/0x1160
schedule+0x48/0xc8
__bio_queue_enter+0xcc/0x1d0
__submit_bio+0x100/0x1b0
submit_bio_noacct_nocheck+0x230/0x49c
blk_zone_wplug_bio_work+0x168/0x250
process_one_work+0x26c/0x65c
worker_thread+0x33c/0x498
kthread+0x110/0x134
ret_from_fork+0x10/0x20
Call trace:
__switch_to+0x230/0x410
__schedule+0xb08/0x1160
schedule+0x48/0xc8
blk_mq_freeze_queue_wait+0x78/0xb8
blk_mq_freeze_queue+0x90/0xa4
queue_attr_store+0x7c/0xf0
sysfs_kf_write+0x98/0xc8
kernfs_fop_write_iter+0x12c/0x1d4
vfs_write+0x340/0x3ac
ksys_write+0x78/0xe8
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Damien Le Moal <dlemoal(a)kernel.org>
Cc: Yu Kuai <yukuai1(a)huaweicloud.com>
Cc: Ming Lei <ming.lei(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
Signed-off-by: Bart Van Assche <bvanassche(a)acm.org>
---
block/blk-core.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 4b728fa1c138..e961896a8717 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -621,6 +621,13 @@ static inline blk_status_t blk_check_zone_append(struct request_queue *q,
return BLK_STS_OK;
}
+/*
+ * Do not call bio_queue_enter() if the BIO_ZONE_WRITE_PLUGGING flag has been
+ * set because this causes blk_mq_freeze_queue() to deadlock if
+ * blk_zone_wplug_bio_work() submits a bio. Calling bio_queue_enter() for bios
+ * on the plug list is not necessary since a q_usage_counter reference is held
+ * while a bio is on the plug list.
+ */
static void __submit_bio(struct bio *bio)
{
/* If plug is not used, add new plug here to cache nsecs time. */
@@ -633,7 +640,8 @@ static void __submit_bio(struct bio *bio)
if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) {
blk_mq_submit_bio(bio);
- } else if (likely(bio_queue_enter(bio) == 0)) {
+ } else if (likely(bio_zone_write_plugging(bio) ||
+ bio_queue_enter(bio) == 0)) {
struct gendisk *disk = bio->bi_bdev->bd_disk;
if ((bio->bi_opf & REQ_POLLED) &&
@@ -643,7 +651,8 @@ static void __submit_bio(struct bio *bio)
} else {
disk->fops->submit_bio(bio);
}
- blk_queue_exit(disk->queue);
+ if (!bio_zone_write_plugging(bio))
+ blk_queue_exit(disk->queue);
}
blk_finish_plug(&plug);
There exists the following error when building perf tools on LoongArch:
CC util/syscalltbl.o
In file included from util/syscalltbl.c:16:
tools/perf/arch/loongarch/include/syscall_table.h:2:10: fatal error: asm/syscall_table_64.h: No such file or directory
2 | #include <asm/syscall_table_64.h>
| ^~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
This is because the generated syscall header is syscalls_64.h rather
than syscall_table_64.h. The above problem was introduced from v6.14,
then the header syscall_table.h has been removed from mainline tree
in commit af472d3c4454 ("perf syscalltbl: Remove syscall_table.h"),
just fix it only for the linux-6.14.y branch of stable tree.
By the way, no need to fix the mainline tree and there is no upstream
git id for this patch.
How to reproduce:
git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
cd linux && git checkout origin/linux-6.14.y
make JOBS=1 -C tools/perf
Fixes: fa70857a27e5 ("perf tools loongarch: Use syscall table")
Signed-off-by: Tiezhu Yang <yangtiezhu(a)loongson.cn>
---
tools/perf/arch/loongarch/include/syscall_table.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/arch/loongarch/include/syscall_table.h b/tools/perf/arch/loongarch/include/syscall_table.h
index 9d0646d3455c..b53e31c15805 100644
--- a/tools/perf/arch/loongarch/include/syscall_table.h
+++ b/tools/perf/arch/loongarch/include/syscall_table.h
@@ -1,2 +1,2 @@
/* SPDX-License-Identifier: GPL-2.0 */
-#include <asm/syscall_table_64.h>
+#include <asm/syscalls_64.h>
--
2.42.0
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x fefc075182275057ce607effaa3daa9e6e3bdc73
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025051945-yiddish-xerox-03f5@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fefc075182275057ce607effaa3daa9e6e3bdc73 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Date: Tue, 6 May 2025 16:32:07 +0300
Subject: [PATCH] mm/page_alloc: fix race condition in unaccepted memory
handling
The page allocator tracks the number of zones that have unaccepted memory
using static_branch_enc/dec() and uses that static branch in hot paths to
determine if it needs to deal with unaccepted memory.
Borislav and Thomas pointed out that the tracking is racy: operations on
static_branch are not serialized against adding/removing unaccepted pages
to/from the zone.
Sanity checks inside static_branch machinery detects it:
WARNING: CPU: 0 PID: 10 at kernel/jump_label.c:276 __static_key_slow_dec_cpuslocked+0x8e/0xa0
The comment around the WARN() explains the problem:
/*
* Warn about the '-1' case though; since that means a
* decrement is concurrent with a first (0->1) increment. IOW
* people are trying to disable something that wasn't yet fully
* enabled. This suggests an ordering problem on the user side.
*/
The effect of this static_branch optimization is only visible on
microbenchmark.
Instead of adding more complexity around it, remove it altogether.
Link: https://lkml.kernel.org/r/20250506133207.1009676-1-kirill.shutemov@linux.in…
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
Link: https://lore.kernel.org/all/20250506092445.GBaBnVXXyvnazly6iF@fat_crate.loc…
Reported-by: Borislav Petkov <bp(a)alien8.de>
Tested-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Reported-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Brendan Jackman <jackmanb(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/internal.h b/mm/internal.h
index 25a29872c634..5c7a2b43ad76 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1590,7 +1590,6 @@ unsigned long move_page_tables(struct pagetable_move_control *pmc);
#ifdef CONFIG_UNACCEPTED_MEMORY
void accept_page(struct page *page);
-void unaccepted_cleanup_work(struct work_struct *work);
#else /* CONFIG_UNACCEPTED_MEMORY */
static inline void accept_page(struct page *page)
{
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 327764ca0ee4..eedce9321e13 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1441,7 +1441,6 @@ static void __meminit zone_init_free_lists(struct zone *zone)
#ifdef CONFIG_UNACCEPTED_MEMORY
INIT_LIST_HEAD(&zone->unaccepted_pages);
- INIT_WORK(&zone->unaccepted_cleanup, unaccepted_cleanup_work);
#endif
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7248e300d36e..8258349e49ac 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7172,16 +7172,8 @@ bool has_managed_dma(void)
#ifdef CONFIG_UNACCEPTED_MEMORY
-/* Counts number of zones with unaccepted pages. */
-static DEFINE_STATIC_KEY_FALSE(zones_with_unaccepted_pages);
-
static bool lazy_accept = true;
-void unaccepted_cleanup_work(struct work_struct *work)
-{
- static_branch_dec(&zones_with_unaccepted_pages);
-}
-
static int __init accept_memory_parse(char *p)
{
if (!strcmp(p, "lazy")) {
@@ -7206,11 +7198,7 @@ static bool page_contains_unaccepted(struct page *page, unsigned int order)
static void __accept_page(struct zone *zone, unsigned long *flags,
struct page *page)
{
- bool last;
-
list_del(&page->lru);
- last = list_empty(&zone->unaccepted_pages);
-
account_freepages(zone, -MAX_ORDER_NR_PAGES, MIGRATE_MOVABLE);
__mod_zone_page_state(zone, NR_UNACCEPTED, -MAX_ORDER_NR_PAGES);
__ClearPageUnaccepted(page);
@@ -7219,28 +7207,6 @@ static void __accept_page(struct zone *zone, unsigned long *flags,
accept_memory(page_to_phys(page), PAGE_SIZE << MAX_PAGE_ORDER);
__free_pages_ok(page, MAX_PAGE_ORDER, FPI_TO_TAIL);
-
- if (last) {
- /*
- * There are two corner cases:
- *
- * - If allocation occurs during the CPU bring up,
- * static_branch_dec() cannot be used directly as
- * it causes a deadlock on cpu_hotplug_lock.
- *
- * Instead, use schedule_work() to prevent deadlock.
- *
- * - If allocation occurs before workqueues are initialized,
- * static_branch_dec() should be called directly.
- *
- * Workqueues are initialized before CPU bring up, so this
- * will not conflict with the first scenario.
- */
- if (system_wq)
- schedule_work(&zone->unaccepted_cleanup);
- else
- unaccepted_cleanup_work(&zone->unaccepted_cleanup);
- }
}
void accept_page(struct page *page)
@@ -7277,20 +7243,12 @@ static bool try_to_accept_memory_one(struct zone *zone)
return true;
}
-static inline bool has_unaccepted_memory(void)
-{
- return static_branch_unlikely(&zones_with_unaccepted_pages);
-}
-
static bool cond_accept_memory(struct zone *zone, unsigned int order,
int alloc_flags)
{
long to_accept, wmark;
bool ret = false;
- if (!has_unaccepted_memory())
- return false;
-
if (list_empty(&zone->unaccepted_pages))
return false;
@@ -7328,22 +7286,17 @@ static bool __free_unaccepted(struct page *page)
{
struct zone *zone = page_zone(page);
unsigned long flags;
- bool first = false;
if (!lazy_accept)
return false;
spin_lock_irqsave(&zone->lock, flags);
- first = list_empty(&zone->unaccepted_pages);
list_add_tail(&page->lru, &zone->unaccepted_pages);
account_freepages(zone, MAX_ORDER_NR_PAGES, MIGRATE_MOVABLE);
__mod_zone_page_state(zone, NR_UNACCEPTED, MAX_ORDER_NR_PAGES);
__SetPageUnaccepted(page);
spin_unlock_irqrestore(&zone->lock, flags);
- if (first)
- static_branch_inc(&zones_with_unaccepted_pages);
-
return true;
}
Hi Greg, Sasha,
This batch contains a backport fix for 5.10 -stable.
The following list shows the backported patches, I am using original commit
IDs for reference:
1) 8965d42bcf54 ("netfilter: nf_tables: pass nft_chain to destroy function, not nft_ctx")
This is a stable dependency for the next patch.
2) c03d278fdf35 ("netfilter: nf_tables: wait for rcu grace period on net_device removal")
3) b04df3da1b5c ("netfilter: nf_tables: do not defer rule destruction via call_rcu")
This is a fix-for-fix for patch 2.
These three patches are required to fix the netdevice release path for
netdev family basechains.
Please, apply,
Thanks
Florian Westphal (2):
netfilter: nf_tables: pass nft_chain to destroy function, not nft_ctx
netfilter: nf_tables: do not defer rule destruction via call_rcu
Pablo Neira Ayuso (1):
netfilter: nf_tables: wait for rcu grace period on net_device removal
include/net/netfilter/nf_tables.h | 2 +-
net/netfilter/nf_tables_api.c | 54 ++++++++++++++++++++++---------
net/netfilter/nft_immediate.c | 2 +-
3 files changed, 41 insertions(+), 17 deletions(-)
--
2.30.2
Hi Greg, Sasha,
This batch contains backported fixes for 6.1 -stable.
The following list shows the backported patches, I am using original commit
IDs for reference:
1) 8965d42bcf54 ("netfilter: nf_tables: pass nft_chain to destroy function, not nft_ctx")
This is a stable dependency for the next patch.
2) c03d278fdf35 ("netfilter: nf_tables: wait for rcu grace period on net_device removal")
3) b04df3da1b5c ("netfilter: nf_tables: do not defer rule destruction via call_rcu")
This is a fix-for-fix for patch 2.
These three patches are required to fix the netdevice release path for
netdev family basechains.
Please, apply,
Thanks
Florian Westphal (2):
netfilter: nf_tables: pass nft_chain to destroy function, not nft_ctx
netfilter: nf_tables: do not defer rule destruction via call_rcu
Pablo Neira Ayuso (1):
netfilter: nf_tables: wait for rcu grace period on net_device removal
include/net/netfilter/nf_tables.h | 3 +-
net/netfilter/nf_tables_api.c | 54 ++++++++++++++++++++++---------
net/netfilter/nft_immediate.c | 2 +-
3 files changed, 42 insertions(+), 17 deletions(-)
--
2.30.2
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x fefc075182275057ce607effaa3daa9e6e3bdc73
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025051947-dimly-marina-9d5e@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fefc075182275057ce607effaa3daa9e6e3bdc73 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Date: Tue, 6 May 2025 16:32:07 +0300
Subject: [PATCH] mm/page_alloc: fix race condition in unaccepted memory
handling
The page allocator tracks the number of zones that have unaccepted memory
using static_branch_enc/dec() and uses that static branch in hot paths to
determine if it needs to deal with unaccepted memory.
Borislav and Thomas pointed out that the tracking is racy: operations on
static_branch are not serialized against adding/removing unaccepted pages
to/from the zone.
Sanity checks inside static_branch machinery detects it:
WARNING: CPU: 0 PID: 10 at kernel/jump_label.c:276 __static_key_slow_dec_cpuslocked+0x8e/0xa0
The comment around the WARN() explains the problem:
/*
* Warn about the '-1' case though; since that means a
* decrement is concurrent with a first (0->1) increment. IOW
* people are trying to disable something that wasn't yet fully
* enabled. This suggests an ordering problem on the user side.
*/
The effect of this static_branch optimization is only visible on
microbenchmark.
Instead of adding more complexity around it, remove it altogether.
Link: https://lkml.kernel.org/r/20250506133207.1009676-1-kirill.shutemov@linux.in…
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
Link: https://lore.kernel.org/all/20250506092445.GBaBnVXXyvnazly6iF@fat_crate.loc…
Reported-by: Borislav Petkov <bp(a)alien8.de>
Tested-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Reported-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Brendan Jackman <jackmanb(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/internal.h b/mm/internal.h
index 25a29872c634..5c7a2b43ad76 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1590,7 +1590,6 @@ unsigned long move_page_tables(struct pagetable_move_control *pmc);
#ifdef CONFIG_UNACCEPTED_MEMORY
void accept_page(struct page *page);
-void unaccepted_cleanup_work(struct work_struct *work);
#else /* CONFIG_UNACCEPTED_MEMORY */
static inline void accept_page(struct page *page)
{
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 327764ca0ee4..eedce9321e13 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1441,7 +1441,6 @@ static void __meminit zone_init_free_lists(struct zone *zone)
#ifdef CONFIG_UNACCEPTED_MEMORY
INIT_LIST_HEAD(&zone->unaccepted_pages);
- INIT_WORK(&zone->unaccepted_cleanup, unaccepted_cleanup_work);
#endif
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7248e300d36e..8258349e49ac 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7172,16 +7172,8 @@ bool has_managed_dma(void)
#ifdef CONFIG_UNACCEPTED_MEMORY
-/* Counts number of zones with unaccepted pages. */
-static DEFINE_STATIC_KEY_FALSE(zones_with_unaccepted_pages);
-
static bool lazy_accept = true;
-void unaccepted_cleanup_work(struct work_struct *work)
-{
- static_branch_dec(&zones_with_unaccepted_pages);
-}
-
static int __init accept_memory_parse(char *p)
{
if (!strcmp(p, "lazy")) {
@@ -7206,11 +7198,7 @@ static bool page_contains_unaccepted(struct page *page, unsigned int order)
static void __accept_page(struct zone *zone, unsigned long *flags,
struct page *page)
{
- bool last;
-
list_del(&page->lru);
- last = list_empty(&zone->unaccepted_pages);
-
account_freepages(zone, -MAX_ORDER_NR_PAGES, MIGRATE_MOVABLE);
__mod_zone_page_state(zone, NR_UNACCEPTED, -MAX_ORDER_NR_PAGES);
__ClearPageUnaccepted(page);
@@ -7219,28 +7207,6 @@ static void __accept_page(struct zone *zone, unsigned long *flags,
accept_memory(page_to_phys(page), PAGE_SIZE << MAX_PAGE_ORDER);
__free_pages_ok(page, MAX_PAGE_ORDER, FPI_TO_TAIL);
-
- if (last) {
- /*
- * There are two corner cases:
- *
- * - If allocation occurs during the CPU bring up,
- * static_branch_dec() cannot be used directly as
- * it causes a deadlock on cpu_hotplug_lock.
- *
- * Instead, use schedule_work() to prevent deadlock.
- *
- * - If allocation occurs before workqueues are initialized,
- * static_branch_dec() should be called directly.
- *
- * Workqueues are initialized before CPU bring up, so this
- * will not conflict with the first scenario.
- */
- if (system_wq)
- schedule_work(&zone->unaccepted_cleanup);
- else
- unaccepted_cleanup_work(&zone->unaccepted_cleanup);
- }
}
void accept_page(struct page *page)
@@ -7277,20 +7243,12 @@ static bool try_to_accept_memory_one(struct zone *zone)
return true;
}
-static inline bool has_unaccepted_memory(void)
-{
- return static_branch_unlikely(&zones_with_unaccepted_pages);
-}
-
static bool cond_accept_memory(struct zone *zone, unsigned int order,
int alloc_flags)
{
long to_accept, wmark;
bool ret = false;
- if (!has_unaccepted_memory())
- return false;
-
if (list_empty(&zone->unaccepted_pages))
return false;
@@ -7328,22 +7286,17 @@ static bool __free_unaccepted(struct page *page)
{
struct zone *zone = page_zone(page);
unsigned long flags;
- bool first = false;
if (!lazy_accept)
return false;
spin_lock_irqsave(&zone->lock, flags);
- first = list_empty(&zone->unaccepted_pages);
list_add_tail(&page->lru, &zone->unaccepted_pages);
account_freepages(zone, MAX_ORDER_NR_PAGES, MIGRATE_MOVABLE);
__mod_zone_page_state(zone, NR_UNACCEPTED, MAX_ORDER_NR_PAGES);
__SetPageUnaccepted(page);
spin_unlock_irqrestore(&zone->lock, flags);
- if (first)
- static_branch_inc(&zones_with_unaccepted_pages);
-
return true;
}
The patch below does not apply to the 6.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.14.y
git checkout FETCH_HEAD
git cherry-pick -x fefc075182275057ce607effaa3daa9e6e3bdc73
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025051944-undone-repayment-6c7e@gregkh' --subject-prefix 'PATCH 6.14.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fefc075182275057ce607effaa3daa9e6e3bdc73 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Date: Tue, 6 May 2025 16:32:07 +0300
Subject: [PATCH] mm/page_alloc: fix race condition in unaccepted memory
handling
The page allocator tracks the number of zones that have unaccepted memory
using static_branch_enc/dec() and uses that static branch in hot paths to
determine if it needs to deal with unaccepted memory.
Borislav and Thomas pointed out that the tracking is racy: operations on
static_branch are not serialized against adding/removing unaccepted pages
to/from the zone.
Sanity checks inside static_branch machinery detects it:
WARNING: CPU: 0 PID: 10 at kernel/jump_label.c:276 __static_key_slow_dec_cpuslocked+0x8e/0xa0
The comment around the WARN() explains the problem:
/*
* Warn about the '-1' case though; since that means a
* decrement is concurrent with a first (0->1) increment. IOW
* people are trying to disable something that wasn't yet fully
* enabled. This suggests an ordering problem on the user side.
*/
The effect of this static_branch optimization is only visible on
microbenchmark.
Instead of adding more complexity around it, remove it altogether.
Link: https://lkml.kernel.org/r/20250506133207.1009676-1-kirill.shutemov@linux.in…
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
Link: https://lore.kernel.org/all/20250506092445.GBaBnVXXyvnazly6iF@fat_crate.loc…
Reported-by: Borislav Petkov <bp(a)alien8.de>
Tested-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Reported-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Brendan Jackman <jackmanb(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/internal.h b/mm/internal.h
index 25a29872c634..5c7a2b43ad76 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1590,7 +1590,6 @@ unsigned long move_page_tables(struct pagetable_move_control *pmc);
#ifdef CONFIG_UNACCEPTED_MEMORY
void accept_page(struct page *page);
-void unaccepted_cleanup_work(struct work_struct *work);
#else /* CONFIG_UNACCEPTED_MEMORY */
static inline void accept_page(struct page *page)
{
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 327764ca0ee4..eedce9321e13 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1441,7 +1441,6 @@ static void __meminit zone_init_free_lists(struct zone *zone)
#ifdef CONFIG_UNACCEPTED_MEMORY
INIT_LIST_HEAD(&zone->unaccepted_pages);
- INIT_WORK(&zone->unaccepted_cleanup, unaccepted_cleanup_work);
#endif
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7248e300d36e..8258349e49ac 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7172,16 +7172,8 @@ bool has_managed_dma(void)
#ifdef CONFIG_UNACCEPTED_MEMORY
-/* Counts number of zones with unaccepted pages. */
-static DEFINE_STATIC_KEY_FALSE(zones_with_unaccepted_pages);
-
static bool lazy_accept = true;
-void unaccepted_cleanup_work(struct work_struct *work)
-{
- static_branch_dec(&zones_with_unaccepted_pages);
-}
-
static int __init accept_memory_parse(char *p)
{
if (!strcmp(p, "lazy")) {
@@ -7206,11 +7198,7 @@ static bool page_contains_unaccepted(struct page *page, unsigned int order)
static void __accept_page(struct zone *zone, unsigned long *flags,
struct page *page)
{
- bool last;
-
list_del(&page->lru);
- last = list_empty(&zone->unaccepted_pages);
-
account_freepages(zone, -MAX_ORDER_NR_PAGES, MIGRATE_MOVABLE);
__mod_zone_page_state(zone, NR_UNACCEPTED, -MAX_ORDER_NR_PAGES);
__ClearPageUnaccepted(page);
@@ -7219,28 +7207,6 @@ static void __accept_page(struct zone *zone, unsigned long *flags,
accept_memory(page_to_phys(page), PAGE_SIZE << MAX_PAGE_ORDER);
__free_pages_ok(page, MAX_PAGE_ORDER, FPI_TO_TAIL);
-
- if (last) {
- /*
- * There are two corner cases:
- *
- * - If allocation occurs during the CPU bring up,
- * static_branch_dec() cannot be used directly as
- * it causes a deadlock on cpu_hotplug_lock.
- *
- * Instead, use schedule_work() to prevent deadlock.
- *
- * - If allocation occurs before workqueues are initialized,
- * static_branch_dec() should be called directly.
- *
- * Workqueues are initialized before CPU bring up, so this
- * will not conflict with the first scenario.
- */
- if (system_wq)
- schedule_work(&zone->unaccepted_cleanup);
- else
- unaccepted_cleanup_work(&zone->unaccepted_cleanup);
- }
}
void accept_page(struct page *page)
@@ -7277,20 +7243,12 @@ static bool try_to_accept_memory_one(struct zone *zone)
return true;
}
-static inline bool has_unaccepted_memory(void)
-{
- return static_branch_unlikely(&zones_with_unaccepted_pages);
-}
-
static bool cond_accept_memory(struct zone *zone, unsigned int order,
int alloc_flags)
{
long to_accept, wmark;
bool ret = false;
- if (!has_unaccepted_memory())
- return false;
-
if (list_empty(&zone->unaccepted_pages))
return false;
@@ -7328,22 +7286,17 @@ static bool __free_unaccepted(struct page *page)
{
struct zone *zone = page_zone(page);
unsigned long flags;
- bool first = false;
if (!lazy_accept)
return false;
spin_lock_irqsave(&zone->lock, flags);
- first = list_empty(&zone->unaccepted_pages);
list_add_tail(&page->lru, &zone->unaccepted_pages);
account_freepages(zone, MAX_ORDER_NR_PAGES, MIGRATE_MOVABLE);
__mod_zone_page_state(zone, NR_UNACCEPTED, MAX_ORDER_NR_PAGES);
__SetPageUnaccepted(page);
spin_unlock_irqrestore(&zone->lock, flags);
- if (first)
- static_branch_inc(&zones_with_unaccepted_pages);
-
return true;
}
Hi Greg, Sasha,
This batch contains a backport fix for 5.15 -stable.
The following list shows the backported patches, I am using original commit
IDs for reference:
1) 8965d42bcf54 ("netfilter: nf_tables: pass nft_chain to destroy function, not nft_ctx")
This is a stable dependency for the next patch.
2) c03d278fdf35 ("netfilter: nf_tables: wait for rcu grace period on net_device removal")
3) b04df3da1b5c ("netfilter: nf_tables: do not defer rule destruction via call_rcu")
This is a fix-for-fix for patch 2.
These three patches are required to fix the netdevice release path for
netdev family basechains.
Please, apply,
Thanks
Florian Westphal (2):
netfilter: nf_tables: pass nft_chain to destroy function, not nft_ctx
netfilter: nf_tables: do not defer rule destruction via call_rcu
Pablo Neira Ayuso (1):
netfilter: nf_tables: wait for rcu grace period on net_device removal
include/net/netfilter/nf_tables.h | 2 +-
net/netfilter/nf_tables_api.c | 54 ++++++++++++++++++++++---------
net/netfilter/nft_immediate.c | 2 +-
3 files changed, 41 insertions(+), 17 deletions(-)
--
2.30.2
The irdma_puda_send() calls the irdma_puda_get_next_send_wqe() to get
entries, but does not clear the entries after the function call. A proper
implementation can be found in irdma_uk_send().
Add the irdma_clr_wqes() after irdma_puda_get_next_send_wqe(). Add the
headfile of the irdma_clr_wqes().
Fixes: a3a06db504d3 ("RDMA/irdma: Add privileged UDA queue implementation")
Cc: stable(a)vger.kernel.org # v5.14
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
---
v2: Fix code error and remove improper description.
drivers/infiniband/hw/irdma/puda.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/infiniband/hw/irdma/puda.c b/drivers/infiniband/hw/irdma/puda.c
index 7e3f9bca2c23..f7a826a5bedf 100644
--- a/drivers/infiniband/hw/irdma/puda.c
+++ b/drivers/infiniband/hw/irdma/puda.c
@@ -7,6 +7,7 @@
#include "protos.h"
#include "puda.h"
#include "ws.h"
+#include "user.h"
static void irdma_ieq_receive(struct irdma_sc_vsi *vsi,
struct irdma_puda_buf *buf);
@@ -444,6 +445,8 @@ int irdma_puda_send(struct irdma_sc_qp *qp, struct irdma_puda_send_info *info)
if (!wqe)
return -ENOMEM;
+ irdma_clr_wqes(&qp->qp_uk, wqe_idx);
+
qp->qp_uk.sq_wrtrk_array[wqe_idx].wrid = (uintptr_t)info->scratch;
/* Third line of WQE descriptor */
/* maclen is in words */
--
2.42.0.windows.2