Before this change the LED was added to leds_list before led_init_core()
gets called adding it the list before led_classdev.set_brightness_work gets
initialized.
This leaves a window where led_trigger_register() of a LED's default
trigger will call led_trigger_set() which calls led_set_brightness()
which in turn will end up queueing the *uninitialized*
led_classdev.set_brightness_work.
This race gets hit by the lenovo-thinkpad-t14s EC driver which registers
2 LEDs with a default trigger provided by snd_ctl_led.ko in quick
succession. The first led_classdev_register() causes an async modprobe of
snd_ctl_led to run and that async modprobe manages to exactly hit
the window where the second LED is on the leds_list without led_init_core()
being called for it, resulting in:
------------[ cut here ]------------
WARNING: CPU: 11 PID: 5608 at kernel/workqueue.c:4234 __flush_work+0x344/0x390
Hardware name: LENOVO 21N2S01F0B/21N2S01F0B, BIOS N42ET93W (2.23 ) 09/01/2025
...
Call trace:
__flush_work+0x344/0x390 (P)
flush_work+0x2c/0x50
led_trigger_set+0x1c8/0x340
led_trigger_register+0x17c/0x1c0
led_trigger_register_simple+0x84/0xe8
snd_ctl_led_init+0x40/0xf88 [snd_ctl_led]
do_one_initcall+0x5c/0x318
do_init_module+0x9c/0x2b8
load_module+0x7e0/0x998
Close the race window by moving the adding of the LED to leds_list to
after the led_init_core() call.
Cc: Sebastian Reichel <sre(a)kernel.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <johannes.goede(a)oss.qualcomm.com>
---
Note no Fixes tag as this problem has been around for a long long time,
so I could not really find a good commit for the Fixes tag.
---
drivers/leds/led-class.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/leds/led-class.c b/drivers/leds/led-class.c
index f3faf37f9a08..6b9fa060c3a1 100644
--- a/drivers/leds/led-class.c
+++ b/drivers/leds/led-class.c
@@ -560,11 +560,6 @@ int led_classdev_register_ext(struct device *parent,
#ifdef CONFIG_LEDS_BRIGHTNESS_HW_CHANGED
led_cdev->brightness_hw_changed = -1;
#endif
- /* add to the list of leds */
- down_write(&leds_list_lock);
- list_add_tail(&led_cdev->node, &leds_list);
- up_write(&leds_list_lock);
-
if (!led_cdev->max_brightness)
led_cdev->max_brightness = LED_FULL;
@@ -574,6 +569,11 @@ int led_classdev_register_ext(struct device *parent,
led_init_core(led_cdev);
+ /* add to the list of leds */
+ down_write(&leds_list_lock);
+ list_add_tail(&led_cdev->node, &leds_list);
+ up_write(&leds_list_lock);
+
#ifdef CONFIG_LEDS_TRIGGERS
led_trigger_set_default(led_cdev);
#endif
--
2.52.0
In gs_can_open(), the URBs for USB-in transfers are allocated, added to the
parent->rx_submitted anchor and submitted. In the complete callback
gs_usb_receive_bulk_callback(), the URB is processed and resubmitted. In
gs_can_close() the URBs are freed by calling
usb_kill_anchored_urbs(parent->rx_submitted).
However, this does not take into account that the USB framework
unanchors the URB before the close function is called. This means that
once an in-URB has been completed, it is no longer anchored and is
ultimately not released in gs_can_close().
Fix the memory leak by anchoring the URB in the
gs_usb_receive_bulk_callback() to the parent->rx_submitted anchor.
Fixes: d08e973a77d1 ("can: gs_usb: Added support for the GS_USB CAN devices")
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
Changes in v2:
- correct "Fixes" tag
- add stable(a)v.k.o on Cc
- Link to v1: https://patch.msgid.link/20251225-gs_usb-fix-memory-leak-v1-1-a7b09e70d01d@…
---
drivers/net/can/usb/gs_usb.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/can/usb/gs_usb.c b/drivers/net/can/usb/gs_usb.c
index a0233e550a5a..d093babbc320 100644
--- a/drivers/net/can/usb/gs_usb.c
+++ b/drivers/net/can/usb/gs_usb.c
@@ -751,6 +751,8 @@ static void gs_usb_receive_bulk_callback(struct urb *urb)
hf, parent->hf_size_rx,
gs_usb_receive_bulk_callback, parent);
+ usb_anchor_urb(urb, &parent->rx_submitted);
+
rc = usb_submit_urb(urb, GFP_ATOMIC);
/* USB failure take down all interfaces */
---
base-commit: 1806d210e5a8f431ad4711766ae4a333d407d972
change-id: 20251225-gs_usb-fix-memory-leak-062c24898cc9
Best regards,
--
Marc Kleine-Budde <mkl(a)pengutronix.de>
When both KASAN and SLAB_STORE_USER are enabled, accesses to
struct kasan_alloc_meta fields can be misaligned on 64-bit architectures.
This occurs because orig_size is currently defined as unsigned int,
which only guarantees 4-byte alignment. When struct kasan_alloc_meta is
placed after orig_size, it may end up at a 4-byte boundary rather than
the required 8-byte boundary on 64-bit systems.
Note that 64-bit architectures without HAVE_EFFICIENT_UNALIGNED_ACCESS
are assumed to require 64-bit accesses to be 64-bit aligned.
See HAVE_64BIT_ALIGNED_ACCESS and commit adab66b71abf ("Revert:
"ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"") for more details.
Change orig_size from unsigned int to unsigned long to ensure proper
alignment for any subsequent metadata. This should not waste additional
memory because kmalloc objects are already aligned to at least
ARCH_KMALLOC_MINALIGN.
Suggested-by: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: stable(a)vger.kernel.org
Fixes: 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc")
Signed-off-by: Harry Yoo <harry.yoo(a)oracle.com>
---
mm/slub.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index ad71f01571f0..1c747435a6ab 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -857,7 +857,7 @@ static inline bool slab_update_freelist(struct kmem_cache *s, struct slab *slab,
* request size in the meta data area, for better debug and sanity check.
*/
static inline void set_orig_size(struct kmem_cache *s,
- void *object, unsigned int orig_size)
+ void *object, unsigned long orig_size)
{
void *p = kasan_reset_tag(object);
@@ -867,10 +867,10 @@ static inline void set_orig_size(struct kmem_cache *s,
p += get_info_end(s);
p += sizeof(struct track) * 2;
- *(unsigned int *)p = orig_size;
+ *(unsigned long *)p = orig_size;
}
-static inline unsigned int get_orig_size(struct kmem_cache *s, void *object)
+static inline unsigned long get_orig_size(struct kmem_cache *s, void *object)
{
void *p = kasan_reset_tag(object);
@@ -883,7 +883,7 @@ static inline unsigned int get_orig_size(struct kmem_cache *s, void *object)
p += get_info_end(s);
p += sizeof(struct track) * 2;
- return *(unsigned int *)p;
+ return *(unsigned long *)p;
}
#ifdef CONFIG_SLUB_DEBUG
@@ -1198,7 +1198,7 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
off += 2 * sizeof(struct track);
if (slub_debug_orig_size(s))
- off += sizeof(unsigned int);
+ off += sizeof(unsigned long);
off += kasan_metadata_size(s, false);
@@ -1394,7 +1394,7 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
off += 2 * sizeof(struct track);
if (s->flags & SLAB_KMALLOC)
- off += sizeof(unsigned int);
+ off += sizeof(unsigned long);
}
off += kasan_metadata_size(s, false);
@@ -7949,7 +7949,7 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
/* Save the original kmalloc request size */
if (flags & SLAB_KMALLOC)
- size += sizeof(unsigned int);
+ size += sizeof(unsigned long);
}
#endif
--
2.43.0
Workqueue xe-ggtt-wq has been allocated using WQ_MEM_RECLAIM, but
the flag has been passed as 3rd parameter (max_active) instead
of 2nd (flags) creating the workqueue as per-cpu with max_active = 8
(the WQ_MEM_RECLAIM value).
So change this by set WQ_MEM_RECLAIM as the 2nd parameter with a
default max_active.
Fixes: 60df57e496e4 ("drm/xe: Mark GGTT work queue with WQ_MEM_RECLAIM")
Cc: stable(a)vger.kernel.org
Signed-off-by: Marco Crivellari <marco.crivellari(a)suse.com>
---
drivers/gpu/drm/xe/xe_ggtt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index ef481b334af4..793d7324a395 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -322,7 +322,7 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
else
ggtt->pt_ops = &xelp_pt_ops;
- ggtt->wq = alloc_workqueue("xe-ggtt-wq", 0, WQ_MEM_RECLAIM);
+ ggtt->wq = alloc_workqueue("xe-ggtt-wq", WQ_MEM_RECLAIM, 0);
if (!ggtt->wq)
return -ENOMEM;
--
2.52.0
This series converts all DRM bridge drivers (*) from the now deprecated
of_drm_find_bridge() to its replacement of_drm_find_and_get_bridge() which
allows correct bridge refcounting. It also converts per-driver
"next_bridge" pointers to the unified drm_bridge::next_bridge which puts
the reference automatically on bridge deallocation.
This is part of the work to support hotplug of DRM bridges. The grand plan
was discussed in [0].
Here's the work breakdown (➜ marks the current series):
1. ➜ add refcounting to DRM bridges struct drm_bridge,
based on devm_drm_bridge_alloc()
A. ✔ add new alloc API and refcounting (v6.16)
B. ✔ convert all bridge drivers to new API (v6.17)
C. ✔ kunit tests (v6.17)
D. ✔ add get/put to drm_bridge_add/remove() + attach/detach()
and warn on old allocation pattern (v6.17)
E. ➜ add get/put on drm_bridge accessors
1. ✔ drm_bridge_chain_get_first_bridge(), add cleanup action (v6.18)
2. ✔ drm_bridge_get_prev_bridge() (v6.18)
3. ✔ drm_bridge_get_next_bridge() (v6.19)
4. ✔ drm_for_each_bridge_in_chain() (v6.19)
5. ✔ drm_bridge_connector_init (v6.19)
6. … protect encoder bridge chain with a mutex
7. ➜ of_drm_find_bridge
a. ✔… add of_drm_get_bridge(), convert basic direct users
(v6.20?, one driver still pending)
b. ➜ convert direct of_drm_get_bridge() users, part 2
c. … convert direct of_drm_get_bridge() users, part 3
d. convert direct of_drm_get_bridge() users, part 4
e. convert bridge-only drm_of_find_panel_or_bridge() users
8. drm_of_find_panel_or_bridge, *_of_get_bridge
9. ✔ enforce drm_bridge_add before drm_bridge_attach (v6.19)
F. ✔ debugfs improvements
1. ✔ add top-level 'bridges' file (v6.16)
2. ✔ show refcount and list lingering bridges (v6.19)
2. … handle gracefully atomic updates during bridge removal
A. ✔ Add drm_dev_enter/exit() to protect device resources (v6.20?)
B. … protect private_obj removal from list
3. … DSI host-device driver interaction
4. ✔ removing the need for the "always-disconnected" connector
5. finish the hotplug bridge work, moving code to the core and potentially
removing the hotplug-bridge itself (this needs to be clarified as
points 1-3 are developed)
[0] https://lore.kernel.org/lkml/20250206-hotplug-drm-bridge-v6-0-9d6f2c9c3058@…
This work is a continuation of the work to correctly handle bridge
refcounting for existing of_drm_find_bridge(). The ground work is in:
- commit 293a8fd7721a ("drm/bridge: add of_drm_find_and_get_bridge()")
- commit 9da0e06abda8 ("drm/bridge: deprecate of_drm_find_bridge()")
- commit 3fdeae134ba9 ("drm/bridge: add next_bridge pointer to struct drm_bridge")
The whole conversion is split in multiple series to make the review process
a bit smoother. Parts 3 and 4 are converting non-bridge drivers (mostly
encoders).
(*) One bridge driver (synopsys/dw-hdmi) is converted in another series,
together with its (non-bridge) users. Additionally this series converts
drm_of_panel_bridge_remove() which is a special case, and has a bugfix
for it too.
Signed-off-by: Luca Ceresoli <luca.ceresoli(a)bootlin.com>
---
Changes in v2:
- Fix bug in samsung-dsim code, patch 10;
update patch 12 on top of those changes
- Fix in imx8qxp-ldb code, patch 9
- Link to v1: https://lore.kernel.org/r/20260107-drm-bridge-alloc-getput-drm_of_find_brid…
---
Luca Ceresoli (12):
drm: of: drm_of_panel_bridge_remove(): fix device_node leak
drm: of: drm_of_panel_bridge_remove(): convert to of_drm_find_and_get_bridge()
drm/bridge: sii902x: convert to of_drm_find_and_get_bridge()
drm/bridge: thc63lvd1024: convert to of_drm_find_and_get_bridge()
drm/bridge: tfp410: convert to of_drm_find_and_get_bridge()
drm/bridge: tpd12s015: convert to of_drm_find_and_get_bridge()
drm/bridge: lt8912b: convert to of_drm_find_and_get_bridge()
drm/bridge: imx8mp-hdmi-pvi: convert to of_drm_find_and_get_bridge()
drm/bridge: imx8qxp-ldb: convert to of_drm_find_and_get_bridge()
drm/bridge: samsung-dsim: samsung_dsim_host_attach: use a temporary variable for the next bridge
drm/bridge: samsung-dsim: samsung_dsim_host_attach: don't use the bridge pointer as an error indicator
drm/bridge: samsung-dsim: samsung_dsim_host_attach: convert to of_drm_find_and_get_bridge()
drivers/gpu/drm/bridge/imx/imx8mp-hdmi-pvi.c | 15 ++++++-----
drivers/gpu/drm/bridge/imx/imx8qxp-ldb.c | 12 ++++++++-
drivers/gpu/drm/bridge/lontium-lt8912b.c | 31 +++++++++++------------
drivers/gpu/drm/bridge/samsung-dsim.c | 37 +++++++++++++++++++---------
drivers/gpu/drm/bridge/sii902x.c | 7 +++---
drivers/gpu/drm/bridge/thc63lvd1024.c | 7 +++---
drivers/gpu/drm/bridge/ti-tfp410.c | 27 ++++++++++----------
drivers/gpu/drm/bridge/ti-tpd12s015.c | 8 +++---
include/drm/bridge/samsung-dsim.h | 1 -
include/drm/drm_of.h | 6 ++++-
10 files changed, 86 insertions(+), 65 deletions(-)
---
base-commit: 814b88f91b121246b35be9c519b537041fe3d178
change-id: 20251223-drm-bridge-alloc-getput-drm_of_find_bridge-2-12c6bbcb6896
Best regards,
--
Luca Ceresoli <luca.ceresoli(a)bootlin.com>
xhci_alloc_command() allocates a command structure and, when the
second argument is true, also allocates a completion structure.
Currently, the error handling path in xhci_disable_slot() only frees
the command structure using kfree(), causing the completion structure
to leak.
Use xhci_free_command() instead of kfree(). xhci_free_command() correctly
frees both the command structure and the associated completion structure.
Since the command structure is allocated with zero-initialization,
command->in_ctx is NULL and will not be erroneously freed by
xhci_free_command().
This bug was found using an experimental static analysis tool we are
developing. The tool is based on the LLVM framework and is specifically
designed to detect memory management issues. It is currently under
active development and not yet publicly available, but we plan to
open-source it after our research is published.
The bug was originally detected on v6.13-rc1 using our static analysis
tool, and we have verified that the issue persists in the latest mainline
kernel.
We performed build testing on x86_64 with allyesconfig using GCC=11.4.0.
Since triggering these error paths in xhci_disable_slot() requires specific
hardware conditions or abnormal state, we were unable to construct a test
case to reliably trigger these specific error paths at runtime.
Fixes: 7faac1953ed1 ("xhci: avoid race between disable slot command and host runtime suspend")
CC: stable(a)vger.kernel.org
Signed-off-by: Zilin Guan <zilin(a)seu.edu.cn>
---
Changes in v3:
- Update the commit message to reflect that the analysis applies to the mainline kernel.
Changes in v2:
- Add detailed information required by researcher guidelines.
- Clarify the safety of using xhci_free_command() in this context.
- Correct the Fixes tag to point to the commit that introduced this issue.
drivers/usb/host/xhci.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 02c9bfe21ae2..f0beed054954 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -4137,7 +4137,7 @@ int xhci_disable_slot(struct xhci_hcd *xhci, u32 slot_id)
if (state == 0xffffffff || (xhci->xhc_state & XHCI_STATE_DYING) ||
(xhci->xhc_state & XHCI_STATE_HALTED)) {
spin_unlock_irqrestore(&xhci->lock, flags);
- kfree(command);
+ xhci_free_command(xhci, command);
return -ENODEV;
}
@@ -4145,7 +4145,7 @@ int xhci_disable_slot(struct xhci_hcd *xhci, u32 slot_id)
slot_id);
if (ret) {
spin_unlock_irqrestore(&xhci->lock, flags);
- kfree(command);
+ xhci_free_command(xhci, command);
return ret;
}
xhci_ring_cmd_db(xhci);
--
2.34.1
Syzbot has found a deadlock (analyzed by Lance Yang):
1) Task (5749): Holds folio_lock, then tries to acquire i_mmap_rwsem(read lock).
2) Task (5754): Holds i_mmap_rwsem(write lock), then tries to acquire
folio_lock.
migrate_pages()
-> migrate_hugetlbs()
-> unmap_and_move_huge_page() <- Takes folio_lock!
-> remove_migration_ptes()
-> __rmap_walk_file()
-> i_mmap_lock_read() <- Waits for i_mmap_rwsem(read lock)!
hugetlbfs_fallocate()
-> hugetlbfs_punch_hole() <- Takes i_mmap_rwsem(write lock)!
-> hugetlbfs_zero_partial_page()
-> filemap_lock_hugetlb_folio()
-> filemap_lock_folio()
-> __filemap_get_folio <- Waits for folio_lock!
The migration path is the one taking locks in the wrong order according
to the documentation at the top of mm/rmap.c. So expand the scope of the
existing i_mmap_lock to cover the calls to remove_migration_ptes() too.
This is (mostly) how it used to be after commit c0d0381ade79. That was
removed by 336bf30eb765 for both file & anon hugetlb pages when it should
only have been removed for anon hugetlb pages.
Fixes: 336bf30eb765 (hugetlbfs: fix anon huge page migration race)
Reported-by: syzbot+2d9c96466c978346b55f(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/68e9715a.050a0220.1186a4.000d.GAE@google.com
Debugged-by: Lance Yang <lance.yang(a)linux.dev>
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: stable(a)vger.kernel.org
---
mm/migrate.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 5169f9717f60..4688b9e38cd2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1458,6 +1458,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
int page_was_mapped = 0;
struct anon_vma *anon_vma = NULL;
struct address_space *mapping = NULL;
+ enum ttu_flags ttu = 0;
if (folio_ref_count(src) == 1) {
/* page was freed from under us. So we are done. */
@@ -1498,8 +1499,6 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
goto put_anon;
if (folio_mapped(src)) {
- enum ttu_flags ttu = 0;
-
if (!folio_test_anon(src)) {
/*
* In shared mappings, try_to_unmap could potentially
@@ -1516,16 +1515,17 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
try_to_migrate(src, ttu);
page_was_mapped = 1;
-
- if (ttu & TTU_RMAP_LOCKED)
- i_mmap_unlock_write(mapping);
}
if (!folio_mapped(src))
rc = move_to_new_folio(dst, src, mode);
if (page_was_mapped)
- remove_migration_ptes(src, !rc ? dst : src, 0);
+ remove_migration_ptes(src, !rc ? dst : src,
+ ttu ? RMP_LOCKED : 0);
+
+ if (ttu & TTU_RMAP_LOCKED)
+ i_mmap_unlock_write(mapping);
unlock_put_anon:
folio_unlock(dst);
--
2.47.3
Syzbot has found a deadlock (analyzed by Lance Yang):
1) Task (5749): Holds folio_lock, then tries to acquire i_mmap_rwsem(read lock).
2) Task (5754): Holds i_mmap_rwsem(write lock), then tries to acquire
folio_lock.
migrate_pages()
-> migrate_hugetlbs()
-> unmap_and_move_huge_page() <- Takes folio_lock!
-> remove_migration_ptes()
-> __rmap_walk_file()
-> i_mmap_lock_read() <- Waits for i_mmap_rwsem(read lock)!
hugetlbfs_fallocate()
-> hugetlbfs_punch_hole() <- Takes i_mmap_rwsem(write lock)!
-> hugetlbfs_zero_partial_page()
-> filemap_lock_hugetlb_folio()
-> filemap_lock_folio()
-> __filemap_get_folio <- Waits for folio_lock!
The migration path is the one taking locks in the wrong order according
to the documentation at the top of mm/rmap.c. So expand the scope of the
existing i_mmap_lock to cover the calls to remove_migration_ptes() too.
This is (mostly) how it used to be after commit c0d0381ade79. That was
removed by 336bf30eb765 for both file & anon hugetlb pages when it should
only have been removed for anon hugetlb pages.
Fixes: 336bf30eb765 (hugetlbfs: fix anon huge page migration race)
Reported-by: syzbot+2d9c96466c978346b55f(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/68e9715a.050a0220.1186a4.000d.GAE@google.com
Debugged-by: Lance Yang <lance.yang(a)linux.dev>
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: stable(a)vger.kernel.org
---
mm/migrate.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 5169f9717f60..4688b9e38cd2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1458,6 +1458,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
int page_was_mapped = 0;
struct anon_vma *anon_vma = NULL;
struct address_space *mapping = NULL;
+ enum ttu_flags ttu = 0;
if (folio_ref_count(src) == 1) {
/* page was freed from under us. So we are done. */
@@ -1498,8 +1499,6 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
goto put_anon;
if (folio_mapped(src)) {
- enum ttu_flags ttu = 0;
-
if (!folio_test_anon(src)) {
/*
* In shared mappings, try_to_unmap could potentially
@@ -1516,16 +1515,17 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
try_to_migrate(src, ttu);
page_was_mapped = 1;
-
- if (ttu & TTU_RMAP_LOCKED)
- i_mmap_unlock_write(mapping);
}
if (!folio_mapped(src))
rc = move_to_new_folio(dst, src, mode);
if (page_was_mapped)
- remove_migration_ptes(src, !rc ? dst : src, 0);
+ remove_migration_ptes(src, !rc ? dst : src,
+ ttu ? RMP_LOCKED : 0);
+
+ if (ttu & TTU_RMAP_LOCKED)
+ i_mmap_unlock_write(mapping);
unlock_put_anon:
folio_unlock(dst);
--
2.47.3
xhci_alloc_command() allocates a command structure and, when the
second argument is true, also allocates a completion structure.
Currently, the error handling path in xhci_disable_slot() only frees
the command structure using kfree(), causing the completion structure
to leak.
Use xhci_free_command() instead of kfree(). xhci_free_command() correctly
frees both the command structure and the associated completion structure.
Since the command structure is allocated with zero-initialization,
command->in_ctx is NULL and will not be erroneously freed by
xhci_free_command().
This bug was found using an experimental static analysis tool we are
developing. The tool is based on the LLVM framework and is specifically
designed to detect memory management issues. It is currently under
active development and not yet publicly available, but we plan to
open-source it after our research is published.
The analysis was performed on Linux kernel v6.13-rc1.
We performed build testing on x86_64 with allyesconfig using GCC=11.4.0.
Since triggering these error paths in xhci_disable_slot() requires specific
hardware conditions or abnormal state, we were unable to construct a test
case to reliably trigger these specific error paths at runtime.
Fixes: 7faac1953ed1 ("xhci: avoid race between disable slot command and host runtime suspend")
CC: stable(a)vger.kernel.org
Signed-off-by: Zilin Guan <zilin(a)seu.edu.cn>
---
Changes in v2:
- Add detailed information required by researcher guidelines.
- Clarify the safety of using xhci_free_command() in this context.
- Correct the Fixes tag to point to the commit that introduced this issue.
drivers/usb/host/xhci.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 02c9bfe21ae2..f0beed054954 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -4137,7 +4137,7 @@ int xhci_disable_slot(struct xhci_hcd *xhci, u32 slot_id)
if (state == 0xffffffff || (xhci->xhc_state & XHCI_STATE_DYING) ||
(xhci->xhc_state & XHCI_STATE_HALTED)) {
spin_unlock_irqrestore(&xhci->lock, flags);
- kfree(command);
+ xhci_free_command(xhci, command);
return -ENODEV;
}
@@ -4145,7 +4145,7 @@ int xhci_disable_slot(struct xhci_hcd *xhci, u32 slot_id)
slot_id);
if (ret) {
spin_unlock_irqrestore(&xhci->lock, flags);
- kfree(command);
+ xhci_free_command(xhci, command);
return ret;
}
xhci_ring_cmd_db(xhci);
--
2.34.1
When multiple registered buffers share the same compound page, only the
first buffer accounts for the memory via io_buffer_account_pin(). The
subsequent buffers skip accounting since headpage_already_acct() returns
true.
When the first buffer is unregistered, the accounting is decremented,
but the compound page remains pinned by the remaining buffers. This
creates a state where pinned memory is not properly accounted against
RLIMIT_MEMLOCK.
On systems with HugeTLB pages pre-allocated, an unprivileged user can
exploit this to pin memory beyond RLIMIT_MEMLOCK by cycling buffer
registrations. The bypass amount is proportional to the number of
available huge pages, potentially allowing gigabytes of memory to be
pinned while the kernel accounting shows near-zero.
Fix this by recalculating the actual pages to unaccount when unmapping
a buffer. For regular pages, always unaccount. For compound pages, only
unaccount if no other registered buffer references the same compound
page. This ensures the accounting persists until the last buffer
referencing the compound page is released.
Reported-by: Yuhao Jiang <danisjiang(a)gmail.com>
Fixes: 57bebf807e2a ("io_uring/rsrc: optimise registered huge pages")
Cc: stable(a)vger.kernel.org
Signed-off-by: Yuhao Jiang <danisjiang(a)gmail.com>
---
io_uring/rsrc.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 67 insertions(+), 2 deletions(-)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index a63474b331bf..dcf2340af5a2 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -139,15 +139,80 @@ static void io_free_imu(struct io_ring_ctx *ctx, struct io_mapped_ubuf *imu)
kvfree(imu);
}
+/*
+ * Calculate pages to unaccount when unmapping a buffer. Regular pages are
+ * always counted. Compound pages are only counted if no other registered
+ * buffer references them, ensuring accounting persists until the last user.
+ */
+static unsigned long io_buffer_calc_unaccount(struct io_ring_ctx *ctx,
+ struct io_mapped_ubuf *imu)
+{
+ struct page *last_hpage = NULL;
+ unsigned long acct = 0;
+ unsigned int i;
+
+ for (i = 0; i < imu->nr_bvecs; i++) {
+ struct page *page = imu->bvec[i].bv_page;
+ struct page *hpage;
+ unsigned int j;
+
+ if (!PageCompound(page)) {
+ acct++;
+ continue;
+ }
+
+ hpage = compound_head(page);
+ if (hpage == last_hpage)
+ continue;
+ last_hpage = hpage;
+
+ /* Check if we already processed this hpage earlier in this buffer */
+ for (j = 0; j < i; j++) {
+ if (PageCompound(imu->bvec[j].bv_page) &&
+ compound_head(imu->bvec[j].bv_page) == hpage)
+ goto next_hpage;
+ }
+
+ /* Only unaccount if no other buffer references this page */
+ for (j = 0; j < ctx->buf_table.nr; j++) {
+ struct io_rsrc_node *node = ctx->buf_table.nodes[j];
+ struct io_mapped_ubuf *other;
+ unsigned int k;
+
+ if (!node)
+ continue;
+ other = node->buf;
+ if (other == imu)
+ continue;
+
+ for (k = 0; k < other->nr_bvecs; k++) {
+ struct page *op = other->bvec[k].bv_page;
+
+ if (!PageCompound(op))
+ continue;
+ if (compound_head(op) == hpage)
+ goto next_hpage;
+ }
+ }
+ acct += page_size(hpage) >> PAGE_SHIFT;
+next_hpage:
+ ;
+ }
+ return acct;
+}
+
static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_mapped_ubuf *imu)
{
+ unsigned long acct;
+
if (unlikely(refcount_read(&imu->refs) > 1)) {
if (!refcount_dec_and_test(&imu->refs))
return;
}
- if (imu->acct_pages)
- io_unaccount_mem(ctx->user, ctx->mm_account, imu->acct_pages);
+ acct = io_buffer_calc_unaccount(ctx, imu);
+ if (acct)
+ io_unaccount_mem(ctx->user, ctx->mm_account, acct);
imu->release(imu->priv);
io_free_imu(ctx, imu);
}
--
2.34.1
V1 -> V2:
- Because `pmd_val` variable broke ppc builds due to its name,
renamed it to `_pmd`. see [1].
[1] https://lore.kernel.org/stable/aS7lPZPYuChOTdXU@hyeyoo
- Added David Hildenbrand's Acked-by [2], thanks a lot!
[2] https://lore.kernel.org/linux-mm/ac8d7137-3819-4a75-9dd3-fb3d2259ebe4@kerne…
# TL;DR
previous discussion: https://lore.kernel.org/linux-mm/20250921232709.1608699-1-harry.yoo@oracle.…
A "bad pmd" error occurs due to race condition between
change_prot_numa() and THP migration. The mainline kernel does not have
this bug as commit 670ddd8cdc fixes the race condition. 6.1.y, 5.15.y,
5.10.y, 5.4.y are affected by this bug.
Fixing this in -stable kernels is tricky because pte_map_offset_lock()
has different semantics in pre-6.5 and post-6.5 kernels. I am trying to
backport the same mechanism we have in the mainline kernel.
Since the code looks bit different due to different semantics of
pte_map_offset_lock(), it'd be best to get this reviewed by MM folks.
# Testing
I verified that the bug described below is not reproduced anymore
(on a downstream kernel) after applying this patch series. It used to
trigger in few days of intensive numa balancing testing, but it survived
2 weeks with this applied.
# Bug Description
It was reported that a bad pmd is seen when automatic NUMA
balancing is marking page table entries as prot_numa:
[2437548.196018] mm/pgtable-generic.c:50: bad pmd 00000000af22fc02(dffffffe71fbfe02)
[2437548.235022] Call Trace:
[2437548.238234] <TASK>
[2437548.241060] dump_stack_lvl+0x46/0x61
[2437548.245689] panic+0x106/0x2e5
[2437548.249497] pmd_clear_bad+0x3c/0x3c
[2437548.253967] change_pmd_range.isra.0+0x34d/0x3a7
[2437548.259537] change_p4d_range+0x156/0x20e
[2437548.264392] change_protection_range+0x116/0x1a9
[2437548.269976] change_prot_numa+0x15/0x37
[2437548.274774] task_numa_work+0x1b8/0x302
[2437548.279512] task_work_run+0x62/0x95
[2437548.283882] exit_to_user_mode_loop+0x1a4/0x1a9
[2437548.289277] exit_to_user_mode_prepare+0xf4/0xfc
[2437548.294751] ? sysvec_apic_timer_interrupt+0x34/0x81
[2437548.300677] irqentry_exit_to_user_mode+0x5/0x25
[2437548.306153] asm_sysvec_apic_timer_interrupt+0x16/0x1b
This is due to a race condition between change_prot_numa() and
THP migration because the kernel doesn't check is_swap_pmd() and
pmd_trans_huge() atomically:
change_prot_numa() THP migration
======================================================================
- change_pmd_range()
-> is_swap_pmd() returns false,
meaning it's not a PMD migration
entry.
- do_huge_pmd_numa_page()
-> migrate_misplaced_page() sets
migration entries for the THP.
- change_pmd_range()
-> pmd_none_or_clear_bad_unless_trans_huge()
-> pmd_none() and pmd_trans_huge() returns false
- pmd_none_or_clear_bad_unless_trans_huge()
-> pmd_bad() returns true for the migration entry!
The upstream commit 670ddd8cdcbd ("mm/mprotect: delete
pmd_none_or_clear_bad_unless_trans_huge()") closes this race condition
by checking is_swap_pmd() and pmd_trans_huge() atomically.
# Backporting note
commit a79390f5d6a7 ("mm/mprotect: use long for page accountings and retval")
is backported to return an error code (negative value) in
change_pte_range().
Unlike the mainline, pte_offset_map_lock() does not check if the pmd
entry is a migration entry or a hugepage; acquires PTL unconditionally
instead of returning failure. Therefore, it is necessary to keep the
!is_swap_pmd() && !pmd_trans_huge() && !pmd_devmap() checks in
change_pmd_range() before acquiring the PTL.
After acquiring the lock, open-code the semantics of
pte_offset_map_lock() in the mainline kernel; change_pte_range() fails
if the pmd value has changed. This requires adding pmd_old parameter
(pmd_t value that is read before calling the function) to
change_pte_range().
Hugh Dickins (1):
mm/mprotect: delete pmd_none_or_clear_bad_unless_trans_huge()
Peter Xu (1):
mm/mprotect: use long for page accountings and retval
include/linux/hugetlb.h | 4 +-
include/linux/mm.h | 2 +-
mm/hugetlb.c | 4 +-
mm/mempolicy.c | 2 +-
mm/mprotect.c | 124 +++++++++++++++++-----------------------
5 files changed, 60 insertions(+), 76 deletions(-)
--
2.43.0
Now that the upstream code has been getting broader test coverage by our
users we occasionally see issues with USB2 devices plugged in during boot.
Before Linux is running, the USB2 PHY has usually been running in device
mode and it turns out that sometimes host->device or device->host
transitions don't work.
The root cause: If the role inside the USB2 PHY is re-configured when it
has already been powered on or when dwc2 has already enabled the ULPI
interface the new configuration sometimes doesn't take affect until dwc3
is reset again. Fix this rare issue by configuring the role much earlier.
Note that the USB3 PHY does not suffer from this issue and actually
requires dwc3 to be up before the correct role can be configured there.
Reported-by: James Calligeros <jcalligeros99(a)gmail.com>
Reported-by: Janne Grunau <j(a)jannau.net>
Fixes: 0ec946d32ef7 ("usb: dwc3: Add Apple Silicon DWC3 glue layer driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sven Peter <sven(a)kernel.org>
---
drivers/usb/dwc3/dwc3-apple.c | 48 +++++++++++++++++++++++++++++--------------
1 file changed, 33 insertions(+), 15 deletions(-)
diff --git a/drivers/usb/dwc3/dwc3-apple.c b/drivers/usb/dwc3/dwc3-apple.c
index cc47cad232e397ac4498b09165dfdb5bd215ded7..c2ae8eb21d514e5e493d2927bc12908c308dfe19 100644
--- a/drivers/usb/dwc3/dwc3-apple.c
+++ b/drivers/usb/dwc3/dwc3-apple.c
@@ -218,25 +218,31 @@ static int dwc3_apple_core_init(struct dwc3_apple *appledwc)
return ret;
}
-static void dwc3_apple_phy_set_mode(struct dwc3_apple *appledwc, enum phy_mode mode)
-{
- lockdep_assert_held(&appledwc->lock);
-
- /*
- * This platform requires SUSPHY to be enabled here already in order to properly configure
- * the PHY and switch dwc3's PIPE interface to USB3 PHY.
- */
- dwc3_enable_susphy(&appledwc->dwc, true);
- phy_set_mode(appledwc->dwc.usb2_generic_phy[0], mode);
- phy_set_mode(appledwc->dwc.usb3_generic_phy[0], mode);
-}
-
static int dwc3_apple_init(struct dwc3_apple *appledwc, enum dwc3_apple_state state)
{
int ret, ret_reset;
lockdep_assert_held(&appledwc->lock);
+ /*
+ * The USB2 PHY on this platform must be configured for host or device mode while it is
+ * still powered off and before dwc3 tries to access it. Otherwise, the new configuration
+ * will sometimes only take affect after the *next* time dwc3 is brought up which causes
+ * the connected device to just not work.
+ * The USB3 PHY must be configured later after dwc3 has already been initialized.
+ */
+ switch (state) {
+ case DWC3_APPLE_HOST:
+ phy_set_mode(appledwc->dwc.usb2_generic_phy[0], PHY_MODE_USB_HOST);
+ break;
+ case DWC3_APPLE_DEVICE:
+ phy_set_mode(appledwc->dwc.usb2_generic_phy[0], PHY_MODE_USB_DEVICE);
+ break;
+ default:
+ /* Unreachable unless there's a bug in this driver */
+ return -EINVAL;
+ }
+
ret = reset_control_deassert(appledwc->reset);
if (ret) {
dev_err(appledwc->dev, "Failed to deassert reset, err=%d\n", ret);
@@ -257,7 +263,13 @@ static int dwc3_apple_init(struct dwc3_apple *appledwc, enum dwc3_apple_state st
case DWC3_APPLE_HOST:
appledwc->dwc.dr_mode = USB_DR_MODE_HOST;
dwc3_apple_set_ptrcap(appledwc, DWC3_GCTL_PRTCAP_HOST);
- dwc3_apple_phy_set_mode(appledwc, PHY_MODE_USB_HOST);
+ /*
+ * This platform requires SUSPHY to be enabled here already in order to properly
+ * configure the PHY and switch dwc3's PIPE interface to USB3 PHY. The USB2 PHY
+ * has already been configured to the correct mode earlier.
+ */
+ dwc3_enable_susphy(&appledwc->dwc, true);
+ phy_set_mode(appledwc->dwc.usb3_generic_phy[0], PHY_MODE_USB_HOST);
ret = dwc3_host_init(&appledwc->dwc);
if (ret) {
dev_err(appledwc->dev, "Failed to initialize host, ret=%d\n", ret);
@@ -268,7 +280,13 @@ static int dwc3_apple_init(struct dwc3_apple *appledwc, enum dwc3_apple_state st
case DWC3_APPLE_DEVICE:
appledwc->dwc.dr_mode = USB_DR_MODE_PERIPHERAL;
dwc3_apple_set_ptrcap(appledwc, DWC3_GCTL_PRTCAP_DEVICE);
- dwc3_apple_phy_set_mode(appledwc, PHY_MODE_USB_DEVICE);
+ /*
+ * This platform requires SUSPHY to be enabled here already in order to properly
+ * configure the PHY and switch dwc3's PIPE interface to USB3 PHY. The USB2 PHY
+ * has already been configured to the correct mode earlier.
+ */
+ dwc3_enable_susphy(&appledwc->dwc, true);
+ phy_set_mode(appledwc->dwc.usb3_generic_phy[0], PHY_MODE_USB_DEVICE);
ret = dwc3_gadget_init(&appledwc->dwc);
if (ret) {
dev_err(appledwc->dev, "Failed to initialize gadget, ret=%d\n", ret);
---
base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
change-id: 20260108-dwc3-apple-usb2phy-fix-cf1d26018dd0
Best regards,
--
Sven Peter <sven(a)kernel.org>
The arm64 kernel doesn't boot with annotated branches
(PROFILE_ANNOTATED_BRANCHES) enabled and CONFIG_DEBUG_VIRTUAL together.
Bisecting it, I found that disabling branch profiling in arch/arm64/mm
solved the problem. Narrowing down a bit further, I found that
physaddr.c is the file that needs to have branch profiling disabled to
get the machine to boot.
I suspect that it might invoke some ftrace helper very early in the boot
process and ftrace is still not enabled(!?).
Rather than playing whack-a-mole with individual files, disable branch
profiling for the entire arch/arm64 tree, similar to what x86 already
does in arch/x86/Kbuild.
Cc: stable(a)vger.kernel.org
Fixes: ec6d06efb0bac ("arm64: Add support for CONFIG_DEBUG_VIRTUAL")
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v2:
- Expand the scope to arch/arm64 instead of just physaddr.c
- Link to v1: https://lore.kernel.org/all/20251231-annotated-v1-1-9db1c0d03062@debian.org/
---
arch/arm64/Kbuild | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/arm64/Kbuild b/arch/arm64/Kbuild
index 5bfbf7d79c99..d876bc0e5421 100644
--- a/arch/arm64/Kbuild
+++ b/arch/arm64/Kbuild
@@ -1,4 +1,8 @@
# SPDX-License-Identifier: GPL-2.0-only
+
+# Branch profiling isn't noinstr-safe
+subdir-ccflags-$(CONFIG_TRACE_BRANCH_PROFILING) += -DDISABLE_BRANCH_PROFILING
+
obj-y += kernel/ mm/ net/
obj-$(CONFIG_KVM) += kvm/
obj-$(CONFIG_XEN) += xen/
---
base-commit: c8ebd433459bcbf068682b09544e830acd7ed222
change-id: 20251231-annotated-75de3f33cd7b
Best regards,
--
Breno Leitao <leitao(a)debian.org>
Commit 7f9ab862e05c ("leds: spi-byte: Call of_node_put() on error path")
was merged in 6.11 and then backported to stable trees through 5.10. It
relocates the line that initializes the variable 'child' to a later
point in spi_byte_probe().
Versions < 6.9 do not have commit ccc35ff2fd29 ("leds: spi-byte: Use
devm_led_classdev_register_ext()"), which removes a line that reads a
property from 'child' before its new initialization point. Consequently,
spi_byte_probe() reads from an uninitialized device node in stable
kernels 6.6-5.10.
Initialize 'child' before it is first accessed.
Fixes: 7f9ab862e05c ("leds: spi-byte: Call of_node_put() on error path")
Signed-off-by: Tiffany Yang <ynaffit(a)google.com>
---
As an alternative to moving the initialization of 'child' up,
Fedor Pchelkin proposed [1] backporting the change that removes the
intermediate access.
[1] https://lore.kernel.org/stable/20241029204128.527033-1-pchelkin@ispras.ru/
---
drivers/leds/leds-spi-byte.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/leds/leds-spi-byte.c b/drivers/leds/leds-spi-byte.c
index afe9bff7c7c1..4520df1e2341 100644
--- a/drivers/leds/leds-spi-byte.c
+++ b/drivers/leds/leds-spi-byte.c
@@ -96,6 +96,7 @@ static int spi_byte_probe(struct spi_device *spi)
if (!led)
return -ENOMEM;
+ child = of_get_next_available_child(dev_of_node(dev), NULL);
of_property_read_string(child, "label", &name);
strscpy(led->name, name, sizeof(led->name));
led->spi = spi;
@@ -106,7 +107,6 @@ static int spi_byte_probe(struct spi_device *spi)
led->ldev.max_brightness = led->cdef->max_value - led->cdef->off_value;
led->ldev.brightness_set_blocking = spi_byte_brightness_set_blocking;
- child = of_get_next_available_child(dev_of_node(dev), NULL);
state = of_get_property(child, "default-state", NULL);
if (state) {
if (!strcmp(state, "on")) {
--
2.52.0.351.gbe84eed79e-goog
This series contains 3 small pmgr related fixes for Apple silicon
devices with M1 and M2.
1. Prevent the display controller and DPTX phy from powered down after
initial boot on the M2 Mac mini. This is the only fix worthwhile for
stable kernels. Given how long it has been broken and that it's not a
regression I think it can wait to the next merge window and get
backported from there into stable kernels.
2. Mark ps_atc?_usb_aon as always-on. This is required to keep the soon
to be suported USB type-c working across suspend an resume. The later
submitted devicetrees for M1 Pro/Max/Ultra, M2 and M2 Pro/Max/Ultra
already have this property since the initial change. Only the
separate for M1 was forgotten.
3. Model the hidden dependency between GPU and pmp as power-domain
dependency. This is required to avoid crashing the GPU firmware
immediately after booting.
Signed-off-by: Janne Grunau <j(a)jannau.net>
---
Hector Martin (1):
arm64: dts: apple: t8103: Mark ATC USB AON domains as always-on
Janne Grunau (2):
arm64: dts: apple: t8112-j473: Keep the HDMI port powered on
arm64: dts: apple: t8103: Add ps_pmp dependency to ps_gfx
arch/arm64/boot/dts/apple/t8103-pmgr.dtsi | 3 +++
arch/arm64/boot/dts/apple/t8112-j473.dts | 19 +++++++++++++++++++
2 files changed, 22 insertions(+)
---
base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
change-id: 20260108-apple-dt-pmgr-fixes-dc66a8658aed
Best regards,
--
Janne Grunau <j(a)jannau.net>
From: Ankit Garg <nktgrg(a)google.com>
This series fixes a kernel panic in the GVE driver caused by
out-of-bounds array access when the network stack provides an invalid
TX queue index.
The issue impacts both GQI and DQO queue formats. For both cases, the
driver is updated to validate the queue index and drop the packet if
the index is out of range.
Ankit Garg (2):
gve: drop packets on invalid queue indices in GQI TX path
gve: drop packets on invalid queue indices in DQO TX path
drivers/net/ethernet/google/gve/gve_tx.c | 12 +++++++++---
drivers/net/ethernet/google/gve/gve_tx_dqo.c | 9 ++++++++-
2 files changed, 17 insertions(+), 4 deletions(-)
--
2.52.0.351.gbe84eed79e-goog
The patch titled
Subject: mm/vmscan: fix demotion targets checks in reclaim/demotion
has been added to the -mm mm-new branch. Its filename is
mm-vmscan-fix-demotion-targets-checks-in-reclaim-demotion.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
The mm-new branch of mm.git is not included in linux-next
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: Bing Jiao <bingjiao(a)google.com>
Subject: mm/vmscan: fix demotion targets checks in reclaim/demotion
Date: Thu, 8 Jan 2026 03:32:46 +0000
Fix two bugs in demote_folio_list() and can_demote() due to incorrect
demotion target checks in reclaim/demotion.
Commit 7d709f49babc ("vmscan,cgroup: apply mems_effective to reclaim")
introduces the cpuset.mems_effective check and applies it to can_demote().
However:
1. It does not apply this check in demote_folio_list(), which leads
to situations where pages are demoted to nodes that are
explicitly excluded from the task's cpuset.mems.
2. It checks only the nodes in the immediate next demotion hierarchy
and does not check all allowed demotion targets in can_demote().
This can cause pages to never be demoted if the nodes in the next
demotion hierarchy are not set in mems_effective.
These bugs break resource isolation provided by cpuset.mems. This is
visible from userspace because pages can either fail to be demoted
entirely or are demoted to nodes that are not allowed in multi-tier memory
systems.
To address these bugs, update cpuset_node_allowed() to return
effective_mems and mem_cgroup_node_allowed() to filter out nodes that are
not set in effective_mems. Also update can_demote() and
demote_folio_list() accordingly.
Bug 1 reproduction:
Assume a system with 4 nodes, where nodes 0-1 are top-tier and
nodes 2-3 are far-tier memory. All nodes have equal capacity.
Test script:
echo 1 > /sys/kernel/mm/numa/demotion_enabled
mkdir /sys/fs/cgroup/test
echo +cpuset > /sys/fs/cgroup/cgroup.subtree_control
echo "0-2" > /sys/fs/cgroup/test/cpuset.mems
echo $$ > /sys/fs/cgroup/test/cgroup.procs
swapoff -a
# Expectation: Should respect node 0-2 limit.
# Observation: Node 3 shows significant allocation (MemFree drops)
stress-ng --oomable --vm 1 --vm-bytes 150% --mbind 0,1
Bug 2 reproduction:
Assume a system with 6 nodes, where nodes 0-2 are top-tier,
node 3 is a far-tier node, and nodes 4-5 are the farthest-tier nodes.
All nodes have equal capacity.
Test script:
echo 1 > /sys/kernel/mm/numa/demotion_enabled
mkdir /sys/fs/cgroup/test
echo +cpuset > /sys/fs/cgroup/cgroup.subtree_control
echo "0-2,4-5" > /sys/fs/cgroup/test/cpuset.mems
echo $$ > /sys/fs/cgroup/test/cgroup.procs
swapoff -a
# Expectation: Pages are demoted to Nodes 4-5
# Observation: No pages are demoted before oom.
stress-ng --oomable --vm 1 --vm-bytes 150% --mbind 0,1,2
Link: https://lkml.kernel.org/r/20260108033248.2791579-2-bingjiao@google.com
Fixes: 7d709f49babc ("vmscan,cgroup: apply mems_effective to reclaim")
Signed-off-by: Bing Jiao <bingjiao(a)google.com>
Cc: Akinobu Mita <akinobu.mita(a)gmail.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: David Hildenbrand <david(a)kernel.org>
Cc: Gregory Price <gourry(a)gourry.net>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Joshua Hahn <joshua.hahnjy(a)gmail.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Michal Koutn�� <mkoutny(a)suse.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Waiman Long <longman(a)redhat.com>
Cc: Wei Xu <weixugc(a)google.com>
Cc: Yuanchu Xie <yuanchu(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/cpuset.h | 6 +--
include/linux/memcontrol.h | 6 +--
kernel/cgroup/cpuset.c | 54 +++++++++++++++++++++++------------
mm/memcontrol.c | 16 +++++++++-
mm/vmscan.c | 28 ++++++++++++------
5 files changed, 75 insertions(+), 35 deletions(-)
--- a/include/linux/cpuset.h~mm-vmscan-fix-demotion-targets-checks-in-reclaim-demotion
+++ a/include/linux/cpuset.h
@@ -174,7 +174,7 @@ static inline void set_mems_allowed(node
task_unlock(current);
}
-extern bool cpuset_node_allowed(struct cgroup *cgroup, int nid);
+extern void cpuset_nodes_allowed(struct cgroup *cgroup, nodemask_t *mask);
#else /* !CONFIG_CPUSETS */
static inline bool cpusets_enabled(void) { return false; }
@@ -301,9 +301,9 @@ static inline bool read_mems_allowed_ret
return false;
}
-static inline bool cpuset_node_allowed(struct cgroup *cgroup, int nid)
+static inline void cpuset_nodes_allowed(struct cgroup *cgroup, nodemask_t *mask)
{
- return true;
+ nodes_copy(*mask, node_states[N_MEMORY]);
}
#endif /* !CONFIG_CPUSETS */
--- a/include/linux/memcontrol.h~mm-vmscan-fix-demotion-targets-checks-in-reclaim-demotion
+++ a/include/linux/memcontrol.h
@@ -1736,7 +1736,7 @@ static inline void count_objcg_events(st
rcu_read_unlock();
}
-bool mem_cgroup_node_allowed(struct mem_cgroup *memcg, int nid);
+void mem_cgroup_node_filter_allowed(struct mem_cgroup *memcg, nodemask_t *mask);
void mem_cgroup_show_protected_memory(struct mem_cgroup *memcg);
@@ -1807,9 +1807,9 @@ static inline ino_t page_cgroup_ino(stru
return 0;
}
-static inline bool mem_cgroup_node_allowed(struct mem_cgroup *memcg, int nid)
+static inline void mem_cgroup_node_filter_allowed(struct mem_cgroup *memcg,
+ nodemask_t *mask)
{
- return true;
}
static inline void mem_cgroup_show_protected_memory(struct mem_cgroup *memcg)
--- a/kernel/cgroup/cpuset.c~mm-vmscan-fix-demotion-targets-checks-in-reclaim-demotion
+++ a/kernel/cgroup/cpuset.c
@@ -4416,40 +4416,58 @@ bool cpuset_current_node_allowed(int nod
return allowed;
}
-bool cpuset_node_allowed(struct cgroup *cgroup, int nid)
+/**
+ * cpuset_nodes_allowed - return effective_mems mask from a cgroup cpuset.
+ * @cgroup: pointer to struct cgroup.
+ * @mask: pointer to struct nodemask_t to be returned.
+ *
+ * Returns effective_mems mask from a cgroup cpuset if it is cgroup v2 and
+ * has cpuset subsys. Otherwise, returns node_states[N_MEMORY].
+ *
+ * This function intentionally avoids taking the cpuset_mutex or callback_lock
+ * when accessing effective_mems. This is because the obtained effective_mems
+ * is stale immediately after the query anyway (e.g., effective_mems is updated
+ * immediately after releasing the lock but before returning).
+ *
+ * As a result, returned @mask may be empty because cs->effective_mems can be
+ * rebound during this call. Besides, nodes in @mask are not guaranteed to be
+ * online due to hot plugins. Callers should check the mask for validity on
+ * return based on its subsequent use.
+ **/
+void cpuset_nodes_allowed(struct cgroup *cgroup, nodemask_t *mask)
{
struct cgroup_subsys_state *css;
struct cpuset *cs;
- bool allowed;
/*
* In v1, mem_cgroup and cpuset are unlikely in the same hierarchy
* and mems_allowed is likely to be empty even if we could get to it,
- * so return true to avoid taking a global lock on the empty check.
+ * so return directly to avoid taking a global lock on the empty check.
*/
- if (!cpuset_v2())
- return true;
+ if (!cgroup || !cpuset_v2()) {
+ nodes_copy(*mask, node_states[N_MEMORY]);
+ return;
+ }
css = cgroup_get_e_css(cgroup, &cpuset_cgrp_subsys);
- if (!css)
- return true;
+ if (!css) {
+ nodes_copy(*mask, node_states[N_MEMORY]);
+ return;
+ }
/*
- * Normally, accessing effective_mems would require the cpuset_mutex
- * or callback_lock - but node_isset is atomic and the reference
- * taken via cgroup_get_e_css is sufficient to protect css.
- *
- * Since this interface is intended for use by migration paths, we
- * relax locking here to avoid taking global locks - while accepting
- * there may be rare scenarios where the result may be innaccurate.
+ * The reference taken via cgroup_get_e_css is sufficient to
+ * protect css, but it does not imply safe accesses to effective_mems.
*
- * Reclaim and migration are subject to these same race conditions, and
- * cannot make strong isolation guarantees, so this is acceptable.
+ * Normally, accessing effective_mems would require the cpuset_mutex
+ * or callback_lock - but the correctness of this information is stale
+ * immediately after the query anyway. We do not acquire the lock
+ * during this process to save lock contention in exchange for racing
+ * against mems_allowed rebinds.
*/
cs = container_of(css, struct cpuset, css);
- allowed = node_isset(nid, cs->effective_mems);
+ nodes_copy(*mask, cs->effective_mems);
css_put(css);
- return allowed;
}
/**
--- a/mm/memcontrol.c~mm-vmscan-fix-demotion-targets-checks-in-reclaim-demotion
+++ a/mm/memcontrol.c
@@ -5593,9 +5593,21 @@ subsys_initcall(mem_cgroup_swap_init);
#endif /* CONFIG_SWAP */
-bool mem_cgroup_node_allowed(struct mem_cgroup *memcg, int nid)
+void mem_cgroup_node_filter_allowed(struct mem_cgroup *memcg, nodemask_t *mask)
{
- return memcg ? cpuset_node_allowed(memcg->css.cgroup, nid) : true;
+ nodemask_t allowed;
+
+ if (!memcg)
+ return;
+
+ /*
+ * Since this interface is intended for use by migration paths, and
+ * reclaim and migration are subject to race conditions such as changes
+ * in effective_mems and hot-unpluging of nodes, inaccurate allowed
+ * mask is acceptable.
+ */
+ cpuset_nodes_allowed(memcg->css.cgroup, &allowed);
+ nodes_and(*mask, *mask, allowed);
}
void mem_cgroup_show_protected_memory(struct mem_cgroup *memcg)
--- a/mm/vmscan.c~mm-vmscan-fix-demotion-targets-checks-in-reclaim-demotion
+++ a/mm/vmscan.c
@@ -344,19 +344,21 @@ static void flush_reclaim_state(struct s
static bool can_demote(int nid, struct scan_control *sc,
struct mem_cgroup *memcg)
{
- int demotion_nid;
+ struct pglist_data *pgdat = NODE_DATA(nid);
+ nodemask_t allowed_mask;
- if (!numa_demotion_enabled)
+ if (!pgdat || !numa_demotion_enabled)
return false;
if (sc && sc->no_demotion)
return false;
- demotion_nid = next_demotion_node(nid);
- if (demotion_nid == NUMA_NO_NODE)
+ node_get_allowed_targets(pgdat, &allowed_mask);
+ if (nodes_empty(allowed_mask))
return false;
- /* If demotion node isn't in the cgroup's mems_allowed, fall back */
- return mem_cgroup_node_allowed(memcg, demotion_nid);
+ /* Filter out nodes that are not in cgroup's mems_allowed. */
+ mem_cgroup_node_filter_allowed(memcg, &allowed_mask);
+ return !nodes_empty(allowed_mask);
}
static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
@@ -1018,7 +1020,8 @@ static struct folio *alloc_demote_folio(
* Folios which are not demoted are left on @demote_folios.
*/
static unsigned int demote_folio_list(struct list_head *demote_folios,
- struct pglist_data *pgdat)
+ struct pglist_data *pgdat,
+ struct mem_cgroup *memcg)
{
int target_nid = next_demotion_node(pgdat->node_id);
unsigned int nr_succeeded;
@@ -1032,7 +1035,6 @@ static unsigned int demote_folio_list(st
*/
.gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) |
__GFP_NOMEMALLOC | GFP_NOWAIT,
- .nid = target_nid,
.nmask = &allowed_mask,
.reason = MR_DEMOTION,
};
@@ -1041,9 +1043,17 @@ static unsigned int demote_folio_list(st
return 0;
if (target_nid == NUMA_NO_NODE)
+ /* No lower-tier nodes or nodes were hot-unplugged. */
return 0;
node_get_allowed_targets(pgdat, &allowed_mask);
+ mem_cgroup_node_filter_allowed(memcg, &allowed_mask);
+ if (nodes_empty(allowed_mask))
+ return 0;
+
+ if (!node_isset(target_nid, allowed_mask))
+ target_nid = node_random(&allowed_mask);
+ mtc.nid = target_nid;
/* Demotion ignores all cpuset and mempolicy settings */
migrate_pages(demote_folios, alloc_demote_folio, NULL,
@@ -1565,7 +1575,7 @@ keep:
/* 'folio_list' is always empty here */
/* Migrate folios selected for demotion */
- nr_demoted = demote_folio_list(&demote_folios, pgdat);
+ nr_demoted = demote_folio_list(&demote_folios, pgdat, memcg);
nr_reclaimed += nr_demoted;
stat->nr_demoted += nr_demoted;
/* Folios that could not be demoted are still in @demote_folios */
_
Patches currently in -mm which might be from bingjiao(a)google.com are
mm-vmscan-fix-demotion-targets-checks-in-reclaim-demotion.patch
mm-vmscan-select-the-closest-preferred-node-in-demote_folio_list.patch
When fscrypt is enabled, move_dirty_folio_in_page_array() may fail
because it needs to allocate bounce buffers to store the encrypted
versions of each folio. Each folio beyond the first allocates its bounce
buffer with GFP_NOWAIT. Failures are common (and expected) under this
allocation mode; they should flush (not abort) the batch.
However, ceph_process_folio_batch() uses the same `rc` variable for its
own return code and for capturing the return codes of its routine calls;
failing to reset `rc` back to 0 results in the error being propagated
out to the main writeback loop, which cannot actually tolerate any
errors here: once `ceph_wbc.pages` is allocated, it must be passed to
ceph_submit_write() to be freed. If it survives until the next iteration
(e.g. due to the goto being followed), ceph_allocate_page_array()'s
BUG_ON() will oops the worker. (Subsequent patches in this series make
the loop more robust.)
Note that this failure mode is currently masked due to another bug
(addressed later in this series) that prevents multiple encrypted folios
from being selected for the same write.
For now, just reset `rc` when redirtying the folio to prevent errors in
move_dirty_folio_in_page_array() from propagating. (Note that
move_dirty_folio_in_page_array() is careful never to return errors on
the first folio, so there is no need to check for that.) After this
change, ceph_process_folio_batch() no longer returns errors; its only
remaining failure indicator is `locked_pages == 0`, which the caller
already handles correctly. The next patch in this series addresses this.
Fixes: ce80b76dd327 ("ceph: introduce ceph_process_folio_batch() method")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sam Edwards <CFSworks(a)gmail.com>
---
fs/ceph/addr.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 63b75d214210..3462df35d245 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1369,6 +1369,7 @@ int ceph_process_folio_batch(struct address_space *mapping,
rc = move_dirty_folio_in_page_array(mapping, wbc, ceph_wbc,
folio);
if (rc) {
+ rc = 0;
folio_redirty_for_writepage(wbc, folio);
folio_unlock(folio);
break;
--
2.51.2
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 913f7cf77bf14c13cfea70e89bcb6d0b22239562
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122941-civic-revered-b250@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 913f7cf77bf14c13cfea70e89bcb6d0b22239562 Mon Sep 17 00:00:00 2001
From: Chuck Lever <chuck.lever(a)oracle.com>
Date: Tue, 18 Nov 2025 19:51:19 -0500
Subject: [PATCH] NFSD: NFSv4 file creation neglects setting ACL
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
An NFSv4 client that sets an ACL with a named principal during file
creation retrieves the ACL afterwards, and finds that it is only a
default ACL (based on the mode bits) and not the ACL that was
requested during file creation. This violates RFC 8881 section
6.4.1.3: "the ACL attribute is set as given".
The issue occurs in nfsd_create_setattr(), which calls
nfsd_attrs_valid() to determine whether to call nfsd_setattr().
However, nfsd_attrs_valid() checks only for iattr changes and
security labels, but not POSIX ACLs. When only an ACL is present,
the function returns false, nfsd_setattr() is skipped, and the
POSIX ACL is never applied to the inode.
Subsequently, when the client retrieves the ACL, the server finds
no POSIX ACL on the inode and returns one generated from the file's
mode bits rather than returning the originally-specified ACL.
Reported-by: Aurélien Couderc <aurelien.couderc2002(a)gmail.com>
Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs")
Cc: Roland Mainz <roland.mainz(a)nrubsig.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index fa46f8b5f132..1dd3ae3ceb3a 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -67,7 +67,8 @@ static inline bool nfsd_attrs_valid(struct nfsd_attrs *attrs)
struct iattr *iap = attrs->na_iattr;
return (iap->ia_valid || (attrs->na_seclabel &&
- attrs->na_seclabel->len));
+ attrs->na_seclabel->len) ||
+ attrs->na_pacl || attrs->na_dpacl);
}
__be32 nfserrno (int errno);
In flush_write_buffer, &p->frag_sem is acquired and then the loaded store
function is called, which, here, is target_core_item_dbroot_store().
This function called filp_open(), following which these functions were
called (in reverse order), according to the call trace:
down_read
__configfs_open_file
do_dentry_open
vfs_open
do_open
path_openat
do_filp_open
file_open_name
filp_open
target_core_item_dbroot_store
flush_write_buffer
configfs_write_iter
Hence ultimately, __configfs_open_file() was called, indirectly by
target_core_item_dbroot_store(), and it also attempted to acquire
&p->frag_sem, which was already held by the same thread, acquired earlier
in flush_write_buffer. This poses a possibility of recursive locking,
which triggers the lockdep warning.
Fix this by modifying target_core_item_dbroot_store() to use kern_path()
instead of filp_open() to avoid opening the file using filesystem-specific
function __configfs_open_file(), and further modifying it to make this
fix compatible.
Reported-by: syzbot+f6e8174215573a84b797(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f6e8174215573a84b797
Tested-by: syzbot+f6e8174215573a84b797(a)syzkaller.appspotmail.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Prithvi Tambewagh <activprithvi(a)gmail.com>
---
drivers/target/target_core_configfs.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/target/target_core_configfs.c b/drivers/target/target_core_configfs.c
index b19acd662726..f29052e6a87d 100644
--- a/drivers/target/target_core_configfs.c
+++ b/drivers/target/target_core_configfs.c
@@ -108,8 +108,8 @@ static ssize_t target_core_item_dbroot_store(struct config_item *item,
const char *page, size_t count)
{
ssize_t read_bytes;
- struct file *fp;
ssize_t r = -EINVAL;
+ struct path path = {};
mutex_lock(&target_devices_lock);
if (target_devices) {
@@ -131,17 +131,18 @@ static ssize_t target_core_item_dbroot_store(struct config_item *item,
db_root_stage[read_bytes - 1] = '\0';
/* validate new db root before accepting it */
- fp = filp_open(db_root_stage, O_RDONLY, 0);
- if (IS_ERR(fp)) {
+ r = kern_path(db_root_stage, LOOKUP_FOLLOW, &path);
+ if (r) {
pr_err("db_root: cannot open: %s\n", db_root_stage);
goto unlock;
}
- if (!S_ISDIR(file_inode(fp)->i_mode)) {
- filp_close(fp, NULL);
+ if (!d_is_dir(path.dentry)) {
+ path_put(&path);
pr_err("db_root: not a directory: %s\n", db_root_stage);
+ r = -ENOTDIR;
goto unlock;
}
- filp_close(fp, NULL);
+ path_put(&path);
strscpy(db_root, db_root_stage);
pr_debug("Target_Core_ConfigFS: db_root set to %s\n", db_root);
base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787
--
2.34.1
The patch titled
Subject: mm: numa,memblock: include <asm/numa.h> for 'numa_nodes_parsed'
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-numamemblock-include-asm-numah-for-numa_nodes_parsed.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: Ben Dooks <ben.dooks(a)codethink.co.uk>
Subject: mm: numa,memblock: include <asm/numa.h> for 'numa_nodes_parsed'
Date: Thu, 8 Jan 2026 10:15:39 +0000
The 'numa_nodes_parsed' is defined in <asm/numa.h> but this file
is not included in mm/numa_memblks.c (build x86_64) so add this
to the incldues to fix the following sparse warning:
mm/numa_memblks.c:13:12: warning: symbol 'numa_nodes_parsed' was not declared. Should it be static?
Link: https://lkml.kernel.org/r/20260108101539.229192-1-ben.dooks@codethink.co.uk
Fixes: 87482708210f ("mm: introduce numa_memblks")
Signed-off-by: Ben Dooks <ben.dooks(a)codethink.co.uk>
Cc: Ben Dooks <ben.dooks(a)codethink.co.uk>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/numa_memblks.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/numa_memblks.c~mm-numamemblock-include-asm-numah-for-numa_nodes_parsed
+++ a/mm/numa_memblks.c
@@ -7,6 +7,8 @@
#include <linux/numa.h>
#include <linux/numa_memblks.h>
+#include <asm/numa.h>
+
int numa_distance_cnt;
static u8 *numa_distance;
_
Patches currently in -mm which might be from ben.dooks(a)codethink.co.uk are
mm-numamemblock-include-asm-numah-for-numa_nodes_parsed.patch
When switching to regmap, the i2c_client pointer was removed from struct
pcf8563 so this function switched to using the RTC device instead. But
the RTC device is a child of the original I2C device and does not have
an associated of_node.
Reference the correct device's of_node to ensure that the output clock
can be found when referenced by other devices and so that the override
clock name is read correctly.
Cc: stable(a)vger.kernel.org
Fixes: 00f1bb9b8486b ("rtc: pcf8563: Switch to regmap")
Signed-off-by: John Keeping <jkeeping(a)inmusicbrands.com>
---
drivers/rtc/rtc-pcf8563.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/rtc/rtc-pcf8563.c b/drivers/rtc/rtc-pcf8563.c
index 4e61011fb7a96..b281e9489df1d 100644
--- a/drivers/rtc/rtc-pcf8563.c
+++ b/drivers/rtc/rtc-pcf8563.c
@@ -424,7 +424,7 @@ static const struct clk_ops pcf8563_clkout_ops = {
static struct clk *pcf8563_clkout_register_clk(struct pcf8563 *pcf8563)
{
- struct device_node *node = pcf8563->rtc->dev.of_node;
+ struct device_node *node = pcf8563->rtc->dev.parent->of_node;
struct clk_init_data init;
struct clk *clk;
int ret;
--
2.52.0
When a device is matched via PRP0001, the driver's OF (DT) match table
must be used to obtain the device match data. If a driver provides both
an acpi_match_table and an of_match_table, the current
acpi_device_get_match_data() path consults the driver's acpi_match_table
and returns NULL (no ACPI ID matches).
Explicitly detect PRP0001 and fetch match data from the driver's
of_match_table via acpi_of_device_get_match_data().
Fixes: 886ca88be6b3 ("ACPI / bus: Respect PRP0001 when retrieving device match data")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kartik Rajput <kkartik(a)nvidia.com>
---
Changes in v2:
* Fix build errors.
---
drivers/acpi/bus.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 5e110badac7b..6658c4339656 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1031,8 +1031,9 @@ const void *acpi_device_get_match_data(const struct device *dev)
{
const struct acpi_device_id *acpi_ids = dev->driver->acpi_match_table;
const struct acpi_device_id *match;
+ struct acpi_device *adev = ACPI_COMPANION(dev);
- if (!acpi_ids)
+ if (!strcmp(ACPI_DT_NAMESPACE_HID, acpi_device_hid(adev)))
return acpi_of_device_get_match_data(dev);
match = acpi_match_device(acpi_ids, dev);
--
2.43.0
When software issues a Cache Maintenance Operation (CMO) targeting a
dirty cache line, the CPU and DSU cluster may optimize the operation by
combining the CopyBack Write and CMO into a single combined CopyBack
Write plus CMO transaction presented to the interconnect (MCN).
For these combined transactions, the MCN splits the operation into two
separate transactions, one Write and one CMO, and then propagates the
write and optionally the CMO to the downstream memory system or external
Point of Serialization (PoS).
However, the MCN may return an early CompCMO response to the DSU cluster
before the corresponding Write and CMO transactions have completed at
the external PoS or downstream memory. As a result, stale data may be
observed by external observers that are directly connected to the
external PoS or downstream memory.
This erratum affects any system topology in which the following
conditions apply:
- The Point of Serialization (PoS) is located downstream of the
interconnect.
- A downstream observer accesses memory directly, bypassing the
interconnect.
Conditions:
This erratum occurs only when all of the following conditions are met:
1. Software executes a data cache maintenance operation, specifically,
a clean or invalidate by virtual address (DC CVAC, DC CIVAC, or DC
IVAC), that hits on unique dirty data in the CPU or DSU cache. This
results in a combined CopyBack and CMO being issued to the
interconnect.
2. The interconnect splits the combined transaction into separate Write
and CMO transactions and returns an early completion response to the
CPU or DSU before the write has completed at the downstream memory
or PoS.
3. A downstream observer accesses the affected memory address after the
early completion response is issued but before the actual memory
write has completed. This allows the observer to read stale data
that has not yet been updated at the PoS or downstream memory.
The implementation of workaround put a second loop of CMOs at the same
virtual address whose operation meet erratum conditions to wait until
cache data be cleaned to PoC.. This way of implementation mitigates
performance panalty compared to purly duplicate orignial CMO.
Reported-by: kernel test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org # 6.12.x
Signed-off-by: Lucas Wei <lucaswei(a)google.com>
---
Changes in v2:
1. Fixed warning from kernel test robot by changing
arm_si_l1_workaround_4311569 to static
[Reported-by: kernel test robot <lkp(a)intel.com>]
---
Documentation/arch/arm64/silicon-errata.rst | 3 ++
arch/arm64/Kconfig | 19 +++++++++++++
arch/arm64/include/asm/assembler.h | 10 +++++++
arch/arm64/kernel/cpu_errata.c | 31 +++++++++++++++++++++
arch/arm64/mm/cache.S | 13 ++++++++-
arch/arm64/tools/cpucaps | 1 +
6 files changed, 76 insertions(+), 1 deletion(-)
diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index a7ec57060f64..98efdf528719 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -213,6 +213,9 @@ stable kernels.
| ARM | GIC-700 | #2941627 | ARM64_ERRATUM_2941627 |
+----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | SI L1 | #4311569 | ARM64_ERRATUM_4311569 |
++----------------+-----------------+-----------------+-----------------------------+
++----------------+-----------------+-----------------+-----------------------------+
| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_845719 |
+----------------+-----------------+-----------------+-----------------------------+
| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_843419 |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 65db12f66b8f..a834d30859cc 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1153,6 +1153,25 @@ config ARM64_ERRATUM_3194386
If unsure, say Y.
+config ARM64_ERRATUM_4311569
+ bool "SI L1: 4311569: workaround for premature CMO completion erratum"
+ default y
+ help
+ This option adds the workaround for ARM SI L1 erratum 4311569.
+
+ The erratum of SI L1 can cause an early response to a combined write
+ and cache maintenance operation (WR+CMO) before the operation is fully
+ completed to the Point of Serialization (POS).
+ This can result in a non-I/O coherent agent observing stale data,
+ potentially leading to system instability or incorrect behavior.
+
+ Enabling this option implements a software workaround by inserting a
+ second loop of Cache Maintenance Operation (CMO) immediately following the
+ end of function to do CMOs. This ensures that the data is correctly serialized
+ before the buffer is handed off to a non-coherent agent.
+
+ If unsure, say Y.
+
config CAVIUM_ERRATUM_22375
bool "Cavium erratum 22375, 24313"
default y
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index f0ca7196f6fa..d3d46e5f7188 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -381,6 +381,9 @@ alternative_endif
.macro dcache_by_myline_op op, domain, start, end, linesz, tmp, fixup
sub \tmp, \linesz, #1
bic \start, \start, \tmp
+alternative_if ARM64_WORKAROUND_4311569
+ mov \tmp, \start
+alternative_else_nop_endif
.Ldcache_op\@:
.ifc \op, cvau
__dcache_op_workaround_clean_cache \op, \start
@@ -402,6 +405,13 @@ alternative_endif
add \start, \start, \linesz
cmp \start, \end
b.lo .Ldcache_op\@
+alternative_if ARM64_WORKAROUND_4311569
+ .ifnc \op, cvau
+ mov \start, \tmp
+ mov \tmp, xzr
+ cbnz \start, .Ldcache_op\@
+ .endif
+alternative_else_nop_endif
dsb \domain
_cond_uaccess_extable .Ldcache_op\@, \fixup
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 8cb3b575a031..5c0ab6bfd44a 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -141,6 +141,30 @@ has_mismatched_cache_type(const struct arm64_cpu_capabilities *entry,
return (ctr_real != sys) && (ctr_raw != sys);
}
+#ifdef CONFIG_ARM64_ERRATUM_4311569
+static DEFINE_STATIC_KEY_FALSE(arm_si_l1_workaround_4311569);
+static int __init early_arm_si_l1_workaround_4311569_cfg(char *arg)
+{
+ static_branch_enable(&arm_si_l1_workaround_4311569);
+ pr_info("Enabling cache maintenance workaround for ARM SI-L1 erratum 4311569\n");
+
+ return 0;
+}
+early_param("arm_si_l1_workaround_4311569", early_arm_si_l1_workaround_4311569_cfg);
+
+/*
+ * We have some earlier use cases to call cache maintenance operation functions, for example,
+ * dcache_inval_poc() and dcache_clean_poc() in head.S, before making decision to turn on this
+ * workaround. Since the scope of this workaround is limited to non-coherent DMA agents, its
+ * safe to have the workaround off by default.
+ */
+static bool
+need_arm_si_l1_workaround_4311569(const struct arm64_cpu_capabilities *entry, int scope)
+{
+ return static_branch_unlikely(&arm_si_l1_workaround_4311569);
+}
+#endif
+
static void
cpu_enable_trap_ctr_access(const struct arm64_cpu_capabilities *cap)
{
@@ -870,6 +894,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
ERRATA_MIDR_RANGE_LIST(erratum_spec_ssbs_list),
},
#endif
+#ifdef CONFIG_ARM64_ERRATUM_4311569
+ {
+ .capability = ARM64_WORKAROUND_4311569,
+ .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+ .matches = need_arm_si_l1_workaround_4311569,
+ },
+#endif
#ifdef CONFIG_ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD
{
.desc = "ARM errata 2966298, 3117295",
diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
index 503567c864fd..ddf0097624ed 100644
--- a/arch/arm64/mm/cache.S
+++ b/arch/arm64/mm/cache.S
@@ -143,9 +143,14 @@ SYM_FUNC_END(dcache_clean_pou)
* - end - kernel end address of region
*/
SYM_FUNC_START(__pi_dcache_inval_poc)
+alternative_if ARM64_WORKAROUND_4311569
+ mov x4, x0
+ mov x5, x1
+ mov x6, #1
+alternative_else_nop_endif
dcache_line_size x2, x3
sub x3, x2, #1
- tst x1, x3 // end cache line aligned?
+again: tst x1, x3 // end cache line aligned?
bic x1, x1, x3
b.eq 1f
dc civac, x1 // clean & invalidate D / U line
@@ -158,6 +163,12 @@ SYM_FUNC_START(__pi_dcache_inval_poc)
3: add x0, x0, x2
cmp x0, x1
b.lo 2b
+alternative_if ARM64_WORKAROUND_4311569
+ mov x0, x4
+ mov x1, x5
+ sub x6, x6, #1
+ cbz x6, again
+alternative_else_nop_endif
dsb sy
ret
SYM_FUNC_END(__pi_dcache_inval_poc)
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 1b32c1232d28..3b18734f9744 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -101,6 +101,7 @@ WORKAROUND_2077057
WORKAROUND_2457168
WORKAROUND_2645198
WORKAROUND_2658417
+WORKAROUND_4311569
WORKAROUND_AMPERE_AC03_CPU_38
WORKAROUND_AMPERE_AC04_CPU_23
WORKAROUND_TRBE_OVERWRITE_FILL_MODE
base-commit: edde060637b92607f3522252c03d64ad06369933
--
2.52.0.358.g0dd7633a29-goog
The size of the buffer is not the same when alloc'd with
dma_alloc_coherent() in he_init_tpdrq() and freed.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Thomas Fourier <fourier.thomas(a)gmail.com>
---
v1->v2:
- change Fixes: tag to before the change from pci-consistent to dma-coherent.
drivers/atm/he.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/atm/he.c b/drivers/atm/he.c
index ad91cc6a34fc..92a041d5387b 100644
--- a/drivers/atm/he.c
+++ b/drivers/atm/he.c
@@ -1587,7 +1587,8 @@ he_stop(struct he_dev *he_dev)
he_dev->tbrq_base, he_dev->tbrq_phys);
if (he_dev->tpdrq_base)
- dma_free_coherent(&he_dev->pci_dev->dev, CONFIG_TBRQ_SIZE * sizeof(struct he_tbrq),
+ dma_free_coherent(&he_dev->pci_dev->dev,
+ CONFIG_TPDRQ_SIZE * sizeof(struct he_tpdrq),
he_dev->tpdrq_base, he_dev->tpdrq_phys);
dma_pool_destroy(he_dev->tpd_pool);
--
2.43.0
When bnxt_init_one() fails during initialization (e.g.,
bnxt_init_int_mode returns -ENODEV), the error path calls
bnxt_free_hwrm_resources() which destroys the DMA pool and sets
bp->hwrm_dma_pool to NULL. Subsequently, bnxt_ptp_clear() is called,
which invokes ptp_clock_unregister().
Since commit a60fc3294a37 ("ptp: rework ptp_clock_unregister() to
disable events"), ptp_clock_unregister() now calls
ptp_disable_all_events(), which in turn invokes the driver's .enable()
callback (bnxt_ptp_enable()) to disable PTP events before completing the
unregistration.
bnxt_ptp_enable() attempts to send HWRM commands via bnxt_ptp_cfg_pin()
and bnxt_ptp_cfg_event(), both of which call hwrm_req_init(). This
function tries to allocate from bp->hwrm_dma_pool, causing a NULL
pointer dereference:
bnxt_en 0000:01:00.0 (unnamed net_device) (uninitialized): bnxt_init_int_mode err: ffffffed
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
Call Trace:
__hwrm_req_init (drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.c:72)
bnxt_ptp_enable (drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c:323 drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c:517)
ptp_disable_all_events (drivers/ptp/ptp_chardev.c:66)
ptp_clock_unregister (drivers/ptp/ptp_clock.c:518)
bnxt_ptp_clear (drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c:1134)
bnxt_init_one (drivers/net/ethernet/broadcom/bnxt/bnxt.c:16889)
Lines are against commit f8f9c1f4d0c7 ("Linux 6.19-rc3")
Fix this by clearing and unregistering ptp (bnxt_ptp_clear()) before
freeing HWRM resources.
Suggested-by: Pavan Chebbi <pavan.chebbi(a)broadcom.com>
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Fixes: a60fc3294a37 ("ptp: rework ptp_clock_unregister() to disable events")
Cc: stable(a)vger.kernel.org
---
Changes in v3:
- Moved bp->ptp_cfg to be closer to the kfree(). (Pavan/Jakub)
- Link to v2: https://patch.msgid.link/20260105-bnxt-v2-1-9ac69edef726@debian.org
Changes in v2:
- Instead of checking for HWRM resources in bnxt_ptp_enable(), call it
when HWRM resources are availble (Pavan Chebbi)
- Link to v1: https://patch.msgid.link/20251231-bnxt-v1-1-8f9cde6698b4@debian.org
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index d160e54ac121..8419d1eb4035 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -16891,12 +16891,12 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
init_err_pci_clean:
bnxt_hwrm_func_drv_unrgtr(bp);
- bnxt_free_hwrm_resources(bp);
- bnxt_hwmon_uninit(bp);
- bnxt_ethtool_free(bp);
bnxt_ptp_clear(bp);
kfree(bp->ptp_cfg);
bp->ptp_cfg = NULL;
+ bnxt_free_hwrm_resources(bp);
+ bnxt_hwmon_uninit(bp);
+ bnxt_ethtool_free(bp);
kfree(bp->fw_health);
bp->fw_health = NULL;
bnxt_cleanup_pci(bp);
---
base-commit: e146b276a817807b8f4a94b5781bf80c6c00601b
change-id: 20251231-bnxt-c54d317d8bfe
Best regards,
--
Breno Leitao <leitao(a)debian.org>
From: Willem de Bruijn <willemb(a)google.com>
NULL pointer dereference fix.
msg_get_inq is an input field from caller to callee. Don't set it in
the callee, as the caller may not clear it on struct reuse.
This is a kernel-internal variant of msghdr only, and the only user
does reinitialize the field. So this is not critical for that reason.
But it is more robust to avoid the write, and slightly simpler code.
And it fixes a bug, see below.
Callers set msg_get_inq to request the input queue length to be
returned in msg_inq. This is equivalent to but independent from the
SO_INQ request to return that same info as a cmsg (tp->recvmsg_inq).
To reduce branching in the hot path the second also sets the msg_inq.
That is WAI.
This is a fix to commit 4d1442979e4a ("af_unix: don't post cmsg for
SO_INQ unless explicitly asked for"), which fixed the inverse.
Also avoid NULL pointer dereference in unix_stream_read_generic if
state->msg is NULL and msg->msg_get_inq is written. A NULL state->msg
can happen when splicing as of commit 2b514574f7e8 ("net: af_unix:
implement splice for stream af_unix sockets").
Also collapse two branches using a bitwise or.
Cc: stable(a)vger.kernel.org
Fixes: 4d1442979e4a ("af_unix: don't post cmsg for SO_INQ unless explicitly asked for")
Link: https://lore.kernel.org/netdev/willemdebruijn.kernel.24d8030f7a3de@gmail.co…
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
---
Jens, I dropped your Reviewed-by because of the commit message updates.
But code is unchanged.
changes nn v1 -> net v1
- add Fixes tag and explain reason
- redirect to net
- s/caller/callee in subject line
nn v1: https://lore.kernel.org/netdev/20260105163338.3461512-1-willemdebruijn.kern…
---
net/ipv4/tcp.c | 8 +++-----
net/unix/af_unix.c | 8 +++-----
2 files changed, 6 insertions(+), 10 deletions(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f035440c475a..d5319ebe2452 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2652,10 +2652,8 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
if (sk->sk_state == TCP_LISTEN)
goto out;
- if (tp->recvmsg_inq) {
+ if (tp->recvmsg_inq)
*cmsg_flags = TCP_CMSG_INQ;
- msg->msg_get_inq = 1;
- }
timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
/* Urgent data needs to be handled specially. */
@@ -2929,10 +2927,10 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags,
ret = tcp_recvmsg_locked(sk, msg, len, flags, &tss, &cmsg_flags);
release_sock(sk);
- if ((cmsg_flags || msg->msg_get_inq) && ret >= 0) {
+ if ((cmsg_flags | msg->msg_get_inq) && ret >= 0) {
if (cmsg_flags & TCP_CMSG_TS)
tcp_recv_timestamp(msg, sk, &tss);
- if (msg->msg_get_inq) {
+ if ((cmsg_flags & TCP_CMSG_INQ) | msg->msg_get_inq) {
msg->msg_inq = tcp_inq_hint(sk);
if (cmsg_flags & TCP_CMSG_INQ)
put_cmsg(msg, SOL_TCP, TCP_CM_INQ,
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index a7ca74653d94..d0511225799b 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2904,7 +2904,6 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state,
unsigned int last_len;
struct unix_sock *u;
int copied = 0;
- bool do_cmsg;
int err = 0;
long timeo;
int target;
@@ -2930,9 +2929,6 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state,
u = unix_sk(sk);
- do_cmsg = READ_ONCE(u->recvmsg_inq);
- if (do_cmsg)
- msg->msg_get_inq = 1;
redo:
/* Lock the socket to prevent queue disordering
* while sleeps in memcpy_tomsg
@@ -3090,9 +3086,11 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state,
mutex_unlock(&u->iolock);
if (msg) {
+ bool do_cmsg = READ_ONCE(u->recvmsg_inq);
+
scm_recv_unix(sock, msg, &scm, flags);
- if (msg->msg_get_inq && (copied ?: err) >= 0) {
+ if ((do_cmsg | msg->msg_get_inq) && (copied ?: err) >= 0) {
msg->msg_inq = READ_ONCE(u->inq_len);
if (do_cmsg)
put_cmsg(msg, SOL_SOCKET, SCM_INQ,
--
2.52.0.351.gbe84eed79e-goog
From: Sean Christopherson <seanjc(a)google.com>
When loading guest XSAVE state via KVM_SET_XSAVE, and when updating XFD in
response to a guest WRMSR, clear XFD-disabled features in the saved (or to
be restored) XSTATE_BV to ensure KVM doesn't attempt to load state for
features that are disabled via the guest's XFD. Because the kernel
executes XRSTOR with the guest's XFD, saving XSTATE_BV[i]=1 with XFD[i]=1
will cause XRSTOR to #NM and panic the kernel.
E.g. if fpu_update_guest_xfd() sets XFD without clearing XSTATE_BV:
------------[ cut here ]------------
WARNING: arch/x86/kernel/traps.c:1524 at exc_device_not_available+0x101/0x110, CPU#29: amx_test/848
Modules linked in: kvm_intel kvm irqbypass
CPU: 29 UID: 1000 PID: 848 Comm: amx_test Not tainted 6.19.0-rc2-ffa07f7fd437-x86_amx_nm_xfd_non_init-vm #171 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:exc_device_not_available+0x101/0x110
Call Trace:
<TASK>
asm_exc_device_not_available+0x1a/0x20
RIP: 0010:restore_fpregs_from_fpstate+0x36/0x90
switch_fpu_return+0x4a/0xb0
kvm_arch_vcpu_ioctl_run+0x1245/0x1e40 [kvm]
kvm_vcpu_ioctl+0x2c3/0x8f0 [kvm]
__x64_sys_ioctl+0x8f/0xd0
do_syscall_64+0x62/0x940
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
---[ end trace 0000000000000000 ]---
This can happen if the guest executes WRMSR(MSR_IA32_XFD) to set XFD[18] = 1,
and a host IRQ triggers kernel_fpu_begin() prior to the vmexit handler's
call to fpu_update_guest_xfd().
and if userspace stuffs XSTATE_BV[i]=1 via KVM_SET_XSAVE:
------------[ cut here ]------------
WARNING: arch/x86/kernel/traps.c:1524 at exc_device_not_available+0x101/0x110, CPU#14: amx_test/867
Modules linked in: kvm_intel kvm irqbypass
CPU: 14 UID: 1000 PID: 867 Comm: amx_test Not tainted 6.19.0-rc2-2dace9faccd6-x86_amx_nm_xfd_non_init-vm #168 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:exc_device_not_available+0x101/0x110
Call Trace:
<TASK>
asm_exc_device_not_available+0x1a/0x20
RIP: 0010:restore_fpregs_from_fpstate+0x36/0x90
fpu_swap_kvm_fpstate+0x6b/0x120
kvm_load_guest_fpu+0x30/0x80 [kvm]
kvm_arch_vcpu_ioctl_run+0x85/0x1e40 [kvm]
kvm_vcpu_ioctl+0x2c3/0x8f0 [kvm]
__x64_sys_ioctl+0x8f/0xd0
do_syscall_64+0x62/0x940
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
---[ end trace 0000000000000000 ]---
The new behavior is consistent with the AMX architecture. Per Intel's SDM,
XSAVE saves XSTATE_BV as '0' for components that are disabled via XFD
(and non-compacted XSAVE saves the initial configuration of the state
component):
If XSAVE, XSAVEC, XSAVEOPT, or XSAVES is saving the state component i,
the instruction does not generate #NM when XCR0[i] = IA32_XFD[i] = 1;
instead, it operates as if XINUSE[i] = 0 (and the state component was
in its initial state): it saves bit i of XSTATE_BV field of the XSAVE
header as 0; in addition, XSAVE saves the initial configuration of the
state component (the other instructions do not save state component i).
Alternatively, KVM could always do XRSTOR with XFD=0, e.g. by using
a constant XFD based on the set of enabled features when XSAVEing for
a struct fpu_guest. However, having XSTATE_BV[i]=1 for XFD-disabled
features can only happen in the above interrupt case, or in similar
scenarios involving preemption on preemptible kernels, because
fpu_swap_kvm_fpstate()'s call to save_fpregs_to_fpstate() saves the
outgoing FPU state with the current XFD; and that is (on all but the
first WRMSR to XFD) the guest XFD.
Therefore, XFD can only go out of sync with XSTATE_BV in the above
interrupt case, or in similar scenarios involving preemption on
preemptible kernels, and it we can consider it (de facto) part of KVM
ABI that KVM_GET_XSAVE returns XSTATE_BV[i]=0 for XFD-disabled features.
Reported-by: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: 820a6ee944e7 ("kvm: x86: Add emulation for IA32_XFD", 2022-01-14)
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
[Move clearing of XSTATE_BV from fpu_copy_uabi_to_guest_fpstate
to kvm_vcpu_ioctl_x86_set_xsave. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kernel/fpu/core.c | 32 +++++++++++++++++++++++++++++---
arch/x86/kvm/x86.c | 9 +++++++++
2 files changed, 38 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index da233f20ae6f..166c380b0161 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -319,10 +319,29 @@ EXPORT_SYMBOL_FOR_KVM(fpu_enable_guest_xfd_features);
#ifdef CONFIG_X86_64
void fpu_update_guest_xfd(struct fpu_guest *guest_fpu, u64 xfd)
{
+ struct fpstate *fpstate = guest_fpu->fpstate;
+
fpregs_lock();
- guest_fpu->fpstate->xfd = xfd;
- if (guest_fpu->fpstate->in_use)
- xfd_update_state(guest_fpu->fpstate);
+
+ /*
+ * KVM's guest ABI is that setting XFD[i]=1 *can* immediately revert
+ * the save state to initialized. Likewise, KVM_GET_XSAVE does the
+ * same as XSAVE and returns XSTATE_BV[i]=0 whenever XFD[i]=1.
+ *
+ * If the guest's FPU state is in hardware, just update XFD: the XSAVE
+ * in fpu_swap_kvm_fpstate will clear XSTATE_BV[i] whenever XFD[i]=1.
+ *
+ * If however the guest's FPU state is NOT resident in hardware, clear
+ * disabled components in XSTATE_BV now, or a subsequent XRSTOR will
+ * attempt to load disabled components and generate #NM _in the host_.
+ */
+ if (xfd && test_thread_flag(TIF_NEED_FPU_LOAD))
+ fpstate->regs.xsave.header.xfeatures &= ~xfd;
+
+ fpstate->xfd = xfd;
+ if (fpstate->in_use)
+ xfd_update_state(fpstate);
+
fpregs_unlock();
}
EXPORT_SYMBOL_FOR_KVM(fpu_update_guest_xfd);
@@ -430,6 +449,13 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf,
if (ustate->xsave.header.xfeatures & ~xcr0)
return -EINVAL;
+ /*
+ * Disabled features must be in their initial state, otherwise XRSTOR
+ * causes an exception.
+ */
+ if (WARN_ON_ONCE(ustate->xsave.header.xfeatures & kstate->xfd))
+ return -EINVAL;
+
/*
* Nullify @vpkru to preserve its current value if PKRU's bit isn't set
* in the header. KVM's odd ABI is to leave PKRU untouched in this
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ff8812f3a129..c0416f53b5f5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5807,9 +5807,18 @@ static int kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
struct kvm_xsave *guest_xsave)
{
+ union fpregs_state *xstate = (union fpregs_state *)guest_xsave->region;
+
if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;
+ /*
+ * Do not reject non-initialized disabled features for backwards
+ * compatibility, but clear XSTATE_BV[i] whenever XFD[i]=1.
+ * Otherwise, XRSTOR would cause a #NM.
+ */
+ xstate->xsave.header.xfeatures &= ~vcpu->arch.guest_fpu.fpstate->xfd;
+
return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.guest_fpu,
guest_xsave->region,
kvm_caps.supported_xcr0,
--
2.52.0
Recenly when test uvc gadget function I find some YUYV pixel format
720p and 1080p stream can't output normally. However, small resulution
and MJPEG format stream works fine. The first patch#1 is to fix the issue.
Patch#2 and #3 are small fix or improvement.
For patch#4: it's a workaround for a long-term issue in videobuf2. With
it, many device can work well and not solely based on the SG allocation
method.
Signed-off-by: Xu Yang <xu.yang_2(a)nxp.com>
---
Xu Yang (4):
usb: gadget: uvc: fix req_payload_size calculation
usb: gadget: uvc: fix interval_duration calculation
usb: gadget: uvc: improve error handling in uvcg_video_init()
usb: gadget: uvc: retry vb2_reqbufs() with vb_vmalloc_memops if use_sg fail
drivers/usb/gadget/function/f_uvc.c | 4 ++++
drivers/usb/gadget/function/uvc.h | 3 ++-
drivers/usb/gadget/function/uvc_queue.c | 23 +++++++++++++++++++----
drivers/usb/gadget/function/uvc_video.c | 14 +++++++-------
4 files changed, 32 insertions(+), 12 deletions(-)
---
base-commit: 56a512a9b4107079f68701e7d55da8507eb963d9
change-id: 20260108-uvc-gadget-fix-patch-aa5996332bb5
Best regards,
--
Xu Yang <xu.yang_2(a)nxp.com>
Commit 7346e7a058a2 ("pwm: stm32: Always do lazy disabling") triggered a
regression where PWM polarity changes could be ignored.
stm32_pwm_set_polarity() was skipped due to a mismatch between the
cached pwm->state.polarity and the actual hardware state, leaving the
hardware polarity unchanged.
Fixes: 7edf7369205b ("pwm: Add driver for STM32 plaftorm")
Cc: stable(a)vger.kernel.org # <= 6.12
Signed-off-by: Sean Nyekjaer <sean(a)geanix.com>
Co-developed-by: Uwe Kleine-König <ukleinek(a)kernel.org>
---
This patch is only applicable for stable tree's <= 6.12
---
Changes in v2:
- Taken patch improvements for Uwe
- Link to v1: https://lore.kernel.org/r/20260106-stm32-pwm-v1-1-33e9e8a9fc33@geanix.com
---
drivers/pwm/pwm-stm32.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/pwm/pwm-stm32.c b/drivers/pwm/pwm-stm32.c
index eb24054f9729734da21eb96f2e37af03339e3440..86e6eb7396f67990249509dd347cb5a60c9ccf16 100644
--- a/drivers/pwm/pwm-stm32.c
+++ b/drivers/pwm/pwm-stm32.c
@@ -458,8 +458,7 @@ static int stm32_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
return 0;
}
- if (state->polarity != pwm->state.polarity)
- stm32_pwm_set_polarity(priv, pwm->hwpwm, state->polarity);
+ stm32_pwm_set_polarity(priv, pwm->hwpwm, state->polarity);
ret = stm32_pwm_config(priv, pwm->hwpwm,
state->duty_cycle, state->period);
---
base-commit: eb18504ca5cf1e6a76a752b73daf0ef51de3551b
change-id: 20260105-stm32-pwm-91cb843680f4
Best regards,
--
Sean Nyekjaer <sean(a)geanix.com>
Backport commit:5701875f9609 ("ext4: fix out-of-bound read in
ext4_xattr_inode_dec_ref_all()" to linux 5.10 branch.
The fix depends on commit:69f3a3039b0d ("ext4: introduce ITAIL helper")
In order to make a clean backport on stable kernel, backport 2 commits.
It has a single merge conflict where static inline int, which changed
to static int.
Signed-off-by: David Nyström <david.nystrom(a)est.tech>
---
Changes in v2:
- Resend identical patchset with correct "Upstream commit" denotation.
- Link to v1: https://patch.msgid.link/20251216-ext4_splat-v1-0-b76fd8748f44@est.tech
---
Ye Bin (2):
ext4: introduce ITAIL helper
ext4: fix out-of-bound read in ext4_xattr_inode_dec_ref_all()
fs/ext4/inode.c | 5 +++++
fs/ext4/xattr.c | 32 ++++----------------------------
fs/ext4/xattr.h | 10 ++++++++++
3 files changed, 19 insertions(+), 28 deletions(-)
---
base-commit: f964b940099f9982d723d4c77988d4b0dda9c165
change-id: 20251215-ext4_splat-f59c1acd9e88
Best regards,
--
David Nyström <david.nystrom(a)est.tech>
From: Steven Rostedt <rostedt(a)goodmis.org>
A bug was reported about an infinite recursion caused by tracing the rcu
events with the kernel stack trace trigger enabled. The stack trace code
called back into RCU which then called the stack trace again.
Expand the ftrace recursion protection to add a set of bits to protect
events from recursion. Each bit represents the context that the event is
in (normal, softirq, interrupt and NMI).
Have the stack trace code use the interrupt context to protect against
recursion.
Note, the bug showed an issue in both the RCU code as well as the tracing
stacktrace code. This only handles the tracing stack trace side of the
bug. The RCU fix will be handled separately.
Link: https://lore.kernel.org/all/20260102122807.7025fc87@gandalf.local.home/
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Link: https://patch.msgid.link/20260105203141.515cd49f@gandalf.local.home
Reported-by: Yao Kai <yaokai34(a)huawei.com>
Tested-by: Yao Kai <yaokai34(a)huawei.com>
Fixes: 5f5fa7ea89dc ("rcu: Don't use negative nesting depth in __rcu_read_unlock()")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/trace_recursion.h | 9 +++++++++
kernel/trace/trace.c | 6 ++++++
2 files changed, 15 insertions(+)
diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index ae04054a1be3..e6ca052b2a85 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -34,6 +34,13 @@ enum {
TRACE_INTERNAL_SIRQ_BIT,
TRACE_INTERNAL_TRANSITION_BIT,
+ /* Internal event use recursion bits */
+ TRACE_INTERNAL_EVENT_BIT,
+ TRACE_INTERNAL_EVENT_NMI_BIT,
+ TRACE_INTERNAL_EVENT_IRQ_BIT,
+ TRACE_INTERNAL_EVENT_SIRQ_BIT,
+ TRACE_INTERNAL_EVENT_TRANSITION_BIT,
+
TRACE_BRANCH_BIT,
/*
* Abuse of the trace_recursion.
@@ -58,6 +65,8 @@ enum {
#define TRACE_LIST_START TRACE_INTERNAL_BIT
+#define TRACE_EVENT_START TRACE_INTERNAL_EVENT_BIT
+
#define TRACE_CONTEXT_MASK ((1 << (TRACE_LIST_START + TRACE_CONTEXT_BITS)) - 1)
/*
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6f2148df14d9..aef9058537d5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3012,6 +3012,11 @@ static void __ftrace_trace_stack(struct trace_array *tr,
struct ftrace_stack *fstack;
struct stack_entry *entry;
int stackidx;
+ int bit;
+
+ bit = trace_test_and_set_recursion(_THIS_IP_, _RET_IP_, TRACE_EVENT_START);
+ if (bit < 0)
+ return;
/*
* Add one, for this function and the call to save_stack_trace()
@@ -3080,6 +3085,7 @@ static void __ftrace_trace_stack(struct trace_array *tr,
/* Again, don't let gcc optimize things here */
barrier();
__this_cpu_dec(ftrace_stack_reserve);
+ trace_clear_recursion(bit);
}
static inline void ftrace_trace_stack(struct trace_array *tr,
--
2.51.0
From: Steven Rostedt <rostedt(a)goodmis.org>
The code has integrity checks to make sure that depth never goes below
zero. But the depth field has recently been converted to unsigned long
from "int" (for alignment reasons). As unsigned long can never be less
than zero, the integrity checks no longer work.
Convert depth to long from unsigned long to allow the integrity checks to
work again.
Cc: stable(a)vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: pengdonglin <pengdonglin(a)xiaomi.com>
Link: https://patch.msgid.link/20260102143148.251c2e16@gandalf.local.home
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Closes: https://lore.kernel.org/all/aS6kGi0maWBl-MjZ@stanley.mountain/
Fixes: f83ac7544fbf7 ("function_graph: Enable funcgraph-args and funcgraph-retaddr to work simultaneously")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
---
include/linux/ftrace.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 770f0dc993cc..a3a8989e3268 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1167,7 +1167,7 @@ static inline void ftrace_init(void) { }
*/
struct ftrace_graph_ent {
unsigned long func; /* Current function */
- unsigned long depth;
+ long depth; /* signed to check for less than zero */
} __packed;
/*
--
2.51.0
If a process receives a signal while it executes some kernel code that
calls mm_take_all_locks, we get -EINTR error. The -EINTR is propagated up
the call stack to userspace and userspace may fail if it gets this
error.
This commit changes -EINTR to -ERESTARTSYS, so that if the signal handler
was installed with the SA_RESTART flag, the operation is automatically
restarted.
For example, this problem happens when using OpenCL on AMDGPU. If some
signal races with clGetDeviceIDs, clGetDeviceIDs returns an error
CL_DEVICE_NOT_FOUND (and strace shows that open("/dev/kfd") failed with
EINTR).
This problem can be reproduced with the following program.
To run this program, you need AMD graphics card and the package
"rocm-opencl" installed. You must not have the package "mesa-opencl-icd"
installed, because it redirects the default OpenCL implementation to
itself.
include <stdio.h>
include <stdlib.h>
include <unistd.h>
include <string.h>
include <signal.h>
include <sys/time.h>
define CL_TARGET_OPENCL_VERSION 300
include <CL/opencl.h>
static void fn(void)
{
while (1) {
int32_t err;
cl_device_id device;
err = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
if (err != CL_SUCCESS) {
fprintf(stderr, "clGetDeviceIDs failed: %d\n", err);
exit(1);
}
write(2, "-", 1);
}
}
static void alrm(int sig)
{
write(2, ".", 1);
}
int main(void)
{
struct itimerval it;
struct sigaction sa;
memset(&sa, 0, sizeof sa);
sa.sa_handler = alrm;
sa.sa_flags = SA_RESTART;
sigaction(SIGALRM, &sa, NULL);
it.it_interval.tv_sec = 0;
it.it_interval.tv_usec = 50;
it.it_value.tv_sec = 0;
it.it_value.tv_usec = 50;
setitimer(ITIMER_REAL, &it, NULL);
fn();
return 1;
}
I'm submitting this patch for the stable kernels, because the AMD ROCm
stack fails if it receives EINTR from open (it seems to restart EINTR
from ioctl correctly). The process may receive signals at unpredictable
times, so the OpenCL implementation may fail at unpredictable times.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Link: https://lists.freedesktop.org/archives/amd-gfx/2025-November/133141.html
Link: https://yhbt.net/lore/linux-mm/6f16b618-26fc-3031-abe8-65c2090262e7@redhat.…
Cc: stable(a)vger.kernel.org
Fixes: 7906d00cd1f6 ("mmu-notifiers: add mm_take_all_locks() operation")
---
mm/vma.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: mm/mm/vma.c
===================================================================
--- mm.orig/mm/vma.c 2026-01-07 20:11:21.000000000 +0100
+++ mm/mm/vma.c 2026-01-07 20:11:21.000000000 +0100
@@ -2202,7 +2202,7 @@ int mm_take_all_locks(struct mm_struct *
out_unlock:
mm_drop_all_locks(mm);
- return -EINTR;
+ return -ERESTARTSYS;
}
static void vm_unlock_anon_vma(struct anon_vma *anon_vma)
Hi all,
This backport wires up AMD perfmon v2 so BPF and other software clients
can snapshot LBR stacks on demand, similar to the Intel support
upstream. The series keeps the LBR-freeze path branchless, adds the
perf_snapshot_branch_stack callback for AMD, and drops the
sampling-only restriction now that snapshots can be taken from software
contexts.
Leon Hwang (4):
perf/x86/amd: Ensure amd_pmu_core_disable_all() is always inlined
perf/x86/amd: Avoid taking branches before disabling LBR
perf/x86/amd: Support capturing LBR from software events
perf/x86/amd: Don't reject non-sampling events with configured LBR
arch/x86/events/amd/core.c | 37 +++++++++++++++++++++++++++++++++++-
arch/x86/events/amd/lbr.c | 13 +------------
arch/x86/events/perf_event.h | 13 +++++++++++++
3 files changed, 50 insertions(+), 13 deletions(-)
--
2.52.0
Hello stable maintainers,
Several Debian users reported a regression after updating to kernel
version 5.10.247.
Commit f0982400648a ("fbdev: Add bounds checking in bit_putcs to fix
vmalloc-out-of-bounds"), a backport of upstream commit 3637d34b35b2,
depends on vc_data::vc_font.charcount being initialised correctly.
However, before commit a1ac250a82a5 ("fbcon: Avoid using FNTCHARCNT()
and hard-coded built-in font charcount") in 5.11, this member was set
to 256 for VTs initially created with a built-in font and 0 for VTs
initially created with a user font.
Since Debian normally sets a user font before creating VTs 2 and up,
those additional VTs became unusable. VT 1 also doesn't work correctly
if the user font has > 256 characters, and the bounds check is
ineffective if it has < 256 characters.
This can be fixed by backporting the following commits from 5.11:
7a089ec7d77f console: Delete unused con_font_copy() callback implementations
259a252c1f4e console: Delete dummy con_font_set() and con_font_default() callback implementations
4ee573086bd8 Fonts: Add charcount field to font_desc
4497364e5f61 parisc/sticore: Avoid hard-coding built-in font charcount
a1ac250a82a5 fbcon: Avoid using FNTCHARCNT() and hard-coded built-in font charcount
These all apply without fuzz and builds cleanly for x86_64 and parisc64.
I tested on x86_64 that:
- VT 2 works again
- bit_putcs_aligned() is setting charcnt = 256
- After loading a font with 512 characters, bit_putcs_aligned() sets
charcnt = 512 and is able to display characters at positions >= 256
Ben.
--
Ben Hutchings
Man invented language to satisfy his deep need to complain.
- Lily Tomlin
Hi,
I would like to request backporting 5326ab737a47 ("virtio_console: fix
order of fields cols and rows") to all LTS kernels.
I'm working on QEMU patches that add virtio console size support.
Without the fix, rows and columns will be swapped.
As far as I know, there are no device implementations that use the
wrong order and would by broken by the fix.
Note: A previous version [1] of the patch contained "Cc: stable" and
"Fixes:" tags, but they seem to have been accidentally left out from
the final version.
[1]: https://lore.kernel.org/all/20250320172654.624657-1-maxbr@linux.ibm.com/
Thanks,
Filip Hejsek
Hello,
please backport
commit fbf5892df21a8ccfcb2fda0fd65bc3169c89ed28
Author: Martin Nybo Andersen <tweek(a)tweek.dk>
Date: Fri Sep 15 12:15:39 2023 +0200
kbuild: Use CRC32 and a 1MiB dictionary for XZ compressed modules
Kmod is now (since kmod commit 09c9f8c5df04 ("libkmod: Use kernel
decompression when available")) using the kernel decompressor, when
loading compressed modules.
However, the kernel XZ decompressor is XZ Embedded, which doesn't
handle CRC64 and dictionaries larger than 1MiB.
Use CRC32 and 1MiB dictionary when XZ compressing and installing
kernel modules.
to the 6.1 stable kernel, and possibly older ones as well.
The commit message actually has it all, so just my story: There's a
hardware that has or had issues with never kernels (no time to check),
my kernel for this board is usually static. But after building a kernel
with xz-compressed modules, they wouldn't load but trigger
"decompression failed with status 6". Investigation led to a CRC64 check
for these files, and eventually to the above commit.
The commit applies (with an offset), the resulting modules work as
expected.
Kernel 6.6 and newer already have that commit. Older kernels could
possibly benefit from this as well, I haven't checked.
Kind regards,
Christoph
The ov02c10 is capable of having its (crop) window shifted around with 1
pixel precision while streaming.
This allows changing the x/y window coordinates when changing flipping to
preserve the bayer-pattern.
__v4l2_ctrl_handler_setup() will now write the window coordinates at 0x3810
and 0x3812 so these can be dropped from sensor_1928x1092_30fps_setting.
Since the bayer-pattern is now unchanged, the V4L2_CTRL_FLAG_MODIFY_LAYOUT
flag can be dropped from the flip controls.
Note the original use of the V4L2_CTRL_FLAG_MODIFY_LAYOUT flag was
incomplete, besides setting the flag the driver should also have reported
a different mbus code when getting the source pad's format depending on
the hflip / vflip settings see the ov2680.c driver for example.
Fixes: b7cd2ba3f692 ("media: ov02c10: Support hflip and vflip")
Cc: stable(a)vger.kernel.org
Cc: Sebastian Reichel <sre(a)kernel.org>
Reviewed-by: Bryan O'Donoghue <bod(a)kernel.org>
Signed-off-by: Hans de Goede <johannes.goede(a)oss.qualcomm.com>
---
drivers/media/i2c/ov02c10.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/drivers/media/i2c/ov02c10.c b/drivers/media/i2c/ov02c10.c
index 6369841de88b..384c2f0b1608 100644
--- a/drivers/media/i2c/ov02c10.c
+++ b/drivers/media/i2c/ov02c10.c
@@ -165,10 +165,6 @@ static const struct reg_sequence sensor_1928x1092_30fps_setting[] = {
{0x3809, 0x88},
{0x380a, 0x04},
{0x380b, 0x44},
- {0x3810, 0x00},
- {0x3811, 0x02},
- {0x3812, 0x00},
- {0x3813, 0x01},
{0x3814, 0x01},
{0x3815, 0x01},
{0x3816, 0x01},
@@ -465,11 +461,15 @@ static int ov02c10_set_ctrl(struct v4l2_ctrl *ctrl)
break;
case V4L2_CID_HFLIP:
+ cci_write(ov02c10->regmap, OV02C10_ISP_X_WIN_CONTROL,
+ ctrl->val ? 1 : 2, &ret);
cci_update_bits(ov02c10->regmap, OV02C10_ROTATE_CONTROL,
BIT(3), ov02c10->hflip->val << 3, &ret);
break;
case V4L2_CID_VFLIP:
+ cci_write(ov02c10->regmap, OV02C10_ISP_Y_WIN_CONTROL,
+ ctrl->val ? 2 : 1, &ret);
cci_update_bits(ov02c10->regmap, OV02C10_ROTATE_CONTROL,
BIT(4), ov02c10->vflip->val << 4, &ret);
break;
@@ -551,13 +551,9 @@ static int ov02c10_init_controls(struct ov02c10 *ov02c10)
ov02c10->hflip = v4l2_ctrl_new_std(ctrl_hdlr, &ov02c10_ctrl_ops,
V4L2_CID_HFLIP, 0, 1, 1, 0);
- if (ov02c10->hflip)
- ov02c10->hflip->flags |= V4L2_CTRL_FLAG_MODIFY_LAYOUT;
ov02c10->vflip = v4l2_ctrl_new_std(ctrl_hdlr, &ov02c10_ctrl_ops,
V4L2_CID_VFLIP, 0, 1, 1, 0);
- if (ov02c10->vflip)
- ov02c10->vflip->flags |= V4L2_CTRL_FLAG_MODIFY_LAYOUT;
v4l2_ctrl_new_std_menu_items(ctrl_hdlr, &ov02c10_ctrl_ops,
V4L2_CID_TEST_PATTERN,
--
2.52.0
After commit 7346e7a058a2 ("pwm: stm32: Always do lazy disabling"),
polarity changes are ignored. Updates to the TIMx_CCER CCxP bits are
ignored by the hardware when the master output is enabled via the
TIMx_BDTR MOE bit.
Handle polarity changes by temporarily disabling the PWM when required,
in line with apply() implementations used by other PWM drivers.
Fixes: 7346e7a058a2 ("pwm: stm32: Always do lazy disabling")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Nyekjaer <sean(a)geanix.com>
---
This patch is only applicable for stable tree's <= 6.12
How to explicitly state that and what is the procedure?
---
drivers/pwm/pwm-stm32.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/pwm/pwm-stm32.c b/drivers/pwm/pwm-stm32.c
index eb24054f9729734da21eb96f2e37af03339e3440..d5f79e87a0653e1710d46e6bf9268a59638943fe 100644
--- a/drivers/pwm/pwm-stm32.c
+++ b/drivers/pwm/pwm-stm32.c
@@ -452,15 +452,23 @@ static int stm32_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
enabled = pwm->state.enabled;
+ if (state->polarity != pwm->state.polarity) {
+ if (enabled) {
+ stm32_pwm_disable(priv, pwm->hwpwm);
+ enabled = false;
+ }
+
+ ret = stm32_pwm_set_polarity(priv, pwm->hwpwm, state->polarity);
+ if (ret)
+ return ret;
+ }
+
if (!state->enabled) {
if (enabled)
stm32_pwm_disable(priv, pwm->hwpwm);
return 0;
}
- if (state->polarity != pwm->state.polarity)
- stm32_pwm_set_polarity(priv, pwm->hwpwm, state->polarity);
-
ret = stm32_pwm_config(priv, pwm->hwpwm,
state->duty_cycle, state->period);
if (ret)
---
base-commit: eb18504ca5cf1e6a76a752b73daf0ef51de3551b
change-id: 20260105-stm32-pwm-91cb843680f4
Best regards,
--
Sean Nyekjaer <sean(a)geanix.com>
There is a use-after-free error in cfg80211_shutdown_all_interfaces found
by syzkaller:
BUG: KASAN: use-after-free in cfg80211_shutdown_all_interfaces+0x213/0x220
Read of size 8 at addr ffff888112a78d98 by task kworker/0:5/5326
CPU: 0 UID: 0 PID: 5326 Comm: kworker/0:5 Not tainted 6.19.0-rc2 #2 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: events cfg80211_rfkill_block_work
Call Trace:
<TASK>
dump_stack_lvl+0x116/0x1f0
print_report+0xcd/0x630
kasan_report+0xe0/0x110
cfg80211_shutdown_all_interfaces+0x213/0x220
cfg80211_rfkill_block_work+0x1e/0x30
process_one_work+0x9cf/0x1b70
worker_thread+0x6c8/0xf10
kthread+0x3c5/0x780
ret_from_fork+0x56d/0x700
ret_from_fork_asm+0x1a/0x30
</TASK>
The problem arises due to the rfkill_block work is not cancelled when
cfg80211 device is being freed. In order to fix the issue cancel the
corresponding work before destroying rfkill in cfg80211_dev_free().
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 1f87f7d3a3b4 ("cfg80211: add rfkill support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Daniil Dulov <d.dulov(a)aladdin.ru>
---
net/wireless/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/wireless/core.c b/net/wireless/core.c
index 54a34d8d356e..e94f69205f50 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -1226,6 +1226,7 @@ void cfg80211_dev_free(struct cfg80211_registered_device *rdev)
spin_unlock_irqrestore(&rdev->wiphy_work_lock, flags);
cancel_work_sync(&rdev->wiphy_work);
+ cancel_work_sync(&rdev->rfkill_block);
rfkill_destroy(rdev->wiphy.rfkill);
list_for_each_entry_safe(reg, treg, &rdev->beacon_registrations, list) {
list_del(®->list);
--
2.34.1
When DRM_BUDDY_CONTIGUOUS_ALLOCATION is set, the requested size is
rounded up to the next power-of-two via roundup_pow_of_two().
Similarly, for non-contiguous allocations with large min_block_size,
the size is aligned up via round_up(). Both operations can produce a
rounded size that exceeds mm->size, which later triggers
BUG_ON(order > mm->max_order).
Example scenarios:
- 9G CONTIGUOUS allocation on 10G VRAM memory:
roundup_pow_of_two(9G) = 16G > 10G
- 9G allocation with 8G min_block_size on 10G VRAM memory:
round_up(9G, 8G) = 16G > 10G
Fix this by checking the rounded size against mm->size. For
non-contiguous or range allocations where size > mm->size is invalid,
return -EINVAL immediately. For contiguous allocations without range
restrictions, allow the request to fall through to the existing
__alloc_contig_try_harder() fallback.
This ensures invalid user input returns an error or uses the fallback
path instead of hitting BUG_ON.
v2: (Matt A)
- Add Fixes, Cc stable, and Closes tags for context
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6712
Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation")
Cc: <stable(a)vger.kernel.org> # v6.7+
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam(a)amd.com>
Suggested-by: Matthew Auld <matthew.auld(a)intel.com>
Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav(a)intel.com>
Reviewed-by: Matthew Auld <matthew.auld(a)intel.com>
Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam(a)amd.com>
---
drivers/gpu/drm/drm_buddy.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 2f279b46bd2c..5141348fc6c9 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -1155,6 +1155,15 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm,
order = fls(pages) - 1;
min_order = ilog2(min_block_size) - ilog2(mm->chunk_size);
+ if (order > mm->max_order || size > mm->size) {
+ if ((flags & DRM_BUDDY_CONTIGUOUS_ALLOCATION) &&
+ !(flags & DRM_BUDDY_RANGE_ALLOCATION))
+ return __alloc_contig_try_harder(mm, original_size,
+ original_min_size, blocks);
+
+ return -EINVAL;
+ }
+
do {
order = min(order, (unsigned int)fls(pages) - 1);
BUG_ON(order > mm->max_order);
--
2.52.0
Nilay reported that since commit daaa574aba6f ("powerpc/pseries/msi: Switch
to msi_create_parent_irq_domain()"), the NVMe driver cannot enable MSI-X
when the device's MSI-X table size is larger than the firmware's MSI quota
for the device.
This is because the commit changes how rtas_prepare_msi_irqs() is called:
- Before, it is called when interrupts are allocated at the global
interrupt domain with nvec_in being the number of allocated interrupts.
rtas_prepare_msi_irqs() can return a positive number and the allocation
will be retried.
- Now, it is called at the creation of per-device interrupt domain with
nvec_in being the number of interrupts that the device supports. If
rtas_prepare_msi_irqs() returns positive, domain creation just fails.
For Nilay's NVMe driver case, rtas_prepare_msi_irqs() returns a positive
number (the quota). This causes per-device interrupt domain creation to
fail and thus the NVMe driver cannot enable MSI-X.
Rework to make this scenario works again:
- pseries_msi_ops_prepare() only prepares as many interrupts as the quota
permit.
- pseries_irq_domain_alloc() fails if the device's quota is exceeded.
Now, if the quota is exceeded, pseries_msi_ops_prepare() will only prepare
as allowed by the quota. If device drivers attempt to allocate more
interrupts than the quota permits, pseries_irq_domain_alloc() will return
an error code and msi_handle_pci_fail() will allow device drivers a retry.
Reported-by: Nilay Shroff <nilay(a)linux.ibm.com>
Closes: https://lore.kernel.org/linuxppc-dev/6af2c4c2-97f6-4758-be33-256638ef39e5@l…
Fixes: daaa574aba6f ("powerpc/pseries/msi: Switch to msi_create_parent_irq_domain()")
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
Acked-by: Nilay Shroff <nilay(a)linux.ibm.com>
Cc: stable(a)vger.kernel.org
---
v2:
- change pseries_msi_ops_prepare()'s allocation logic to match the
original logic in __pci_enable_msix_range()
- fix up Nilay's email address
---
arch/powerpc/platforms/pseries/msi.c | 44 ++++++++++++++++++++++++++--
1 file changed, 41 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index a82aaa786e9e..edc30cda5dbc 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -19,6 +19,11 @@
#include "pseries.h"
+struct pseries_msi_device {
+ unsigned int msi_quota;
+ unsigned int msi_used;
+};
+
static int query_token, change_token;
#define RTAS_QUERY_FN 0
@@ -433,8 +438,28 @@ static int pseries_msi_ops_prepare(struct irq_domain *domain, struct device *dev
struct msi_domain_info *info = domain->host_data;
struct pci_dev *pdev = to_pci_dev(dev);
int type = (info->flags & MSI_FLAG_PCI_MSIX) ? PCI_CAP_ID_MSIX : PCI_CAP_ID_MSI;
+ int ret;
+
+ struct pseries_msi_device *pseries_dev __free(kfree)
+ = kmalloc(sizeof(*pseries_dev), GFP_KERNEL);
+ if (!pseries_dev)
+ return -ENOMEM;
+
+ while (1) {
+ ret = rtas_prepare_msi_irqs(pdev, nvec, type, arg);
+ if (!ret)
+ break;
+ else if (ret > 0)
+ nvec = ret;
+ else
+ return ret;
+ }
- return rtas_prepare_msi_irqs(pdev, nvec, type, arg);
+ pseries_dev->msi_quota = nvec;
+ pseries_dev->msi_used = 0;
+
+ arg->scratchpad[0].ptr = no_free_ptr(pseries_dev);
+ return 0;
}
/*
@@ -443,9 +468,13 @@ static int pseries_msi_ops_prepare(struct irq_domain *domain, struct device *dev
*/
static void pseries_msi_ops_teardown(struct irq_domain *domain, msi_alloc_info_t *arg)
{
+ struct pseries_msi_device *pseries_dev = arg->scratchpad[0].ptr;
struct pci_dev *pdev = to_pci_dev(domain->dev);
rtas_disable_msi(pdev);
+
+ WARN_ON(pseries_dev->msi_used);
+ kfree(pseries_dev);
}
static void pseries_msi_shutdown(struct irq_data *d)
@@ -546,12 +575,18 @@ static int pseries_irq_domain_alloc(struct irq_domain *domain, unsigned int virq
unsigned int nr_irqs, void *arg)
{
struct pci_controller *phb = domain->host_data;
+ struct pseries_msi_device *pseries_dev;
msi_alloc_info_t *info = arg;
struct msi_desc *desc = info->desc;
struct pci_dev *pdev = msi_desc_to_pci_dev(desc);
int hwirq;
int i, ret;
+ pseries_dev = info->scratchpad[0].ptr;
+
+ if (pseries_dev->msi_used + nr_irqs > pseries_dev->msi_quota)
+ return -ENOSPC;
+
hwirq = rtas_query_irq_number(pci_get_pdn(pdev), desc->msi_index);
if (hwirq < 0) {
dev_err(&pdev->dev, "Failed to query HW IRQ: %d\n", hwirq);
@@ -567,9 +602,10 @@ static int pseries_irq_domain_alloc(struct irq_domain *domain, unsigned int virq
goto out;
irq_domain_set_hwirq_and_chip(domain, virq + i, hwirq + i,
- &pseries_msi_irq_chip, domain->host_data);
+ &pseries_msi_irq_chip, pseries_dev);
}
+ pseries_dev->msi_used++;
return 0;
out:
@@ -582,9 +618,11 @@ static void pseries_irq_domain_free(struct irq_domain *domain, unsigned int virq
unsigned int nr_irqs)
{
struct irq_data *d = irq_domain_get_irq_data(domain, virq);
- struct pci_controller *phb = irq_data_get_irq_chip_data(d);
+ struct pseries_msi_device *pseries_dev = irq_data_get_irq_chip_data(d);
+ struct pci_controller *phb = domain->host_data;
pr_debug("%s bridge %pOF %d #%d\n", __func__, phb->dn, virq, nr_irqs);
+ pseries_dev->msi_used -= nr_irqs;
irq_domain_free_irqs_parent(domain, virq, nr_irqs);
}
--
2.51.0
From: Alexis Lothoré <alexis.lothore(a)bootlin.com>
If the ptp_rate recorded earlier in the driver happens to be 0, this
bogus value will propagate up to EST configuration, where it will
trigger a division by 0.
Prevent this division by 0 by adding the corresponding check and error
code.
Suggested-by: Maxime Chevallier <maxime.chevallier(a)bootlin.com>
Signed-off-by: Alexis Lothoré <alexis.lothore(a)bootlin.com>
Fixes: 8572aec3d0dc ("net: stmmac: Add basic EST support for XGMAC")
Link: https://patch.msgid.link/20250529-stmmac_tstamp_div-v4-2-d73340a794d5@bootl…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
[ The context change is due to the commit c3f3b97238f6
("net: stmmac: Refactor EST implementation")
which is irrelevant to the logic of this patch. ]
Signed-off-by: Rahul Sharma <black.hawk(a)163.com>
---
drivers/net/ethernet/stmicro/stmmac/dwmac5.c | 5 +++++
drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c | 5 +++++
2 files changed, 10 insertions(+)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac5.c b/drivers/net/ethernet/stmicro/stmmac/dwmac5.c
index 8fd167501fa0..0afd4644a985 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac5.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac5.c
@@ -597,6 +597,11 @@ int dwmac5_est_configure(void __iomem *ioaddr, struct stmmac_est *cfg,
int i, ret = 0x0;
u32 ctrl;
+ if (!ptp_rate) {
+ pr_warn("Dwmac5: Invalid PTP rate");
+ return -EINVAL;
+ }
+
ret |= dwmac5_est_write(ioaddr, BTR_LOW, cfg->btr[0], false);
ret |= dwmac5_est_write(ioaddr, BTR_HIGH, cfg->btr[1], false);
ret |= dwmac5_est_write(ioaddr, TER, cfg->ter, false);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
index 0bcb378fa0bc..aab02328a613 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
@@ -1537,6 +1537,11 @@ static int dwxgmac3_est_configure(void __iomem *ioaddr, struct stmmac_est *cfg,
int i, ret = 0x0;
u32 ctrl;
+ if (!ptp_rate) {
+ pr_warn("Dwxgmac2: Invalid PTP rate");
+ return -EINVAL;
+ }
+
ret |= dwxgmac3_est_write(ioaddr, XGMAC_BTR_LOW, cfg->btr[0], false);
ret |= dwxgmac3_est_write(ioaddr, XGMAC_BTR_HIGH, cfg->btr[1], false);
ret |= dwxgmac3_est_write(ioaddr, XGMAC_TER, cfg->ter, false);
--
2.34.1
This patch reverts fuse back to its original behavior of sync being a no-op.
This fixes the userspace regression reported by Athul and J. upstream in
[1][2] where if there is a bug in a fuse server that causes the server to
never complete writeback, it will make wait_sb_inodes() wait forever.
Thanks,
Joanne
[1] https://lore.kernel.org/regressions/CAJnrk1ZjQ8W8NzojsvJPRXiv9TuYPNdj8Ye7=C…
[2] https://lore.kernel.org/linux-fsdevel/aT7JRqhUvZvfUQlV@eldamar.lan/
Changelog:
v1: https://lore.kernel.org/linux-mm/20251120184211.2379439-1-joannelkoong@gmai…
* Change AS_WRITEBACK_MAY_HANG to AS_NO_DATA_INTEGRITY and keep
AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM as is.
Joanne Koong (1):
fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
fs/fs-writeback.c | 3 ++-
fs/fuse/file.c | 4 +++-
include/linux/pagemap.h | 11 +++++++++++
3 files changed, 16 insertions(+), 2 deletions(-)
--
2.47.3
From: Alex Deucher <alexander.deucher(a)amd.com>
commit eb296c09805ee37dd4ea520a7fb3ec157c31090f upstream.
SI hardware doesn't support pasids, user mode queues, or
KIQ/MES so there is no need for this. Doing so results in
a segfault as these callbacks are non-existent for SI.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4744
Fixes: f3854e04b708 ("drm/amdgpu: attach tlb fence to the PTs update")
Reviewed-by: Timur Kristóf <timur.kristof(a)gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Hans de Goede <johannes.goede(a)oss.qualcomm.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 676e24fb8864..cdcafde3c71a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1066,7 +1066,9 @@ amdgpu_vm_tlb_flush(struct amdgpu_vm_update_params *params,
}
/* Prepare a TLB flush fence to be attached to PTs */
- if (!params->unlocked) {
+ if (!params->unlocked &&
+ /* SI doesn't support pasid or KIQ/MES */
+ params->adev->family > AMDGPU_FAMILY_SI) {
amdgpu_vm_tlb_fence_create(params->adev, vm, fence);
/* Makes sure no PD/PT is freed before the flush */
--
2.52.0
From: Arnd Bergmann <arnd(a)arndb.de>
commit 36903abedfe8d419e90ce349b2b4ce6dc2883e17 upstream.
The __range_not_ok() helper is an x86 (and sparc64) specific interface
that does roughly the same thing as __access_ok(), but with different
calling conventions.
Change this to use the normal interface in order for consistency as we
clean up all access_ok() implementations.
This changes the limit from TASK_SIZE to TASK_SIZE_MAX, which Al points
out is the right thing do do here anyway.
The callers have to use __access_ok() instead of the normal access_ok()
though, because on x86 that contains a WARN_ON_IN_IRQ() check that cannot
be used inside of NMI context while tracing.
The check in copy_code() is not needed any more, because this one is
already done by copy_from_user_nmi().
Suggested-by: Al Viro <viro(a)zeniv.linux.org.uk>
Suggested-by: Christoph Hellwig <hch(a)infradead.org>
Link: https://lore.kernel.org/lkml/YgsUKcXGR7r4nINj@zeniv-ca.linux.org.uk/
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Stable-dep-of: d319f344561d ("mm: Fix copy_from_user_nofault().")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com>
---
arch/x86/events/core.c | 2 +-
arch/x86/include/asm/uaccess.h | 10 ++++++----
arch/x86/kernel/dumpstack.c | 6 ------
arch/x86/kernel/stacktrace.c | 2 +-
arch/x86/lib/usercopy.c | 2 +-
5 files changed, 9 insertions(+), 13 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 9a77c6062c24..fdf1e2aaa5ed 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2790,7 +2790,7 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
static inline int
valid_user_frame(const void __user *fp, unsigned long size)
{
- return (__range_not_ok(fp, size, TASK_SIZE) == 0);
+ return __access_ok(fp, size);
}
static unsigned long get_segment_base(unsigned int segment)
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 3616fd4ba395..66284ac86076 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -16,8 +16,10 @@
* Test whether a block of memory is a valid user space address.
* Returns 0 if the range is valid, nonzero otherwise.
*/
-static inline bool __chk_range_not_ok(unsigned long addr, unsigned long size, unsigned long limit)
+static inline bool __chk_range_not_ok(unsigned long addr, unsigned long size)
{
+ unsigned long limit = TASK_SIZE_MAX;
+
/*
* If we have used "sizeof()" for the size,
* we know it won't overflow the limit (but
@@ -35,10 +37,10 @@ static inline bool __chk_range_not_ok(unsigned long addr, unsigned long size, un
return unlikely(addr > limit);
}
-#define __range_not_ok(addr, size, limit) \
+#define __access_ok(addr, size) \
({ \
__chk_user_ptr(addr); \
- __chk_range_not_ok((unsigned long __force)(addr), size, limit); \
+ !__chk_range_not_ok((unsigned long __force)(addr), size); \
})
#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
@@ -69,7 +71,7 @@ static inline bool pagefault_disabled(void);
#define access_ok(addr, size) \
({ \
WARN_ON_IN_IRQ(); \
- likely(!__range_not_ok(addr, size, TASK_SIZE_MAX)); \
+ likely(__access_ok(addr, size)); \
})
extern int __get_user_1(void);
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 8a8660074284..bf73dbfb2355 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -81,12 +81,6 @@ static int copy_code(struct pt_regs *regs, u8 *buf, unsigned long src,
/* The user space code from other tasks cannot be accessed. */
if (regs != task_pt_regs(current))
return -EPERM;
- /*
- * Make sure userspace isn't trying to trick us into dumping kernel
- * memory by pointing the userspace instruction pointer at it.
- */
- if (__chk_range_not_ok(src, nbytes, TASK_SIZE_MAX))
- return -EINVAL;
/*
* Even if named copy_from_user_nmi() this can be invoked from
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 15b058eefc4e..ee117fcf46ed 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -90,7 +90,7 @@ copy_stack_frame(const struct stack_frame_user __user *fp,
{
int ret;
- if (__range_not_ok(fp, sizeof(*frame), TASK_SIZE))
+ if (!__access_ok(fp, sizeof(*frame)))
return 0;
ret = 1;
diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index c3e8a62ca561..ad0139d25401 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -32,7 +32,7 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
{
unsigned long ret;
- if (__range_not_ok(from, n, TASK_SIZE))
+ if (!__access_ok(from, n))
return n;
if (!nmi_uaccess_okay())
--
2.47.3
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 74098cc06e753d3ffd8398b040a3a1dfb65260c0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122918-sagging-divisible-a4a4@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 74098cc06e753d3ffd8398b040a3a1dfb65260c0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=81ukasz=20Bartosik?= <ukaszb(a)chromium.org>
Date: Thu, 27 Nov 2025 11:16:44 +0000
Subject: [PATCH] xhci: dbgtty: fix device unregister: fixup
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This fixup replaces tty_vhangup() call with call to
tty_port_tty_vhangup(). Both calls hangup tty device
synchronously however tty_port_tty_vhangup() increases
reference count during the hangup operation using
scoped_guard(tty_port_tty).
Cc: stable <stable(a)kernel.org>
Fixes: 1f73b8b56cf3 ("xhci: dbgtty: fix device unregister")
Signed-off-by: Łukasz Bartosik <ukaszb(a)chromium.org>
Link: https://patch.msgid.link/20251127111644.3161386-1-ukaszb@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/xhci-dbgtty.c b/drivers/usb/host/xhci-dbgtty.c
index 57cdda4e09c8..90282e51e23e 100644
--- a/drivers/usb/host/xhci-dbgtty.c
+++ b/drivers/usb/host/xhci-dbgtty.c
@@ -554,7 +554,7 @@ static void xhci_dbc_tty_unregister_device(struct xhci_dbc *dbc)
* Hang up the TTY. This wakes up any blocked
* writers and causes subsequent writes to fail.
*/
- tty_vhangup(port->port.tty);
+ tty_port_tty_vhangup(&port->port);
tty_unregister_device(dbc_tty_driver, port->minor);
xhci_dbc_tty_exit_port(port);
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 74098cc06e753d3ffd8398b040a3a1dfb65260c0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122917-unsheathe-breeder-0ac2@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 74098cc06e753d3ffd8398b040a3a1dfb65260c0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=81ukasz=20Bartosik?= <ukaszb(a)chromium.org>
Date: Thu, 27 Nov 2025 11:16:44 +0000
Subject: [PATCH] xhci: dbgtty: fix device unregister: fixup
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This fixup replaces tty_vhangup() call with call to
tty_port_tty_vhangup(). Both calls hangup tty device
synchronously however tty_port_tty_vhangup() increases
reference count during the hangup operation using
scoped_guard(tty_port_tty).
Cc: stable <stable(a)kernel.org>
Fixes: 1f73b8b56cf3 ("xhci: dbgtty: fix device unregister")
Signed-off-by: Łukasz Bartosik <ukaszb(a)chromium.org>
Link: https://patch.msgid.link/20251127111644.3161386-1-ukaszb@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/xhci-dbgtty.c b/drivers/usb/host/xhci-dbgtty.c
index 57cdda4e09c8..90282e51e23e 100644
--- a/drivers/usb/host/xhci-dbgtty.c
+++ b/drivers/usb/host/xhci-dbgtty.c
@@ -554,7 +554,7 @@ static void xhci_dbc_tty_unregister_device(struct xhci_dbc *dbc)
* Hang up the TTY. This wakes up any blocked
* writers and causes subsequent writes to fail.
*/
- tty_vhangup(port->port.tty);
+ tty_port_tty_vhangup(&port->port);
tty_unregister_device(dbc_tty_driver, port->minor);
xhci_dbc_tty_exit_port(port);
There is currently an issue with UEFI calibration data parsing for some
TAS devices, like the ASUS ROG Xbox Ally X (RC73XA), that causes audio
quality issues such as gaps in playback. Until the issue is root caused
and fixed, add a quirk to skip using the UEFI calibration data and fall
back to using the calibration data provided by the DSP firmware, which
restores full speaker functionality on affected devices.
Cc: stable(a)vger.kernel.org # 6.18
Link: https://lore.kernel.org/all/160aef32646c4d5498cbfd624fd683cc@ti.com/
Closes: https://lore.kernel.org/all/0ba100d0-9b6f-4a3b-bffa-61abe1b46cd5@linux.dev/
Suggested-by: Baojun Xu <baojun.xu(a)ti.com>
Signed-off-by: Matthew Schwartz <matthew.schwartz(a)linux.dev>
---
v1->v2: drop wrong Fixes tag, amend commit to clarify suspected root cause
and workaround being used.
---
sound/hda/codecs/side-codecs/tas2781_hda_i2c.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/sound/hda/codecs/side-codecs/tas2781_hda_i2c.c b/sound/hda/codecs/side-codecs/tas2781_hda_i2c.c
index c8619995b1d7..ec3761050cab 100644
--- a/sound/hda/codecs/side-codecs/tas2781_hda_i2c.c
+++ b/sound/hda/codecs/side-codecs/tas2781_hda_i2c.c
@@ -60,6 +60,7 @@ struct tas2781_hda_i2c_priv {
int (*save_calibration)(struct tas2781_hda *h);
int hda_chip_id;
+ bool skip_calibration;
};
static int tas2781_get_i2c_res(struct acpi_resource *ares, void *data)
@@ -489,7 +490,8 @@ static void tasdevice_dspfw_init(void *context)
/* If calibrated data occurs error, dsp will still works with default
* calibrated data inside algo.
*/
- hda_priv->save_calibration(tas_hda);
+ if (!hda_priv->skip_calibration)
+ hda_priv->save_calibration(tas_hda);
}
static void tasdev_fw_ready(const struct firmware *fmw, void *context)
@@ -546,6 +548,7 @@ static int tas2781_hda_bind(struct device *dev, struct device *master,
void *master_data)
{
struct tas2781_hda *tas_hda = dev_get_drvdata(dev);
+ struct tas2781_hda_i2c_priv *hda_priv = tas_hda->hda_priv;
struct hda_component_parent *parent = master_data;
struct hda_component *comp;
struct hda_codec *codec;
@@ -571,6 +574,14 @@ static int tas2781_hda_bind(struct device *dev, struct device *master,
break;
}
+ /*
+ * Using ASUS ROG Xbox Ally X (RC73XA) UEFI calibration data
+ * causes audio dropouts during playback, use fallback data
+ * from DSP firmware as a workaround.
+ */
+ if (codec->core.subsystem_id == 0x10431384)
+ hda_priv->skip_calibration = true;
+
pm_runtime_get_sync(dev);
comp->dev = dev;
--
2.52.0
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 74098cc06e753d3ffd8398b040a3a1dfb65260c0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122917-keenly-greyhound-4fa6@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 74098cc06e753d3ffd8398b040a3a1dfb65260c0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=81ukasz=20Bartosik?= <ukaszb(a)chromium.org>
Date: Thu, 27 Nov 2025 11:16:44 +0000
Subject: [PATCH] xhci: dbgtty: fix device unregister: fixup
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This fixup replaces tty_vhangup() call with call to
tty_port_tty_vhangup(). Both calls hangup tty device
synchronously however tty_port_tty_vhangup() increases
reference count during the hangup operation using
scoped_guard(tty_port_tty).
Cc: stable <stable(a)kernel.org>
Fixes: 1f73b8b56cf3 ("xhci: dbgtty: fix device unregister")
Signed-off-by: Łukasz Bartosik <ukaszb(a)chromium.org>
Link: https://patch.msgid.link/20251127111644.3161386-1-ukaszb@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/xhci-dbgtty.c b/drivers/usb/host/xhci-dbgtty.c
index 57cdda4e09c8..90282e51e23e 100644
--- a/drivers/usb/host/xhci-dbgtty.c
+++ b/drivers/usb/host/xhci-dbgtty.c
@@ -554,7 +554,7 @@ static void xhci_dbc_tty_unregister_device(struct xhci_dbc *dbc)
* Hang up the TTY. This wakes up any blocked
* writers and causes subsequent writes to fail.
*/
- tty_vhangup(port->port.tty);
+ tty_port_tty_vhangup(&port->port);
tty_unregister_device(dbc_tty_driver, port->minor);
xhci_dbc_tty_exit_port(port);
> On 19.12.25 20:31, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> cpufreq: dt-platdev: Add JH7110S SOC to the allowlist
>
> to the 6.18-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-
> queue.git;a=summary
>
> The filename of the patch is:
> cpufreq-dt-platdev-add-jh7110s-soc-to-the-allowlist.patch
> and it can be found in the queue-6.18 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree, please let
> <stable(a)vger.kernel.org> know about it.
As series [1] is accepted, this patch will be not needed and will be reverted in the mainline.
[1] https://lore.kernel.org/all/20251212211934.135602-1-e@freeshell.de/
So we should not add it to the stable tree. Thanks.
Best regards,
Hal
>
>
>
> commit f922a9b37397e7db449e3fa61c33c542ad783f87
> Author: Hal Feng <hal.feng(a)starfivetech.com>
> Date: Thu Oct 16 16:00:48 2025 +0800
>
> cpufreq: dt-platdev: Add JH7110S SOC to the allowlist
>
> [ Upstream commit 6e7970cab51d01b8f7c56f120486c571c22e1b80 ]
>
> Add the compatible strings for supporting the generic
> cpufreq driver on the StarFive JH7110S SoC.
>
> Signed-off-by: Hal Feng <hal.feng(a)starfivetech.com>
> Reviewed-by: Heinrich Schuchardt <heinrich.schuchardt(a)canonical.com>
> Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c b/drivers/cpufreq/cpufreq-
> dt-platdev.c
> index cd1816a12bb99..dc11b62399ad5 100644
> --- a/drivers/cpufreq/cpufreq-dt-platdev.c
> +++ b/drivers/cpufreq/cpufreq-dt-platdev.c
> @@ -87,6 +87,7 @@ static const struct of_device_id allowlist[] __initconst = {
> { .compatible = "st-ericsson,u9540", },
>
> { .compatible = "starfive,jh7110", },
> + { .compatible = "starfive,jh7110s", },
>
> { .compatible = "ti,omap2", },
> { .compatible = "ti,omap4", },
When simple_write_to_buffer() succeeds, it returns the number of bytes
actually copied to the buffer. The code incorrectly uses 'count'
as the index for null termination instead of the actual bytes copied.
If count exceeds the buffer size, this leads to out-of-bounds write.
Add a check for the count and use the return value as the index.
The bug was validated using a demo module that mirrors the original
code and was tested under QEMU.
Pattern of the bug:
- A fixed 64-byte stack buffer is filled using count.
- If count > 64, the code still does buf[count] = '\0', causing an
- out-of-bounds write on the stack.
Steps for reproduce:
- Opens the device node.
- Writes 128 bytes of A to it.
- This overflows the 64-byte stack buffer and KASAN reports the OOB.
Found via static analysis. This is similar to the
commit da9374819eb3 ("iio: backend: fix out-of-bound write")
Fixes: b1c5d68ea66e ("iio: dac: ad3552r-hs: add support for internal ramp")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006(a)gmail.com>
---
changes in v2:
- update commit message
- v1 link: https://lore.kernel.org/all/20251027150713.59067-1-linmq006@gmail.com/
---
drivers/iio/dac/ad3552r-hs.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/iio/dac/ad3552r-hs.c b/drivers/iio/dac/ad3552r-hs.c
index 41b96b48ba98..a9578afa7015 100644
--- a/drivers/iio/dac/ad3552r-hs.c
+++ b/drivers/iio/dac/ad3552r-hs.c
@@ -549,12 +549,15 @@ static ssize_t ad3552r_hs_write_data_source(struct file *f,
guard(mutex)(&st->lock);
+ if (count >= sizeof(buf))
+ return -ENOSPC;
+
ret = simple_write_to_buffer(buf, sizeof(buf) - 1, ppos, userbuf,
count);
if (ret < 0)
return ret;
- buf[count] = '\0';
+ buf[ret] = '\0';
ret = match_string(dbgfs_attr_source, ARRAY_SIZE(dbgfs_attr_source),
buf);
--
2.39.5 (Apple Git-154)
To fix CVE-2025-22022, commit bb0ba4cb1065 is required; however,
it depends on commit 7476a2215c07. Therefore, both patches
have been backported to v6.6.
Michal Pecio (1):
usb: xhci: Apply the link chain quirk on NEC isoc endpoints
Niklas Neronin (1):
usb: xhci: move link chain bit quirk checks into one helper function.
drivers/usb/host/xhci-mem.c | 10 ++--------
drivers/usb/host/xhci-ring.c | 8 ++------
drivers/usb/host/xhci.h | 16 ++++++++++++++--
3 files changed, 18 insertions(+), 16 deletions(-)
--
2.43.7
To fix CVE-2025-22022, commit bb0ba4cb1065 is required; however,
it depends on commit 7476a2215c07. Therefore, both patches have
been backported to v5.10-v6.1.
Michal Pecio (1):
usb: xhci: Apply the link chain quirk on NEC isoc endpoints
Niklas Neronin (1):
usb: xhci: move link chain bit quirk checks into one helper function.
drivers/usb/host/xhci-mem.c | 10 ++--------
drivers/usb/host/xhci-ring.c | 8 ++------
drivers/usb/host/xhci.h | 16 ++++++++++++++--
3 files changed, 18 insertions(+), 16 deletions(-)
--
2.43.7
The Dell XPS 13 9350 and XPS 16 9640 both have an upside-down mounted
OV02C10 sensor. This rotation of 180° is reported in neither the SSDB nor
the _PLD for the sensor (both report a rotation of 0°).
Add a DMI quirk mechanism for upside-down sensors and add 2 initial entries
to the DMI quirk list for these 2 laptops.
Note the OV02C10 driver was originally developed on a XPS 16 9640 which
resulted in inverted vflip + hflip settings making it look like the sensor
was upright on the XPS 16 9640 and upside down elsewhere this has been
fixed in commit 69fe27173396 ("media: ov02c10: Fix default vertical flip").
This makes this commit a regression fix since now the video is upside down
on these Dell XPS models where it was not before.
Fixes: d5ebe3f7d13d ("media: ov02c10: Fix default vertical flip")
Cc: stable(a)vger.kernel.org
Reviewed-by: Bryan O'Donoghue <bod(a)kernel.org>
Signed-off-by: Hans de Goede <johannes.goede(a)oss.qualcomm.com>
---
Changes in v2:
- Fix fixes tag to use the correct commit hash
- Drop || COMPILE_TEST from Kconfig to fix compile errors when ACPI is disabled
---
drivers/media/pci/intel/Kconfig | 2 +-
drivers/media/pci/intel/ipu-bridge.c | 29 ++++++++++++++++++++++++++++
2 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/drivers/media/pci/intel/Kconfig b/drivers/media/pci/intel/Kconfig
index d9fcddce028b..3f14ca110d06 100644
--- a/drivers/media/pci/intel/Kconfig
+++ b/drivers/media/pci/intel/Kconfig
@@ -6,7 +6,7 @@ source "drivers/media/pci/intel/ivsc/Kconfig"
config IPU_BRIDGE
tristate "Intel IPU Bridge"
- depends on ACPI || COMPILE_TEST
+ depends on ACPI
depends on I2C
help
The IPU bridge is a helper library for Intel IPU drivers to
diff --git a/drivers/media/pci/intel/ipu-bridge.c b/drivers/media/pci/intel/ipu-bridge.c
index 58ea01d40c0d..6463b2a47d78 100644
--- a/drivers/media/pci/intel/ipu-bridge.c
+++ b/drivers/media/pci/intel/ipu-bridge.c
@@ -5,6 +5,7 @@
#include <acpi/acpi_bus.h>
#include <linux/cleanup.h>
#include <linux/device.h>
+#include <linux/dmi.h>
#include <linux/i2c.h>
#include <linux/mei_cl_bus.h>
#include <linux/platform_device.h>
@@ -99,6 +100,28 @@ static const struct ipu_sensor_config ipu_supported_sensors[] = {
IPU_SENSOR_CONFIG("XMCC0003", 1, 321468000),
};
+/*
+ * DMI matches for laptops which have their sensor mounted upside-down
+ * without reporting a rotation of 180° in neither the SSDB nor the _PLD.
+ */
+static const struct dmi_system_id upside_down_sensor_dmi_ids[] = {
+ {
+ .matches = {
+ DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+ DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "XPS 13 9350"),
+ },
+ .driver_data = "OVTI02C1",
+ },
+ {
+ .matches = {
+ DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+ DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "XPS 16 9640"),
+ },
+ .driver_data = "OVTI02C1",
+ },
+ {} /* Terminating entry */
+};
+
static const struct ipu_property_names prop_names = {
.clock_frequency = "clock-frequency",
.rotation = "rotation",
@@ -249,6 +272,12 @@ static int ipu_bridge_read_acpi_buffer(struct acpi_device *adev, char *id,
static u32 ipu_bridge_parse_rotation(struct acpi_device *adev,
struct ipu_sensor_ssdb *ssdb)
{
+ const struct dmi_system_id *dmi_id;
+
+ dmi_id = dmi_first_match(upside_down_sensor_dmi_ids);
+ if (dmi_id && acpi_dev_hid_match(adev, dmi_id->driver_data))
+ return 180;
+
switch (ssdb->degree) {
case IPU_SENSOR_ROTATION_NORMAL:
return 0;
--
2.52.0
According to TI, there is an issue with the UEFI calibration result on
some devices where the calibration can cause audio dropouts and other
quality issues. The ASUS ROG Xbox Ally X (RC73XA) is one such device.
Skipping the UEFI calibration result and using the fallback in the DSP
firmware fixes the audio issues.
Fixes: 945865a0ddf3 ("ALSA: hda/tas2781: fix speaker id retrieval for multiple probes")
Cc: stable(a)vger.kernel.org # 6.18
Link: https://lore.kernel.org/all/160aef32646c4d5498cbfd624fd683cc@ti.com/
Closes: https://lore.kernel.org/all/0ba100d0-9b6f-4a3b-bffa-61abe1b46cd5@linux.dev/
Suggested-by: Baojun Xu <baojun.xu(a)ti.com>
Signed-off-by: Matthew Schwartz <matthew.schwartz(a)linux.dev>
---
sound/hda/codecs/side-codecs/tas2781_hda_i2c.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/sound/hda/codecs/side-codecs/tas2781_hda_i2c.c b/sound/hda/codecs/side-codecs/tas2781_hda_i2c.c
index c8619995b1d7..ec3761050cab 100644
--- a/sound/hda/codecs/side-codecs/tas2781_hda_i2c.c
+++ b/sound/hda/codecs/side-codecs/tas2781_hda_i2c.c
@@ -60,6 +60,7 @@ struct tas2781_hda_i2c_priv {
int (*save_calibration)(struct tas2781_hda *h);
int hda_chip_id;
+ bool skip_calibration;
};
static int tas2781_get_i2c_res(struct acpi_resource *ares, void *data)
@@ -489,7 +490,8 @@ static void tasdevice_dspfw_init(void *context)
/* If calibrated data occurs error, dsp will still works with default
* calibrated data inside algo.
*/
- hda_priv->save_calibration(tas_hda);
+ if (!hda_priv->skip_calibration)
+ hda_priv->save_calibration(tas_hda);
}
static void tasdev_fw_ready(const struct firmware *fmw, void *context)
@@ -546,6 +548,7 @@ static int tas2781_hda_bind(struct device *dev, struct device *master,
void *master_data)
{
struct tas2781_hda *tas_hda = dev_get_drvdata(dev);
+ struct tas2781_hda_i2c_priv *hda_priv = tas_hda->hda_priv;
struct hda_component_parent *parent = master_data;
struct hda_component *comp;
struct hda_codec *codec;
@@ -571,6 +574,14 @@ static int tas2781_hda_bind(struct device *dev, struct device *master,
break;
}
+ /*
+ * ASUS ROG Xbox Ally X (RC73XA) UEFI calibration data
+ * causes audio dropouts during playback, use fallback data
+ * from DSP firmware instead.
+ */
+ if (codec->core.subsystem_id == 0x10431384)
+ hda_priv->skip_calibration = true;
+
pm_runtime_get_sync(dev);
comp->dev = dev;
--
2.52.0
From: Zhu Yanjun <yanjun.zhu(a)linux.dev>
[ Upstream commit d0706bfd3ee40923c001c6827b786a309e2a8713 ]
Call Trace:
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:408 [inline]
print_report+0xc3/0x670 mm/kasan/report.c:521
kasan_report+0xe0/0x110 mm/kasan/report.c:634
strlen+0x93/0xa0 lib/string.c:420
__fortify_strlen include/linux/fortify-string.h:268 [inline]
get_kobj_path_length lib/kobject.c:118 [inline]
kobject_get_path+0x3f/0x2a0 lib/kobject.c:158
kobject_uevent_env+0x289/0x1870 lib/kobject_uevent.c:545
ib_register_device drivers/infiniband/core/device.c:1472 [inline]
ib_register_device+0x8cf/0xe00 drivers/infiniband/core/device.c:1393
rxe_register_device+0x275/0x320 drivers/infiniband/sw/rxe/rxe_verbs.c:1552
rxe_net_add+0x8e/0xe0 drivers/infiniband/sw/rxe/rxe_net.c:550
rxe_newlink+0x70/0x190 drivers/infiniband/sw/rxe/rxe.c:225
nldev_newlink+0x3a3/0x680 drivers/infiniband/core/nldev.c:1796
rdma_nl_rcv_msg+0x387/0x6e0 drivers/infiniband/core/netlink.c:195
rdma_nl_rcv_skb.constprop.0.isra.0+0x2e5/0x450
netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
netlink_unicast+0x53a/0x7f0 net/netlink/af_netlink.c:1339
netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1883
sock_sendmsg_nosec net/socket.c:712 [inline]
__sock_sendmsg net/socket.c:727 [inline]
____sys_sendmsg+0xa95/0xc70 net/socket.c:2566
___sys_sendmsg+0x134/0x1d0 net/socket.c:2620
__sys_sendmsg+0x16d/0x220 net/socket.c:2652
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0x260 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
This problem is similar to the problem that the
commit 1d6a9e7449e2 ("RDMA/core: Fix use-after-free when rename device name")
fixes.
The root cause is: the function ib_device_rename() renames the name with
lock. But in the function kobject_uevent(), this name is accessed without
lock protection at the same time.
The solution is to add the lock protection when this name is accessed in
the function kobject_uevent().
Fixes: 779e0bf47632 ("RDMA/core: Do not indicate device ready when device enablement fails")
Link: https://patch.msgid.link/r/20250506151008.75701-1-yanjun.zhu@linux.dev
Reported-by: syzbot+e2ce9e275ecc70a30b72(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e2ce9e275ecc70a30b72
Signed-off-by: Zhu Yanjun <yanjun.zhu(a)linux.dev>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[ Ajay: Modified to apply on v5.10.y-v6.6.y
ib_device_notify_register() not present in v5.10.y-v6.6.y,
so directly added lock for kobject_uevent() ]
Signed-off-by: Ajay Kaher <ajay.kaher(a)broadcom.com>
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
drivers/infiniband/core/device.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 26f1d2f29..ea9b48108 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1396,8 +1396,13 @@ int ib_register_device(struct ib_device *device, const char *name,
return ret;
}
dev_set_uevent_suppress(&device->dev, false);
+
+ down_read(&devices_rwsem);
+
/* Mark for userspace that device is ready */
kobject_uevent(&device->dev.kobj, KOBJ_ADD);
+
+ up_read(&devices_rwsem);
ib_device_put(device);
return 0;
--
2.40.4
Fix CVE-2023-52975 by backporting the required upstream commit
6f1d64b13097. This commit depends on a1f3486b3b09, so both patches
have been backported to the v5.10 kernel.
Mike Christie (2):
scsi: iscsi: Move pool freeing
scsi: iscsi_tcp: Fix UAF during logout when accessing the shost
ipaddress
drivers/scsi/iscsi_tcp.c | 11 +++++++++--
drivers/scsi/libiscsi.c | 39 +++++++++++++++++++++++++++++++--------
include/scsi/libiscsi.h | 2 ++
3 files changed, 42 insertions(+), 10 deletions(-)
--
2.43.7
With PWRSTS_OFF_ON, PCIe GDSCs are turned off during gdsc_disable(). This
can happen during scenarios such as system suspend and breaks the resume
of PCIe controllers from suspend.
So use PWRSTS_RET_ON to indicate the GDSC driver to not turn off the GDSCs
during gdsc_disable() and allow the hardware to transition the GDSCs to
retention when the parent domain enters low power state during system
suspend.
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru(a)oss.qualcomm.com>
---
Krishna Chaitanya Chundru (7):
clk: qcom: gcc-sc7280: Do not turn off PCIe GDSCs during gdsc_disable()
clk: qcom: gcc-sa8775p: Do not turn off PCIe GDSCs during gdsc_disable()
clk: qcom: gcc-sm8750: Do not turn off PCIe GDSCs during gdsc_disable()
clk: qcom: gcc-glymur: Do not turn off PCIe GDSCs during gdsc_disable()
clk: qcom: gcc-qcs8300: Do not turn off PCIe GDSCs during gdsc_disable()
clk: qcom: gcc-x1e80100: Do not turn off PCIe GDSCs during gdsc_disable()
clk: qcom: gcc-kaanapali: Do not turn off PCIe GDSCs during gdsc_disable()
drivers/clk/qcom/gcc-glymur.c | 16 ++++++++--------
drivers/clk/qcom/gcc-kaanapali.c | 2 +-
drivers/clk/qcom/gcc-qcs8300.c | 4 ++--
drivers/clk/qcom/gcc-sa8775p.c | 4 ++--
drivers/clk/qcom/gcc-sc7280.c | 2 +-
drivers/clk/qcom/gcc-sm8750.c | 2 +-
drivers/clk/qcom/gcc-x1e80100.c | 16 ++++++++--------
7 files changed, 23 insertions(+), 23 deletions(-)
---
base-commit: 98e506ee7d10390b527aeddee7bbeaf667129646
change-id: 20260102-pci_gdsc_fix-1dcf08223922
Best regards,
--
Krishna Chaitanya Chundru <krishna.chundru(a)oss.qualcomm.com>
Hi Greg, thanks for looking into this..
The full commit hash is 807221d3c5ff6e3c91ff57bc82a0b7a541462e20
Note: apologies if you received this multiple times, the previous one
got bounced due to html
Cheers,
JP
________________________________
From: Greg KH <gregkh(a)linuxfoundation.org>
Sent: Tuesday, December 23, 2025 6:39:22 PM
To: JP Dehollain <jpdehollain(a)gmail.com>
Cc: stable(a)vger.kernel.org <stable(a)vger.kernel.org>
Subject: Re: Request to add mainline merged patch to stable kernels
On Tue, Dec 23, 2025 at 04:05:24PM +1100, JP Dehollain wrote:
> Hello,
> I recently used the patch misc: rtsx_pci: Add separate CD/WP pin
> polarity reversal support with commit ID 807221d, to fix a bug causing
> the cardreader driver to always load sd cards in read-only mode.
> On the suggestion of the driver maintainer, I am requesting that this
> patch be applied to all stable kernel versions, as it is currently
> only applied to >=6.18.
> Thanks,
> JP
>
What is the git id of the commit you are looking to have backported?
thanks,
greg k-h
TCR2_ELx.E0POE is set during smp_init().
However, this bit is not reprogrammed when the CPU enters suspension and
later resumes via cpu_resume(), as __cpu_setup() does not re-enable E0POE
and there is no save/restore logic for the TCR2_ELx system register.
As a result, the E0POE feature no longer works after cpu_resume().
To address this, save and restore TCR2_EL1 in the cpu_suspend()/cpu_resume()
path, rather than adding related logic to __cpu_setup(), taking into account
possible future extensions of the TCR2_ELx feature.
Cc: stable(a)vger.kernel.org
Fixes: bf83dae90fbc ("arm64: enable the Permission Overlay Extension for EL0")
Signed-off-by: Yeoreum Yun <yeoreum.yun(a)arm.com>
---
Patch History
==============
from v1 to v2:
- following @Kevin Brodsky suggestion.
- https://lore.kernel.org/all/20260105200707.2071169-1-yeoreum.yun@arm.com/
NOTE:
This patch based on v6.19-rc4
---
arch/arm64/include/asm/suspend.h | 2 +-
arch/arm64/mm/proc.S | 8 ++++++++
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/suspend.h b/arch/arm64/include/asm/suspend.h
index e65f33edf9d6..e9ce68d50ba4 100644
--- a/arch/arm64/include/asm/suspend.h
+++ b/arch/arm64/include/asm/suspend.h
@@ -2,7 +2,7 @@
#ifndef __ASM_SUSPEND_H
#define __ASM_SUSPEND_H
-#define NR_CTX_REGS 13
+#define NR_CTX_REGS 14
#define NR_CALLEE_SAVED_REGS 12
/*
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 01e868116448..5d907ce3b6d3 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -110,6 +110,10 @@ SYM_FUNC_START(cpu_do_suspend)
* call stack.
*/
str x18, [x0, #96]
+alternative_if ARM64_HAS_TCR2
+ mrs x2, REG_TCR2_EL1
+ str x2, [x0, #104]
+alternative_else_nop_endif
ret
SYM_FUNC_END(cpu_do_suspend)
@@ -144,6 +148,10 @@ SYM_FUNC_START(cpu_do_resume)
msr tcr_el1, x8
msr vbar_el1, x9
msr mdscr_el1, x10
+alternative_if ARM64_HAS_TCR2
+ ldr x2, [x0, #104]
+ msr REG_TCR2_EL1, x2
+alternative_else_nop_endif
msr sctlr_el1, x12
set_this_cpu_offset x13
--
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}
This series converts all DRM bridge drivers (*) from the now deprecated
of_drm_find_bridge() to its replacement of_drm_find_and_get_bridge() which
allows correct bridge refcounting. It also converts per-driver
"next_bridge" pointers to the unified drm_bridge::next_bridge which puts
the reference automatically on bridge deallocation.
This is part of the work to support hotplug of DRM bridges. The grand plan
was discussed in [0].
Here's the work breakdown (➜ marks the current series):
1. ➜ add refcounting to DRM bridges struct drm_bridge,
based on devm_drm_bridge_alloc()
A. ✔ add new alloc API and refcounting (v6.16)
B. ✔ convert all bridge drivers to new API (v6.17)
C. ✔ kunit tests (v6.17)
D. ✔ add get/put to drm_bridge_add/remove() + attach/detach()
and warn on old allocation pattern (v6.17)
E. ➜ add get/put on drm_bridge accessors
1. ✔ drm_bridge_chain_get_first_bridge(), add cleanup action (v6.18)
2. ✔ drm_bridge_get_prev_bridge() (v6.18)
3. ✔ drm_bridge_get_next_bridge() (v6.19)
4. ✔ drm_for_each_bridge_in_chain() (v6.19)
5. ✔ drm_bridge_connector_init (v6.19)
6. … protect encoder bridge chain with a mutex
7. ➜ of_drm_find_bridge
a. ✔… add of_drm_get_bridge(), convert basic direct users
(v6.20?, one driver still pending)
b. ➜ convert direct of_drm_get_bridge() users, part 2
c. convert direct of_drm_get_bridge() users, part 3
d. convert direct of_drm_get_bridge() users, part 4
e. convert bridge-only drm_of_find_panel_or_bridge() users
8. drm_of_find_panel_or_bridge, *_of_get_bridge
9. ✔ enforce drm_bridge_add before drm_bridge_attach (v6.19)
F. ✔ debugfs improvements
1. ✔ add top-level 'bridges' file (v6.16)
2. ✔ show refcount and list lingering bridges (v6.19)
2. … handle gracefully atomic updates during bridge removal
A. ✔ Add drm_dev_enter/exit() to protect device resources (v6.20?)
B. … protect private_obj removal from list
3. … DSI host-device driver interaction
4. ✔ removing the need for the "always-disconnected" connector
5. finish the hotplug bridge work, moving code to the core and potentially
removing the hotplug-bridge itself (this needs to be clarified as
points 1-3 are developed)
[0] https://lore.kernel.org/lkml/20250206-hotplug-drm-bridge-v6-0-9d6f2c9c3058@…
This work is a continuation of the work to correctly handle bridge
refcounting for existing of_drm_find_bridge(). The ground work is in:
- commit 293a8fd7721a ("drm/bridge: add of_drm_find_and_get_bridge()")
- commit 9da0e06abda8 ("drm/bridge: deprecate of_drm_find_bridge()")
- commit 3fdeae134ba9 ("drm/bridge: add next_bridge pointer to struct drm_bridge")
The whole conversion is split in multiple series to make the review process
a bit smoother. Parts 3 and 4 are converting non-bridge drivers (mostly
encoders).
(*) One bridge driver (synopsys/dw-hdmi) is converted in another series,
together with its (non-bridge) users. Additionally this series converts
drm_of_panel_bridge_remove() which is a special case, and has a bugfix
for it too.
Signed-off-by: Luca Ceresoli <luca.ceresoli(a)bootlin.com>
---
Luca Ceresoli (12):
drm: of: drm_of_panel_bridge_remove(): fix device_node leak
drm: of: drm_of_panel_bridge_remove(): convert to of_drm_find_and_get_bridge()
drm/bridge: sii902x: convert to of_drm_find_and_get_bridge()
drm/bridge: thc63lvd1024: convert to of_drm_find_and_get_bridge()
drm/bridge: tfp410: convert to of_drm_find_and_get_bridge()
drm/bridge: tpd12s015: convert to of_drm_find_and_get_bridge()
drm/bridge: lt8912b: convert to of_drm_find_and_get_bridge()
drm/bridge: imx8mp-hdmi-pvi: convert to of_drm_find_and_get_bridge()
drm/bridge: imx8qxp-ldb: convert to of_drm_find_and_get_bridge()
drm/bridge: samsung-dsim: samsung_dsim_host_attach: use a temporary variable for the next bridge
drm/bridge: samsung-dsim: samsung_dsim_host_attach: don't use the bridge pointer as an error indicator
drm/bridge: samsung-dsim: samsung_dsim_host_attach: convert to of_drm_find_and_get_bridge()
drivers/gpu/drm/bridge/imx/imx8mp-hdmi-pvi.c | 15 +++++++-------
drivers/gpu/drm/bridge/imx/imx8qxp-ldb.c | 3 ++-
drivers/gpu/drm/bridge/lontium-lt8912b.c | 31 ++++++++++++++--------------
drivers/gpu/drm/bridge/samsung-dsim.c | 28 ++++++++++++++++---------
drivers/gpu/drm/bridge/sii902x.c | 7 +++----
drivers/gpu/drm/bridge/thc63lvd1024.c | 7 +++----
drivers/gpu/drm/bridge/ti-tfp410.c | 27 ++++++++++++------------
drivers/gpu/drm/bridge/ti-tpd12s015.c | 8 +++----
include/drm/bridge/samsung-dsim.h | 1 -
include/drm/drm_of.h | 6 +++++-
10 files changed, 69 insertions(+), 64 deletions(-)
---
base-commit: 2bcba510a612cea32b8a536eedeabd7fcb413cd0
change-id: 20251223-drm-bridge-alloc-getput-drm_of_find_bridge-2-12c6bbcb6896
Best regards,
--
Luca Ceresoli <luca.ceresoli(a)bootlin.com>
The patch titled
Subject: x86/kfence: avoid writing L1TF-vulnerable PTEs
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
x86-kfence-avoid-writing-l1tf-vulnerable-ptes.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: Andrew Cooper <andrew.cooper3(a)citrix.com>
Subject: x86/kfence: avoid writing L1TF-vulnerable PTEs
Date: Tue, 6 Jan 2026 18:04:26 +0000
For native, the choice of PTE is fine. There's real memory backing the
non-present PTE. However, for XenPV, Xen complains:
(XEN) d1 L1TF-vulnerable L1e 8010000018200066 - Shadowing
To explain, some background on XenPV pagetables:
Xen PV guests are control their own pagetables; they choose the new
PTE value, and use hypercalls to make changes so Xen can audit for
safety.
In addition to a regular reference count, Xen also maintains a type
reference count. e.g. SegDesc (referenced by vGDT/vLDT), Writable
(referenced with _PAGE_RW) or L{1..4} (referenced by vCR3 or a lower
pagetable level). This is in order to prevent e.g. a page being
inserted into the pagetables for which the guest has a writable mapping.
For non-present mappings, all other bits become software accessible,
and typically contain metadata rather a real frame address. There is
nothing that a reference count could sensibly be tied to. As such, even
if Xen could recognise the address as currently safe, nothing would
prevent that frame from changing owner to another VM in the future.
When Xen detects a PV guest writing a L1TF-PTE, it responds by
activating shadow paging. This is normally only used for the live phase
of migration, and comes with a reasonable overhead.
KFENCE only cares about getting #PF to catch wild accesses; it doesn't
care about the value for non-present mappings. Use a fully inverted PTE,
to avoid hitting the slow path when running under Xen.
While adjusting the logic, take the opportunity to skip all actions if the
PTE is already in the right state, half the number PVOps callouts, and
skip TLB maintenance on a !P -> P transition which benefits non-Xen cases
too.
Link: https://lkml.kernel.org/r/20260106180426.710013-1-andrew.cooper3@citrix.com
Fixes: 1dc0da6e9ec0 ("x86, kfence: enable KFENCE for x86")
Signed-off-by: Andrew Cooper <andrew.cooper3(a)citrix.com>
Tested-by: Marco Elver <elver(a)google.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/include/asm/kfence.h | 29 ++++++++++++++++++++++++-----
1 file changed, 24 insertions(+), 5 deletions(-)
--- a/arch/x86/include/asm/kfence.h~x86-kfence-avoid-writing-l1tf-vulnerable-ptes
+++ a/arch/x86/include/asm/kfence.h
@@ -42,10 +42,34 @@ static inline bool kfence_protect_page(u
{
unsigned int level;
pte_t *pte = lookup_address(addr, &level);
+ pteval_t val;
if (WARN_ON(!pte || level != PG_LEVEL_4K))
return false;
+ val = pte_val(*pte);
+
+ /*
+ * protect requires making the page not-present. If the PTE is
+ * already in the right state, there's nothing to do.
+ */
+ if (protect != !!(val & _PAGE_PRESENT))
+ return true;
+
+ /*
+ * Otherwise, invert the entire PTE. This avoids writing out an
+ * L1TF-vulnerable PTE (not present, without the high address bits
+ * set).
+ */
+ set_pte(pte, __pte(~val));
+
+ /*
+ * If the page was protected (non-present) and we're making it
+ * present, there is no need to flush the TLB at all.
+ */
+ if (!protect)
+ return true;
+
/*
* We need to avoid IPIs, as we may get KFENCE allocations or faults
* with interrupts disabled. Therefore, the below is best-effort, and
@@ -53,11 +77,6 @@ static inline bool kfence_protect_page(u
* lazy fault handling takes care of faults after the page is PRESENT.
*/
- if (protect)
- set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
- else
- set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
-
/*
* Flush this CPU's TLB, assuming whoever did the allocation/free is
* likely to continue running on this CPU.
_
Patches currently in -mm which might be from andrew.cooper3(a)citrix.com are
x86-kfence-avoid-writing-l1tf-vulnerable-ptes.patch
When a device is matched via PRP0001, the driver's OF (DT) match table
must be used to obtain the device match data. If a driver provides both
an acpi_match_table and an of_match_table, the current
acpi_device_get_match_data() path consults the driver's acpi_match_table
and returns NULL (no ACPI ID matches).
Explicitly detect PRP0001 and fetch match data from the driver's
of_match_table via acpi_of_device_get_match_data().
Fixes: 886ca88be6b3 ("ACPI / bus: Respect PRP0001 when retrieving device match data")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kartik Rajput <kkartik(a)nvidia.com>
---
drivers/acpi/bus.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 5e110badac7b..4cd425fffa97 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1031,8 +1031,9 @@ const void *acpi_device_get_match_data(const struct device *dev)
{
const struct acpi_device_id *acpi_ids = dev->driver->acpi_match_table;
const struct acpi_device_id *match;
+ struct acpi_device = ACPI_COMPANION(dev);
- if (!acpi_ids)
+ if (!strcmp(ACPI_DT_NAMESPACE_HID, acpi_device_hid(adev))
return acpi_of_device_get_match_data(dev);
match = acpi_match_device(acpi_ids, dev);
--
2.43.0
When the CPU frequency policy is set to CPUFREQ_POLICY_PERFORMANCE
(which occurs when EPP hint is set to "performance"), the driver
incorrectly sets the MinPerf field in CPPC request MSR to nominal_perf
instead of lowest_nonlinear_perf.
According to the AMD architectural programmer's manual volume 2 [1],
in section "17.6.4.1 CPPC_CAPABILITY_1", lowest_nonlinear_perf represents
the most energy efficient performance level (in terms of performance per
watt). The MinPerf field should be set to this value even in performance
mode to maintain proper power/performance characteristics.
This fixes a regression introduced by commit 0c411b39e4f4c ("amd-pstate: Set
min_perf to nominal_perf for active mode performance gov"), which correctly
identified that highest_perf was too high but chose nominal_perf as an
intermediate value instead of lowest_nonlinear_perf.
The fix changes amd_pstate_update_min_max_limit() to use lowest_nonlinear_perf
instead of nominal_perf when the policy is CPUFREQ_POLICY_PERFORMANCE.
[1] https://docs.amd.com/v/u/en-US/24593_3.43
AMD64 Architecture Programmer's Manual Volume 2: System Programming
Section 17.6.4.1 CPPC_CAPABILITY_1
(Referenced in commit 5d9a354cf839a)
Fixes: 0c411b39e4f4c ("amd-pstate: Set min_perf to nominal_perf for active mode performance gov")
Tested-by: Kaushik Reddy S <kaushik.reddys(a)amd.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Juan Martinez <juan.martinez(a)amd.com>
---
drivers/cpufreq/amd-pstate.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index e4f1933dd7d47..de0bb5b325502 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -634,8 +634,8 @@ static void amd_pstate_update_min_max_limit(struct cpufreq_policy *policy)
WRITE_ONCE(cpudata->max_limit_freq, policy->max);
if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
- perf.min_limit_perf = min(perf.nominal_perf, perf.max_limit_perf);
- WRITE_ONCE(cpudata->min_limit_freq, min(cpudata->nominal_freq, cpudata->max_limit_freq));
+ perf.min_limit_perf = min(perf.lowest_nonlinear_perf, perf.max_limit_perf);
+ WRITE_ONCE(cpudata->min_limit_freq, min(cpudata->lowest_nonlinear_freq, cpudata->max_limit_freq));
} else {
perf.min_limit_perf = freq_to_perf(perf, cpudata->nominal_freq, policy->min);
WRITE_ONCE(cpudata->min_limit_freq, policy->min);
--
2.34.1
The kernel test robot has reported:
BUG: spinlock trylock failure on UP on CPU#0, kcompactd0/28
lock: 0xffff888807e35ef0, .magic: dead4ead, .owner: kcompactd0/28, .owner_cpu: 0
CPU: 0 UID: 0 PID: 28 Comm: kcompactd0 Not tainted 6.18.0-rc5-00127-ga06157804399 #1 PREEMPT 8cc09ef94dcec767faa911515ce9e609c45db470
Call Trace:
<IRQ>
__dump_stack (lib/dump_stack.c:95)
dump_stack_lvl (lib/dump_stack.c:123)
dump_stack (lib/dump_stack.c:130)
spin_dump (kernel/locking/spinlock_debug.c:71)
do_raw_spin_trylock (kernel/locking/spinlock_debug.c:?)
_raw_spin_trylock (include/linux/spinlock_api_smp.h:89 kernel/locking/spinlock.c:138)
__free_frozen_pages (mm/page_alloc.c:2973)
___free_pages (mm/page_alloc.c:5295)
__free_pages (mm/page_alloc.c:5334)
tlb_remove_table_rcu (include/linux/mm.h:? include/linux/mm.h:3122 include/asm-generic/tlb.h:220 mm/mmu_gather.c:227 mm/mmu_gather.c:290)
? __cfi_tlb_remove_table_rcu (mm/mmu_gather.c:289)
? rcu_core (kernel/rcu/tree.c:?)
rcu_core (include/linux/rcupdate.h:341 kernel/rcu/tree.c:2607 kernel/rcu/tree.c:2861)
rcu_core_si (kernel/rcu/tree.c:2879)
handle_softirqs (arch/x86/include/asm/jump_label.h:36 include/trace/events/irq.h:142 kernel/softirq.c:623)
__irq_exit_rcu (arch/x86/include/asm/jump_label.h:36 kernel/softirq.c:725)
irq_exit_rcu (kernel/softirq.c:741)
sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1052)
</IRQ>
<TASK>
RIP: 0010:_raw_spin_unlock_irqrestore (arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194)
free_pcppages_bulk (mm/page_alloc.c:1494)
drain_pages_zone (include/linux/spinlock.h:391 mm/page_alloc.c:2632)
__drain_all_pages (mm/page_alloc.c:2731)
drain_all_pages (mm/page_alloc.c:2747)
kcompactd (mm/compaction.c:3115)
kthread (kernel/kthread.c:465)
? __cfi_kcompactd (mm/compaction.c:3166)
? __cfi_kthread (kernel/kthread.c:412)
ret_from_fork (arch/x86/kernel/process.c:164)
? __cfi_kthread (kernel/kthread.c:412)
ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
</TASK>
Matthew has analyzed the report and identified that in drain_page_zone()
we are in a section protected by spin_lock(&pcp->lock) and then get an
interrupt that attempts spin_trylock() on the same lock. The code is
designed to work this way without disabling IRQs and occasionally fail
the trylock with a fallback. However, the SMP=n spinlock implementation
assumes spin_trylock() will always succeed, and thus it's normally a
no-op. Here the enabled lock debugging catches the problem, but
otherwise it could cause a corruption of the pcp structure.
The problem has been introduced by commit 574907741599 ("mm/page_alloc:
leave IRQs enabled for per-cpu page allocations"). The pcp locking
scheme recognizes the need for disabling IRQs to prevent nesting
spin_trylock() sections on SMP=n, but the need to prevent the nesting in
spin_lock() has not been recognized. Fix it by introducing local
wrappers that change the spin_lock() to spin_lock_iqsave() with SMP=n
and use them in all places that do spin_lock(&pcp->lock).
Fixes: 574907741599 ("mm/page_alloc: leave IRQs enabled for per-cpu page allocations")
Cc: stable(a)vger.kernel.org
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Closes: https://lore.kernel.org/oe-lkp/202512101320.e2f2dd6f-lkp@intel.com
Analyzed-by: Matthew Wilcox <willy(a)infradead.org>
Link: https://lore.kernel.org/all/aUW05pyc9nZkvY-1@casper.infradead.org/
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
---
This fix is intentionally made self-contained and not trying to expand
upon the existing pcp[u]_spin() helpers. This is to make stable
backports easier due to recent cleanups to that helpers.
We could follow up with a proper helpers integration going forward.
However I think the assumptions SMP=n of the spinlock UP implementation
are just wrong. It should be valid to do a spin_lock() without disabling
irq's and rely on a nested spin_trylock() to fail. I will thus try
proposing the remove the UP implementation first. It should be within
the current trend of removing stuff that's optimized for a minority
configuration if it makes maintainability of the majority worse.
(c.f. recent scheduler SMP=n removal)
---
mm/page_alloc.c | 45 +++++++++++++++++++++++++++++++++++++--------
1 file changed, 37 insertions(+), 8 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 822e05f1a964..ec3551d56cde 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -167,6 +167,31 @@ static inline void __pcp_trylock_noop(unsigned long *flags) { }
pcp_trylock_finish(UP_flags); \
})
+/*
+ * With the UP spinlock implementation, when we spin_lock(&pcp->lock) (for i.e.
+ * a potentially remote cpu drain) and get interrupted by an operation that
+ * attempts pcp_spin_trylock(), we can't rely on the trylock failure due to UP
+ * spinlock assumptions making the trylock a no-op. So we have to turn that
+ * spin_lock() to a spin_lock_irqsave(). This works because on UP there are no
+ * remote cpu's so we can only be locking the only existing local one.
+ */
+#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
+static inline void __flags_noop(unsigned long *flags) { }
+#define spin_lock_maybe_irqsave(lock, flags) \
+({ \
+ __flags_noop(&(flags)); \
+ spin_lock(lock); \
+})
+#define spin_unlock_maybe_irqrestore(lock, flags) \
+({ \
+ spin_unlock(lock); \
+ __flags_noop(&(flags)); \
+})
+#else
+#define spin_lock_maybe_irqsave(lock, flags) spin_lock_irqsave(lock, flags)
+#define spin_unlock_maybe_irqrestore(lock, flags) spin_unlock_irqrestore(lock, flags)
+#endif
+
#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
DEFINE_PER_CPU(int, numa_node);
EXPORT_PER_CPU_SYMBOL(numa_node);
@@ -2556,6 +2581,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
bool decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
{
int high_min, to_drain, to_drain_batched, batch;
+ unsigned long UP_flags;
bool todo = false;
high_min = READ_ONCE(pcp->high_min);
@@ -2575,9 +2601,9 @@ bool decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
to_drain = pcp->count - pcp->high;
while (to_drain > 0) {
to_drain_batched = min(to_drain, batch);
- spin_lock(&pcp->lock);
+ spin_lock_maybe_irqsave(&pcp->lock, UP_flags);
free_pcppages_bulk(zone, to_drain_batched, pcp, 0);
- spin_unlock(&pcp->lock);
+ spin_unlock_maybe_irqrestore(&pcp->lock, UP_flags);
todo = true;
to_drain -= to_drain_batched;
@@ -2594,14 +2620,15 @@ bool decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
*/
void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
{
+ unsigned long UP_flags;
int to_drain, batch;
batch = READ_ONCE(pcp->batch);
to_drain = min(pcp->count, batch);
if (to_drain > 0) {
- spin_lock(&pcp->lock);
+ spin_lock_maybe_irqsave(&pcp->lock, UP_flags);
free_pcppages_bulk(zone, to_drain, pcp, 0);
- spin_unlock(&pcp->lock);
+ spin_unlock_maybe_irqrestore(&pcp->lock, UP_flags);
}
}
#endif
@@ -2612,10 +2639,11 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
static void drain_pages_zone(unsigned int cpu, struct zone *zone)
{
struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
+ unsigned long UP_flags;
int count;
do {
- spin_lock(&pcp->lock);
+ spin_lock_maybe_irqsave(&pcp->lock, UP_flags);
count = pcp->count;
if (count) {
int to_drain = min(count,
@@ -2624,7 +2652,7 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone)
free_pcppages_bulk(zone, to_drain, pcp, 0);
count -= to_drain;
}
- spin_unlock(&pcp->lock);
+ spin_unlock_maybe_irqrestore(&pcp->lock, UP_flags);
} while (count);
}
@@ -6109,6 +6137,7 @@ static void zone_pcp_update_cacheinfo(struct zone *zone, unsigned int cpu)
{
struct per_cpu_pages *pcp;
struct cpu_cacheinfo *cci;
+ unsigned long UP_flags;
pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
cci = get_cpu_cacheinfo(cpu);
@@ -6119,12 +6148,12 @@ static void zone_pcp_update_cacheinfo(struct zone *zone, unsigned int cpu)
* This can reduce zone lock contention without hurting
* cache-hot pages sharing.
*/
- spin_lock(&pcp->lock);
+ spin_lock_maybe_irqsave(&pcp->lock, UP_flags);
if ((cci->per_cpu_data_slice_size >> PAGE_SHIFT) > 3 * pcp->batch)
pcp->flags |= PCPF_FREE_HIGH_BATCH;
else
pcp->flags &= ~PCPF_FREE_HIGH_BATCH;
- spin_unlock(&pcp->lock);
+ spin_unlock_maybe_irqrestore(&pcp->lock, UP_flags);
}
void setup_pcp_cacheinfo(unsigned int cpu)
---
base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
change-id: 20260105-fix-pcp-up-3c88c09752ec
Best regards,
--
Vlastimil Babka <vbabka(a)suse.cz>
CephFS stores file data across multiple RADOS objects. An object is the
atomic unit of storage, so the writeback code must clean only folios
that belong to the same object with each OSD request.
CephFS also supports RAID0-style striping of file contents: if enabled,
each object stores multiple unbroken "stripe units" covering different
portions of the file; if disabled, a "stripe unit" is simply the whole
object. The stripe unit is (usually) reported as the inode's block size.
Though the writeback logic could, in principle, lock all dirty folios
belonging to the same object, its current design is to lock only a
single stripe unit at a time. Ever since this code was first written,
it has determined this size by checking the inode's block size.
However, the relatively-new fscrypt support needed to reduce the block
size for encrypted inodes to the crypto block size (see 'fixes' commit),
which causes an unnecessarily high number of write operations (~1024x as
many, with 4MiB objects) and correspondingly degraded performance.
Fix this (and clarify intent) by using i_layout.stripe_unit directly in
ceph_define_write_size() so that encrypted inodes are written back with
the same number of operations as if they were unencrypted.
Fixes: 94af0470924c ("ceph: add some fscrypt guardrails")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sam Edwards <CFSworks(a)gmail.com>
---
fs/ceph/addr.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index f2db05b51a3b..b97a6120d4b9 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1000,7 +1000,8 @@ unsigned int ceph_define_write_size(struct address_space *mapping)
{
struct inode *inode = mapping->host;
struct ceph_fs_client *fsc = ceph_inode_to_fs_client(inode);
- unsigned int wsize = i_blocksize(inode);
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ unsigned int wsize = ci->i_layout.stripe_unit;
if (fsc->mount_options->wsize < wsize)
wsize = fsc->mount_options->wsize;
--
2.51.2
If `locked_pages` is zero, the page array must not be allocated:
ceph_process_folio_batch() uses `locked_pages` to decide when to
allocate `pages`, and redundant allocations trigger
ceph_allocate_page_array()'s BUG_ON(), resulting in a worker oops (and
writeback stall) or even a kernel panic. Consequently, the main loop in
ceph_writepages_start() assumes that the lifetime of `pages` is confined
to a single iteration.
The ceph_submit_write() function claims ownership of the page array on
success. But failures only redirty/unlock the pages and fail to free the
array, making the failure case in ceph_submit_write() fatal.
Free the page array (and reset locked_pages) in ceph_submit_write()'s
error-handling 'if' block so that the caller's invariant (that the array
does not outlive the iteration) is maintained unconditionally, making
failures in ceph_submit_write() recoverable as originally intended.
Fixes: 1551ec61dc55 ("ceph: introduce ceph_submit_write() method")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sam Edwards <CFSworks(a)gmail.com>
---
fs/ceph/addr.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 2b722916fb9b..467aa7242b49 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1466,6 +1466,14 @@ int ceph_submit_write(struct address_space *mapping,
unlock_page(page);
}
+ if (ceph_wbc->from_pool) {
+ mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
+ ceph_wbc->from_pool = false;
+ } else
+ kfree(ceph_wbc->pages);
+ ceph_wbc->pages = NULL;
+ ceph_wbc->locked_pages = 0;
+
ceph_osdc_put_request(req);
return -EIO;
}
--
2.51.2
Greetings!
This is the mlmmj program managing the <stable(a)vger.kernel.org> mailing
list.
Your mail host rejected some of the messages we tried to deliver.
This usually happens if we have accepted a spam message, but your mail
server refused to take it from us (because it's spam).
This automated probe is just a test to make sure that there are no other
problems with mail delivery to your account. No action is required on
your part -- if you are seeing this message, that means everything is
normal.
(In fact, you can add "Subject: Bounce probe" to your filters to avoid
seeing this message in the future.)
For internal tracking purposes, here is the list of bounced messages:
- 206207
Hi,
This Linux kernel patch series introduces support for error recovery for
passthrough PCI devices on System Z (s390x).
Background
----------
For PCI devices on s390x an operating system receives platform specific
error events from firmware rather than through AER.Today for
passthrough/userspace devices, we don't attempt any error recovery and
ignore any error events for the devices. The passthrough/userspace devices
are managed by the vfio-pci driver. The driver does register error handling
callbacks (error_detected), and on an error trigger an eventfd to
userspace. But we need a mechanism to notify userspace
(QEMU/guest/userspace drivers) about the error event.
Proposal
--------
We can expose this error information (currently only the PCI Error Code)
via a device feature. Userspace can then obtain the error information
via VFIO_DEVICE_FEATURE ioctl and take appropriate actions such as driving
a device reset.
This is how a typical flow for passthrough devices to a VM would work:
For passthrough devices to a VM, the driver bound to the device on the host
is vfio-pci. vfio-pci driver does support the error_detected() callback
(vfio_pci_core_aer_err_detected()), and on an PCI error s390x recovery
code on the host will call the vfio-pci error_detected() callback. The
vfio-pci error_detected() callback will notify userspace/QEMU via an
eventfd, and return PCI_ERS_RESULT_CAN_RECOVER. At this point the s390x
error recovery on the host will skip any further action(see patch 6) and
let userspace drive the error recovery.
Once userspace/QEMU is notified, it then injects this error into the VM
so device drivers in the VM can take recovery actions. For example for a
passthrough NVMe device, the VM's OS NVMe driver will access the device.
At this point the VM's NVMe driver's error_detected() will drive the
recovery by returning PCI_ERS_RESULT_NEED_RESET, and the s390x error
recovery in the VM's OS will try to do a reset. Resets are privileged
operations and so the VM will need intervention from QEMU to perform the
reset. QEMU will invoke the VFIO_DEVICE_RESET ioctl to now notify the
host that the VM is requesting a reset of the device. The vfio-pci driver
on the host will then perform the reset on the device to recover it.
Thanks
Farhan
ChangeLog
---------
v6 series https://lore.kernel.org/all/2c609e61-1861-4bf3-b019-a11c137d26a5@linux.ibm.…
v6 -> v7
- Rebase on 6.19-rc4
- Update commit message based on Niklas's suggestion (patch 3).
v5 series https://lore.kernel.org/all/20251113183502.2388-1-alifm@linux.ibm.com/
v5 -> v6
- Rebase on 6.18 + Lukas's PCI: Universal error recoverability of
devices series (https://lore.kernel.org/all/cover.1763483367.git.lukas@wunner.de/)
- Re-work config space accessibility check to pci_dev_save_and_disable() (patch 3).
This avoids saving the config space, in the reset path, if the device's config space is
corrupted or inaccessible.
v4 series https://lore.kernel.org/all/20250924171628.826-1-alifm@linux.ibm.com/
v4 -> v5
- Rebase on 6.18-rc5
- Move bug fixes to the beginning of the series (patch 1 and 2). These patches
were posted as a separate fixes series
https://lore.kernel.org/all/a14936ac-47d6-461b-816f-0fd66f869b0f@linux.ibm.…
- Add matching pci_put_dev() for pci_get_slot() (patch 6).
v3 series https://lore.kernel.org/all/20250911183307.1910-1-alifm@linux.ibm.com/
v3 -> v4
- Remove warn messages for each PCI capability not restored (patch 1)
- Check PCI_COMMAND and PCI_STATUS register for error value instead of device id
(patch 1)
- Fix kernel crash in patch 3
- Added reviewed by tags
- Address comments from Niklas's (patches 4, 5, 7)
- Fix compilation error non s390x system (patch 8)
- Explicitly align struct vfio_device_feature_zpci_err (patch 8)
v2 series https://lore.kernel.org/all/20250825171226.1602-1-alifm@linux.ibm.com/
v2 -> v3
- Patch 1 avoids saving any config space state if the device is in error
(suggested by Alex)
- Patch 2 adds additional check only for FLR reset to try other function
reset method (suggested by Alex).
- Patch 3 fixes a bug in s390 for resetting PCI devices with multiple
functions. Creates a new flag pci_slot to allow per function slot.
- Patch 4 fixes a bug in s390 for resource to bus address translation.
- Rebase on 6.17-rc5
v1 series https://lore.kernel.org/all/20250813170821.1115-1-alifm@linux.ibm.com/
v1 - > v2
- Patches 1 and 2 adds some additional checks for FLR/PM reset to
try other function reset method (suggested by Alex).
- Patch 3 fixes a bug in s390 for resetting PCI devices with multiple
functions.
- Patch 7 adds a new device feature for zPCI devices for the VFIO_DEVICE_FEATURE
ioctl. The ioctl is used by userspace to retriece any PCI error
information for the device (suggested by Alex).
- Patch 8 adds a reset_done() callback for the vfio-pci driver, to
restore the state of the device after a reset.
- Patch 9 removes the pcie check for triggering VFIO_PCI_ERR_IRQ_INDEX.
Farhan Ali (9):
PCI: Allow per function PCI slots
s390/pci: Add architecture specific resource/bus address translation
PCI: Avoid saving config space state if inaccessible
PCI: Add additional checks for flr reset
s390/pci: Update the logic for detecting passthrough device
s390/pci: Store PCI error information for passthrough devices
vfio-pci/zdev: Add a device feature for error information
vfio: Add a reset_done callback for vfio-pci driver
vfio: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX
arch/s390/include/asm/pci.h | 29 ++++++++
arch/s390/pci/pci.c | 75 +++++++++++++++++++++
arch/s390/pci/pci_event.c | 107 +++++++++++++++++-------------
drivers/pci/host-bridge.c | 4 +-
drivers/pci/pci.c | 19 +++++-
drivers/pci/slot.c | 25 ++++++-
drivers/vfio/pci/vfio_pci_core.c | 20 ++++--
drivers/vfio/pci/vfio_pci_intrs.c | 3 +-
drivers/vfio/pci/vfio_pci_priv.h | 9 +++
drivers/vfio/pci/vfio_pci_zdev.c | 45 ++++++++++++-
include/linux/pci.h | 1 +
include/uapi/linux/vfio.h | 16 +++++
12 files changed, 292 insertions(+), 61 deletions(-)
--
2.43.0
On s390 systems, which use a machine level hypervisor, PCI devices are
always accessed through a form of PCI pass-through which fundamentally
operates on a per PCI function granularity. This is also reflected in the
s390 PCI hotplug driver which creates hotplug slots for individual PCI
functions. Its reset_slot() function, which is a wrapper for
zpci_hot_reset_device(), thus also resets individual functions.
Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object
to multifunction devices. This approach worked fine on s390 systems that
only exposed virtual functions as individual PCI domains to the operating
system. Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
s390 supports exposing the topology of multifunction PCI devices by
grouping them in a shared PCI domain. When attempting to reset a function
through the hotplug driver, the shared slot assignment causes the wrong
function to be reset instead of the intended one. It also leaks memory as
we do create a pci_slot object for the function, but don't correctly free
it in pci_slot_release().
Add a flag for struct pci_slot to allow per function PCI slots for
functions managed through a hypervisor, which exposes individual PCI
functions while retaining the topology.
Fixes: 44510d6fa0c0 ("s390/pci: Handling multifunctions")
Cc: stable(a)vger.kernel.org
Suggested-by: Niklas Schnelle <schnelle(a)linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle(a)linux.ibm.com>
Signed-off-by: Farhan Ali <alifm(a)linux.ibm.com>
---
drivers/pci/pci.c | 5 +++--
drivers/pci/slot.c | 25 ++++++++++++++++++++++---
include/linux/pci.h | 1 +
3 files changed, 26 insertions(+), 5 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 13dbb405dc31..c105e285cff8 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4832,8 +4832,9 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, bool probe)
static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
{
- if (dev->multifunction || dev->subordinate || !dev->slot ||
- dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
+ if (dev->subordinate || !dev->slot ||
+ dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET ||
+ (dev->multifunction && !dev->slot->per_func_slot))
return -ENOTTY;
return pci_reset_hotplug_slot(dev->slot->hotplug, probe);
diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
index 50fb3eb595fe..ed10fa3ae727 100644
--- a/drivers/pci/slot.c
+++ b/drivers/pci/slot.c
@@ -63,6 +63,22 @@ static ssize_t cur_speed_read_file(struct pci_slot *slot, char *buf)
return bus_speed_read(slot->bus->cur_bus_speed, buf);
}
+static bool pci_dev_matches_slot(struct pci_dev *dev, struct pci_slot *slot)
+{
+ if (slot->per_func_slot)
+ return dev->devfn == slot->number;
+
+ return PCI_SLOT(dev->devfn) == slot->number;
+}
+
+static bool pci_slot_enabled_per_func(void)
+{
+ if (IS_ENABLED(CONFIG_S390))
+ return true;
+
+ return false;
+}
+
static void pci_slot_release(struct kobject *kobj)
{
struct pci_dev *dev;
@@ -73,7 +89,7 @@ static void pci_slot_release(struct kobject *kobj)
down_read(&pci_bus_sem);
list_for_each_entry(dev, &slot->bus->devices, bus_list)
- if (PCI_SLOT(dev->devfn) == slot->number)
+ if (pci_dev_matches_slot(dev, slot))
dev->slot = NULL;
up_read(&pci_bus_sem);
@@ -166,7 +182,7 @@ void pci_dev_assign_slot(struct pci_dev *dev)
mutex_lock(&pci_slot_mutex);
list_for_each_entry(slot, &dev->bus->slots, list)
- if (PCI_SLOT(dev->devfn) == slot->number)
+ if (pci_dev_matches_slot(dev, slot))
dev->slot = slot;
mutex_unlock(&pci_slot_mutex);
}
@@ -265,6 +281,9 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
slot->bus = pci_bus_get(parent);
slot->number = slot_nr;
+ if (pci_slot_enabled_per_func())
+ slot->per_func_slot = 1;
+
slot->kobj.kset = pci_slots_kset;
slot_name = make_slot_name(name);
@@ -285,7 +304,7 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
down_read(&pci_bus_sem);
list_for_each_entry(dev, &parent->devices, bus_list)
- if (PCI_SLOT(dev->devfn) == slot_nr)
+ if (pci_dev_matches_slot(dev, slot))
dev->slot = slot;
up_read(&pci_bus_sem);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 864775651c6f..08fa57beb7b2 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -78,6 +78,7 @@ struct pci_slot {
struct list_head list; /* Node in list of slots */
struct hotplug_slot *hotplug; /* Hotplug info (move here) */
unsigned char number; /* PCI_SLOT(pci_dev->devfn) */
+ unsigned int per_func_slot:1; /* Allow per function slot */
struct kobject kobj;
};
--
2.43.0
In vmw_compat_shader_add(), the return value check of vmw_shader_alloc()
is not proper. Modify the check for the return pointer 'res'.
Found by code review and compiled on ubuntu 20.04.
Fixes: 18e4a4669c50 ("drm/vmwgfx: Fix compat shader namespace")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haoxiang Li <lihaoxiang(a)isrc.iscas.ac.cn>
---
drivers/gpu/drm/vmwgfx/vmwgfx_shader.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c b/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
index 69dfe69ce0f8..7ed938710342 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
@@ -923,8 +923,10 @@ int vmw_compat_shader_add(struct vmw_private *dev_priv,
ttm_bo_unreserve(&buf->tbo);
res = vmw_shader_alloc(dev_priv, buf, size, 0, shader_type);
- if (unlikely(ret != 0))
+ if (IS_ERR(res)) {
+ ret = PTR_ERR(res);
goto no_reserve;
+ }
ret = vmw_cmdbuf_res_add(man, vmw_cmdbuf_res_shader,
vmw_shader_key(user_key, shader_type),
--
2.25.1
The Task::group_leader() method currently allows you to access the
group_leader() of any task, for example one you hold a refcount to. But
this is not safe in general since the group leader could change when a
task exits. See for example commit a15f37a40145c ("kernel/sys.c: fix the
racy usage of task_lock(tsk->group_leader) in sys_prlimit64() paths").
All existing users of Task::group_leader() call this method on current,
which is guaranteed running, so there's not an actual issue in Rust code
today. But to prevent code in the future from making this mistake,
restrict Task::group_leader() so that it can only be called on current.
There are some other cases where accessing task->group_leader is okay.
For example it can be safe if you hold tasklist_lock or rcu_read_lock().
However, only supporting current->group_leader is sufficient for all
in-tree Rust users of group_leader right now. Safe Rust functionality
for accessing it under rcu or while holding tasklist_lock may be added
in the future if required by any future Rust module.
Reported-by: Oleg Nesterov <oleg(a)redhat.com>
Closes: https://lore.kernel.org/all/aTLnV-5jlgfk1aRK@redhat.com/
Fixes: 313c4281bc9d ("rust: add basic `Task`")
Cc: stable(a)vger.kernel.org
Signed-off-by: Alice Ryhl <aliceryhl(a)google.com>
---
The rust/kernel/task.rs file has had changes land through a few
different trees:
* Originally task.rs landed through Christian's tree together with
file.rs and pid_namespace.rs
* The change to add CurrentTask landed through Andrew Morton's tree
together with mm.rs
* There was a patch to mark some methods #[inline] that landed through
tip via Boqun.
I don't think there's a clear owner for this file, so to break ambiguity
I'm doing to declare that this patch is intended for Andrew Morton's
tree. Please let me know if you think a different tree is appropriate.
---
rust/kernel/task.rs | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
index 49fad6de06740a9b9ad80b2f4b430cc28cd134fa..9440692a3a6d0d3f908d61d51dcd377a272f6957 100644
--- a/rust/kernel/task.rs
+++ b/rust/kernel/task.rs
@@ -204,18 +204,6 @@ pub fn as_ptr(&self) -> *mut bindings::task_struct {
self.0.get()
}
- /// Returns the group leader of the given task.
- pub fn group_leader(&self) -> &Task {
- // SAFETY: The group leader of a task never changes after initialization, so reading this
- // field is not a data race.
- let ptr = unsafe { *ptr::addr_of!((*self.as_ptr()).group_leader) };
-
- // SAFETY: The lifetime of the returned task reference is tied to the lifetime of `self`,
- // and given that a task has a reference to its group leader, we know it must be valid for
- // the lifetime of the returned task reference.
- unsafe { &*ptr.cast() }
- }
-
/// Returns the PID of the given task.
pub fn pid(&self) -> Pid {
// SAFETY: The pid of a task never changes after initialization, so reading this field is
@@ -345,6 +333,18 @@ pub fn active_pid_ns(&self) -> Option<&PidNamespace> {
// `release_task()` call.
Some(unsafe { PidNamespace::from_ptr(active_ns) })
}
+
+ /// Returns the group leader of the current task.
+ pub fn group_leader(&self) -> &Task {
+ // SAFETY: The group leader of a task never changes while the task is running, and `self`
+ // is the current task, which is guaranteed running.
+ let ptr = unsafe { (*self.as_ptr()).group_leader };
+
+ // SAFETY: `current.group_leader` stays valid for at least the duration in which `current`
+ // is running, and the signature of this function ensures that the returned `&Task` can
+ // only be used while `current` is still valid, thus still running.
+ unsafe { &*ptr.cast() }
+ }
}
// SAFETY: The type invariants guarantee that `Task` is always refcounted.
---
base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
change-id: 20251218-task-group-leader-a71931ced643
Best regards,
--
Alice Ryhl <aliceryhl(a)google.com>
commit ed3ba9b6e280e14cc3148c1b226ba453f02fa76c upstream.
SIOCBRDELIF is passed to dev_ioctl() first and later forwarded to
br_ioctl_call(), which causes unnecessary RTNL dance and the splat
below [0] under RTNL pressure.
Let's say Thread A is trying to detach a device from a bridge and
Thread B is trying to remove the bridge.
In dev_ioctl(), Thread A bumps the bridge device's refcnt by
netdev_hold() and releases RTNL because the following br_ioctl_call()
also re-acquires RTNL.
In the race window, Thread B could acquire RTNL and try to remove
the bridge device. Then, rtnl_unlock() by Thread B will release RTNL
and wait for netdev_put() by Thread A.
Thread A, however, must hold RTNL after the unlock in dev_ifsioc(),
which may take long under RTNL pressure, resulting in the splat by
Thread B.
Thread A (SIOCBRDELIF) Thread B (SIOCBRDELBR)
---------------------- ----------------------
sock_ioctl sock_ioctl
`- sock_do_ioctl `- br_ioctl_call
`- dev_ioctl `- br_ioctl_stub
|- rtnl_lock |
|- dev_ifsioc '
' |- dev = __dev_get_by_name(...)
|- netdev_hold(dev, ...) .
/ |- rtnl_unlock ------. |
| |- br_ioctl_call `---> |- rtnl_lock
Race | | `- br_ioctl_stub |- br_del_bridge
Window | | | |- dev = __dev_get_by_name(...)
| | | May take long | `- br_dev_delete(dev, ...)
| | | under RTNL pressure | `- unregister_netdevice_queue(dev, ...)
| | | | `- rtnl_unlock
\ | |- rtnl_lock <-' `- netdev_run_todo
| |- ... `- netdev_run_todo
| `- rtnl_unlock |- __rtnl_unlock
| |- netdev_wait_allrefs_any
|- netdev_put(dev, ...) <----------------'
Wait refcnt decrement
and log splat below
To avoid blocking SIOCBRDELBR unnecessarily, let's not call
dev_ioctl() for SIOCBRADDIF and SIOCBRDELIF.
In the dev_ioctl() path, we do the following:
1. Copy struct ifreq by get_user_ifreq in sock_do_ioctl()
2. Check CAP_NET_ADMIN in dev_ioctl()
3. Call dev_load() in dev_ioctl()
4. Fetch the master dev from ifr.ifr_name in dev_ifsioc()
3. can be done by request_module() in br_ioctl_call(), so we move
1., 2., and 4. to br_ioctl_stub().
Note that 2. is also checked later in add_del_if(), but it's better
performed before RTNL.
SIOCBRADDIF and SIOCBRDELIF have been processed in dev_ioctl() since
the pre-git era, and there seems to be no specific reason to process
them there.
[0]:
unregister_netdevice: waiting for wpan3 to become free. Usage count = 2
ref_tracker: wpan3@ffff8880662d8608 has 1/1 users at
__netdev_tracker_alloc include/linux/netdevice.h:4282 [inline]
netdev_hold include/linux/netdevice.h:4311 [inline]
dev_ifsioc+0xc6a/0x1160 net/core/dev_ioctl.c:624
dev_ioctl+0x255/0x10c0 net/core/dev_ioctl.c:826
sock_do_ioctl+0x1ca/0x260 net/socket.c:1213
sock_ioctl+0x23a/0x6c0 net/socket.c:1318
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:906 [inline]
__se_sys_ioctl fs/ioctl.c:892 [inline]
__x64_sys_ioctl+0x1a4/0x210 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcb/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: 893b19587534 ("net: bridge: fix ioctl locking")
Reported-by: syzkaller <syzkaller(a)googlegroups.com>
Reported-by: yan kang <kangyan91(a)outlook.com>
Reported-by: yue sun <samsun1006219(a)gmail.com>
Closes: https://lore.kernel.org/netdev/SY8P300MB0421225D54EB92762AE8F0F2A1D32@SY8P3…
Signed-off-by: Kuniyuki Iwashima <kuniyu(a)amazon.com>
Acked-by: Stanislav Fomichev <sdf(a)fomichev.me>
Reviewed-by: Ido Schimmel <idosch(a)nvidia.com>
Acked-by: Nikolay Aleksandrov <razor(a)blackwall.org>
Link: https://patch.msgid.link/20250316192851.19781-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
[cascardo: fixed conflict at dev_ifsioc]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com>
---
include/linux/if_bridge.h | 6 ++----
net/bridge/br_ioctl.c | 36 +++++++++++++++++++++++++++++++++---
net/bridge/br_private.h | 3 +--
net/core/dev_ioctl.c | 15 ---------------
net/socket.c | 19 +++++++++----------
5 files changed, 45 insertions(+), 34 deletions(-)
diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 509e18c7e740..37082feae336 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -62,11 +62,9 @@ struct br_ip_list {
#define BR_DEFAULT_AGEING_TIME (300 * HZ)
struct net_bridge;
-void brioctl_set(int (*hook)(struct net *net, struct net_bridge *br,
- unsigned int cmd, struct ifreq *ifr,
+void brioctl_set(int (*hook)(struct net *net, unsigned int cmd,
void __user *uarg));
-int br_ioctl_call(struct net *net, struct net_bridge *br, unsigned int cmd,
- struct ifreq *ifr, void __user *uarg);
+int br_ioctl_call(struct net *net, unsigned int cmd, void __user *uarg);
#if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_IGMP_SNOOPING)
int br_multicast_list_adjacent(struct net_device *dev,
diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
index 9922497e59f8..85c23a38bdb2 100644
--- a/net/bridge/br_ioctl.c
+++ b/net/bridge/br_ioctl.c
@@ -368,10 +368,26 @@ static int old_deviceless(struct net *net, void __user *uarg)
return -EOPNOTSUPP;
}
-int br_ioctl_stub(struct net *net, struct net_bridge *br, unsigned int cmd,
- struct ifreq *ifr, void __user *uarg)
+int br_ioctl_stub(struct net *net, unsigned int cmd, void __user *uarg)
{
int ret = -EOPNOTSUPP;
+ struct ifreq ifr;
+
+ if (cmd == SIOCBRADDIF || cmd == SIOCBRDELIF) {
+ void __user *data;
+ char *colon;
+
+ if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+ return -EPERM;
+
+ if (get_user_ifreq(&ifr, &data, uarg))
+ return -EFAULT;
+
+ ifr.ifr_name[IFNAMSIZ - 1] = 0;
+ colon = strchr(ifr.ifr_name, ':');
+ if (colon)
+ *colon = 0;
+ }
rtnl_lock();
@@ -404,7 +420,21 @@ int br_ioctl_stub(struct net *net, struct net_bridge *br, unsigned int cmd,
break;
case SIOCBRADDIF:
case SIOCBRDELIF:
- ret = add_del_if(br, ifr->ifr_ifindex, cmd == SIOCBRADDIF);
+ {
+ struct net_device *dev;
+
+ dev = __dev_get_by_name(net, ifr.ifr_name);
+ if (!dev || !netif_device_present(dev)) {
+ ret = -ENODEV;
+ break;
+ }
+ if (!netif_is_bridge_master(dev)) {
+ ret = -EOPNOTSUPP;
+ break;
+ }
+
+ ret = add_del_if(netdev_priv(dev), ifr.ifr_ifindex, cmd == SIOCBRADDIF);
+ }
break;
}
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 8acb427ae6de..0cf756677953 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -874,8 +874,7 @@ br_port_get_check_rtnl(const struct net_device *dev)
/* br_ioctl.c */
int br_dev_siocdevprivate(struct net_device *dev, struct ifreq *rq,
void __user *data, int cmd);
-int br_ioctl_stub(struct net *net, struct net_bridge *br, unsigned int cmd,
- struct ifreq *ifr, void __user *uarg);
+int br_ioctl_stub(struct net *net, unsigned int cmd, void __user *uarg);
/* br_multicast.c */
#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index 6ddfd7bfc512..30fca446f58a 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -375,19 +375,6 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, void __user *data,
case SIOCWANDEV:
return dev_siocwandev(dev, &ifr->ifr_settings);
- case SIOCBRADDIF:
- case SIOCBRDELIF:
- if (!netif_device_present(dev))
- return -ENODEV;
- if (!netif_is_bridge_master(dev))
- return -EOPNOTSUPP;
- dev_hold(dev);
- rtnl_unlock();
- err = br_ioctl_call(net, netdev_priv(dev), cmd, ifr, NULL);
- dev_put(dev);
- rtnl_lock();
- return err;
-
case SIOCSHWTSTAMP:
err = net_hwtstamp_validate(ifr);
if (err)
@@ -574,8 +561,6 @@ int dev_ioctl(struct net *net, unsigned int cmd, struct ifreq *ifr,
case SIOCBONDRELEASE:
case SIOCBONDSETHWADDR:
case SIOCBONDCHANGEACTIVE:
- case SIOCBRADDIF:
- case SIOCBRDELIF:
case SIOCSHWTSTAMP:
if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
return -EPERM;
diff --git a/net/socket.c b/net/socket.c
index 1d71fa44ace4..f3d0a8d66cce 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1097,12 +1097,10 @@ static ssize_t sock_write_iter(struct kiocb *iocb, struct iov_iter *from)
*/
static DEFINE_MUTEX(br_ioctl_mutex);
-static int (*br_ioctl_hook)(struct net *net, struct net_bridge *br,
- unsigned int cmd, struct ifreq *ifr,
+static int (*br_ioctl_hook)(struct net *net, unsigned int cmd,
void __user *uarg);
-void brioctl_set(int (*hook)(struct net *net, struct net_bridge *br,
- unsigned int cmd, struct ifreq *ifr,
+void brioctl_set(int (*hook)(struct net *net, unsigned int cmd,
void __user *uarg))
{
mutex_lock(&br_ioctl_mutex);
@@ -1111,8 +1109,7 @@ void brioctl_set(int (*hook)(struct net *net, struct net_bridge *br,
}
EXPORT_SYMBOL(brioctl_set);
-int br_ioctl_call(struct net *net, struct net_bridge *br, unsigned int cmd,
- struct ifreq *ifr, void __user *uarg)
+int br_ioctl_call(struct net *net, unsigned int cmd, void __user *uarg)
{
int err = -ENOPKG;
@@ -1121,7 +1118,7 @@ int br_ioctl_call(struct net *net, struct net_bridge *br, unsigned int cmd,
mutex_lock(&br_ioctl_mutex);
if (br_ioctl_hook)
- err = br_ioctl_hook(net, br, cmd, ifr, uarg);
+ err = br_ioctl_hook(net, cmd, uarg);
mutex_unlock(&br_ioctl_mutex);
return err;
@@ -1218,7 +1215,9 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
case SIOCSIFBR:
case SIOCBRADDBR:
case SIOCBRDELBR:
- err = br_ioctl_call(net, NULL, cmd, NULL, argp);
+ case SIOCBRADDIF:
+ case SIOCBRDELIF:
+ err = br_ioctl_call(net, cmd, argp);
break;
case SIOCGIFVLAN:
case SIOCSIFVLAN:
@@ -3321,6 +3320,8 @@ static int compat_sock_ioctl_trans(struct file *file, struct socket *sock,
case SIOCGPGRP:
case SIOCBRADDBR:
case SIOCBRDELBR:
+ case SIOCBRADDIF:
+ case SIOCBRDELIF:
case SIOCGIFVLAN:
case SIOCSIFVLAN:
case SIOCGSKNS:
@@ -3358,8 +3359,6 @@ static int compat_sock_ioctl_trans(struct file *file, struct socket *sock,
case SIOCGIFPFLAGS:
case SIOCGIFTXQLEN:
case SIOCSIFTXQLEN:
- case SIOCBRADDIF:
- case SIOCBRDELIF:
case SIOCGIFNAME:
case SIOCSIFNAME:
case SIOCGMIIPHY:
--
2.47.3
From: ZhangGuoDong <zhangguodong(a)kylinos.cn>
[ Upstream commit 7c28f8eef5ac5312794d8a52918076dcd787e53b ]
When ksmbd_iov_pin_rsp() fails, we should call ksmbd_session_rpc_close().
Signed-off-by: ZhangGuoDong <zhangguodong(a)kylinos.cn>
Signed-off-by: ChenXiaoSong <chenxiaosong(a)kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon(a)kernel.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
LLM Generated explanations, may be completely bogus:
### 3. CLASSIFICATION
**Bug Type:** Resource leak
- This is clearly a **bug fix**, not a feature addition
- When `ksmbd_iov_pin_rsp()` fails after `ksmbd_session_rpc_open()`
succeeds, the RPC session is never closed
- Resources leaked include:
- The `ksmbd_session_rpc` structure memory
- The IPC ID allocated via `ksmbd_ipc_id_alloc()`
- Entry remains in the session's `rpc_handle_list` xarray
### 4. SCOPE AND RISK ASSESSMENT
**Size:** Very small - 4 lines of actual code change
- Line 1: `int id;` → `int id = -1;` (initialization to enable cleanup
check)
- Lines 2-3: Added `if (id >= 0) ksmbd_session_rpc_close(work->sess,
id);` in error path
**Risk:** Very low
- Only affects the error path when `ksmbd_iov_pin_rsp()` fails
- Standard cleanup pattern already used elsewhere in the codebase
- Cannot affect normal (successful) operation
**Subsystem:** ksmbd (kernel SMB server)
- Mature component, been in mainline since Linux 5.15
- Has an active maintainer (Namjae Jeon)
### 5. USER IMPACT
- **Who is affected:** Users running ksmbd as their SMB server
- **Trigger condition:** Any time `ksmbd_iov_pin_rsp()` fails after
opening an RPC pipe
- **Severity:** Medium - resource leaks accumulate over time, can lead
to system degradation or resource exhaustion under sustained error
conditions
- **Real-world likelihood:** Moderate - `ksmbd_iov_pin_rsp()` can fail
with -ENOMEM under memory pressure
### 6. STABILITY INDICATORS
- **Acked-by:** Namjae Jeon (ksmbd maintainer)
- **Signed-off-by:** Steve French (SMB/CIFS maintainer)
- The fix follows the existing pattern in the codebase (similar cleanup
is done for `name` in the same error path)
### 7. DEPENDENCY CHECK
- No dependencies on other commits
- Uses `ksmbd_session_rpc_close()` which exists in all kernel versions
with ksmbd (5.15+)
- The affected code (`create_smb2_pipe()` and `ksmbd_iov_pin_rsp()`)
exists in stable trees
---
## Conclusion
This commit fixes a clear resource leak in the ksmbd SMB server. When
the final step of creating an SMB pipe (`ksmbd_iov_pin_rsp()`) fails,
the previously opened RPC session was never cleaned up, causing memory
and ID leaks.
**Why it should be backported:**
1. **Fixes a real bug** - Resource leaks are a well-known category of
bugs that accumulate over time
2. **Obviously correct** - Standard error path cleanup pattern, mirrors
how `name` is freed in the same path
3. **Small and surgical** - Only 4 lines changed, localized to one
function
4. **Low risk** - Only affects error path, cannot break normal operation
5. **Maintainer acknowledgment** - Acked by ksmbd maintainer
6. **Affects stable trees** - ksmbd has been in-kernel since 5.15,
stable users can hit this bug
**Concerns:** None significant. The fix is trivial and follows
established patterns.
**YES**
fs/smb/server/smb2pdu.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/smb/server/smb2pdu.c b/fs/smb/server/smb2pdu.c
index 6a94cda0927d..e052dcb9a14c 100644
--- a/fs/smb/server/smb2pdu.c
+++ b/fs/smb/server/smb2pdu.c
@@ -2291,7 +2291,7 @@ static noinline int create_smb2_pipe(struct ksmbd_work *work)
{
struct smb2_create_rsp *rsp;
struct smb2_create_req *req;
- int id;
+ int id = -1;
int err;
char *name;
@@ -2348,6 +2348,9 @@ static noinline int create_smb2_pipe(struct ksmbd_work *work)
break;
}
+ if (id >= 0)
+ ksmbd_session_rpc_close(work->sess, id);
+
if (!IS_ERR(name))
kfree(name);
--
2.51.0
After discussion with the devicetree maintainers we agreed to not extend
lists with the generic compatible "apple,nvme-ans2" anymore [1]. Add
"apple,t8103-nvme-ans2" as fallback compatible as it is the SoC the
driver and bindings were written for.
[1]: https://lore.kernel.org/asahi/12ab93b7-1fc2-4ce0-926e-c8141cfe81bf@kernel.o…
Cc: stable(a)vger.kernel.org # v6.18+
Fixes: 5bd2927aceba ("nvme-apple: Add initial Apple SoC NVMe driver")
Reviewed-by: Neal Gompa <neal(a)gompa.dev>
Signed-off-by: Janne Grunau <j(a)jannau.net>
---
This is split off from the v1 series adding Apple M2 Pro/Max/Ultra
device trees in [2]. Handling this as fix adding a device id only for
v6.18+ for two reasons. apple_nvme_of_match gained hw_data in v6.18 and
device trees using this compatible were only added in v6.18.
2: https://lore.kernel.org/r/20250828-dt-apple-t6020-v1-0-507ba4c4b98e@jannau.…
Changes compared to the patch in that series:
- rebased onto v6.19-rc1 since commit 04d8ecf37b5e ("nvme: apple: Add
Apple A11 support") introduced hw data for the match table
---
drivers/nvme/host/apple.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index 15b3d07f8ccdd023cd3be75eedd349b747c1ecad..ed61b97fde59f7e02664798d9c2612ac16307f5c 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -1704,6 +1704,7 @@ static const struct apple_nvme_hw apple_nvme_t8103_hw = {
static const struct of_device_id apple_nvme_of_match[] = {
{ .compatible = "apple,t8015-nvme-ans2", .data = &apple_nvme_t8015_hw },
+ { .compatible = "apple,t8103-nvme-ans2", .data = &apple_nvme_t8103_hw },
{ .compatible = "apple,nvme-ans2", .data = &apple_nvme_t8103_hw },
{},
};
---
base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
change-id: 20251026-nvme-apple-t8103-base-compat-358ba0564f76
Best regards,
--
Janne Grunau <j(a)jannau.net>
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 5e49200593f331cd0629b5376fab9192f698e8ef
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010516-unfrosted-serotonin-e7b7@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5e49200593f331cd0629b5376fab9192f698e8ef Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Tue, 23 Sep 2025 17:23:37 +0200
Subject: [PATCH] drm/mediatek: Fix probe memory leak
The Mediatek DRM driver allocates private data for components without a
platform driver but as the lifetime is tied to each component device,
the memory is never freed.
Tie the allocation lifetime to the DRM platform device so that the
memory is released on probe failure (e.g. probe deferral) and when the
driver is unbound.
Fixes: c0d36de868a6 ("drm/mediatek: Move clk info from struct mtk_ddp_comp to sub driver private data")
Cc: stable(a)vger.kernel.org # 5.12
Cc: CK Hu <ck.hu(a)mediatek.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20250923152340.18234-3…
Signed-off-by: Chun-Kuang Hu <chunkuang.hu(a)kernel.org>
diff --git a/drivers/gpu/drm/mediatek/mtk_ddp_comp.c b/drivers/gpu/drm/mediatek/mtk_ddp_comp.c
index 0264017806ad..31d67a131c50 100644
--- a/drivers/gpu/drm/mediatek/mtk_ddp_comp.c
+++ b/drivers/gpu/drm/mediatek/mtk_ddp_comp.c
@@ -671,7 +671,7 @@ int mtk_ddp_comp_init(struct device *dev, struct device_node *node, struct mtk_d
type == MTK_DSI)
return 0;
- priv = devm_kzalloc(comp->dev, sizeof(*priv), GFP_KERNEL);
+ priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
if (!priv)
return -ENOMEM;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 5e49200593f331cd0629b5376fab9192f698e8ef
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010514-divisible-liftoff-cf3d@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5e49200593f331cd0629b5376fab9192f698e8ef Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Tue, 23 Sep 2025 17:23:37 +0200
Subject: [PATCH] drm/mediatek: Fix probe memory leak
The Mediatek DRM driver allocates private data for components without a
platform driver but as the lifetime is tied to each component device,
the memory is never freed.
Tie the allocation lifetime to the DRM platform device so that the
memory is released on probe failure (e.g. probe deferral) and when the
driver is unbound.
Fixes: c0d36de868a6 ("drm/mediatek: Move clk info from struct mtk_ddp_comp to sub driver private data")
Cc: stable(a)vger.kernel.org # 5.12
Cc: CK Hu <ck.hu(a)mediatek.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20250923152340.18234-3…
Signed-off-by: Chun-Kuang Hu <chunkuang.hu(a)kernel.org>
diff --git a/drivers/gpu/drm/mediatek/mtk_ddp_comp.c b/drivers/gpu/drm/mediatek/mtk_ddp_comp.c
index 0264017806ad..31d67a131c50 100644
--- a/drivers/gpu/drm/mediatek/mtk_ddp_comp.c
+++ b/drivers/gpu/drm/mediatek/mtk_ddp_comp.c
@@ -671,7 +671,7 @@ int mtk_ddp_comp_init(struct device *dev, struct device_node *node, struct mtk_d
type == MTK_DSI)
return 0;
- priv = devm_kzalloc(comp->dev, sizeof(*priv), GFP_KERNEL);
+ priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
if (!priv)
return -ENOMEM;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 5e49200593f331cd0629b5376fab9192f698e8ef
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010513-prance-imagines-5c6a@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5e49200593f331cd0629b5376fab9192f698e8ef Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Tue, 23 Sep 2025 17:23:37 +0200
Subject: [PATCH] drm/mediatek: Fix probe memory leak
The Mediatek DRM driver allocates private data for components without a
platform driver but as the lifetime is tied to each component device,
the memory is never freed.
Tie the allocation lifetime to the DRM platform device so that the
memory is released on probe failure (e.g. probe deferral) and when the
driver is unbound.
Fixes: c0d36de868a6 ("drm/mediatek: Move clk info from struct mtk_ddp_comp to sub driver private data")
Cc: stable(a)vger.kernel.org # 5.12
Cc: CK Hu <ck.hu(a)mediatek.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20250923152340.18234-3…
Signed-off-by: Chun-Kuang Hu <chunkuang.hu(a)kernel.org>
diff --git a/drivers/gpu/drm/mediatek/mtk_ddp_comp.c b/drivers/gpu/drm/mediatek/mtk_ddp_comp.c
index 0264017806ad..31d67a131c50 100644
--- a/drivers/gpu/drm/mediatek/mtk_ddp_comp.c
+++ b/drivers/gpu/drm/mediatek/mtk_ddp_comp.c
@@ -671,7 +671,7 @@ int mtk_ddp_comp_init(struct device *dev, struct device_node *node, struct mtk_d
type == MTK_DSI)
return 0;
- priv = devm_kzalloc(comp->dev, sizeof(*priv), GFP_KERNEL);
+ priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
if (!priv)
return -ENOMEM;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 4fa944255be521b1bbd9780383f77206303a3a5c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010504-compacter-plow-0408@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4fa944255be521b1bbd9780383f77206303a3a5c Mon Sep 17 00:00:00 2001
From: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com>
Date: Tue, 25 Nov 2025 10:48:39 +0100
Subject: [PATCH] drm/amdgpu: add missing lock to amdgpu_ttm_access_memory_sdma
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Users of ttm entities need to hold the gtt_window_lock before using them
to guarantee proper ordering of jobs.
Cc: stable(a)vger.kernel.org
Fixes: cb5cc4f573e1 ("drm/amdgpu: improve debug VRAM access performance using sdma")
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 5475f7117f10..1b799f895dbf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1486,6 +1486,7 @@ static int amdgpu_ttm_access_memory_sdma(struct ttm_buffer_object *bo,
if (r)
goto out;
+ mutex_lock(&adev->mman.gtt_window_lock);
amdgpu_res_first(abo->tbo.resource, offset, len, &src_mm);
src_addr = amdgpu_ttm_domain_start(adev, bo->resource->mem_type) +
src_mm.start;
@@ -1500,6 +1501,7 @@ static int amdgpu_ttm_access_memory_sdma(struct ttm_buffer_object *bo,
WARN_ON(job->ibs[0].length_dw > num_dw);
fence = amdgpu_job_submit(job);
+ mutex_unlock(&adev->mman.gtt_window_lock);
if (!dma_fence_wait_timeout(fence, false, adev->sdma_timeout))
r = -ETIMEDOUT;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 8defb4f081a5feccc3ea8372d0c7af3522124e1f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010550-duke-justly-8832@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8defb4f081a5feccc3ea8372d0c7af3522124e1f Mon Sep 17 00:00:00 2001
From: Natalie Vock <natalie.vock(a)gmx.de>
Date: Mon, 1 Dec 2025 12:52:38 -0500
Subject: [PATCH] drm/amdgpu: Forward VMID reservation errors
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Otherwise userspace may be fooled into believing it has a reserved VMID
when in reality it doesn't, ultimately leading to GPU hangs when SPM is
used.
Fixes: 80e709ee6ecc ("drm/amdgpu: add option params to enforce process isolation between graphics and compute")
Cc: stable(a)vger.kernel.org
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Natalie Vock <natalie.vock(a)gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index d7cd84d33018..a67285118c37 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2916,8 +2916,7 @@ int amdgpu_vm_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
switch (args->in.op) {
case AMDGPU_VM_OP_RESERVE_VMID:
/* We only have requirement to reserve vmid from gfxhub */
- amdgpu_vmid_alloc_reserved(adev, vm, AMDGPU_GFXHUB(0));
- break;
+ return amdgpu_vmid_alloc_reserved(adev, vm, AMDGPU_GFXHUB(0));
case AMDGPU_VM_OP_UNRESERVE_VMID:
amdgpu_vmid_free_reserved(adev, vm, AMDGPU_GFXHUB(0));
break;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 193d18f60588e95d62e0f82b6a53893e5f2f19f8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010524-editor-spinner-7ecb@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 193d18f60588e95d62e0f82b6a53893e5f2f19f8 Mon Sep 17 00:00:00 2001
From: Jouni Malinen <jouni.malinen(a)oss.qualcomm.com>
Date: Mon, 15 Dec 2025 17:11:34 +0200
Subject: [PATCH] wifi: mac80211: Discard Beacon frames to non-broadcast
address
Beacon frames are required to be sent to the broadcast address, see IEEE
Std 802.11-2020, 11.1.3.1 ("The Address 1 field of the Beacon .. frame
shall be set to the broadcast address"). A unicast Beacon frame might be
used as a targeted attack to get one of the associated STAs to do
something (e.g., using CSA to move it to another channel). As such, it
is better have strict filtering for this on the received side and
discard all Beacon frames that are sent to an unexpected address.
This is even more important for cases where beacon protection is used.
The current implementation in mac80211 is correctly discarding unicast
Beacon frames if the Protected Frame bit in the Frame Control field is
set to 0. However, if that bit is set to 1, the logic used for checking
for configured BIGTK(s) does not actually work. If the driver does not
have logic for dropping unicast Beacon frames with Protected Frame bit
1, these frames would be accepted in mac80211 processing as valid Beacon
frames even though they are not protected. This would allow beacon
protection to be bypassed. While the logic for checking beacon
protection could be extended to cover this corner case, a more generic
check for discard all Beacon frames based on A1=unicast address covers
this without needing additional changes.
Address all these issues by dropping received Beacon frames if they are
sent to a non-broadcast address.
Cc: stable(a)vger.kernel.org
Fixes: af2d14b01c32 ("mac80211: Beacon protection using the new BIGTK (STA)")
Signed-off-by: Jouni Malinen <jouni.malinen(a)oss.qualcomm.com>
Link: https://patch.msgid.link/20251215151134.104501-1-jouni.malinen@oss.qualcomm…
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 6a1899512d07..e0ccd9749853 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -3511,6 +3511,11 @@ ieee80211_rx_h_mgmt_check(struct ieee80211_rx_data *rx)
rx->skb->len < IEEE80211_MIN_ACTION_SIZE)
return RX_DROP_U_RUNT_ACTION;
+ /* Drop non-broadcast Beacon frames */
+ if (ieee80211_is_beacon(mgmt->frame_control) &&
+ !is_broadcast_ether_addr(mgmt->da))
+ return RX_DROP;
+
if (rx->sdata->vif.type == NL80211_IFTYPE_AP &&
ieee80211_is_beacon(mgmt->frame_control) &&
!(rx->flags & IEEE80211_RX_BEACON_REPORTED)) {
From: Richa Bharti <richa.bharti(a)siemens.com>
[ Upstream commit 4b747cc628d8f500d56cf1338280eacc66362ff3 ]
Commit ac4e04d9e378 ("cpufreq: intel_pstate: Unchecked MSR aceess in
legacy mode") introduced a check for feature X86_FEATURE_IDA to verify
turbo mode support. Although this is the correct way to check for turbo
mode support, it causes issues on some platforms that disable turbo
during OS boot, but enable it later [1]. Before adding this feature
check, users were able to get turbo mode frequencies by writing 0 to
/sys/devices/system/cpu/intel_pstate/no_turbo post-boot.
To restore the old behavior on the affected systems while still
addressing the unchecked MSR issue on some Skylake-X systems, check
X86_FEATURE_IDA only immediately before updates of MSR_IA32_PERF_CTL
that may involve setting the Turbo Engage Bit (bit 32).
Fixes: ac4e04d9e378 ("cpufreq: intel_pstate: Unchecked MSR aceess in legacy mode")
Reported-by: Aaron Rainbolt <arainbolt(a)kfocus.org>
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2122531 [1]
Tested-by: Aaron Rainbolt <arainbolt(a)kfocus.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
[ rjw: Subject adjustment, changelog edits ]
Link: https://patch.msgid.link/20251111010840.141490-1-srinivas.pandruvada@linux.…
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
[ richa: Backport to 6.12.y with context adjustments ]
Signed-off-by: Richa Bharti <richa.bharti(a)siemens.com>
---
drivers/cpufreq/intel_pstate.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index d0f4f7c2ae4d..9d8cb44c26c7 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -600,9 +600,6 @@ static bool turbo_is_disabled(void)
{
u64 misc_en;
- if (!cpu_feature_enabled(X86_FEATURE_IDA))
- return true;
-
rdmsrl(MSR_IA32_MISC_ENABLE, misc_en);
return !!(misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE);
@@ -2018,7 +2015,8 @@ static u64 atom_get_val(struct cpudata *cpudata, int pstate)
u32 vid;
val = (u64)pstate << 8;
- if (READ_ONCE(global.no_turbo) && !READ_ONCE(global.turbo_disabled))
+ if (READ_ONCE(global.no_turbo) && !READ_ONCE(global.turbo_disabled) &&
+ cpu_feature_enabled(X86_FEATURE_IDA))
val |= (u64)1 << 32;
vid_fp = cpudata->vid.min + mul_fp(
@@ -2183,7 +2181,8 @@ static u64 core_get_val(struct cpudata *cpudata, int pstate)
u64 val;
val = (u64)pstate << 8;
- if (READ_ONCE(global.no_turbo) && !READ_ONCE(global.turbo_disabled))
+ if (READ_ONCE(global.no_turbo) && !READ_ONCE(global.turbo_disabled) &&
+ cpu_feature_enabled(X86_FEATURE_IDA))
val |= (u64)1 << 32;
return val;
--
2.39.5
Set CLK_IGNORE_UNUSED flag to clock adsp audio26m to prevent disabling
this clock during early boot, as turning it off causes other modules
to fail probing and leads to boot failures in ARM SystemReady test cases
and the EFI boot process (on Linux Distributions, for example, debian,
openSuse, etc.). Without this flag, disabling unused clocks cannot
complete properly after adsp_audio26m is turned off.
Fixes: 0d2f2cefba64 ("clk: mediatek: Add MT8188 adsp clock support")
Cc: Stable(a)vger.kernel.org
Signed-off-by: Macpaul Lin <macpaul.lin(a)mediatek.com>
---
drivers/clk/mediatek/clk-mt8188-adsp_audio26m.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/clk/mediatek/clk-mt8188-adsp_audio26m.c b/drivers/clk/mediatek/clk-mt8188-adsp_audio26m.c
index dcde2187d24a..b9fe66ef4f2e 100644
--- a/drivers/clk/mediatek/clk-mt8188-adsp_audio26m.c
+++ b/drivers/clk/mediatek/clk-mt8188-adsp_audio26m.c
@@ -20,8 +20,8 @@ static const struct mtk_gate_regs adsp_audio26m_cg_regs = {
};
#define GATE_ADSP_FLAGS(_id, _name, _parent, _shift) \
- GATE_MTK(_id, _name, _parent, &adsp_audio26m_cg_regs, _shift, \
- &mtk_clk_gate_ops_no_setclr)
+ GATE_MTK_FLAGS(_id, _name, _parent, &adsp_audio26m_cg_regs, _shift, \
+ &mtk_clk_gate_ops_no_setclr, CLK_IGNORE_UNUSED)
static const struct mtk_gate adsp_audio26m_clks[] = {
GATE_ADSP_FLAGS(CLK_AUDIODSP_AUDIO26M, "audiodsp_audio26m", "clk26m", 3),
--
2.45.2
From: Stefan Binding <sbinding(a)opensource.cirrus.com>
[ Upstream commit 826c0b1ed09e5335abcae07292440ce72346e578 ]
Laptops use 2 CS35L41 Amps with HDA, using External boost, with I2C
Signed-off-by: Stefan Binding <sbinding(a)opensource.cirrus.com>
Link: https://patch.msgid.link/20251205150614.49590-3-sbinding@opensource.cirrus.…
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
LLM Generated explanations, may be completely bogus:
## Commit Analysis: ALSA: hda/realtek: Add support for ASUS UM3406GA
### 1. COMMIT MESSAGE ANALYSIS
The commit message is straightforward: it adds support for a specific
ASUS laptop model (UM3406GA) that uses 2 CS35L41 amplifiers connected
via I2C with external boost. The message describes the hardware
configuration, which is standard for such device ID additions.
No Fixes: or Cc: stable tags are present, but as noted, this is expected
for commits requiring manual review.
### 2. CODE CHANGE ANALYSIS
The entire change is a single line addition:
```c
SND_PCI_QUIRK(0x1043, 0x1584, "ASUS UM3406GA ",
ALC287_FIXUP_CS35L41_I2C_2),
```
This adds:
- Vendor ID: 0x1043 (ASUS)
- Device/Subsystem ID: 0x1584 (ASUS UM3406GA)
- Fixup: `ALC287_FIXUP_CS35L41_I2C_2` (an **existing** fixup already
used by many other ASUS models)
Looking at the surrounding code, multiple other ASUS laptops use the
same fixup:
- ASUS PM3406CKA (0x1454)
- ASUS G513PI/PU/PV (0x14e3)
- ASUS G733PY/PZ/PZV/PYV (0x1503)
- ASUS GV302XA/XJ/XQ/XU/XV/XI (0x1533)
- ASUS UM3402YAR (0x1683)
### 3. CLASSIFICATION
This is a **NEW DEVICE ID** addition - explicitly listed as an exception
that IS appropriate for stable backporting. The driver infrastructure
and fixup code already exist; this merely adds an ID to enable the
existing fix for new hardware.
### 4. SCOPE AND RISK ASSESSMENT
- **Lines changed**: 1 line
- **Files touched**: 1 file
- **Complexity**: Zero - table entry addition only
- **Risk**: Extremely low - this cannot affect any other hardware
- **No new code paths**: Uses pre-existing `ALC287_FIXUP_CS35L41_I2C_2`
fixup
### 5. USER IMPACT
- **Who is affected**: Owners of ASUS UM3406GA laptops
- **Problem without fix**: Audio (specifically the CS35L41 amplifiers)
won't function properly
- **Severity**: Non-working audio is a significant user-facing issue for
laptop users
### 6. STABILITY INDICATORS
- Signed-off by Takashi Iwai (ALSA maintainer at SUSE)
- Standard quirk addition pattern used extensively throughout this file
- Follows exact same format as dozens of other ASUS quirk entries
### 7. DEPENDENCY CHECK
- The fixup `ALC287_FIXUP_CS35L41_I2C_2` has been in the kernel for some
time, supporting multiple other ASUS models
- No other commits are required for this to work
- This should apply cleanly to stable trees that have the CS35L41
support infrastructure
### CONCLUSION
This commit is a textbook example of what should be backported to stable
trees:
1. **Falls under explicit exception**: Adding device IDs to existing
drivers is explicitly allowed
2. **Minimal change**: Single line, single table entry
3. **Zero regression risk**: Cannot affect any hardware except the
targeted laptop
4. **Uses existing infrastructure**: The fixup is already well-tested on
similar ASUS models
5. **Fixes real user problem**: Enables audio on a production laptop
6. **Obviously correct**: Identical pattern to surrounding entries
The risk-benefit analysis strongly favors backporting: virtually zero
risk with clear user benefit (working audio on a specific laptop model).
**YES**
sound/hda/codecs/realtek/alc269.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/sound/hda/codecs/realtek/alc269.c b/sound/hda/codecs/realtek/alc269.c
index b45fcc9a3785e..008bf9d5148e1 100644
--- a/sound/hda/codecs/realtek/alc269.c
+++ b/sound/hda/codecs/realtek/alc269.c
@@ -6752,6 +6752,7 @@ static const struct hda_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x1043, 0x1517, "Asus Zenbook UX31A", ALC269VB_FIXUP_ASUS_ZENBOOK_UX31A),
SND_PCI_QUIRK(0x1043, 0x1533, "ASUS GV302XA/XJ/XQ/XU/XV/XI", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x1043, 0x1573, "ASUS GZ301VV/VQ/VU/VJ/VA/VC/VE/VVC/VQC/VUC/VJC/VEC/VCC", ALC285_FIXUP_ASUS_HEADSET_MIC),
+ SND_PCI_QUIRK(0x1043, 0x1584, "ASUS UM3406GA ", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x1043, 0x1652, "ASUS ROG Zephyrus Do 15 SE", ALC289_FIXUP_ASUS_ZEPHYRUS_DUAL_SPK),
SND_PCI_QUIRK(0x1043, 0x1662, "ASUS GV301QH", ALC294_FIXUP_ASUS_DUAL_SPK),
SND_PCI_QUIRK(0x1043, 0x1663, "ASUS GU603ZI/ZJ/ZQ/ZU/ZV", ALC285_FIXUP_ASUS_HEADSET_MIC),
--
2.51.0
intel_dmc_update_dc6_allowed_count() oopses when DMC hasn't been
initialized, and dmc is thus NULL.
That would be the case when the call path is
intel_power_domains_init_hw() -> {skl,bxt,icl}_display_core_init() ->
gen9_set_dc_state() -> intel_dmc_update_dc6_allowed_count(), as
intel_power_domains_init_hw() is called *before* intel_dmc_init().
However, gen9_set_dc_state() calls intel_dmc_update_dc6_allowed_count()
conditionally, depending on the current and target DC states. At probe,
the target is disabled, but if DC6 is enabled, the function is called,
and an oops follows. Apparently it's quite unlikely that DC6 is enabled
at probe, as we haven't seen this failure mode before.
Add NULL checks and switch the dmc->display references to just display.
Fixes: 88c1f9a4d36d ("drm/i915/dmc: Create debugfs entry for dc6 counter")
Cc: Mohammed Thasleem <mohammed.thasleem(a)intel.com>
Cc: Imre Deak <imre.deak(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.16+
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
---
Rare case, but this may also throw off the rc6 counting in debugfs when
it does happen.
---
drivers/gpu/drm/i915/display/intel_dmc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dmc.c b/drivers/gpu/drm/i915/display/intel_dmc.c
index 2fb6fec6dc99..169bbbc91f6d 100644
--- a/drivers/gpu/drm/i915/display/intel_dmc.c
+++ b/drivers/gpu/drm/i915/display/intel_dmc.c
@@ -1570,10 +1570,10 @@ void intel_dmc_update_dc6_allowed_count(struct intel_display *display,
struct intel_dmc *dmc = display_to_dmc(display);
u32 dc5_cur_count;
- if (DISPLAY_VER(dmc->display) < 14)
+ if (!dmc || DISPLAY_VER(display) < 14)
return;
- dc5_cur_count = intel_de_read(dmc->display, DG1_DMC_DEBUG_DC5_COUNT);
+ dc5_cur_count = intel_de_read(display, DG1_DMC_DEBUG_DC5_COUNT);
if (!start_tracking)
dmc->dc6_allowed.count += dc5_cur_count - dmc->dc6_allowed.dc5_start;
@@ -1587,7 +1587,7 @@ static bool intel_dmc_get_dc6_allowed_count(struct intel_display *display, u32 *
struct intel_dmc *dmc = display_to_dmc(display);
bool dc6_enabled;
- if (DISPLAY_VER(display) < 14)
+ if (!dmc || DISPLAY_VER(display) < 14)
return false;
mutex_lock(&power_domains->lock);
--
2.47.3
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 8defb4f081a5feccc3ea8372d0c7af3522124e1f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010549-ensnare-embassy-d32e@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8defb4f081a5feccc3ea8372d0c7af3522124e1f Mon Sep 17 00:00:00 2001
From: Natalie Vock <natalie.vock(a)gmx.de>
Date: Mon, 1 Dec 2025 12:52:38 -0500
Subject: [PATCH] drm/amdgpu: Forward VMID reservation errors
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Otherwise userspace may be fooled into believing it has a reserved VMID
when in reality it doesn't, ultimately leading to GPU hangs when SPM is
used.
Fixes: 80e709ee6ecc ("drm/amdgpu: add option params to enforce process isolation between graphics and compute")
Cc: stable(a)vger.kernel.org
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Natalie Vock <natalie.vock(a)gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index d7cd84d33018..a67285118c37 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2916,8 +2916,7 @@ int amdgpu_vm_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
switch (args->in.op) {
case AMDGPU_VM_OP_RESERVE_VMID:
/* We only have requirement to reserve vmid from gfxhub */
- amdgpu_vmid_alloc_reserved(adev, vm, AMDGPU_GFXHUB(0));
- break;
+ return amdgpu_vmid_alloc_reserved(adev, vm, AMDGPU_GFXHUB(0));
case AMDGPU_VM_OP_UNRESERVE_VMID:
amdgpu_vmid_free_reserved(adev, vm, AMDGPU_GFXHUB(0));
break;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 193d18f60588e95d62e0f82b6a53893e5f2f19f8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010523-fifty-navigator-ec96@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 193d18f60588e95d62e0f82b6a53893e5f2f19f8 Mon Sep 17 00:00:00 2001
From: Jouni Malinen <jouni.malinen(a)oss.qualcomm.com>
Date: Mon, 15 Dec 2025 17:11:34 +0200
Subject: [PATCH] wifi: mac80211: Discard Beacon frames to non-broadcast
address
Beacon frames are required to be sent to the broadcast address, see IEEE
Std 802.11-2020, 11.1.3.1 ("The Address 1 field of the Beacon .. frame
shall be set to the broadcast address"). A unicast Beacon frame might be
used as a targeted attack to get one of the associated STAs to do
something (e.g., using CSA to move it to another channel). As such, it
is better have strict filtering for this on the received side and
discard all Beacon frames that are sent to an unexpected address.
This is even more important for cases where beacon protection is used.
The current implementation in mac80211 is correctly discarding unicast
Beacon frames if the Protected Frame bit in the Frame Control field is
set to 0. However, if that bit is set to 1, the logic used for checking
for configured BIGTK(s) does not actually work. If the driver does not
have logic for dropping unicast Beacon frames with Protected Frame bit
1, these frames would be accepted in mac80211 processing as valid Beacon
frames even though they are not protected. This would allow beacon
protection to be bypassed. While the logic for checking beacon
protection could be extended to cover this corner case, a more generic
check for discard all Beacon frames based on A1=unicast address covers
this without needing additional changes.
Address all these issues by dropping received Beacon frames if they are
sent to a non-broadcast address.
Cc: stable(a)vger.kernel.org
Fixes: af2d14b01c32 ("mac80211: Beacon protection using the new BIGTK (STA)")
Signed-off-by: Jouni Malinen <jouni.malinen(a)oss.qualcomm.com>
Link: https://patch.msgid.link/20251215151134.104501-1-jouni.malinen@oss.qualcomm…
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 6a1899512d07..e0ccd9749853 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -3511,6 +3511,11 @@ ieee80211_rx_h_mgmt_check(struct ieee80211_rx_data *rx)
rx->skb->len < IEEE80211_MIN_ACTION_SIZE)
return RX_DROP_U_RUNT_ACTION;
+ /* Drop non-broadcast Beacon frames */
+ if (ieee80211_is_beacon(mgmt->frame_control) &&
+ !is_broadcast_ether_addr(mgmt->da))
+ return RX_DROP;
+
if (rx->sdata->vif.type == NL80211_IFTYPE_AP &&
ieee80211_is_beacon(mgmt->frame_control) &&
!(rx->flags & IEEE80211_RX_BEACON_REPORTED)) {
The 'clockid' field is not the correct way to check for a softirq base.
Fix the check to correctly compare the base type instead of the clockid.
Fixes: 1e7f7fbcd40c ("hrtimer: Avoid more SMP function calls in clock_was_set()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
kernel/time/hrtimer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index f8ea8c8fc895..d0c59fc0beb4 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -913,7 +913,7 @@ static bool update_needs_ipi(struct hrtimer_cpu_base *cpu_base,
return true;
/* Extra check for softirq clock bases */
- if (base->clockid < HRTIMER_BASE_MONOTONIC_SOFT)
+ if (base->index < HRTIMER_BASE_MONOTONIC_SOFT)
continue;
if (cpu_base->softirq_activated)
continue;
---
base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
change-id: 20260105-hrtimer-clock-base-check-b91d586b70f7
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
[ View in browser](https://7m8ue.r.ag.d.sendibm3.com/mk/mr/sh/7nVTPdZCTJDXOxg5wYo2dj3…
مجلة روح للعلوم الإنسانية
=========================
مجلة علمية ، محكمة، فصلية ، مفتوحة الوصول ، تصدر عن مركز فكر للدراسات والتطوير
**مجلة روح** هي مساحة فكرية تُعيد للعلوم الإنسانية مكانتها بوصفها **روح المعرفة وجسر الفهم بين الإنسان وذاته ومحيطه**.
#### السادة أعضاء الهيئات التدريسية، الباحثون الكرام
السادة أعضاء الهيئات التدريسية،
الباحثون والباحثات الأفاضل،
**تحية طيبة وبعد،**
في عالمٍ تتزايد فيه المنافسة على النشر العلمي، تبرز **مجلة روح للعلوم الإنسانية**
** (RUH Journal of Humanities)** كمنصة أكاديمية واعية لا تبحث عن الكمّ، بل تصنع **قيمة معرفية قابلة للتأثير والاستشهاد**.
يسرّنا دعوتكم لتقديم أبحاثكم الأصيلة للنشر في الأعداد القادمة من مجلة روح — مجلة علمية **محكّمة، فصلية، مفتوحة الوصول** (ISSN: 3105-2436)، تُعنى بتقديم البحث الإنساني في أعلى صوره المنهجية واللغوية.
**لماذا يُعدّ النشر في مجلة روح استثمارًا أكاديميًا ذكيًا؟**
**🔹**** ****تحكيم يصنع الفارق**
تحكيم علمي مزدوج، مهني وشفاف، يركّز على الأصالة، المتانة المنهجية، وقابلية الإضافة الحقيقية للمعرفة، لا على الشكل فقط.
**🔹**** ****انتشار بلا حدود**** (Open Access)**
وصول مفتوح يضمن وصول أبحاثكم إلى القرّاء والباحثين دون قيود، ما يرفع فرص الاقتباس والتأثير العلمي.
**🔹**** ****فهرسة دولية واسعة وانتشار عالمي**
المجلة **مفهرسة دوليًا في أكثر من 40 قاعدة بيانات**، ضمن شبكات فهرسة متعدّدة تدعم ظهور الأبحاث في محركات البحث الأكاديمية وتزيد قابلية الوصول والاستشهاد.
**🔹**** ****معامل تأثير قوي وحضور أكاديمي متنامٍ**
تتبنّى المجلة سياسة جودة صارمة، وتعمل ضمن **منظومة فهرسة ومعايير تقييم** تدعم تعزيز الحضور العلمي وتحقيق **معامل تأثير قوي** ينعكس على قيمة النشر لدى الباحث.
**🔹**** ****تعدّد لغوي = جمهور أوسع**
النشر بثلاث لغات (**العربية – الإنجليزية – التركية**) يوسّع دائرة التفاعل العلمي ويضع أبحاثكم في فضاء دولي متعدد الثقافات.
**🔹**** ****موثوقية أكاديمية موثّقة**
رقم معياري دولي **ISSN: 3105-2436**، وهوية تحريرية واضحة، وانتظام في الإصدار يعزّز قوة السجل البحثي للباحث.
**🔹**** ****تجربة نشر احترافية**
بوابة إلكترونية حديثة لتقديم الأبحاث ومتابعتها، وإخراج فني دقيق يليق بالجهد العلمي المبذول.
**🔹**** ****رؤية تتجاوز النشر**
مجلة روح ليست مجرد وعاء نشر، بل **جسر معرفي** يربط الباحثين، ويشجّع الدراسات البينية، ويعزّز حضور البحث الإنساني في النقاشات الأكاديمية المعاصرة.
**مجالات النشر**
الدراسات الإسلامية، الفلسفة، الآداب واللغات، التاريخ والجغرافيا، علم النفس، الفنون، علم المكتبات والمعلومات، والدراسات الإنسانية المتقاطعة.
**رسوم النشر**
**50 ****دولارًا أمريكيًا فقط** تشمل التحكيم العلمي والإخراج الفني، في إطار سياسة رسوم عادلة تراعي الباحثين دون المساس بالجودة.
📄 **قدّم بحثك الآن**
[https://7m8ue.r.ag.d.sendibm3.com/mk/cl/f/sh/7nVU1aA2nfsTSeWOzmWNvJm5IIiZEZ…
🌐 **شروط النشر والمزيد عن المجلة**
[https://7m8ue.r.ag.d.sendibm3.com/mk/cl/f/sh/7nVU1aA2nfuMSB1hyLkpxg5tSfDLyV…
📧 info(a)ruh-journal.org
📲 **للاستفسار السريع عبر واتساب**
[https://7m8ue.r.ag.d.sendibm3.com/mk/cl/f/sh/7nVU1aA2nfwFRhX0wuzI02Phd1i8iS…
انشر بحثك حيث يُقرأ، ويُناقش، ويُستشهد به.
**مجلة روح للعلوم الإنسانية… حيث تتحوّل الأفكار الجادّة إلى معرفة مؤثّرة****.**
وتفضلوا بقبول فائق الاحترام،
**هيئة تحرير مجلة روح للعلوم الإنسانية**
[ Read the whole story](https://7m8ue.r.ag.d.sendibm3.com/mk/cl/f/sh/7nVU1aA2nfy8RE2JvUDk2Oj…
**fiker for research and development**
يمكن متابعة المجلة من خلال الاتصال الموضحة.
الموقع: [https://ruh-journal.org](https://ruh-journal.org) || البريد الإلكتروني: info(a)ruh-journal.org
هاتف: 00905398717003
You've received it because you've subscribed to our newsletter.
[Unsubscribe](https://7m8ue.r.ag.d.sendibm3.com/mk/un/v2/sh/7nVTPdbLJ2bPbEmD…
From: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
The driver updates struct sci_port::tx_cookie to zero right before the TX
work is scheduled, or to -EINVAL when DMA is disabled.
dma_async_is_complete(), called through dma_cookie_status() (and possibly
through dmaengine_tx_status()), considers cookies valid only if they have
values greater than or equal to 1.
Passing zero or -EINVAL to dmaengine_tx_status() before any TX DMA
transfer has started leads to an incorrect TX status being reported, as the
cookie is invalid for the DMA subsystem. This may cause long wait times
when the serial device is opened for configuration before any TX activity
has occurred.
Check that the TX cookie is valid before passing it to
dmaengine_tx_status().
Fixes: 7cc0e0a43a91 ("serial: sh-sci: Check if TX data was written to device in .tx_empty()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
---
drivers/tty/serial/sh-sci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
index 53edbf1d8963..fbfe5575bd3c 100644
--- a/drivers/tty/serial/sh-sci.c
+++ b/drivers/tty/serial/sh-sci.c
@@ -1914,7 +1914,7 @@ static void sci_dma_check_tx_occurred(struct sci_port *s)
struct dma_tx_state state;
enum dma_status status;
- if (!s->chan_tx)
+ if (!s->chan_tx || s->cookie_tx <= 0)
return;
status = dmaengine_tx_status(s->chan_tx, s->cookie_tx, &state);
--
2.43.0
The driver_override_show() function reads the driver_override string
without holding the device_lock. However, driver_override_store() uses
driver_set_override(), which modifies and frees the string while holding
the device_lock.
This can result in a concurrent use-after-free if the string is freed
by the store function while being read by the show function.
Fix this by holding the device_lock around the read operation.
Fixes: 1f86a00c1159 ("bus/fsl-mc: add support for 'driver_override' in the mc-bus")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gui-Dong Han <hanguidong02(a)gmail.com>
---
I verified this with a stress test that continuously writes/reads the
attribute. It triggered KASAN and leaked bytes like a0 f4 81 9f a3 ff ff
(likely kernel pointers). Since driver_override is world-readable (0644),
this allows unprivileged users to leak kernel pointers and bypass KASLR.
Similar races were fixed in other buses (e.g., commits 9561475db680 and
91d44c1afc61). Currently, 9 of 11 buses handle this correctly; this patch
fixes one of the remaining two.
---
drivers/bus/fsl-mc/fsl-mc-bus.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
index 25845c04e562..a97baf2cbcdd 100644
--- a/drivers/bus/fsl-mc/fsl-mc-bus.c
+++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
@@ -202,8 +202,12 @@ static ssize_t driver_override_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct fsl_mc_device *mc_dev = to_fsl_mc_device(dev);
+ ssize_t len;
- return sysfs_emit(buf, "%s\n", mc_dev->driver_override);
+ device_lock(dev);
+ len = sysfs_emit(buf, "%s\n", mc_dev->driver_override);
+ device_unlock(dev);
+ return len;
}
static DEVICE_ATTR_RW(driver_override);
--
2.43.0
The error path of xfs_attr_leaf_hasname() can leave a NULL
xfs_buf pointer. xfs_has_attr() checks for the NULL pointer but
the other callers do not.
We tripped over the NULL pointer in xfs_attr_leaf_get() but fix
the other callers too.
Fixes v5.8-rc4-95-g07120f1abdff ("xfs: Add xfs_has_attr and subroutines")
No reproducer.
Cc: stable(a)vger.kernel.org # v5.19+ with another port for v5.9 - v5.18
Signed-off-by: Mark Tinguely <mark.tinguely(a)oracle.com>
---
fs/xfs/libxfs/xfs_attr.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 8c04acd30d48..25e2ecf20d14 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -1266,7 +1266,8 @@ xfs_attr_leaf_removename(
error = xfs_attr_leaf_hasname(args, &bp);
if (error == -ENOATTR) {
- xfs_trans_brelse(args->trans, bp);
+ if (bp)
+ xfs_trans_brelse(args->trans, bp);
if (args->op_flags & XFS_DA_OP_RECOVERY)
return 0;
return error;
@@ -1305,7 +1306,8 @@ xfs_attr_leaf_get(xfs_da_args_t *args)
error = xfs_attr_leaf_hasname(args, &bp);
if (error == -ENOATTR) {
- xfs_trans_brelse(args->trans, bp);
+ if (bp)
+ xfs_trans_brelse(args->trans, bp);
return error;
} else if (error != -EEXIST)
return error;
--
2.50.1 (Apple Git-155)
On 32-bit ARM, you may encounter linker errors such as this one:
ld.lld: error: undefined symbol: _find_next_zero_bit
>>> referenced by rust_binder_main.43196037ba7bcee1-cgu.0
>>> drivers/android/binder/rust_binder_main.o:(<rust_binder_main::process::Process>::insert_or_update_handle) in archive vmlinux.a
>>> referenced by rust_binder_main.43196037ba7bcee1-cgu.0
>>> drivers/android/binder/rust_binder_main.o:(<rust_binder_main::process::Process>::insert_or_update_handle) in archive vmlinux.a
This error occurs because even though the functions are declared by
include/linux/find.h, the definition is #ifdef'd out on 32-bit ARM. This
is because arch/arm/include/asm/bitops.h contains:
#define find_first_zero_bit(p,sz) _find_first_zero_bit_le(p,sz)
#define find_next_zero_bit(p,sz,off) _find_next_zero_bit_le(p,sz,off)
#define find_first_bit(p,sz) _find_first_bit_le(p,sz)
#define find_next_bit(p,sz,off) _find_next_bit_le(p,sz,off)
And the underscore-prefixed function is conditional on #ifndef of the
non-underscore-prefixed name, but the declaration in find.h is *not*
conditional on that #ifndef.
To fix the linker error, we ensure that the symbols in question exist
when compiling Rust code. We do this by definining them in rust/helpers/
whenever the normal definition is #ifndef'd out.
Note that these helpers are somewhat unusual in that they do not have
the rust_helper_ prefix that most helpers have. Adding the rust_helper_
prefix does not compile, as 'bindings::_find_next_zero_bit()' will
result in a call to a symbol called _find_next_zero_bit as defined by
include/linux/find.h rather than a symbol with the rust_helper_ prefix.
This is because when a symbol is present in both include/ and
rust/helpers/, the one from include/ wins under the assumption that the
current configuration is one where that helper is unnecessary. This
heuristic fails for _find_next_zero_bit() because the header file always
declares it even if the symbol does not exist.
The functions still use the __rust_helper annotation. This lets the
wrapper function be inlined into Rust code even if full kernel LTO is
not used once the patch series for that feature lands.
Cc: stable(a)vger.kernel.org
Fixes: 6cf93a9ed39e ("rust: add bindings for bitops.h")
Reported-by: Andreas Hindborg <a.hindborg(a)kernel.org>
Closes: https://rust-for-linux.zulipchat.com/#narrow/channel/x/topic/x/near/5616773…
Signed-off-by: Alice Ryhl <aliceryhl(a)google.com>
---
Changes in v2:
- Remove rust_helper_ prefix from helpers.
- Improve commit message.
- The set of functions for which a helper is added is changed so that it
matches arch/arm/include/asm/bitops.h
- Link to v1: https://lore.kernel.org/r/20251203-bitops-find-helper-v1-1-5193deb57766@goo…
---
rust/helpers/bitops.c | 42 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)
diff --git a/rust/helpers/bitops.c b/rust/helpers/bitops.c
index 5d0861d29d3f0d705a014ae4601685828405f33b..e79ef9e6d98f969e2a0a2a6f62d9fcec3ef0fd72 100644
--- a/rust/helpers/bitops.c
+++ b/rust/helpers/bitops.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/bitops.h>
+#include <linux/find.h>
void rust_helper___set_bit(unsigned long nr, unsigned long *addr)
{
@@ -21,3 +22,44 @@ void rust_helper_clear_bit(unsigned long nr, volatile unsigned long *addr)
{
clear_bit(nr, addr);
}
+
+/*
+ * The rust_helper_ prefix is intentionally omitted below so that the
+ * declarations in include/linux/find.h are compatible with these helpers.
+ *
+ * Note that the below #ifdefs mean that the helper is only created if C does
+ * not provide a definition.
+ */
+#ifdef find_first_zero_bit
+__rust_helper
+unsigned long _find_first_zero_bit(const unsigned long *p, unsigned long size)
+{
+ return find_first_zero_bit(p, size);
+}
+#endif /* find_first_zero_bit */
+
+#ifdef find_next_zero_bit
+__rust_helper
+unsigned long _find_next_zero_bit(const unsigned long *addr,
+ unsigned long size, unsigned long offset)
+{
+ return find_next_zero_bit(addr, size, offset);
+}
+#endif /* find_next_zero_bit */
+
+#ifdef find_first_bit
+__rust_helper
+unsigned long _find_first_bit(const unsigned long *addr, unsigned long size)
+{
+ return find_first_bit(addr, size);
+}
+#endif /* find_first_bit */
+
+#ifdef find_next_bit
+__rust_helper
+unsigned long _find_next_bit(const unsigned long *addr, unsigned long size,
+ unsigned long offset)
+{
+ return find_next_bit(addr, size, offset);
+}
+#endif /* find_next_bit */
---
base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
change-id: 20251203-bitops-find-helper-25ed1bbae700
Best regards,
--
Alice Ryhl <aliceryhl(a)google.com>
V1 -> V2:
- Because `pmd_val` variable broke ppc builds due to its name,
renamed it to `_pmd`. see [1].
[1] https://lore.kernel.org/stable/aS7lPZPYuChOTdXU@hyeyoo
- Added David Hildenbrand's Acked-by [2], thanks a lot!
[2] https://lore.kernel.org/linux-mm/ac8d7137-3819-4a75-9dd3-fb3d2259ebe4@kerne…
# TL;DR
previous discussion: https://lore.kernel.org/linux-mm/20250921232709.1608699-1-harry.yoo@oracle.…
A "bad pmd" error occurs due to race condition between
change_prot_numa() and THP migration. The mainline kernel does not have
this bug as commit 670ddd8cdc fixes the race condition. 6.1.y, 5.15.y,
5.10.y, 5.4.y are affected by this bug.
Fixing this in -stable kernels is tricky because pte_map_offset_lock()
has different semantics in pre-6.5 and post-6.5 kernels. I am trying to
backport the same mechanism we have in the mainline kernel.
Since the code looks bit different due to different semantics of
pte_map_offset_lock(), it'd be best to get this reviewed by MM folks.
# Testing
I verified that the bug described below is not reproduced anymore
(on a downstream kernel) after applying this patch series. It used to
trigger in few days of intensive numa balancing testing, but it survived
2 weeks with this applied.
# Bug Description
It was reported that a bad pmd is seen when automatic NUMA
balancing is marking page table entries as prot_numa:
[2437548.196018] mm/pgtable-generic.c:50: bad pmd 00000000af22fc02(dffffffe71fbfe02)
[2437548.235022] Call Trace:
[2437548.238234] <TASK>
[2437548.241060] dump_stack_lvl+0x46/0x61
[2437548.245689] panic+0x106/0x2e5
[2437548.249497] pmd_clear_bad+0x3c/0x3c
[2437548.253967] change_pmd_range.isra.0+0x34d/0x3a7
[2437548.259537] change_p4d_range+0x156/0x20e
[2437548.264392] change_protection_range+0x116/0x1a9
[2437548.269976] change_prot_numa+0x15/0x37
[2437548.274774] task_numa_work+0x1b8/0x302
[2437548.279512] task_work_run+0x62/0x95
[2437548.283882] exit_to_user_mode_loop+0x1a4/0x1a9
[2437548.289277] exit_to_user_mode_prepare+0xf4/0xfc
[2437548.294751] ? sysvec_apic_timer_interrupt+0x34/0x81
[2437548.300677] irqentry_exit_to_user_mode+0x5/0x25
[2437548.306153] asm_sysvec_apic_timer_interrupt+0x16/0x1b
This is due to a race condition between change_prot_numa() and
THP migration because the kernel doesn't check is_swap_pmd() and
pmd_trans_huge() atomically:
change_prot_numa() THP migration
======================================================================
- change_pmd_range()
-> is_swap_pmd() returns false,
meaning it's not a PMD migration
entry.
- do_huge_pmd_numa_page()
-> migrate_misplaced_page() sets
migration entries for the THP.
- change_pmd_range()
-> pmd_none_or_clear_bad_unless_trans_huge()
-> pmd_none() and pmd_trans_huge() returns false
- pmd_none_or_clear_bad_unless_trans_huge()
-> pmd_bad() returns true for the migration entry!
The upstream commit 670ddd8cdcbd ("mm/mprotect: delete
pmd_none_or_clear_bad_unless_trans_huge()") closes this race condition
by checking is_swap_pmd() and pmd_trans_huge() atomically.
# Backporting note
commit a79390f5d6a7 ("mm/mprotect: use long for page accountings and retval")
is backported to return an error code (negative value) in
change_pte_range().
Unlike the mainline, pte_offset_map_lock() does not check if the pmd
entry is a migration entry or a hugepage; acquires PTL unconditionally
instead of returning failure. Therefore, it is necessary to keep the
!is_swap_pmd() && !pmd_trans_huge() && !pmd_devmap() checks in
change_pmd_range() before acquiring the PTL.
After acquiring the lock, open-code the semantics of
pte_offset_map_lock() in the mainline kernel; change_pte_range() fails
if the pmd value has changed. This requires adding pmd_old parameter
(pmd_t value that is read before calling the function) to
change_pte_range().
Hugh Dickins (1):
mm/mprotect: delete pmd_none_or_clear_bad_unless_trans_huge()
Peter Xu (1):
mm/mprotect: use long for page accountings and retval
include/linux/hugetlb.h | 4 +-
include/linux/mm.h | 2 +-
mm/hugetlb.c | 4 +-
mm/mempolicy.c | 2 +-
mm/mprotect.c | 121 ++++++++++++++++++----------------------
5 files changed, 59 insertions(+), 74 deletions(-)
--
2.43.0
V1 -> V2:
- Because `pmd_val` variable broke ppc builds due to its name,
renamed it to `_pmd`. see [1].
[1] https://lore.kernel.org/stable/aS7lPZPYuChOTdXU@hyeyoo
- Added David Hildenbrand's Acked-by [2], thanks a lot!
[2] https://lore.kernel.org/linux-mm/ac8d7137-3819-4a75-9dd3-fb3d2259ebe4@kerne…
# TL;DR
previous discussion: https://lore.kernel.org/linux-mm/20250921232709.1608699-1-harry.yoo@oracle.…
A "bad pmd" error occurs due to race condition between
change_prot_numa() and THP migration. The mainline kernel does not have
this bug as commit 670ddd8cdc fixes the race condition. 6.1.y, 5.15.y,
5.10.y, 5.4.y are affected by this bug.
Fixing this in -stable kernels is tricky because pte_map_offset_lock()
has different semantics in pre-6.5 and post-6.5 kernels. I am trying to
backport the same mechanism we have in the mainline kernel.
Since the code looks bit different due to different semantics of
pte_map_offset_lock(), it'd be best to get this reviewed by MM folks.
# Testing
I verified that the bug described below is not reproduced anymore
(on a downstream kernel) after applying this patch series. It used to
trigger in few days of intensive numa balancing testing, but it survived
2 weeks with this applied.
# Bug Description
It was reported that a bad pmd is seen when automatic NUMA
balancing is marking page table entries as prot_numa:
[2437548.196018] mm/pgtable-generic.c:50: bad pmd 00000000af22fc02(dffffffe71fbfe02)
[2437548.235022] Call Trace:
[2437548.238234] <TASK>
[2437548.241060] dump_stack_lvl+0x46/0x61
[2437548.245689] panic+0x106/0x2e5
[2437548.249497] pmd_clear_bad+0x3c/0x3c
[2437548.253967] change_pmd_range.isra.0+0x34d/0x3a7
[2437548.259537] change_p4d_range+0x156/0x20e
[2437548.264392] change_protection_range+0x116/0x1a9
[2437548.269976] change_prot_numa+0x15/0x37
[2437548.274774] task_numa_work+0x1b8/0x302
[2437548.279512] task_work_run+0x62/0x95
[2437548.283882] exit_to_user_mode_loop+0x1a4/0x1a9
[2437548.289277] exit_to_user_mode_prepare+0xf4/0xfc
[2437548.294751] ? sysvec_apic_timer_interrupt+0x34/0x81
[2437548.300677] irqentry_exit_to_user_mode+0x5/0x25
[2437548.306153] asm_sysvec_apic_timer_interrupt+0x16/0x1b
This is due to a race condition between change_prot_numa() and
THP migration because the kernel doesn't check is_swap_pmd() and
pmd_trans_huge() atomically:
change_prot_numa() THP migration
======================================================================
- change_pmd_range()
-> is_swap_pmd() returns false,
meaning it's not a PMD migration
entry.
- do_huge_pmd_numa_page()
-> migrate_misplaced_page() sets
migration entries for the THP.
- change_pmd_range()
-> pmd_none_or_clear_bad_unless_trans_huge()
-> pmd_none() and pmd_trans_huge() returns false
- pmd_none_or_clear_bad_unless_trans_huge()
-> pmd_bad() returns true for the migration entry!
The upstream commit 670ddd8cdcbd ("mm/mprotect: delete
pmd_none_or_clear_bad_unless_trans_huge()") closes this race condition
by checking is_swap_pmd() and pmd_trans_huge() atomically.
# Backporting note
commit a79390f5d6a7 ("mm/mprotect: use long for page accountings and retval")
is backported to return an error code (negative value) in
change_pte_range().
Unlike the mainline, pte_offset_map_lock() does not check if the pmd
entry is a migration entry or a hugepage; acquires PTL unconditionally
instead of returning failure. Therefore, it is necessary to keep the
!is_swap_pmd() && !pmd_trans_huge() && !pmd_devmap() checks in
change_pmd_range() before acquiring the PTL.
After acquiring the lock, open-code the semantics of
pte_offset_map_lock() in the mainline kernel; change_pte_range() fails
if the pmd value has changed. This requires adding pmd_old parameter
(pmd_t value that is read before calling the function) to
change_pte_range().
Hugh Dickins (1):
mm/mprotect: delete pmd_none_or_clear_bad_unless_trans_huge()
Peter Xu (1):
mm/mprotect: use long for page accountings and retval
include/linux/hugetlb.h | 4 +-
include/linux/mm.h | 2 +-
mm/hugetlb.c | 4 +-
mm/mempolicy.c | 2 +-
mm/mprotect.c | 107 ++++++++++++++++++++++------------------
5 files changed, 64 insertions(+), 55 deletions(-)
--
2.43.0
When waiting for the PCIe link to come up, both link up and link down
are valid results depending on the device state.
Since the link may come up later and to get rid of the following
mis-reported PM errors. Do not return an -ETIMEDOUT error, as the
outcome has already been reported in dw_pcie_wait_for_link().
PM error logs introduced by the -ETIMEDOUT error return.
imx6q-pcie 33800000.pcie: Phy link never came up
imx6q-pcie 33800000.pcie: PM: dpm_run_callback(): genpd_resume_noirq returns -110
imx6q-pcie 33800000.pcie: PM: failed to resume noirq: error -110
Cc: stable(a)vger.kernel.org
Fixes: 4774faf854f5 ("PCI: dwc: Implement generic suspend/resume functionality")
Signed-off-by: Richard Zhu <hongxing.zhu(a)nxp.com>
Reviewed-by: Frank Li <Frank.Li(a)nxp.com>
---
drivers/pci/controller/dwc/pcie-designware-host.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index 06cbfd9e1f1e..025e11ebd571 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -1245,10 +1245,9 @@ int dw_pcie_resume_noirq(struct dw_pcie *pci)
if (ret)
return ret;
- ret = dw_pcie_wait_for_link(pci);
- if (ret)
- return ret;
+ /* Ignore errors, the link may come up later */
+ dw_pcie_wait_for_link(pci);
- return ret;
+ return 0;
}
EXPORT_SYMBOL_GPL(dw_pcie_resume_noirq);
--
2.37.1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 193d18f60588e95d62e0f82b6a53893e5f2f19f8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010523-dice-pyramid-9839@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 193d18f60588e95d62e0f82b6a53893e5f2f19f8 Mon Sep 17 00:00:00 2001
From: Jouni Malinen <jouni.malinen(a)oss.qualcomm.com>
Date: Mon, 15 Dec 2025 17:11:34 +0200
Subject: [PATCH] wifi: mac80211: Discard Beacon frames to non-broadcast
address
Beacon frames are required to be sent to the broadcast address, see IEEE
Std 802.11-2020, 11.1.3.1 ("The Address 1 field of the Beacon .. frame
shall be set to the broadcast address"). A unicast Beacon frame might be
used as a targeted attack to get one of the associated STAs to do
something (e.g., using CSA to move it to another channel). As such, it
is better have strict filtering for this on the received side and
discard all Beacon frames that are sent to an unexpected address.
This is even more important for cases where beacon protection is used.
The current implementation in mac80211 is correctly discarding unicast
Beacon frames if the Protected Frame bit in the Frame Control field is
set to 0. However, if that bit is set to 1, the logic used for checking
for configured BIGTK(s) does not actually work. If the driver does not
have logic for dropping unicast Beacon frames with Protected Frame bit
1, these frames would be accepted in mac80211 processing as valid Beacon
frames even though they are not protected. This would allow beacon
protection to be bypassed. While the logic for checking beacon
protection could be extended to cover this corner case, a more generic
check for discard all Beacon frames based on A1=unicast address covers
this without needing additional changes.
Address all these issues by dropping received Beacon frames if they are
sent to a non-broadcast address.
Cc: stable(a)vger.kernel.org
Fixes: af2d14b01c32 ("mac80211: Beacon protection using the new BIGTK (STA)")
Signed-off-by: Jouni Malinen <jouni.malinen(a)oss.qualcomm.com>
Link: https://patch.msgid.link/20251215151134.104501-1-jouni.malinen@oss.qualcomm…
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 6a1899512d07..e0ccd9749853 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -3511,6 +3511,11 @@ ieee80211_rx_h_mgmt_check(struct ieee80211_rx_data *rx)
rx->skb->len < IEEE80211_MIN_ACTION_SIZE)
return RX_DROP_U_RUNT_ACTION;
+ /* Drop non-broadcast Beacon frames */
+ if (ieee80211_is_beacon(mgmt->frame_control) &&
+ !is_broadcast_ether_addr(mgmt->da))
+ return RX_DROP;
+
if (rx->sdata->vif.type == NL80211_IFTYPE_AP &&
ieee80211_is_beacon(mgmt->frame_control) &&
!(rx->flags & IEEE80211_RX_BEACON_REPORTED)) {
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 1e5a541420b8c6d87d88eb50b6b978cdeafee1c9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010551-shredding-placidly-0c57@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e5a541420b8c6d87d88eb50b6b978cdeafee1c9 Mon Sep 17 00:00:00 2001
From: Miaoqian Lin <linmq006(a)gmail.com>
Date: Thu, 11 Dec 2025 12:13:13 +0400
Subject: [PATCH] net: phy: mediatek: fix nvmem cell reference leak in
mt798x_phy_calibration
When nvmem_cell_read() fails in mt798x_phy_calibration(), the function
returns without calling nvmem_cell_put(), leaking the cell reference.
Move nvmem_cell_put() right after nvmem_cell_read() to ensure the cell
reference is always released regardless of the read result.
Found via static analysis and code review.
Fixes: 98c485eaf509 ("net: phy: add driver for MediaTek SoC built-in GE PHYs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006(a)gmail.com>
Reviewed-by: Daniel Golle <daniel(a)makrotopia.org>
Reviewed-by: Andrew Lunn <andrew(a)lunn.ch>
Link: https://patch.msgid.link/20251211081313.2368460-1-linmq006@gmail.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/drivers/net/phy/mediatek/mtk-ge-soc.c b/drivers/net/phy/mediatek/mtk-ge-soc.c
index cd09fbf92ef2..2c4bbc236202 100644
--- a/drivers/net/phy/mediatek/mtk-ge-soc.c
+++ b/drivers/net/phy/mediatek/mtk-ge-soc.c
@@ -1167,9 +1167,9 @@ static int mt798x_phy_calibration(struct phy_device *phydev)
}
buf = (u32 *)nvmem_cell_read(cell, &len);
+ nvmem_cell_put(cell);
if (IS_ERR(buf))
return PTR_ERR(buf);
- nvmem_cell_put(cell);
if (!buf[0] || !buf[1] || !buf[2] || !buf[3] || len < 4 * sizeof(u32)) {
phydev_err(phydev, "invalid efuse data\n");
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 1e5a541420b8c6d87d88eb50b6b978cdeafee1c9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010550-concise-enjoyment-35f1@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e5a541420b8c6d87d88eb50b6b978cdeafee1c9 Mon Sep 17 00:00:00 2001
From: Miaoqian Lin <linmq006(a)gmail.com>
Date: Thu, 11 Dec 2025 12:13:13 +0400
Subject: [PATCH] net: phy: mediatek: fix nvmem cell reference leak in
mt798x_phy_calibration
When nvmem_cell_read() fails in mt798x_phy_calibration(), the function
returns without calling nvmem_cell_put(), leaking the cell reference.
Move nvmem_cell_put() right after nvmem_cell_read() to ensure the cell
reference is always released regardless of the read result.
Found via static analysis and code review.
Fixes: 98c485eaf509 ("net: phy: add driver for MediaTek SoC built-in GE PHYs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006(a)gmail.com>
Reviewed-by: Daniel Golle <daniel(a)makrotopia.org>
Reviewed-by: Andrew Lunn <andrew(a)lunn.ch>
Link: https://patch.msgid.link/20251211081313.2368460-1-linmq006@gmail.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/drivers/net/phy/mediatek/mtk-ge-soc.c b/drivers/net/phy/mediatek/mtk-ge-soc.c
index cd09fbf92ef2..2c4bbc236202 100644
--- a/drivers/net/phy/mediatek/mtk-ge-soc.c
+++ b/drivers/net/phy/mediatek/mtk-ge-soc.c
@@ -1167,9 +1167,9 @@ static int mt798x_phy_calibration(struct phy_device *phydev)
}
buf = (u32 *)nvmem_cell_read(cell, &len);
+ nvmem_cell_put(cell);
if (IS_ERR(buf))
return PTR_ERR(buf);
- nvmem_cell_put(cell);
if (!buf[0] || !buf[1] || !buf[2] || !buf[3] || len < 4 * sizeof(u32)) {
phydev_err(phydev, "invalid efuse data\n");
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x f183663901f21fe0fba8bd31ae894bc529709ee0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010547-early-outnumber-a575@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f183663901f21fe0fba8bd31ae894bc529709ee0 Mon Sep 17 00:00:00 2001
From: Bijan Tabatabai <bijan311(a)gmail.com>
Date: Tue, 16 Dec 2025 14:07:27 -0600
Subject: [PATCH] mm: consider non-anon swap cache folios in
folio_expected_ref_count()
Currently, folio_expected_ref_count() only adds references for the swap
cache if the folio is anonymous. However, according to the comment above
the definition of PG_swapcache in enum pageflags, shmem folios can also
have PG_swapcache set. This patch makes sure references for the swap
cache are added if folio_test_swapcache(folio) is true.
This issue was found when trying to hot-unplug memory in a QEMU/KVM
virtual machine. When initiating hot-unplug when most of the guest memory
is allocated, hot-unplug hangs partway through removal due to migration
failures. The following message would be printed several times, and would
be printed again about every five seconds:
[ 49.641309] migrating pfn b12f25 failed ret:7
[ 49.641310] page: refcount:2 mapcount:0 mapping:0000000033bd8fe2 index:0x7f404d925 pfn:0xb12f25
[ 49.641311] aops:swap_aops
[ 49.641313] flags: 0x300000000030508(uptodate|active|owner_priv_1|reclaim|swapbacked|node=0|zone=3)
[ 49.641314] raw: 0300000000030508 ffffed312c4bc908 ffffed312c4bc9c8 0000000000000000
[ 49.641315] raw: 00000007f404d925 00000000000c823b 00000002ffffffff 0000000000000000
[ 49.641315] page dumped because: migration failure
When debugging this, I found that these migration failures were due to
__migrate_folio() returning -EAGAIN for a small set of folios because the
expected reference count it calculates via folio_expected_ref_count() is
one less than the actual reference count of the folios. Furthermore, all
of the affected folios were not anonymous, but had the PG_swapcache flag
set, inspiring this patch. After applying this patch, the memory
hot-unplug behaves as expected.
I tested this on a machine running Ubuntu 24.04 with kernel version
6.8.0-90-generic and 64GB of memory. The guest VM is managed by libvirt
and runs Ubuntu 24.04 with kernel version 6.18 (though the head of the
mm-unstable branch as a Dec 16, 2025 was also tested and behaves the same)
and 48GB of memory. The libvirt XML definition for the VM can be found at
[1]. CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE is set in the guest
kernel so the hot-pluggable memory is automatically onlined.
Below are the steps to reproduce this behavior:
1) Define and start and virtual machine
host$ virsh -c qemu:///system define ./test_vm.xml # test_vm.xml from [1]
host$ virsh -c qemu:///system start test_vm
2) Setup swap in the guest
guest$ sudo fallocate -l 32G /swapfile
guest$ sudo chmod 0600 /swapfile
guest$ sudo mkswap /swapfile
guest$ sudo swapon /swapfile
3) Use alloc_data [2] to allocate most of the remaining guest memory
guest$ ./alloc_data 45
4) In a separate guest terminal, monitor the amount of used memory
guest$ watch -n1 free -h
5) When alloc_data has finished allocating, initiate the memory
hot-unplug using the provided xml file [3]
host$ virsh -c qemu:///system detach-device test_vm ./remove.xml --live
After initiating the memory hot-unplug, you should see the amount of
available memory in the guest decrease, and the amount of used swap data
increase. If everything works as expected, when all of the memory is
unplugged, there should be around 8.5-9GB of data in swap. If the
unplugging is unsuccessful, the amount of used swap data will settle below
that. If that happens, you should be able to see log messages in dmesg
similar to the one posted above.
Link: https://lkml.kernel.org/r/20251216200727.2360228-1-bijan311@gmail.com
Link: https://github.com/BijanT/linux_patch_files/blob/main/test_vm.xml [1]
Link: https://github.com/BijanT/linux_patch_files/blob/main/alloc_data.c [2]
Link: https://github.com/BijanT/linux_patch_files/blob/main/remove.xml [3]
Fixes: 86ebd50224c0 ("mm: add folio_expected_ref_count() for reference count calculation")
Signed-off-by: Bijan Tabatabai <bijan311(a)gmail.com>
Acked-by: David Hildenbrand (Red Hat) <david(a)kernel.org>
Acked-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Shivank Garg <shivankg(a)amd.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Kairui Song <ryncsn(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 15076261d0c2..6f959d8ca4b4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2459,10 +2459,10 @@ static inline int folio_expected_ref_count(const struct folio *folio)
if (WARN_ON_ONCE(page_has_type(&folio->page) && !folio_test_hugetlb(folio)))
return 0;
- if (folio_test_anon(folio)) {
- /* One reference per page from the swapcache. */
- ref_count += folio_test_swapcache(folio) << order;
- } else {
+ /* One reference per page from the swapcache. */
+ ref_count += folio_test_swapcache(folio) << order;
+
+ if (!folio_test_anon(folio)) {
/* One reference per page from the pagecache. */
ref_count += !!folio->mapping << order;
/* One reference from PG_private. */
CephFS stores file data across multiple RADOS objects. An object is the
atomic unit of storage, so the writeback code must clean only folios
that belong to the same object with each OSD request.
CephFS also supports RAID0-style striping of file contents: if enabled,
each object stores multiple unbroken "stripe units" covering different
portions of the file; if disabled, a "stripe unit" is simply the whole
object. The stripe unit is (usually) reported as the inode's block size.
Though the writeback logic could, in principle, lock all dirty folios
belonging to the same object, its current design is to lock only a
single stripe unit at a time. Ever since this code was first written,
it has determined this size by checking the inode's block size.
However, the relatively-new fscrypt support needed to reduce the block
size for encrypted inodes to the crypto block size (see 'fixes' commit),
which causes an unnecessarily high number of write operations (~1024x as
many, with 4MiB objects) and grossly degraded performance.
Fix this (and clarify intent) by using i_layout.stripe_unit directly in
ceph_define_write_size() so that encrypted inodes are written back with
the same number of operations as if they were unencrypted.
Fixes: 94af0470924c ("ceph: add some fscrypt guardrails")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sam Edwards <CFSworks(a)gmail.com>
---
fs/ceph/addr.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index b3569d44d510..cb1da8e27c2b 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1000,7 +1000,8 @@ unsigned int ceph_define_write_size(struct address_space *mapping)
{
struct inode *inode = mapping->host;
struct ceph_fs_client *fsc = ceph_inode_to_fs_client(inode);
- unsigned int wsize = i_blocksize(inode);
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ unsigned int wsize = ci->i_layout.stripe_unit;
if (fsc->mount_options->wsize < wsize)
wsize = fsc->mount_options->wsize;
--
2.51.2
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x a49a2a1baa0c553c3548a1c414b6a3c005a8deba
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010501-unsworn-elated-41e1@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a49a2a1baa0c553c3548a1c414b6a3c005a8deba Mon Sep 17 00:00:00 2001
From: NeilBrown <neil(a)brown.name>
Date: Sat, 22 Nov 2025 12:00:36 +1100
Subject: [PATCH] lockd: fix vfs_test_lock() calls
Usage of vfs_test_lock() is somewhat confused. Documentation suggests
it is given a "lock" but this is not the case. It is given a struct
file_lock which contains some details of the sort of lock it should be
looking for.
In particular passing a "file_lock" containing fl_lmops or fl_ops is
meaningless and possibly confusing.
This is particularly problematic in lockd. nlmsvc_testlock() receives
an initialised "file_lock" from xdr-decode, including manager ops and an
owner. It then mistakenly passes this to vfs_test_lock() which might
replace the owner and the ops. This can lead to confusion when freeing
the lock.
The primary role of the 'struct file_lock' passed to vfs_test_lock() is
to report a conflicting lock that was found, so it makes more sense for
nlmsvc_testlock() to pass "conflock", which it uses for returning the
conflicting lock.
With this change, freeing of the lock is not confused and code in
__nlm4svc_proc_test() and __nlmsvc_proc_test() can be simplified.
Documentation for vfs_test_lock() is improved to reflect its real
purpose, and a WARN_ON_ONCE() is added to avoid a similar problem in the
future.
Reported-by: Olga Kornievskaia <okorniev(a)redhat.com>
Closes: https://lore.kernel.org/all/20251021130506.45065-1-okorniev@redhat.com
Signed-off-by: NeilBrown <neil(a)brown.name>
Fixes: 20fa19027286 ("nfs: add export operations")
Cc: stable(a)vger.kernel.org
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c
index 109e5caae8c7..4b6f18d97734 100644
--- a/fs/lockd/svc4proc.c
+++ b/fs/lockd/svc4proc.c
@@ -97,7 +97,6 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
struct nlm_args *argp = rqstp->rq_argp;
struct nlm_host *host;
struct nlm_file *file;
- struct nlm_lockowner *test_owner;
__be32 rc = rpc_success;
dprintk("lockd: TEST4 called\n");
@@ -107,7 +106,6 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
if ((resp->status = nlm4svc_retrieve_args(rqstp, argp, &host, &file)))
return resp->status == nlm_drop_reply ? rpc_drop_reply :rpc_success;
- test_owner = argp->lock.fl.c.flc_owner;
/* Now check for conflicting locks */
resp->status = nlmsvc_testlock(rqstp, file, host, &argp->lock,
&resp->lock);
@@ -116,7 +114,7 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
else
dprintk("lockd: TEST4 status %d\n", ntohl(resp->status));
- nlmsvc_put_lockowner(test_owner);
+ nlmsvc_release_lockowner(&argp->lock);
nlmsvc_release_host(host);
nlm_release_file(file);
return rc;
diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index 3a3d05cfe09a..6bce19fd024c 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -633,7 +633,13 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file,
}
mode = lock_to_openmode(&lock->fl);
- error = vfs_test_lock(file->f_file[mode], &lock->fl);
+ locks_init_lock(&conflock->fl);
+ /* vfs_test_lock only uses start, end, and owner, but tests flc_file */
+ conflock->fl.c.flc_file = lock->fl.c.flc_file;
+ conflock->fl.fl_start = lock->fl.fl_start;
+ conflock->fl.fl_end = lock->fl.fl_end;
+ conflock->fl.c.flc_owner = lock->fl.c.flc_owner;
+ error = vfs_test_lock(file->f_file[mode], &conflock->fl);
if (error) {
/* We can't currently deal with deferred test requests */
if (error == FILE_LOCK_DEFERRED)
@@ -643,22 +649,19 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file,
goto out;
}
- if (lock->fl.c.flc_type == F_UNLCK) {
+ if (conflock->fl.c.flc_type == F_UNLCK) {
ret = nlm_granted;
goto out;
}
dprintk("lockd: conflicting lock(ty=%d, %Ld-%Ld)\n",
- lock->fl.c.flc_type, (long long)lock->fl.fl_start,
- (long long)lock->fl.fl_end);
+ conflock->fl.c.flc_type, (long long)conflock->fl.fl_start,
+ (long long)conflock->fl.fl_end);
conflock->caller = "somehost"; /* FIXME */
conflock->len = strlen(conflock->caller);
conflock->oh.len = 0; /* don't return OH info */
- conflock->svid = lock->fl.c.flc_pid;
- conflock->fl.c.flc_type = lock->fl.c.flc_type;
- conflock->fl.fl_start = lock->fl.fl_start;
- conflock->fl.fl_end = lock->fl.fl_end;
- locks_release_private(&lock->fl);
+ conflock->svid = conflock->fl.c.flc_pid;
+ locks_release_private(&conflock->fl);
ret = nlm_lck_denied;
out:
diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c
index f53d5177f267..5817ef272332 100644
--- a/fs/lockd/svcproc.c
+++ b/fs/lockd/svcproc.c
@@ -117,7 +117,6 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
struct nlm_args *argp = rqstp->rq_argp;
struct nlm_host *host;
struct nlm_file *file;
- struct nlm_lockowner *test_owner;
__be32 rc = rpc_success;
dprintk("lockd: TEST called\n");
@@ -127,8 +126,6 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
if ((resp->status = nlmsvc_retrieve_args(rqstp, argp, &host, &file)))
return resp->status == nlm_drop_reply ? rpc_drop_reply :rpc_success;
- test_owner = argp->lock.fl.c.flc_owner;
-
/* Now check for conflicting locks */
resp->status = cast_status(nlmsvc_testlock(rqstp, file, host,
&argp->lock, &resp->lock));
@@ -138,7 +135,7 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
dprintk("lockd: TEST status %d vers %d\n",
ntohl(resp->status), rqstp->rq_vers);
- nlmsvc_put_lockowner(test_owner);
+ nlmsvc_release_lockowner(&argp->lock);
nlmsvc_release_host(host);
nlm_release_file(file);
return rc;
diff --git a/fs/locks.c b/fs/locks.c
index 04a3f0e20724..bf5e0d05a026 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2185,13 +2185,21 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
/**
* vfs_test_lock - test file byte range lock
* @filp: The file to test lock for
- * @fl: The lock to test; also used to hold result
+ * @fl: The byte-range in the file to test; also used to hold result
*
+ * On entry, @fl does not contain a lock, but identifies a range (fl_start, fl_end)
+ * in the file (c.flc_file), and an owner (c.flc_owner) for whom existing locks
+ * should be ignored. c.flc_type and c.flc_flags are ignored.
+ * Both fl_lmops and fl_ops in @fl must be NULL.
* Returns -ERRNO on failure. Indicates presence of conflicting lock by
- * setting conf->fl_type to something other than F_UNLCK.
+ * setting fl->fl_type to something other than F_UNLCK.
+ *
+ * If vfs_test_lock() does find a lock and return it, the caller must
+ * use locks_free_lock() or locks_release_private() on the returned lock.
*/
int vfs_test_lock(struct file *filp, struct file_lock *fl)
{
+ WARN_ON_ONCE(fl->fl_ops || fl->fl_lmops);
WARN_ON_ONCE(filp != fl->c.flc_file);
if (filp->f_op->lock)
return filp->f_op->lock(filp, F_GETLK, fl);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x f183663901f21fe0fba8bd31ae894bc529709ee0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010545-tracing-morbidly-6a4d@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f183663901f21fe0fba8bd31ae894bc529709ee0 Mon Sep 17 00:00:00 2001
From: Bijan Tabatabai <bijan311(a)gmail.com>
Date: Tue, 16 Dec 2025 14:07:27 -0600
Subject: [PATCH] mm: consider non-anon swap cache folios in
folio_expected_ref_count()
Currently, folio_expected_ref_count() only adds references for the swap
cache if the folio is anonymous. However, according to the comment above
the definition of PG_swapcache in enum pageflags, shmem folios can also
have PG_swapcache set. This patch makes sure references for the swap
cache are added if folio_test_swapcache(folio) is true.
This issue was found when trying to hot-unplug memory in a QEMU/KVM
virtual machine. When initiating hot-unplug when most of the guest memory
is allocated, hot-unplug hangs partway through removal due to migration
failures. The following message would be printed several times, and would
be printed again about every five seconds:
[ 49.641309] migrating pfn b12f25 failed ret:7
[ 49.641310] page: refcount:2 mapcount:0 mapping:0000000033bd8fe2 index:0x7f404d925 pfn:0xb12f25
[ 49.641311] aops:swap_aops
[ 49.641313] flags: 0x300000000030508(uptodate|active|owner_priv_1|reclaim|swapbacked|node=0|zone=3)
[ 49.641314] raw: 0300000000030508 ffffed312c4bc908 ffffed312c4bc9c8 0000000000000000
[ 49.641315] raw: 00000007f404d925 00000000000c823b 00000002ffffffff 0000000000000000
[ 49.641315] page dumped because: migration failure
When debugging this, I found that these migration failures were due to
__migrate_folio() returning -EAGAIN for a small set of folios because the
expected reference count it calculates via folio_expected_ref_count() is
one less than the actual reference count of the folios. Furthermore, all
of the affected folios were not anonymous, but had the PG_swapcache flag
set, inspiring this patch. After applying this patch, the memory
hot-unplug behaves as expected.
I tested this on a machine running Ubuntu 24.04 with kernel version
6.8.0-90-generic and 64GB of memory. The guest VM is managed by libvirt
and runs Ubuntu 24.04 with kernel version 6.18 (though the head of the
mm-unstable branch as a Dec 16, 2025 was also tested and behaves the same)
and 48GB of memory. The libvirt XML definition for the VM can be found at
[1]. CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE is set in the guest
kernel so the hot-pluggable memory is automatically onlined.
Below are the steps to reproduce this behavior:
1) Define and start and virtual machine
host$ virsh -c qemu:///system define ./test_vm.xml # test_vm.xml from [1]
host$ virsh -c qemu:///system start test_vm
2) Setup swap in the guest
guest$ sudo fallocate -l 32G /swapfile
guest$ sudo chmod 0600 /swapfile
guest$ sudo mkswap /swapfile
guest$ sudo swapon /swapfile
3) Use alloc_data [2] to allocate most of the remaining guest memory
guest$ ./alloc_data 45
4) In a separate guest terminal, monitor the amount of used memory
guest$ watch -n1 free -h
5) When alloc_data has finished allocating, initiate the memory
hot-unplug using the provided xml file [3]
host$ virsh -c qemu:///system detach-device test_vm ./remove.xml --live
After initiating the memory hot-unplug, you should see the amount of
available memory in the guest decrease, and the amount of used swap data
increase. If everything works as expected, when all of the memory is
unplugged, there should be around 8.5-9GB of data in swap. If the
unplugging is unsuccessful, the amount of used swap data will settle below
that. If that happens, you should be able to see log messages in dmesg
similar to the one posted above.
Link: https://lkml.kernel.org/r/20251216200727.2360228-1-bijan311@gmail.com
Link: https://github.com/BijanT/linux_patch_files/blob/main/test_vm.xml [1]
Link: https://github.com/BijanT/linux_patch_files/blob/main/alloc_data.c [2]
Link: https://github.com/BijanT/linux_patch_files/blob/main/remove.xml [3]
Fixes: 86ebd50224c0 ("mm: add folio_expected_ref_count() for reference count calculation")
Signed-off-by: Bijan Tabatabai <bijan311(a)gmail.com>
Acked-by: David Hildenbrand (Red Hat) <david(a)kernel.org>
Acked-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Shivank Garg <shivankg(a)amd.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Kairui Song <ryncsn(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 15076261d0c2..6f959d8ca4b4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2459,10 +2459,10 @@ static inline int folio_expected_ref_count(const struct folio *folio)
if (WARN_ON_ONCE(page_has_type(&folio->page) && !folio_test_hugetlb(folio)))
return 0;
- if (folio_test_anon(folio)) {
- /* One reference per page from the swapcache. */
- ref_count += folio_test_swapcache(folio) << order;
- } else {
+ /* One reference per page from the swapcache. */
+ ref_count += folio_test_swapcache(folio) << order;
+
+ if (!folio_test_anon(folio)) {
/* One reference per page from the pagecache. */
ref_count += !!folio->mapping << order;
/* One reference from PG_private. */
When fscrypt is enabled, move_dirty_folio_in_page_array() may fail
because it needs to allocate bounce buffers to store the encrypted
versions of each folio. Each folio beyond the first allocates its bounce
buffer with GFP_NOWAIT. Failures are common (and expected) under this
allocation mode; they should flush (not abort) the batch.
However, ceph_process_folio_batch() uses the same `rc` variable for its
own return code and for capturing the return codes of its routine calls;
failing to reset `rc` back to 0 results in the error being propagated
out to the main writeback loop, which cannot actually tolerate any
errors here: once `ceph_wbc.pages` is allocated, it must be passed to
ceph_submit_write() to be freed. If it survives until the next iteration
(e.g. due to the goto being followed), ceph_allocate_page_array()'s
BUG_ON() will oops the worker. (Subsequent patches in this series make
the loop more robust.)
Note that this failure mode is currently masked due to another bug
(addressed later in this series) that prevents multiple encrypted folios
from being selected for the same write.
For now, just reset `rc` when redirtying the folio and prevent the
error from propagating. After this change, ceph_process_folio_batch() no
longer returns errors; its only remaining failure indicator is
`locked_pages == 0`, which the caller already handles correctly. The
next patch in this series addresses this.
Fixes: ce80b76dd327 ("ceph: introduce ceph_process_folio_batch() method")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sam Edwards <CFSworks(a)gmail.com>
---
fs/ceph/addr.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 63b75d214210..3462df35d245 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1369,6 +1369,7 @@ int ceph_process_folio_batch(struct address_space *mapping,
rc = move_dirty_folio_in_page_array(mapping, wbc, ceph_wbc,
folio);
if (rc) {
+ rc = 0;
folio_redirty_for_writepage(wbc, folio);
folio_unlock(folio);
break;
--
2.51.2
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 193d18f60588e95d62e0f82b6a53893e5f2f19f8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010516-playmate-shorthand-2e97@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 193d18f60588e95d62e0f82b6a53893e5f2f19f8 Mon Sep 17 00:00:00 2001
From: Jouni Malinen <jouni.malinen(a)oss.qualcomm.com>
Date: Mon, 15 Dec 2025 17:11:34 +0200
Subject: [PATCH] wifi: mac80211: Discard Beacon frames to non-broadcast
address
Beacon frames are required to be sent to the broadcast address, see IEEE
Std 802.11-2020, 11.1.3.1 ("The Address 1 field of the Beacon .. frame
shall be set to the broadcast address"). A unicast Beacon frame might be
used as a targeted attack to get one of the associated STAs to do
something (e.g., using CSA to move it to another channel). As such, it
is better have strict filtering for this on the received side and
discard all Beacon frames that are sent to an unexpected address.
This is even more important for cases where beacon protection is used.
The current implementation in mac80211 is correctly discarding unicast
Beacon frames if the Protected Frame bit in the Frame Control field is
set to 0. However, if that bit is set to 1, the logic used for checking
for configured BIGTK(s) does not actually work. If the driver does not
have logic for dropping unicast Beacon frames with Protected Frame bit
1, these frames would be accepted in mac80211 processing as valid Beacon
frames even though they are not protected. This would allow beacon
protection to be bypassed. While the logic for checking beacon
protection could be extended to cover this corner case, a more generic
check for discard all Beacon frames based on A1=unicast address covers
this without needing additional changes.
Address all these issues by dropping received Beacon frames if they are
sent to a non-broadcast address.
Cc: stable(a)vger.kernel.org
Fixes: af2d14b01c32 ("mac80211: Beacon protection using the new BIGTK (STA)")
Signed-off-by: Jouni Malinen <jouni.malinen(a)oss.qualcomm.com>
Link: https://patch.msgid.link/20251215151134.104501-1-jouni.malinen@oss.qualcomm…
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 6a1899512d07..e0ccd9749853 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -3511,6 +3511,11 @@ ieee80211_rx_h_mgmt_check(struct ieee80211_rx_data *rx)
rx->skb->len < IEEE80211_MIN_ACTION_SIZE)
return RX_DROP_U_RUNT_ACTION;
+ /* Drop non-broadcast Beacon frames */
+ if (ieee80211_is_beacon(mgmt->frame_control) &&
+ !is_broadcast_ether_addr(mgmt->da))
+ return RX_DROP;
+
if (rx->sdata->vif.type == NL80211_IFTYPE_AP &&
ieee80211_is_beacon(mgmt->frame_control) &&
!(rx->flags & IEEE80211_RX_BEACON_REPORTED)) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x a49a2a1baa0c553c3548a1c414b6a3c005a8deba
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010500-dentist-busybody-6e8b@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a49a2a1baa0c553c3548a1c414b6a3c005a8deba Mon Sep 17 00:00:00 2001
From: NeilBrown <neil(a)brown.name>
Date: Sat, 22 Nov 2025 12:00:36 +1100
Subject: [PATCH] lockd: fix vfs_test_lock() calls
Usage of vfs_test_lock() is somewhat confused. Documentation suggests
it is given a "lock" but this is not the case. It is given a struct
file_lock which contains some details of the sort of lock it should be
looking for.
In particular passing a "file_lock" containing fl_lmops or fl_ops is
meaningless and possibly confusing.
This is particularly problematic in lockd. nlmsvc_testlock() receives
an initialised "file_lock" from xdr-decode, including manager ops and an
owner. It then mistakenly passes this to vfs_test_lock() which might
replace the owner and the ops. This can lead to confusion when freeing
the lock.
The primary role of the 'struct file_lock' passed to vfs_test_lock() is
to report a conflicting lock that was found, so it makes more sense for
nlmsvc_testlock() to pass "conflock", which it uses for returning the
conflicting lock.
With this change, freeing of the lock is not confused and code in
__nlm4svc_proc_test() and __nlmsvc_proc_test() can be simplified.
Documentation for vfs_test_lock() is improved to reflect its real
purpose, and a WARN_ON_ONCE() is added to avoid a similar problem in the
future.
Reported-by: Olga Kornievskaia <okorniev(a)redhat.com>
Closes: https://lore.kernel.org/all/20251021130506.45065-1-okorniev@redhat.com
Signed-off-by: NeilBrown <neil(a)brown.name>
Fixes: 20fa19027286 ("nfs: add export operations")
Cc: stable(a)vger.kernel.org
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c
index 109e5caae8c7..4b6f18d97734 100644
--- a/fs/lockd/svc4proc.c
+++ b/fs/lockd/svc4proc.c
@@ -97,7 +97,6 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
struct nlm_args *argp = rqstp->rq_argp;
struct nlm_host *host;
struct nlm_file *file;
- struct nlm_lockowner *test_owner;
__be32 rc = rpc_success;
dprintk("lockd: TEST4 called\n");
@@ -107,7 +106,6 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
if ((resp->status = nlm4svc_retrieve_args(rqstp, argp, &host, &file)))
return resp->status == nlm_drop_reply ? rpc_drop_reply :rpc_success;
- test_owner = argp->lock.fl.c.flc_owner;
/* Now check for conflicting locks */
resp->status = nlmsvc_testlock(rqstp, file, host, &argp->lock,
&resp->lock);
@@ -116,7 +114,7 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
else
dprintk("lockd: TEST4 status %d\n", ntohl(resp->status));
- nlmsvc_put_lockowner(test_owner);
+ nlmsvc_release_lockowner(&argp->lock);
nlmsvc_release_host(host);
nlm_release_file(file);
return rc;
diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index 3a3d05cfe09a..6bce19fd024c 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -633,7 +633,13 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file,
}
mode = lock_to_openmode(&lock->fl);
- error = vfs_test_lock(file->f_file[mode], &lock->fl);
+ locks_init_lock(&conflock->fl);
+ /* vfs_test_lock only uses start, end, and owner, but tests flc_file */
+ conflock->fl.c.flc_file = lock->fl.c.flc_file;
+ conflock->fl.fl_start = lock->fl.fl_start;
+ conflock->fl.fl_end = lock->fl.fl_end;
+ conflock->fl.c.flc_owner = lock->fl.c.flc_owner;
+ error = vfs_test_lock(file->f_file[mode], &conflock->fl);
if (error) {
/* We can't currently deal with deferred test requests */
if (error == FILE_LOCK_DEFERRED)
@@ -643,22 +649,19 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file,
goto out;
}
- if (lock->fl.c.flc_type == F_UNLCK) {
+ if (conflock->fl.c.flc_type == F_UNLCK) {
ret = nlm_granted;
goto out;
}
dprintk("lockd: conflicting lock(ty=%d, %Ld-%Ld)\n",
- lock->fl.c.flc_type, (long long)lock->fl.fl_start,
- (long long)lock->fl.fl_end);
+ conflock->fl.c.flc_type, (long long)conflock->fl.fl_start,
+ (long long)conflock->fl.fl_end);
conflock->caller = "somehost"; /* FIXME */
conflock->len = strlen(conflock->caller);
conflock->oh.len = 0; /* don't return OH info */
- conflock->svid = lock->fl.c.flc_pid;
- conflock->fl.c.flc_type = lock->fl.c.flc_type;
- conflock->fl.fl_start = lock->fl.fl_start;
- conflock->fl.fl_end = lock->fl.fl_end;
- locks_release_private(&lock->fl);
+ conflock->svid = conflock->fl.c.flc_pid;
+ locks_release_private(&conflock->fl);
ret = nlm_lck_denied;
out:
diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c
index f53d5177f267..5817ef272332 100644
--- a/fs/lockd/svcproc.c
+++ b/fs/lockd/svcproc.c
@@ -117,7 +117,6 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
struct nlm_args *argp = rqstp->rq_argp;
struct nlm_host *host;
struct nlm_file *file;
- struct nlm_lockowner *test_owner;
__be32 rc = rpc_success;
dprintk("lockd: TEST called\n");
@@ -127,8 +126,6 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
if ((resp->status = nlmsvc_retrieve_args(rqstp, argp, &host, &file)))
return resp->status == nlm_drop_reply ? rpc_drop_reply :rpc_success;
- test_owner = argp->lock.fl.c.flc_owner;
-
/* Now check for conflicting locks */
resp->status = cast_status(nlmsvc_testlock(rqstp, file, host,
&argp->lock, &resp->lock));
@@ -138,7 +135,7 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
dprintk("lockd: TEST status %d vers %d\n",
ntohl(resp->status), rqstp->rq_vers);
- nlmsvc_put_lockowner(test_owner);
+ nlmsvc_release_lockowner(&argp->lock);
nlmsvc_release_host(host);
nlm_release_file(file);
return rc;
diff --git a/fs/locks.c b/fs/locks.c
index 04a3f0e20724..bf5e0d05a026 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2185,13 +2185,21 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
/**
* vfs_test_lock - test file byte range lock
* @filp: The file to test lock for
- * @fl: The lock to test; also used to hold result
+ * @fl: The byte-range in the file to test; also used to hold result
*
+ * On entry, @fl does not contain a lock, but identifies a range (fl_start, fl_end)
+ * in the file (c.flc_file), and an owner (c.flc_owner) for whom existing locks
+ * should be ignored. c.flc_type and c.flc_flags are ignored.
+ * Both fl_lmops and fl_ops in @fl must be NULL.
* Returns -ERRNO on failure. Indicates presence of conflicting lock by
- * setting conf->fl_type to something other than F_UNLCK.
+ * setting fl->fl_type to something other than F_UNLCK.
+ *
+ * If vfs_test_lock() does find a lock and return it, the caller must
+ * use locks_free_lock() or locks_release_private() on the returned lock.
*/
int vfs_test_lock(struct file *filp, struct file_lock *fl)
{
+ WARN_ON_ONCE(fl->fl_ops || fl->fl_lmops);
WARN_ON_ONCE(filp != fl->c.flc_file);
if (filp->f_op->lock)
return filp->f_op->lock(filp, F_GETLK, fl);
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 193d18f60588e95d62e0f82b6a53893e5f2f19f8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010510-wronged-recovery-c8c5@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 193d18f60588e95d62e0f82b6a53893e5f2f19f8 Mon Sep 17 00:00:00 2001
From: Jouni Malinen <jouni.malinen(a)oss.qualcomm.com>
Date: Mon, 15 Dec 2025 17:11:34 +0200
Subject: [PATCH] wifi: mac80211: Discard Beacon frames to non-broadcast
address
Beacon frames are required to be sent to the broadcast address, see IEEE
Std 802.11-2020, 11.1.3.1 ("The Address 1 field of the Beacon .. frame
shall be set to the broadcast address"). A unicast Beacon frame might be
used as a targeted attack to get one of the associated STAs to do
something (e.g., using CSA to move it to another channel). As such, it
is better have strict filtering for this on the received side and
discard all Beacon frames that are sent to an unexpected address.
This is even more important for cases where beacon protection is used.
The current implementation in mac80211 is correctly discarding unicast
Beacon frames if the Protected Frame bit in the Frame Control field is
set to 0. However, if that bit is set to 1, the logic used for checking
for configured BIGTK(s) does not actually work. If the driver does not
have logic for dropping unicast Beacon frames with Protected Frame bit
1, these frames would be accepted in mac80211 processing as valid Beacon
frames even though they are not protected. This would allow beacon
protection to be bypassed. While the logic for checking beacon
protection could be extended to cover this corner case, a more generic
check for discard all Beacon frames based on A1=unicast address covers
this without needing additional changes.
Address all these issues by dropping received Beacon frames if they are
sent to a non-broadcast address.
Cc: stable(a)vger.kernel.org
Fixes: af2d14b01c32 ("mac80211: Beacon protection using the new BIGTK (STA)")
Signed-off-by: Jouni Malinen <jouni.malinen(a)oss.qualcomm.com>
Link: https://patch.msgid.link/20251215151134.104501-1-jouni.malinen@oss.qualcomm…
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 6a1899512d07..e0ccd9749853 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -3511,6 +3511,11 @@ ieee80211_rx_h_mgmt_check(struct ieee80211_rx_data *rx)
rx->skb->len < IEEE80211_MIN_ACTION_SIZE)
return RX_DROP_U_RUNT_ACTION;
+ /* Drop non-broadcast Beacon frames */
+ if (ieee80211_is_beacon(mgmt->frame_control) &&
+ !is_broadcast_ether_addr(mgmt->da))
+ return RX_DROP;
+
if (rx->sdata->vif.type == NL80211_IFTYPE_AP &&
ieee80211_is_beacon(mgmt->frame_control) &&
!(rx->flags & IEEE80211_RX_BEACON_REPORTED)) {
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x a49a2a1baa0c553c3548a1c414b6a3c005a8deba
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010559-plaster-snowsuit-aa37@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a49a2a1baa0c553c3548a1c414b6a3c005a8deba Mon Sep 17 00:00:00 2001
From: NeilBrown <neil(a)brown.name>
Date: Sat, 22 Nov 2025 12:00:36 +1100
Subject: [PATCH] lockd: fix vfs_test_lock() calls
Usage of vfs_test_lock() is somewhat confused. Documentation suggests
it is given a "lock" but this is not the case. It is given a struct
file_lock which contains some details of the sort of lock it should be
looking for.
In particular passing a "file_lock" containing fl_lmops or fl_ops is
meaningless and possibly confusing.
This is particularly problematic in lockd. nlmsvc_testlock() receives
an initialised "file_lock" from xdr-decode, including manager ops and an
owner. It then mistakenly passes this to vfs_test_lock() which might
replace the owner and the ops. This can lead to confusion when freeing
the lock.
The primary role of the 'struct file_lock' passed to vfs_test_lock() is
to report a conflicting lock that was found, so it makes more sense for
nlmsvc_testlock() to pass "conflock", which it uses for returning the
conflicting lock.
With this change, freeing of the lock is not confused and code in
__nlm4svc_proc_test() and __nlmsvc_proc_test() can be simplified.
Documentation for vfs_test_lock() is improved to reflect its real
purpose, and a WARN_ON_ONCE() is added to avoid a similar problem in the
future.
Reported-by: Olga Kornievskaia <okorniev(a)redhat.com>
Closes: https://lore.kernel.org/all/20251021130506.45065-1-okorniev@redhat.com
Signed-off-by: NeilBrown <neil(a)brown.name>
Fixes: 20fa19027286 ("nfs: add export operations")
Cc: stable(a)vger.kernel.org
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c
index 109e5caae8c7..4b6f18d97734 100644
--- a/fs/lockd/svc4proc.c
+++ b/fs/lockd/svc4proc.c
@@ -97,7 +97,6 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
struct nlm_args *argp = rqstp->rq_argp;
struct nlm_host *host;
struct nlm_file *file;
- struct nlm_lockowner *test_owner;
__be32 rc = rpc_success;
dprintk("lockd: TEST4 called\n");
@@ -107,7 +106,6 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
if ((resp->status = nlm4svc_retrieve_args(rqstp, argp, &host, &file)))
return resp->status == nlm_drop_reply ? rpc_drop_reply :rpc_success;
- test_owner = argp->lock.fl.c.flc_owner;
/* Now check for conflicting locks */
resp->status = nlmsvc_testlock(rqstp, file, host, &argp->lock,
&resp->lock);
@@ -116,7 +114,7 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
else
dprintk("lockd: TEST4 status %d\n", ntohl(resp->status));
- nlmsvc_put_lockowner(test_owner);
+ nlmsvc_release_lockowner(&argp->lock);
nlmsvc_release_host(host);
nlm_release_file(file);
return rc;
diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index 3a3d05cfe09a..6bce19fd024c 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -633,7 +633,13 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file,
}
mode = lock_to_openmode(&lock->fl);
- error = vfs_test_lock(file->f_file[mode], &lock->fl);
+ locks_init_lock(&conflock->fl);
+ /* vfs_test_lock only uses start, end, and owner, but tests flc_file */
+ conflock->fl.c.flc_file = lock->fl.c.flc_file;
+ conflock->fl.fl_start = lock->fl.fl_start;
+ conflock->fl.fl_end = lock->fl.fl_end;
+ conflock->fl.c.flc_owner = lock->fl.c.flc_owner;
+ error = vfs_test_lock(file->f_file[mode], &conflock->fl);
if (error) {
/* We can't currently deal with deferred test requests */
if (error == FILE_LOCK_DEFERRED)
@@ -643,22 +649,19 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file,
goto out;
}
- if (lock->fl.c.flc_type == F_UNLCK) {
+ if (conflock->fl.c.flc_type == F_UNLCK) {
ret = nlm_granted;
goto out;
}
dprintk("lockd: conflicting lock(ty=%d, %Ld-%Ld)\n",
- lock->fl.c.flc_type, (long long)lock->fl.fl_start,
- (long long)lock->fl.fl_end);
+ conflock->fl.c.flc_type, (long long)conflock->fl.fl_start,
+ (long long)conflock->fl.fl_end);
conflock->caller = "somehost"; /* FIXME */
conflock->len = strlen(conflock->caller);
conflock->oh.len = 0; /* don't return OH info */
- conflock->svid = lock->fl.c.flc_pid;
- conflock->fl.c.flc_type = lock->fl.c.flc_type;
- conflock->fl.fl_start = lock->fl.fl_start;
- conflock->fl.fl_end = lock->fl.fl_end;
- locks_release_private(&lock->fl);
+ conflock->svid = conflock->fl.c.flc_pid;
+ locks_release_private(&conflock->fl);
ret = nlm_lck_denied;
out:
diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c
index f53d5177f267..5817ef272332 100644
--- a/fs/lockd/svcproc.c
+++ b/fs/lockd/svcproc.c
@@ -117,7 +117,6 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
struct nlm_args *argp = rqstp->rq_argp;
struct nlm_host *host;
struct nlm_file *file;
- struct nlm_lockowner *test_owner;
__be32 rc = rpc_success;
dprintk("lockd: TEST called\n");
@@ -127,8 +126,6 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
if ((resp->status = nlmsvc_retrieve_args(rqstp, argp, &host, &file)))
return resp->status == nlm_drop_reply ? rpc_drop_reply :rpc_success;
- test_owner = argp->lock.fl.c.flc_owner;
-
/* Now check for conflicting locks */
resp->status = cast_status(nlmsvc_testlock(rqstp, file, host,
&argp->lock, &resp->lock));
@@ -138,7 +135,7 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
dprintk("lockd: TEST status %d vers %d\n",
ntohl(resp->status), rqstp->rq_vers);
- nlmsvc_put_lockowner(test_owner);
+ nlmsvc_release_lockowner(&argp->lock);
nlmsvc_release_host(host);
nlm_release_file(file);
return rc;
diff --git a/fs/locks.c b/fs/locks.c
index 04a3f0e20724..bf5e0d05a026 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2185,13 +2185,21 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
/**
* vfs_test_lock - test file byte range lock
* @filp: The file to test lock for
- * @fl: The lock to test; also used to hold result
+ * @fl: The byte-range in the file to test; also used to hold result
*
+ * On entry, @fl does not contain a lock, but identifies a range (fl_start, fl_end)
+ * in the file (c.flc_file), and an owner (c.flc_owner) for whom existing locks
+ * should be ignored. c.flc_type and c.flc_flags are ignored.
+ * Both fl_lmops and fl_ops in @fl must be NULL.
* Returns -ERRNO on failure. Indicates presence of conflicting lock by
- * setting conf->fl_type to something other than F_UNLCK.
+ * setting fl->fl_type to something other than F_UNLCK.
+ *
+ * If vfs_test_lock() does find a lock and return it, the caller must
+ * use locks_free_lock() or locks_release_private() on the returned lock.
*/
int vfs_test_lock(struct file *filp, struct file_lock *fl)
{
+ WARN_ON_ONCE(fl->fl_ops || fl->fl_lmops);
WARN_ON_ONCE(filp != fl->c.flc_file);
if (filp->f_op->lock)
return filp->f_op->lock(filp, F_GETLK, fl);
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x f183663901f21fe0fba8bd31ae894bc529709ee0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010544-pavestone-gloating-a829@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f183663901f21fe0fba8bd31ae894bc529709ee0 Mon Sep 17 00:00:00 2001
From: Bijan Tabatabai <bijan311(a)gmail.com>
Date: Tue, 16 Dec 2025 14:07:27 -0600
Subject: [PATCH] mm: consider non-anon swap cache folios in
folio_expected_ref_count()
Currently, folio_expected_ref_count() only adds references for the swap
cache if the folio is anonymous. However, according to the comment above
the definition of PG_swapcache in enum pageflags, shmem folios can also
have PG_swapcache set. This patch makes sure references for the swap
cache are added if folio_test_swapcache(folio) is true.
This issue was found when trying to hot-unplug memory in a QEMU/KVM
virtual machine. When initiating hot-unplug when most of the guest memory
is allocated, hot-unplug hangs partway through removal due to migration
failures. The following message would be printed several times, and would
be printed again about every five seconds:
[ 49.641309] migrating pfn b12f25 failed ret:7
[ 49.641310] page: refcount:2 mapcount:0 mapping:0000000033bd8fe2 index:0x7f404d925 pfn:0xb12f25
[ 49.641311] aops:swap_aops
[ 49.641313] flags: 0x300000000030508(uptodate|active|owner_priv_1|reclaim|swapbacked|node=0|zone=3)
[ 49.641314] raw: 0300000000030508 ffffed312c4bc908 ffffed312c4bc9c8 0000000000000000
[ 49.641315] raw: 00000007f404d925 00000000000c823b 00000002ffffffff 0000000000000000
[ 49.641315] page dumped because: migration failure
When debugging this, I found that these migration failures were due to
__migrate_folio() returning -EAGAIN for a small set of folios because the
expected reference count it calculates via folio_expected_ref_count() is
one less than the actual reference count of the folios. Furthermore, all
of the affected folios were not anonymous, but had the PG_swapcache flag
set, inspiring this patch. After applying this patch, the memory
hot-unplug behaves as expected.
I tested this on a machine running Ubuntu 24.04 with kernel version
6.8.0-90-generic and 64GB of memory. The guest VM is managed by libvirt
and runs Ubuntu 24.04 with kernel version 6.18 (though the head of the
mm-unstable branch as a Dec 16, 2025 was also tested and behaves the same)
and 48GB of memory. The libvirt XML definition for the VM can be found at
[1]. CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE is set in the guest
kernel so the hot-pluggable memory is automatically onlined.
Below are the steps to reproduce this behavior:
1) Define and start and virtual machine
host$ virsh -c qemu:///system define ./test_vm.xml # test_vm.xml from [1]
host$ virsh -c qemu:///system start test_vm
2) Setup swap in the guest
guest$ sudo fallocate -l 32G /swapfile
guest$ sudo chmod 0600 /swapfile
guest$ sudo mkswap /swapfile
guest$ sudo swapon /swapfile
3) Use alloc_data [2] to allocate most of the remaining guest memory
guest$ ./alloc_data 45
4) In a separate guest terminal, monitor the amount of used memory
guest$ watch -n1 free -h
5) When alloc_data has finished allocating, initiate the memory
hot-unplug using the provided xml file [3]
host$ virsh -c qemu:///system detach-device test_vm ./remove.xml --live
After initiating the memory hot-unplug, you should see the amount of
available memory in the guest decrease, and the amount of used swap data
increase. If everything works as expected, when all of the memory is
unplugged, there should be around 8.5-9GB of data in swap. If the
unplugging is unsuccessful, the amount of used swap data will settle below
that. If that happens, you should be able to see log messages in dmesg
similar to the one posted above.
Link: https://lkml.kernel.org/r/20251216200727.2360228-1-bijan311@gmail.com
Link: https://github.com/BijanT/linux_patch_files/blob/main/test_vm.xml [1]
Link: https://github.com/BijanT/linux_patch_files/blob/main/alloc_data.c [2]
Link: https://github.com/BijanT/linux_patch_files/blob/main/remove.xml [3]
Fixes: 86ebd50224c0 ("mm: add folio_expected_ref_count() for reference count calculation")
Signed-off-by: Bijan Tabatabai <bijan311(a)gmail.com>
Acked-by: David Hildenbrand (Red Hat) <david(a)kernel.org>
Acked-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Shivank Garg <shivankg(a)amd.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Kairui Song <ryncsn(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 15076261d0c2..6f959d8ca4b4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2459,10 +2459,10 @@ static inline int folio_expected_ref_count(const struct folio *folio)
if (WARN_ON_ONCE(page_has_type(&folio->page) && !folio_test_hugetlb(folio)))
return 0;
- if (folio_test_anon(folio)) {
- /* One reference per page from the swapcache. */
- ref_count += folio_test_swapcache(folio) << order;
- } else {
+ /* One reference per page from the swapcache. */
+ ref_count += folio_test_swapcache(folio) << order;
+
+ if (!folio_test_anon(folio)) {
/* One reference per page from the pagecache. */
ref_count += !!folio->mapping << order;
/* One reference from PG_private. */
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x a49a2a1baa0c553c3548a1c414b6a3c005a8deba
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010559-outthink-mutilated-30df@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a49a2a1baa0c553c3548a1c414b6a3c005a8deba Mon Sep 17 00:00:00 2001
From: NeilBrown <neil(a)brown.name>
Date: Sat, 22 Nov 2025 12:00:36 +1100
Subject: [PATCH] lockd: fix vfs_test_lock() calls
Usage of vfs_test_lock() is somewhat confused. Documentation suggests
it is given a "lock" but this is not the case. It is given a struct
file_lock which contains some details of the sort of lock it should be
looking for.
In particular passing a "file_lock" containing fl_lmops or fl_ops is
meaningless and possibly confusing.
This is particularly problematic in lockd. nlmsvc_testlock() receives
an initialised "file_lock" from xdr-decode, including manager ops and an
owner. It then mistakenly passes this to vfs_test_lock() which might
replace the owner and the ops. This can lead to confusion when freeing
the lock.
The primary role of the 'struct file_lock' passed to vfs_test_lock() is
to report a conflicting lock that was found, so it makes more sense for
nlmsvc_testlock() to pass "conflock", which it uses for returning the
conflicting lock.
With this change, freeing of the lock is not confused and code in
__nlm4svc_proc_test() and __nlmsvc_proc_test() can be simplified.
Documentation for vfs_test_lock() is improved to reflect its real
purpose, and a WARN_ON_ONCE() is added to avoid a similar problem in the
future.
Reported-by: Olga Kornievskaia <okorniev(a)redhat.com>
Closes: https://lore.kernel.org/all/20251021130506.45065-1-okorniev@redhat.com
Signed-off-by: NeilBrown <neil(a)brown.name>
Fixes: 20fa19027286 ("nfs: add export operations")
Cc: stable(a)vger.kernel.org
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c
index 109e5caae8c7..4b6f18d97734 100644
--- a/fs/lockd/svc4proc.c
+++ b/fs/lockd/svc4proc.c
@@ -97,7 +97,6 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
struct nlm_args *argp = rqstp->rq_argp;
struct nlm_host *host;
struct nlm_file *file;
- struct nlm_lockowner *test_owner;
__be32 rc = rpc_success;
dprintk("lockd: TEST4 called\n");
@@ -107,7 +106,6 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
if ((resp->status = nlm4svc_retrieve_args(rqstp, argp, &host, &file)))
return resp->status == nlm_drop_reply ? rpc_drop_reply :rpc_success;
- test_owner = argp->lock.fl.c.flc_owner;
/* Now check for conflicting locks */
resp->status = nlmsvc_testlock(rqstp, file, host, &argp->lock,
&resp->lock);
@@ -116,7 +114,7 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
else
dprintk("lockd: TEST4 status %d\n", ntohl(resp->status));
- nlmsvc_put_lockowner(test_owner);
+ nlmsvc_release_lockowner(&argp->lock);
nlmsvc_release_host(host);
nlm_release_file(file);
return rc;
diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index 3a3d05cfe09a..6bce19fd024c 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -633,7 +633,13 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file,
}
mode = lock_to_openmode(&lock->fl);
- error = vfs_test_lock(file->f_file[mode], &lock->fl);
+ locks_init_lock(&conflock->fl);
+ /* vfs_test_lock only uses start, end, and owner, but tests flc_file */
+ conflock->fl.c.flc_file = lock->fl.c.flc_file;
+ conflock->fl.fl_start = lock->fl.fl_start;
+ conflock->fl.fl_end = lock->fl.fl_end;
+ conflock->fl.c.flc_owner = lock->fl.c.flc_owner;
+ error = vfs_test_lock(file->f_file[mode], &conflock->fl);
if (error) {
/* We can't currently deal with deferred test requests */
if (error == FILE_LOCK_DEFERRED)
@@ -643,22 +649,19 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file,
goto out;
}
- if (lock->fl.c.flc_type == F_UNLCK) {
+ if (conflock->fl.c.flc_type == F_UNLCK) {
ret = nlm_granted;
goto out;
}
dprintk("lockd: conflicting lock(ty=%d, %Ld-%Ld)\n",
- lock->fl.c.flc_type, (long long)lock->fl.fl_start,
- (long long)lock->fl.fl_end);
+ conflock->fl.c.flc_type, (long long)conflock->fl.fl_start,
+ (long long)conflock->fl.fl_end);
conflock->caller = "somehost"; /* FIXME */
conflock->len = strlen(conflock->caller);
conflock->oh.len = 0; /* don't return OH info */
- conflock->svid = lock->fl.c.flc_pid;
- conflock->fl.c.flc_type = lock->fl.c.flc_type;
- conflock->fl.fl_start = lock->fl.fl_start;
- conflock->fl.fl_end = lock->fl.fl_end;
- locks_release_private(&lock->fl);
+ conflock->svid = conflock->fl.c.flc_pid;
+ locks_release_private(&conflock->fl);
ret = nlm_lck_denied;
out:
diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c
index f53d5177f267..5817ef272332 100644
--- a/fs/lockd/svcproc.c
+++ b/fs/lockd/svcproc.c
@@ -117,7 +117,6 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
struct nlm_args *argp = rqstp->rq_argp;
struct nlm_host *host;
struct nlm_file *file;
- struct nlm_lockowner *test_owner;
__be32 rc = rpc_success;
dprintk("lockd: TEST called\n");
@@ -127,8 +126,6 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
if ((resp->status = nlmsvc_retrieve_args(rqstp, argp, &host, &file)))
return resp->status == nlm_drop_reply ? rpc_drop_reply :rpc_success;
- test_owner = argp->lock.fl.c.flc_owner;
-
/* Now check for conflicting locks */
resp->status = cast_status(nlmsvc_testlock(rqstp, file, host,
&argp->lock, &resp->lock));
@@ -138,7 +135,7 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
dprintk("lockd: TEST status %d vers %d\n",
ntohl(resp->status), rqstp->rq_vers);
- nlmsvc_put_lockowner(test_owner);
+ nlmsvc_release_lockowner(&argp->lock);
nlmsvc_release_host(host);
nlm_release_file(file);
return rc;
diff --git a/fs/locks.c b/fs/locks.c
index 04a3f0e20724..bf5e0d05a026 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2185,13 +2185,21 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
/**
* vfs_test_lock - test file byte range lock
* @filp: The file to test lock for
- * @fl: The lock to test; also used to hold result
+ * @fl: The byte-range in the file to test; also used to hold result
*
+ * On entry, @fl does not contain a lock, but identifies a range (fl_start, fl_end)
+ * in the file (c.flc_file), and an owner (c.flc_owner) for whom existing locks
+ * should be ignored. c.flc_type and c.flc_flags are ignored.
+ * Both fl_lmops and fl_ops in @fl must be NULL.
* Returns -ERRNO on failure. Indicates presence of conflicting lock by
- * setting conf->fl_type to something other than F_UNLCK.
+ * setting fl->fl_type to something other than F_UNLCK.
+ *
+ * If vfs_test_lock() does find a lock and return it, the caller must
+ * use locks_free_lock() or locks_release_private() on the returned lock.
*/
int vfs_test_lock(struct file *filp, struct file_lock *fl)
{
+ WARN_ON_ONCE(fl->fl_ops || fl->fl_lmops);
WARN_ON_ONCE(filp != fl->c.flc_file);
if (filp->f_op->lock)
return filp->f_op->lock(filp, F_GETLK, fl);
As reported, ever since commit 1013af4f585f ("mm/hugetlb: fix
huge_pmd_unshare() vs GUP-fast race") we can end up in some situations
where we perform so many IPI broadcasts when unsharing hugetlb PMD page
tables that it severely regresses some workloads.
In particular, when we fork()+exit(), or when we munmap() a large
area backed by many shared PMD tables, we perform one IPI broadcast per
unshared PMD table.
There are two optimizations to be had:
(1) When we process (unshare) multiple such PMD tables, such as during
exit(), it is sufficient to send a single IPI broadcast (as long as
we respect locking rules) instead of one per PMD table.
Locking prevents that any of these PMD tables could get reused before
we drop the lock.
(2) When we are not the last sharer (> 2 users including us), there is
no need to send the IPI broadcast. The shared PMD tables cannot
become exclusive (fully unshared) before an IPI will be broadcasted
by the last sharer.
Concurrent GUP-fast could walk into a PMD table just before we
unshared it. It could then succeed in grabbing a page from the
shared page table even after munmap() etc succeeded (and supressed
an IPI). But there is not difference compared to GUP-fast just
sleeping for a while after grabbing the page and re-enabling IRQs.
Most importantly, GUP-fast will never walk into page tables that are
no-longer shared, because the last sharer will issue an IPI
broadcast.
(if ever required, checking whether the PUD changed in GUP-fast
after grabbing the page like we do in the PTE case could handle
this)
So let's rework PMD sharing TLB flushing + IPI sync to use the mmu_gather
infrastructure so we can implement these optimizations and demystify the
code at least a bit. Extend the mmu_gather infrastructure to be able to
deal with our special hugetlb PMD table sharing implementation.
To make initialization of the mmu_gather easier when working on a single
VMA (in particular, when dealing with hugetlb), provide
tlb_gather_mmu_vma().
We'll consolidate the handling for (full) unsharing of PMD tables in
tlb_unshare_pmd_ptdesc() and tlb_flush_unshared_tables(), and track
in "struct mmu_gather" whether we had (full) unsharing of PMD tables.
Because locking is very special (concurrent unsharing+reuse must be
prevented), we disallow deferring flushing to tlb_finish_mmu() and instead
require an explicit earlier call to tlb_flush_unshared_tables().
From hugetlb code, we call huge_pmd_unshare_flush() where we make sure
that the expected lock protecting us from concurrent unsharing+reuse is
still held.
Check with a VM_WARN_ON_ONCE() in tlb_finish_mmu() that
tlb_flush_unshared_tables() was properly called earlier.
Document it all properly.
Notes about tlb_remove_table_sync_one() interaction with unsharing:
There are two fairly tricky things:
(1) tlb_remove_table_sync_one() is a NOP on architectures without
CONFIG_MMU_GATHER_RCU_TABLE_FREE.
Here, the assumption is that the previous TLB flush would send an
IPI to all relevant CPUs. Careful: some architectures like x86 only
send IPIs to all relevant CPUs when tlb->freed_tables is set.
The relevant architectures should be selecting
MMU_GATHER_RCU_TABLE_FREE, but x86 might not do that in stable
kernels and it might have been problematic before this patch.
Also, the arch flushing behavior (independent of IPIs) is different
when tlb->freed_tables is set. Do we have to enlighten them to also
take care of tlb->unshared_tables? So far we didn't care, so
hopefully we are fine. Of course, we could be setting
tlb->freed_tables as well, but that might then unnecessarily flush
too much, because the semantics of tlb->freed_tables are a bit
fuzzy.
This patch changes nothing in this regard.
(2) tlb_remove_table_sync_one() is not a NOP on architectures with
CONFIG_MMU_GATHER_RCU_TABLE_FREE that actually don't need a sync.
Take x86 as an example: in the common case (!pv, !X86_FEATURE_INVLPGB)
we still issue IPIs during TLB flushes and don't actually need the
second tlb_remove_table_sync_one().
This optimized can be implemented on top of this, by checking e.g., in
tlb_remove_table_sync_one() whether we really need IPIs. But as
described in (1), it really must honor tlb->freed_tables then to
send IPIs to all relevant CPUs.
Notes on TLB flushing changes:
(1) Flushing for non-shared PMD tables
We're converting from flush_hugetlb_tlb_range() to
tlb_remove_huge_tlb_entry(). Given that we properly initialize the
MMU gather in tlb_gather_mmu_vma() to be hugetlb aware, similar to
__unmap_hugepage_range(), that should be fine.
(2) Flushing for shared PMD tables
We're converting from various things (flush_hugetlb_tlb_range(),
tlb_flush_pmd_range(), flush_tlb_range()) to tlb_flush_pmd_range().
tlb_flush_pmd_range() achieves the same that
tlb_remove_huge_tlb_entry() would achieve in these scenarios.
Note that tlb_remove_huge_tlb_entry() also calls
__tlb_remove_tlb_entry(), however that is only implemented on
powerpc, which does not support PMD table sharing.
Similar to (1), tlb_gather_mmu_vma() should make sure that TLB
flushing keeps on working as expected.
Further, note that the ptdesc_pmd_pts_dec() in huge_pmd_share() is not a
concern, as we are holding the i_mmap_lock the whole time, preventing
concurrent unsharing. That ptdesc_pmd_pts_dec() usage will be removed
separately as a cleanup later.
There are plenty more cleanups to be had, but they have to wait until
this is fixed.
Fixes: 1013af4f585f ("mm/hugetlb: fix huge_pmd_unshare() vs GUP-fast race")
Reported-by: Uschakow, Stanislav" <suschako(a)amazon.de>
Closes: https://lore.kernel.org/all/4d3878531c76479d9f8ca9789dc6485d@amazon.de/
Tested-by: Laurence Oberman <loberman(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: David Hildenbrand (Red Hat) <david(a)kernel.org>
---
include/asm-generic/tlb.h | 77 +++++++++++++++++++++++-
include/linux/hugetlb.h | 15 +++--
include/linux/mm_types.h | 1 +
mm/hugetlb.c | 123 ++++++++++++++++++++++----------------
mm/mmu_gather.c | 33 ++++++++++
mm/rmap.c | 25 +++++---
6 files changed, 208 insertions(+), 66 deletions(-)
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 1fff717cae510..4d679d2a206b4 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -46,7 +46,8 @@
*
* The mmu_gather API consists of:
*
- * - tlb_gather_mmu() / tlb_gather_mmu_fullmm() / tlb_finish_mmu()
+ * - tlb_gather_mmu() / tlb_gather_mmu_fullmm() / tlb_gather_mmu_vma() /
+ * tlb_finish_mmu()
*
* start and finish a mmu_gather
*
@@ -364,6 +365,20 @@ struct mmu_gather {
unsigned int vma_huge : 1;
unsigned int vma_pfn : 1;
+ /*
+ * Did we unshare (unmap) any shared page tables? For now only
+ * used for hugetlb PMD table sharing.
+ */
+ unsigned int unshared_tables : 1;
+
+ /*
+ * Did we unshare any page tables such that they are now exclusive
+ * and could get reused+modified by the new owner? When setting this
+ * flag, "unshared_tables" will be set as well. For now only used
+ * for hugetlb PMD table sharing.
+ */
+ unsigned int fully_unshared_tables : 1;
+
unsigned int batch_count;
#ifndef CONFIG_MMU_GATHER_NO_GATHER
@@ -400,6 +415,7 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
tlb->cleared_pmds = 0;
tlb->cleared_puds = 0;
tlb->cleared_p4ds = 0;
+ tlb->unshared_tables = 0;
/*
* Do not reset mmu_gather::vma_* fields here, we do not
* call into tlb_start_vma() again to set them if there is an
@@ -484,7 +500,7 @@ static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
* these bits.
*/
if (!(tlb->freed_tables || tlb->cleared_ptes || tlb->cleared_pmds ||
- tlb->cleared_puds || tlb->cleared_p4ds))
+ tlb->cleared_puds || tlb->cleared_p4ds || tlb->unshared_tables))
return;
tlb_flush(tlb);
@@ -773,6 +789,63 @@ static inline bool huge_pmd_needs_flush(pmd_t oldpmd, pmd_t newpmd)
}
#endif
+#ifdef CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING
+static inline void tlb_unshare_pmd_ptdesc(struct mmu_gather *tlb, struct ptdesc *pt,
+ unsigned long addr)
+{
+ /*
+ * The caller must make sure that concurrent unsharing + exclusive
+ * reuse is impossible until tlb_flush_unshared_tables() was called.
+ */
+ VM_WARN_ON_ONCE(!ptdesc_pmd_is_shared(pt));
+ ptdesc_pmd_pts_dec(pt);
+
+ /* Clearing a PUD pointing at a PMD table with PMD leaves. */
+ tlb_flush_pmd_range(tlb, addr & PUD_MASK, PUD_SIZE);
+
+ /*
+ * If the page table is now exclusively owned, we fully unshared
+ * a page table.
+ */
+ if (!ptdesc_pmd_is_shared(pt))
+ tlb->fully_unshared_tables = true;
+ tlb->unshared_tables = true;
+}
+
+static inline void tlb_flush_unshared_tables(struct mmu_gather *tlb)
+{
+ /*
+ * As soon as the caller drops locks to allow for reuse of
+ * previously-shared tables, these tables could get modified and
+ * even reused outside of hugetlb context, so we have to make sure that
+ * any page table walkers (incl. TLB, GUP-fast) are aware of that
+ * change.
+ *
+ * Even if we are not fully unsharing a PMD table, we must
+ * flush the TLB for the unsharer now.
+ */
+ if (tlb->unshared_tables)
+ tlb_flush_mmu_tlbonly(tlb);
+
+ /*
+ * Similarly, we must make sure that concurrent GUP-fast will not
+ * walk previously-shared page tables that are getting modified+reused
+ * elsewhere. So broadcast an IPI to wait for any concurrent GUP-fast.
+ *
+ * We only perform this when we are the last sharer of a page table,
+ * as the IPI will reach all CPUs: any GUP-fast.
+ *
+ * Note that on configs where tlb_remove_table_sync_one() is a NOP,
+ * the expectation is that the tlb_flush_mmu_tlbonly() would have issued
+ * required IPIs already for us.
+ */
+ if (tlb->fully_unshared_tables) {
+ tlb_remove_table_sync_one();
+ tlb->fully_unshared_tables = false;
+ }
+}
+#endif /* CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING */
+
#endif /* CONFIG_MMU */
#endif /* _ASM_GENERIC__TLB_H */
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 03c8725efa289..e51b8ef0cebd9 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -240,8 +240,9 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
pte_t *huge_pte_offset(struct mm_struct *mm,
unsigned long addr, unsigned long sz);
unsigned long hugetlb_mask_last_page(struct hstate *h);
-int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep);
+int huge_pmd_unshare(struct mmu_gather *tlb, struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep);
+void huge_pmd_unshare_flush(struct mmu_gather *tlb, struct vm_area_struct *vma);
void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
unsigned long *start, unsigned long *end);
@@ -300,13 +301,17 @@ static inline struct address_space *hugetlb_folio_mapping_lock_write(
return NULL;
}
-static inline int huge_pmd_unshare(struct mm_struct *mm,
- struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep)
+static inline int huge_pmd_unshare(struct mmu_gather *tlb,
+ struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
{
return 0;
}
+static inline void huge_pmd_unshare_flush(struct mmu_gather *tlb,
+ struct vm_area_struct *vma)
+{
+}
+
static inline void adjust_range_if_pmd_sharing_possible(
struct vm_area_struct *vma,
unsigned long *start, unsigned long *end)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 42af2292951d4..d1053b2c1f800 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1522,6 +1522,7 @@ static inline unsigned int mm_cid_size(void)
struct mmu_gather;
extern void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm);
extern void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm);
+void tlb_gather_mmu_vma(struct mmu_gather *tlb, struct vm_area_struct *vma);
extern void tlb_finish_mmu(struct mmu_gather *tlb);
struct vm_fault;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3c77cdef12a32..2609b6d58f99e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5096,7 +5096,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
unsigned long last_addr_mask;
pte_t *src_pte, *dst_pte;
struct mmu_notifier_range range;
- bool shared_pmd = false;
+ struct mmu_gather tlb;
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, old_addr,
old_end);
@@ -5106,6 +5106,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
* range.
*/
flush_cache_range(vma, range.start, range.end);
+ tlb_gather_mmu_vma(&tlb, vma);
mmu_notifier_invalidate_range_start(&range);
last_addr_mask = hugetlb_mask_last_page(h);
@@ -5122,8 +5123,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
if (huge_pte_none(huge_ptep_get(mm, old_addr, src_pte)))
continue;
- if (huge_pmd_unshare(mm, vma, old_addr, src_pte)) {
- shared_pmd = true;
+ if (huge_pmd_unshare(&tlb, vma, old_addr, src_pte)) {
old_addr |= last_addr_mask;
new_addr |= last_addr_mask;
continue;
@@ -5134,15 +5134,16 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
break;
move_huge_pte(vma, old_addr, new_addr, src_pte, dst_pte, sz);
+ tlb_remove_huge_tlb_entry(h, &tlb, src_pte, old_addr);
}
- if (shared_pmd)
- flush_hugetlb_tlb_range(vma, range.start, range.end);
- else
- flush_hugetlb_tlb_range(vma, old_end - len, old_end);
+ tlb_flush_mmu_tlbonly(&tlb);
+ huge_pmd_unshare_flush(&tlb, vma);
+
mmu_notifier_invalidate_range_end(&range);
i_mmap_unlock_write(mapping);
hugetlb_vma_unlock_write(vma);
+ tlb_finish_mmu(&tlb);
return len + old_addr - old_end;
}
@@ -5161,7 +5162,6 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
unsigned long sz = huge_page_size(h);
bool adjust_reservation;
unsigned long last_addr_mask;
- bool force_flush = false;
WARN_ON(!is_vm_hugetlb_page(vma));
BUG_ON(start & ~huge_page_mask(h));
@@ -5184,10 +5184,8 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
}
ptl = huge_pte_lock(h, mm, ptep);
- if (huge_pmd_unshare(mm, vma, address, ptep)) {
+ if (huge_pmd_unshare(tlb, vma, address, ptep)) {
spin_unlock(ptl);
- tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
- force_flush = true;
address |= last_addr_mask;
continue;
}
@@ -5303,14 +5301,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
}
tlb_end_vma(tlb, vma);
- /*
- * There is nothing protecting a previously-shared page table that we
- * unshared through huge_pmd_unshare() from getting freed after we
- * release i_mmap_rwsem, so flush the TLB now. If huge_pmd_unshare()
- * succeeded, flush the range corresponding to the pud.
- */
- if (force_flush)
- tlb_flush_mmu_tlbonly(tlb);
+ huge_pmd_unshare_flush(tlb, vma);
}
void __hugetlb_zap_begin(struct vm_area_struct *vma,
@@ -6409,11 +6400,11 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
pte_t pte;
struct hstate *h = hstate_vma(vma);
long pages = 0, psize = huge_page_size(h);
- bool shared_pmd = false;
struct mmu_notifier_range range;
unsigned long last_addr_mask;
bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
+ struct mmu_gather tlb;
/*
* In the case of shared PMDs, the area to flush could be beyond
@@ -6426,6 +6417,7 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
BUG_ON(address >= end);
flush_cache_range(vma, range.start, range.end);
+ tlb_gather_mmu_vma(&tlb, vma);
mmu_notifier_invalidate_range_start(&range);
hugetlb_vma_lock_write(vma);
@@ -6452,7 +6444,7 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
}
}
ptl = huge_pte_lock(h, mm, ptep);
- if (huge_pmd_unshare(mm, vma, address, ptep)) {
+ if (huge_pmd_unshare(&tlb, vma, address, ptep)) {
/*
* When uffd-wp is enabled on the vma, unshare
* shouldn't happen at all. Warn about it if it
@@ -6461,7 +6453,6 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
WARN_ON_ONCE(uffd_wp || uffd_wp_resolve);
pages++;
spin_unlock(ptl);
- shared_pmd = true;
address |= last_addr_mask;
continue;
}
@@ -6522,22 +6513,16 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
pte = huge_pte_clear_uffd_wp(pte);
huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
pages++;
+ tlb_remove_huge_tlb_entry(h, &tlb, ptep, address);
}
next:
spin_unlock(ptl);
cond_resched();
}
- /*
- * There is nothing protecting a previously-shared page table that we
- * unshared through huge_pmd_unshare() from getting freed after we
- * release i_mmap_rwsem, so flush the TLB now. If huge_pmd_unshare()
- * succeeded, flush the range corresponding to the pud.
- */
- if (shared_pmd)
- flush_hugetlb_tlb_range(vma, range.start, range.end);
- else
- flush_hugetlb_tlb_range(vma, start, end);
+
+ tlb_flush_mmu_tlbonly(&tlb);
+ huge_pmd_unshare_flush(&tlb, vma);
/*
* No need to call mmu_notifier_arch_invalidate_secondary_tlbs() we are
* downgrading page table protection not changing it to point to a new
@@ -6548,6 +6533,7 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
i_mmap_unlock_write(vma->vm_file->f_mapping);
hugetlb_vma_unlock_write(vma);
mmu_notifier_invalidate_range_end(&range);
+ tlb_finish_mmu(&tlb);
return pages > 0 ? (pages << h->order) : pages;
}
@@ -6904,18 +6890,27 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
return pte;
}
-/*
- * unmap huge page backed by shared pte.
+/**
+ * huge_pmd_unshare - Unmap a pmd table if it is shared by multiple users
+ * @tlb: the current mmu_gather.
+ * @vma: the vma covering the pmd table.
+ * @addr: the address we are trying to unshare.
+ * @ptep: pointer into the (pmd) page table.
+ *
+ * Called with the page table lock held, the i_mmap_rwsem held in write mode
+ * and the hugetlb vma lock held in write mode.
*
- * Called with page table lock held.
+ * Note: The caller must call huge_pmd_unshare_flush() before dropping the
+ * i_mmap_rwsem.
*
- * returns: 1 successfully unmapped a shared pte page
- * 0 the underlying pte page is not shared, or it is the last user
+ * Returns: 1 if it was a shared PMD table and it got unmapped, or 0 if it
+ * was not a shared PMD table.
*/
-int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep)
+int huge_pmd_unshare(struct mmu_gather *tlb, struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
{
unsigned long sz = huge_page_size(hstate_vma(vma));
+ struct mm_struct *mm = vma->vm_mm;
pgd_t *pgd = pgd_offset(mm, addr);
p4d_t *p4d = p4d_offset(pgd, addr);
pud_t *pud = pud_offset(p4d, addr);
@@ -6927,18 +6922,36 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
i_mmap_assert_write_locked(vma->vm_file->f_mapping);
hugetlb_vma_assert_locked(vma);
pud_clear(pud);
- /*
- * Once our caller drops the rmap lock, some other process might be
- * using this page table as a normal, non-hugetlb page table.
- * Wait for pending gup_fast() in other threads to finish before letting
- * that happen.
- */
- tlb_remove_table_sync_one();
- ptdesc_pmd_pts_dec(virt_to_ptdesc(ptep));
+
+ tlb_unshare_pmd_ptdesc(tlb, virt_to_ptdesc(ptep), addr);
+
mm_dec_nr_pmds(mm);
return 1;
}
+/*
+ * huge_pmd_unshare_flush - Complete a sequence of huge_pmd_unshare() calls
+ * @tlb: the current mmu_gather.
+ * @vma: the vma covering the pmd table.
+ *
+ * Perform necessary TLB flushes or IPI broadcasts to synchronize PMD table
+ * unsharing with concurrent page table walkers.
+ *
+ * This function must be called after a sequence of huge_pmd_unshare()
+ * calls while still holding the i_mmap_rwsem.
+ */
+void huge_pmd_unshare_flush(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+ /*
+ * We must synchronize page table unsharing such that nobody will
+ * try reusing a previously-shared page table while it might still
+ * be in use by previous sharers (TLB, GUP_fast).
+ */
+ i_mmap_assert_write_locked(vma->vm_file->f_mapping);
+
+ tlb_flush_unshared_tables(tlb);
+}
+
#else /* !CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING */
pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
@@ -6947,12 +6960,16 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
return NULL;
}
-int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep)
+int huge_pmd_unshare(struct mmu_gather *tlb, struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
{
return 0;
}
+void huge_pmd_unshare_flush(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+}
+
void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
unsigned long *start, unsigned long *end)
{
@@ -7219,6 +7236,7 @@ static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
unsigned long sz = huge_page_size(h);
struct mm_struct *mm = vma->vm_mm;
struct mmu_notifier_range range;
+ struct mmu_gather tlb;
unsigned long address;
spinlock_t *ptl;
pte_t *ptep;
@@ -7230,6 +7248,8 @@ static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
return;
flush_cache_range(vma, start, end);
+ tlb_gather_mmu_vma(&tlb, vma);
+
/*
* No need to call adjust_range_if_pmd_sharing_possible(), because
* we have already done the PUD_SIZE alignment.
@@ -7248,10 +7268,10 @@ static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
if (!ptep)
continue;
ptl = huge_pte_lock(h, mm, ptep);
- huge_pmd_unshare(mm, vma, address, ptep);
+ huge_pmd_unshare(&tlb, vma, address, ptep);
spin_unlock(ptl);
}
- flush_hugetlb_tlb_range(vma, start, end);
+ huge_pmd_unshare_flush(&tlb, vma);
if (take_locks) {
i_mmap_unlock_write(vma->vm_file->f_mapping);
hugetlb_vma_unlock_write(vma);
@@ -7261,6 +7281,7 @@ static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
* Documentation/mm/mmu_notifier.rst.
*/
mmu_notifier_invalidate_range_end(&range);
+ tlb_finish_mmu(&tlb);
}
/*
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 247e3f9db6c7a..cd32c2dbf501b 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -10,6 +10,7 @@
#include <linux/swap.h>
#include <linux/rmap.h>
#include <linux/pgalloc.h>
+#include <linux/hugetlb.h>
#include <asm/tlb.h>
@@ -426,6 +427,7 @@ static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
#endif
tlb->vma_pfn = 0;
+ tlb->fully_unshared_tables = 0;
__tlb_reset_range(tlb);
inc_tlb_flush_pending(tlb->mm);
}
@@ -459,6 +461,31 @@ void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm)
__tlb_gather_mmu(tlb, mm, true);
}
+/**
+ * tlb_gather_mmu - initialize an mmu_gather structure for operating on a single
+ * VMA
+ * @tlb: the mmu_gather structure to initialize
+ * @vma: the vm_area_struct
+ *
+ * Called to initialize an (on-stack) mmu_gather structure for operating on
+ * a single VMA. In contrast to tlb_gather_mmu(), calling this function will
+ * not require another call to tlb_start_vma(). In contrast to tlb_start_vma(),
+ * this function will *not* call flush_cache_range().
+ *
+ * For hugetlb VMAs, this function will also initialize the mmu_gather
+ * page_size accordingly, not requiring a separate call to
+ * tlb_change_page_size().
+ *
+ */
+void tlb_gather_mmu_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+ tlb_gather_mmu(tlb, vma->vm_mm);
+ tlb_update_vma_flags(tlb, vma);
+ if (is_vm_hugetlb_page(vma))
+ /* All entries have the same size. */
+ tlb_change_page_size(tlb, huge_page_size(hstate_vma(vma)));
+}
+
/**
* tlb_finish_mmu - finish an mmu_gather structure
* @tlb: the mmu_gather structure to finish
@@ -468,6 +495,12 @@ void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm)
*/
void tlb_finish_mmu(struct mmu_gather *tlb)
{
+ /*
+ * We expect an earlier huge_pmd_unshare_flush() call to sort this out,
+ * due to complicated locking requirements with page table unsharing.
+ */
+ VM_WARN_ON_ONCE(tlb->fully_unshared_tables);
+
/*
* If there are parallel threads are doing PTE changes on same range
* under non-exclusive lock (e.g., mmap_lock read-side) but defer TLB
diff --git a/mm/rmap.c b/mm/rmap.c
index 748f48727a162..7b9879ef442d9 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -76,7 +76,7 @@
#include <linux/mm_inline.h>
#include <linux/oom.h>
-#include <asm/tlbflush.h>
+#include <asm/tlb.h>
#define CREATE_TRACE_POINTS
#include <trace/events/migrate.h>
@@ -2008,13 +2008,17 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
* if unsuccessful.
*/
if (!anon) {
+ struct mmu_gather tlb;
+
VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
if (!hugetlb_vma_trylock_write(vma))
goto walk_abort;
- if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
+
+ tlb_gather_mmu_vma(&tlb, vma);
+ if (huge_pmd_unshare(&tlb, vma, address, pvmw.pte)) {
hugetlb_vma_unlock_write(vma);
- flush_tlb_range(vma,
- range.start, range.end);
+ huge_pmd_unshare_flush(&tlb, vma);
+ tlb_finish_mmu(&tlb);
/*
* The PMD table was unmapped,
* consequently unmapping the folio.
@@ -2022,6 +2026,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
goto walk_done;
}
hugetlb_vma_unlock_write(vma);
+ tlb_finish_mmu(&tlb);
}
pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
if (pte_dirty(pteval))
@@ -2398,17 +2403,20 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
* fail if unsuccessful.
*/
if (!anon) {
+ struct mmu_gather tlb;
+
VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
if (!hugetlb_vma_trylock_write(vma)) {
page_vma_mapped_walk_done(&pvmw);
ret = false;
break;
}
- if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
- hugetlb_vma_unlock_write(vma);
- flush_tlb_range(vma,
- range.start, range.end);
+ tlb_gather_mmu_vma(&tlb, vma);
+ if (huge_pmd_unshare(&tlb, vma, address, pvmw.pte)) {
+ hugetlb_vma_unlock_write(vma);
+ huge_pmd_unshare_flush(&tlb, vma);
+ tlb_finish_mmu(&tlb);
/*
* The PMD table was unmapped,
* consequently unmapping the folio.
@@ -2417,6 +2425,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
hugetlb_vma_unlock_write(vma);
+ tlb_finish_mmu(&tlb);
}
/* Nuke the hugetlb page table entry */
pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
--
2.52.0
The patch titled
Subject: mm/vmscan: fix demotion targets checks in reclaim/demotion
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-vmscan-fix-demotion-targets-checks-in-reclaim-demotion.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: Bing Jiao <bingjiao(a)google.com>
Subject: mm/vmscan: fix demotion targets checks in reclaim/demotion
Date: Tue, 6 Jan 2026 07:56:54 +0000
Fix two bugs in demote_folio_list() and can_demote() due to incorrect
demotion target checks in reclaim/demotion.
Commit 7d709f49babc ("vmscan,cgroup: apply mems_effective to reclaim")
introduces the cpuset.mems_effective check and applies it to can_demote().
However:
1. It does not apply this check in demote_folio_list(), which leads
to situations where pages are demoted to nodes that are
explicitly excluded from the task's cpuset.mems.
2. It checks only the nodes in the immediate next demotion hierarchy
and does not check all allowed demotion targets in can_demote().
This can cause pages to never be demoted if the nodes in the next
demotion hierarchy are not set in mems_effective.
These bugs break resource isolation provided by cpuset.mems. This is
visible from userspace because pages can either fail to be demoted
entirely or are demoted to nodes that are not allowed in multi-tier memory
systems.
To address these bugs, update cpuset_node_allowed() and
mem_cgroup_node_allowed() to return effective_mems, allowing directly
logic-and operation against demotion targets. Also update can_demote()
and demote_folio_list() accordingly.
Bug 1 reproduction:
Assume a system with 4 nodes, where nodes 0-1 are top-tier and
nodes 2-3 are far-tier memory. All nodes have equal capacity.
Test script:
echo 1 > /sys/kernel/mm/numa/demotion_enabled
mkdir /sys/fs/cgroup/test
echo +cpuset > /sys/fs/cgroup/cgroup.subtree_control
echo "0-2" > /sys/fs/cgroup/test/cpuset.mems
echo $$ > /sys/fs/cgroup/test/cgroup.procs
swapoff -a
# Expectation: Should respect node 0-2 limit.
# Observation: Node 3 shows significant allocation (MemFree drops)
stress-ng --oomable --vm 1 --vm-bytes 150% --mbind 0,1
Bug 2 reproduction:
Assume a system with 6 nodes, where nodes 0-2 are top-tier,
node 3 is a far-tier node, and nodes 4-5 are the farthest-tier nodes.
All nodes have equal capacity.
Test script:
echo 1 > /sys/kernel/mm/numa/demotion_enabled
mkdir /sys/fs/cgroup/test
echo +cpuset > /sys/fs/cgroup/cgroup.subtree_control
echo "0-2,4-5" > /sys/fs/cgroup/test/cpuset.mems
echo $$ > /sys/fs/cgroup/test/cgroup.procs
swapoff -a
# Expectation: Pages are demoted to Nodes 4-5
# Observation: No pages are demoted before oom.
stress-ng --oomable --vm 1 --vm-bytes 150% --mbind 0,1,2
Link: https://lkml.kernel.org/r/20260106075703.1420072-1-bingjiao@google.com
Fixes: 7d709f49babc ("vmscan,cgroup: apply mems_effective to reclaim")
Signed-off-by: Bing Jiao <bingjiao(a)google.com>
Reviewed-by: Gregory Price <gourry(a)gourry.net>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: David Hildenbrand <david(a)kernel.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: "Michal Koutn��" <mkoutny(a)suse.com>
Cc: Michal Koutn�� <mkoutny(a)suse.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Waiman Long <longman(a)redhat.com>
Cc: Wei Xu <weixugc(a)google.com>
Cc: Yuanchu Xie <yuanchu(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/cpuset.h | 6 +--
include/linux/memcontrol.h | 6 +--
kernel/cgroup/cpuset.c | 54 +++++++++++++++++++++++------------
mm/memcontrol.c | 16 +++++++++-
mm/vmscan.c | 30 +++++++++++--------
5 files changed, 74 insertions(+), 38 deletions(-)
--- a/include/linux/cpuset.h~mm-vmscan-fix-demotion-targets-checks-in-reclaim-demotion
+++ a/include/linux/cpuset.h
@@ -174,7 +174,7 @@ static inline void set_mems_allowed(node
task_unlock(current);
}
-extern bool cpuset_node_allowed(struct cgroup *cgroup, int nid);
+extern void cpuset_nodes_allowed(struct cgroup *cgroup, nodemask_t *mask);
#else /* !CONFIG_CPUSETS */
static inline bool cpusets_enabled(void) { return false; }
@@ -301,9 +301,9 @@ static inline bool read_mems_allowed_ret
return false;
}
-static inline bool cpuset_node_allowed(struct cgroup *cgroup, int nid)
+static inline void cpuset_nodes_allowed(struct cgroup *cgroup, nodemask_t *mask)
{
- return true;
+ nodes_copy(*mask, node_states[N_MEMORY]);
}
#endif /* !CONFIG_CPUSETS */
--- a/include/linux/memcontrol.h~mm-vmscan-fix-demotion-targets-checks-in-reclaim-demotion
+++ a/include/linux/memcontrol.h
@@ -1744,7 +1744,7 @@ static inline void count_objcg_events(st
rcu_read_unlock();
}
-bool mem_cgroup_node_allowed(struct mem_cgroup *memcg, int nid);
+void mem_cgroup_node_filter_allowed(struct mem_cgroup *memcg, nodemask_t *mask);
void mem_cgroup_show_protected_memory(struct mem_cgroup *memcg);
@@ -1815,9 +1815,9 @@ static inline ino_t page_cgroup_ino(stru
return 0;
}
-static inline bool mem_cgroup_node_allowed(struct mem_cgroup *memcg, int nid)
+static inline void mem_cgroup_node_filter_allowed(struct mem_cgroup *memcg,
+ nodemask_t *mask)
{
- return true;
}
static inline void mem_cgroup_show_protected_memory(struct mem_cgroup *memcg)
--- a/kernel/cgroup/cpuset.c~mm-vmscan-fix-demotion-targets-checks-in-reclaim-demotion
+++ a/kernel/cgroup/cpuset.c
@@ -4416,40 +4416,58 @@ bool cpuset_current_node_allowed(int nod
return allowed;
}
-bool cpuset_node_allowed(struct cgroup *cgroup, int nid)
+/**
+ * cpuset_nodes_allowed - return effective_mems mask from a cgroup cpuset.
+ * @cgroup: pointer to struct cgroup.
+ * @mask: pointer to struct nodemask_t to be returned.
+ *
+ * Returns effective_mems mask from a cgroup cpuset if it is cgroup v2 and
+ * has cpuset subsys. Otherwise, returns node_states[N_MEMORY].
+ *
+ * This function intentionally avoids taking the cpuset_mutex or callback_lock
+ * when accessing effective_mems. This is because the obtained effective_mems
+ * is stale immediately after the query anyway (e.g., effective_mems is updated
+ * immediately after releasing the lock but before returning).
+ *
+ * As a result, returned @mask may be empty because cs->effective_mems can be
+ * rebound during this call. Besides, nodes in @mask are not guaranteed to be
+ * online due to hot plugins. Callers should check the mask for validity on
+ * return based on its subsequent use.
+ **/
+void cpuset_nodes_allowed(struct cgroup *cgroup, nodemask_t *mask)
{
struct cgroup_subsys_state *css;
struct cpuset *cs;
- bool allowed;
/*
* In v1, mem_cgroup and cpuset are unlikely in the same hierarchy
* and mems_allowed is likely to be empty even if we could get to it,
- * so return true to avoid taking a global lock on the empty check.
+ * so return directly to avoid taking a global lock on the empty check.
*/
- if (!cpuset_v2())
- return true;
+ if (!cgroup || !cpuset_v2()) {
+ nodes_copy(*mask, node_states[N_MEMORY]);
+ return;
+ }
css = cgroup_get_e_css(cgroup, &cpuset_cgrp_subsys);
- if (!css)
- return true;
+ if (!css) {
+ nodes_copy(*mask, node_states[N_MEMORY]);
+ return;
+ }
/*
- * Normally, accessing effective_mems would require the cpuset_mutex
- * or callback_lock - but node_isset is atomic and the reference
- * taken via cgroup_get_e_css is sufficient to protect css.
- *
- * Since this interface is intended for use by migration paths, we
- * relax locking here to avoid taking global locks - while accepting
- * there may be rare scenarios where the result may be innaccurate.
+ * The reference taken via cgroup_get_e_css is sufficient to
+ * protect css, but it does not imply safe accesses to effective_mems.
*
- * Reclaim and migration are subject to these same race conditions, and
- * cannot make strong isolation guarantees, so this is acceptable.
+ * Normally, accessing effective_mems would require the cpuset_mutex
+ * or callback_lock - but the correctness of this information is stale
+ * immediately after the query anyway. We do not acquire the lock
+ * during this process to save lock contention in exchange for racing
+ * against mems_allowed rebinds.
*/
cs = container_of(css, struct cpuset, css);
- allowed = node_isset(nid, cs->effective_mems);
+ nodes_copy(*mask, cs->effective_mems);
css_put(css);
- return allowed;
}
/**
--- a/mm/memcontrol.c~mm-vmscan-fix-demotion-targets-checks-in-reclaim-demotion
+++ a/mm/memcontrol.c
@@ -5624,9 +5624,21 @@ subsys_initcall(mem_cgroup_swap_init);
#endif /* CONFIG_SWAP */
-bool mem_cgroup_node_allowed(struct mem_cgroup *memcg, int nid)
+void mem_cgroup_node_filter_allowed(struct mem_cgroup *memcg, nodemask_t *mask)
{
- return memcg ? cpuset_node_allowed(memcg->css.cgroup, nid) : true;
+ nodemask_t allowed;
+
+ if (!memcg)
+ return;
+
+ /*
+ * Since this interface is intended for use by migration paths, and
+ * reclaim and migration are subject to race conditions such as changes
+ * in effective_mems and hot-unpluging of nodes, inaccurate allowed
+ * mask is acceptable.
+ */
+ cpuset_nodes_allowed(memcg->css.cgroup, &allowed);
+ nodes_and(*mask, *mask, allowed);
}
void mem_cgroup_show_protected_memory(struct mem_cgroup *memcg)
--- a/mm/vmscan.c~mm-vmscan-fix-demotion-targets-checks-in-reclaim-demotion
+++ a/mm/vmscan.c
@@ -344,19 +344,21 @@ static void flush_reclaim_state(struct s
static bool can_demote(int nid, struct scan_control *sc,
struct mem_cgroup *memcg)
{
- int demotion_nid;
+ struct pglist_data *pgdat = NODE_DATA(nid);
+ nodemask_t allowed_mask;
- if (!numa_demotion_enabled)
+ if (!pgdat || !numa_demotion_enabled)
return false;
if (sc && sc->no_demotion)
return false;
- demotion_nid = next_demotion_node(nid);
- if (demotion_nid == NUMA_NO_NODE)
+ node_get_allowed_targets(pgdat, &allowed_mask);
+ if (nodes_empty(allowed_mask))
return false;
- /* If demotion node isn't in the cgroup's mems_allowed, fall back */
- return mem_cgroup_node_allowed(memcg, demotion_nid);
+ /* Filter out nodes that are not in cgroup's mems_allowed. */
+ mem_cgroup_node_filter_allowed(memcg, &allowed_mask);
+ return !nodes_empty(allowed_mask);
}
static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
@@ -1019,7 +1021,8 @@ static struct folio *alloc_demote_folio(
* Folios which are not demoted are left on @demote_folios.
*/
static unsigned int demote_folio_list(struct list_head *demote_folios,
- struct pglist_data *pgdat)
+ struct pglist_data *pgdat,
+ struct mem_cgroup *memcg)
{
int target_nid = next_demotion_node(pgdat->node_id);
unsigned int nr_succeeded;
@@ -1033,7 +1036,6 @@ static unsigned int demote_folio_list(st
*/
.gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) |
__GFP_NOMEMALLOC | GFP_NOWAIT,
- .nid = target_nid,
.nmask = &allowed_mask,
.reason = MR_DEMOTION,
};
@@ -1041,10 +1043,14 @@ static unsigned int demote_folio_list(st
if (list_empty(demote_folios))
return 0;
- if (target_nid == NUMA_NO_NODE)
- return 0;
-
node_get_allowed_targets(pgdat, &allowed_mask);
+ mem_cgroup_node_filter_allowed(memcg, &allowed_mask);
+ if (nodes_empty(allowed_mask))
+ return false;
+
+ if (!node_isset(target_nid, allowed_mask))
+ target_nid = node_random(&allowed_mask);
+ mtc.nid = target_nid;
/* Demotion ignores all cpuset and mempolicy settings */
migrate_pages(demote_folios, alloc_demote_folio, NULL,
@@ -1566,7 +1572,7 @@ keep:
/* 'folio_list' is always empty here */
/* Migrate folios selected for demotion */
- nr_demoted = demote_folio_list(&demote_folios, pgdat);
+ nr_demoted = demote_folio_list(&demote_folios, pgdat, memcg);
nr_reclaimed += nr_demoted;
stat->nr_demoted += nr_demoted;
/* Folios that could not be demoted are still in @demote_folios */
_
Patches currently in -mm which might be from bingjiao(a)google.com are
mm-vmscan-fix-demotion-targets-checks-in-reclaim-demotion.patch
Here are some device tree updates to improve sound output on RK3576
boards.
The first two patches fix analog audio output on FriendlyElec NanoPi M5,
as it doesn't work with the current device tree.
The third one is purely cosmetic, to present a more user-friendly sound
card name to the userspace on NanoPi M5.
The rest add new functionality: HDMI sound output on three boards that
didn't enable it, and analog sound on RK3576 EVB1.
Signed-off-by: Alexey Charkov <alchark(a)gmail.com>
---
Alexey Charkov (7):
arm64: dts: rockchip: Fix headphones widget name on NanoPi M5
arm64: dts: rockchip: Configure MCLK for analog sound on NanoPi M5
arm64: dts: rockchip: Use a readable audio card name on NanoPi M5
arm64: dts: rockchip: Enable HDMI sound on FriendlyElec NanoPi M5
arm64: dts: rockchip: Enable HDMI sound on Luckfox Core3576
arm64: dts: rockchip: Enable HDMI sound on RK3576 EVB1
arm64: dts: rockchip: Enable analog sound on RK3576 EVB1
arch/arm64/boot/dts/rockchip/rk3576-evb1-v10.dts | 107 +++++++++++++++++++++
.../boot/dts/rockchip/rk3576-luckfox-core3576.dtsi | 8 ++
arch/arm64/boot/dts/rockchip/rk3576-nanopi-m5.dts | 22 ++++-
3 files changed, 132 insertions(+), 5 deletions(-)
---
base-commit: cc3aa43b44bdb43dfbac0fcb51c56594a11338a8
change-id: 20251222-rk3576-sound-0c26e3b16924
Best regards,
--
Alexey Charkov <alchark(a)gmail.com>
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 73cb5f6eafb0ac7aea8cdeb8ff12981aa741d8fb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010515-dragonish-pelican-5b3c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 73cb5f6eafb0ac7aea8cdeb8ff12981aa741d8fb Mon Sep 17 00:00:00 2001
From: Wentao Liang <vulab(a)iscas.ac.cn>
Date: Thu, 11 Dec 2025 04:02:52 +0000
Subject: [PATCH] pmdomain: imx: Fix reference count leak in imx_gpc_probe()
of_get_child_by_name() returns a node pointer with refcount incremented.
Use the __free() attribute to manage the pgc_node reference, ensuring
automatic of_node_put() cleanup when pgc_node goes out of scope.
This eliminates the need for explicit error handling paths and avoids
reference count leaks.
Fixes: 721cabf6c660 ("soc: imx: move PGC handling to a new GPC driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
Reviewed-by: Frank Li <Frank.Li(a)nxp.com>
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/pmdomain/imx/gpc.c b/drivers/pmdomain/imx/gpc.c
index a34b260274f7..de695f1944ab 100644
--- a/drivers/pmdomain/imx/gpc.c
+++ b/drivers/pmdomain/imx/gpc.c
@@ -402,13 +402,12 @@ static int imx_gpc_old_dt_init(struct device *dev, struct regmap *regmap,
static int imx_gpc_probe(struct platform_device *pdev)
{
const struct imx_gpc_dt_data *of_id_data = device_get_match_data(&pdev->dev);
- struct device_node *pgc_node;
+ struct device_node *pgc_node __free(device_node)
+ = of_get_child_by_name(pdev->dev.of_node, "pgc");
struct regmap *regmap;
void __iomem *base;
int ret;
- pgc_node = of_get_child_by_name(pdev->dev.of_node, "pgc");
-
/* bail out if DT too old and doesn't provide the necessary info */
if (!of_property_present(pdev->dev.of_node, "#power-domain-cells") &&
!pgc_node)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 73cb5f6eafb0ac7aea8cdeb8ff12981aa741d8fb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010515-quaking-outlook-7f2c@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 73cb5f6eafb0ac7aea8cdeb8ff12981aa741d8fb Mon Sep 17 00:00:00 2001
From: Wentao Liang <vulab(a)iscas.ac.cn>
Date: Thu, 11 Dec 2025 04:02:52 +0000
Subject: [PATCH] pmdomain: imx: Fix reference count leak in imx_gpc_probe()
of_get_child_by_name() returns a node pointer with refcount incremented.
Use the __free() attribute to manage the pgc_node reference, ensuring
automatic of_node_put() cleanup when pgc_node goes out of scope.
This eliminates the need for explicit error handling paths and avoids
reference count leaks.
Fixes: 721cabf6c660 ("soc: imx: move PGC handling to a new GPC driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
Reviewed-by: Frank Li <Frank.Li(a)nxp.com>
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/pmdomain/imx/gpc.c b/drivers/pmdomain/imx/gpc.c
index a34b260274f7..de695f1944ab 100644
--- a/drivers/pmdomain/imx/gpc.c
+++ b/drivers/pmdomain/imx/gpc.c
@@ -402,13 +402,12 @@ static int imx_gpc_old_dt_init(struct device *dev, struct regmap *regmap,
static int imx_gpc_probe(struct platform_device *pdev)
{
const struct imx_gpc_dt_data *of_id_data = device_get_match_data(&pdev->dev);
- struct device_node *pgc_node;
+ struct device_node *pgc_node __free(device_node)
+ = of_get_child_by_name(pdev->dev.of_node, "pgc");
struct regmap *regmap;
void __iomem *base;
int ret;
- pgc_node = of_get_child_by_name(pdev->dev.of_node, "pgc");
-
/* bail out if DT too old and doesn't provide the necessary info */
if (!of_property_present(pdev->dev.of_node, "#power-domain-cells") &&
!pgc_node)
Fix a bug where an empty FDA (fd array) object with 0 fds would cause an
out-of-bounds error. The previous implementation used `skip == 0` to
mean "this is a pointer fixup", but 0 is also the correct skip length
for an empty FDA. If the FDA is at the end of the buffer, then this
results in an attempt to write 8-bytes out of bounds. This is caught and
results in an EINVAL error being returned to userspace.
The pattern of using `skip == 0` as a special value originates from the
C-implementation of Binder. As part of fixing this bug, this pattern is
replaced with a Rust enum.
I considered the alternate option of not pushing a fixup when the length
is zero, but I think it's cleaner to just get rid of the zero-is-special
stuff.
The root cause of this bug was diagnosed by Gemini CLI on first try. I
used the following prompt:
> There appears to be a bug in @drivers/android/binder/thread.rs where
> the Fixups oob bug is triggered with 316 304 316 324. This implies
> that we somehow ended up with a fixup where buffer A has a pointer to
> buffer B, but the pointer is located at an index in buffer A that is
> out of bounds. Please investigate the code to find the bug. You may
> compare with @drivers/android/binder.c that implements this correctly.
Cc: stable(a)vger.kernel.org
Reported-by: DeepChirp <DeepChirp(a)outlook.com>
Closes: https://github.com/waydroid/waydroid/issues/2157
Fixes: eafedbc7c050 ("rust_binder: add Rust Binder driver")
Tested-by: DeepChirp <DeepChirp(a)outlook.com>
Signed-off-by: Alice Ryhl <aliceryhl(a)google.com>
---
drivers/android/binder/thread.rs | 59 +++++++++++++++++++++++-----------------
1 file changed, 34 insertions(+), 25 deletions(-)
diff --git a/drivers/android/binder/thread.rs b/drivers/android/binder/thread.rs
index 1a8e6fdc0dc42369ee078e720aa02b2554fb7332..dcd47e10aeb8c748d04320fbbe15ad35201684b9 100644
--- a/drivers/android/binder/thread.rs
+++ b/drivers/android/binder/thread.rs
@@ -69,17 +69,24 @@ struct ScatterGatherEntry {
}
/// This entry specifies that a fixup should happen at `target_offset` of the
-/// buffer. If `skip` is nonzero, then the fixup is a `binder_fd_array_object`
-/// and is applied later. Otherwise if `skip` is zero, then the size of the
-/// fixup is `sizeof::<u64>()` and `pointer_value` is written to the buffer.
-struct PointerFixupEntry {
- /// The number of bytes to skip, or zero for a `binder_buffer_object` fixup.
- skip: usize,
- /// The translated pointer to write when `skip` is zero.
- pointer_value: u64,
- /// The offset at which the value should be written. The offset is relative
- /// to the original buffer.
- target_offset: usize,
+/// buffer.
+enum PointerFixupEntry {
+ /// A fixup for a `binder_buffer_object`.
+ Fixup {
+ /// The translated pointer to write.
+ pointer_value: u64,
+ /// The offset at which the value should be written. The offset is relative
+ /// to the original buffer.
+ target_offset: usize,
+ },
+ /// A skip for a `binder_fd_array_object`.
+ Skip {
+ /// The number of bytes to skip.
+ skip: usize,
+ /// The offset at which the skip should happen. The offset is relative
+ /// to the original buffer.
+ target_offset: usize,
+ },
}
/// Return type of `apply_and_validate_fixup_in_parent`.
@@ -762,8 +769,7 @@ fn translate_object(
parent_entry.fixup_min_offset = info.new_min_offset;
parent_entry.pointer_fixups.push(
- PointerFixupEntry {
- skip: 0,
+ PointerFixupEntry::Fixup {
pointer_value: buffer_ptr_in_user_space,
target_offset: info.target_offset,
},
@@ -807,9 +813,8 @@ fn translate_object(
parent_entry
.pointer_fixups
.push(
- PointerFixupEntry {
+ PointerFixupEntry::Skip {
skip: fds_len,
- pointer_value: 0,
target_offset: info.target_offset,
},
GFP_KERNEL,
@@ -871,17 +876,21 @@ fn apply_sg(&self, alloc: &mut Allocation, sg_state: &mut ScatterGatherState) ->
let mut reader =
UserSlice::new(UserPtr::from_addr(sg_entry.sender_uaddr), sg_entry.length).reader();
for fixup in &mut sg_entry.pointer_fixups {
- let fixup_len = if fixup.skip == 0 {
- size_of::<u64>()
- } else {
- fixup.skip
+ let (fixup_len, fixup_offset) = match fixup {
+ PointerFixupEntry::Fixup { target_offset, .. } => {
+ (size_of::<u64>(), *target_offset)
+ }
+ PointerFixupEntry::Skip {
+ skip,
+ target_offset,
+ } => (*skip, *target_offset),
};
- let target_offset_end = fixup.target_offset.checked_add(fixup_len).ok_or(EINVAL)?;
- if fixup.target_offset < end_of_previous_fixup || offset_end < target_offset_end {
+ let target_offset_end = fixup_offset.checked_add(fixup_len).ok_or(EINVAL)?;
+ if fixup_offset < end_of_previous_fixup || offset_end < target_offset_end {
pr_warn!(
"Fixups oob {} {} {} {}",
- fixup.target_offset,
+ fixup_offset,
end_of_previous_fixup,
offset_end,
target_offset_end
@@ -890,13 +899,13 @@ fn apply_sg(&self, alloc: &mut Allocation, sg_state: &mut ScatterGatherState) ->
}
let copy_off = end_of_previous_fixup;
- let copy_len = fixup.target_offset - end_of_previous_fixup;
+ let copy_len = fixup_offset - end_of_previous_fixup;
if let Err(err) = alloc.copy_into(&mut reader, copy_off, copy_len) {
pr_warn!("Failed copying into alloc: {:?}", err);
return Err(err.into());
}
- if fixup.skip == 0 {
- let res = alloc.write::<u64>(fixup.target_offset, &fixup.pointer_value);
+ if let PointerFixupEntry::Fixup { pointer_value, .. } = fixup {
+ let res = alloc.write::<u64>(fixup_offset, pointer_value);
if let Err(err) = res {
pr_warn!("Failed copying ptr into alloc: {:?}", err);
return Err(err.into());
---
base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
change-id: 20251229-fda-zero-4e46e56be58d
Best regards,
--
Alice Ryhl <aliceryhl(a)google.com>
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 71154bbe49423128c1c8577b6576de1ed6836830
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010540-smuggler-passably-11d2@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 71154bbe49423128c1c8577b6576de1ed6836830 Mon Sep 17 00:00:00 2001
From: Paolo Abeni <pabeni(a)redhat.com>
Date: Fri, 12 Dec 2025 13:54:03 +0100
Subject: [PATCH] mptcp: fallback earlier on simult connection
Syzkaller reports a simult-connect race leading to inconsistent fallback
status:
WARNING: CPU: 3 PID: 33 at net/mptcp/subflow.c:1515 subflow_data_ready+0x40b/0x7c0 net/mptcp/subflow.c:1515
Modules linked in:
CPU: 3 UID: 0 PID: 33 Comm: ksoftirqd/3 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:subflow_data_ready+0x40b/0x7c0 net/mptcp/subflow.c:1515
Code: 89 ee e8 78 61 3c f6 40 84 ed 75 21 e8 8e 66 3c f6 44 89 fe bf 07 00 00 00 e8 c1 61 3c f6 41 83 ff 07 74 09 e8 76 66 3c f6 90 <0f> 0b 90 e8 6d 66 3c f6 48 89 df e8 e5 ad ff ff 31 ff 89 c5 89 c6
RSP: 0018:ffffc900006cf338 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888031acd100 RCX: ffffffff8b7f2abf
RDX: ffff88801e6ea440 RSI: ffffffff8b7f2aca RDI: 0000000000000005
RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000007
R10: 0000000000000004 R11: 0000000000002c10 R12: ffff88802ba69900
R13: 1ffff920000d9e67 R14: ffff888046f81800 R15: 0000000000000004
FS: 0000000000000000(0000) GS:ffff8880d69bc000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000560fc0ca1670 CR3: 0000000032c3a000 CR4: 0000000000352ef0
Call Trace:
<TASK>
tcp_data_queue+0x13b0/0x4f90 net/ipv4/tcp_input.c:5197
tcp_rcv_state_process+0xfdf/0x4ec0 net/ipv4/tcp_input.c:6922
tcp_v6_do_rcv+0x492/0x1740 net/ipv6/tcp_ipv6.c:1672
tcp_v6_rcv+0x2976/0x41e0 net/ipv6/tcp_ipv6.c:1918
ip6_protocol_deliver_rcu+0x188/0x1520 net/ipv6/ip6_input.c:438
ip6_input_finish+0x1e4/0x4b0 net/ipv6/ip6_input.c:489
NF_HOOK include/linux/netfilter.h:318 [inline]
NF_HOOK include/linux/netfilter.h:312 [inline]
ip6_input+0x105/0x2f0 net/ipv6/ip6_input.c:500
dst_input include/net/dst.h:471 [inline]
ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
NF_HOOK include/linux/netfilter.h:318 [inline]
NF_HOOK include/linux/netfilter.h:312 [inline]
ipv6_rcv+0x264/0x650 net/ipv6/ip6_input.c:311
__netif_receive_skb_one_core+0x12d/0x1e0 net/core/dev.c:5979
__netif_receive_skb+0x1d/0x160 net/core/dev.c:6092
process_backlog+0x442/0x15e0 net/core/dev.c:6444
__napi_poll.constprop.0+0xba/0x550 net/core/dev.c:7494
napi_poll net/core/dev.c:7557 [inline]
net_rx_action+0xa9f/0xfe0 net/core/dev.c:7684
handle_softirqs+0x216/0x8e0 kernel/softirq.c:579
run_ksoftirqd kernel/softirq.c:968 [inline]
run_ksoftirqd+0x3a/0x60 kernel/softirq.c:960
smpboot_thread_fn+0x3f7/0xae0 kernel/smpboot.c:160
kthread+0x3c2/0x780 kernel/kthread.c:463
ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
The TCP subflow can process the simult-connect syn-ack packet after
transitioning to TCP_FIN1 state, bypassing the MPTCP fallback check,
as the sk_state_change() callback is not invoked for * -> FIN_WAIT1
transitions.
That will move the msk socket to an inconsistent status and the next
incoming data will hit the reported splat.
Close the race moving the simult-fallback check at the earliest possible
stage - that is at syn-ack generation time.
About the fixes tags: [2] was supposed to also fix this issue introduced
by [3]. [1] is required as a dependence: it was not explicitly marked as
a fix, but it is one and it has already been backported before [3]. In
other words, this commit should be backported up to [3], including [2]
and [1] if that's not already there.
Fixes: 23e89e8ee7be ("tcp: Don't drop SYN+ACK for simultaneous connect().") [1]
Fixes: 4fd19a307016 ("mptcp: fix inconsistent state on fastopen race") [2]
Fixes: 1e777f39b4d7 ("mptcp: add MSG_FASTOPEN sendmsg flag support") [3]
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+0ff6b771b4f7a5bce83b(a)syzkaller.appspotmail.com
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/586
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20251212-net-mptcp-subflow_data_ready-warn-v1-1-d1…
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index f24ae7d40e88..43df4293f58b 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -408,6 +408,16 @@ bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb,
*/
subflow->snd_isn = TCP_SKB_CB(skb)->end_seq;
if (subflow->request_mptcp) {
+ if (unlikely(subflow_simultaneous_connect(sk))) {
+ WARN_ON_ONCE(!mptcp_try_fallback(sk, MPTCP_MIB_SIMULTCONNFALLBACK));
+
+ /* Ensure mptcp_finish_connect() will not process the
+ * MPC handshake.
+ */
+ subflow->request_mptcp = 0;
+ return false;
+ }
+
opts->suboptions = OPTION_MPTCP_MPC_SYN;
opts->csum_reqd = mptcp_is_checksum_enabled(sock_net(sk));
opts->allow_join_id0 = mptcp_allow_join_id0(sock_net(sk));
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 9c0d17876b22..bed0c9aa28b6 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -1337,10 +1337,8 @@ static inline bool subflow_simultaneous_connect(struct sock *sk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
- return (1 << sk->sk_state) &
- (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 | TCPF_CLOSING) &&
- is_active_ssk(subflow) &&
- !subflow->conn_finished;
+ /* Note that the sk state implies !subflow->conn_finished. */
+ return sk->sk_state == TCP_SYN_RECV && is_active_ssk(subflow);
}
#ifdef CONFIG_SYN_COOKIES
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 86ce58ae533d..96d54cb2cd93 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1878,12 +1878,6 @@ static void subflow_state_change(struct sock *sk)
__subflow_state_change(sk);
- if (subflow_simultaneous_connect(sk)) {
- WARN_ON_ONCE(!mptcp_try_fallback(sk, MPTCP_MIB_SIMULTCONNFALLBACK));
- subflow->conn_finished = 1;
- mptcp_propagate_state(parent, sk, subflow, NULL);
- }
-
/* as recvmsg() does not acquire the subflow socket for ssk selection
* a fin packet carrying a DSS can be unnoticed if we don't trigger
* the data available machinery here.
On Thu, 20 Mar 2025 17:02:06 -0500
Sahil Gupta <s.gupta(a)arista.com> wrote:
> Hi Steven,
Hi,
Sorry for the really late reply. I'm cleaning out my inbox and just noticed
this email.
>
> On 6.12.0, sorttable is unable to sort 64-bit ELFs on 32-bit hosts
> because of the parsing of the start_mcount_loc and stop_mcount_loc
> values in get_mcount_loc():
>
> *_start = strtoul(start_buff, NULL, 16);
>
> and
>
> *_stop = strtoul(stop_buff, NULL, 16);
>
> This code makes the (often correct) assumption that the host and the
> target have the same architecture, however it runs into issues when
> compiling for a 64-bit target on a 32-bit host, as unsigned long is
> shorter than the pointer width. As a result, I've noticed that both
> start and stop max out at 2^32 - 1.
>
> It seems that commit 4acda8ed fixes this issue inadvertently by
> directly extracting them from the ELF using the correct width. I'm
> wondering if it is possible to backport this as well as the other
> sorttable refactors to 6.12.0 since they fix this issue.
I think this is a good reason to mark that commit as stable and backport it
to 6.12.
-- Steve
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 71154bbe49423128c1c8577b6576de1ed6836830
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010539-unify-blimp-e7a7@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 71154bbe49423128c1c8577b6576de1ed6836830 Mon Sep 17 00:00:00 2001
From: Paolo Abeni <pabeni(a)redhat.com>
Date: Fri, 12 Dec 2025 13:54:03 +0100
Subject: [PATCH] mptcp: fallback earlier on simult connection
Syzkaller reports a simult-connect race leading to inconsistent fallback
status:
WARNING: CPU: 3 PID: 33 at net/mptcp/subflow.c:1515 subflow_data_ready+0x40b/0x7c0 net/mptcp/subflow.c:1515
Modules linked in:
CPU: 3 UID: 0 PID: 33 Comm: ksoftirqd/3 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:subflow_data_ready+0x40b/0x7c0 net/mptcp/subflow.c:1515
Code: 89 ee e8 78 61 3c f6 40 84 ed 75 21 e8 8e 66 3c f6 44 89 fe bf 07 00 00 00 e8 c1 61 3c f6 41 83 ff 07 74 09 e8 76 66 3c f6 90 <0f> 0b 90 e8 6d 66 3c f6 48 89 df e8 e5 ad ff ff 31 ff 89 c5 89 c6
RSP: 0018:ffffc900006cf338 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888031acd100 RCX: ffffffff8b7f2abf
RDX: ffff88801e6ea440 RSI: ffffffff8b7f2aca RDI: 0000000000000005
RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000007
R10: 0000000000000004 R11: 0000000000002c10 R12: ffff88802ba69900
R13: 1ffff920000d9e67 R14: ffff888046f81800 R15: 0000000000000004
FS: 0000000000000000(0000) GS:ffff8880d69bc000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000560fc0ca1670 CR3: 0000000032c3a000 CR4: 0000000000352ef0
Call Trace:
<TASK>
tcp_data_queue+0x13b0/0x4f90 net/ipv4/tcp_input.c:5197
tcp_rcv_state_process+0xfdf/0x4ec0 net/ipv4/tcp_input.c:6922
tcp_v6_do_rcv+0x492/0x1740 net/ipv6/tcp_ipv6.c:1672
tcp_v6_rcv+0x2976/0x41e0 net/ipv6/tcp_ipv6.c:1918
ip6_protocol_deliver_rcu+0x188/0x1520 net/ipv6/ip6_input.c:438
ip6_input_finish+0x1e4/0x4b0 net/ipv6/ip6_input.c:489
NF_HOOK include/linux/netfilter.h:318 [inline]
NF_HOOK include/linux/netfilter.h:312 [inline]
ip6_input+0x105/0x2f0 net/ipv6/ip6_input.c:500
dst_input include/net/dst.h:471 [inline]
ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
NF_HOOK include/linux/netfilter.h:318 [inline]
NF_HOOK include/linux/netfilter.h:312 [inline]
ipv6_rcv+0x264/0x650 net/ipv6/ip6_input.c:311
__netif_receive_skb_one_core+0x12d/0x1e0 net/core/dev.c:5979
__netif_receive_skb+0x1d/0x160 net/core/dev.c:6092
process_backlog+0x442/0x15e0 net/core/dev.c:6444
__napi_poll.constprop.0+0xba/0x550 net/core/dev.c:7494
napi_poll net/core/dev.c:7557 [inline]
net_rx_action+0xa9f/0xfe0 net/core/dev.c:7684
handle_softirqs+0x216/0x8e0 kernel/softirq.c:579
run_ksoftirqd kernel/softirq.c:968 [inline]
run_ksoftirqd+0x3a/0x60 kernel/softirq.c:960
smpboot_thread_fn+0x3f7/0xae0 kernel/smpboot.c:160
kthread+0x3c2/0x780 kernel/kthread.c:463
ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
The TCP subflow can process the simult-connect syn-ack packet after
transitioning to TCP_FIN1 state, bypassing the MPTCP fallback check,
as the sk_state_change() callback is not invoked for * -> FIN_WAIT1
transitions.
That will move the msk socket to an inconsistent status and the next
incoming data will hit the reported splat.
Close the race moving the simult-fallback check at the earliest possible
stage - that is at syn-ack generation time.
About the fixes tags: [2] was supposed to also fix this issue introduced
by [3]. [1] is required as a dependence: it was not explicitly marked as
a fix, but it is one and it has already been backported before [3]. In
other words, this commit should be backported up to [3], including [2]
and [1] if that's not already there.
Fixes: 23e89e8ee7be ("tcp: Don't drop SYN+ACK for simultaneous connect().") [1]
Fixes: 4fd19a307016 ("mptcp: fix inconsistent state on fastopen race") [2]
Fixes: 1e777f39b4d7 ("mptcp: add MSG_FASTOPEN sendmsg flag support") [3]
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+0ff6b771b4f7a5bce83b(a)syzkaller.appspotmail.com
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/586
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20251212-net-mptcp-subflow_data_ready-warn-v1-1-d1…
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index f24ae7d40e88..43df4293f58b 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -408,6 +408,16 @@ bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb,
*/
subflow->snd_isn = TCP_SKB_CB(skb)->end_seq;
if (subflow->request_mptcp) {
+ if (unlikely(subflow_simultaneous_connect(sk))) {
+ WARN_ON_ONCE(!mptcp_try_fallback(sk, MPTCP_MIB_SIMULTCONNFALLBACK));
+
+ /* Ensure mptcp_finish_connect() will not process the
+ * MPC handshake.
+ */
+ subflow->request_mptcp = 0;
+ return false;
+ }
+
opts->suboptions = OPTION_MPTCP_MPC_SYN;
opts->csum_reqd = mptcp_is_checksum_enabled(sock_net(sk));
opts->allow_join_id0 = mptcp_allow_join_id0(sock_net(sk));
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 9c0d17876b22..bed0c9aa28b6 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -1337,10 +1337,8 @@ static inline bool subflow_simultaneous_connect(struct sock *sk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
- return (1 << sk->sk_state) &
- (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 | TCPF_CLOSING) &&
- is_active_ssk(subflow) &&
- !subflow->conn_finished;
+ /* Note that the sk state implies !subflow->conn_finished. */
+ return sk->sk_state == TCP_SYN_RECV && is_active_ssk(subflow);
}
#ifdef CONFIG_SYN_COOKIES
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 86ce58ae533d..96d54cb2cd93 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1878,12 +1878,6 @@ static void subflow_state_change(struct sock *sk)
__subflow_state_change(sk);
- if (subflow_simultaneous_connect(sk)) {
- WARN_ON_ONCE(!mptcp_try_fallback(sk, MPTCP_MIB_SIMULTCONNFALLBACK));
- subflow->conn_finished = 1;
- mptcp_propagate_state(parent, sk, subflow, NULL);
- }
-
/* as recvmsg() does not acquire the subflow socket for ssk selection
* a fin packet carrying a DSS can be unnoticed if we don't trigger
* the data available machinery here.
The patch titled
Subject: panic: only warn about deprecated panic_print on write access
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
panic-only-warn-about-deprecated-panic_print-on-write-access.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: Gal Pressman <gal(a)nvidia.com>
Subject: panic: only warn about deprecated panic_print on write access
Date: Tue, 6 Jan 2026 18:33:21 +0200
The panic_print_deprecated() warning is being triggered on both read and
write operations to the panic_print parameter.
This causes spurious warnings when users run 'sysctl -a' to list all
sysctl values, since that command reads /proc/sys/kernel/panic_print and
triggers the deprecation notice.
Modify the handlers to only emit the deprecation warning when the
parameter is actually being set:
- sysctl_panic_print_handler(): check 'write' flag before warning.
- panic_print_get(): remove the deprecation call entirely.
This way, users are only warned when they actively try to use the
deprecated parameter, not when passively querying system state.
Link: https://lkml.kernel.org/r/20260106163321.83586-1-gal@nvidia.com
Fixes: ee13240cd78b ("panic: add note that panic_print sysctl interface is deprecated")
Fixes: 2683df6539cb ("panic: add note that 'panic_print' parameter is deprecated")
Signed-off-by: Gal Pressman <gal(a)nvidia.com>
Reviewed-by: Mark Bloch <mbloch(a)nvidia.com>
Reviewed-by: Nimrod Oren <noren(a)nvidia.com>
Cc: Feng Tang <feng.tang(a)linux.alibaba.com>
Cc: Joel Granados <joel.granados(a)kernel.org>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/panic.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/panic.c~panic-only-warn-about-deprecated-panic_print-on-write-access
+++ a/kernel/panic.c
@@ -131,7 +131,8 @@ static int proc_taint(const struct ctl_t
static int sysctl_panic_print_handler(const struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
- panic_print_deprecated();
+ if (write)
+ panic_print_deprecated();
return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
}
@@ -1014,7 +1015,6 @@ static int panic_print_set(const char *v
static int panic_print_get(char *val, const struct kernel_param *kp)
{
- panic_print_deprecated();
return param_get_ulong(val, kp);
}
_
Patches currently in -mm which might be from gal(a)nvidia.com are
panic-only-warn-about-deprecated-panic_print-on-write-access.patch
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 71154bbe49423128c1c8577b6576de1ed6836830
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010538-sage-flashcard-993f@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 71154bbe49423128c1c8577b6576de1ed6836830 Mon Sep 17 00:00:00 2001
From: Paolo Abeni <pabeni(a)redhat.com>
Date: Fri, 12 Dec 2025 13:54:03 +0100
Subject: [PATCH] mptcp: fallback earlier on simult connection
Syzkaller reports a simult-connect race leading to inconsistent fallback
status:
WARNING: CPU: 3 PID: 33 at net/mptcp/subflow.c:1515 subflow_data_ready+0x40b/0x7c0 net/mptcp/subflow.c:1515
Modules linked in:
CPU: 3 UID: 0 PID: 33 Comm: ksoftirqd/3 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:subflow_data_ready+0x40b/0x7c0 net/mptcp/subflow.c:1515
Code: 89 ee e8 78 61 3c f6 40 84 ed 75 21 e8 8e 66 3c f6 44 89 fe bf 07 00 00 00 e8 c1 61 3c f6 41 83 ff 07 74 09 e8 76 66 3c f6 90 <0f> 0b 90 e8 6d 66 3c f6 48 89 df e8 e5 ad ff ff 31 ff 89 c5 89 c6
RSP: 0018:ffffc900006cf338 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888031acd100 RCX: ffffffff8b7f2abf
RDX: ffff88801e6ea440 RSI: ffffffff8b7f2aca RDI: 0000000000000005
RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000007
R10: 0000000000000004 R11: 0000000000002c10 R12: ffff88802ba69900
R13: 1ffff920000d9e67 R14: ffff888046f81800 R15: 0000000000000004
FS: 0000000000000000(0000) GS:ffff8880d69bc000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000560fc0ca1670 CR3: 0000000032c3a000 CR4: 0000000000352ef0
Call Trace:
<TASK>
tcp_data_queue+0x13b0/0x4f90 net/ipv4/tcp_input.c:5197
tcp_rcv_state_process+0xfdf/0x4ec0 net/ipv4/tcp_input.c:6922
tcp_v6_do_rcv+0x492/0x1740 net/ipv6/tcp_ipv6.c:1672
tcp_v6_rcv+0x2976/0x41e0 net/ipv6/tcp_ipv6.c:1918
ip6_protocol_deliver_rcu+0x188/0x1520 net/ipv6/ip6_input.c:438
ip6_input_finish+0x1e4/0x4b0 net/ipv6/ip6_input.c:489
NF_HOOK include/linux/netfilter.h:318 [inline]
NF_HOOK include/linux/netfilter.h:312 [inline]
ip6_input+0x105/0x2f0 net/ipv6/ip6_input.c:500
dst_input include/net/dst.h:471 [inline]
ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
NF_HOOK include/linux/netfilter.h:318 [inline]
NF_HOOK include/linux/netfilter.h:312 [inline]
ipv6_rcv+0x264/0x650 net/ipv6/ip6_input.c:311
__netif_receive_skb_one_core+0x12d/0x1e0 net/core/dev.c:5979
__netif_receive_skb+0x1d/0x160 net/core/dev.c:6092
process_backlog+0x442/0x15e0 net/core/dev.c:6444
__napi_poll.constprop.0+0xba/0x550 net/core/dev.c:7494
napi_poll net/core/dev.c:7557 [inline]
net_rx_action+0xa9f/0xfe0 net/core/dev.c:7684
handle_softirqs+0x216/0x8e0 kernel/softirq.c:579
run_ksoftirqd kernel/softirq.c:968 [inline]
run_ksoftirqd+0x3a/0x60 kernel/softirq.c:960
smpboot_thread_fn+0x3f7/0xae0 kernel/smpboot.c:160
kthread+0x3c2/0x780 kernel/kthread.c:463
ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
The TCP subflow can process the simult-connect syn-ack packet after
transitioning to TCP_FIN1 state, bypassing the MPTCP fallback check,
as the sk_state_change() callback is not invoked for * -> FIN_WAIT1
transitions.
That will move the msk socket to an inconsistent status and the next
incoming data will hit the reported splat.
Close the race moving the simult-fallback check at the earliest possible
stage - that is at syn-ack generation time.
About the fixes tags: [2] was supposed to also fix this issue introduced
by [3]. [1] is required as a dependence: it was not explicitly marked as
a fix, but it is one and it has already been backported before [3]. In
other words, this commit should be backported up to [3], including [2]
and [1] if that's not already there.
Fixes: 23e89e8ee7be ("tcp: Don't drop SYN+ACK for simultaneous connect().") [1]
Fixes: 4fd19a307016 ("mptcp: fix inconsistent state on fastopen race") [2]
Fixes: 1e777f39b4d7 ("mptcp: add MSG_FASTOPEN sendmsg flag support") [3]
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+0ff6b771b4f7a5bce83b(a)syzkaller.appspotmail.com
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/586
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20251212-net-mptcp-subflow_data_ready-warn-v1-1-d1…
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index f24ae7d40e88..43df4293f58b 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -408,6 +408,16 @@ bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb,
*/
subflow->snd_isn = TCP_SKB_CB(skb)->end_seq;
if (subflow->request_mptcp) {
+ if (unlikely(subflow_simultaneous_connect(sk))) {
+ WARN_ON_ONCE(!mptcp_try_fallback(sk, MPTCP_MIB_SIMULTCONNFALLBACK));
+
+ /* Ensure mptcp_finish_connect() will not process the
+ * MPC handshake.
+ */
+ subflow->request_mptcp = 0;
+ return false;
+ }
+
opts->suboptions = OPTION_MPTCP_MPC_SYN;
opts->csum_reqd = mptcp_is_checksum_enabled(sock_net(sk));
opts->allow_join_id0 = mptcp_allow_join_id0(sock_net(sk));
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 9c0d17876b22..bed0c9aa28b6 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -1337,10 +1337,8 @@ static inline bool subflow_simultaneous_connect(struct sock *sk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
- return (1 << sk->sk_state) &
- (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 | TCPF_CLOSING) &&
- is_active_ssk(subflow) &&
- !subflow->conn_finished;
+ /* Note that the sk state implies !subflow->conn_finished. */
+ return sk->sk_state == TCP_SYN_RECV && is_active_ssk(subflow);
}
#ifdef CONFIG_SYN_COOKIES
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 86ce58ae533d..96d54cb2cd93 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1878,12 +1878,6 @@ static void subflow_state_change(struct sock *sk)
__subflow_state_change(sk);
- if (subflow_simultaneous_connect(sk)) {
- WARN_ON_ONCE(!mptcp_try_fallback(sk, MPTCP_MIB_SIMULTCONNFALLBACK));
- subflow->conn_finished = 1;
- mptcp_propagate_state(parent, sk, subflow, NULL);
- }
-
/* as recvmsg() does not acquire the subflow socket for ssk selection
* a fin packet carrying a DSS can be unnoticed if we don't trigger
* the data available machinery here.
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 73cb5f6eafb0ac7aea8cdeb8ff12981aa741d8fb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010514-cahoots-scholar-954d@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 73cb5f6eafb0ac7aea8cdeb8ff12981aa741d8fb Mon Sep 17 00:00:00 2001
From: Wentao Liang <vulab(a)iscas.ac.cn>
Date: Thu, 11 Dec 2025 04:02:52 +0000
Subject: [PATCH] pmdomain: imx: Fix reference count leak in imx_gpc_probe()
of_get_child_by_name() returns a node pointer with refcount incremented.
Use the __free() attribute to manage the pgc_node reference, ensuring
automatic of_node_put() cleanup when pgc_node goes out of scope.
This eliminates the need for explicit error handling paths and avoids
reference count leaks.
Fixes: 721cabf6c660 ("soc: imx: move PGC handling to a new GPC driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
Reviewed-by: Frank Li <Frank.Li(a)nxp.com>
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/pmdomain/imx/gpc.c b/drivers/pmdomain/imx/gpc.c
index a34b260274f7..de695f1944ab 100644
--- a/drivers/pmdomain/imx/gpc.c
+++ b/drivers/pmdomain/imx/gpc.c
@@ -402,13 +402,12 @@ static int imx_gpc_old_dt_init(struct device *dev, struct regmap *regmap,
static int imx_gpc_probe(struct platform_device *pdev)
{
const struct imx_gpc_dt_data *of_id_data = device_get_match_data(&pdev->dev);
- struct device_node *pgc_node;
+ struct device_node *pgc_node __free(device_node)
+ = of_get_child_by_name(pdev->dev.of_node, "pgc");
struct regmap *regmap;
void __iomem *base;
int ret;
- pgc_node = of_get_child_by_name(pdev->dev.of_node, "pgc");
-
/* bail out if DT too old and doesn't provide the necessary info */
if (!of_property_present(pdev->dev.of_node, "#power-domain-cells") &&
!pgc_node)
From: Ionut Nechita <ionut.nechita(a)windriver.com>
This series addresses two critical issues in the block layer multiqueue
(blk-mq) subsystem when running on PREEMPT_RT kernels.
The first patch fixes a severe performance regression where queue_lock
contention in the I/O hot path causes IRQ threads to sleep on RT kernels.
Testing on MegaRAID 12GSAS controller showed a 76% performance drop
(640 MB/s -> 153 MB/s). The fix replaces spinlock with memory barriers
to maintain ordering without sleeping.
The second patch fixes a WARN_ON that triggers during SCSI device scanning
when blk_freeze_queue_start() calls blk_mq_run_hw_queues() synchronously
from interrupt context. The warning "WARN_ON_ONCE(!async && in_interrupt())"
is resolved by switching to asynchronous execution.
Changes in v2:
- Removed the blk_mq_cpuhp_lock patch (needs more investigation)
- Added fix for WARN_ON in interrupt context during queue freezing
- Updated commit messages for clarity
Ionut Nechita (2):
block/blk-mq: fix RT kernel regression with queue_lock in hot path
block: Fix WARN_ON in blk_mq_run_hw_queue when called from interrupt
context
block/blk-mq.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
--
2.52.0
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 73cb5f6eafb0ac7aea8cdeb8ff12981aa741d8fb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010514-aerospace-cathouse-bd44@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 73cb5f6eafb0ac7aea8cdeb8ff12981aa741d8fb Mon Sep 17 00:00:00 2001
From: Wentao Liang <vulab(a)iscas.ac.cn>
Date: Thu, 11 Dec 2025 04:02:52 +0000
Subject: [PATCH] pmdomain: imx: Fix reference count leak in imx_gpc_probe()
of_get_child_by_name() returns a node pointer with refcount incremented.
Use the __free() attribute to manage the pgc_node reference, ensuring
automatic of_node_put() cleanup when pgc_node goes out of scope.
This eliminates the need for explicit error handling paths and avoids
reference count leaks.
Fixes: 721cabf6c660 ("soc: imx: move PGC handling to a new GPC driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
Reviewed-by: Frank Li <Frank.Li(a)nxp.com>
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/pmdomain/imx/gpc.c b/drivers/pmdomain/imx/gpc.c
index a34b260274f7..de695f1944ab 100644
--- a/drivers/pmdomain/imx/gpc.c
+++ b/drivers/pmdomain/imx/gpc.c
@@ -402,13 +402,12 @@ static int imx_gpc_old_dt_init(struct device *dev, struct regmap *regmap,
static int imx_gpc_probe(struct platform_device *pdev)
{
const struct imx_gpc_dt_data *of_id_data = device_get_match_data(&pdev->dev);
- struct device_node *pgc_node;
+ struct device_node *pgc_node __free(device_node)
+ = of_get_child_by_name(pdev->dev.of_node, "pgc");
struct regmap *regmap;
void __iomem *base;
int ret;
- pgc_node = of_get_child_by_name(pdev->dev.of_node, "pgc");
-
/* bail out if DT too old and doesn't provide the necessary info */
if (!of_property_present(pdev->dev.of_node, "#power-domain-cells") &&
!pgc_node)
This reverts commit e5d527be7e6984882306b49c067f1fec18920735.
This software node change doesn't actually fix any current issues
with the kernel, it is an improvement to the lookup process rather
than fixing a live bug. It also causes a couple of regressions with
shipping laptops, which relied on the label based lookup.
There is a fix for the regressions in mainline, the first 5 patches
of [1]. However, those patches are fairly substantial changes and
given the patch causing the regression doesn't actually fix a bug
it seems better to just revert it in stable.
CC: stable(a)vger.kernel.org # 6.18
Link: https://lore.kernel.org/linux-sound/20251120-reset-gpios-swnodes-v7-0-a1004… [1]
Closes: https://github.com/thesofproject/linux/issues/5599
Closes: https://github.com/thesofproject/linux/issues/5603
Acked-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
Signed-off-by: Charles Keepax <ckeepax(a)opensource.cirrus.com>
---
Changes since v1:
- Corrected commit ID to match 6.18
drivers/gpio/gpiolib-swnode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpio/gpiolib-swnode.c b/drivers/gpio/gpiolib-swnode.c
index e3806db1c0e07..f21dbc28cf2c8 100644
--- a/drivers/gpio/gpiolib-swnode.c
+++ b/drivers/gpio/gpiolib-swnode.c
@@ -41,7 +41,7 @@ static struct gpio_device *swnode_get_gpio_device(struct fwnode_handle *fwnode)
!strcmp(gdev_node->name, GPIOLIB_SWNODE_UNDEFINED_NAME))
return ERR_PTR(-ENOENT);
- gdev = gpio_device_find_by_fwnode(fwnode);
+ gdev = gpio_device_find_by_label(gdev_node->name);
return gdev ?: ERR_PTR(-EPROBE_DEFER);
}
--
2.47.3
When we fail to refill the receive buffers, we schedule a delayed worker
to retry later. However, this worker creates some concurrency issues.
For example, when the worker runs concurrently with virtnet_xdp_set,
both need to temporarily disable queue's NAPI before enabling again.
Without proper synchronization, a deadlock can happen when
napi_disable() is called on an already disabled NAPI. That
napi_disable() call will be stuck and so will the subsequent
napi_enable() call.
To simplify the logic and avoid further problems, we will instead retry
refilling in the next NAPI poll.
Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
Reported-by: Paolo Abeni <pabeni(a)redhat.com>
Closes: https://netdev-ctrl.bots.linux.dev/logs/vmksft/drv-hw-dbg/results/400961/3-…
Cc: stable(a)vger.kernel.org
Suggested-by: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Signed-off-by: Bui Quang Minh <minhquangbui99(a)gmail.com>
---
drivers/net/virtio_net.c | 48 +++++++++++++++++++++-------------------
1 file changed, 25 insertions(+), 23 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 1bb3aeca66c6..f986abf0c236 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3046,16 +3046,16 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
else
packets = virtnet_receive_packets(vi, rq, budget, xdp_xmit, &stats);
+ u64_stats_set(&stats.packets, packets);
if (rq->vq->num_free > min((unsigned int)budget, virtqueue_get_vring_size(rq->vq)) / 2) {
- if (!try_fill_recv(vi, rq, GFP_ATOMIC)) {
- spin_lock(&vi->refill_lock);
- if (vi->refill_enabled)
- schedule_delayed_work(&vi->refill, 0);
- spin_unlock(&vi->refill_lock);
- }
+ if (!try_fill_recv(vi, rq, GFP_ATOMIC))
+ /* We need to retry refilling in the next NAPI poll so
+ * we must return budget to make sure the NAPI is
+ * repolled.
+ */
+ packets = budget;
}
- u64_stats_set(&stats.packets, packets);
u64_stats_update_begin(&rq->stats.syncp);
for (i = 0; i < ARRAY_SIZE(virtnet_rq_stats_desc); i++) {
size_t offset = virtnet_rq_stats_desc[i].offset;
@@ -3230,9 +3230,10 @@ static int virtnet_open(struct net_device *dev)
for (i = 0; i < vi->max_queue_pairs; i++) {
if (i < vi->curr_queue_pairs)
- /* Make sure we have some buffers: if oom use wq. */
- if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
- schedule_delayed_work(&vi->refill, 0);
+ /* Pre-fill rq agressively, to make sure we are ready to
+ * get packets immediately.
+ */
+ try_fill_recv(vi, &vi->rq[i], GFP_KERNEL);
err = virtnet_enable_queue_pair(vi, i);
if (err < 0)
@@ -3472,16 +3473,15 @@ static void __virtnet_rx_resume(struct virtnet_info *vi,
struct receive_queue *rq,
bool refill)
{
- bool running = netif_running(vi->dev);
- bool schedule_refill = false;
+ if (netif_running(vi->dev)) {
+ /* Pre-fill rq agressively, to make sure we are ready to get
+ * packets immediately.
+ */
+ if (refill)
+ try_fill_recv(vi, rq, GFP_KERNEL);
- if (refill && !try_fill_recv(vi, rq, GFP_KERNEL))
- schedule_refill = true;
- if (running)
virtnet_napi_enable(rq);
-
- if (schedule_refill)
- schedule_delayed_work(&vi->refill, 0);
+ }
}
static void virtnet_rx_resume_all(struct virtnet_info *vi)
@@ -3829,11 +3829,13 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
}
succ:
vi->curr_queue_pairs = queue_pairs;
- /* virtnet_open() will refill when device is going to up. */
- spin_lock_bh(&vi->refill_lock);
- if (dev->flags & IFF_UP && vi->refill_enabled)
- schedule_delayed_work(&vi->refill, 0);
- spin_unlock_bh(&vi->refill_lock);
+ if (dev->flags & IFF_UP) {
+ local_bh_disable();
+ for (int i = 0; i < vi->curr_queue_pairs; ++i)
+ virtqueue_napi_schedule(&vi->rq[i].napi, vi->rq[i].vq);
+
+ local_bh_enable();
+ }
return 0;
}
--
2.43.0
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x e0f44f74ed6313e50b38eb39a2c7f210ae208db2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010623-surrender-evolve-4b79@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e0f44f74ed6313e50b38eb39a2c7f210ae208db2 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Tue, 23 Sep 2025 17:23:40 +0200
Subject: [PATCH] drm/mediatek: ovl_adaptor: Fix probe device leaks
Make sure to drop the references taken to the component devices by
of_find_device_by_node() during probe on probe failure (e.g. probe
deferral) and on driver unbind.
Fixes: 453c3364632a ("drm/mediatek: Add ovl_adaptor support for MT8195")
Cc: stable(a)vger.kernel.org # 6.4
Cc: Nancy.Lin <nancy.lin(a)mediatek.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20250923152340.18234-6…
Signed-off-by: Chun-Kuang Hu <chunkuang.hu(a)kernel.org>
diff --git a/drivers/gpu/drm/mediatek/mtk_disp_ovl_adaptor.c b/drivers/gpu/drm/mediatek/mtk_disp_ovl_adaptor.c
index fe97bb97e004..c0af3e3b51d5 100644
--- a/drivers/gpu/drm/mediatek/mtk_disp_ovl_adaptor.c
+++ b/drivers/gpu/drm/mediatek/mtk_disp_ovl_adaptor.c
@@ -527,6 +527,13 @@ bool mtk_ovl_adaptor_is_comp_present(struct device_node *node)
type == OVL_ADAPTOR_TYPE_PADDING;
}
+static void ovl_adaptor_put_device(void *_dev)
+{
+ struct device *dev = _dev;
+
+ put_device(dev);
+}
+
static int ovl_adaptor_comp_init(struct device *dev, struct component_match **match)
{
struct mtk_disp_ovl_adaptor *priv = dev_get_drvdata(dev);
@@ -560,6 +567,11 @@ static int ovl_adaptor_comp_init(struct device *dev, struct component_match **ma
if (!comp_pdev)
return -EPROBE_DEFER;
+ ret = devm_add_action_or_reset(dev, ovl_adaptor_put_device,
+ &comp_pdev->dev);
+ if (ret)
+ return ret;
+
priv->ovl_adaptor_comp[id] = &comp_pdev->dev;
drm_of_component_match_add(dev, match, component_compare_of, node);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x e0f44f74ed6313e50b38eb39a2c7f210ae208db2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026010626-drainable-stricken-476d@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e0f44f74ed6313e50b38eb39a2c7f210ae208db2 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Tue, 23 Sep 2025 17:23:40 +0200
Subject: [PATCH] drm/mediatek: ovl_adaptor: Fix probe device leaks
Make sure to drop the references taken to the component devices by
of_find_device_by_node() during probe on probe failure (e.g. probe
deferral) and on driver unbind.
Fixes: 453c3364632a ("drm/mediatek: Add ovl_adaptor support for MT8195")
Cc: stable(a)vger.kernel.org # 6.4
Cc: Nancy.Lin <nancy.lin(a)mediatek.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20250923152340.18234-6…
Signed-off-by: Chun-Kuang Hu <chunkuang.hu(a)kernel.org>
diff --git a/drivers/gpu/drm/mediatek/mtk_disp_ovl_adaptor.c b/drivers/gpu/drm/mediatek/mtk_disp_ovl_adaptor.c
index fe97bb97e004..c0af3e3b51d5 100644
--- a/drivers/gpu/drm/mediatek/mtk_disp_ovl_adaptor.c
+++ b/drivers/gpu/drm/mediatek/mtk_disp_ovl_adaptor.c
@@ -527,6 +527,13 @@ bool mtk_ovl_adaptor_is_comp_present(struct device_node *node)
type == OVL_ADAPTOR_TYPE_PADDING;
}
+static void ovl_adaptor_put_device(void *_dev)
+{
+ struct device *dev = _dev;
+
+ put_device(dev);
+}
+
static int ovl_adaptor_comp_init(struct device *dev, struct component_match **match)
{
struct mtk_disp_ovl_adaptor *priv = dev_get_drvdata(dev);
@@ -560,6 +567,11 @@ static int ovl_adaptor_comp_init(struct device *dev, struct component_match **ma
if (!comp_pdev)
return -EPROBE_DEFER;
+ ret = devm_add_action_or_reset(dev, ovl_adaptor_put_device,
+ &comp_pdev->dev);
+ if (ret)
+ return ret;
+
priv->ovl_adaptor_comp[id] = &comp_pdev->dev;
drm_of_component_match_add(dev, match, component_compare_of, node);
This reverts commit 25decf0469d4c91d90aa2e28d996aed276bfc622.
This software node change doesn't actually fix any current issues
with the kernel, it is an improvement to the lookup process rather
than fixing a live bug. It also causes a couple of regressions with
shipping laptops, which relied on the label based lookup.
There is a fix for the regressions in mainline, the first 5 patches
of [1]. However, those patches are fairly substantial changes and
given the patch causing the regression doesn't actually fix a bug
it seems better to just revert it in stable.
CC: stable(a)vger.kernel.org # 6.18
Link: https://lore.kernel.org/linux-sound/20251120-reset-gpios-swnodes-v7-0-a1004… [1]
Closes: https://github.com/thesofproject/linux/issues/5599
Closes: https://github.com/thesofproject/linux/issues/5603
Acked-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
Signed-off-by: Charles Keepax <ckeepax(a)opensource.cirrus.com>
---
This fix for the software node lookups is also required on 6.18 stable,
see the discussion for 6.12/6.17 in [2] for why we are doing a revert
rather than backporting the other fixes. The "full" fixes are merged in
6.19 so this should be the last kernel we need to push this revert onto.
Thanks,
Charles
drivers/gpio/gpiolib-swnode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpio/gpiolib-swnode.c b/drivers/gpio/gpiolib-swnode.c
index e3806db1c0e07..f21dbc28cf2c8 100644
--- a/drivers/gpio/gpiolib-swnode.c
+++ b/drivers/gpio/gpiolib-swnode.c
@@ -41,7 +41,7 @@ static struct gpio_device *swnode_get_gpio_device(struct fwnode_handle *fwnode)
!strcmp(gdev_node->name, GPIOLIB_SWNODE_UNDEFINED_NAME))
return ERR_PTR(-ENOENT);
- gdev = gpio_device_find_by_fwnode(fwnode);
+ gdev = gpio_device_find_by_label(gdev_node->name);
return gdev ?: ERR_PTR(-EPROBE_DEFER);
}
--
2.47.3
Setting vm.dirtytime_expire_seconds to 0 causes wakeup_dirtytime_writeback()
to reschedule itself with a delay of 0, creating an infinite busy loop that
spins kworker at 100% CPU.
This series:
- Patch 1: Fixes the bug by handling interval=0 as "disable writeback"
(consistent with dirty_writeback_centisecs behavior)
- Patch 2: Documents that setting the value to 0 disables writeback
Tested by booting kernels in QEMU with virtme-ng:
- Buggy kernel: kworker CPU spikes to ~73% when interval set to 0
- Fixed kernel: CPU remains normal, writeback correctly disabled
- Re-enabling (0 -> non-zero): writeback resumes correctly
v2:
- Added Reviewed-by from Jan Kara (no code changes)
Laveesh Bansal (2):
writeback: fix 100% CPU usage when dirtytime_expire_interval is 0
docs: clarify that dirtytime_expire_seconds=0 disables writeback
Documentation/admin-guide/sysctl/vm.rst | 2 ++
fs/fs-writeback.c | 14 ++++++++++----
2 files changed, 12 insertions(+), 4 deletions(-)
--
2.43.0
This driver is known broken, as it computes the wrong SHA-1 and SHA-256
hashes. Correctness needs to be the first priority for cryptographic
code. Just disable it, allowing the standard (and actually correct)
SHA-1 and SHA-256 implementations to take priority.
Reported-by: larryw3i <larryw3i(a)yeah.net>
Closes: https://lore.kernel.org/r/3af01fec-b4d3-4d0c-9450-2b722d4bbe39@yeah.net/
Closes: https://lists.debian.org/debian-kernel/2025/09/msg00019.html
Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1113996
Cc: stable(a)vger.kernel.org
Cc: AlanSong-oc(a)zhaoxin.com
Cc: CobeChen(a)zhaoxin.com
Cc: GeorgeXue(a)zhaoxin.com
Cc: HansHu(a)zhaoxin.com
Cc: LeoLiu-oc(a)zhaoxin.com
Cc: TonyWWang-oc(a)zhaoxin.com
Cc: YunShen(a)zhaoxin.com
Signed-off-by: Eric Biggers <ebiggers(a)kernel.org>
---
This patch is targeting crypto/master
drivers/crypto/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index a6688d54984c..16ea3e741350 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -38,11 +38,11 @@ config CRYPTO_DEV_PADLOCK_AES
If unsure say M. The compiled module will be
called padlock-aes.
config CRYPTO_DEV_PADLOCK_SHA
tristate "PadLock driver for SHA1 and SHA256 algorithms"
- depends on CRYPTO_DEV_PADLOCK
+ depends on CRYPTO_DEV_PADLOCK && BROKEN
select CRYPTO_HASH
select CRYPTO_SHA1
select CRYPTO_SHA256
help
Use VIA PadLock for SHA1/SHA256 algorithms.
base-commit: 59b0afd01b2ce353ab422ea9c8375b03db313a21
--
2.51.2