Linux-stable-mirror - lists.linaro.org

[PATCH] drm/i915: silence rpm wakeref asserts on GEN11_GU_MISC_IIR access

by Jani Nikula

Commit 8d9908e8fe9c ("drm/i915/display: remove small micro-optimizations in irq handling") not only removed the optimizations, it also enabled wakeref asserts for the GEN11_GU_MISC_IIR access. Silence the asserts by wrapping the access inside intel_display_rpm_assert_{block,unblock}().

Reported-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
Closes: https://lore.kernel.org/r/aG0tWkfmxWtxl_xc@zx2c4.com
Fixes: 8d9908e8fe9c ("drm/i915/display: remove small micro-optimizations in irq handling")
Cc: <stable(a)vger.kernel.org> # v6.13+
Suggested-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
---
 drivers/gpu/drm/i915/display/intel_display_irq.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/display/intel_display_irq.c b/drivers/gpu/drm/i915/display/intel_display_irq.c
index fb25ec8adae3..68157f177b6a 100644
--- a/drivers/gpu/drm/i915/display/intel_display_irq.c
+++ b/drivers/gpu/drm/i915/display/intel_display_irq.c
@@ -1506,10 +1506,14 @@ u32 gen11_gu_misc_irq_ack(struct intel_display *display, const u32 master_ctl)
 	if (!(master_ctl & GEN11_GU_MISC_IRQ))
 		return 0;
 
+	intel_display_rpm_assert_block(display);
+
 	iir = intel_de_read(display, GEN11_GU_MISC_IIR);
 	if (likely(iir))
 		intel_de_write(display, GEN11_GU_MISC_IIR, iir);
 
+	intel_display_rpm_assert_unblock(display);
+
 	return iir;
 }
 
-- 
2.39.5
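The user-space sketch below assumes that intel_display_rpm_assert_block()/unblock() behave like a nesting counter that suppresses the wakeref assert while it is non-zero. Every name in it (struct model_display, model_de_read(), and so on) is hypothetical, and it is not the i915 implementation. It only shows why wrapping the IIR read and write-back stops the assert from firing, while an unwrapped access without a wakeref still trips it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct intel_display: only what the model needs. */
struct model_display {
	bool wakeref_held;    /* is a runtime-PM wakeref currently held? */
	int assert_blocked;   /* > 0 while wakeref asserts are suppressed */
	int assert_hits;      /* number of asserts that would have fired */
};

static void model_rpm_assert_block(struct model_display *d)
{
	d->assert_blocked++;
}

static void model_rpm_assert_unblock(struct model_display *d)
{
	assert(d->assert_blocked > 0);   /* catch an unbalanced unblock */
	d->assert_blocked--;
}

/* Every register access checks for a wakeref unless asserts are blocked. */
static void model_check_wakeref(struct model_display *d)
{
	if (!d->wakeref_held && d->assert_blocked == 0)
		d->assert_hits++;
}

static unsigned int model_de_read(struct model_display *d, const unsigned int *reg)
{
	model_check_wakeref(d);
	return *reg;
}

/* IIR registers are write-1-to-clear: writing the value back acks it. */
static void model_de_write(struct model_display *d, unsigned int *reg, unsigned int val)
{
	model_check_wakeref(d);
	*reg &= ~val;
}

/* Shape of the patched gen11_gu_misc_irq_ack(), minus the master_ctl check. */
static unsigned int model_gu_misc_irq_ack(struct model_display *d, unsigned int *iir_reg)
{
	unsigned int iir;

	model_rpm_assert_block(d);
	iir = model_de_read(d, iir_reg);
	if (iir)
		model_de_write(d, iir_reg, iir);
	model_rpm_assert_unblock(d);

	return iir;
}
```

With no wakeref held, acking an IIR value of 0x5 returns 0x5, clears the register and records no assert hits. A bare model_de_read() afterwards does record one.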

[PATCH] powerpc/mm: Fix SLB multihit issue during SLB preload

by Donet Tom

On systems using the hash MMU, there is a software SLB preload cache that mirrors the entries loaded into the hardware SLB buffer. This preload cache is subject to periodic eviction (typically after every 256 context switches) to remove old entries.

To optimize performance, the kernel skips switch_mmu_context() in switch_mm_irqs_off() when the prev and next mm_struct are the same. However, on hash MMU systems, this can lead to inconsistencies between the hardware SLB and the software preload cache.

If an SLB entry for a process is evicted from the software cache on one CPU, and the same process later runs on another CPU without executing switch_mmu_context(), the hardware SLB may retain stale entries. If the kernel then attempts to reload that entry, it can trigger an SLB multi-hit error.

The following timeline shows how stale SLB entries are created and can cause a multi-hit error when a process moves between CPUs without an MMU context switch.

CPU 0                                   CPU 1
-----                                   -----
Process P
exec                                    swapper/1
 load_elf_binary
  begin_new_exec
   activate_mm
    switch_mm_irqs_off
     switch_mmu_context
      switch_slb
      /*
       * This invalidates all
       * the entries in the HW
       * SLB and sets up the new
       * HW SLB entries as per
       * the preload cache.
       */
context_switch
sched_migrate_task migrates process P to cpu-1

Process swapper/0                       context switch (to process P)
(uses mm_struct of Process P)           switch_mm_irqs_off()
                                         switch_slb
                                           load_slb++
                                           /*
                                            * load_slb becomes 0 here
                                            * and we evict an entry from
                                            * the preload cache with
                                            * preload_age(). We still
                                            * keep HW SLB and preload
                                            * cache in sync, because
                                            * all HW SLB entries get
                                            * evicted anyway in
                                            * switch_slb during SLBIA.
                                            * We then only add back to
                                            * the HW SLB those entries
                                            * which are currently
                                            * present in preload_cache
                                            * (after eviction).
                                            */
                                        load_elf_binary continues...
                                         setup_new_exec()
                                          slb_setup_new_exec()

                                        sched_switch event
                                        sched_migrate_task migrates
                                        process P to cpu-0

context_switch from swapper/0 to Process P
 switch_mm_irqs_off()
 /*
  * Since both prev and next mm struct are the same, we don't call
  * switch_mmu_context(). This will cause the HW SLB and SW preload
  * cache to go out of sync in preload_new_slb_context, because there
  * was an SLB entry which was evicted from both HW and preload cache
  * on cpu-1. Later in preload_new_slb_context(), when we try to add
  * the same preload entry again, we add it to the SW preload cache
  * and then to the HW SLB. Since this entry was never invalidated on
  * cpu-0, adding it to the HW SLB causes an SLB multi-hit error.
  */
load_elf_binary continues...
 START_THREAD
  start_thread
   preload_new_slb_context
   /*
    * This tries to add a new EA to the preload cache which was earlier
    * evicted from both the cpu-1 HW SLB and the preload cache. This
    * caused the HW SLB of cpu-0 to go out of sync with the SW preload
    * cache. The reason is that when we context switched back on CPU-0,
    * we should ideally have called switch_mmu_context(), which would
    * bring the HW SLB entries on CPU-0 in sync with the SW preload
    * cache entries by setting up the mmu context properly. But we
    * didn't, since the prev mm_struct running on cpu-0 was the same as
    * the next mm_struct (which is true for swapper / kernel threads).
    * So now, when we try to add this new entry into the HW SLB of
    * cpu-0, we hit an SLB multi-hit error.
    */

WARNING: CPU: 0 PID: 1810970 at arch/powerpc/mm/book3s64/slb.c:62 assert_slb_presence+0x2c/0x50
Modules linked in:
CPU: 0 UID: 0 PID: 1810970 Comm: dd Not tainted 6.16.0-rc3-dirty #12 VOLUNTARY
Hardware name: IBM pSeries (emulated by qemu) POWER8 (architected) 0x4d0200 0xf000004 of:SLOF,HEAD hv:linux,kvm pSeries
NIP:  c00000000015426c LR: c0000000001543b4 CTR: 0000000000000000
REGS: c0000000497c77e0 TRAP: 0700   Not tainted  (6.16.0-rc3-dirty)
MSR:  8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 28888482  XER: 00000000
CFAR: c0000000001543b0 IRQMASK: 3
<...>
NIP [c00000000015426c] assert_slb_presence+0x2c/0x50
LR [c0000000001543b4] slb_insert_entry+0x124/0x390
Call Trace:
  0x7fffceb5ffff (unreliable)
  preload_new_slb_context+0x100/0x1a0
  start_thread+0x26c/0x420
  load_elf_binary+0x1b04/0x1c40
  bprm_execve+0x358/0x680
  do_execveat_common+0x1f8/0x240
  sys_execve+0x58/0x70
  system_call_exception+0x114/0x300
  system_call_common+0x160/0x2c4

From the above analysis, during early exec the hardware SLB is cleared, and entries from the software preload cache are reloaded into hardware by switch_slb. However, preload_new_slb_context and slb_setup_new_exec also attempt to load some of the same entries, which can trigger a multi-hit. In most cases, these additional preloads simply hit existing entries and add nothing new. Removing these functions avoids redundant preloads and eliminates the multi-hit issue. This patch removes these two functions.

We tested process switching performance using the context_switch benchmark on POWER9/hash, and observed no regression.

Without this patch: 129041 ops/sec
With this patch:    129341 ops/sec

We also measured SLB faults during boot, and the counts are essentially the same with and without this patch.

SLB faults without this patch: 19727
SLB faults with this patch:    19786

Fixes: 5434ae74629a ("powerpc/64s/hash: Add a SLB preload cache")
cc: stable(a)vger.kernel.org
Suggested-by: Nicholas Piggin <npiggin(a)gmail.com>
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  1 -
 arch/powerpc/kernel/process.c                 |  5 --
 arch/powerpc/mm/book3s64/internal.h           |  2 -
 arch/powerpc/mm/book3s64/mmu_context.c        |  2 -
 arch/powerpc/mm/book3s64/slb.c                | 88 -------------------
 5 files changed, 98 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 1c4eebbc69c9..e1f77e2eead4 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -524,7 +524,6 @@ void slb_save_contents(struct slb_entry *slb_ptr);
 void slb_dump_contents(struct slb_entry *slb_ptr);
 
 extern void slb_vmalloc_update(void);
-void preload_new_slb_context(unsigned long start, unsigned long sp);
 
 #ifdef CONFIG_PPC_64S_HASH_MMU
 void slb_set_size(u16 size);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 855e09886503..2b9799157eb4 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1897,8 +1897,6 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	return 0;
 }
 
-void preload_new_slb_context(unsigned long start, unsigned long sp);
-
 /*
  * Set up a thread for executing a new program
  */
@@ -1906,9 +1904,6 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
 {
 #ifdef CONFIG_PPC64
 	unsigned long load_addr = regs->gpr[2];	/* saved by ELF_PLAT_INIT */
-
-	if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && !radix_enabled())
-		preload_new_slb_context(start, sp);
 #endif
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
diff --git a/arch/powerpc/mm/book3s64/internal.h b/arch/powerpc/mm/book3s64/internal.h
index a57a25f06a21..c26a6f0c90fc 100644
--- a/arch/powerpc/mm/book3s64/internal.h
+++ b/arch/powerpc/mm/book3s64/internal.h
@@ -24,8 +24,6 @@ static inline bool stress_hpt(void)
 
 void hpt_do_stress(unsigned long ea, unsigned long hpte_group);
 
-void slb_setup_new_exec(void);
-
 void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush);
 
 #endif /* ARCH_POWERPC_MM_BOOK3S64_INTERNAL_H */
diff --git a/arch/powerpc/mm/book3s64/mmu_context.c b/arch/powerpc/mm/book3s64/mmu_context.c
index 4e1e45420bd4..fb9dcf9ca599 100644
--- a/arch/powerpc/mm/book3s64/mmu_context.c
+++ b/arch/powerpc/mm/book3s64/mmu_context.c
@@ -150,8 +150,6 @@ static int hash__init_new_context(struct mm_struct *mm)
 void hash__setup_new_exec(void)
 {
 	slice_setup_new_exec();
-
-	slb_setup_new_exec();
 }
 #else
 static inline int hash__init_new_context(struct mm_struct *mm)
diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
index 6b783552403c..7e053c561a09 100644
--- a/arch/powerpc/mm/book3s64/slb.c
+++ b/arch/powerpc/mm/book3s64/slb.c
@@ -328,94 +328,6 @@ static void preload_age(struct thread_info *ti)
 	ti->slb_preload_tail = (ti->slb_preload_tail + 1) % SLB_PRELOAD_NR;
 }
 
-void slb_setup_new_exec(void)
-{
-	struct thread_info *ti = current_thread_info();
-	struct mm_struct *mm = current->mm;
-	unsigned long exec = 0x10000000;
-
-	WARN_ON(irqs_disabled());
-
-	/*
-	 * preload cache can only be used to determine whether a SLB
-	 * entry exists if it does not start to overflow.
-	 */
-	if (ti->slb_preload_nr + 2 > SLB_PRELOAD_NR)
-		return;
-
-	hard_irq_disable();
-
-	/*
-	 * We have no good place to clear the slb preload cache on exec,
-	 * flush_thread is about the earliest arch hook but that happens
-	 * after we switch to the mm and have already preloaded the SLBEs.
-	 *
-	 * For the most part that's probably okay to use entries from the
-	 * previous exec, they will age out if unused. It may turn out to
-	 * be an advantage to clear the cache before switching to it,
-	 * however.
-	 */
-
-	/*
-	 * preload some userspace segments into the SLB.
-	 * Almost all 32 and 64bit PowerPC executables are linked at
-	 * 0x10000000 so it makes sense to preload this segment.
-	 */
-	if (!is_kernel_addr(exec)) {
-		if (preload_add(ti, exec))
-			slb_allocate_user(mm, exec);
-	}
-
-	/* Libraries and mmaps. */
-	if (!is_kernel_addr(mm->mmap_base)) {
-		if (preload_add(ti, mm->mmap_base))
-			slb_allocate_user(mm, mm->mmap_base);
-	}
-
-	/* see switch_slb */
-	asm volatile("isync" : : : "memory");
-
-	local_irq_enable();
-}
-
-void preload_new_slb_context(unsigned long start, unsigned long sp)
-{
-	struct thread_info *ti = current_thread_info();
-	struct mm_struct *mm = current->mm;
-	unsigned long heap = mm->start_brk;
-
-	WARN_ON(irqs_disabled());
-
-	/* see above */
-	if (ti->slb_preload_nr + 3 > SLB_PRELOAD_NR)
-		return;
-
-	hard_irq_disable();
-
-	/* Userspace entry address. */
-	if (!is_kernel_addr(start)) {
-		if (preload_add(ti, start))
-			slb_allocate_user(mm, start);
-	}
-
-	/* Top of stack, grows down. */
-	if (!is_kernel_addr(sp)) {
-		if (preload_add(ti, sp))
-			slb_allocate_user(mm, sp);
-	}
-
-	/* Bottom of heap, grows up. */
-	if (heap && !is_kernel_addr(heap)) {
-		if (preload_add(ti, heap))
-			slb_allocate_user(mm, heap);
-	}
-
-	/* see switch_slb */
-	asm volatile("isync" : : : "memory");
-
-	local_irq_enable();
-}
-
 static void slb_cache_slbie_kernel(unsigned int index)
 {
 	unsigned long slbie_data = get_paca()->slb_cache[index];
-- 
2.47.3
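The timeline above can be replayed with a toy model. It uses one software preload ring shared by the process and a separate "hardware SLB" per CPU. This is a sketch under stated assumptions: the sizes, ESID values and every function name are hypothetical. Only preload_add()'s "return true if newly added" contract and the eviction of the oldest entry by preload_age() are taken from the code in the diff. Real SLB handling (bolted entries, SLBIA, slbmte, ESID/VSID encoding) is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS    2
#define MODEL_HW_SLB_NR  8   /* hypothetical; not the real SLB size */
#define MODEL_PRELOAD_NR 4   /* hypothetical; the kernel uses SLB_PRELOAD_NR */

/* Per-thread software preload cache: ring buffer, oldest entry at tail. */
struct model_preload {
	unsigned long esid[MODEL_PRELOAD_NR];
	unsigned int nr;
	unsigned int tail;
};

/* Per-CPU hardware SLB, reduced to a list of ESIDs. */
struct model_hw_slb {
	unsigned long esid[MODEL_HW_SLB_NR];
	unsigned int nr;
};

static struct model_hw_slb hw_slb[MODEL_NR_CPUS];

static bool preload_hit_model(const struct model_preload *p, unsigned long esid)
{
	for (unsigned int i = 0; i < p->nr; i++)
		if (p->esid[(p->tail + i) % MODEL_PRELOAD_NR] == esid)
			return true;
	return false;
}

/* Returns true only if esid was not cached yet (same contract as preload_add()). */
static bool preload_add_model(struct model_preload *p, unsigned long esid)
{
	if (preload_hit_model(p, esid))
		return false;
	p->esid[(p->tail + p->nr) % MODEL_PRELOAD_NR] = esid;
	if (p->nr == MODEL_PRELOAD_NR)
		p->tail = (p->tail + 1) % MODEL_PRELOAD_NR;  /* overwrote the oldest */
	else
		p->nr++;
	return true;
}

/* Drop the oldest cached entry, as preload_age() does. */
static void preload_age_model(struct model_preload *p)
{
	if (p->nr) {
		p->tail = (p->tail + 1) % MODEL_PRELOAD_NR;
		p->nr--;
	}
}

/* Insert into one CPU's HW SLB; false means the ESID is already there (multi-hit). */
static bool hw_slb_insert(int cpu, unsigned long esid)
{
	struct model_hw_slb *s = &hw_slb[cpu];

	for (unsigned int i = 0; i < s->nr; i++)
		if (s->esid[i] == esid)
			return false;
	assert(s->nr < MODEL_HW_SLB_NR);
	s->esid[s->nr++] = esid;
	return true;
}

/* Like switch_slb(): invalidate everything, then reload from the preload cache. */
static void switch_slb_model(int cpu, const struct model_preload *p)
{
	hw_slb[cpu].nr = 0;
	for (unsigned int i = 0; i < p->nr; i++)
		hw_slb_insert(cpu, p->esid[(p->tail + i) % MODEL_PRELOAD_NR]);
}
```

To replay the scenario, cache ESIDs 0xA and 0xB and run switch_slb_model() on CPU 0. Then age out 0xA and run switch_slb_model() on CPU 1. Return to CPU 0 without calling switch_slb_model(). At that point preload_add_model(0xA) reports a new entry, but hw_slb_insert(0, 0xA) fails because CPU 0 never dropped it. Calling switch_slb_model(0, ...) first, which is the role switch_mmu_context() would play, brings CPU 0 back in sync.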

[PATCH v2] block: restore default wbt enablement

by Julian Sun

The commit 245618f8e45f ("block: protect wbt_lat_usec using q->elevator_lock") protected wbt_enable_default() with q->elevator_lock; however, it also placed wbt_enable_default() before blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q), so wbt fails to be enabled.

Moreover, the protection of wbt_enable_default() by q->elevator_lock was removed in commit 78c271344b6f ("block: move wbt_enable_default() out of queue freezing from sched ->exit()"), so we can fix this issue directly by placing wbt_enable_default() after blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q).

This issue also makes the wbt_lat_usec file unreadable:

root@q:/sys/block/sda/queue# cat wbt_lat_usec
cat: wbt_lat_usec: Invalid argument

root@q:/data00/sjc/linux# ls /sys/kernel/debug/block/sda/rqos
cannot access '/sys/kernel/debug/block/sda/rqos': No such file or directory

root@q:/data00/sjc/linux# find /sys -name wbt
/sys/kernel/debug/tracing/events/wbt

With this patch applied, wbt is enabled normally.

Signed-off-by: Julian Sun <sunjunchao(a)bytedance.com>
Cc: stable(a)vger.kernel.org
Fixes: 245618f8e45f ("block: protect wbt_lat_usec using q->elevator_lock")
---
Changed in v2:
 - Improved commit message and comment
 - Added Fixes and Cc stable

 block/blk-sysfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 396cded255ea..979f01bbca01 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -903,9 +903,9 @@ int blk_register_queue(struct gendisk *disk)
 	if (queue_is_mq(q))
 		elevator_set_default(q);
 
-	wbt_enable_default(disk);
 
 	blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);
+	wbt_enable_default(disk);
 
 	/* Now everything is ready and send out KOBJ_ADD uevent */
 	kobject_uevent(&disk->queue_kobj, KOBJ_ADD);
-- 
2.39.5
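The commit message implies that wbt_enable_default() silently does nothing unless QUEUE_FLAG_REGISTERED is already set. A toy model of that ordering dependency follows. The names are hypothetical, and one bool stands in for the queue flag and another for the attached wbt rq_qos. It shows why swapping the two calls is enough.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified queue state. */
struct model_queue {
	bool registered;    /* stands in for QUEUE_FLAG_REGISTERED */
	bool wbt_enabled;   /* stands in for the wbt rq_qos being attached */
};

/* Assumption: like wbt_enable_default(), bail out on an unregistered queue. */
static void model_wbt_enable_default(struct model_queue *q)
{
	if (!q->registered)
		return;
	q->wbt_enabled = true;
}

/* Order before the fix: the enable call runs too early and is a no-op. */
static void model_register_queue_buggy(struct model_queue *q)
{
	model_wbt_enable_default(q);
	q->registered = true;
}

/* Order after the fix: the flag is set first, so wbt gets attached. */
static void model_register_queue_fixed(struct model_queue *q)
{
	q->registered = true;
	model_wbt_enable_default(q);
}
```

After model_register_queue_buggy() the queue is registered but wbt_enabled stays false, which matches the unreadable wbt_lat_usec and the missing rqos directory shown above. model_register_queue_fixed() leaves both set.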

Re: do_change_type(): refuse to operate on unmounted/not ours mounts

by Andrei Vagin

On Thu, Jul 24, 2025 at 4:00 PM Al Viro <viro(a)zeniv.linux.org.uk> wrote:
>
> On Thu, Jul 24, 2025 at 01:02:48PM -0700, Andrei Vagin wrote:
> > Hi Al and Christian,
> >
> > The commit 12f147ddd6de ("do_change_type(): refuse to operate on
> > unmounted/not ours mounts") introduced an ABI backward compatibility
> > break. CRIU depends on the previous behavior, and users are now
> > reporting criu restore failures following the kernel update. This change
> > has been propagated to stable kernels. Is this check strictly required?
>
> Yes.
>
> > Would it be possible to check only if the current process has
> > CAP_SYS_ADMIN within the mount user namespace?
>
> Not enough, both in terms of permissions *and* in terms of "thou
> shalt not bugger the kernel data structures - nobody's priveleged
> enough for that".

Al,

I am still thinking in terms of "Thou shalt not break userspace"...

Seriously though, this original behavior has been in the kernel for 20 years, and it hasn't triggered any corruptions in all that time. I understand this change might be necessary in its current form, and that some collateral damage could be unavoidable. But if that's the case, I'd expect a detailed explanation of why it had to be so and why userspace breakage is unavoidable.

The original change was merged two decades ago. We need to consider that some applications might rely on that behavior. I'm not questioning the security aspect - that must be addressed. But for anything else, we need to minimize the impact on user applications that don't violate security. We can consider a cleaner fix for the upstream kernel, but when we are talking about stable kernels, the user-space backward compatibility aspect should be even more critical.

Thanks,
Andrei

[PATCH 6.12.y] wifi: mac80211: check basic rates validity in sta_link_apply_parameters

by Hanne-Lotta Mäenpää

From: Mikhail Lobanov <m.lobanov(a)rosa.ru>

[ Upstream commit 16ee3ea8faef8ff042acc15867a6c458c573de61 ]

When userspace sets supported rates for a new station via NL80211_CMD_NEW_STATION, it might send a list that's empty or contains only invalid values. Currently, we process these values in sta_link_apply_parameters() without checking the result of ieee80211_parse_bitrates(), which can lead to an empty rates bitmap.

A similar issue was addressed for NL80211_CMD_SET_BSS in commit ce04abc3fcc6 ("wifi: mac80211: check basic rates validity"). This patch applies the same approach in sta_link_apply_parameters() for NL80211_CMD_NEW_STATION, ensuring there is at least one valid rate by inspecting the result of ieee80211_parse_bitrates().

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: b95eb7f0eee4 ("wifi: cfg80211/mac80211: separate link params from station params")
Signed-off-by: Mikhail Lobanov <m.lobanov(a)rosa.ru>
Link: https://patch.msgid.link/20250317103139.17625-1-m.lobanov@rosa.ru
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
(cherry picked from commit 16ee3ea8faef8ff042acc15867a6c458c573de61)
Signed-off-by: Hanne-Lotta Mäenpää <hannelotta(a)gmail.com>
---
 net/mac80211/cfg.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index cf2b8a05c338..9da17d653238 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -1879,12 +1879,12 @@ static int sta_link_apply_parameters(struct ieee80211_local *local,
 	}
 
 	if (params->supported_rates &&
-	    params->supported_rates_len) {
-		ieee80211_parse_bitrates(link->conf->chanreq.oper.width,
-					 sband, params->supported_rates,
-					 params->supported_rates_len,
-					 &link_sta->pub->supp_rates[sband->band]);
-	}
+	    params->supported_rates_len &&
+	    !ieee80211_parse_bitrates(link->conf->chanreq.oper.width,
+				      sband, params->supported_rates,
+				      params->supported_rates_len,
+				      &link_sta->pub->supp_rates[sband->band]))
+		return -EINVAL;
 
 	if (params->ht_capa)
 		ieee80211_ht_cap_ie_to_sta_ht_cap(sdata, sband,
-- 
2.50.0
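The next block is a simplified user-space model of the new check. It is not mac80211's code. It assumes the usual encodings: supported-rates octets count in 500 kbit/s units with bit 7 flagging a basic rate, and band bitrates are stored in 100 kbit/s units as in struct ieee80211_rate. The rate table and all function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 2.4 GHz-style band table, in units of 100 kbit/s. */
static const int model_band_bitrates[] = { 10, 20, 55, 110, 60, 90, 120, 180 };
#define MODEL_N_BITRATES (sizeof(model_band_bitrates) / sizeof(model_band_bitrates[0]))

/* Map rate octets onto a bitmap of band indices; true if at least one matched. */
static bool model_parse_bitrates(const uint8_t *srates, size_t len, uint32_t *rates)
{
	size_t count = 0;

	*rates = 0;
	for (size_t i = 0; i < len; i++) {
		int rate = srates[i] & 0x7f;   /* strip the basic-rate flag */

		for (size_t j = 0; j < MODEL_N_BITRATES; j++) {
			/* 100 kbit/s -> 500 kbit/s, rounding up (55 -> 11) */
			if ((model_band_bitrates[j] + 4) / 5 == rate) {
				*rates |= UINT32_C(1) << j;
				count++;
				break;
			}
		}
	}
	return count > 0;
}

/* Shape of the patched condition: a non-empty list with no usable rate fails. */
static int model_apply_supported_rates(const uint8_t *srates, size_t len, uint32_t *rates)
{
	if (srates && len && !model_parse_bitrates(srates, len, rates))
		return -EINVAL;
	return 0;
}
```

For example, { 0x82, 0x84, 0x0b, 0x16 } (1, 2, 5.5 and 11 Mbit/s) maps to bitmap 0x0f and is accepted. { 0x7f, 0x01 } matches nothing and is rejected with -EINVAL instead of leaving an empty bitmap behind.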

[PATCH] usb: hub: Don't try to recover devices lost during warm reset.

by Mathias Nyman

The hub driver warm-resets ports in SS.Inactive or Compliance mode to recover a possible connected device. The port reset code correctly detects if a connection is lost during reset, but hub driver port_event() fails to take this into account in some cases. port_event() ends up using stale values and assumes there is a connected device, and will try all means to recover it, including power-cycling the port.

Details:
This case was triggered when the xHC host was suspended with DbC (Debug Capability) enabled and connected. DbC turns one xHC port into a simple USB debug device, allowing debugging a system with an A-to-A USB debug cable.

The xhci DbC code disables DbC when the xHC is system-suspended to D3, and enables it back during resume. We essentially end up with two hosts connected to each other during suspend, and, for a short while during resume, until DbC is enabled back. The suspended xHC host notices some activity on the roothub port, but can't train the link due to being suspended, so the xHC hardware sets a CAS (Cold Attach Status) flag for this port to inform the xhci host driver that the port needs to be warm reset once the xHC resumes.

CAS is xHCI specific, and not part of the USB specification, so the xhci driver tells usb core that the port has a connection and the link is in compliance mode. Recovery from compliance mode is similar to CAS recovery.

xhci CAS driver support that fakes a compliance mode connection was added in commit 8bea2bd37df0 ("usb: Add support for root hub port status CAS").

Once xHCI resumes and DbC is enabled back, all activity on the xHC roothub host side port disappears. The hub driver will still think the port has a connection and the link is in compliance mode, and will try to recover it. The port power-cycle during recovery seems to cause issues for the active DbC connection.

Fix this by clearing the connect_change flag if hub_port_reset() returns -ENOTCONN, thus avoiding the whole unnecessary port recovery and initialization attempt.

Cc: stable(a)vger.kernel.org
Fixes: 8bea2bd37df0 ("usb: Add support for root hub port status CAS")
Tested-by: Łukasz Bartosik <ukaszb(a)chromium.org>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
 drivers/usb/core/hub.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 6bb6e92cb0a4..f981e365be36 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -5754,6 +5754,7 @@ static void port_event(struct usb_hub *hub, int port1)
 	struct usb_device *hdev = hub->hdev;
 	u16 portstatus, portchange;
 	int i = 0;
+	int err;
 
 	connect_change = test_bit(port1, hub->change_bits);
 	clear_bit(port1, hub->event_bits);
@@ -5850,8 +5851,11 @@ static void port_event(struct usb_hub *hub, int port1)
 		} else if (!udev || !(portstatus & USB_PORT_STAT_CONNECTION)
 				|| udev->state == USB_STATE_NOTATTACHED) {
 			dev_dbg(&port_dev->dev, "do warm reset, port only\n");
-			if (hub_port_reset(hub, port1, NULL,
-					HUB_BH_RESET_TIME, true) < 0)
+			err = hub_port_reset(hub, port1, NULL,
+					     HUB_BH_RESET_TIME, true);
+			if (!udev && err == -ENOTCONN)
+				connect_change = 0;
+			else if (err < 0)
 				hub_port_disable(hub, port1, 1);
 		} else {
 			dev_dbg(&port_dev->dev, "do warm reset, full device\n");
-- 
2.43.0
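Here the patched "port only" warm-reset branch is reduced to a pure decision function. It is a hypothetical model for illustration. The have_udev parameter stands in for `udev != NULL`, reset_err stands in for the value returned by hub_port_reset(), and the enum names are made up.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcomes of the "port only" warm-reset branch. */
enum model_warm_reset_outcome {
	MODEL_PORT_OK,         /* reset succeeded: continue as before */
	MODEL_PORT_DISABLE,    /* reset failed: hub_port_disable() */
	MODEL_PORT_NO_DEVICE,  /* link vanished during reset: nothing to recover */
};

static enum model_warm_reset_outcome
model_warm_reset_port_only(bool have_udev, int reset_err, bool *connect_change)
{
	if (!have_udev && reset_err == -ENOTCONN) {
		*connect_change = false;   /* skip the recovery/power-cycle path */
		return MODEL_PORT_NO_DEVICE;
	}
	if (reset_err < 0)
		return MODEL_PORT_DISABLE;
	return MODEL_PORT_OK;
}
```

The -ENOTCONN shortcut only applies when no usb_device is attached. With a udev present, or with any other error, the behaviour is the same as before the patch.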

[PATCH] cifs: Fix UAF in cifs_demultiplex_thread()

by Chanho Min

From: Zhang Xiaoxu <zhangxiaoxu5(a)huawei.com>

commit d527f51331cace562393a8038d870b3e9916686f upstream.

There is a UAF when running xfstests on cifs:

  BUG: KASAN: use-after-free in smb2_is_network_name_deleted+0x27/0x160
  Read of size 4 at addr ffff88810103fc08 by task cifsd/923

  CPU: 1 PID: 923 Comm: cifsd Not tainted 6.1.0-rc4+ #45
  ...
  Call Trace:
   <TASK>
   dump_stack_lvl+0x34/0x44
   print_report+0x171/0x472
   kasan_report+0xad/0x130
   kasan_check_range+0x145/0x1a0
   smb2_is_network_name_deleted+0x27/0x160
   cifs_demultiplex_thread.cold+0x172/0x5a4
   kthread+0x165/0x1a0
   ret_from_fork+0x1f/0x30
   </TASK>

  Allocated by task 923:
   kasan_save_stack+0x1e/0x40
   kasan_set_track+0x21/0x30
   __kasan_slab_alloc+0x54/0x60
   kmem_cache_alloc+0x147/0x320
   mempool_alloc+0xe1/0x260
   cifs_small_buf_get+0x24/0x60
   allocate_buffers+0xa1/0x1c0
   cifs_demultiplex_thread+0x199/0x10d0
   kthread+0x165/0x1a0
   ret_from_fork+0x1f/0x30

  Freed by task 921:
   kasan_save_stack+0x1e/0x40
   kasan_set_track+0x21/0x30
   kasan_save_free_info+0x2a/0x40
   ____kasan_slab_free+0x143/0x1b0
   kmem_cache_free+0xe3/0x4d0
   cifs_small_buf_release+0x29/0x90
   SMB2_negotiate+0x8b7/0x1c60
   smb2_negotiate+0x51/0x70
   cifs_negotiate_protocol+0xf0/0x160
   cifs_get_smb_ses+0x5fa/0x13c0
   mount_get_conns+0x7a/0x750
   cifs_mount+0x103/0xd00
   cifs_smb3_do_mount+0x1dd/0xcb0
   smb3_get_tree+0x1d5/0x300
   vfs_get_tree+0x41/0xf0
   path_mount+0x9b3/0xdd0
   __x64_sys_mount+0x190/0x1d0
   do_syscall_64+0x35/0x80
   entry_SYSCALL_64_after_hwframe+0x46/0xb0

The UAF is because:

 mount(pid: 921)               | cifsd(pid: 923)
-------------------------------|-------------------------------
                               | cifs_demultiplex_thread
SMB2_negotiate                 |
 cifs_send_recv                |
  compound_send_recv           |
   smb_send_rqst               |
   wait_for_response           |
    wait_event_state      [1]  |
                               |  standard_receive3
                               |   cifs_handle_standard
                               |    handle_mid
                               |     mid->resp_buf = buf;  [2]
                               |     dequeue_mid           [3]
     KILL the process     [4]  |
    resp_iov[i].iov_base = buf |
 free_rsp_buf             [5]  |
                               |   is_network_name_deleted [6]
                               |   callback

1. After sending the request to the server, wait for the response until mid->mid_state != SUBMITTED;
2. Receive the response from the server, and set it to mid;
3. Set the mid state to RECEIVED;
4. Kill the process; the mid state is already RECEIVED, so the wait returns 0;
5. Handle and release the negotiate response;
6. UAF.

It can be easily reproduced by adding some delay between [3] and [6].

Only synchronous calls have the problem, since an asynchronous call's callback is executed in the cifsd process.

Add an extra state, MID_RESPONSE_READY, and set the mid to it before waking up the waiter, so that the waiter can then access the response safely.

Fixes: ec637e3ffb6b ("[CIFS] Avoid extra large buffer allocation (and memcpy) in cifs_readpages")
Reviewed-by: Paulo Alcantara (SUSE) <pc(a)manguebit.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5(a)huawei.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
[fs/cifs was moved to fs/smb/client since 38c8a9a52082 ("smb: move client and server files to common directory fs/smb"). We apply the patch to fs/cifs with some minor context changes.]
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com>
[ chanho: Backported to v5.4.y ]
Signed-off-by: Chanho Min <chanho.min(a)lge.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
 fs/cifs/cifsglob.h  |  1 +
 fs/cifs/transport.c | 34 +++++++++++++++++++++++-----------
 2 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 5f545a240afa6..19107b77f8c00 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1722,6 +1722,7 @@ static inline bool is_retryable_error(int error)
 #define   MID_RETRY_NEEDED      8 /* session closed while this request out */
 #define   MID_RESPONSE_MALFORMED 0x10
 #define   MID_SHUTDOWN		 0x20
+#define   MID_RESPONSE_READY 0x40 /* ready for other process handle the rsp */
 
 /*	 Flags */
 #define   MID_WAIT_CANCELLED	 1 /* Cancelled while waiting for response */
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index fc237b920f231..d0bc90746fa64 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -47,6 +47,8 @@
 void
 cifs_wake_up_task(struct mid_q_entry *mid)
 {
+	if (mid->mid_state == MID_RESPONSE_RECEIVED)
+		mid->mid_state = MID_RESPONSE_READY;
 	wake_up_process(mid->callback_data);
 }
 
@@ -99,7 +101,8 @@ void _cifs_mid_q_entry_release(struct kref *refcount)
 	struct TCP_Server_Info *server = midEntry->server;
 
 	if (midEntry->resp_buf && (midEntry->mid_flags & MID_WAIT_CANCELLED) &&
-	    midEntry->mid_state == MID_RESPONSE_RECEIVED &&
+	    (midEntry->mid_state == MID_RESPONSE_RECEIVED ||
+	     midEntry->mid_state == MID_RESPONSE_READY) &&
 	    server->ops->handle_cancelled_mid)
 		server->ops->handle_cancelled_mid(midEntry->resp_buf, server);
 
@@ -730,7 +733,8 @@ wait_for_response(struct TCP_Server_Info *server, struct mid_q_entry *midQ)
 	int error;
 
 	error = wait_event_freezekillable_unsafe(server->response_q,
-				    midQ->mid_state != MID_REQUEST_SUBMITTED);
+				    midQ->mid_state != MID_REQUEST_SUBMITTED &&
+				    midQ->mid_state != MID_RESPONSE_RECEIVED);
 	if (error < 0)
 		return -ERESTARTSYS;
 
@@ -882,7 +886,7 @@ cifs_sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
 
 	spin_lock(&GlobalMid_Lock);
 	switch (mid->mid_state) {
-	case MID_RESPONSE_RECEIVED:
+	case MID_RESPONSE_READY:
 		spin_unlock(&GlobalMid_Lock);
 		return rc;
 	case MID_RETRY_NEEDED:
@@ -980,6 +984,9 @@ cifs_compound_callback(struct mid_q_entry *mid)
 	credits.instance = server->reconnect_instance;
 
 	add_credits(server, &credits, mid->optype);
+
+	if (mid->mid_state == MID_RESPONSE_RECEIVED)
+		mid->mid_state = MID_RESPONSE_READY;
 }
 
 static void
@@ -1143,7 +1150,8 @@ compound_send_recv(const unsigned int xid, struct cifs_ses *ses,
 			send_cancel(server, &rqst[i], midQ[i]);
 			spin_lock(&GlobalMid_Lock);
 			midQ[i]->mid_flags |= MID_WAIT_CANCELLED;
-			if (midQ[i]->mid_state == MID_REQUEST_SUBMITTED) {
+			if (midQ[i]->mid_state == MID_REQUEST_SUBMITTED ||
+			    midQ[i]->mid_state == MID_RESPONSE_RECEIVED) {
 				midQ[i]->callback = cifs_cancelled_callback;
 				cancelled_mid[i] = true;
 				credits[i].value = 0;
@@ -1164,7 +1172,7 @@ compound_send_recv(const unsigned int xid, struct cifs_ses *ses,
 		}
 
 		if (!midQ[i]->resp_buf ||
-		    midQ[i]->mid_state != MID_RESPONSE_RECEIVED) {
+		    midQ[i]->mid_state != MID_RESPONSE_READY) {
 			rc = -EIO;
 			cifs_dbg(FYI, "Bad MID state?\n");
 			goto out;
@@ -1341,7 +1349,8 @@ SendReceive(const unsigned int xid, struct cifs_ses *ses,
 	if (rc != 0) {
 		send_cancel(server, &rqst, midQ);
 		spin_lock(&GlobalMid_Lock);
-		if (midQ->mid_state == MID_REQUEST_SUBMITTED) {
+		if (midQ->mid_state == MID_REQUEST_SUBMITTED ||
+		    midQ->mid_state == MID_RESPONSE_RECEIVED) {
 			/* no longer considered to be "in-flight" */
 			midQ->callback = DeleteMidQEntry;
 			spin_unlock(&GlobalMid_Lock);
@@ -1358,7 +1367,7 @@ SendReceive(const unsigned int xid, struct cifs_ses *ses,
 	}
 
 	if (!midQ->resp_buf || !out_buf ||
-	    midQ->mid_state != MID_RESPONSE_RECEIVED) {
+	    midQ->mid_state != MID_RESPONSE_READY) {
 		rc = -EIO;
 		cifs_server_dbg(VFS, "Bad MID state?\n");
 		goto out;
@@ -1478,13 +1487,15 @@ SendReceiveBlockingLock(const unsigned int xid, struct cifs_tcon *tcon,
 
 	/* Wait for a reply - allow signals to interrupt. */
 	rc = wait_event_interruptible(server->response_q,
-		(!(midQ->mid_state == MID_REQUEST_SUBMITTED)) ||
+		(!(midQ->mid_state == MID_REQUEST_SUBMITTED ||
+		   midQ->mid_state == MID_RESPONSE_RECEIVED)) ||
 		((server->tcpStatus != CifsGood) &&
 		 (server->tcpStatus != CifsNew)));
 
 	/* Were we interrupted by a signal ? */
 	if ((rc == -ERESTARTSYS) &&
-		(midQ->mid_state == MID_REQUEST_SUBMITTED) &&
+		(midQ->mid_state == MID_REQUEST_SUBMITTED ||
+		 midQ->mid_state == MID_RESPONSE_RECEIVED) &&
 		((server->tcpStatus == CifsGood) ||
 		 (server->tcpStatus == CifsNew))) {
 
@@ -1514,7 +1525,8 @@ SendReceiveBlockingLock(const unsigned int xid, struct cifs_tcon *tcon,
 	if (rc) {
 		send_cancel(server, &rqst, midQ);
 		spin_lock(&GlobalMid_Lock);
-		if (midQ->mid_state == MID_REQUEST_SUBMITTED) {
+		if (midQ->mid_state == MID_REQUEST_SUBMITTED ||
+		    midQ->mid_state == MID_RESPONSE_RECEIVED) {
 			/* no longer considered to be "in-flight" */
 			midQ->callback = DeleteMidQEntry;
 			spin_unlock(&GlobalMid_Lock);
@@ -1532,7 +1544,7 @@ SendReceiveBlockingLock(const unsigned int xid, struct cifs_tcon *tcon,
 		return rc;
 
 	/* rcvd frame is ok */
-	if (out_buf == NULL || midQ->mid_state != MID_RESPONSE_RECEIVED) {
+	if (out_buf == NULL || midQ->mid_state != MID_RESPONSE_READY) {
 		rc = -EIO;
 		cifs_tcon_dbg(VFS, "Bad MID state?\n");
 		goto out;
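The core of the fix is a change to the state machine. The waiter must no longer treat RECEIVED as "done", because cifsd still owns the buffer at that point. Only READY, which the callback sets just before waking the waiter, is safe. The sketch below models only that ordering, with hypothetical enum and function names. GlobalMid_Lock, the wait queue, refcounting and the cancel paths (which now also treat RECEIVED as in-flight) are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the mid states involved in the race. */
enum mid_state_model {
	M_SUBMITTED,   /* request sent, no response yet */
	M_RECEIVED,    /* cifsd stored resp_buf but is still using it */
	M_READY,       /* cifsd is done; the waiter may consume resp_buf */
};

/* Wake-up condition before the fix: anything other than SUBMITTED. */
static bool waiter_may_proceed_old(enum mid_state_model s)
{
	return s != M_SUBMITTED;
}

/* Wake-up condition after the fix, as in wait_for_response(). */
static bool waiter_may_proceed_new(enum mid_state_model s)
{
	return s != M_SUBMITTED && s != M_RECEIVED;
}

/* cifsd side: promote RECEIVED to READY right before waking the waiter,
 * as cifs_wake_up_task() and cifs_compound_callback() now do. */
static void callback_model(enum mid_state_model *s)
{
	if (*s == M_RECEIVED)
		*s = M_READY;
}
```

In the window between dequeue_mid [3] and the callback, the old condition already lets the killed waiter proceed and free the buffer. The new condition keeps it waiting until callback_model() has moved the mid to M_READY.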

[PATCH] Bluetooth: fix use-after-free in device_for_each_child()

by Chanho Min

From: Dmitry Antipov <dmantipov(a)yandex.ru> [ Upstream commit 27aabf27fd014ae037cc179c61b0bee7cff55b3d ] Syzbot has reported the following KASAN splat: BUG: KASAN: slab-use-after-free in device_for_each_child+0x18f/0x1a0 Read of size 8 at addr ffff88801f605308 by task kbnepd bnep0/4980 CPU: 0 UID: 0 PID: 4980 Comm: kbnepd bnep0 Not tainted 6.12.0-rc4-00161-gae90f6a6170d #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x100/0x190 ? device_for_each_child+0x18f/0x1a0 print_report+0x13a/0x4cb ? __virt_addr_valid+0x5e/0x590 ? __phys_addr+0xc6/0x150 ? device_for_each_child+0x18f/0x1a0 kasan_report+0xda/0x110 ? device_for_each_child+0x18f/0x1a0 ? __pfx_dev_memalloc_noio+0x10/0x10 device_for_each_child+0x18f/0x1a0 ? __pfx_device_for_each_child+0x10/0x10 pm_runtime_set_memalloc_noio+0xf2/0x180 netdev_unregister_kobject+0x1ed/0x270 unregister_netdevice_many_notify+0x123c/0x1d80 ? __mutex_trylock_common+0xde/0x250 ? __pfx_unregister_netdevice_many_notify+0x10/0x10 ? trace_contention_end+0xe6/0x140 ? __mutex_lock+0x4e7/0x8f0 ? __pfx_lock_acquire.part.0+0x10/0x10 ? rcu_is_watching+0x12/0xc0 ? unregister_netdev+0x12/0x30 unregister_netdevice_queue+0x30d/0x3f0 ? __pfx_unregister_netdevice_queue+0x10/0x10 ? __pfx_down_write+0x10/0x10 unregister_netdev+0x1c/0x30 bnep_session+0x1fb3/0x2ab0 ? __pfx_bnep_session+0x10/0x10 ? __pfx_lock_release+0x10/0x10 ? __pfx_woken_wake_function+0x10/0x10 ? __kthread_parkme+0x132/0x200 ? __pfx_bnep_session+0x10/0x10 ? kthread+0x13a/0x370 ? __pfx_bnep_session+0x10/0x10 kthread+0x2b7/0x370 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x48/0x80 ? 
__pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Allocated by task 4974: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_kmalloc+0xaa/0xb0 __kmalloc_noprof+0x1d1/0x440 hci_alloc_dev_priv+0x1d/0x2820 __vhci_create_device+0xef/0x7d0 vhci_write+0x2c7/0x480 vfs_write+0x6a0/0xfc0 ksys_write+0x12f/0x260 do_syscall_64+0xc7/0x250 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 4979: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x4f/0x70 kfree+0x141/0x490 hci_release_dev+0x4d9/0x600 bt_host_release+0x6a/0xb0 device_release+0xa4/0x240 kobject_put+0x1ec/0x5a0 put_device+0x1f/0x30 vhci_release+0x81/0xf0 __fput+0x3f6/0xb30 task_work_run+0x151/0x250 do_exit+0xa79/0x2c30 do_group_exit+0xd5/0x2a0 get_signal+0x1fcd/0x2210 arch_do_signal_or_restart+0x93/0x780 syscall_exit_to_user_mode+0x140/0x290 do_syscall_64+0xd4/0x250 entry_SYSCALL_64_after_hwframe+0x77/0x7f In 'hci_conn_del_sysfs()', 'device_unregister()' may be called when an underlying (kobject) reference counter is greater than 1. This means that reparenting (happened when the device is actually freed) is delayed and, during that delay, parent controller device (hciX) may be deleted. Since the latter may create a dangling pointer to freed parent, avoid that scenario by reparenting to NULL explicitly. Cc: stable(a)vger.kernel.org # 5.4 Reported-by: syzbot+6cf5652d3df49fae2e3f(a)syzkaller.appspotmail.com Tested-by: syzbot+6cf5652d3df49fae2e3f(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6cf5652d3df49fae2e3f Fixes: a85fb91e3d72 ("Bluetooth: Fix double free in hci_conn_cleanup") Signed-off-by: Dmitry Antipov <dmantipov(a)yandex.ru> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com> [ chanho: Backported from v5.10.y to v5.4.y. 
device_find_any_child() is not supported in v5.4.y, so changed to use device_find_child() with __match_any ] Signed-off-by: Chanho Min <chanho.min(a)lge.com> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- net/bluetooth/hci_sysfs.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/net/bluetooth/hci_sysfs.c b/net/bluetooth/hci_sysfs.c index 266112c960ee8..f8e7b0ba2d273 100644 --- a/net/bluetooth/hci_sysfs.c +++ b/net/bluetooth/hci_sysfs.c @@ -19,14 +19,9 @@ static const struct device_type bt_link = { .release = bt_link_release, }; -/* - * The rfcomm tty device will possibly retain even when conn - * is down, and sysfs doesn't support move zombie device, - * so we should move the device before conn device is destroyed. - */ -static int __match_tty(struct device *dev, void *data) +static int __match_any(struct device *dev, void *unused) { - return !strncmp(dev_name(dev), "rfcomm", 6); + return 1; } void hci_conn_init_sysfs(struct hci_conn *conn) @@ -71,10 +66,12 @@ void hci_conn_del_sysfs(struct hci_conn *conn) return; } + /* If there are devices using the connection as parent reset it to NULL + * before unregistering the device. + */ while (1) { struct device *dev; - - dev = device_find_child(&conn->dev, NULL, __match_tty); + dev = device_find_child(&conn->dev, NULL, __match_any); if (!dev) break; device_move(dev, NULL, DPM_ORDER_DEV_LAST);
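
The backported loop can be modelled in userspace to show why the match-any callback matters: every child, not only rfcomm TTYs, must drop its pointer to the connection device before the parent can go away. This is a minimal sketch; `struct node`, `find_child()` and `move_to_root()` are hypothetical stand-ins for `struct device`, `device_find_child()` and `device_move(dev, NULL, ...)`, and reference counting is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 8

/* Hypothetical stand-in for struct device: name, parent, child table. */
struct node {
	const char *name;
	struct node *parent;
	struct node *children[MAX_CHILDREN];
	size_t nr_children;
};

static void add_child(struct node *parent, struct node *child)
{
	assert(parent->nr_children < MAX_CHILDREN);
	child->parent = parent;
	parent->children[parent->nr_children++] = child;
}

/* Return the first child accepted by match(), or NULL. */
static struct node *find_child(struct node *parent,
			       int (*match)(struct node *, void *), void *data)
{
	for (size_t i = 0; i < parent->nr_children; i++)
		if (match(parent->children[i], data))
			return parent->children[i];
	return NULL;
}

/* Detach a child so that it no longer points at its old parent. */
static void move_to_root(struct node *child)
{
	struct node *parent = child->parent;

	for (size_t i = 0; i < parent->nr_children; i++) {
		if (parent->children[i] == child) {
			parent->children[i] = parent->children[--parent->nr_children];
			break;
		}
	}
	child->parent = NULL;
}

/* Old behaviour: only rfcomm TTYs were moved away. */
static int match_tty(struct node *n, void *unused)
{
	(void)unused;
	return strncmp(n->name, "rfcomm", 6) == 0;
}

/* New behaviour: every child is moved away. */
static int match_any(struct node *n, void *unused)
{
	(void)n;
	(void)unused;
	return 1;
}

/* Reparent matching children; returns how many children still point at conn. */
static size_t detach_children(struct node *conn,
			      int (*match)(struct node *, void *))
{
	struct node *child;

	while ((child = find_child(conn, match, NULL)) != NULL)
		move_to_root(child);
	return conn->nr_children;
}
```

With only `match_tty`, a non-TTY child keeps its pointer to the connection and would dangle once the connection is freed; `match_any` leaves no such child behind.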

[PATCH] rust: cpumask: Mark CpumaskVar as transparent

by Baptiste Lepers

Unsafe code in CpumaskVar's methods assumes that the type has the same layout as `bindings::cpumask_var_t`. This is not guaranteed by the default struct representation in Rust, but requires specifying the `transparent` representation.

Fixes: 8961b8cb3099a ("rust: cpumask: Add initial abstractions")
Cc: stable(a)vger.kernel.org
Signed-off-by: Baptiste Lepers <baptiste.lepers(a)gmail.com>
---
 rust/kernel/cpumask.rs | 1 +
 1 file changed, 1 insertion(+)

diff --git a/rust/kernel/cpumask.rs b/rust/kernel/cpumask.rs
index 3fcbff438670..05e1c882404e 100644
--- a/rust/kernel/cpumask.rs
+++ b/rust/kernel/cpumask.rs
@@ -212,6 +212,7 @@ pub fn copy(&self, dstp: &mut Self) {
 /// }
 /// assert_eq!(mask2.weight(), count);
 /// ```
+#[repr(transparent)]
 pub struct CpumaskVar {
     #[cfg(CONFIG_CPUMASK_OFFSTACK)]
     ptr: NonNull<Cpumask>,
-- 
2.43.0

FAILED: patch "[PATCH] KVM: VMX: Wrap all accesses to IA32_DEBUGCTL with" failed to apply to 6.16-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.16-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.16.y git checkout FETCH_HEAD git cherry-pick -x 7d0cce6cbe71af6e9c1831bff101a2b9c249c4a2 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081208-gerbil-shelve-9cc8@gregkh' --subject-prefix 'PATCH 6.16.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 7d0cce6cbe71af6e9c1831bff101a2b9c249c4a2 Mon Sep 17 00:00:00 2001 From: Maxim Levitsky <mlevitsk(a)redhat.com> Date: Tue, 10 Jun 2025 16:20:09 -0700 Subject: [PATCH] KVM: VMX: Wrap all accesses to IA32_DEBUGCTL with getter/setter APIs Introduce vmx_guest_debugctl_{read,write}() to handle all accesses to vmcs.GUEST_IA32_DEBUGCTL. This will allow stuffing FREEZE_IN_SMM into GUEST_IA32_DEBUGCTL based on the host setting without bleeding the state into the guest, and without needing to copy+paste the FREEZE_IN_SMM logic into every path that accesses GUEST_IA32_DEBUGCTL. No functional change intended.
Cc: stable(a)vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com> [sean: massage changelog, make inline, use in all prepare_vmcs02() cases] Reviewed-by: Dapeng Mi <dapeng1.mi(a)linux.intel.com> Link: https://lore.kernel.org/r/20250610232010.162191-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc(a)google.com> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 1b8b0642fc2d..ef20184b8b11 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -2663,11 +2663,11 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, if (vmx->nested.nested_run_pending && (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) { kvm_set_dr(vcpu, 7, vmcs12->guest_dr7); - vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl & - vmx_get_supported_debugctl(vcpu, false)); + vmx_guest_debugctl_write(vcpu, vmcs12->guest_ia32_debugctl & + vmx_get_supported_debugctl(vcpu, false)); } else { kvm_set_dr(vcpu, 7, vcpu->arch.dr7); - vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl); + vmx_guest_debugctl_write(vcpu, vmx->nested.pre_vmenter_debugctl); } if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending || !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))) @@ -3532,7 +3532,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, if (!vmx->nested.nested_run_pending || !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) - vmx->nested.pre_vmenter_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL); + vmx->nested.pre_vmenter_debugctl = vmx_guest_debugctl_read(); if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending || !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))) @@ -4806,7 +4806,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu, __vmx_set_segment(vcpu, &seg, VCPU_SREG_LDTR); kvm_set_dr(vcpu, 7, 0x400); - vmcs_write64(GUEST_IA32_DEBUGCTL, 0); + vmx_guest_debugctl_write(vcpu, 0); if (nested_vmx_load_msr(vcpu, 
vmcs12->vm_exit_msr_load_addr, vmcs12->vm_exit_msr_load_count)) diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index bbf4509f32d0..0b173602821b 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -653,11 +653,11 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu) */ static void intel_pmu_legacy_freezing_lbrs_on_pmi(struct kvm_vcpu *vcpu) { - u64 data = vmcs_read64(GUEST_IA32_DEBUGCTL); + u64 data = vmx_guest_debugctl_read(); if (data & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI) { data &= ~DEBUGCTLMSR_LBR; - vmcs_write64(GUEST_IA32_DEBUGCTL, data); + vmx_guest_debugctl_write(vcpu, data); } } @@ -730,7 +730,7 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu) if (!lbr_desc->event) { vmx_disable_lbr_msrs_passthrough(vcpu); - if (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR) + if (vmx_guest_debugctl_read() & DEBUGCTLMSR_LBR) goto warn; if (test_bit(INTEL_PMC_IDX_FIXED_VLBR, pmu->pmc_in_use)) goto warn; @@ -752,7 +752,7 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu) static void intel_pmu_cleanup(struct kvm_vcpu *vcpu) { - if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR)) + if (!(vmx_guest_debugctl_read() & DEBUGCTLMSR_LBR)) intel_pmu_release_guest_lbr_event(vcpu); } diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 6a8b78e954cd..a77d325fe78b 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2149,7 +2149,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) msr_info->data = vmx->pt_desc.guest.addr_a[index / 2]; break; case MSR_IA32_DEBUGCTLMSR: - msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL); + msr_info->data = vmx_guest_debugctl_read(); break; default: find_uret_msr: @@ -2283,7 +2283,8 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) VM_EXIT_SAVE_DEBUG_CONTROLS) get_vmcs12(vcpu)->guest_ia32_debugctl = data; - vmcs_write64(GUEST_IA32_DEBUGCTL, data); + vmx_guest_debugctl_write(vcpu, data); + if 
(intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event && (data & DEBUGCTLMSR_LBR)) intel_pmu_create_guest_lbr_event(vcpu); @@ -4798,7 +4799,8 @@ static void init_vmcs(struct vcpu_vmx *vmx) vmcs_write32(GUEST_SYSENTER_CS, 0); vmcs_writel(GUEST_SYSENTER_ESP, 0); vmcs_writel(GUEST_SYSENTER_EIP, 0); - vmcs_write64(GUEST_IA32_DEBUGCTL, 0); + + vmx_guest_debugctl_write(&vmx->vcpu, 0); if (cpu_has_vmx_tpr_shadow()) { vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, 0); diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 392e66c7e5fe..c20a4185d10a 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -417,6 +417,16 @@ void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu); u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated); bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated); +static inline void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val) +{ + vmcs_write64(GUEST_IA32_DEBUGCTL, val); +} + +static inline u64 vmx_guest_debugctl_read(void) +{ + return vmcs_read64(GUEST_IA32_DEBUGCTL); +} + /* * Note, early Intel manuals have the write-low and read-high bitmap offsets * the wrong way round. The bitmaps control MSRs 0x00000000-0x00001fff and
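
Routing every access through one getter/setter pair is what turns the planned FREEZE_IN_SMM handling into a single-site change. The userspace sketch below models that idea under stated assumptions: `guest_debugctl_field` stands in for the VMCS field, `host_freeze_in_smm` for the host's DEBUGCTL setting, and bit 14 is assumed for FREEZE_IN_SMM. The commit above is intentionally a no-op wrapper; the bit-stuffing shown here is a hypothetical illustration of what the wrapper enables, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FREEZE_IN_SMM (1ULL << 14) /* assumed bit position, illustrative */

static uint64_t guest_debugctl_field; /* stands in for vmcs.GUEST_IA32_DEBUGCTL */
static bool host_freeze_in_smm;       /* stands in for the host's setting */

/* Setter: the only writer of the field, so host-owned bits are added here. */
static void guest_debugctl_write(uint64_t val)
{
	/* Assumes callers never pass FREEZE_IN_SMM as a guest-visible bit. */
	assert(!(val & FREEZE_IN_SMM));
	if (host_freeze_in_smm)
		val |= FREEZE_IN_SMM;
	guest_debugctl_field = val;
}

/* Getter: strips the host-owned bit so it never bleeds into the guest. */
static uint64_t guest_debugctl_read(void)
{
	return guest_debugctl_field & ~FREEZE_IN_SMM;
}
```

Because no caller touches the field directly, changing what "write" and "read" mean only requires editing these two helpers.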

FAILED: patch "[PATCH] KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested" failed to apply to 6.16-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.16-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.16.y git checkout FETCH_HEAD git cherry-pick -x 095686e6fcb4150f0a55b1a25987fad3d8af58d6 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081221-finicky-ensure-b830@gregkh' --subject-prefix 'PATCH 6.16.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 095686e6fcb4150f0a55b1a25987fad3d8af58d6 Mon Sep 17 00:00:00 2001 From: Maxim Levitsky <mlevitsk(a)redhat.com> Date: Tue, 10 Jun 2025 16:20:08 -0700 Subject: [PATCH] KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested VM-Enter Add a consistency check for L2's guest_ia32_debugctl, as KVM only supports a subset of hardware functionality, i.e. KVM can't rely on hardware to detect illegal/unsupported values. Failure to check the vmcs12 value would allow the guest to load any hardware-supported value while running L2. Take care to exempt BTF and LBR from the validity check in order to match KVM's behavior for writes via WRMSR, but without clobbering vmcs12. Even if VM_EXIT_SAVE_DEBUG_CONTROLS is set in vmcs12, L1 can reasonably expect that vmcs12->guest_ia32_debugctl will not be modified if writes to the MSR are being intercepted. Arguably, KVM _should_ update vmcs12 if VM_EXIT_SAVE_DEBUG_CONTROLS is set *and* writes to MSR_IA32_DEBUGCTLMSR are not being intercepted by L1, but that would incur non-trivial complexity and wouldn't change the fact that KVM's handling of DEBUGCTL is blatantly broken. I.e. the extra complexity is not worth carrying.
Cc: stable(a)vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com> Co-developed-by: Sean Christopherson <seanjc(a)google.com> Link: https://lore.kernel.org/r/20250610232010.162191-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc(a)google.com> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 7211c71d4241..1b8b0642fc2d 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -2663,7 +2663,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, if (vmx->nested.nested_run_pending && (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) { kvm_set_dr(vcpu, 7, vmcs12->guest_dr7); - vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl); + vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl & + vmx_get_supported_debugctl(vcpu, false)); } else { kvm_set_dr(vcpu, 7, vcpu->arch.dr7); vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl); @@ -3156,7 +3157,8 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu, return -EINVAL; if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) && - CC(!kvm_dr7_valid(vmcs12->guest_dr7))) + (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) || + CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false)))) return -EINVAL; if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) && @@ -4608,6 +4610,12 @@ static void sync_vmcs02_to_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs12->vm_entry_controls & ~VM_ENTRY_IA32E_MODE) | (vm_entry_controls_get(to_vmx(vcpu)) & VM_ENTRY_IA32E_MODE); + /* + * Note! Save DR7, but intentionally don't grab DEBUGCTL from vmcs02. + * Writes to DEBUGCTL that aren't intercepted by L1 are immediately + * propagated to vmcs12 (see vmx_set_msr()), as the value loaded into + * vmcs02 doesn't strictly track vmcs12. 
+ */ if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_DEBUG_CONTROLS) vmcs12->guest_dr7 = vcpu->arch.dr7; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 4f827a75d980..6a8b78e954cd 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2174,7 +2174,7 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu, return (unsigned long)data; } -static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated) +u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated) { u64 debugctl = 0; @@ -2193,8 +2193,7 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated return debugctl; } -static bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, - bool host_initiated) +bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated) { u64 invalid; diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index b5758c33c60f..392e66c7e5fe 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -414,6 +414,9 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu); +u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated); +bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated); + /* * Note, early Intel manuals have the write-low and read-high bitmap offsets * the wrong way round. The bitmaps control MSRs 0x00000000-0x00001fff and
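
The consistency check reduces to masking the vmcs12 value against what KVM supports, while tolerating the two bits (BTF, LBR) that KVM already ignores on WRMSR. Below is a minimal model of that logic. The bit positions and the `supported` mask are illustrative assumptions, and `is_valid_debugctl()` / `vmcs02_debugctl()` are simplified hypothetical stand-ins for `vmx_is_valid_debugctl()` and the masking in `prepare_vmcs02()`, not their actual bodies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative DEBUGCTL bits (assumed positions). */
#define DBG_LBR                (1ULL << 0)
#define DBG_BTF                (1ULL << 1)
#define DBG_FREEZE_LBRS_ON_PMI (1ULL << 11)

/* Reject any bit outside `supported`, except BTF and LBR, which are
 * tolerated (and later dropped) to match the WRMSR behaviour. */
static bool is_valid_debugctl(uint64_t data, uint64_t supported)
{
	uint64_t invalid = data & ~supported;

	invalid &= ~(DBG_BTF | DBG_LBR);
	return invalid == 0;
}

/* Value loaded into vmcs02: unsupported bits are stripped, while the
 * vmcs12 copy that L1 sees is left untouched. */
static uint64_t vmcs02_debugctl(uint64_t vmcs12_val, uint64_t supported)
{
	return vmcs12_val & supported;
}
```

Keeping validation and masking separate mirrors the changelog's point: L1's vmcs12 value is checked, but never clobbered.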

[PATCH net v2 0/2] ets: use old 'nbands' while purging unused classes

by Davide Caratti

- patch 1/2 fixes a NULL dereference in the control path of sch_ets qdisc
- patch 2/2 extends kselftests to verify effectiveness of the above fix

Changes since v1:
- added a kselftest (thanks Victor)

Davide Caratti (2):
  net/sched: ets: use old 'nbands' while purging unused classes
  selftests: net/forwarding: test purge of active DWRR classes

 net/sched/sch_ets.c                                   | 11 ++++++-----
 tools/testing/selftests/net/forwarding/sch_ets.sh     |  1 +
 .../testing/selftests/net/forwarding/sch_ets_tests.sh |  8 ++++++++
 3 files changed, 15 insertions(+), 5 deletions(-)

-- 
2.47.0

[PATCH 0/2] iterate_folioq bug when offset==size (Was: [REGRESSION] 9pfs issues on 6.12-rc1)

by Dominique Martinet via B4 Relay

So we've had this regression in 9p for... almost a year, which is way too long, but there was no "easy" reproducer until yesterday (thank you again!!)

It turned out to be a bug with iov_iter on folios: iov_iter_get_pages_alloc2() would advance the iov_iter correctly up to the end edge of a folio, and the later copy_to_iter() fails on the iterate_folioq() bug.

Happy to consider alternative ways of fixing this; now that there's a reproducer, it's all much clearer. For the bug to be visible we basically need to make an IO with non-contiguous folios in the iov_iter, which is not obvious to test with synthetic VMs, with a size that triggers a zero-copy read followed by a non-zero-copy read.

Signed-off-by: Dominique Martinet <asmadeus(a)codewreck.org>
---
Dominique Martinet (2):
      iov_iter: iterate_folioq: fix handling of offset >= folio size
      iov_iter: iov_folioq_get_pages: don't leave empty slot behind

 include/linux/iov_iter.h | 3 +++
 lib/iov_iter.c           | 6 +++---
 2 files changed, 6 insertions(+), 3 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250811-iot_iter_folio-1b7849f88fed

Best regards,
-- 
Dominique Martinet <asmadeus(a)codewreck.org>

[PATCH] drm/amd/display: Add HDR workaround for specific eDP

by Kevin Oh

[WHY & HOW]
Some eDP panels suffer from flickering when HDR is enabled. I am adding a case for my panel to disable VSC to stop the flickering.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/4452
Cc: Rodrigo Siqueira <siqueira(a)igalia.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Kevin Oh <kevoh1516(a)gmail.com>
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
index fe100e4c9801..1a16bea10afb 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
@@ -86,6 +86,10 @@ static void apply_edid_quirks(struct drm_device *dev, struct edid *edid, struct
 		drm_dbg_driver(dev, "Disabling VSC on monitor with panel id %X\n", panel_id);
 		edid_caps->panel_patch.disable_colorimetry = true;
 		break;
+	case drm_edid_encode_panel_id('S', 'D', 'C', 0x4171):
+		drm_dbg_driver(dev, "Disabling VSC on monitor with panel id %X\n", panel_id);
+		edid_caps->panel_patch.disable_colorimetry = true;
+		break;
 	default:
 		return;
 	}
-- 
2.50.1
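
`drm_edid_encode_panel_id()` packs the three-letter EDID vendor code (five bits per letter, with 'A' stored as 1) and the 16-bit product code into one 32-bit value, which is the number the `%X` debug message prints. The sketch below re-implements that packing in userspace as a hypothetical `encode_panel_id()`; it is assumed to mirror the kernel macro in include/drm/drm_edid.h, so check it against the header before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Each vendor letter is stored as (letter - '@') in 5 bits, at bits
 * 30..26, 25..21 and 20..16; the product code fills bits 15..0. */
static uint32_t encode_panel_id(char v0, char v1, char v2, uint16_t product)
{
	assert(v0 >= 'A' && v0 <= 'Z');
	assert(v1 >= 'A' && v1 <= 'Z');
	assert(v2 >= 'A' && v2 <= 'Z');
	return ((uint32_t)(v0 - '@') & 0x1f) << 26 |
	       ((uint32_t)(v1 - '@') & 0x1f) << 21 |
	       ((uint32_t)(v2 - '@') & 0x1f) << 16 |
	       product;
}
```

For the panel in this patch, `encode_panel_id('S', 'D', 'C', 0x4171)` yields 0x4C834171, so the debug line would read "panel id 4C834171".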

Re: [PATCH] USB: storage: Ignore driver CD mode for Realtek multi-mode Wi-Fi dongles

by Giorgi

Maybe we could only add US_FL_IGNORE_DEVICE for the exact Realtek-based models (Mercury MW310UH, D-Link AX9U, etc.) that fail with usb_modeswitch. This avoids disabling access to the emulated CD for unrelated devices.

>On August 13, 2025 9:53:12 PM GMT+04:00, Zenm Chen <zenmchen(a)gmail.com> wrote:
>>Alan Stern <stern(a)rowland.harvard.edu> wrote on Thu, Aug 14, 2025 at 12:58 AM:
>>>
>>> On Thu, Aug 14, 2025 at 12:24:15AM +0800, Zenm Chen wrote:
>>> > Many Realtek USB Wi-Fi dongles released in recent years have two modes:
>>> > one is driver CD mode which has Windows driver onboard, another one is
>>> > Wi-Fi mode. Add the US_FL_IGNORE_DEVICE quirk for these multi-mode devices.
>>> > Otherwise, usb_modeswitch may fail to switch them to Wi-Fi mode.
>>>
>>> There are several other entries like this already in the unusual_devs.h
>>> file. But I wonder if we really still need them. Shouldn't the
>>> usb_modeswitch program be smart enough by now to know how to handle
>>> these things?
>>
>>Hi Alan,
>>
>>Thanks for your review and reply.
>>
>>Without this patch applied, usb_modeswitch cannot switch my Mercury MW310UH
>>into Wi-Fi mode [1]. I also ran into a similar problem like [2] with D-Link
>>AX9U, so I believe this patch is needed.
>>
>>>
>>> In theory, someone might want to access the Windows driver on the
>>> emulated CD. With this quirk, they wouldn't be able to.
>>>
>>
>>Actually an emulated CD doesn't appear when I insert these 2 Wi-Fi dongles into
>>my Linux PC, so users cannot access that Windows driver even if this patch is not
>>applied.
>>
>>> Alan Stern
>>
>>[1] https://drive.google.com/file/d/1YfWUTxKnvSeu1egMSwcF-memu3Kis8Mg/view?usp=…
>>
>>[2] https://github.com/morrownr/rtw89/issues/10
>>

[PATCH 10/11] drm/amd/display: Fix Xorg desktop unresponsive on Replay panel

by Alex Hung

From: Tom Chung <chiahsuan.chung(a)amd.com>

[WHY & HOW]
IPS & self-refresh features can cause vblank counter resets between vblank disable and enable. This may cause the system to get stuck waiting for the vblank counter.

Call drm_crtc_vblank_restore() during vblank enable to estimate missed vblanks by using timestamps and update the vblank counter in DRM.

This makes the vblank counter increase smoothly and resolves this issue.

Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Sun peng (Leo) Li <sunpeng.li(a)amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com>
Signed-off-by: Alex Hung <alex.hung(a)amd.com>
---
 .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c    | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
index 010172f930ae..45feb404b097 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
@@ -299,6 +299,25 @@ static inline int amdgpu_dm_crtc_set_vblank(struct drm_crtc *crtc, bool enable)
 	irq_type = amdgpu_display_crtc_idx_to_irq_type(adev, acrtc->crtc_id);
 
 	if (enable) {
+		struct dc *dc = adev->dm.dc;
+		struct drm_vblank_crtc *vblank = drm_crtc_vblank_crtc(crtc);
+		struct psr_settings *psr = &acrtc_state->stream->link->psr_settings;
+		struct replay_settings *pr = &acrtc_state->stream->link->replay_settings;
+		bool sr_supported = (psr->psr_version != DC_PSR_VERSION_UNSUPPORTED) ||
+								pr->config.replay_supported;
+
+		/*
+		 * IPS & self-refresh feature can cause vblank counter resets between
+		 * vblank disable and enable.
+		 * It may cause system stuck due to waiting for the vblank counter.
+		 * Call this function to estimate missed vblanks by using timestamps and
+		 * update the vblank counter in DRM.
+		 */
+		if (dc->caps.ips_support &&
+			dc->config.disable_ips != DMUB_IPS_DISABLE_ALL &&
+			sr_supported && vblank->config.disable_immediate)
+			drm_crtc_vblank_restore(crtc);
+
 		/* vblank irq on -> Only need vupdate irq in vrr mode */
 		if (amdgpu_dm_crtc_vrr_active(acrtc_state))
 			rc = amdgpu_dm_crtc_set_vupdate_irq(crtc, true);
-- 
2.43.0
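
The idea behind `drm_crtc_vblank_restore()` is to infer how many vblanks were missed while the hardware counter was unreliable, using elapsed time and the frame duration. A userspace approximation follows; `estimate_missed_vblanks()` and `restore_vblank_count()` are hypothetical helpers that round to the nearest whole frame, and they are not DRM's implementation, which also deals with locking and timestamp sources.

```c
#include <assert.h>
#include <stdint.h>

/* Estimate vblanks elapsed between two timestamps, rounded to the
 * closest whole frame; returns 0 when the frame duration is unknown. */
static uint64_t estimate_missed_vblanks(uint64_t last_ns, uint64_t now_ns,
					uint64_t framedur_ns)
{
	assert(now_ns >= last_ns);
	if (framedur_ns == 0)
		return 0;
	return (now_ns - last_ns + framedur_ns / 2) / framedur_ns;
}

/* Advance a software vblank counter by the estimate so it keeps
 * increasing instead of appearing to stall or reset. */
static uint64_t restore_vblank_count(uint64_t count, uint64_t last_ns,
				     uint64_t now_ns, uint64_t framedur_ns)
{
	return count + estimate_missed_vblanks(last_ns, now_ns, framedur_ns);
}
```

At 60 Hz (a frame of about 16,666,667 ns), a 50 ms gap is counted as 3 missed vblanks, so a waiter watching the counter sees it move forward rather than stall.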

[PATCH 08/11] drm/amd/display: Avoid a NULL pointer dereference

by Alex Hung

From: Mario Limonciello <mario.limonciello(a)amd.com>

[WHY]
Although unlikely, drm_atomic_get_new_connector_state() or drm_atomic_get_old_connector_state() can return NULL.

[HOW]
Check the return values before dereferencing them.

Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Harry Wentland <harry.wentland(a)amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Signed-off-by: Alex Hung <alex.hung(a)amd.com>
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 176f420effd9..b944abea306d 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -7836,6 +7836,9 @@ amdgpu_dm_connector_atomic_check(struct drm_connector *conn,
 	struct amdgpu_dm_connector *aconn = to_amdgpu_dm_connector(conn);
 	int ret;
 
+	if (WARN_ON(unlikely(!old_con_state || !new_con_state)))
+		return -EINVAL;
+
 	trace_amdgpu_dm_connector_atomic_check(new_con_state);
 
 	if (conn->connector_type == DRM_MODE_CONNECTOR_DisplayPort) {
-- 
2.43.0

+ iov_iter-iterate_folioq-fix-handling-of-offset-=-folio-size.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: iov_iter: iterate_folioq: fix handling of offset >= folio size has been added to the -mm mm-hotfixes-unstable branch. Its filename is iov_iter-iterate_folioq-fix-handling-of-offset-=-folio-size.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Dominique Martinet <asmadeus(a)codewreck.org> Subject: iov_iter: iterate_folioq: fix handling of offset >= folio size Date: Wed, 13 Aug 2025 15:04:55 +0900 It's apparently possible to get an iov advanced all the way up to the end of the current page we're looking at, e.g. 
(gdb) p *iter $24 = {iter_type = 4 '\004', nofault = false, data_source = false, iov_offset = 4096, {__ubuf_iovec = { iov_base = 0xffff88800f5bc000, iov_len = 655}, {{__iov = 0xffff88800f5bc000, kvec = 0xffff88800f5bc000, bvec = 0xffff88800f5bc000, folioq = 0xffff88800f5bc000, xarray = 0xffff88800f5bc000, ubuf = 0xffff88800f5bc000}, count = 655}}, {nr_segs = 2, folioq_slot = 2 '\002', xarray_start = 2}} where iov_offset is 4k with 4k-sized folios. This should have been fine because we're only in the 2nd slot and there's another one after this, but iterate_folioq() should not try to map a folio when skip covers its whole size. More importantly, part does not end up zero here (because 'PAGE_SIZE - skip % PAGE_SIZE' evaluates to PAGE_SIZE, not zero), so skip forward to the "advance to next folio" code. Link: https://lkml.kernel.org/r/20250813-iot_iter_folio-v3-0-a0ffad2b665a@codewre… Link: https://lkml.kernel.org/r/20250813-iot_iter_folio-v3-1-a0ffad2b665a@codewre… Signed-off-by: Dominique Martinet <asmadeus(a)codewreck.org> Fixes: db0aa2e9566f ("mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios") Reported-by: Maximilian Bosch <maximilian(a)mbosch.me> Reported-by: Ryan Lahfa <ryan(a)lahfa.xyz> Reported-by: Christian Theune <ct(a)flyingcircus.io> Reported-by: Arnout Engelen <arnout(a)bzzt.net> Link: https://lkml.kernel.org/r/D4LHHUNLG79Y.12PI0X6BEHRHW@mbosch.me/ Acked-by: David Howells <dhowells(a)redhat.com> Cc: Al Viro <viro(a)zeniv.linux.org.uk> Cc: Christian Brauner <brauner(a)kernel.org> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: <stable(a)vger.kernel.org> [6.12+] Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/iov_iter.h | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) --- a/include/linux/iov_iter.h~iov_iter-iterate_folioq-fix-handling-of-offset-=-folio-size +++ a/include/linux/iov_iter.h @@ -160,7 +160,7 @@ size_t iterate_folioq(struct iov_iter *i do { struct folio *folio =
folioq_folio(folioq, slot); - size_t part, remain, consumed; + size_t part, remain = 0, consumed; size_t fsize; void *base; @@ -168,14 +168,16 @@ size_t iterate_folioq(struct iov_iter *i break; fsize = folioq_folio_size(folioq, slot); - base = kmap_local_folio(folio, skip); - part = umin(len, PAGE_SIZE - skip % PAGE_SIZE); - remain = step(base, progress, part, priv, priv2); - kunmap_local(base); - consumed = part - remain; - len -= consumed; - progress += consumed; - skip += consumed; + if (skip < fsize) { + base = kmap_local_folio(folio, skip); + part = umin(len, PAGE_SIZE - skip % PAGE_SIZE); + remain = step(base, progress, part, priv, priv2); + kunmap_local(base); + consumed = part - remain; + len -= consumed; + progress += consumed; + skip += consumed; + } if (skip >= fsize) { skip = 0; slot++; _ Patches currently in -mm which might be from asmadeus(a)codewreck.org are iov_iter-iterate_folioq-fix-handling-of-offset-=-folio-size.patch iov_iter-iov_folioq_get_pages-dont-leave-empty-slot-behind.patch
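
The changelog's arithmetic is easy to check: with 4 KiB folios and `skip == 4096`, `PAGE_SIZE - skip % PAGE_SIZE` is 4096 rather than 0, so the old loop would map and step past the end of the current folio. The userspace model below mirrors the control flow of the fixed loop over an array of folio sizes. `walk_folios()` and its parameters are hypothetical, nothing is actually mapped, and the "step" is assumed to consume everything it is offered.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u

/* Walk `len` bytes across folios of the given sizes, starting `skip`
 * bytes into folio `slot`, in the shape of the fixed iterate_folioq():
 * only touch a folio while skip < fsize, then move to the next folio
 * once skip reaches its end. Returns bytes consumed; *overrun is set
 * if any chunk would have extended past the end of its folio. */
static size_t walk_folios(const size_t *sizes, size_t nr, size_t slot,
			  size_t skip, size_t len, int *overrun)
{
	size_t progress = 0;

	*overrun = 0;
	while (len && slot < nr) {
		size_t fsize = sizes[slot];

		if (skip < fsize) {
			size_t part = PAGE_SIZE - skip % PAGE_SIZE;

			if (part > len)
				part = len;
			if (skip + part > fsize)
				*overrun = 1;
			/* A real step() would copy here; assume it consumes all of part. */
			len -= part;
			progress += part;
			skip += part;
		}
		if (skip >= fsize) {
			skip = 0;
			slot++;
		}
	}
	return progress;
}
```

Starting at the end edge of a folio (as in the gdb dump: offset 4096, 655 bytes left), the guarded loop simply advances to the next slot before copying.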

[tip: x86/entry] x86/fred: Remove ENDBR64 from FRED entry points

by tip-bot2 for Xin Li (Intel)

The following commit has been merged into the x86/entry branch of tip:

Commit-ID:     3da01ffe1aeaa0d427ab5235ba735226670a80d9
Gitweb:        https://git.kernel.org/tip/3da01ffe1aeaa0d427ab5235ba735226670a80d9
Author:        Xin Li (Intel) <xin(a)zytor.com>
AuthorDate:    Tue, 15 Jul 2025 23:33:20 -07:00
Committer:     Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Wed, 13 Aug 2025 15:05:32 -07:00

x86/fred: Remove ENDBR64 from FRED entry points

The FRED specification has been changed in v9.0 to state that there is no need for FRED event handlers to begin with ENDBR64, because in the presence of supervisor indirect branch tracking, FRED event delivery does not enter the WAIT_FOR_ENDBRANCH state.

As a result, remove ENDBR64 from FRED entry points.

Then add ANNOTATE_NOENDBR to indicate that FRED entry points will never be used for indirect calls to suppress an objtool warning.

This change implies that any indirect CALL/JMP to FRED entry points causes #CP in the presence of supervisor indirect branch tracking.

Credit goes to Jennifer Miller <jmill(a)asu.edu> and other contributors from Arizona State University, whose research shows that placing ENDBR at entry points has negative value, which led to this change.

Note: This is obviously an incompatible change to the FRED architecture. But it's OK because there are no FRED systems out in the wild today. All production hardware and late pre-production hardware will follow the FRED v9 spec and be compatible with this approach.

[ dhansen: add note to changelog about incompatibility ]

Fixes: 14619d912b65 ("x86/fred: FRED entry/exit and dispatch code")
Signed-off-by: Xin Li (Intel) <xin(a)zytor.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa(a)zytor.com>
Reviewed-by: Andrew Cooper <andrew.cooper3(a)citrix.com>
Link: https://lore.kernel.org/linux-hardening/Z60NwR4w%2F28Z7XUa@ubun/
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250716063320.1337818-1-xin%40zytor.com
---
 arch/x86/entry/entry_64_fred.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
index 29c5c32..907bd23 100644
--- a/arch/x86/entry/entry_64_fred.S
+++ b/arch/x86/entry/entry_64_fred.S
@@ -16,7 +16,7 @@
 
 .macro FRED_ENTER
 	UNWIND_HINT_END_OF_STACK
-	ENDBR
+	ANNOTATE_NOENDBR
 	PUSH_AND_CLEAR_REGS
 	movq	%rsp, %rdi	/* %rdi -> pt_regs */
 .endm

Re: [PATCH 6.15 000/480] 6.15.10-rc1 review

by Ron Economos

On 8/13/25 10:25, Jon Hunter wrote:
> On Wed, Aug 13, 2025 at 08:48:28AM -0700, Jon Hunter wrote:
>> On Tue, 12 Aug 2025 19:43:28 +0200, Greg Kroah-Hartman wrote:
>>> This is the start of the stable review cycle for the 6.15.10 release.
>>> There are 480 patches in this series, all will be posted as a response
>>> to this one. If anyone has any issues with these being applied, please
>>> let me know.
>>>
>>> Responses should be made by Thu, 14 Aug 2025 17:42:20 +0000.
>>> Anything received after that time might be too late.
>>>
>>> The whole patch series can be found in one patch at:
>>> https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.15.10-rc…
>>> or in the git tree and branch at:
>>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.15.y
>>> and the diffstat can be found below.
>>>
>>> thanks,
>>>
>>> greg k-h
>> Failures detected for Tegra ...
>>
>> Test results for stable-v6.15:
>>     10 builds:	10 pass, 0 fail
>>     28 boots:	28 pass, 0 fail
>>     120 tests:	119 pass, 1 fail
>>
>> Linux version:	6.15.10-rc1-g2510f67e2e34
>> Boards tested:	tegra124-jetson-tk1, tegra186-p2771-0000,
>>                 tegra186-p3509-0000+p3636-0001, tegra194-p2972-0000,
>>                 tegra194-p3509-0000+p3668-0000, tegra20-ventana,
>>                 tegra210-p2371-2180, tegra210-p3450-0000,
>>                 tegra30-cardhu-a04
>>
>> Test failures:	tegra194-p2972-0000: boot.py
> I am seeing the following kernel warning for both linux-6.15.y and linux-6.16.y …
>
>  WARNING KERN sched: DL replenish lagged too much
>
> I believe that this is introduced by …
>
>  Peter Zijlstra <peterz(a)infradead.org>
>      sched/deadline: Less agressive dl_server handling
>
> This has been reported here: https://lore.kernel.org/all/CAMuHMdXn4z1pioTtBGMfQM0jsLviqS2jwysaWXpoLxWYoG…
>
> Jon

Seeing this kernel warning on RISC-V also.

[PATCH v3 1/1] x86/fred: Remove ENDBR64 from FRED entry points

by Xin Li (Intel)

The FRED specification has been changed in v9.0 to state that there is no need for FRED event handlers to begin with ENDBR64, because in the presence of supervisor indirect branch tracking, FRED event delivery does not enter the WAIT_FOR_ENDBRANCH state.

As a result, remove ENDBR64 from FRED entry points.

Then add ANNOTATE_NOENDBR to indicate that FRED entry points will never be used for indirect calls, which suppresses an objtool warning.

This change implies that any indirect CALL/JMP to FRED entry points causes #CP in the presence of supervisor indirect branch tracking.

Credit goes to Jennifer Miller <jmill(a)asu.edu> and other contributors from Arizona State University, whose research showed that placing ENDBR at entry points has negative value and thus led to this change.

Fixes: 14619d912b65 ("x86/fred: FRED entry/exit and dispatch code")
Link: https://lore.kernel.org/linux-hardening/Z60NwR4w%2F28Z7XUa@ubun/
Reviewed-by: H. Peter Anvin (Intel) <hpa(a)zytor.com>
Reviewed-by: Andrew Cooper <andrew.cooper3(a)citrix.com>
Signed-off-by: Xin Li (Intel) <xin(a)zytor.com>
Cc: Jennifer Miller <jmill(a)asu.edu>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Andrew Cooper <andrew.cooper3(a)citrix.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: stable(a)vger.kernel.org # v6.9+
---

Change in v3:
*) Revise the FRED spec change description to clearly indicate that it deviates from previous versions and is based on new research showing that placing ENDBR at entry points has negative value (Andrew Cooper).

Change in v2:
*) CC stable and add a fixes tag (PeterZ).
---
 arch/x86/entry/entry_64_fred.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
index 29c5c32c16c3..907bd233c6c1 100644
--- a/arch/x86/entry/entry_64_fred.S
+++ b/arch/x86/entry/entry_64_fred.S
@@ -16,7 +16,7 @@
 
 .macro FRED_ENTER
 	UNWIND_HINT_END_OF_STACK
-	ENDBR
+	ANNOTATE_NOENDBR
 	PUSH_AND_CLEAR_REGS
 	movq %rsp, %rdi /* %rdi -> pt_regs */
 .endm
-- 
2.50.1
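The rule this patch relies on can be sketched as a small state machine. This is a toy model, not kernel or hardware code. The names `ibt_tracker`, `ibt_indirect_branch()`, `ibt_fred_event_delivery()` and `ibt_retire()` are hypothetical. The model encodes only the behaviour stated in the patch text: an indirect CALL/JMP enters WAIT_FOR_ENDBRANCH, and FRED event delivery (per spec v9.0) does not.

```c
#include <stdbool.h>

/*
 * Toy model of supervisor indirect branch tracking (IBT). All names are
 * hypothetical; real hardware keeps this state per privilege level.
 */
enum insn { INSN_ENDBR64, INSN_OTHER };
enum outcome { OUTCOME_OK, OUTCOME_CP_FAULT };

struct ibt_tracker {
	bool wait_for_endbranch;	/* models the WAIT_FOR_ENDBRANCH state */
};

/* An indirect CALL/JMP puts the tracker into WAIT_FOR_ENDBRANCH. */
static void ibt_indirect_branch(struct ibt_tracker *t)
{
	t->wait_for_endbranch = true;
}

/*
 * Assumption taken from the patch text: FRED event delivery does not
 * enter WAIT_FOR_ENDBRANCH, so the tracker is left idle.
 */
static void ibt_fred_event_delivery(struct ibt_tracker *t)
{
	t->wait_for_endbranch = false;
}

/*
 * Retire the first instruction at the transfer target. While waiting for
 * an end-branch, anything other than ENDBR64 raises #CP. Otherwise ENDBR64
 * acts as a NOP and execution continues.
 */
static enum outcome ibt_retire(struct ibt_tracker *t, enum insn first)
{
	if (t->wait_for_endbranch && first != INSN_ENDBR64)
		return OUTCOME_CP_FAULT;
	t->wait_for_endbranch = false;
	return OUTCOME_OK;
}
```

Here is how the cases play out with ENDBR64 removed from an entry point. Event delivery followed by `ibt_retire(&t, INSN_OTHER)` returns `OUTCOME_OK`. An indirect branch followed by the same call returns `OUTCOME_CP_FAULT`, which matches the patch's statement that an indirect CALL/JMP to a FRED entry point causes #CP.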

[PATCH] Revert "ata: libata-scsi: Improve CDL control"

by Igor Pylypiv

This reverts commit 17e897a456752ec9c2d7afb3d9baf268b442451b.

The extra checks for the ATA_DFLAG_CDL_ENABLED flag prevent the SET FEATURES command from being issued to a drive when NCQ commands are active.

ata_mselect_control_ata_feature() sets / clears the ATA_DFLAG_CDL_ENABLED flag during the translation of MODE SELECT to SET FEATURES. If SET FEATURES gets deferred due to outstanding NCQ commands, the original MODE SELECT command will be re-queued. When the re-queued MODE SELECT goes through the ata_mselect_control_ata_feature() translation again, SET FEATURES will not be issued because ATA_DFLAG_CDL_ENABLED has already been set or cleared by the initial translation of MODE SELECT.

The ATA_DFLAG_CDL_ENABLED checks in ata_mselect_control_ata_feature() are safe to remove because scsi_cdl_enable() implements similar logic that avoids enabling CDL if it has already been enabled.

Cc: stable(a)vger.kernel.org
Signed-off-by: Igor Pylypiv <ipylypiv(a)google.com>
---
 drivers/ata/libata-scsi.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 57f674f51b0c..856eabfd5a17 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3904,27 +3904,17 @@ static int ata_mselect_control_ata_feature(struct ata_queued_cmd *qc,
 	/* Check cdl_ctrl */
 	switch (buf[0] & 0x03) {
 	case 0:
-		/* Disable CDL if it is enabled */
-		if (!(dev->flags & ATA_DFLAG_CDL_ENABLED))
-			return 0;
-		ata_dev_dbg(dev, "Disabling CDL\n");
+		/* Disable CDL */
 		cdl_action = 0;
 		dev->flags &= ~ATA_DFLAG_CDL_ENABLED;
 		break;
 	case 0x02:
-		/*
-		 * Enable CDL if not already enabled. Since this is mutually
-		 * exclusive with NCQ priority, allow this only if NCQ priority
-		 * is disabled.
-		 */
-		if (dev->flags & ATA_DFLAG_CDL_ENABLED)
-			return 0;
+		/* Enable CDL T2A/T2B: NCQ priority must be disabled */
 		if (dev->flags & ATA_DFLAG_NCQ_PRIO_ENABLED) {
 			ata_dev_err(dev,
 				"NCQ priority must be disabled to enable CDL\n");
 			return -EINVAL;
 		}
-		ata_dev_dbg(dev, "Enabling CDL\n");
 		cdl_action = 1;
 		dev->flags |= ATA_DFLAG_CDL_ENABLED;
 		break;
-- 
2.51.0.rc0.215.g125493bb4a-goog
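The re-queue problem described in this revert can be reproduced with a reduced model. The names `struct dev_model`, `translate_mode_select()` and `submit_mode_select()` are hypothetical, not libata functions. The model makes two simplifying assumptions: deferral means "re-queue while any NCQ command is outstanding", and each retry stands for one NCQ command completing.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device state touched by the translation. */
struct dev_model {
	bool cdl_enabled;	/* stands in for ATA_DFLAG_CDL_ENABLED */
	int ncq_active;		/* outstanding NCQ commands */
	int set_features_sent;	/* SET FEATURES commands that reached the drive */
};

/*
 * Translate a MODE SELECT that enables CDL. Returns true if a SET FEATURES
 * command was built. With check_flag set, this mimics the reverted code:
 * it returns early if the flag is already set. Note that the flag flips
 * here, before we know whether SET FEATURES can actually be issued.
 */
static bool translate_mode_select(struct dev_model *d, bool check_flag)
{
	if (check_flag && d->cdl_enabled)
		return false;		/* nothing to do: no SET FEATURES */
	d->cdl_enabled = true;
	return true;
}

/*
 * Submit the MODE SELECT, re-queueing it while NCQ commands are active.
 * Returns the number of SET FEATURES commands sent to the drive.
 */
static int submit_mode_select(struct dev_model *d, bool check_flag)
{
	for (;;) {
		if (!translate_mode_select(d, check_flag))
			break;			/* completed without SET FEATURES */
		if (d->ncq_active == 0) {
			d->set_features_sent++;	/* issued */
			break;
		}
		d->ncq_active--;	/* deferred: one NCQ command completes, retry */
	}
	return d->set_features_sent;
}
```

Start with two NCQ commands outstanding. `submit_mode_select(&d, true)` returns 0: the retry sees the flag already set and never sends SET FEATURES, yet `cdl_enabled` still reads true. With `check_flag` false, the same call returns 1.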

FAILED: patch "[PATCH] s390/mm: Remove possible false-positive warning in" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>.

To reproduce the conflict and resubmit, you may use the following commands:

git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 5647f61ad9171e8f025558ed6dc5702c56a33ba3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081255-shabby-impound-4a47@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..

Possible dependencies:

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 5647f61ad9171e8f025558ed6dc5702c56a33ba3 Mon Sep 17 00:00:00 2001
From: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Date: Wed, 9 Jul 2025 20:34:30 +0200
Subject: [PATCH] s390/mm: Remove possible false-positive warning in pte_free_defer()

Commit 8211dad627981 ("s390: add pte_free_defer() for pgtables sharing page") added a warning to pte_free_defer(), on our request. It was meant to warn if this would ever be reached for KVM guest mappings, because the page table would be freed without a gmap_unlink(). THP mappings are not allowed for KVM guests on s390, so this should never happen.

However, it is possible that the warning is triggered in a valid case as a false positive.

s390_enable_sie() takes the mmap_lock, marks all VMAs as VM_NOHUGEPAGE and splits possibly existing THP guest mappings. mm->context.has_pgste is set to 1 before that, to prevent races with the mm_has_pgste() check in MADV_HUGEPAGE.

khugepaged drops the mmap_lock for file mappings and might run in parallel, before a vma is marked VM_NOHUGEPAGE, but after mm->context.has_pgste was set to 1. If it finds file mappings to collapse, it will eventually call pte_free_defer(). This will trigger the warning, but it is a valid case because gmap is not yet set up, and the THP mappings will be split again.

Therefore, remove the warning and the comment.

Fixes: 8211dad627981 ("s390: add pte_free_defer() for pgtables sharing page")
Cc: <stable(a)vger.kernel.org> # 6.6+
Reviewed-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev(a)linux.ibm.com>

diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index b449fd2605b0..d2f6f1f6d2fc 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -173,11 +173,6 @@ void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
 	struct ptdesc *ptdesc = virt_to_ptdesc(pgtable);
 
 	call_rcu(&ptdesc->pt_rcu_head, pte_free_now);
-	/*
-	 * THPs are not allowed for KVM guests. Warn if pgste ever reaches here.
-	 * Turn to the generic pte_free_defer() version once gmap is removed.
-	 */
-	WARN_ON_ONCE(mm_has_pgste(mm));
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
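The interleaving in this commit message can be replayed deterministically in a few lines of C. This is a hypothetical, single-threaded sketch. `struct mm_model` and the step functions are illustrative names, not s390 or khugepaged interfaces. The sketch does not model real concurrency, locking, or THP splitting; it only shows why the removed check can fire before any gmap exists.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state the commit message talks about. */
struct mm_model {
	bool has_pgste;		/* mm->context.has_pgste */
	bool vma_nohugepage;	/* VMA already marked VM_NOHUGEPAGE */
	bool gmap_set_up;	/* guest mapping exists (never set in this replay) */
	int warnings;		/* times the removed WARN_ON_ONCE() condition held */
};

/* s390_enable_sie(), first: set has_pgste before touching any VMA. */
static void enable_sie_set_pgste(struct mm_model *mm)
{
	mm->has_pgste = true;
}

/* s390_enable_sie(), then: mark VMAs VM_NOHUGEPAGE (and split THPs). */
static void enable_sie_mark_vmas(struct mm_model *mm)
{
	mm->vma_nohugepage = true;
}

/*
 * khugepaged collapsing a file mapping. It has dropped mmap_lock, so it
 * can run between the two steps above. A collapse ends in pte_free_defer().
 */
static void khugepaged_collapse(struct mm_model *mm, bool old_warning)
{
	if (mm->vma_nohugepage)
		return;			/* VMA no longer eligible for collapse */
	/* pte_free_defer(): the old code checked mm_has_pgste(mm) here */
	if (old_warning && mm->has_pgste)
		mm->warnings++;
}
```

Call `enable_sie_set_pgste()`, then `khugepaged_collapse(&mm, true)`, then `enable_sie_mark_vmas()`. That order leaves `warnings` at 1 while `gmap_set_up` is still false, which is the false positive the commit describes. If the VMAs are marked before khugepaged runs, `warnings` stays at 0.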

FAILED: patch "[PATCH] x86/fpu: Delay instruction pointer fixup until after warning" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>.

To reproduce the conflict and resubmit, you may use the following commands:

git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 1cec9ac2d071cfd2da562241aab0ef701355762a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081252-serotonin-cranium-3e92@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..

Possible dependencies:

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 1cec9ac2d071cfd2da562241aab0ef701355762a Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Tue, 24 Jun 2025 14:01:48 -0700
Subject: [PATCH] x86/fpu: Delay instruction pointer fixup until after warning

Right now, if XRSTOR fails, a console message like this is printed:

	Bad FPU state detected at restore_fpregs_from_fpstate+0x9a/0x170, reinitializing FPU registers.

However, the text location (...+0x9a in this case) is the instruction *AFTER* the XRSTOR. The highlighted instruction in the "Code:" dump also points one instruction late.

The reason is that the "fixup" moves RIP up to pass the bad XRSTOR and keep on running after returning from the #GP handler. But it does this fixup before warning.

The resulting warning output is nonsensical because it looks like the non-FPU-related instruction is #GP'ing.

Do not fix up RIP until after printing the warning. Do this by using the more generic and standard ex_handler_default().

Fixes: d5c8028b4788 ("x86/fpu: Reinitialize FPU registers if restoring FPU state fails")
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Chao Gao <chao.gao(a)intel.com>
Acked-by: Alison Schofield <alison.schofield(a)intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250624210148.97126F9E%40davehans-spike.ostc.i…

diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index bf8dab18be97..2fdc1f1f5adb 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -122,13 +122,12 @@ static bool ex_handler_sgx(const struct exception_table_entry *fixup,
 static bool ex_handler_fprestore(const struct exception_table_entry *fixup,
 				 struct pt_regs *regs)
 {
-	regs->ip = ex_fixup_addr(fixup);
-
 	WARN_ONCE(1, "Bad FPU state detected at %pB, reinitializing FPU registers.",
 		  (void *)instruction_pointer(regs));
 
 	fpu_reset_from_exception_fixup();
-	return true;
+
+	return ex_handler_default(fixup, regs);
 }
 
 /*
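The ordering problem is easy to see in a reduced model. In this hypothetical sketch, `struct regs_model` and `struct fixup_model` stand in for `struct pt_regs` and `struct exception_table_entry`. Instead of calling WARN_ONCE(), each handler returns the address its warning would print. Only the order of the two steps, report and fix up, is taken from the diff.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct pt_regs. */
struct regs_model {
	uintptr_t ip;		/* instruction pointer at the time of the fault */
};

/* Hypothetical stand-in for an exception table entry. */
struct fixup_model {
	uintptr_t fixup_addr;	/* where execution resumes after the fault */
};

/* Old order: move ip first, then report it (reports the fixup address). */
static uintptr_t handler_fixup_then_warn(struct regs_model *regs,
					 const struct fixup_model *f)
{
	regs->ip = f->fixup_addr;
	return regs->ip;	/* address the warning would print */
}

/* New order: capture ip for the report first, then apply the fixup. */
static uintptr_t handler_warn_then_fixup(struct regs_model *regs,
					 const struct fixup_model *f)
{
	uintptr_t reported = regs->ip;	/* address the warning would print */

	regs->ip = f->fixup_addr;
	return reported;
}
```

Take a faulting instruction at 0x1000 and a fixup address of 0x1004 (arbitrary example values). The old ordering reports 0x1004, one instruction late. The new ordering reports 0x1000 and still leaves `regs->ip` at 0x1004 so execution resumes past the faulting instruction.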

FAILED: patch "[PATCH] x86/fpu: Delay instruction pointer fixup until after warning" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>.

To reproduce the conflict and resubmit, you may use the following commands:

git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 1cec9ac2d071cfd2da562241aab0ef701355762a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081251-sitter-agreed-26a4@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..

Possible dependencies:

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 1cec9ac2d071cfd2da562241aab0ef701355762a Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Tue, 24 Jun 2025 14:01:48 -0700
Subject: [PATCH] x86/fpu: Delay instruction pointer fixup until after warning

Right now, if XRSTOR fails, a console message like this is printed:

	Bad FPU state detected at restore_fpregs_from_fpstate+0x9a/0x170, reinitializing FPU registers.

However, the text location (...+0x9a in this case) is the instruction *AFTER* the XRSTOR. The highlighted instruction in the "Code:" dump also points one instruction late.

The reason is that the "fixup" moves RIP up to pass the bad XRSTOR and keep on running after returning from the #GP handler. But it does this fixup before warning.

The resulting warning output is nonsensical because it looks like the non-FPU-related instruction is #GP'ing.

Do not fix up RIP until after printing the warning. Do this by using the more generic and standard ex_handler_default().

Fixes: d5c8028b4788 ("x86/fpu: Reinitialize FPU registers if restoring FPU state fails")
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Chao Gao <chao.gao(a)intel.com>
Acked-by: Alison Schofield <alison.schofield(a)intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250624210148.97126F9E%40davehans-spike.ostc.i…

diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index bf8dab18be97..2fdc1f1f5adb 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -122,13 +122,12 @@ static bool ex_handler_sgx(const struct exception_table_entry *fixup,
 static bool ex_handler_fprestore(const struct exception_table_entry *fixup,
 				 struct pt_regs *regs)
 {
-	regs->ip = ex_fixup_addr(fixup);
-
 	WARN_ONCE(1, "Bad FPU state detected at %pB, reinitializing FPU registers.",
 		  (void *)instruction_pointer(regs));
 
 	fpu_reset_from_exception_fixup();
-	return true;
+
+	return ex_handler_default(fixup, regs);
 }
 
 /*
