From: Sean Christopherson <seanjc(a)google.com>
When loading guest XSAVE state via KVM_SET_XSAVE, and when updating XFD in
response to a guest WRMSR, clear XFD-disabled features in the saved (or to
be restored) XSTATE_BV to ensure KVM doesn't attempt to load state for
features that are disabled via the guest's XFD. Because the kernel
executes XRSTOR with the guest's XFD, saving XSTATE_BV[i]=1 with XFD[i]=1
will cause XRSTOR to #NM and panic the kernel.
E.g. if fpu_update_guest_xfd() sets XFD without clearing XSTATE_BV:
------------[ cut here ]------------
WARNING: arch/x86/kernel/traps.c:1524 at exc_device_not_available+0x101/0x110, CPU#29: amx_test/848
Modules linked in: kvm_intel kvm irqbypass
CPU: 29 UID: 1000 PID: 848 Comm: amx_test Not tainted 6.19.0-rc2-ffa07f7fd437-x86_amx_nm_xfd_non_init-vm #171 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:exc_device_not_available+0x101/0x110
Call Trace:
<TASK>
asm_exc_device_not_available+0x1a/0x20
RIP: 0010:restore_fpregs_from_fpstate+0x36/0x90
switch_fpu_return+0x4a/0xb0
kvm_arch_vcpu_ioctl_run+0x1245/0x1e40 [kvm]
kvm_vcpu_ioctl+0x2c3/0x8f0 [kvm]
__x64_sys_ioctl+0x8f/0xd0
do_syscall_64+0x62/0x940
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
---[ end trace 0000000000000000 ]---
This can happen if the guest executes WRMSR(MSR_IA32_XFD) to set XFD[18] = 1,
and a host IRQ triggers kernel_fpu_begin() prior to the vmexit handler's
call to fpu_update_guest_xfd().
and if userspace stuffs XSTATE_BV[i]=1 via KVM_SET_XSAVE:
------------[ cut here ]------------
WARNING: arch/x86/kernel/traps.c:1524 at exc_device_not_available+0x101/0x110, CPU#14: amx_test/867
Modules linked in: kvm_intel kvm irqbypass
CPU: 14 UID: 1000 PID: 867 Comm: amx_test Not tainted 6.19.0-rc2-2dace9faccd6-x86_amx_nm_xfd_non_init-vm #168 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:exc_device_not_available+0x101/0x110
Call Trace:
<TASK>
asm_exc_device_not_available+0x1a/0x20
RIP: 0010:restore_fpregs_from_fpstate+0x36/0x90
fpu_swap_kvm_fpstate+0x6b/0x120
kvm_load_guest_fpu+0x30/0x80 [kvm]
kvm_arch_vcpu_ioctl_run+0x85/0x1e40 [kvm]
kvm_vcpu_ioctl+0x2c3/0x8f0 [kvm]
__x64_sys_ioctl+0x8f/0xd0
do_syscall_64+0x62/0x940
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
---[ end trace 0000000000000000 ]---
The new behavior is consistent with the AMX architecture. Per Intel's SDM,
XSAVE saves XSTATE_BV as '0' for components that are disabled via XFD
(and non-compacted XSAVE saves the initial configuration of the state
component):
If XSAVE, XSAVEC, XSAVEOPT, or XSAVES is saving the state component i,
the instruction does not generate #NM when XCR0[i] = IA32_XFD[i] = 1;
instead, it operates as if XINUSE[i] = 0 (and the state component was
in its initial state): it saves bit i of XSTATE_BV field of the XSAVE
header as 0; in addition, XSAVE saves the initial configuration of the
state component (the other instructions do not save state component i).
Alternatively, KVM could always do XRSTOR with XFD=0, e.g. by using
a constant XFD based on the set of enabled features when XSAVEing for
a struct fpu_guest. However, having XSTATE_BV[i]=1 for XFD-disabled
features can only happen in the above interrupt case, or in similar
scenarios involving preemption on preemptible kernels, because
fpu_swap_kvm_fpstate()'s call to save_fpregs_to_fpstate() saves the
outgoing FPU state with the current XFD; and that is (on all but the
first WRMSR to XFD) the guest XFD.
Therefore, XFD can only go out of sync with XSTATE_BV in the above
interrupt case, or in similar scenarios involving preemption on
preemptible kernels, and we can consider it (de facto) part of KVM
ABI that KVM_GET_XSAVE returns XSTATE_BV[i]=0 for XFD-disabled features.
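As a rough userspace illustration of the invariant described above (no bit
may be set in both XSTATE_BV and XFD before XRSTOR runs), the masking can
be sketched as follows. The struct, macro, and function names are
hypothetical stand-ins and are not kernel API; only the bit position of
the AMX tile data component (18) is taken from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the XSAVE header field and the guest's XFD. */
struct xsave_sketch {
	uint64_t xstate_bv;	/* components holding non-initial state */
	uint64_t xfd;		/* components currently disabled via XFD */
};

/* Bit 18 is the AMX tile data component, matching XFD[18] above. */
#define SKETCH_XFEATURE_XTILE_DATA (UINT64_C(1) << 18)

/* Drop XFD-disabled components so that no bit is set in both masks. */
static void sketch_clear_xfd_disabled(struct xsave_sketch *s)
{
	s->xstate_bv &= ~s->xfd;
	assert((s->xstate_bv & s->xfd) == 0);
}
```

Calling sketch_clear_xfd_disabled() on a buffer with xstate_bv = 0x7 | bit 18
and xfd = bit 18 leaves xstate_bv = 0x7, which mirrors what the patch does
with header.xfeatures in both the WRMSR and the KVM_SET_XSAVE paths.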
Reported-by: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: 820a6ee944e7 ("kvm: x86: Add emulation for IA32_XFD", 2022-01-14)
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
[Move clearing of XSTATE_BV from fpu_copy_uabi_to_guest_fpstate
to kvm_vcpu_ioctl_x86_set_xsave. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kernel/fpu/core.c | 32 +++++++++++++++++++++++++++++---
arch/x86/kvm/x86.c | 9 +++++++++
2 files changed, 38 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index da233f20ae6f..166c380b0161 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -319,10 +319,29 @@ EXPORT_SYMBOL_FOR_KVM(fpu_enable_guest_xfd_features);
#ifdef CONFIG_X86_64
void fpu_update_guest_xfd(struct fpu_guest *guest_fpu, u64 xfd)
{
+ struct fpstate *fpstate = guest_fpu->fpstate;
+
fpregs_lock();
- guest_fpu->fpstate->xfd = xfd;
- if (guest_fpu->fpstate->in_use)
- xfd_update_state(guest_fpu->fpstate);
+
+ /*
+ * KVM's guest ABI is that setting XFD[i]=1 *can* immediately revert
+ * the save state to initialized. Likewise, KVM_GET_XSAVE does the
+ * same as XSAVE and returns XSTATE_BV[i]=0 whenever XFD[i]=1.
+ *
+ * If the guest's FPU state is in hardware, just update XFD: the XSAVE
+ * in fpu_swap_kvm_fpstate will clear XSTATE_BV[i] whenever XFD[i]=1.
+ *
+ * If however the guest's FPU state is NOT resident in hardware, clear
+ * disabled components in XSTATE_BV now, or a subsequent XRSTOR will
+ * attempt to load disabled components and generate #NM _in the host_.
+ */
+ if (xfd && test_thread_flag(TIF_NEED_FPU_LOAD))
+ fpstate->regs.xsave.header.xfeatures &= ~xfd;
+
+ fpstate->xfd = xfd;
+ if (fpstate->in_use)
+ xfd_update_state(fpstate);
+
fpregs_unlock();
}
EXPORT_SYMBOL_FOR_KVM(fpu_update_guest_xfd);
@@ -430,6 +449,13 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf,
if (ustate->xsave.header.xfeatures & ~xcr0)
return -EINVAL;
+ /*
+ * Disabled features must be in their initial state, otherwise XRSTOR
+ * causes an exception.
+ */
+ if (WARN_ON_ONCE(ustate->xsave.header.xfeatures & kstate->xfd))
+ return -EINVAL;
+
/*
* Nullify @vpkru to preserve its current value if PKRU's bit isn't set
* in the header. KVM's odd ABI is to leave PKRU untouched in this
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ff8812f3a129..c0416f53b5f5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5807,9 +5807,18 @@ static int kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
struct kvm_xsave *guest_xsave)
{
+ union fpregs_state *xstate = (union fpregs_state *)guest_xsave->region;
+
if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;
+ /*
+ * Do not reject non-initialized disabled features for backwards
+ * compatibility, but clear XSTATE_BV[i] whenever XFD[i]=1.
+ * Otherwise, XRSTOR would cause a #NM.
+ */
+ xstate->xsave.header.xfeatures &= ~vcpu->arch.guest_fpu.fpstate->xfd;
+
return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.guest_fpu,
guest_xsave->region,
kvm_caps.supported_xcr0,
--
2.52.0
Recently, when testing the UVC gadget function, I found that some YUYV
720p and 1080p streams are not output correctly, while small
resolutions and MJPEG streams work fine. Patch #1 fixes this issue.
Patches #2 and #3 are small fixes or improvements.
Patch #4 is a workaround for a long-standing issue in videobuf2. With
it, many devices can work well without relying solely on the SG
allocation method.
Signed-off-by: Xu Yang <xu.yang_2(a)nxp.com>
---
Xu Yang (4):
usb: gadget: uvc: fix req_payload_size calculation
usb: gadget: uvc: fix interval_duration calculation
usb: gadget: uvc: improve error handling in uvcg_video_init()
usb: gadget: uvc: retry vb2_reqbufs() with vb_vmalloc_memops if use_sg fail
drivers/usb/gadget/function/f_uvc.c | 4 ++++
drivers/usb/gadget/function/uvc.h | 3 ++-
drivers/usb/gadget/function/uvc_queue.c | 23 +++++++++++++++++++----
drivers/usb/gadget/function/uvc_video.c | 14 +++++++-------
4 files changed, 32 insertions(+), 12 deletions(-)
---
base-commit: 56a512a9b4107079f68701e7d55da8507eb963d9
change-id: 20260108-uvc-gadget-fix-patch-aa5996332bb5
Best regards,
--
Xu Yang <xu.yang_2(a)nxp.com>
Commit 7346e7a058a2 ("pwm: stm32: Always do lazy disabling") triggered a
regression where PWM polarity changes could be ignored.
stm32_pwm_set_polarity() was skipped due to a mismatch between the
cached pwm->state.polarity and the actual hardware state, leaving the
hardware polarity unchanged.
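The failure mode can be modelled in a few lines of plain C. This is only
a sketch with made-up types (struct sketch_pwm and both helpers are
hypothetical, not driver code): one field stands in for the cached
pwm->state.polarity, the other for the value held by the hardware.

```c
#include <assert.h>

enum sketch_polarity { SKETCH_NORMAL, SKETCH_INVERSED };

struct sketch_pwm {
	enum sketch_polarity cached;	/* what the core believes */
	enum sketch_polarity hw;	/* what the hardware actually holds */
};

/* Old behaviour: touch the hardware only if the request differs from the cache. */
static void sketch_apply_cached(struct sketch_pwm *p, enum sketch_polarity want)
{
	assert(p);
	if (want != p->cached)
		p->hw = want;
	p->cached = want;
}

/* Behaviour after this patch: always program the requested polarity. */
static void sketch_apply_always(struct sketch_pwm *p, enum sketch_polarity want)
{
	assert(p);
	p->hw = want;
	p->cached = want;
}
```

Starting from cached = SKETCH_NORMAL and hw = SKETCH_INVERSED, a request for
SKETCH_NORMAL leaves the hardware inverted with sketch_apply_cached(), while
sketch_apply_always() brings it back in line.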
Fixes: 7edf7369205b ("pwm: Add driver for STM32 plaftorm")
Cc: stable(a)vger.kernel.org # <= 6.12
Signed-off-by: Sean Nyekjaer <sean(a)geanix.com>
Co-developed-by: Uwe Kleine-König <ukleinek(a)kernel.org>
---
This patch is only applicable to stable trees <= 6.12
---
Changes in v2:
- Taken patch improvements from Uwe
- Link to v1: https://lore.kernel.org/r/20260106-stm32-pwm-v1-1-33e9e8a9fc33@geanix.com
---
drivers/pwm/pwm-stm32.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/pwm/pwm-stm32.c b/drivers/pwm/pwm-stm32.c
index eb24054f9729734da21eb96f2e37af03339e3440..86e6eb7396f67990249509dd347cb5a60c9ccf16 100644
--- a/drivers/pwm/pwm-stm32.c
+++ b/drivers/pwm/pwm-stm32.c
@@ -458,8 +458,7 @@ static int stm32_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
return 0;
}
- if (state->polarity != pwm->state.polarity)
- stm32_pwm_set_polarity(priv, pwm->hwpwm, state->polarity);
+ stm32_pwm_set_polarity(priv, pwm->hwpwm, state->polarity);
ret = stm32_pwm_config(priv, pwm->hwpwm,
state->duty_cycle, state->period);
---
base-commit: eb18504ca5cf1e6a76a752b73daf0ef51de3551b
change-id: 20260105-stm32-pwm-91cb843680f4
Best regards,
--
Sean Nyekjaer <sean(a)geanix.com>
Backport commit 5701875f9609 ("ext4: fix out-of-bound read in
ext4_xattr_inode_dec_ref_all()") to the linux 5.10 branch.
The fix depends on commit 69f3a3039b0d ("ext4: introduce ITAIL helper").
To make a clean backport to the stable kernel, backport both commits.
There is a single merge conflict, involving "static inline int" having
changed to "static int".
Signed-off-by: David Nyström <david.nystrom(a)est.tech>
---
Changes in v2:
- Resend identical patchset with correct "Upstream commit" denotation.
- Link to v1: https://patch.msgid.link/20251216-ext4_splat-v1-0-b76fd8748f44@est.tech
---
Ye Bin (2):
ext4: introduce ITAIL helper
ext4: fix out-of-bound read in ext4_xattr_inode_dec_ref_all()
fs/ext4/inode.c | 5 +++++
fs/ext4/xattr.c | 32 ++++----------------------------
fs/ext4/xattr.h | 10 ++++++++++
3 files changed, 19 insertions(+), 28 deletions(-)
---
base-commit: f964b940099f9982d723d4c77988d4b0dda9c165
change-id: 20251215-ext4_splat-f59c1acd9e88
Best regards,
--
David Nyström <david.nystrom(a)est.tech>
From: Steven Rostedt <rostedt(a)goodmis.org>
A bug was reported about an infinite recursion caused by tracing the rcu
events with the kernel stack trace trigger enabled. The stack trace code
called back into RCU which then called the stack trace again.
Expand the ftrace recursion protection to add a set of bits to protect
events from recursion. Each bit represents the context that the event is
in (normal, softirq, interrupt and NMI).
Have the stack trace code use the interrupt context to protect against
recursion.
Note, the bug showed an issue in both the RCU code as well as the tracing
stacktrace code. This only handles the tracing stack trace side of the
bug. The RCU fix will be handled separately.
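The idea of one recursion bit per context can be sketched in userspace as
below. This is a simplified model, not the kernel implementation: all
names are hypothetical, a plain global replaces the per-task recursion
word, and the transition bit is left out.

```c
#include <assert.h>

enum sketch_ctx {
	SKETCH_CTX_NMI,
	SKETCH_CTX_IRQ,
	SKETCH_CTX_SOFTIRQ,
	SKETCH_CTX_NORMAL,
	SKETCH_CTX_NR
};

static unsigned long sketch_recursion;	/* one bit per context */
static int sketch_traces_taken;		/* how often the body really ran */

/* Returns the bit that was taken, or -1 if this context is already inside. */
static int sketch_test_and_set(enum sketch_ctx ctx)
{
	unsigned long mask = 1UL << ctx;

	if (sketch_recursion & mask)
		return -1;
	sketch_recursion |= mask;
	return (int)ctx;
}

static void sketch_clear(int bit)
{
	assert(bit >= 0 && bit < SKETCH_CTX_NR);
	sketch_recursion &= ~(1UL << bit);
}

/* Stand-in for the stack trace path, which may re-enter itself. */
static void sketch_stack_trace(enum sketch_ctx ctx)
{
	int bit = sketch_test_and_set(ctx);

	if (bit < 0)
		return;			/* recursion detected, bail out */
	sketch_traces_taken++;
	sketch_stack_trace(ctx);	/* simulated re-entry from a traced callee */
	sketch_clear(bit);
}
```

A single call to sketch_stack_trace() runs the body once and then returns,
because the nested call finds its context bit already set. A different
context (for example an interrupt arriving in normal context) uses its own
bit and is therefore still allowed through.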
Link: https://lore.kernel.org/all/20260102122807.7025fc87@gandalf.local.home/
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Link: https://patch.msgid.link/20260105203141.515cd49f@gandalf.local.home
Reported-by: Yao Kai <yaokai34(a)huawei.com>
Tested-by: Yao Kai <yaokai34(a)huawei.com>
Fixes: 5f5fa7ea89dc ("rcu: Don't use negative nesting depth in __rcu_read_unlock()")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/trace_recursion.h | 9 +++++++++
kernel/trace/trace.c | 6 ++++++
2 files changed, 15 insertions(+)
diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index ae04054a1be3..e6ca052b2a85 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -34,6 +34,13 @@ enum {
TRACE_INTERNAL_SIRQ_BIT,
TRACE_INTERNAL_TRANSITION_BIT,
+ /* Internal event use recursion bits */
+ TRACE_INTERNAL_EVENT_BIT,
+ TRACE_INTERNAL_EVENT_NMI_BIT,
+ TRACE_INTERNAL_EVENT_IRQ_BIT,
+ TRACE_INTERNAL_EVENT_SIRQ_BIT,
+ TRACE_INTERNAL_EVENT_TRANSITION_BIT,
+
TRACE_BRANCH_BIT,
/*
* Abuse of the trace_recursion.
@@ -58,6 +65,8 @@ enum {
#define TRACE_LIST_START TRACE_INTERNAL_BIT
+#define TRACE_EVENT_START TRACE_INTERNAL_EVENT_BIT
+
#define TRACE_CONTEXT_MASK ((1 << (TRACE_LIST_START + TRACE_CONTEXT_BITS)) - 1)
/*
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6f2148df14d9..aef9058537d5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3012,6 +3012,11 @@ static void __ftrace_trace_stack(struct trace_array *tr,
struct ftrace_stack *fstack;
struct stack_entry *entry;
int stackidx;
+ int bit;
+
+ bit = trace_test_and_set_recursion(_THIS_IP_, _RET_IP_, TRACE_EVENT_START);
+ if (bit < 0)
+ return;
/*
* Add one, for this function and the call to save_stack_trace()
@@ -3080,6 +3085,7 @@ static void __ftrace_trace_stack(struct trace_array *tr,
/* Again, don't let gcc optimize things here */
barrier();
__this_cpu_dec(ftrace_stack_reserve);
+ trace_clear_recursion(bit);
}
static inline void ftrace_trace_stack(struct trace_array *tr,
--
2.51.0
From: Steven Rostedt <rostedt(a)goodmis.org>
The code has integrity checks to make sure that depth never goes below
zero. But the depth field has recently been converted to unsigned long
from "int" (for alignment reasons). As unsigned long can never be less
than zero, the integrity checks no longer work.
Convert depth to long from unsigned long to allow the integrity checks to
work again.
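A small standalone sketch of why the type matters (the helper names are
hypothetical and nothing here is ftrace code): decrementing past zero
gives a negative value for a signed long, but wraps to ULONG_MAX for an
unsigned long, so a "depth < 0" check can never fire in the unsigned case.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* With a signed depth, going one below zero is visible to a "< 0" check. */
static bool sketch_signed_detects_underflow(void)
{
	long depth = 0;

	depth--;
	return depth < 0;
}

/* With an unsigned depth, the same decrement wraps to a huge positive value. */
static unsigned long sketch_unsigned_after_underflow(void)
{
	unsigned long depth = 0;

	depth--;	/* well-defined wraparound */
	assert(depth > 0);
	return depth;
}
```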
Cc: stable(a)vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: pengdonglin <pengdonglin(a)xiaomi.com>
Link: https://patch.msgid.link/20260102143148.251c2e16@gandalf.local.home
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Closes: https://lore.kernel.org/all/aS6kGi0maWBl-MjZ@stanley.mountain/
Fixes: f83ac7544fbf7 ("function_graph: Enable funcgraph-args and funcgraph-retaddr to work simultaneously")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
---
include/linux/ftrace.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 770f0dc993cc..a3a8989e3268 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1167,7 +1167,7 @@ static inline void ftrace_init(void) { }
*/
struct ftrace_graph_ent {
unsigned long func; /* Current function */
- unsigned long depth;
+ long depth; /* signed to check for less than zero */
} __packed;
/*
--
2.51.0
If a process receives a signal while it executes some kernel code that
calls mm_take_all_locks, we get -EINTR error. The -EINTR is propagated up
the call stack to userspace and userspace may fail if it gets this
error.
This commit changes -EINTR to -ERESTARTSYS, so that if the signal handler
was installed with the SA_RESTART flag, the operation is automatically
restarted.
For example, this problem happens when using OpenCL on AMDGPU. If some
signal races with clGetDeviceIDs, clGetDeviceIDs returns an error
CL_DEVICE_NOT_FOUND (and strace shows that open("/dev/kfd") failed with
EINTR).
This problem can be reproduced with the following program.
To run this program, you need an AMD graphics card and the package
"rocm-opencl" installed. You must not have the package "mesa-opencl-icd"
installed, because it redirects the default OpenCL implementation to
itself.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <signal.h>
#include <sys/time.h>
#define CL_TARGET_OPENCL_VERSION 300
#include <CL/opencl.h>
static void fn(void)
{
	while (1) {
		int32_t err;
		cl_device_id device;
		err = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
		if (err != CL_SUCCESS) {
			fprintf(stderr, "clGetDeviceIDs failed: %d\n", err);
			exit(1);
		}
		write(2, "-", 1);
	}
}
static void alrm(int sig)
{
	write(2, ".", 1);
}
int main(void)
{
	struct itimerval it;
	struct sigaction sa;
	memset(&sa, 0, sizeof sa);
	sa.sa_handler = alrm;
	sa.sa_flags = SA_RESTART;
	sigaction(SIGALRM, &sa, NULL);
	it.it_interval.tv_sec = 0;
	it.it_interval.tv_usec = 50;
	it.it_value.tv_sec = 0;
	it.it_value.tv_usec = 50;
	setitimer(ITIMER_REAL, &it, NULL);
	fn();
	return 1;
}
I'm submitting this patch for the stable kernels, because the AMD ROCm
stack fails if it receives EINTR from open (it seems to restart EINTR
from ioctl correctly). The process may receive signals at unpredictable
times, so the OpenCL implementation may fail at unpredictable times.
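For comparison, a userspace caller that cannot rely on automatic restart
would have to retry by hand, roughly as in the sketch below. The helper
name is hypothetical and this is not taken from the ROCm sources; it only
shows the conventional retry-on-EINTR loop around open().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/* Hypothetical helper: retry open() for as long as it fails with EINTR. */
static int sketch_open_retry(const char *path, int flags)
{
	int fd;

	assert(path);
	do {
		fd = open(path, flags);
	} while (fd < 0 && errno == EINTR);

	return fd;	/* a valid descriptor, or -1 with errno set */
}
```

With -ERESTARTSYS returned by the kernel and SA_RESTART set on the
handler, the kernel performs this restart itself and callers never see
the interrupted call.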
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Link: https://lists.freedesktop.org/archives/amd-gfx/2025-November/133141.html
Link: https://yhbt.net/lore/linux-mm/6f16b618-26fc-3031-abe8-65c2090262e7@redhat.…
Cc: stable(a)vger.kernel.org
Fixes: 7906d00cd1f6 ("mmu-notifiers: add mm_take_all_locks() operation")
---
mm/vma.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: mm/mm/vma.c
===================================================================
--- mm.orig/mm/vma.c 2026-01-07 20:11:21.000000000 +0100
+++ mm/mm/vma.c 2026-01-07 20:11:21.000000000 +0100
@@ -2202,7 +2202,7 @@ int mm_take_all_locks(struct mm_struct *
out_unlock:
mm_drop_all_locks(mm);
- return -EINTR;
+ return -ERESTARTSYS;
}
static void vm_unlock_anon_vma(struct anon_vma *anon_vma)