From: Fushuai Wang <wangfushuai@baidu.com>
Problem
-------
With CONFIG_X86_DEBUG_FPU enabled, reading /proc/[kthread]/arch_status causes a kernel NULL pointer dereference.
Kernel threads aren't expected to access the FPU state directly. Kernel usage of FPU registers is contained within kernel_fpu_begin()/_end() sections.
However, to report AVX-512 usage, the avx512_timestamp variable within struct fpu needs to be accessed, which triggers a warning in x86_task_fpu().
For Kthreads:
  proc_pid_arch_status()
    avx512_status()
      x86_task_fpu() => Warning and returns NULL
      x86_task_fpu()->avx512_timestamp => NULL dereference
The warning is a false alarm in this case, since the access isn't intended for modifying the FPU state. All kernel threads (except the init_task) have a "struct fpu" with an accessible avx512_timestamp variable. The init_task (PID 0) never follows this path since it is not exposed in /proc.
Solution
--------
One option is to get rid of the warning in x86_task_fpu() for kernel threads. However, that warning was recently added and might be useful to catch any potential misuse of the FPU state in kernel threads.
Another option is to avoid the access altogether. The kernel does not track AVX-512 usage for kernel threads. save_fpregs_to_fpstate()->update_avx_timestamp() is never invoked for kernel threads, so avx512_timestamp is always guaranteed to be 0.
Also, the legacy behavior of reporting "AVX512_elapsed_ms: -1", which signifies "no AVX-512 usage", is misleading. The kernel usage just isn't tracked.
Update the ABI for kernel threads and do not report AVX-512 usage for them. Not having a value in the file avoids the NULL dereference as well as the misleading report.
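For illustration only (not part of this patch): a minimal userspace sketch of how a reader of /proc/<pid>/arch_status might cope with the updated ABI, where the AVX512_elapsed_ms field can be absent. The helper names parse_avx512_elapsed_ms() and report_avx512() are hypothetical, and the sketch assumes the field is printed as "AVX512_elapsed_ms:" followed by whitespace and a signed decimal value.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: scan the text of /proc/<pid>/arch_status for the
 * AVX512_elapsed_ms field. Returns 1 and stores the value in *ms if the
 * field is present, 0 if it is absent (as for kernel threads once usage
 * is no longer reported for them).
 */
int parse_avx512_elapsed_ms(const char *text, long *ms)
{
	static const char key[] = "AVX512_elapsed_ms:";
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, key, sizeof(key) - 1) == 0) {
			/* strtol() skips the whitespace after the colon. */
			*ms = strtol(line + sizeof(key) - 1, NULL, 10);
			return 1;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return 0;
}

/* Hypothetical caller: distinguish "not reported", "no usage" and a value. */
void report_avx512(int pid)
{
	char path[64], buf[256];
	size_t n;
	long ms;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/arch_status", pid);
	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return;
	}
	n = fread(buf, 1, sizeof(buf) - 1, f);
	buf[n] = '\0';
	fclose(f);

	if (!parse_avx512_elapsed_ms(buf, &ms))
		printf("%d: AVX-512 usage not reported\n", pid);
	else if (ms < 0)
		printf("%d: no AVX-512 usage recorded\n", pid);
	else
		printf("%d: last AVX-512 use %ld ms ago\n", pid, ms);
}
```

The point of the sketch is that a missing field is treated as "not tracked" rather than folded into the -1 case, which matches the distinction drawn above.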
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Fixes: 22aafe3bcb67 ("x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks")
Cc: stable@vger.kernel.org
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Co-developed-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v3:
 - Do not report anything for kernel threads. (DaveH)
 - Make the commit message more precise.

v2: https://lore.kernel.org/lkml/20250721215302.3562784-1-sohil.mehta@intel.com/
 - Avoid making the fix dependent on CONFIG_X86_DEBUG_FPU.
 - Include PF_USER_WORKER in the kernel thread check.
 - Update commit message for clarity.
---
 arch/x86/kernel/fpu/xstate.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 9aa9ac8399ae..b90b2eec8fb8 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1855,19 +1855,20 @@ long fpu_xstate_prctl(int option, unsigned long arg2)
 #ifdef CONFIG_PROC_PID_ARCH_STATUS
 /*
  * Report the amount of time elapsed in millisecond since last AVX512
- * use in the task.
+ * use in the task. Report -1 if no AVX-512 usage.
  */
 static void avx512_status(struct seq_file *m, struct task_struct *task)
 {
-	unsigned long timestamp = READ_ONCE(x86_task_fpu(task)->avx512_timestamp);
-	long delta;
+	unsigned long timestamp;
+	long delta = -1;
 
-	if (!timestamp) {
-		/*
-		 * Report -1 if no AVX512 usage
-		 */
-		delta = -1;
-	} else {
+	/* AVX-512 usage is not tracked for kernel threads. */
+	if (task->flags & (PF_KTHREAD | PF_USER_WORKER))
+		return;
+
+	timestamp = READ_ONCE(x86_task_fpu(task)->avx512_timestamp);
+
+	if (timestamp) {
 		delta = (long)(jiffies - timestamp);
 		/*
 		 * Cap to LONG_MAX if time difference > LONG_MAX