While running LTP test cases splice07 and perf_event_open01 found following kernel BUG running on arm64 device juno-r2 and qemu-arm64 on the Linux next-20240812 and next-20240813 tag.
GOOD: next-20240809 BAD: next-20240812
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Test log: -------- [ 2278.760258] check_preemption_disabled: 15 callbacks suppressed [ 2278.760282] BUG: using smp_processor_id() in preemptible [00000000] code: perf_event_open/111076 [ 2278.775032] caller is debug_smp_processor_id+0x20/0x30 [ 2278.780270] CPU: 5 UID: 0 PID: 111076 Comm: perf_event_open Not tainted 6.11.0-rc3-next-20240812 #1 [ 2278.789344] Hardware name: ARM Juno development board (r2) (DT) [ 2278.795276] Call trace: [ 2278.797724] dump_backtrace+0x9c/0x128 [ 2278.801487] show_stack+0x20/0x38 [ 2278.804812] dump_stack_lvl+0xbc/0xd0 [ 2278.808487] dump_stack+0x18/0x28 [ 2278.811811] check_preemption_disabled+0xd8/0xf8 [ 2278.816446] debug_smp_processor_id+0x20/0x30 [ 2278.820818] event_function_call+0x54/0x168 [ 2278.825015] _perf_event_enable+0x78/0xa8 [ 2278.829037] perf_event_for_each_child+0x44/0xa0 [ 2278.833672] _perf_ioctl+0x1bc/0xae0 [ 2278.837262] perf_ioctl+0x58/0x90 [ 2278.840590] __arm64_sys_ioctl+0xb4/0x100 [ 2278.844615] invoke_syscall+0x50/0x120 [ 2278.848381] el0_svc_common.constprop.0+0x48/0xf0 [ 2278.853103] do_el0_svc+0x24/0x38 [ 2278.856432] el0_svc+0x3c/0x108 [ 2278.859585] el0t_64_sync_handler+0x120/0x130 [ 2278.863956] el0t_64_sync+0x190/0x198 [ 2279.068732] BUG: using smp_processor_id() in preemptible [00000000] code: perf_event_open/111076 [ 2279.077570] caller is debug_smp_processor_id+0x20/0x30 [ 2279.082754] CPU: 1 UID: 0 PID: 111076 Comm: perf_event_open Not tainted 6.11.0-rc3-next-20240812 #1 [ 2279.091823] Hardware name: ARM Juno development board (r2) (DT)
Full test log: --------- - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240813/tes... - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240813/tes... - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240812/tes...
metadata: artifact-location: https://storage.tuxsuite.com/public/linaro/lkft/builds/2kXsz6nJO7pJ1nL4xGlKH... build-url: https://storage.tuxsuite.com/public/linaro/lkft/builds/2kXsz6nJO7pJ1nL4xGlKH... build_name: gcc-13-lkftconfig-debug git_describe: next-20240812 git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next git_sha: 9e6869691724b12e1f43655eeedc35fade38120c kernel-config: https://storage.tuxsuite.com/public/linaro/lkft/builds/2kXsz6nJO7pJ1nL4xGlKH... kernel_version: 6.11.0-rc3 toolchain: gcc-13
-- Linaro LKFT https://lkft.linaro.org
Hello,
On Tue, Aug 13, 2024 at 08:49:59PM +0530, Naresh Kamboju wrote:
While running LTP test cases splice07 and perf_event_open01 found following kernel BUG running on arm64 device juno-r2 and qemu-arm64 on the Linux next-20240812 and next-20240813 tag.
GOOD: next-20240809 BAD: next-20240812
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Test log:
[ 2278.760258] check_preemption_disabled: 15 callbacks suppressed [ 2278.760282] BUG: using smp_processor_id() in preemptible [00000000] code: perf_event_open/111076 [ 2278.775032] caller is debug_smp_processor_id+0x20/0x30 [ 2278.780270] CPU: 5 UID: 0 PID: 111076 Comm: perf_event_open Not tainted 6.11.0-rc3-next-20240812 #1 [ 2278.789344] Hardware name: ARM Juno development board (r2) (DT) [ 2278.795276] Call trace: [ 2278.797724] dump_backtrace+0x9c/0x128 [ 2278.801487] show_stack+0x20/0x38 [ 2278.804812] dump_stack_lvl+0xbc/0xd0 [ 2278.808487] dump_stack+0x18/0x28 [ 2278.811811] check_preemption_disabled+0xd8/0xf8 [ 2278.816446] debug_smp_processor_id+0x20/0x30 [ 2278.820818] event_function_call+0x54/0x168 [ 2278.825015] _perf_event_enable+0x78/0xa8 [ 2278.829037] perf_event_for_each_child+0x44/0xa0 [ 2278.833672] _perf_ioctl+0x1bc/0xae0 [ 2278.837262] perf_ioctl+0x58/0x90 [ 2278.840590] __arm64_sys_ioctl+0xb4/0x100 [ 2278.844615] invoke_syscall+0x50/0x120 [ 2278.848381] el0_svc_common.constprop.0+0x48/0xf0 [ 2278.853103] do_el0_svc+0x24/0x38 [ 2278.856432] el0_svc+0x3c/0x108 [ 2278.859585] el0t_64_sync_handler+0x120/0x130 [ 2278.863956] el0t_64_sync+0x190/0x198 [ 2279.068732] BUG: using smp_processor_id() in preemptible [00000000] code: perf_event_open/111076 [ 2279.077570] caller is debug_smp_processor_id+0x20/0x30 [ 2279.082754] CPU: 1 UID: 0 PID: 111076 Comm: perf_event_open Not tainted 6.11.0-rc3-next-20240812 #1 [ 2279.091823] Hardware name: ARM Juno development board (r2) (DT)
Full test log:
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240813/tes...
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240813/tes...
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240812/tes...
metadata: artifact-location: https://storage.tuxsuite.com/public/linaro/lkft/builds/2kXsz6nJO7pJ1nL4xGlKH... build-url: https://storage.tuxsuite.com/public/linaro/lkft/builds/2kXsz6nJO7pJ1nL4xGlKH... build_name: gcc-13-lkftconfig-debug git_describe: next-20240812 git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next git_sha: 9e6869691724b12e1f43655eeedc35fade38120c kernel-config: https://storage.tuxsuite.com/public/linaro/lkft/builds/2kXsz6nJO7pJ1nL4xGlKH... kernel_version: 6.11.0-rc3 toolchain: gcc-13
Thanks for the report, can you please check if it solves the problem?
Thanks, Namhyung
--- diff --git a/kernel/events/core.c b/kernel/events/core.c index 9893ba5e98aa..85204c2376fa 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -298,13 +298,14 @@ static int event_function(void *info) static void event_function_call(struct perf_event *event, event_f func, void *data) { struct perf_event_context *ctx = event->ctx; - struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context); + struct perf_cpu_context *cpuctx; struct task_struct *task = READ_ONCE(ctx->task); /* verified in event_function */ struct event_function_struct efs = { .event = event, .func = func, .data = data, }; + unsigned long flags;
if (!event->parent) { /* @@ -327,22 +328,27 @@ static void event_function_call(struct perf_event *event, event_f func, void *da if (!task_function_call(task, event_function, &efs)) return;
+ local_irq_save(flags); + cpuctx = this_cpu_ptr(&perf_cpu_context); + perf_ctx_lock(cpuctx, ctx); /* * Reload the task pointer, it might have been changed by * a concurrent perf_event_context_sched_out(). */ task = ctx->task; - if (task == TASK_TOMBSTONE) { - perf_ctx_unlock(cpuctx, ctx); - return; - } + if (task == TASK_TOMBSTONE) + goto out; + if (ctx->is_active) { perf_ctx_unlock(cpuctx, ctx); + local_irq_restore(flags); goto again; } func(event, NULL, ctx, data); +out: perf_ctx_unlock(cpuctx, ctx); + local_irq_restore(flags); }
/*
On Tue, Aug 13, 2024 at 11:28:54AM -0700, Namhyung Kim wrote:
Duh, yeah.
diff --git a/kernel/events/core.c b/kernel/events/core.c index 9893ba5e98aa..85204c2376fa 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -298,13 +298,14 @@ static int event_function(void *info) static void event_function_call(struct perf_event *event, event_f func, void *data) { struct perf_event_context *ctx = event->ctx;
- struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
- struct perf_cpu_context *cpuctx; struct task_struct *task = READ_ONCE(ctx->task); /* verified in event_function */ struct event_function_struct efs = { .event = event, .func = func, .data = data, };
- unsigned long flags;
if (!event->parent) { /* @@ -327,22 +328,27 @@ static void event_function_call(struct perf_event *event, event_f func, void *da if (!task_function_call(task, event_function, &efs)) return;
- local_irq_save(flags);
This can just be local_irq_disable() though, seeing how the fingered commit replaced raw_spin_lock_irq().
I'll queue the below...
--- Subject: perf: Really fix event_function_call() locking From: Namhyung Kim namhyung@kernel.org Date: Tue Aug 13 22:55:11 CEST 2024
Commit 558abc7e3f89 ("perf: Fix event_function_call() locking") lost IRQ disabling by mistake.
Fixes: 558abc7e3f89 ("perf: Fix event_function_call() locking") Reported-by: Pengfei Xu pengfei.xu@intel.com Reported-by: Naresh Kamboju naresh.kamboju@linaro.org Signed-off-by: Namhyung Kim namhyung@kernel.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org --- kernel/events/core.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
--- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -298,8 +298,8 @@ static int event_function(void *info) static void event_function_call(struct perf_event *event, event_f func, void *data) { struct perf_event_context *ctx = event->ctx; - struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context); struct task_struct *task = READ_ONCE(ctx->task); /* verified in event_function */ + struct perf_cpu_context *cpuctx; struct event_function_struct efs = { .event = event, .func = func, @@ -327,22 +327,25 @@ static void event_function_call(struct p if (!task_function_call(task, event_function, &efs)) return;
+ local_irq_disable(); + cpuctx = this_cpu_ptr(&perf_cpu_context); perf_ctx_lock(cpuctx, ctx); /* * Reload the task pointer, it might have been changed by * a concurrent perf_event_context_sched_out(). */ task = ctx->task; - if (task == TASK_TOMBSTONE) { - perf_ctx_unlock(cpuctx, ctx); - return; - } + if (task == TASK_TOMBSTONE) + goto unlock; if (ctx->is_active) { perf_ctx_unlock(cpuctx, ctx); + local_irq_enable(); goto again; } func(event, NULL, ctx, data); +unlock: perf_ctx_unlock(cpuctx, ctx); + local_irq_enable(); }
/*
Hi Peter and Kim,
I tested this patch on top of 6.11.0-rc3-next-20240812. This issue can not be reproduced in syzkaller reproducer.
Best Regards, Thanks!
On 2024-08-13 at 23:02:09 +0200, Peter Zijlstra wrote:
On Tue, Aug 13, 2024 at 11:28:54AM -0700, Namhyung Kim wrote:
Duh, yeah.
diff --git a/kernel/events/core.c b/kernel/events/core.c index 9893ba5e98aa..85204c2376fa 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -298,13 +298,14 @@ static int event_function(void *info) static void event_function_call(struct perf_event *event, event_f func, void *data) { struct perf_event_context *ctx = event->ctx;
- struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
- struct perf_cpu_context *cpuctx; struct task_struct *task = READ_ONCE(ctx->task); /* verified in event_function */ struct event_function_struct efs = { .event = event, .func = func, .data = data, };
- unsigned long flags;
if (!event->parent) { /* @@ -327,22 +328,27 @@ static void event_function_call(struct perf_event *event, event_f func, void *da if (!task_function_call(task, event_function, &efs)) return;
- local_irq_save(flags);
This can just be local_irq_disable() though, seeing how the fingered commit replaced raw_spin_lock_irq().
I'll queue the below...
Subject: perf: Really fix event_function_call() locking From: Namhyung Kim namhyung@kernel.org Date: Tue Aug 13 22:55:11 CEST 2024
Commit 558abc7e3f89 ("perf: Fix event_function_call() locking") lost IRQ disabling by mistake.
Fixes: 558abc7e3f89 ("perf: Fix event_function_call() locking") Reported-by: Pengfei Xu pengfei.xu@intel.com Reported-by: Naresh Kamboju naresh.kamboju@linaro.org Signed-off-by: Namhyung Kim namhyung@kernel.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
kernel/events/core.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
--- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -298,8 +298,8 @@ static int event_function(void *info) static void event_function_call(struct perf_event *event, event_f func, void *data) { struct perf_event_context *ctx = event->ctx;
- struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context); struct task_struct *task = READ_ONCE(ctx->task); /* verified in event_function */
- struct perf_cpu_context *cpuctx; struct event_function_struct efs = { .event = event, .func = func,
@@ -327,22 +327,25 @@ static void event_function_call(struct p if (!task_function_call(task, event_function, &efs)) return;
- local_irq_disable();
- cpuctx = this_cpu_ptr(&perf_cpu_context); perf_ctx_lock(cpuctx, ctx); /*
*/ task = ctx->task;
- Reload the task pointer, it might have been changed by
- a concurrent perf_event_context_sched_out().
- if (task == TASK_TOMBSTONE) {
perf_ctx_unlock(cpuctx, ctx);
return;
- }
- if (task == TASK_TOMBSTONE)
if (ctx->is_active) { perf_ctx_unlock(cpuctx, ctx);goto unlock;
goto again; } func(event, NULL, ctx, data);local_irq_enable();
+unlock: perf_ctx_unlock(cpuctx, ctx);
- local_irq_enable();
} /*