On 10/13/2025 4:05 PM, Peter Zijlstra wrote:
On Mon, Oct 13, 2025 at 10:34:27AM +0800, Mi, Dapeng wrote:
It looks the issue described in the link (https://lore.kernel.org/all/20250606192546.915765-1-kan.liang@linux.intel.co...) happens again but in a different way. :(
As the commit message above link described, cpu-clock (and task-clock) is a specific SW event which rely on hrtimer. The hrtimer handler calls __perf_event_overflow() and then event_stop (cpu_clock_event_stop()) and eventually call hrtimer_cancel() which traps into a dead loop which waits for the calling hrtimer handler finishes.
As the change (https://lore.kernel.org/all/20250606192546.915765-1-kan.liang@linux.intel.co...), it should be enough to just disable the event and don't need an extra event stop.
@Octavia, could you please check if the change below can fix this issue? Thanks.
diff --git a/kernel/events/core.c b/kernel/events/core.c index 7541f6f85fcb..883b0e1fa5d3 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -10343,7 +10343,20 @@ static int __perf_event_overflow(struct perf_event *event, ret = 1; event->pending_kill = POLL_HUP; perf_event_disable_inatomic(event); - event->pmu->stop(event, 0);
+ /* + * The cpu-clock and task-clock are two special SW events, + * which rely on the hrtimer. The __perf_event_overflow() + * is invoked from the hrtimer handler for these 2 events. + * Avoid to call event_stop()->hrtimer_cancel() for these + * 2 events since hrtimer_cancel() waits for the hrtimer + * handler to finish, which would trigger a deadlock. + * Only disabling the events is enough to stop the hrtimer. + * See perf_swevent_cancel_hrtimer(). + */ + if (event->attr.config != PERF_COUNT_SW_CPU_CLOCK && + event->attr.config != PERF_COUNT_SW_TASK_CLOCK) + event->pmu->stop(event, 0);
This is broken though; you cannot test config without first knowing which PMU you're dealing with.
Ah, yes. Just ignore this.
Also, that timer really should get stopped, we can't know for certain this overflow is of the timer itself or not, it could be a related event.
Something like the below might do -- but please carefully consider the cases where hrtimer_try_to_cancel() might fail; in those cases we'll have set HES_STOPPED and the hrtimer callback *SHOULD* observe this and NORESTART.
But I didn't check all the details.
The only reason that hrtimer_try_to_cancel() could fail is that the hrtimer callback is currently executing, so current change should be fine.
diff --git a/kernel/events/core.c b/kernel/events/core.c index 820127536e62..a91481d57841 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -11756,7 +11756,8 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer) event = container_of(hrtimer, struct perf_event, hw.hrtimer);
- if (event->state != PERF_EVENT_STATE_ACTIVE)
- if (event->state != PERF_EVENT_STATE_ACTIVE ||
return HRTIMER_NORESTART;event->hw.state & PERF_HES_STOPPED)event->pmu->read(event); @@ -11810,7 +11811,7 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event) ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer); local64_set(&hwc->period_left, ktime_to_ns(remaining));
hrtimer_cancel(&hwc->hrtimer);
}hrtimer_try_to_cancel(&hwc->hrtimer);} @@ -11854,12 +11855,14 @@ static void cpu_clock_event_update(struct perf_event *event) static void cpu_clock_event_start(struct perf_event *event, int flags) {
- event->hw.state = 0; local64_set(&event->hw.prev_count, local_clock()); perf_swevent_start_hrtimer(event);
} static void cpu_clock_event_stop(struct perf_event *event, int flags) {
- event->hw.state = PERF_HES_STOPPED; perf_swevent_cancel_hrtimer(event); if (flags & PERF_EF_UPDATE) cpu_clock_event_update(event);
Besides cpu-clock, task-clock should need similar change as well. I would post a complete change later.