Using async-profiler (https://github.com/async-profiler/async-profiler/) on Linux 6.17.1-arch1-1 causes a complete system hang. This has been reported by many people at https://github.com/lucko/spark/issues/530; spark is a piece of software that uses async-profiler internally.
As seen in https://github.com/lucko/spark/issues/530#issuecomment-3339974827, this was bisected to commit 18dbcbfabfffc4a5d3ea10290c5ad27f22b0d240 ("perf: Fix the POLL_HUP delivery breakage"). Reverting this commit on 6.17.1 fixed the issue for me.
Steps to reproduce:

1. Get a copy of async-profiler. I tested both v3 (affects older spark versions) and v4.1 (latest at time of writing). Unarchive it; this is <async-profiler-dir>.
2. Set kernel parameters kernel.perf_event_paranoid=1 and kernel.kptr_restrict=0 as instructed by https://github.com/async-profiler/async-profiler/blob/fb673227c7fb311f872ce9...
3. Install a version of Java that comes with jshell, i.e. Java 9 or newer. Note: jshell is used for ease of reproduction. Any Java application that is actively running will work.
4. Run `printf 'int acc; while (true) { acc++; }' | jshell -`. This will start an infinitely running Java process.
5. Run `jps` and take the PID next to the text RemoteExecutionControl -- this is the process that was just started.
6. Attach async-profiler to this process by running `<async-profiler-dir>/bin/asprof -d 1 <PID>`. This will run for one second, then the system should freeze entirely shortly thereafter.
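To make the moving parts concrete, here is a minimal C sketch (an illustration put together for this report, untested, and not async-profiler's actual code) of the kind of perf_event setup involved: a cpu-clock software sampling event armed with PERF_EVENT_IOC_REFRESH, which is what takes the kernel through the POLL_HUP overflow path named in the bisected commit. The asprof steps above are the reliable reproducer; this sketch may or may not hang an affected kernel on its own.

#include <linux/perf_event.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thin wrapper; glibc does not provide one for perf_event_open(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
        struct perf_event_attr attr;
        int fd;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_SOFTWARE;
        attr.config = PERF_COUNT_SW_CPU_CLOCK;  /* hrtimer-backed SW event */
        attr.sample_period = 10 * 1000 * 1000;  /* sample every ~10 ms of CPU time */
        attr.disabled = 1;

        /* Self-monitor the current thread on any CPU. */
        fd = perf_event_open(&attr, 0, -1, -1, 0);
        if (fd < 0) {
                perror("perf_event_open");
                return 1;
        }

        /*
         * Arm the event for exactly one more overflow. When that overflow
         * fires, the kernel signals POLL_HUP and disables the event -- the
         * __perf_event_overflow() path touched by the bisected commit.
         */
        if (ioctl(fd, PERF_EVENT_IOC_REFRESH, 1) < 0) {
                perror("PERF_EVENT_IOC_REFRESH");
                return 1;
        }

        for (;;)
                ;       /* burn CPU so the cpu-clock hrtimer keeps firing */
}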
I triggered a sysrq crash while the system was frozen; the output I found in journalctl afterwards is at https://gist.github.com/octylFractal/76611ee76060051e5efc0c898dd0949e (I'm not sure that text is actually from the triggered crash, but it seems relevant). If needed, please tell me how to get the actual crash report; I'm not sure where it is.
I'm using an AMD Ryzen 9 5900X 12-Core Processor. Given that I've seen no Intel reports, it may be AMD specific. I don't have an Intel CPU on hand to test with.
/proc/version: Linux version 6.17.1-arch1-1 (linux@archlinux) (gcc (GCC) 15.2.1 20250813, GNU ld (GNU Binutils) 2.45.0) #1 SMP PREEMPT_DYNAMIC Mon, 06 Oct 2025 18:48:29 +0000
Operating System: Arch Linux
uname -mi: x86_64 unknown
On 10/11/2025 4:31 PM, Octavia Togami wrote:
> Using async-profiler (https://github.com/async-profiler/async-profiler/) on Linux 6.17.1-arch1-1 causes a complete system hang. [...]
It looks like the issue described in the link (https://lore.kernel.org/all/20250606192546.915765-1-kan.liang@linux.intel.co...) happens again, but in a different way. :(
As the commit message at the above link describes, cpu-clock (and task-clock) is a special SW event which relies on an hrtimer. The hrtimer handler calls __perf_event_overflow(), which then calls the event's stop callback (cpu_clock_event_stop()) and eventually hrtimer_cancel(), which ends up in a dead loop waiting for the currently running hrtimer handler to finish.
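For reference, the pre-patch code on that path looks roughly like this (paraphrased from kernel/events/core.c; the same lines show up as context in the diffs below):

/*
 * Call chain for a cpu-clock/task-clock sampling event:
 * perf_swevent_hrtimer() -> __perf_event_overflow() -> event->pmu->stop(),
 * which for cpu-clock is cpu_clock_event_stop().
 */
static void perf_swevent_cancel_hrtimer(struct perf_event *event)
{
        struct hw_perf_event *hwc = &event->hw;

        if (is_sampling_event(event)) {
                ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
                local64_set(&hwc->period_left, ktime_to_ns(remaining));

                /*
                 * hrtimer_cancel() waits for a running callback to finish.
                 * Called from inside that very callback, it spins forever,
                 * which is the hang reported above.
                 */
                hrtimer_cancel(&hwc->hrtimer);
        }
}

static void cpu_clock_event_stop(struct perf_event *event, int flags)
{
        perf_swevent_cancel_hrtimer(event);
        if (flags & PERF_EF_UPDATE)
                cpu_clock_event_update(event);
}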
As in that change (https://lore.kernel.org/all/20250606192546.915765-1-kan.liang@linux.intel.co...), it should be enough to just disable the event; an extra event stop is not needed.
@Octavia, could you please check if the change below can fix this issue? Thanks.
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7541f6f85fcb..883b0e1fa5d3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10343,7 +10343,20 @@ static int __perf_event_overflow(struct perf_event *event,
                ret = 1;
                event->pending_kill = POLL_HUP;
                perf_event_disable_inatomic(event);
-               event->pmu->stop(event, 0);
+
+               /*
+                * The cpu-clock and task-clock are two special SW events,
+                * which rely on the hrtimer. The __perf_event_overflow()
+                * is invoked from the hrtimer handler for these 2 events.
+                * Avoid to call event_stop()->hrtimer_cancel() for these
+                * 2 events since hrtimer_cancel() waits for the hrtimer
+                * handler to finish, which would trigger a deadlock.
+                * Only disabling the events is enough to stop the hrtimer.
+                * See perf_swevent_cancel_hrtimer().
+                */
+               if (event->attr.config != PERF_COUNT_SW_CPU_CLOCK &&
+                   event->attr.config != PERF_COUNT_SW_TASK_CLOCK)
+                       event->pmu->stop(event, 0);
        }

        if (event->attr.sigtrap) {
That change appears to fix the problem on my end. I ran my reproducer and some other tests multiple times without issue.
On Sun, Oct 12, 2025 at 7:34 PM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
> [...]
On 10/13/2025 2:55 PM, Octavia Togami wrote:
> That change appears to fix the problem on my end. I ran my reproducer and some other tests multiple times without issue.
@Octavia, thanks for checking this patch. Following Peter's comments, we need to update the fix, so could you please re-test the changes below? Thanks.
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7541f6f85fcb..ed236b8bbcaa 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11773,7 +11773,8 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)

        event = container_of(hrtimer, struct perf_event, hw.hrtimer);

-       if (event->state != PERF_EVENT_STATE_ACTIVE)
+       if (event->state != PERF_EVENT_STATE_ACTIVE ||
+           event->hw.state & PERF_HES_STOPPED)
                return HRTIMER_NORESTART;

        event->pmu->read(event);
@@ -11827,7 +11828,7 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
                ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
                local64_set(&hwc->period_left, ktime_to_ns(remaining));

-               hrtimer_cancel(&hwc->hrtimer);
+               hrtimer_try_to_cancel(&hwc->hrtimer);
        }
 }

@@ -11871,12 +11872,14 @@ static void cpu_clock_event_update(struct perf_event *event)

 static void cpu_clock_event_start(struct perf_event *event, int flags)
 {
+       event->hw.state = 0;
        local64_set(&event->hw.prev_count, local_clock());
        perf_swevent_start_hrtimer(event);
 }

 static void cpu_clock_event_stop(struct perf_event *event, int flags)
 {
+       event->hw.state = PERF_HES_STOPPED;
        perf_swevent_cancel_hrtimer(event);
        if (flags & PERF_EF_UPDATE)
                cpu_clock_event_update(event);
@@ -11950,12 +11953,14 @@ static void task_clock_event_update(struct perf_event *event, u64 now)

 static void task_clock_event_start(struct perf_event *event, int flags)
 {
+       event->hw.state = 0;
        local64_set(&event->hw.prev_count, event->ctx->time);
        perf_swevent_start_hrtimer(event);
 }

 static void task_clock_event_stop(struct perf_event *event, int flags)
 {
+       event->hw.state = PERF_HES_STOPPED;
        perf_swevent_cancel_hrtimer(event);
        if (flags & PERF_EF_UPDATE)
                task_clock_event_update(event, event->ctx->time);
That patch is also working fine.
On Mon, Oct 13, 2025 at 11:41 PM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
> [...]
On 10/15/2025 4:13 AM, Octavia Togami wrote:
> That patch is also working fine.
Thanks for testing this patch. I will post it.
On Mon, Oct 13, 2025 at 10:34:27AM +0800, Mi, Dapeng wrote:
> [...]
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 7541f6f85fcb..883b0e1fa5d3 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -10343,7 +10343,20 @@ static int __perf_event_overflow(struct perf_event *event,
>                 ret = 1;
>                 event->pending_kill = POLL_HUP;
>                 perf_event_disable_inatomic(event);
> -               event->pmu->stop(event, 0);
> +
> +               /*
> +                * The cpu-clock and task-clock are two special SW events,
> +                * which rely on the hrtimer. The __perf_event_overflow()
> +                * is invoked from the hrtimer handler for these 2 events.
> +                * Avoid to call event_stop()->hrtimer_cancel() for these
> +                * 2 events since hrtimer_cancel() waits for the hrtimer
> +                * handler to finish, which would trigger a deadlock.
> +                * Only disabling the events is enough to stop the hrtimer.
> +                * See perf_swevent_cancel_hrtimer().
> +                */
> +               if (event->attr.config != PERF_COUNT_SW_CPU_CLOCK &&
> +                   event->attr.config != PERF_COUNT_SW_TASK_CLOCK)
> +                       event->pmu->stop(event, 0);
This is broken though; you cannot test config without first knowing which PMU you're dealing with.
Also, that timer really should get stopped; we can't know for certain whether this overflow is for the timer itself or for a related event.
Something like the below might do -- but please carefully consider the cases where hrtimer_try_to_cancel() might fail; in those cases we'll have set HES_STOPPED and the hrtimer callback *SHOULD* observe this and NORESTART.
But I didn't check all the details.
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 820127536e62..a91481d57841 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11756,7 +11756,8 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)

        event = container_of(hrtimer, struct perf_event, hw.hrtimer);

-       if (event->state != PERF_EVENT_STATE_ACTIVE)
+       if (event->state != PERF_EVENT_STATE_ACTIVE ||
+           event->hw.state & PERF_HES_STOPPED)
                return HRTIMER_NORESTART;

        event->pmu->read(event);
@@ -11810,7 +11811,7 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
                ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
                local64_set(&hwc->period_left, ktime_to_ns(remaining));

-               hrtimer_cancel(&hwc->hrtimer);
+               hrtimer_try_to_cancel(&hwc->hrtimer);
        }
 }

@@ -11854,12 +11855,14 @@ static void cpu_clock_event_update(struct perf_event *event)

 static void cpu_clock_event_start(struct perf_event *event, int flags)
 {
+       event->hw.state = 0;
        local64_set(&event->hw.prev_count, local_clock());
        perf_swevent_start_hrtimer(event);
 }

 static void cpu_clock_event_stop(struct perf_event *event, int flags)
 {
+       event->hw.state = PERF_HES_STOPPED;
        perf_swevent_cancel_hrtimer(event);
        if (flags & PERF_EF_UPDATE)
                cpu_clock_event_update(event);
On 10/13/2025 4:05 PM, Peter Zijlstra wrote:
> On Mon, Oct 13, 2025 at 10:34:27AM +0800, Mi, Dapeng wrote:
>> [...]
> This is broken though; you cannot test config without first knowing which PMU you're dealing with.
Ah, yes. Just ignore this.
> Also, that timer really should get stopped; we can't know for certain whether this overflow is for the timer itself or for a related event.
>
> Something like the below might do -- but please carefully consider the cases where hrtimer_try_to_cancel() might fail; in those cases we'll have set HES_STOPPED and the hrtimer callback *SHOULD* observe this and NORESTART.
>
> But I didn't check all the details.
The only reason hrtimer_try_to_cancel() could fail is that the hrtimer callback is currently executing, so the current change should be fine.
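For reference, a small sketch of how the hrtimer_try_to_cancel() return values line up with that reasoning (illustrative only, not part of the posted patch; the helper function is made up for the example):

/*
 * Return values follow the hrtimer API: 1 = timer was queued and has been
 * removed, 0 = timer was not queued at all, -1 = the callback is running
 * right now and cannot be waited for from this context.
 */
static void stop_sampling_hrtimer(struct hw_perf_event *hwc)
{
        /* PERF_HES_STOPPED is assumed to have been published already. */
        switch (hrtimer_try_to_cancel(&hwc->hrtimer)) {
        case 1:         /* pending timer removed; it will not fire again */
        case 0:         /* timer was not queued at all */
                break;
        case -1:
                /*
                 * The callback is executing concurrently; it will observe
                 * PERF_HES_STOPPED and return HRTIMER_NORESTART on its own,
                 * so there is nothing to wait for here.
                 */
                break;
        }
}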
> [...]
Besides cpu-clock, task-clock needs a similar change as well. I will post a complete change later.