On Fri, Jul 20, 2018 at 4:52 AM Peter Zijlstra <peterz@infradead.org> wrote:
On Thu, Jul 19, 2018 at 12:12:53PM -0700, Cong Wang wrote:
hrtimer_cancel() busy-waits for the hrtimer callback to finish, much like del_timer_sync(). This creates a possible deadlock: we hold a spinlock while calling hrtimer_cancel(), and the callback tries to acquire the same spinlock.
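To make the scenario concrete, here is a minimal userspace sketch using pthreads. It is not kernel code: cancel_sync() and timer_callback() are hypothetical stand-ins for hrtimer_cancel() and the hrtimer callback, and the callback uses trylock so that the demo reports the problem instead of hanging.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool callback_running;
static atomic_bool would_deadlock;

/* Stand-in for the timer callback, which wants the same lock. */
static void *timer_callback(void *arg)
{
	(void)arg;
	/* A real callback would block (or spin) here forever. */
	if (pthread_mutex_trylock(&lock) != 0)
		atomic_store(&would_deadlock, true);
	else
		pthread_mutex_unlock(&lock);
	atomic_store(&callback_running, false);
	return NULL;
}

/* Stand-in for hrtimer_cancel(): busy-wait until the callback is done. */
static void cancel_sync(void)
{
	while (atomic_load(&callback_running))
		;
}

/* Returns true if the callback found the lock held by the canceller. */
bool demo_cancel_under_lock(void)
{
	pthread_t t;

	atomic_store(&would_deadlock, false);
	atomic_store(&callback_running, true);

	pthread_mutex_lock(&lock);	/* caller holds the lock ... */
	if (pthread_create(&t, NULL, timer_callback, NULL) != 0) {
		atomic_store(&callback_running, false);
		pthread_mutex_unlock(&lock);
		return false;
	}
	cancel_sync();			/* ... and waits for the callback */
	pthread_mutex_unlock(&lock);

	pthread_join(t, NULL);
	assert(!atomic_load(&callback_running));
	return atomic_load(&would_deadlock);
}
```

Because the callback only starts after the lock is taken, demo_cancel_under_lock() always sees the contended case. With a blocking lock in the callback instead of trylock, neither side would make progress.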
Has this actually been observed?
Without lockdep annotation, it is not easy to observe.
cpu_clock_event_init():
  perf_swevent_init_hrtimer():
    hwc->hrtimer.function = perf_swevent_hrtimer;

perf_swevent_hrtimer():
  __perf_event_overflow():
    __perf_event_account_interrupt():
      perf_adjust_period():
        pmu->stop():
          cpu_clock_event_stop():
            perf_swevent_cancel():
              hrtimer_cancel()
Please explain how a hrtimer event ever gets to perf_adjust_period(). Last I checked perf_swevent_init_hrtimer() results in attr.freq=0.
Good point.
I thought attr.freq was specified by user-space, but it seems perf_swevent_init_hrtimer() clears it on purpose and it does not change after initialization. Interesting...
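A small userspace model of that behavior, as I understand it. The struct and function names are hypothetical simplifications, not the kernel's: the real perf_event_attr keeps sample_freq and sample_period in a union, and the fixed freq-to-period mapping shown here is my reading of the init code, so treat it as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Simplified stand-in for the relevant perf_event_attr fields. */
struct attr_model {
	bool     freq;		/* true: sample_freq is in use */
	uint64_t sample_freq;	/* samples per second */
	uint64_t sample_period;	/* nanoseconds between samples */
};

/* Models the static freq->period mapping done once at init time. */
static void init_hrtimer_model(struct attr_model *attr)
{
	if (attr->freq) {
		assert(attr->sample_freq != 0);
		attr->sample_period = NSEC_PER_SEC / attr->sample_freq;
		attr->freq = false;	/* cleared, never set again */
	}
}

/* Models the overflow path: the period is only adjusted in freq mode. */
static bool overflow_would_adjust_period(const struct attr_model *attr)
{
	return attr->freq;
}
```

With freq cleared at init, the overflow path in this model never reaches the period adjustment, which is why the call chain above cannot end in hrtimer_cancel().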
Getting stuck in an hrtimer is a disaster:
You'll get NMI watchdog splats. Getting stuck in NMI context is far more 'interesting' :-)
Yes, I did see a stack trace in perf_swevent_hrtimer() which led me here. But I have to admit that among those hundreds of soft lockups, I only saw one showing a swevent hrtimer backtrace.
Previously I thought this was because of an NMI handler race, but Jiri pointed out that the race doesn't exist.
+#define PERF_EF_NO_WAIT 0x08 /* do not wait when stopping, for
* example, waiting for a timer
*/
That's a broken comment style.
It was picked by checkpatch.pl, not me. I chose a different one and got a complaint. :)
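For reference, the multi-line comment layout described in Documentation/process/coding-style.rst would look roughly like this for the flag above. The comment wording is my paraphrase of the hunk, not the patch's text, and placing the comment above the define is just one way to do it.

```c
/*
 * Do not wait when stopping; for example, do not wait for a
 * running timer callback to finish.
 */
#define PERF_EF_NO_WAIT	0x08
```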
Thanks!