On 08-12-15, 14:30, Rafael J. Wysocki wrote:
OK, but instead of relying on the spinlock to wait for the already running
That's the purpose of the spinlock, not a side-effect.
dbs_timer_handler() in gov_cancel_work() (which is really easy to overlook and should at least be mentioned in a comment) we can wait for it explicitly.
I agree, and I will add explicit comment about it.
That is, if the relevant code in gov_cancel_work() is like this:
atomic_inc(&shared->skip_work); gov_cancel_timers(shared->policy); cancel_work_sync(&shared->work); gov_cancel_timers(shared->policy);
Apart from it being *really* ugly (we should know exactly what should be done, it rather looks like hit and try), it is still racy.
atomic_set(&shared->skip_work, 0);
then the work item should not be leaked behind the cancel_work_sync() any more AFAICS.
Suppose queue_work() isn't done within the spin lock.
CPU0 CPU1
cpufreq_governor_stop() dbs_timer_handler() -> gov_cancel_work() -> lock -> shared->skip_work++, as skip_work was 0. //skip_work=1 -> unlock -> lock -> shared->skip_work++; //skip_work=2 -> unlock -> queue_work(); -> gov_cancel_timers(shared->policy); dbs_work_handler(); -> queue-timers again (as we aren't checking skip_work here) -> cancel_work_sync(&shared->work); dbs_timer_handler() -> lock -> shared->skip_work++, as skip_work was 0. //skip_work=1 -> unlock ->queue_work() -> gov_cancel_timers(shared->policy); -> shared->skip_work = 0;
And we have the same situation again. I have thought of all this before I wrote the initial patch, and really tried the ugly double timer-cancel thing. But the current approach is really the right thing to do.
I will send a patch adding the comment.