Re: [PATCH V2 5/6] cpufreq: governor: replace per-cpu delayed work with timers

8 Dec 2015


      On Tuesday, December 08, 2015 12:26:22 PM Viresh Kumar wrote:
...
On 07-12-15, 23:43, Rafael J. Wysocki wrote:
...
On Monday, December 07, 2015 01:20:27 PM Viresh Kumar wrote:
...
...
At this point we might end up decrementing skip_work from
gov_cancel_work() and then cancel the work which we haven't queued
yet. And the end result will be that the work is still queued while
gov_cancel_work() has finished.
I'm not quite sure how that can happen.
I will describe that towards the end of this email.
...
There is a bug in this code snippet, but it may cause us to fail to queue
the work at all, so the incrementation and the check need to be done
under the spinlock.
What bug ?
Well, if the timer function runs on all CPUs at the same time, they all
can see skip_work > 1 and none of them will queue the work.
...
...
...
And we have to keep the atomic operation, as well as queue_work()
within the lock.
Putting queue_work() under the lock doesn't prevent any races from happening,
Then I am not able to think about it properly, but I will at least
present my case here :)
...
because only one of the CPUs can execute that part of the function anyway.
...
...
queue_work(system_wq, &shared->work);


and the remaining incrementation and decrementation of skip_work are replaced
with the corresponding atomic operations, it still should work, no?
Well, no, the above wouldn't work.
But what about something like this instead:
if (atomic_inc_return(&shared->skip_work) > 1)
   	atomic_dec(&shared->skip_work);
   else
   	queue_work(system_wq, &shared->work);
(plus the changes requisite replacements in the other places)?
Only one CPU can see the result of the atomic_inc_return() as 1 and this is the
only one that will queue up the work item, unless I'm missing anything super
subtle.
Looks like you are talking about the race between different timer
handlers, which race against queuing the work. Sorry if you are not.
But I am not talking about that thing..
Suppose queue_work() isn't done within the spin lock.
CPU0                                            CPU1
cpufreq_governor_stop()                         dbs_timer_handler()
-> gov_cancel_work()                            -> lock
                                                -> shared->skip_work++, as skip_work was 0. //skip_work=1
                                                -> unlock
   -> lock
   -> shared->skip_work++; //skip_work=2
   -> unlock
   -> cancel_work_sync(&shared->work);
                                                -> queue_work();
   -> gov_cancel_timers(shared->policy);
   -> shared->skip_work = 0;
                                                dbs_work_handler();
And according to how I understand it, we are screwed up at this point.
And its the same old bug which I fixed recently (which we hacked up by
using gov-lock earlier).
You are right, I've overlooked that race (but then it is rather easy to
overlook).
Thanks,
Rafael

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Re: [PATCH V2 5/6] cpufreq: governor: replace per-cpu delayed work with timers