On Thu, Apr 24, 2014 at 9:33 AM, Ashwin Chaugule ashwin.chaugule@linaro.org wrote:
On 18 April 2014 11:21, Ashwin Chaugule ashwin.chaugule@linaro.org wrote:
Hi Mark,
On 17 April 2014 15:50, Mark Rutland mark.rutland@arm.com wrote:
On Thu, Apr 17, 2014 at 08:15:46PM +0100, Ashwin Chaugule wrote:
PSCIv0.2 adds a new function called AFFINITY_INFO, which can be used to query if a specified CPU has actually gone offline. Calling this function via cpu_kill ensures that a CPU has quiesced after a call to cpu_die.
[...]
We can race with the dying CPU here -- if we call AFFINITY_INFO before the dying cpu is sufficiently far through its CPU_OFF call it won't register as OFF.
Could we poll here instead (with a reasonable limit on the number of iterations)? That would enable us to not spuriously declare a CPU to be dead when it happened to take slightly longer than we expect to turn off.
True. How about something like this?
int __ref psci_cpu_kill(unsigned int cpu) {
int err;
int err, retries; if (!psci_ops.affinity_info) return 1;
/*
* cpu_kill could race with cpu_die and we can
* potentially end up declaring this cpu undead
* while it is dying. So retry a couple of times.
*/
+retry: err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
if (err != PSCI_AFFINITY_INFO_RET_OFF) {
if (++retries < 3) {
pr_info("Retrying check for CPU kill: %d\n", retries);
goto retry;
} pr_err("psci: Cannot kill CPU:%d, psci ret val: %d\n", cpu, err); /* Make platform_cpu_kill() fail. */
Hi Rob, I've already got your Reviewed-by on this patch without this "retry" thing. Are you okay with this as well? I can then roll it up in one patch.
Yes. My only comment is I would perhaps add a sleep (or delay if this context cannot sleep) on the retry. I'm not sure what I reasonable time would be, but at least then you are waiting a defined amount of time versus how long it takes this code to execute.
Rob