Hi Andy,
Andy Green <andy.green@linaro.org> writes:
Hi -
Yesterday I studied suspend status on 3.0 kernel for Panda, for mem suspend it's working pretty well in Ubuntu case, desktop is coming back up woken by USB keyboard action, WLAN is workable after reassociation.
However in Android case, the same tree merged with common-android-3.0 to get Androidization is blowing chunks in suspend / resume, entering a loop where it aborts suspend and then tries to suspend again all the time.
I've been hit by the same problem when I was trying to implement hibernation on Panda.
Increasing the debug level in wakelock code shows at least two guys that can make trouble, locks "mmc_delayed_work" and "alarm_rtc".
"mmc_delayed_work" causes the wakelock code to return -EAGAIN, and "alarm_rtc" seems to time out as a wakelock, but leaves the alarm device in a state where it will abort suspend with -EBUSY.
I took a look in drivers/mmc/core/core.c to see what the wakelock support patches had done there and was a bit surprised.
They have a single wakelock to cover delayed work in there, however there are multiple delayed works possible to be queued, eg delayed disable and delayed detect actions, and although they wrap scheduling the delayed work to also lock the wakelock, they don't wrap cancelling it, eg -->
Yes, they unlock the wakelock when the delayed work is done, in the handler.
So here when preparing for suspend we can cancel the delayed work we presumably arranged wakelock coverage for, without unlocking the wakelock.
When the delayed work was canceled and handled immediately the wakelock should have been unlocked.
The problem is here:
    int mmc_pm_notify(struct notifier_block *notify_block,
                      unsigned long mode, void *unused)
    {
        ...
        switch (mode) {
        case PM_HIBERNATION_PREPARE:
        case PM_SUSPEND_PREPARE:
            ...
            mmc_release_host(host);
Which calls
    void mmc_release_host(struct mmc_host *host)
    {
        ...
        mmc_host_lazy_disable(host);
        ...
    }
Since omap_hsmmc has a non-zero host->disable_delay by default, this schedules a new delayed work and thus takes the wakelock again.
I tried to fix this by ignoring host->disable_delay when host->rescan_disable is non-zero, which means we are in the suspend phase, and it worked.
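To illustrate the idea, here is a rough userspace model of that check. It is not the actual patch and not kernel code: struct model_host and all the model_* helpers are made up for illustration, and the wakelock is reduced to a plain counter. Only the disable_delay and rescan_disable field names are taken from the real struct mmc_host.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant parts of struct mmc_host. */
struct model_host {
    unsigned int disable_delay;   /* non-zero by default on omap_hsmmc */
    int rescan_disable;           /* non-zero once suspend prepare has run */
    bool disable_work_pending;    /* models the queued delayed disable work */
    int wakelock_count;           /* models the "mmc_delayed_work" wakelock */
    bool enabled;                 /* models the host being powered */
};

/* Models queuing the delayed disable, which also takes the wakelock. */
static void model_schedule_disable(struct model_host *host)
{
    host->disable_work_pending = true;
    host->wakelock_count++;
}

/* Models disabling the host right away, with no delayed work involved. */
static void model_disable_now(struct model_host *host)
{
    host->enabled = false;
}

/*
 * Models the lazy-disable step reached from mmc_release_host().
 * The extra rescan_disable test is the assumed shape of the fix:
 * during suspend the delay is ignored, so no new wakelock is taken.
 */
static void model_lazy_disable(struct model_host *host)
{
    assert(host);

    if (host->disable_delay && !host->rescan_disable) {
        model_schedule_disable(host);
        return;
    }
    model_disable_now(host);
}

/* Suspend is only allowed to proceed while no wakelock is held. */
static bool model_can_suspend(const struct model_host *host)
{
    return host->wakelock_count == 0;
}
```

With rescan_disable left at zero, model_lazy_disable() takes the wakelock and model_can_suspend() returns false, which is the abort-and-retry loop described above. With rescan_disable set, the host is disabled immediately and suspend can go ahead.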