Hello,
On 11/15/2016 08:55 AM, Morten Rasmussen wrote:
On Tue, Nov 15, 2016 at 03:22:59PM +0000, Patrick Bellasi wrote:
On 15-Nov 22:05, Leo Yan wrote:
But this code is deliberately written for WALT, to update the source rq and destination rq workload statistics. So for now I can simply revert double_lock_balance()/double_unlock_balance() when only PELT signals are used, but for WALT I would like some suggestions for a fix. If we confirm this is a potential issue, it should exist on both Android common kernel 3.18 and 4.4.
AFAIK, in the (simplified) version of WALT posted on LKML before LPC [2], Vikram got rid of all the double locking. If the problem you reported is verified, then we should probably try to update AOSP WALT to use the same locking scheme as the LKML posting.
[1] http://www2.comp.ufscar.br/lxr/source/Documentation/RCU/RTFP.txt#L176
[2] https://lkml.org/lkml/2016/10/28/84
Cheers,
Patrick
/*
 * detach_task() -- detach the task for the migration specified in env
 */
static void detach_task(struct task_struct *p, struct lb_env *env)
{
	lockdep_assert_held(&env->src_rq->lock);

	deactivate_task(env->src_rq, p, 0);
	p->on_rq = TASK_ON_RQ_MIGRATING;
	double_lock_balance(env->src_rq, env->dst_rq);
	set_task_cpu(p, env->dst_cpu);
	double_unlock_balance(env->src_rq, env->dst_rq);
}
I'm a bit worried that this unlocking business as part of the double-locking in detach_task() is not safe.
If the code looks anything like mainline in detach_tasks() (notice the s), I don't think it is safe.
I do agree that double_lock_balance() is fragile, in that new code - such as Leo's - may (rightfully) never consider that the rq lock is being dropped. The LKML implementation that Patrick pointed out does indeed eliminate double locking and should be quite easy to port to WALT on android-common/LSK. I don't think there's a good intermediate solution otherwise (sorry Leo!). We should move over to that implementation wherever WALT is being used with EAS for upcoming development, so that at the very least we do not have to think about the lock being dropped.
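To make the concern concrete, here is a minimal userspace sketch, assuming pthread mutexes stand in for rq locks. The names toy_rq, toy_double_lock() and toy_move_one() are hypothetical and exist only for illustration. The pattern is loosely modelled on how a double-lock helper can behave when the second lock is contended; it is not the kernel code. The point is that the caller's lock may be released and re-taken inside the helper, so anything the caller checked beforehand has to be checked again.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a runqueue: a lock plus one piece of state. */
struct toy_rq {
	pthread_mutex_t lock;
	int nr_running;
};

/*
 * Caller holds this_rq->lock. Takes busiest->lock as well.
 * Returns 1 if this_rq->lock was dropped and re-acquired on the way
 * (assumption: locks are ordered by address to avoid ABBA deadlock).
 */
static int toy_double_lock(struct toy_rq *this_rq, struct toy_rq *busiest)
{
	/* Fast path: second lock is free, nothing is dropped. */
	if (pthread_mutex_trylock(&busiest->lock) == 0)
		return 0;

	if ((uintptr_t)busiest < (uintptr_t)this_rq) {
		/* Wrong order: release ours, then take both in address order. */
		pthread_mutex_unlock(&this_rq->lock);
		pthread_mutex_lock(&busiest->lock);
		pthread_mutex_lock(&this_rq->lock);
		return 1;
	}

	/* Right order already: just wait for the second lock. */
	pthread_mutex_lock(&busiest->lock);
	return 0;
}

/*
 * Caller holds src->lock on entry and still holds it on return.
 * Returns 1 if one unit of load was moved from src to dst.
 */
static int toy_move_one(struct toy_rq *src, struct toy_rq *dst)
{
	int moved = 0;

	toy_double_lock(src, dst);

	/*
	 * src->lock may have been dropped above, so src->nr_running is
	 * re-read here rather than trusting a value seen before the call.
	 */
	if (src->nr_running > 0) {
		src->nr_running--;
		dst->nr_running++;
		moved = 1;
	}

	/* Only the second lock is released; src->lock stays held. */
	pthread_mutex_unlock(&dst->lock);
	return moved;
}
```

Code that calls toy_double_lock() without the re-check is exactly the kind of new code that never considers the lock being dropped.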
For 3.18/4.4, the code paths involved were audited when the double locking was added, back when we first upgraded to 3.18 in 2014. Given the sheer number of hours of commercial testing, I daresay that it has so far been safe, from a stability perspective, to drop the rq lock with both the HMP scheduler and the EAS scheduler (I think because the lock is dropped after the task is dequeued in all cases). Neither scheduler messes with these LB code paths to an extent that would mask a potential problem. So android/common kernel-3.18/kernel-4.4 should be safe for now.
Morten
_______________________________________________
eas-dev mailing list
eas-dev@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/eas-dev