HI all, After I upgraded kernel to AOSP with following commit: commit bfc525947c5686df850efb39c81aae3eb6a62ac3 Author: Ke Wang ke.wang@spreadtrum.com Date: Mon Jan 21 13:41:45 2019 +0800
ANDROID: sched/walt: Fix lockdep assert issue
commit c8d50e061e38 ("ANDROID: DEBUG: Temporarily disable lockdep asserting on update_task_ravg") is a temporary commit to disable the lockdep assert in walt_update_task_ravg(). The root cause is that there are two paths enetering here without holding the rq lock in the pure scheduler: one is move_queued_task(), another is detach_task().
Now fix this by making sure the rq lock is held at the two paths listed above as it did in android4.4.
I met hard lockup issue because cpu is stalled in spin lock (seem rq_lock) with following stack when doing the hotplug stress test,:
[ 4488.191067] c5 40 (migration/5) [<ffffff80080ef3fc>] move_queued_task+0x124/0x240 [ 4488.191079] c5 40 (migration/5) [<ffffff80080ef7f8>] __migrate_task+0xa0/0xe0 [ 4488.191090] c5 40 (migration/5) [<ffffff80080f0898>] migration_cpu_stop+0x104/0x114 [ 4488.191103] c5 40 (migration/5) [<ffffff800817c02c>] cpu_stopper_thread+0xbc/0x160 [ 4488.191117] c5 40 (migration/5) [<ffffff80080e55b4>] smpboot_thread_fn+0x1f0/0x280 [ 4488.191127] c5 40 (migration/5) [<ffffff80080e05f0>] kthread+0x10c/0x138 [ 4488.191139] c5 40 (migration/5) [<ffffff8008085780>] ret_from_fork+0x10/0x18
static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf, struct task_struct *p, int new_cpu) { struct rq *new_rq = cpu_rq(new_cpu);
lockdep_assert_held(&rq->lock);
p->on_rq = TASK_ON_RQ_MIGRATING; dequeue_task(rq, p, DEQUEUE_NOCLOCK); rq_unpin_lock(rq, rf); double_lock_balance(rq, new_rq); set_task_cpu(p, new_cpu); double_unlock_balance(rq, new_rq); raw_spin_unlock(&rq->lock);
rq = cpu_rq(new_cpu);
rq_lock(rq, rf); BUG_ON(task_cpu(p) != new_cpu); enqueue_task(rq, p, 0); p->on_rq = TASK_ON_RQ_QUEUED; check_preempt_curr(rq, p, 0);
return rq; }
Did Anyone meet same issue as me (or this is a known issue) with this latest kernel version? Is it possible the new merged patch “bfc525947c5686df850efb39c81aae3eb6a62ac3” result in this issue? (now that I didn’t meet this issue with same test case before).
Thanks GangWu
Hi GangWu,
On 25/02/2019 06:24, Wu Gang(吴刚) wrote:
HI all, After I upgraded kernel to AOSP with following commit: commit bfc525947c5686df850efb39c81aae3eb6a62ac3 Author: Ke Wang ke.wang@spreadtrum.com Date: Mon Jan 21 13:41:45 2019 +0800
ANDROID: sched/walt: Fix lockdep assert issue commit c8d50e061e38 ("ANDROID: DEBUG: Temporarily disable lockdep asserting on update_task_ravg") is a temporary commit to disable the lockdep assert in walt_update_task_ravg(). The root cause is that there are two paths enetering here without holding the rq lock in the pure scheduler: one is move_queued_task(), another is detach_task(). Now fix this by making sure the rq lock is held at the two paths listed above as it did in android4.4.
I met hard lockup issue because cpu is stalled in spin lock (seem rq_lock) with following stack when doing the hotplug stress test,:
[ 4488.191067] c5 40 (migration/5) [<ffffff80080ef3fc>] move_queued_task+0x124/0x240 [ 4488.191079] c5 40 (migration/5) [<ffffff80080ef7f8>] __migrate_task+0xa0/0xe0 [ 4488.191090] c5 40 (migration/5) [<ffffff80080f0898>] migration_cpu_stop+0x104/0x114 [ 4488.191103] c5 40 (migration/5) [<ffffff800817c02c>] cpu_stopper_thread+0xbc/0x160 [ 4488.191117] c5 40 (migration/5) [<ffffff80080e55b4>] smpboot_thread_fn+0x1f0/0x280 [ 4488.191127] c5 40 (migration/5) [<ffffff80080e05f0>] kthread+0x10c/0x138 [ 4488.191139] c5 40 (migration/5) [<ffffff8008085780>] ret_from_fork+0x10/0x18
static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf, struct task_struct *p, int new_cpu) { struct rq *new_rq = cpu_rq(new_cpu);
lockdep_assert_held(&rq->lock); p->on_rq = TASK_ON_RQ_MIGRATING; dequeue_task(rq, p, DEQUEUE_NOCLOCK); rq_unpin_lock(rq, rf); double_lock_balance(rq, new_rq); set_task_cpu(p, new_cpu); double_unlock_balance(rq, new_rq); raw_spin_unlock(&rq->lock); rq = cpu_rq(new_cpu); rq_lock(rq, rf); BUG_ON(task_cpu(p) != new_cpu); enqueue_task(rq, p, 0); p->on_rq = TASK_ON_RQ_QUEUED; check_preempt_curr(rq, p, 0); return rq;
}
Did Anyone meet same issue as me (or this is a known issue) with this latest kernel version? Is it possible the new merged patch “bfc525947c5686df850efb39c81aae3eb6a62ac3” result in this issue? (now that I didn’t meet this issue with same test case before).
Thanks for this. We do see it as well but didn't yet fix it.
There is a fix in progress which I will review today ( https://android-review.googlesource.com/c/kernel/common/+/911233 ) but in the mean time please revert bfc525947c5686d in your local branch so as not to hit this lockup.
Best Regards,
Chris
Hi Chris, Thanks for your information. It is very helpful. I will try the "in review" fixing patch mentioned. BTW, I think we should meet a lot of WALT warning/bug issues if revert commit bfc525947c5686d on the other hand.
Thanks GangWu
-----Original Message----- From: eas-dev [mailto:eas-dev-bounces@lists.linaro.org] On Behalf Of Chris Redpath Sent: 2019年2月25日 20:07 To: Wu Gang(吴刚); eas-dev@lists.linaro.org Subject: Re: [Eas-dev] one stall issue in migration_cpu_stop? Thanks!
Hi GangWu,
On 25/02/2019 06:24, Wu Gang(吴刚) wrote:
HI all, After I upgraded kernel to AOSP with following commit: commit bfc525947c5686df850efb39c81aae3eb6a62ac3 Author: Ke Wang ke.wang@spreadtrum.com Date: Mon Jan 21 13:41:45 2019 +0800
ANDROID: sched/walt: Fix lockdep assert issue commit c8d50e061e38 ("ANDROID: DEBUG: Temporarily disable lockdep asserting on update_task_ravg") is a temporary commit to disable the lockdep assert in walt_update_task_ravg(). The root cause is that there are two paths enetering here without holding the rq lock in the pure scheduler: one is move_queued_task(), another is detach_task(). Now fix this by making sure the rq lock is held at the two paths listed above as it did in android4.4.
I met hard lockup issue because cpu is stalled in spin lock (seem rq_lock) with following stack when doing the hotplug stress test,:
[ 4488.191067] c5 40 (migration/5) [<ffffff80080ef3fc>] move_queued_task+0x124/0x240 [ 4488.191079] c5 40 (migration/5) [<ffffff80080ef7f8>] __migrate_task+0xa0/0xe0 [ 4488.191090] c5 40 (migration/5) [<ffffff80080f0898>] migration_cpu_stop+0x104/0x114 [ 4488.191103] c5 40 (migration/5) [<ffffff800817c02c>] cpu_stopper_thread+0xbc/0x160 [ 4488.191117] c5 40 (migration/5) [<ffffff80080e55b4>] smpboot_thread_fn+0x1f0/0x280 [ 4488.191127] c5 40 (migration/5) [<ffffff80080e05f0>] kthread+0x10c/0x138 [ 4488.191139] c5 40 (migration/5) [<ffffff8008085780>] ret_from_fork+0x10/0x18
static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf, struct task_struct *p, int new_cpu) { struct rq *new_rq = cpu_rq(new_cpu);
lockdep_assert_held(&rq->lock); p->on_rq = TASK_ON_RQ_MIGRATING; dequeue_task(rq, p, DEQUEUE_NOCLOCK); rq_unpin_lock(rq, rf); double_lock_balance(rq, new_rq); set_task_cpu(p, new_cpu); double_unlock_balance(rq, new_rq); raw_spin_unlock(&rq->lock); rq = cpu_rq(new_cpu); rq_lock(rq, rf); BUG_ON(task_cpu(p) != new_cpu); enqueue_task(rq, p, 0); p->on_rq = TASK_ON_RQ_QUEUED; check_preempt_curr(rq, p, 0); return rq;
}
Did Anyone meet same issue as me (or this is a known issue) with this latest kernel version? Is it possible the new merged patch “bfc525947c5686df850efb39c81aae3eb6a62ac3” result in this issue? (now that I didn’t meet this issue with same test case before).
Thanks for this. We do see it as well but didn't yet fix it.
There is a fix in progress which I will review today ( https://android-review.googlesource.com/c/kernel/common/+/911233 ) but in the mean time please revert bfc525947c5686d in your local branch so as not to hit this lockup.
Best Regards,
Chris
_______________________________________________ eas-dev mailing list eas-dev@lists.linaro.org https://lists.linaro.org/mailman/listinfo/eas-dev
Hi GangWu,
On 26/02/2019 02:43, Wu Gang(吴刚) wrote:
Hi Chris, Thanks for your information. It is very helpful. I will try the "in review" fixing patch mentioned. BTW, I think we should meet a lot of WALT warning/bug issues if revert commit bfc525947c5686d on the other hand.
The fix from Ke Wang is now merged - what it does is replace an unconditional unlock of the original rq with a double_rq_unlock call which is able to cope in the case where both rq pointers are the same. This means that the rqs should now be locked every time we call walt_fixup_busy_time.
Please let me know if this does not solve your issue!
Best Regards,
Chris
Thanks GangWu
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Hi Chris, Thanks! I have applied this fixing patch (yesterday before you merged) and tried in our platform. Didn't found this issue by now. This fix works. Best Regards GangWu
-----Original Message----- From: Chris Redpath [mailto:Chris.Redpath@arm.com] Sent: 2019年2月26日 18:58 To: Wu Gang(吴刚); eas-dev@lists.linaro.org; ke.wang@spreadtrum.com Subject: Re: [Eas-dev] one stall issue in migration_cpu_stop? Thanks!
Hi GangWu,
On 26/02/2019 02:43, Wu Gang(吴刚) wrote:
Hi Chris, Thanks for your information. It is very helpful. I will try the "in review" fixing patch mentioned. BTW, I think we should meet a lot of WALT warning/bug issues if revert commit bfc525947c5686d on the other hand.
The fix from Ke Wang is now merged - what it does is replace an unconditional unlock of the original rq with a double_rq_unlock call which is able to cope in the case where both rq pointers are the same. This means that the rqs should now be locked every time we call walt_fixup_busy_time.
Please let me know if this does not solve your issue!
Best Regards,
Chris
Thanks GangWu
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.