Hi Chris,

Thanks for the information; it is very helpful. I will try the fix that is currently in review.

BTW, I think that if we revert commit bfc525947c5686d, we will instead hit a lot of WALT warning/bug issues.

Thanks,
GangWu
-----Original Message-----
From: eas-dev [mailto:eas-dev-bounces@lists.linaro.org] On Behalf Of Chris Redpath
Sent: February 25, 2019 20:07
To: Wu Gang(吴刚); eas-dev@lists.linaro.org
Subject: Re: [Eas-dev] one stall issue in migration_cpu_stop? Thanks!
Hi GangWu,
On 25/02/2019 06:24, Wu Gang(吴刚) wrote:
Hi all,

After I upgraded the kernel to AOSP with the following commit:

commit bfc525947c5686df850efb39c81aae3eb6a62ac3
Author: Ke Wang ke.wang@spreadtrum.com
Date: Mon Jan 21 13:41:45 2019 +0800

    ANDROID: sched/walt: Fix lockdep assert issue

    commit c8d50e061e38 ("ANDROID: DEBUG: Temporarily disable lockdep asserting on update_task_ravg") is a temporary commit to disable the lockdep assert in walt_update_task_ravg(). The root cause is that there are two paths entering here without holding the rq lock in the pure scheduler: one is move_queued_task(), another is detach_task(). Now fix this by making sure the rq lock is held at the two paths listed above as it did in android4.4.

I hit a hard lockup because a CPU is stalled in a spin lock (it seems to be the rq lock) with the following stack when running the hotplug stress test:
[ 4488.191067] c5 40 (migration/5) [<ffffff80080ef3fc>] move_queued_task+0x124/0x240
[ 4488.191079] c5 40 (migration/5) [<ffffff80080ef7f8>] __migrate_task+0xa0/0xe0
[ 4488.191090] c5 40 (migration/5) [<ffffff80080f0898>] migration_cpu_stop+0x104/0x114
[ 4488.191103] c5 40 (migration/5) [<ffffff800817c02c>] cpu_stopper_thread+0xbc/0x160
[ 4488.191117] c5 40 (migration/5) [<ffffff80080e55b4>] smpboot_thread_fn+0x1f0/0x280
[ 4488.191127] c5 40 (migration/5) [<ffffff80080e05f0>] kthread+0x10c/0x138
[ 4488.191139] c5 40 (migration/5) [<ffffff8008085780>] ret_from_fork+0x10/0x18
static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf,
                                   struct task_struct *p, int new_cpu)
{
        struct rq *new_rq = cpu_rq(new_cpu);

        lockdep_assert_held(&rq->lock);

        p->on_rq = TASK_ON_RQ_MIGRATING;
        dequeue_task(rq, p, DEQUEUE_NOCLOCK);
        rq_unpin_lock(rq, rf);
        double_lock_balance(rq, new_rq);
        set_task_cpu(p, new_cpu);
        double_unlock_balance(rq, new_rq);
        raw_spin_unlock(&rq->lock);

        rq = cpu_rq(new_cpu);

        rq_lock(rq, rf);
        BUG_ON(task_cpu(p) != new_cpu);
        enqueue_task(rq, p, 0);
        p->on_rq = TASK_ON_RQ_QUEUED;
        check_preempt_curr(rq, p, 0);

        return rq;
}
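For readers following the lock sequence in the function above, here is a minimal user-space sketch of the same handoff using pthread mutexes. It is only an illustration, not kernel code and not a diagnosis of the lockup: toy_rq, toy_task, toy_double_lock(), toy_double_unlock() and toy_move_task() are hypothetical names, and the address-ordered acquisition is an assumption about how a double-lock helper can avoid an ABBA deadlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-CPU runqueue; not the kernel's struct rq. */
struct toy_rq {
        pthread_mutex_t lock;
        int nr_running;
};

/* Hypothetical stand-in for a task; it only records its current CPU. */
struct toy_task {
        int cpu;
};

/*
 * Caller already holds 'held'. Also take 'other'. If the fast path fails and
 * 'other' sorts lower by address, drop 'held' and retake both in ascending
 * address order, so two CPUs locking the same pair cannot deadlock.
 */
static void toy_double_lock(struct toy_rq *held, struct toy_rq *other)
{
        if (held == other)
                return;
        if (pthread_mutex_trylock(&other->lock) == 0)
                return;
        if ((uintptr_t)other < (uintptr_t)held) {
                pthread_mutex_unlock(&held->lock);
                pthread_mutex_lock(&other->lock);
                pthread_mutex_lock(&held->lock);
        } else {
                pthread_mutex_lock(&other->lock);
        }
}

/* Drop only 'other'; the caller keeps holding 'held'. */
static void toy_double_unlock(struct toy_rq *held, struct toy_rq *other)
{
        if (held != other)
                pthread_mutex_unlock(&other->lock);
}

/*
 * Entry: caller holds rqs[p->cpu].lock.
 * Exit:  caller holds rqs[new_cpu].lock, which is returned.
 */
static struct toy_rq *toy_move_task(struct toy_rq *rqs, struct toy_task *p,
                                    int new_cpu)
{
        struct toy_rq *src = &rqs[p->cpu];
        struct toy_rq *dst = &rqs[new_cpu];

        assert(src->nr_running > 0);
        src->nr_running--;                 /* dequeue from the source */

        toy_double_lock(src, dst);
        p->cpu = new_cpu;                  /* changed with both locks held */
        toy_double_unlock(src, dst);       /* drops dst, src is still held */
        pthread_mutex_unlock(&src->lock);  /* now no lock is held */

        pthread_mutex_lock(&dst->lock);    /* retake the destination lock */
        dst->nr_running++;                 /* enqueue on the destination */
        return dst;
}
```

In this sketch there is a short window between releasing the source lock and taking the destination lock in which the thread holds neither; whether a comparable window matters for the reported stall is not something the sketch can show.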
Has anyone hit the same issue (or is this a known issue) with this latest kernel version? Is it possible that the newly merged patch “bfc525947c5686df850efb39c81aae3eb6a62ac3” causes this issue? (I did not hit this issue with the same test case before.)
Thanks for this. We see it as well but haven't fixed it yet.
There is a fix in progress which I will review today ( https://android-review.googlesource.com/c/kernel/common/+/911233 ), but in the meantime please revert bfc525947c5686d in your local branch so as not to hit this lockup.
Best Regards,
Chris