Hi Chris, Thanks for your information. It is very helpful. I will try the "in review" fixing patch mentioned. BTW, I think we should meet a lot of WALT warning/bug issues if revert commit bfc525947c5686d on the other hand.
Thanks GangWu
-----Original Message----- From: eas-dev [mailto:eas-dev-bounces@lists.linaro.org] On Behalf Of Chris Redpath Sent: 2019年2月25日 20:07 To: Wu Gang(吴刚); eas-dev@lists.linaro.org Subject: Re: [Eas-dev] one stall issue in migration_cpu_stop? Thanks!
Hi GangWu,
On 25/02/2019 06:24, Wu Gang(吴刚) wrote:
HI all, After I upgraded kernel to AOSP with following commit: commit bfc525947c5686df850efb39c81aae3eb6a62ac3 Author: Ke Wang ke.wang@spreadtrum.com Date: Mon Jan 21 13:41:45 2019 +0800
ANDROID: sched/walt: Fix lockdep assert issue commit c8d50e061e38 ("ANDROID: DEBUG: Temporarily disable lockdep asserting on update_task_ravg") is a temporary commit to disable the lockdep assert in walt_update_task_ravg(). The root cause is that there are two paths enetering here without holding the rq lock in the pure scheduler: one is move_queued_task(), another is detach_task(). Now fix this by making sure the rq lock is held at the two paths listed above as it did in android4.4.
I met hard lockup issue because cpu is stalled in spin lock (seem rq_lock) with following stack when doing the hotplug stress test,:
[ 4488.191067] c5 40 (migration/5) [<ffffff80080ef3fc>] move_queued_task+0x124/0x240 [ 4488.191079] c5 40 (migration/5) [<ffffff80080ef7f8>] __migrate_task+0xa0/0xe0 [ 4488.191090] c5 40 (migration/5) [<ffffff80080f0898>] migration_cpu_stop+0x104/0x114 [ 4488.191103] c5 40 (migration/5) [<ffffff800817c02c>] cpu_stopper_thread+0xbc/0x160 [ 4488.191117] c5 40 (migration/5) [<ffffff80080e55b4>] smpboot_thread_fn+0x1f0/0x280 [ 4488.191127] c5 40 (migration/5) [<ffffff80080e05f0>] kthread+0x10c/0x138 [ 4488.191139] c5 40 (migration/5) [<ffffff8008085780>] ret_from_fork+0x10/0x18
static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf, struct task_struct *p, int new_cpu) { struct rq *new_rq = cpu_rq(new_cpu);
lockdep_assert_held(&rq->lock); p->on_rq = TASK_ON_RQ_MIGRATING; dequeue_task(rq, p, DEQUEUE_NOCLOCK); rq_unpin_lock(rq, rf); double_lock_balance(rq, new_rq); set_task_cpu(p, new_cpu); double_unlock_balance(rq, new_rq); raw_spin_unlock(&rq->lock); rq = cpu_rq(new_cpu); rq_lock(rq, rf); BUG_ON(task_cpu(p) != new_cpu); enqueue_task(rq, p, 0); p->on_rq = TASK_ON_RQ_QUEUED; check_preempt_curr(rq, p, 0); return rq;
}
Did Anyone meet same issue as me (or this is a known issue) with this latest kernel version? Is it possible the new merged patch “bfc525947c5686df850efb39c81aae3eb6a62ac3” result in this issue? (now that I didn’t meet this issue with same test case before).
Thanks for this. We do see it as well but didn't yet fix it.
There is a fix in progress which I will review today ( https://android-review.googlesource.com/c/kernel/common/+/911233 ) but in the mean time please revert bfc525947c5686d in your local branch so as not to hit this lockup.
Best Regards,
Chris
_______________________________________________ eas-dev mailing list eas-dev@lists.linaro.org https://lists.linaro.org/mailman/listinfo/eas-dev