Hi GangWu,
On 25/02/2019 06:24, Wu Gang(吴刚) wrote:
> Hi all,
>
> After I upgraded the kernel to AOSP with the following commit:
>
> commit bfc525947c5686df850efb39c81aae3eb6a62ac3
> Author: Ke Wang <ke.wang@spreadtrum.com>
> Date:   Mon Jan 21 13:41:45 2019 +0800
>
>     ANDROID: sched/walt: Fix lockdep assert issue
>
>     commit c8d50e061e38 ("ANDROID: DEBUG: Temporarily disable lockdep
>     asserting on update_task_ravg") is a temporary commit to disable the
>     lockdep assert in walt_update_task_ravg(). The root cause is that
>     there are two paths entering here without holding the rq lock in the
>     pure scheduler: one is move_queued_task(), another is detach_task().
>     Now fix this by making sure the rq lock is held at the two paths
>     listed above as it did in android4.4.
>
> I hit a hard lockup during the hotplug stress test. A CPU is stalled on
> a spinlock (it seems to be the rq lock) with the following stack:
>
> [ 4488.191067] c5 40 (migration/5) [<ffffff80080ef3fc>] move_queued_task+0x124/0x240
> [ 4488.191079] c5 40 (migration/5) [<ffffff80080ef7f8>] __migrate_task+0xa0/0xe0
> [ 4488.191090] c5 40 (migration/5) [<ffffff80080f0898>] migration_cpu_stop+0x104/0x114
> [ 4488.191103] c5 40 (migration/5) [<ffffff800817c02c>] cpu_stopper_thread+0xbc/0x160
> [ 4488.191117] c5 40 (migration/5) [<ffffff80080e55b4>] smpboot_thread_fn+0x1f0/0x280
> [ 4488.191127] c5 40 (migration/5) [<ffffff80080e05f0>] kthread+0x10c/0x138
> [ 4488.191139] c5 40 (migration/5) [<ffffff8008085780>] ret_from_fork+0x10/0x18
>
> static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf,
>                                    struct task_struct *p, int new_cpu)
> {
>         struct rq *new_rq = cpu_rq(new_cpu);
>
>         lockdep_assert_held(&rq->lock);
>
>         p->on_rq = TASK_ON_RQ_MIGRATING;
>         dequeue_task(rq, p, DEQUEUE_NOCLOCK);
>         rq_unpin_lock(rq, rf);
>         double_lock_balance(rq, new_rq);
>         set_task_cpu(p, new_cpu);
>         double_unlock_balance(rq, new_rq);
>         raw_spin_unlock(&rq->lock);
>
>         rq = cpu_rq(new_cpu);
>
>         rq_lock(rq, rf);
>         BUG_ON(task_cpu(p) != new_cpu);
>         enqueue_task(rq, p, 0);
>         p->on_rq = TASK_ON_RQ_QUEUED;
>         check_preempt_curr(rq, p, 0);
>
>         return rq;
> }
>
> Has anyone else hit this issue with the latest kernel version, or is it
> a known issue? Could the newly merged patch
> "bfc525947c5686df850efb39c81aae3eb6a62ac3" be causing it? I did not hit
> this issue with the same test case before.

Thanks for this. We see it as well but haven't fixed it yet.
There is a fix in progress which I will review today (https://android-review.googlesource.com/c/kernel/common/+/911233), but in the meantime please revert bfc525947c5686d in your local branch so that you don't hit this lockup.
Best Regards,
Chris