Hi all,

After I upgraded the kernel to AOSP with the following commit:

commit bfc525947c5686df850efb39c81aae3eb6a62ac3
Author: Ke Wang ke.wang@spreadtrum.com
Date: Mon Jan 21 13:41:45 2019 +0800
ANDROID: sched/walt: Fix lockdep assert issue
commit c8d50e061e38 ("ANDROID: DEBUG: Temporarily disable lockdep asserting on update_task_ravg") is a temporary commit to disable the lockdep assert in walt_update_task_ravg(). The root cause is that there are two paths entering here without holding the rq lock in the pure scheduler: one is move_queued_task(), another is detach_task().
Now fix this by making sure the rq lock is held at the two paths listed above as it did in android4.4.
I hit a hard lockup: the CPU stalls on a spinlock (it seems to be the rq lock) with the following stack while running the hotplug stress test:
[ 4488.191067] c5 40 (migration/5) [<ffffff80080ef3fc>] move_queued_task+0x124/0x240
[ 4488.191079] c5 40 (migration/5) [<ffffff80080ef7f8>] __migrate_task+0xa0/0xe0
[ 4488.191090] c5 40 (migration/5) [<ffffff80080f0898>] migration_cpu_stop+0x104/0x114
[ 4488.191103] c5 40 (migration/5) [<ffffff800817c02c>] cpu_stopper_thread+0xbc/0x160
[ 4488.191117] c5 40 (migration/5) [<ffffff80080e55b4>] smpboot_thread_fn+0x1f0/0x280
[ 4488.191127] c5 40 (migration/5) [<ffffff80080e05f0>] kthread+0x10c/0x138
[ 4488.191139] c5 40 (migration/5) [<ffffff8008085780>] ret_from_fork+0x10/0x18
static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf,
                                   struct task_struct *p, int new_cpu)
{
        struct rq *new_rq = cpu_rq(new_cpu);

        lockdep_assert_held(&rq->lock);

        p->on_rq = TASK_ON_RQ_MIGRATING;
        dequeue_task(rq, p, DEQUEUE_NOCLOCK);
        rq_unpin_lock(rq, rf);
        double_lock_balance(rq, new_rq);
        set_task_cpu(p, new_cpu);
        double_unlock_balance(rq, new_rq);
        raw_spin_unlock(&rq->lock);

        rq = cpu_rq(new_cpu);

        rq_lock(rq, rf);
        BUG_ON(task_cpu(p) != new_cpu);
        enqueue_task(rq, p, 0);
        p->on_rq = TASK_ON_RQ_QUEUED;
        check_preempt_curr(rq, p, 0);

        return rq;
}
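To make the lock sequence in move_queued_task() easier to reason about, here is a minimal userspace sketch, assuming pthread mutexes stand in for rq->lock. All model_* names are hypothetical and are not kernel APIs. The trylock-then-address-ordered fallback reflects my understanding of double_lock_balance() without CONFIG_PREEMPT, so treat it as an illustration rather than the kernel implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for struct rq and struct task_struct. */
struct model_rq {
	pthread_mutex_t lock;
	int cpu;
	int nr_running;
};

struct model_task {
	int cpu;
	int on_rq;	/* 0 = not queued, 1 = queued, 2 = migrating */
};

static void model_rq_init(struct model_rq *rq, int cpu)
{
	pthread_mutex_init(&rq->lock, NULL);
	rq->cpu = cpu;
	rq->nr_running = 0;
}

/*
 * Caller holds src->lock. On return both locks are held.
 * If dst is contended and has the lower address, src is dropped
 * and both are retaken in address order to avoid an ABBA deadlock.
 */
static void model_double_lock(struct model_rq *src, struct model_rq *dst)
{
	if (pthread_mutex_trylock(&dst->lock) == 0)
		return;

	if ((uintptr_t)dst < (uintptr_t)src) {
		pthread_mutex_unlock(&src->lock);
		pthread_mutex_lock(&dst->lock);
		pthread_mutex_lock(&src->lock);
	} else {
		pthread_mutex_lock(&dst->lock);
	}
}

/* Caller holds src->lock; returns dst with dst->lock held. */
static struct model_rq *model_move_queued_task(struct model_rq *src,
					       struct model_task *p,
					       struct model_rq *dst)
{
	p->on_rq = 2;			/* like TASK_ON_RQ_MIGRATING */
	src->nr_running--;		/* like dequeue_task() */

	model_double_lock(src, dst);	/* both locks held from here */
	p->cpu = dst->cpu;		/* like set_task_cpu() */
	pthread_mutex_unlock(&dst->lock);
	pthread_mutex_unlock(&src->lock);

	pthread_mutex_lock(&dst->lock);	/* like rq_lock() on the new rq */
	assert(p->cpu == dst->cpu);	/* like the BUG_ON() */
	dst->nr_running++;		/* like enqueue_task() */
	p->on_rq = 1;			/* like TASK_ON_RQ_QUEUED */
	return dst;
}
```

In this model the source lock is free and the destination lock is held when the function returns. Any path that returns with a different lock state than the caller expects would leave a later locker spinning, which is the kind of symptom the stack above shows.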
Has anyone hit the same issue with this latest kernel version, or is it a known issue? Could the newly merged patch “bfc525947c5686df850efb39c81aae3eb6a62ac3” be causing it? I did not hit this issue with the same test case before.
Thanks,
GangWu