Hi all,
After I upgraded the kernel to an AOSP version that includes the following commit:
commit bfc525947c5686df850efb39c81aae3eb6a62ac3
Author: Ke Wang <ke.wang(a)spreadtrum.com>
Date: Mon Jan 21 13:41:45 2019 +0800
ANDROID: sched/walt: Fix lockdep assert issue
commit c8d50e061e38 ("ANDROID: DEBUG: Temporarily disable lockdep
asserting on update_task_ravg") is a temporary commit to disable the
lockdep assert in walt_update_task_ravg(). The root cause is that there
are two paths entering here without holding the rq lock in the pure
scheduler: one is move_queued_task(), another is detach_task().
Now fix this by making sure the rq lock is held at the two paths listed
above as it did in android4.4.
I hit a hard lockup during a hotplug stress test. A CPU is stalled on a spinlock (it appears to be the rq lock), with the following stack:
[ 4488.191067] c5 40 (migration/5) [<ffffff80080ef3fc>] move_queued_task+0x124/0x240
[ 4488.191079] c5 40 (migration/5) [<ffffff80080ef7f8>] __migrate_task+0xa0/0xe0
[ 4488.191090] c5 40 (migration/5) [<ffffff80080f0898>] migration_cpu_stop+0x104/0x114
[ 4488.191103] c5 40 (migration/5) [<ffffff800817c02c>] cpu_stopper_thread+0xbc/0x160
[ 4488.191117] c5 40 (migration/5) [<ffffff80080e55b4>] smpboot_thread_fn+0x1f0/0x280
[ 4488.191127] c5 40 (migration/5) [<ffffff80080e05f0>] kthread+0x10c/0x138
[ 4488.191139] c5 40 (migration/5) [<ffffff8008085780>] ret_from_fork+0x10/0x18
static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf,
				   struct task_struct *p, int new_cpu)
{
	struct rq *new_rq = cpu_rq(new_cpu);

	lockdep_assert_held(&rq->lock);

	p->on_rq = TASK_ON_RQ_MIGRATING;
	dequeue_task(rq, p, DEQUEUE_NOCLOCK);
	rq_unpin_lock(rq, rf);
	double_lock_balance(rq, new_rq);
	set_task_cpu(p, new_cpu);
	double_unlock_balance(rq, new_rq);
	raw_spin_unlock(&rq->lock);

	rq = cpu_rq(new_cpu);

	rq_lock(rq, rf);
	BUG_ON(task_cpu(p) != new_cpu);
	enqueue_task(rq, p, 0);
	p->on_rq = TASK_ON_RQ_QUEUED;
	check_preempt_curr(rq, p, 0);

	return rq;
}
Has anyone else hit this issue with the latest kernel version, or is it a known issue? Could the newly merged patch bfc525947c5686df850efb39c81aae3eb6a62ac3 be the cause? I never hit this with the same test case before the upgrade.
Thanks
GangWu
Is it possible that lock and unlock on rq->lock are not called in pairs in some corner case (for example, unlock is called one more time than lock)? One case I have in mind is when rq's CPU and dest_cpu are the same in:
__migrate_task(struct rq *rq, struct rq_flags *rf,
               struct task_struct *p, int dest_cpu)
Or have I made some other mistake?
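To make the question concrete, here is a toy user-space model of the locking sequence in move_queued_task() as quoted in my first mail. It is only a sketch under assumptions, not kernel code: NR_CPUS, held[], model_lock(), model_unlock() and simulate_move() are names I made up, and double_lock_balance()/double_unlock_balance() are simplified to "take new_rq's lock while keeping rq's lock" and "drop new_rq's lock". The real helpers behave differently depending on the kernel configuration, so this does not prove anything about the actual lockup. It only counts lock/unlock operations that would be problematic on a plain non-recursive lock.

```c
#include <assert.h>

#define NR_CPUS 8              /* hypothetical CPU count for the model */

static int held[NR_CPUS];      /* 1 if the modeled rq lock of that CPU is held */

/* Returns 1 if the lock was already held (a real spinlock would spin forever). */
static int model_lock(int cpu)
{
	if (held[cpu])
		return 1;
	held[cpu] = 1;
	return 0;
}

/* Returns 1 if the lock was not held (an unbalanced unlock). */
static int model_unlock(int cpu)
{
	if (!held[cpu])
		return 1;
	held[cpu] = 0;
	return 0;
}

/*
 * Replays the lock operations of the quoted move_queued_task() and returns
 * how many of them were anomalous (lock on a held lock, unlock on a free one).
 */
int simulate_move(int src_cpu, int dst_cpu)
{
	int anomalies = 0;
	int i;

	assert(src_cpu >= 0 && src_cpu < NR_CPUS);
	assert(dst_cpu >= 0 && dst_cpu < NR_CPUS);

	for (i = 0; i < NR_CPUS; i++)
		held[i] = 0;
	held[src_cpu] = 1;                   /* caller enters with rq->lock held */

	anomalies += model_lock(dst_cpu);    /* double_lock_balance(rq, new_rq), simplified */
	anomalies += model_unlock(dst_cpu);  /* double_unlock_balance(rq, new_rq), simplified */
	anomalies += model_unlock(src_cpu);  /* raw_spin_unlock(&rq->lock) */
	anomalies += model_lock(dst_cpu);    /* rq_lock(rq, rf) on the new rq */

	return anomalies;
}
```

In this model, simulate_move(5, 6) reports no anomaly, while simulate_move(5, 5) reports two: the second acquisition of an already held lock and the extra unlock. Whether the real code can ever reach move_queued_task() with rq's CPU equal to new_cpu is exactly what I am not sure about.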
Any suggestions would be appreciated.
Thanks
GangWu
From: Wu Gang(吴刚)
Sent: February 25, 2019 14:25
To: eas-dev(a)lists.linaro.org
Subject: one stall issue in migration_cpu_stop? Thanks!