[Eas-dev] one stall issue in migration_cpu_stop? Thanks!

25 Feb 2019


Hi all,
After I upgraded my kernel to an AOSP version that includes the following commit:
commit bfc525947c5686df850efb39c81aae3eb6a62ac3
Author: Ke Wang ke.wang@spreadtrum.com
Date:   Mon Jan 21 13:41:45 2019 +0800

    ANDROID: sched/walt: Fix lockdep assert issue

    commit c8d50e061e38 ("ANDROID: DEBUG: Temporarily disable lockdep
    asserting on update_task_ravg") is a temporary commit to disable the
    lockdep assert in walt_update_task_ravg(). The root cause is that there
    are two paths entering here without holding the rq lock in the pure
    scheduler: one is move_queued_task(), another is detach_task().

    Now fix this by making sure the rq lock is held at the two paths listed
    above as it did in android4.4.

I hit a hard lockup while running a hotplug stress test: a CPU stalls on a spinlock (it seems to be the rq lock) with the following stack:
[ 4488.191067] c5 40 (migration/5) [<ffffff80080ef3fc>] move_queued_task+0x124/0x240
[ 4488.191079] c5 40 (migration/5) [<ffffff80080ef7f8>] __migrate_task+0xa0/0xe0
[ 4488.191090] c5 40 (migration/5) [<ffffff80080f0898>] migration_cpu_stop+0x104/0x114
[ 4488.191103] c5 40 (migration/5) [<ffffff800817c02c>] cpu_stopper_thread+0xbc/0x160
[ 4488.191117] c5 40 (migration/5) [<ffffff80080e55b4>] smpboot_thread_fn+0x1f0/0x280
[ 4488.191127] c5 40 (migration/5) [<ffffff80080e05f0>] kthread+0x10c/0x138
[ 4488.191139] c5 40 (migration/5) [<ffffff8008085780>] ret_from_fork+0x10/0x18
static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf,
                   struct task_struct *p, int new_cpu)
{
    struct rq *new_rq = cpu_rq(new_cpu);

    lockdep_assert_held(&rq->lock);

    p->on_rq = TASK_ON_RQ_MIGRATING;
    dequeue_task(rq, p, DEQUEUE_NOCLOCK);
    rq_unpin_lock(rq, rf);
    double_lock_balance(rq, new_rq);
    set_task_cpu(p, new_cpu);
    double_unlock_balance(rq, new_rq);
    raw_spin_unlock(&rq->lock);

    rq = cpu_rq(new_cpu);

    rq_lock(rq, rf);
    BUG_ON(task_cpu(p) != new_cpu);
    enqueue_task(rq, p, 0);
    p->on_rq = TASK_ON_RQ_QUEUED;
    check_preempt_curr(rq, p, 0);

    return rq;
}
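To make the lock hand-off in the function above easier to follow, here is a rough user-space sketch of the same sequence. It is not kernel code: struct model_rq, the model_* helpers and the "held" flag are all hypothetical names invented for illustration, the spinlock is a plain C11 atomic_flag, and the double-lock helper orders locks by CPU number as a stand-in for whatever ordering the kernel uses. It only shows which lock is expected to be held at each step, and that a waiter spins forever if a lock is never released.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct rq: a spinlock, a trace flag, a CPU id. */
struct model_rq {
    atomic_flag lock;
    bool held;   /* written only by the current lock holder */
    int cpu;
};

#define MODEL_RQ_INIT(cpu_id) { ATOMIC_FLAG_INIT, false, (cpu_id) }

/* Spin until the flag is acquired; never returns if nobody releases it. */
static void model_lock(struct model_rq *rq)
{
    while (atomic_flag_test_and_set_explicit(&rq->lock, memory_order_acquire))
        ;
    rq->held = true;
}

static bool model_trylock(struct model_rq *rq)
{
    if (atomic_flag_test_and_set_explicit(&rq->lock, memory_order_acquire))
        return false;
    rq->held = true;
    return true;
}

static void model_unlock(struct model_rq *rq)
{
    rq->held = false;
    atomic_flag_clear_explicit(&rq->lock, memory_order_release);
}

/* Caller holds this_rq; on return both locks are held.
 * Assumption: lower CPU number is locked first to avoid ABBA deadlock. */
static void model_double_lock(struct model_rq *this_rq, struct model_rq *other)
{
    assert(this_rq->held);
    if (model_trylock(other))
        return;
    if (other->cpu < this_rq->cpu) {
        model_unlock(this_rq);   /* window where this_rq is not held */
        model_lock(other);
        model_lock(this_rq);
    } else {
        model_lock(other);
    }
}

/* Mirrors the lock/unlock order of the function quoted above. */
static struct model_rq *model_move_queued_task(struct model_rq *rq,
                                               struct model_rq *new_rq,
                                               int *task_cpu)
{
    assert(rq->held);                 /* lockdep_assert_held(&rq->lock) */
    model_double_lock(rq, new_rq);    /* double_lock_balance() */
    *task_cpu = new_rq->cpu;          /* set_task_cpu() */
    model_unlock(new_rq);             /* double_unlock_balance() */
    model_unlock(rq);                 /* raw_spin_unlock(&rq->lock) */
    model_lock(new_rq);               /* rq_lock() on the destination */
    return new_rq;
}
```

In this model the caller enters with the source lock held and leaves with only the destination lock held, so the caller must release the returned rq, not the one it passed in.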
Has anyone hit the same issue with this latest kernel version, or is it a known issue? Could the newly merged patch “bfc525947c5686df850efb39c81aae3eb6a62ac3” be causing it? (I did not see this issue with the same test case before the upgrade.)
Thanks
GangWu
