Hi Patrick,
On 2016-09-22 03:10, Patrick Bellasi wrote:
Subject: Re: [Eas-dev] [RFC PATCH v1 0/3] sched: Introduce Window Assisted Load Tracking
On 16-Sep 19:09, Dietmar Eggemann wrote:
On 03/09/16 00:27, markivx@codeaurora.org wrote:
This patch series implements an alternative window assisted load tracking
[...]
To me this issue seems something related to the one fixed by this Todd's patch:
https://android.googlesource.com/kernel/common/+/ab1b90f03a063f4ef9899835e9d...
We noticed this issue while working on AOSP v3.18, but it is potentially still present in mainline kernels, since the implementation of the locking functions has not changed.
Here is how Todd described a possible race condition:
Thanks for the review. I've convinced myself that getting to move_queued_task() with the two cpus being the same is possible (but probably rare) since there are races between normal scheduler migration and the forced migration via the cpu_migration_thread. If the thread migrates naturally from the src to the dest and does it after the last check in __migrate_task, we get into this case. This can happen since we drop the rq lock during double_lock_balance allowing a migration behind our back while we are re-acquiring the rq lock.
Thanks for bringing this up.
It is a rare event, but we have observed the rqs being the same (and it should be possible in mainline as well). The most common case is due to the code in sched_exec, for which we have a non-ideal fix internally [1]. The other way to fix this would be to use double_rq_unlock() instead of the double_unlock_balance() + raw_spin_unlock() pair in move_queued_task(), as suggested by Rameez. However, this would still need double_lock_balance(), and we would still be passing the same value for both rqs.
Btw, the check in __migrate_task() only verifies that task_cpu(p) != src_cpu, not that src_cpu == dest_cpu, so it won't help here anyway. I don't think we have seen any path other than sched_exec hitting this problem. We might want to raise upstream the potential for the migration code to run uselessly in these cases.
Perhaps a better solution for WALT would be to use the double_rq_(un)lock() primitives instead of the double_(un)lock_balance() ones, which would also make the code more aligned with the locking APIs already used in the core scheduler.
The _balance_ variants are currently necessary simply because we need both locks and don't know which one is already held by _this_ CPU. But since the consensus from the other threads is that we should avoid double locking entirely, we'll work on getting rid of it for RFC v2.
[1] - https://source.codeaurora.org/quic/la/kernel/msm-3.18/commit/kernel/sched/co...
Thanks, Vikram