> Good day Jon,
> Please include the included patch in your tree. It is a fix for [1].
Those scheduler patches get into linux-linaro from Viresh's big.LITTLE
MP branch, so I think fixes should really be applied there.
--
Tixy
> Thanks,
> Mathieu.
>
> [1]. https://bugs.launchpad.net/linaro-big-little-system/+bug/1097213
>
>
> -------- Original Message --------
> Subject: Re: Update on LP1097213
> Date: Mon, 17 Jun 2013 16:31:47 +0100
> From: Morten Rasmussen <morten.rasmussen@arm.com>
> To: Mathieu Poirier <mathieu.poirier@linaro.org>
> CC: Vincent Guittot <vincent.guittot@linaro.org>, Serge Broslavsky
> <serge.broslavsky@linaro.org>, Amit Kucheria <amit.kucheria@linaro.org>,
> Nicolas Pitre <nicolas.pitre@linaro.org>, Naresh Kamboju
> <naresh.kamboju@linaro.org>
>
> Hi Mathieu,
>
> I had a quick look at the hmp_next_{up,down}_delay() stuff. It is all
> introduced in the patch: "sched: SCHED_HMP multi-domain task migration
> control". Reverting it requires some manual conflict fixing and you will
> also need to remove the extra hmp_next_down_delay() added by a later patch.
>
> I've attached a revert patch for debugging purposes that should do it all.
>
> I'm not sure if this will just remove the symptom or if the sched_clock
> accesses are the true cause of the problem.
>
> I hope it helps,
> Morten
>
> On 17/06/13 14:26, Vincent Guittot wrote:
> > Mathieu,
> >
> > Please find below the mail we discussed during the call.
> >
> > Vincent
> >
> > On 14 June 2013 15:21, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> >> On 14 June 2013 15:14, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> >>> On 14 June 2013 14:39, Mathieu Poirier <mathieu.poirier@linaro.org> wrote:
> >>>> Anything on this ?!? Morten, Vincent ?
> >>>
> >>> Hi Mathieu,
> >>>
> >>> I hadn't noticed that the problem can be reproduced on a snowball
> >>> the first time I read your email.
> >>> Does it mean that the HMP-specific functions are also called on SMP systems?
> >>>
> >>> I'm going to look more deeply into the code.
> >>>
> >>
> >> for_each_online_cpu is used in hmp_force_up_migration but it's not
> >> protected against hotplug, so it can use a CPU that is going to be
> >> unplugged.
> >>
> >> We should probably protect the sequence with get/put_online_cpus.
> >>
> >> Vincent
> >>
> >>> Vincent
> >>>
> >>>>
> >>>> On 13-06-12 03:13 PM, Mathieu Poirier wrote:
> >>>>> Good day gents,
> >>>>>
> >>>>> I have been working on [1] for a while now, on and off as time
> >>>>> permitted. The problem has always been very elusive but definitely
> >>>>> present. As some of the notes in the bug report indicate, TC2 wasn't the
> >>>>> only ARM system I could reproduce this on - snowball suffered from the
> >>>>> exact same problem.
> >>>>>
> >>>>> I started looking at this again for 3.10 and I have good and bad news.
> >>>>>
> >>>>> The good news is that I can't reproduce the problem anymore if
> >>>>> CONFIG_SCHED_HMP is not enabled. I ran the attached script for more
> >>>>> than 16 hours without even the hint of a problem. Normally one would
> >>>>> get a crash [2] in less than a minute. I won't go so far as claiming
> >>>>> that upstream solved the problem. Maybe we are lucky and timing in 3.10
> >>>>> simply doesn't allow for the fault to occur. In any case, all we can do
> >>>>> is continue monitoring the situation in upcoming versions.
> >>>>>
> >>>>> On the flip side we have a definite problem with hotplug when
> >>>>> CONFIG_SCHED_HMP is defined. The crash in [2] is consistent and can be
> >>>>> reproduced at will. Looking at the trace, the problem happens in
> >>>>> 'select_task_rq_fair' where calls to 'hmp_next_up_delay' and
> >>>>> 'hmp_next_down_delay' end up referencing 'cfs_rq_clock_task' where
> >>>>> cfs_rq->rq points to a bogus address.
> >>>>>
> >>>>> Have a look at line 9 in [2] - this is a little bit of instrumentation I
> >>>>> started working on. It basically outputs the new and previous CPUs in
> >>>>> 'hmp_[up,down]_migration' conditional statements along with the
> >>>>> direction of the migration [3]. In every instance the system was going
> >>>>> from the A15 to the A7 cluster. I haven't found a single instance where
> >>>>> the opposite was true.
> >>>>>
> >>>>> Since this is directly related to our efforts to make the scheduler
> >>>>> power aware and based on Ingo's latest rebuttal, I am not sure that it is
> >>>>> wise for me to continue working on this - specifically if we end up
> >>>>> scrapping that portion of the code. I'm eager to hear your opinion.
> >>>>>
> >>>>> On the flip side it highlights (once again) that we need to invest
> >>>>> massively in the hotplug subsystem, more specifically in its relation to
> >>>>> the scheduler and the RCU subsystem.
> >>>>>
> >>>>> Mathieu.
> >>>>>
> >>>>> PS. I have purposely kept the audience to a minimum - forward as you
> >>>>> see fit.
> >>>>>
> >>>>> [1]. https://bugs.launchpad.net/linaro-big-little-system/+bug/1188778
> >>>>> [2]. https://pastebin.linaro.org/view/0751c84b
> >>>>> [3]. https://pastebin.linaro.org/view/4491ee27
> >>>>>
> >>>>
> >
>
>