Hi Todd and others,
If we have a multi-package system, where we have multiple instances of struct
policy (one per package), we currently can't have multiple instances of the same
governor, i.e. we can't have multiple instances of the Interactive governor for
multiple packages.
This is a bottleneck for multicluster systems, where we want different packages
to use the Interactive governor, but with different tunables.
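To illustrate the limitation described above, here is a minimal userspace
sketch in C, not code from the patchset. All names (struct gov_tunables, its
fields, global_tunables, policy_tunables()) are hypothetical, and struct policy
is a heavily simplified stand-in for the kernel structure. It only shows the
idea: each policy may carry its own tunables, falling back to one shared set
otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tunables for an Interactive-like governor (illustrative names). */
struct gov_tunables {
    unsigned int hispeed_freq;  /* kHz */
    unsigned int timer_rate;    /* microseconds */
};

/* Simplified stand-in for the per-package policy structure. */
struct policy {
    int first_cpu;
    struct gov_tunables *governor_data;  /* per-policy tunables, or NULL */
};

/* One system-wide set shared by every policy (the single-instance model). */
static struct gov_tunables global_tunables = { 1000000, 20000 };

/*
 * Return the tunables a policy should use: its own set if it has one,
 * otherwise the shared global set.
 */
static struct gov_tunables *policy_tunables(struct policy *p)
{
    assert(p != NULL);
    return p->governor_data ? p->governor_data : &global_tunables;
}
```

With this shape, a policy for an A15 package and a policy for an A7 package
could each point at their own struct gov_tunables, while a policy without
private data keeps using the shared set.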
---------x------------x---------
Recently, I upstreamed this support in 3.10-rc1 for the cpufreq core and the
Ondemand and Conservative governors. This is an attempt to do the same for the
Interactive governor.
I didn't have any clue which kernel to rebase my patches on, as I couldn't
find a 3.10-rc based branch in your tree, so I based them on
experimental/android-3.9.
So, this is what this patchset does:
- Backports some important patches from v3.10-rc1/2 to v3.9: First 8 patches
- Adds a few more supporting patches which might go in rc3: Next 4 patches
- Finally updates the Interactive governor: Last 4 patches
So, review is probably required only for the last 4 patches. The last patch is a
bit long, but it is mostly a rearrangement of the code rather than a major
update. It is based on the patchset which I wrote for the Ondemand/Conservative
governors.
This has been tested on an ARM big.LITTLE platform which has multiple packages
requiring separate tunables.
Nathan Zimmer (1):
  cpufreq: Convert the cpufreq_driver_lock to a rwlock
Stratos Karafotis (1):
  cpufreq: governors: Calculate iowait time only when necessary
Viresh Kumar (14):
  cpufreq: Add per policy governor-init/exit infrastructure
  cpufreq: governor: Implement per policy instances of governors
  cpufreq: Call __cpufreq_governor() with correct policy->cpus mask
  cpufreq: Don't call __cpufreq_governor() for drivers without target()
  cpufreq: governors: Fix CPUFREQ_GOV_POLICY_{INIT|EXIT} notifiers
  cpufreq: Issue CPUFREQ_GOV_POLICY_EXIT notifier before dropping
    policy refcount
  cpufreq: Add EXPORT_SYMBOL_GPL for have_governor_per_policy
  cpufreq: governors: Move get_governor_parent_kobj() to cpufreq.c
  cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT
  cpufreq: Move get_cpu_idle_time() to cpufreq.c
  cpufreq: interactive: Use generic get_cpu_idle_time() from cpufreq.c
  cpufreq: interactive: Remove unnecessary cpu_online() check
  cpufreq: interactive: Move definition of cpufreq_gov_interactive
    downwards
  cpufreq: Interactive: Implement per policy instances of governor
 drivers/cpufreq/cpufreq.c              | 157 ++++++--
 drivers/cpufreq/cpufreq_conservative.c | 195 ++++++----
 drivers/cpufreq/cpufreq_governor.c     | 273 +++++++-------
 drivers/cpufreq/cpufreq_governor.h     | 120 +++++-
 drivers/cpufreq/cpufreq_interactive.c  | 663 +++++++++++++++++++--------------
 drivers/cpufreq/cpufreq_ondemand.c     | 274 ++++++++------
 include/linux/cpufreq.h                |  19 +-
 7 files changed, 1043 insertions(+), 658 deletions(-)
--
1.7.12.rc2.18.g61b472e
Hi Peter/Ingo,
This set contains a few more minor fixes that I could find for the code
responsible for creating sched domains. They are rebased on top of my earlier
fixes:
Part 1:
https://lkml.org/lkml/2013/6/4/253
Part 2:
https://lkml.org/lkml/2013/6/10/141
They should be applied in this order to avoid conflicts.
My study of "How scheduling domains are created" is almost over now, so
this is probably my last patchset of fixes related to scheduling domains.
Sorry for the three separate sets; I sent them as soon as I had a few of them
sitting in my tree.
Viresh Kumar (3):
  sched: Use cached value of span instead of calling
    sched_domain_span()
  sched: don't call get_group() for covered cpus
  sched: remove WARN_ON(!sd) from init_sched_groups_power()
 kernel/sched/core.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
--
1.7.12.rc2.18.g61b472e
Good day Jon,
Please include the enclosed patch in your tree. It is a fix for [1].
Thanks,
Mathieu.
[1]. https://bugs.launchpad.net/linaro-big-little-system/+bug/1097213
-------- Original Message --------
Subject: Re: Update on LP1097213
Date: Mon, 17 Jun 2013 16:31:47 +0100
From: Morten Rasmussen <morten.rasmussen(a)arm.com>
To: Mathieu Poirier <mathieu.poirier(a)linaro.org>
CC: Vincent Guittot <vincent.guittot(a)linaro.org>, Serge Broslavsky
<serge.broslavsky(a)linaro.org>, Amit Kucheria <amit.kucheria(a)linaro.org>,
Nicolas Pitre <nicolas.pitre(a)linaro.org>, Naresh Kamboju
<naresh.kamboju(a)linaro.org>
Hi Mathieu,
I had a quick look at the hmp_next_{up,down}_delay() stuff. It is all
introduced in the patch: "sched: SCHED_HMP multi-domain task migration
control". Reverting it requires some manual conflict fixing and you will
also need to remove the extra hmp_next_down_delay() added by a later patch.
I've attached a revert patch for debugging purposes that should do it all.
I'm not sure if this will just remove the symptom or if the sched_clock
accesses are the true cause of the problem.
I hope it helps,
Morten
On 17/06/13 14:26, Vincent Guittot wrote:
> Mathieu,
>
> Please find below the mail we discussed during the call
>
> Vincent
>
> On 14 June 2013 15:21, Vincent Guittot <vincent.guittot(a)linaro.org> wrote:
>> On 14 June 2013 15:14, Vincent Guittot <vincent.guittot(a)linaro.org> wrote:
>>> On 14 June 2013 14:39, Mathieu Poirier <mathieu.poirier(a)linaro.org> wrote:
>>>> Anything on this ?!? Morten, Vincent ?
>>>
>>> Hi Mathieu,
>>>
>>> I hadn't noticed that the problem can be reproduced on a snowball
>>> the first time I read your email.
>>> Does it mean that the hmp-specific functions are also called on an smp system?
>>>
>>> I'm going to look more deeply into the code
>>>
>>
>> for_each_online_cpu is used in hmp_force_up_migration but it's not
>> protected against hotplug, so it can use a cpu that is going to be
>> unplugged
>>
>> We should probably protect the sequence with get/put_online_cpus
>>
>> Vincent
>>
>>> Vincent
>>>
>>>>
>>>> On 13-06-12 03:13 PM, Mathieu Poirier wrote:
>>>>> Good day gents,
>>>>>
>>>>> I have been working on [1] for a while now, on and off as time
>>>>> permitted. The problem has always been very elusive but definitely
>>>>> present. As some of the notes in the bug report indicate TC2 wasn't the
>>>>> only ARM system I could reproduce this on - snowball suffered from the
>>>>> exact same problem.
>>>>>
>>>>> I started looking at this again for 3.10 and I have good and bad news.
>>>>>
>>>>> The good news is that I can't reproduce the problem anymore if
>>>>> CONFIG_SCHED_HMP is not enabled. I ran the attached script for more
>>>>> than 16 hours without even the hint of a problem. Normally one would
>>>>> get a crash [2] in less than a minute. I won't go so far as claiming
>>>>> that upstream solved the problem. Maybe we are lucky and timing in 3.10
>>>>> simply doesn't allow for the fault to occur. In any case, all we can do
>>>>> is continue monitoring the situation in upcoming versions.
>>>>>
>>>>> On the flip side we have a definite problem with hotplug when
>>>>> CONFIG_SCHED_HMP is defined. The crash in [2] is consistent and can be
>>>>> reproduced at will. Looking at the trace the problem happens in
>>>>> 'select_task_rq_fair' where calls to 'hmp_next_up_delay' and
>>>>> 'hmp_next_down_delay' end up referencing 'cfs_rq_clock_task' where
>>>>> cfs_rq->rq points to a bogus address.
>>>>>
>>>>> Have a look at line 9 in [2] - this is a little bit of instrumentation I
>>>>> started working on. It basically outputs the new and previous CPUs in
>>>>> 'hmp_[up,down]_migration' conditional statements along with the
>>>>> direction of the migration [3]. In every instance the system was going
>>>>> from the A15 to the A7 cluster. I haven't found a single instance where
>>>>> the opposite was true.
>>>>>
>>>>> Since this is directly related to our efforts to make the scheduler
>>>>> power aware and based on Ingo's latest rebuttal, I am not sure that it
>>>>> is wise for me to continue working on this - specifically if we end up
>>>>> scrapping that portion of the code. I'm eager to hear your opinion.
>>>>>
>>>>> On the flip side it highlights (once again) that we need to invest
>>>>> massively in the hotplug subsystem, more specifically in its relation to
>>>>> the scheduler and the RCU subsystem.
>>>>>
>>>>> Mathieu.
>>>>>
>>>>> PS. I have purposely kept the audience to a minimum - forward as you
>>>>> see fit.
>>>>>
>>>>> [1]. https://bugs.launchpad.net/linaro-big-little-system/+bug/1188778
>>>>> [2]. https://pastebin.linaro.org/view/0751c84b
>>>>> [3]. https://pastebin.linaro.org/view/4491ee27
>>>>>
>>>>
>
Peter/Ingo,
These are minor fixes that I could find for the code responsible for creating
sched domains. They are rebased on top of my earlier fixes:
https://lkml.org/lkml/2013/6/4/253
I couldn't find those in linux-next or tip/master, hence the link.
Viresh Kumar (3):
  sched: don't initialize alloc_state in build_sched_domains
  sched: don't set sd->child to NULL when it is already NULL
  sched: Create for_each_sd_topology()
 kernel/sched/core.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)
--
1.7.12.rc2.18.g61b472e
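For readers unfamiliar with the pattern behind a helper like
for_each_sd_topology(), the following is a minimal userspace sketch in C of
iterating over a sentinel-terminated table with a macro. It is not the patch
itself: struct topology_level, the init_* callbacks, default_topology and
for_each_topology_level() are hypothetical names chosen for illustration, and
the assumption that a NULL init callback ends the table belongs to this sketch.

```c
#include <stddef.h>

/*
 * Simplified stand-in for one topology level. In this sketch the init
 * callback doubles as the end-of-table marker: NULL terminates the table.
 */
struct topology_level {
    int (*init)(int cpu);
    const char *name;
};

/* Placeholder callbacks; they only need to be non-NULL here. */
static int init_smt(int cpu) { return cpu; }
static int init_mc(int cpu)  { return cpu; }
static int init_cpu(int cpu) { return cpu; }

static struct topology_level default_topology[] = {
    { init_smt, "SMT" },
    { init_mc,  "MC"  },
    { init_cpu, "CPU" },
    { NULL,     NULL  },  /* sentinel */
};

static struct topology_level *topology = default_topology;

/* Walk every level until the NULL-init sentinel is reached. */
#define for_each_topology_level(tl) \
    for ((tl) = topology; (tl)->init; (tl)++)

/* Example user of the macro: count the levels in the current table. */
static int count_topology_levels(void)
{
    struct topology_level *tl;
    int n = 0;

    for_each_topology_level(tl)
        n++;
    return n;
}
```

The point of such a macro is that every loop over the table uses the same
start and termination condition, instead of each call site open-coding them.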
Hi Rafael,
Recently Arnd sent a few fixes for drivers that were using APIs from
freq_table.c but hadn't selected CPU_FREQ_TABLE. Based on that, I cross-checked
all the places where it should be selected and where it shouldn't be. These are
the fixes for that.
I have applied these to my cpufreq-kconfig-fixes branch. I will send you a pull
request separately once I get some Acks (I will wait for a few days).
Viresh Kumar (11):
  cpufreq: blackfin: enable driver for CONFIG_BFIN_CPU_FREQ
  cpufreq: cris: select CPU_FREQ_TABLE
  cpufreq: davinci: select CPU_FREQ_TABLE
  cpufreq: exynos: select CPU_FREQ_TABLE
  cpufreq: highbank: remove select CPU_FREQ_TABLE
  cpufreq: imx: select CPU_FREQ_TABLE
  cpufreq: powerpc: CBE_RAS: select CPU_FREQ_TABLE
  cpufreq: pxa: select CPU_FREQ_TABLE
  cpufreq: S3C2416/S3C64XX: select CPU_FREQ_TABLE
  cpufreq: tegra: select CPU_FREQ_TABLE for ARCH_TEGRA
  cpufreq: X86_AMD_FREQ_SENSITIVITY: select CPU_FREQ_TABLE
 arch/arm/mach-davinci/Kconfig   | 1 +
 arch/arm/mach-pxa/Kconfig       | 3 +++
 arch/arm/mach-tegra/Kconfig     | 4 +---
 arch/cris/Kconfig               | 2 ++
 drivers/cpufreq/Kconfig.arm     | 6 +++++-
 drivers/cpufreq/Kconfig.powerpc | 1 +
 drivers/cpufreq/Kconfig.x86     | 1 +
 drivers/cpufreq/Makefile        | 2 +-
 8 files changed, 15 insertions(+), 5 deletions(-)
--
1.7.12.rc2.18.g61b472e
== Linus Walleij linusw ==
=== Highlights ===
* Merged the runtime PM pinctrl states device core
container patch into the pinctrl tree. Now discussing
the OMAP "active" state with affected maintainers.
* Olof J pulled all 5 ux500 branches for v3.11
* Iterated the Integrator/AP pull request after it was
discovered that it broke the ATAG build. Mea culpa.
Hopefully the fixed version gets pulled.
* Sent a pull request for the Integrator PCI DT patch
series to ARM SoC.
* Sent fixes on top of the U300 Device Tree and
multiplatform branch to address the last review
comments by utilizing regmap/syscon and attempting
to move board power into the regulator driver. If we
can sort this out I can line up a pull request.
* Merged pinctrl patches for sparser GPIO ranges
i.e. where pinctrl GPIO ranges are not entirely
linear. Christian Ruppert needed this and it enables
us to proceed with the Intel Bay Trail as a pinctrl
driver.
* Reviewed lots of pinctrl code. Queued some
pinctrl patches.
* Advised LKML newbies on how pinctrl works.
* Involved in some Allwinner reviews.
=== Plans ===
* I have a ux500-defconfig branch, which will be
submitted later, turning on this and some more new
stuff that will hit the v3.11 merge window. Maybe this
needs to come after v3.11-rc1.
* Finalize U300 DT+multiplatform patch set and send
a pull request for it.
* Start to delete Integrator board files and convert to
multiplatform once the PCI DT patches land in ARM
SoC.
* Convert Nomadik pinctrl driver to register GPIO ranges
from the gpiochip side.
* Test the PL08x patches on the Ericsson Research
PB11MPCore and submit platform data for using
pl08x DMA on that platform.
=== Issues ===
* Subsystem maintainers in the kernel community are
acting like Judge Dredd on DT review and commit issues,
as noted last week.
* Some impediments from internal turmoil @ST-Ericsson.
Thanks,
Linus Walleij
=== Highlights ===
* Cleaned/fixed up and sent out volatile ranges (v8) patchset to lkml
* Sent re-factored ION patchset to Rebecca, Arnd, Jesse and Serban
* Updated linaro.android kernel to the AOSP 3.10-rc5 base branch
* Discussed Android's adoption of memcg pressure notifications w/ AntonV
and Android devs.
* Thomas merged my current 3.11 queue into -tip
* Worked with Zoran on his mmc wakeup_source patch
* Tried to sort out vfat ioctl issues w/ Android devs, so we can get
something upstream.
* Discussed & reviewed a number of community time/rtc patches on lkml
* Implemented a new alarmtimer test for my timekeeping testsuite
* Worked out some details on LCE Android Graphics Upstreaming session
* More work on Plumbers Android MiniConf (& got another yes from an
Android dev!)
* Reviewed blueprints and sent out weekly status mail
* Attended LSK android patch discussion
* Attended Linaro internal patch review discussion
=== Plans ===
* Try to get Anton's ulmkd updated to use upstreamed memcg mempressure
notifier
* Re-integrate noswap purging into vrange patchset
* Update refactored ion patches to include changes from the AOSP
3.10-rc5 branch
* Sort out the rest of my 3.11 queue and send to Thomas
* Still have to do some blueprint breaking up for Jakub
=== Issues ===
* N/A