Part of this patchset was previously part of the larger tasks packing patchset
[1]. I have split the latter into (at least) 3 different patchsets to make
things easier:
- configuration of sched_domain topology [2]
- update and consolidation of cpu_power (this patchset)
- tasks packing algorithm
SMT systems are no longer the only systems that can have CPUs with an original
capacity that is different from the default value. We need to extend the use of
cpu_power_orig to all kinds of platforms so that the scheduler has both the
maximum capacity (cpu_power_orig/power_orig) and the current capacity
(cpu_power/power) of CPUs and sched_groups. A new function, arch_scale_cpu_power,
has been created and replaces arch_scale_smt_power, which is SMT specific, in the
computation of the capacity of a CPU.
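
As a rough illustration of the difference between the two values, here is a
minimal user-space sketch. It is not the kernel code: struct cpu_cap,
update_cpu_cap() and the cfs_share parameter are hypothetical names, and
arch_scale only stands in for whatever arch_scale_cpu_power() would return.

```c
#include <assert.h>

#define SCHED_POWER_SCALE 1024UL /* default capacity of a CPU */

/* Hypothetical per-CPU record: original (maximum) and current capacity. */
struct cpu_cap {
	unsigned long power_orig; /* best capacity the CPU can provide */
	unsigned long power;      /* capacity currently left for CFS tasks */
};

/*
 * arch_scale: capacity reported by the architecture, relative to
 *             SCHED_POWER_SCALE (illustrative stand-in only).
 * cfs_share:  share of time left for CFS tasks, relative to
 *             SCHED_POWER_SCALE (1024 means 100%). Assumed input.
 */
static void update_cpu_cap(struct cpu_cap *c, unsigned long arch_scale,
			   unsigned long cfs_share)
{
	assert(cfs_share <= SCHED_POWER_SCALE);

	/* The original capacity depends only on the platform. */
	c->power_orig = arch_scale;

	/* The current capacity shrinks with time spent outside CFS. */
	c->power = (c->power_orig * cfs_share) / SCHED_POWER_SCALE;
	if (!c->power)
		c->power = 1; /* never report a zero capacity */
}
```

With arch_scale = 512 and 75% of the time left for CFS (cfs_share = 768), this
gives power_orig = 512 and power = 384.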
During load balance, the scheduler evaluates the number of tasks that a group
of CPUs can handle. The current method assumes that tasks have a fixed load of
SCHED_LOAD_SCALE and CPUs have a default capacity of SCHED_POWER_SCALE.
This assumption generates wrong decisions by creating ghost cores and by
removing real ones when the original capacity of CPUs is different from the
default SCHED_POWER_SCALE. We no longer try to evaluate the number of
available cores based on the group_capacity; instead, we detect when the group
is fully utilized.
Now that we have the original capacity of CPUs and their activity/utilization,
we can evaluate more accurately the capacity and the level of utilization of a
group of CPUs.
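
The ghost core problem can be seen with a little arithmetic. The sketch below
is a simplified model, not the scheduler code: capacity_factor() only rounds
the group capacity to the nearest multiple of SCHED_POWER_SCALE, the per-CPU
capacities used in the note that follows (1512 and 640) are made-up values,
and group_fully_utilized() is a hypothetical helper that ignores any margin
the real comparison may apply.

```c
#define SCHED_POWER_SCALE 1024UL /* default capacity of a CPU */

/* Old-style estimate: number of "cores" a group is assumed to provide. */
static unsigned long capacity_factor(unsigned long group_power)
{
	/* Divide by the default capacity, rounding to the nearest integer. */
	return (group_power + SCHED_POWER_SCALE / 2) / SCHED_POWER_SCALE;
}

/* Utilization-based view: the group is full once usage reaches capacity. */
static int group_fully_utilized(unsigned long group_power,
				unsigned long group_utilization)
{
	return group_utilization >= group_power;
}
```

For a group of 3 CPUs with a capacity of 1512 each, capacity_factor(3 * 1512)
is 4, so one ghost core appears. For 3 CPUs with a capacity of 640 each,
capacity_factor(3 * 640) is 2, so one real core disappears. Comparing the
utilization of the group with its capacity does not depend on the default
value at all.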
This patchset mainly replaces the old capacity method with a new one and keeps
the policy almost unchanged, although we could certainly take advantage of this
new statistic in several other places of the load balance.
Test results:
I have put below the results of 3 kinds of tests:
- hackbench -l 500 -s 4096
- scp of 100MB file on the platform
- ebizzy with various number of threads
on 3 kernels:
tip = tip/sched/core
patch = tip + this patchset
patch+irq = tip + this patchset + irq accounting
Each test has been run 6 times and the figures below show the stdev and the
diff compared to the tip kernel.
Dual cortex A7  tip         | patch             | patch+irq
                stdev       | diff    stdev     | diff    stdev
hackbench       (+/-)1.02%  | +0.42%(+/-)1.29%  | -0.65%(+/-)0.44%
scp             (+/-)0.41%  | +0.18%(+/-)0.10%  | +78.05%(+/-)0.70%
ebizzy -t 1     (+/-)1.72%  | +1.43%(+/-)1.62%  | +2.58%(+/-)2.11%
ebizzy -t 2     (+/-)0.42%  | +0.06%(+/-)0.45%  | +1.45%(+/-)4.05%
ebizzy -t 4     (+/-)0.73%  | +8.39%(+/-)13.25% | +4.25%(+/-)10.08%
ebizzy -t 6     (+/-)10.30% | +2.19%(+/-)3.59%  | +0.58%(+/-)1.80%
ebizzy -t 8     (+/-)1.45%  | -0.05%(+/-)2.18%  | +2.53%(+/-)3.40%
ebizzy -t 10    (+/-)3.78%  | -2.71%(+/-)2.79%  | -3.16%(+/-)3.06%
ebizzy -t 12    (+/-)3.21%  | +1.13%(+/-)2.02%  | -1.13%(+/-)4.43%
ebizzy -t 14    (+/-)2.05%  | +0.15%(+/-)3.47%  | -2.08%(+/-)1.40%
Quad cortex A15 tip         | patch             | patch+irq
                stdev       | diff    stdev     | diff    stdev
hackbench       (+/-)0.55%  | -0.58%(+/-)0.90%  | +0.62%(+/-)0.43%
scp             (+/-)0.21%  | -0.10%(+/-)0.10%  | +5.70%(+/-)0.53%
ebizzy -t 1     (+/-)0.42%  | -0.58%(+/-)0.48%  | -0.29%(+/-)0.18%
ebizzy -t 2     (+/-)0.52%  | -0.83%(+/-)0.20%  | -2.07%(+/-)0.35%
ebizzy -t 4     (+/-)0.22%  | -1.39%(+/-)0.49%  | -1.78%(+/-)0.67%
ebizzy -t 6     (+/-)0.44%  | -0.78%(+/-)0.15%  | -1.79%(+/-)1.10%
ebizzy -t 8     (+/-)0.43%  | +0.13%(+/-)0.92%  | -0.17%(+/-)0.67%
ebizzy -t 10    (+/-)0.71%  | +0.10%(+/-)0.93%  | -0.36%(+/-)0.77%
ebizzy -t 12    (+/-)0.65%  | -1.07%(+/-)1.13%  | -1.13%(+/-)0.70%
ebizzy -t 14    (+/-)0.92%  | -0.28%(+/-)1.25%  | +2.84%(+/-)9.33%
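
For readers who want to reproduce this kind of table, the sketch below shows
one plausible way to derive the two columns from a set of runs. It rests on
assumptions that the text above does not spell out: that the stdev is the
sample standard deviation expressed as a percentage of the mean, and that the
diff is the relative change of the mean versus tip. All function names are
hypothetical, and the square root is computed with Newton's method only to
keep the example free of libm.

```c
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Square root by Newton's method; returns 0 for non-positive input. */
static double sqrt_newton(double x)
{
	double r = x > 1.0 ? x : 1.0;

	if (x <= 0.0)
		return 0.0;
	for (int i = 0; i < 64; i++)
		r = 0.5 * (r + x / r);
	return r;
}

/* Sample standard deviation as a percentage of the mean (needs n >= 2). */
static double stdev_pct(const double *v, size_t n)
{
	double m = mean(v, n), acc = 0.0;

	for (size_t i = 0; i < n; i++)
		acc += (v[i] - m) * (v[i] - m);
	return 100.0 * sqrt_newton(acc / (double)(n - 1)) / m;
}

/* Relative change of the mean versus the baseline (tip), in percent. */
static double diff_pct(const double *base, const double *test, size_t n)
{
	double mb = mean(base, n);

	return 100.0 * (mean(test, n) - mb) / mb;
}
```

Whether a positive diff is an improvement depends on the benchmark, so the
sign has to be read together with the metric that each test reports.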
I haven't been able to fully test the patchset on an SMT system to check that
the regression reported by Preethi has been solved, but the
various tests that I have done don't show any regression so far.
The correction of the SD_PREFER_SIBLING mode and its use at SMT level
should have fixed the regression.
Changes since V2:
- rebase on top of capacity renaming
- fix wake_affine statistic update
- rework nohz_kick_needed
- optimize the active migration of a task from CPU with reduced capacity
- rename group_activity to group_utilization and remove unused total_utilization
- repair SD_PREFER_SIBLING and use it for SMT level
- reorder patchset to gather patches with same topics
Changes since V1:
- add 3 fixes
- correct some commit messages
- replace capacity computation by activity
- take into account current cpu capacity
[1] https://lkml.org/lkml/2013/10/18/121
[2] https://lkml.org/lkml/2014/3/19/377
Vincent Guittot (12):
  sched: fix imbalance flag reset
  sched: remove a wake_affine condition
  sched: fix avg_load computation
  sched: Allow all archs to set the power_orig
  ARM: topology: use new cpu_power interface
  sched: add per rq cpu_power_orig
  sched: test the cpu's capacity in wake affine
  sched: move cfs task on a CPU with higher capacity
  Revert "sched: Put rq's sched_avg under CONFIG_FAIR_GROUP_SCHED"
  sched: get CPU's utilization statistic
  sched: replace capacity_factor by utilization
  sched: add SD_PREFER_SIBLING for SMT level
arch/arm/kernel/topology.c | 4 +-
kernel/sched/core.c | 3 +-
kernel/sched/fair.c | 290 +++++++++++++++++++++++----------------------
kernel/sched/sched.h | 5 +-
4 files changed, 158 insertions(+), 144 deletions(-)
--
1.9.1
OPPs can be populated statically, via DT, or added at run time with
dev_pm_opp_add().
While this driver handles the first case correctly, it fails to populate
OPPs added at runtime, because the call to of_init_opp_table() fails when there
are no OPPs in DT and probe returns early.
To fix this, remove the error checking and call dev_pm_opp_init_cpufreq_table()
unconditionally.
Update the bindings as well.
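
The resulting probe flow can be modelled in user space as follows. This is
only a sketch with stubs: struct fake_dev and every fake_* function are
hypothetical stand-ins for the kernel helpers named above, and the error
codes and the number of DT entries are illustrative.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the OPP state of a device. */
struct fake_dev {
	bool dt_has_opps;  /* operating-points property present in DT */
	int runtime_opps;  /* OPPs added before probe, e.g. by platform code */
	int nr_opps;       /* OPPs currently registered */
};

/* Models of_init_opp_table(): fails when DT carries no OPPs. */
static int fake_of_init_opp_table(struct fake_dev *d)
{
	if (!d->dt_has_opps)
		return -ENODEV;
	d->nr_opps += 3; /* arbitrary number of DT entries */
	return 0;
}

/* Models dev_pm_opp_init_cpufreq_table(): needs at least one OPP. */
static int fake_init_cpufreq_table(const struct fake_dev *d)
{
	return d->nr_opps > 0 ? 0 : -EINVAL;
}

/* Models the probe path after this change. */
static int fake_probe(struct fake_dev *d)
{
	d->nr_opps = d->runtime_opps;

	/* OPPs may have been added at runtime, so a DT failure is not fatal. */
	(void)fake_of_init_opp_table(d);

	/* This is now the only place where missing OPPs make probe fail. */
	return fake_init_cpufreq_table(d);
}
```

In this model, probe succeeds when OPPs come from DT, when they were added at
runtime only, or both, and it still fails when no OPP exists at all.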
Acked-by: Santosh Shilimkar <santosh.shilimkar(a)ti.com>
Suggested-by: Stephen Boyd <sboyd(a)codeaurora.org>
Tested-by: Stephen Boyd <sboyd(a)codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
Hi Rafael,
This was sent earlier as part of the series at https://lkml.org/lkml/2014/7/1/358,
but it is actually a fix and can (should?) be pushed for 3.16.
The rest of the patches from that series are cleanups/updates and so can go into
3.17.
Please see if you can take it for 3.16.
Documentation/devicetree/bindings/cpufreq/cpufreq-cpu0.txt | 6 ++++--
drivers/cpufreq/cpufreq-cpu0.c | 7 ++-----
2 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-cpu0.txt b/Documentation/devicetree/bindings/cpufreq/cpufreq-cpu0.txt
index f055515..366690c 100644
--- a/Documentation/devicetree/bindings/cpufreq/cpufreq-cpu0.txt
+++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-cpu0.txt
@@ -8,10 +8,12 @@ Both required and optional properties listed below must be defined
under node /cpus/cpu@0.
Required properties:
-- operating-points: Refer to Documentation/devicetree/bindings/power/opp.txt
- for details
+- None
Optional properties:
+- operating-points: Refer to Documentation/devicetree/bindings/power/opp.txt for
+ details. OPPs *must* be supplied either via DT, i.e. this property, or
+ populated at runtime.
- clock-latency: Specify the possible maximum transition latency for clock,
in unit of nanoseconds.
- voltage-tolerance: Specify the CPU voltage tolerance in percentage.
diff --git a/drivers/cpufreq/cpufreq-cpu0.c b/drivers/cpufreq/cpufreq-cpu0.c
index ee1ae30..86beda9 100644
--- a/drivers/cpufreq/cpufreq-cpu0.c
+++ b/drivers/cpufreq/cpufreq-cpu0.c
@@ -152,11 +152,8 @@ static int cpu0_cpufreq_probe(struct platform_device *pdev)
goto out_put_reg;
}
- ret = of_init_opp_table(cpu_dev);
- if (ret) {
- pr_err("failed to init OPP table: %d\n", ret);
- goto out_put_clk;
- }
+ /* OPPs might be populated at runtime, don't check for error here */
+ of_init_opp_table(cpu_dev);
ret = dev_pm_opp_init_cpufreq_table(cpu_dev, &freq_table);
if (ret) {
--
2.0.0.rc2
Hi Rafael,
As discussed here:
https://www.mail-archive.com/stable@vger.kernel.org/msg87645.html
This is another attempt to fix an important bug, together with cleanups around it.
The first patch is the real fix (it has accumulated all Reviewed/Tested-by tags)
and *should* go into 3.16-rc*.
The others are useful cleanups around it to beautify the code, and can go in later,
for 3.17.
Based on: 3.16-rc4
Viresh Kumar (4):
  cpufreq: move policy kobj to policy->cpu at resume
  cpufreq: don't restore policy->cpus on failure to move kobj
  cpufreq: propagate error returned by kobject_move()
  cpufreq: move policy kobj from update_policy_cpu()
drivers/cpufreq/cpufreq.c | 68 +++++++++++++++++++++--------------------------
1 file changed, 31 insertions(+), 37 deletions(-)
--
2.0.0.rc2
Tree/Branch: next-20140717
Git describe: next-20140717
Commit: b395397b3a Add linux-next specific files for 20140717
Build Time: 1 min 21 sec
Passed: 0 / 1 ( 0.00 %)
Failed: 0 / 1 ( 0.00 %)
Unknown: 1 / 1 (100.00 %)
Errors: 1
Warnings: 5
Section Mismatches: 0
-------------------------------------------------------------------------------
defconfigs with issues (other than build errors):
6 warnings 0 mismatches : arm64-defconfig
-------------------------------------------------------------------------------
Errors summary: 1
1 /home/broonie/build/linux-next/arch/arm64/kernel/ptrace.c:1118:2: error: too many arguments to function ‘audit_syscall_entry’
Warnings Summary: 5
2 /home/broonie/build/linux-next/scripts/sortextable.h:176:3: warning: ‘relocs_size’ may be used uninitialized in this function [-Wmaybe-uninitialized]
1 /home/broonie/build/linux-next/scripts/kconfig/menu.c:590:18: warning: ‘jump’ may be used uninitialized in this function [-Wmaybe-uninitialized]
1 /home/broonie/build/linux-next/fs/direct-io.c:920:9: warning: ‘to’ may be used uninitialized in this function [-Wmaybe-uninitialized]
1 /home/broonie/build/linux-next/fs/direct-io.c:1034:9: warning: ‘from’ may be used uninitialized in this function [-Wmaybe-uninitialized]
1 /home/broonie/build/linux-next/arch/arm64/include/asm/pgtable.h:363:50: warning: ‘start’ may be used uninitialized in this function [-Wmaybe-uninitialized]
===============================================================================
Detailed per-defconfig build reports below:
-------------------------------------------------------------------------------
arm64-defconfig : UNKNOWN, 1 errors, 6 warnings, 0 section mismatches
Errors:
/home/broonie/build/linux-next/arch/arm64/kernel/ptrace.c:1118:2: error: too many arguments to function ‘audit_syscall_entry’
Warnings:
/home/broonie/build/linux-next/scripts/kconfig/menu.c:590:18: warning: ‘jump’ may be used uninitialized in this function [-Wmaybe-uninitialized]
/home/broonie/build/linux-next/scripts/sortextable.h:176:3: warning: ‘relocs_size’ may be used uninitialized in this function [-Wmaybe-uninitialized]
/home/broonie/build/linux-next/scripts/sortextable.h:176:3: warning: ‘relocs_size’ may be used uninitialized in this function [-Wmaybe-uninitialized]
/home/broonie/build/linux-next/arch/arm64/include/asm/pgtable.h:363:50: warning: ‘start’ may be used uninitialized in this function [-Wmaybe-uninitialized]
/home/broonie/build/linux-next/fs/direct-io.c:920:9: warning: ‘to’ may be used uninitialized in this function [-Wmaybe-uninitialized]
/home/broonie/build/linux-next/fs/direct-io.c:1034:9: warning: ‘from’ may be used uninitialized in this function [-Wmaybe-uninitialized]
-------------------------------------------------------------------------------
Passed with no errors, warnings or mismatches:
Tree/Branch: master
Git describe: v3.16-rc5-143-gb6603fe574af
Commit: b6603fe574 Merge tag 'for-linus-20140716' of git://git.infradead.org/linux-mtd
Build Time: 0 min 19 sec
Passed: 0 / 1 ( 0.00 %)
Failed: 0 / 1 ( 0.00 %)
Unknown: 1 / 1 (100.00 %)
Errors: 0
Warnings: 5
Section Mismatches: 0
-------------------------------------------------------------------------------
defconfigs with issues (other than build errors):
6 warnings 0 mismatches : arm64-defconfig
-------------------------------------------------------------------------------
Warnings Summary: 5
2 /home/broonie/build/linux/scripts/sortextable.h:176:3: warning: ‘relocs_size’ may be used uninitialized in this function [-Wmaybe-uninitialized]
1 /home/broonie/build/linux/scripts/kconfig/menu.c:590:18: warning: ‘jump’ may be used uninitialized in this function [-Wmaybe-uninitialized]
1 /home/broonie/build/linux/fs/direct-io.c:920:9: warning: ‘to’ may be used uninitialized in this function [-Wmaybe-uninitialized]
1 /home/broonie/build/linux/fs/direct-io.c:1034:9: warning: ‘from’ may be used uninitialized in this function [-Wmaybe-uninitialized]
1 /home/broonie/build/linux/arch/arm64/include/asm/pgtable.h:363:50: warning: ‘start’ may be used uninitialized in this function [-Wmaybe-uninitialized]
===============================================================================
Detailed per-defconfig build reports below:
-------------------------------------------------------------------------------
arm64-defconfig : UNKNOWN, 0 errors, 6 warnings, 0 section mismatches
Warnings:
/home/broonie/build/linux/scripts/kconfig/menu.c:590:18: warning: ‘jump’ may be used uninitialized in this function [-Wmaybe-uninitialized]
/home/broonie/build/linux/scripts/sortextable.h:176:3: warning: ‘relocs_size’ may be used uninitialized in this function [-Wmaybe-uninitialized]
/home/broonie/build/linux/scripts/sortextable.h:176:3: warning: ‘relocs_size’ may be used uninitialized in this function [-Wmaybe-uninitialized]
/home/broonie/build/linux/arch/arm64/include/asm/pgtable.h:363:50: warning: ‘start’ may be used uninitialized in this function [-Wmaybe-uninitialized]
/home/broonie/build/linux/fs/direct-io.c:920:9: warning: ‘to’ may be used uninitialized in this function [-Wmaybe-uninitialized]
/home/broonie/build/linux/fs/direct-io.c:1034:9: warning: ‘from’ may be used uninitialized in this function [-Wmaybe-uninitialized]
-------------------------------------------------------------------------------
Passed with no errors, warnings or mismatches: