This patchset consolidates several changes in the capacity and the usage
tracking of the CPU. It provides a frequency invariant metric of the usage of
CPUs and generally improves the accuracy of load/usage tracking in the
scheduler. The frequency invariant metric is the foundation required for the
consolidation of cpufreq and the implementation of fully invariant load tracking.
These are currently WIP and require several changes to the load balancer
(including how it uses and interprets load and capacity metrics) and
extensive validation. The frequency invariance is done with
arch_scale_freq_capacity, and this patchset doesn't provide the backends of
the function, which are architecture dependent.
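To illustrate the idea of frequency invariance, here is a minimal userspace
sketch. It assumes a backend that reports the current frequency as a fraction
of the maximum frequency, scaled to 1024; the helper names
sketch_scale_freq_capacity() and scale_running_time() and the frequencies are
hypothetical and are not taken from this patchset.

```c
#include <assert.h>
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT 10
#define SCHED_CAPACITY_SCALE (1UL << SCHED_CAPACITY_SHIFT)

/*
 * Hypothetical backend: current frequency relative to the maximum
 * frequency, expressed in the range [0, SCHED_CAPACITY_SCALE].
 */
static unsigned long sketch_scale_freq_capacity(unsigned long cur_khz,
						unsigned long max_khz)
{
	assert(max_khz > 0 && cur_khz <= max_khz);
	return (cur_khz * SCHED_CAPACITY_SCALE) / max_khz;
}

/*
 * Scale a running-time delta by the frequency factor so that the same
 * amount of work accounts for the same usage at any frequency.
 */
static uint64_t scale_running_time(uint64_t delta_ns, unsigned long scale_freq)
{
	return (delta_ns * scale_freq) >> SCHED_CAPACITY_SHIFT;
}
```

With these assumptions, 1 ms of running time at half the maximum frequency
is accounted as 0.5 ms of usage.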
As discussed at LPC14, Morten and I have consolidated our changes into a single
patchset to make it easier to review and merge.
During load balance, the scheduler evaluates the number of tasks that a group
of CPUs can handle. The current method assumes that tasks have a fixed load of
SCHED_LOAD_SCALE and CPUs have a default capacity of SCHED_CAPACITY_SCALE.
This assumption leads to wrong decisions by creating ghost cores or by
removing real ones when the original capacity of CPUs is different from the
default SCHED_CAPACITY_SCALE. With this patch set, we no longer try to
evaluate the number of available cores based on the group_capacity; instead
we evaluate the usage of a group and compare it with its capacity.
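The ghost/missing core effect can be shown with a small userspace sketch. It
is a simplification under stated assumptions: the old factor is assumed to be
the group capacity rounded to the nearest multiple of SCHED_CAPACITY_SCALE,
and margin_pct is a hypothetical stand-in for an imbalance threshold. The
function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE 1024UL

/*
 * Old approach (simplified): derive a number of "cores" by rounding the
 * group capacity to the nearest multiple of SCHED_CAPACITY_SCALE.
 */
static unsigned long capacity_factor(unsigned long group_capacity)
{
	return (group_capacity + SCHED_CAPACITY_SCALE / 2) / SCHED_CAPACITY_SCALE;
}

/*
 * New approach (simplified): the group has spare capacity as long as its
 * usage, inflated by a margin given in percent, stays below its capacity.
 */
static bool group_has_capacity(unsigned long group_capacity,
			       unsigned long group_usage,
			       unsigned int margin_pct)
{
	assert(margin_pct >= 100);
	return group_capacity * 100 > group_usage * margin_pct;
}
```

For example, three CPUs of capacity 640 give capacity_factor(1920) == 2 (one
real core removed), while two CPUs of capacity 1535 give
capacity_factor(3070) == 3 (one ghost core). Comparing usage with capacity
does not depend on that rounding.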
This patchset mainly replaces the old capacity_factor method by a new one and
keeps the general policy almost unchanged. These new metrics will also be used
in later patches.
The CPU usage is based on a running time tracking version of the current
implementation of the load average tracking. I also have a version that is
based on the new implementation proposal [1] but I haven't provided the patches
and results as [1] is still under review. I can provide changes on top of [1]
to change how CPU usage is computed and to adapt to the new mechanism.
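As a rough illustration of running time tracking, the sketch below keeps a
geometrically decayed sum of running periods and of all periods. It assumes
the same decay as the existing load tracking (a contribution is halved after
32 periods); the struct and function names are hypothetical, and floating
point is used only to keep the example short.

```c
#include <assert.h>

#define SKETCH_SCALE	1024.0
/* y such that y^32 == 0.5 */
#define SKETCH_DECAY_Y	0.97857206

struct usage_avg {
	double running_sum;	/* decayed count of periods spent running */
	double period_sum;	/* decayed count of all elapsed periods */
};

/* Account one elapsed period; was_running is non-zero if the CPU ran a task. */
static void usage_avg_update(struct usage_avg *ua, int was_running)
{
	assert(ua != 0);
	ua->running_sum = ua->running_sum * SKETCH_DECAY_Y + (was_running ? 1.0 : 0.0);
	ua->period_sum = ua->period_sum * SKETCH_DECAY_Y + 1.0;
}

/* Usage in the range [0, 1024], rounded to the nearest integer. */
static unsigned long usage_avg_scaled(const struct usage_avg *ua)
{
	if (ua->period_sum <= 0.0)
		return 0;
	return (unsigned long)(ua->running_sum * SKETCH_SCALE / ua->period_sum + 0.5);
}
```

A CPU that is always running converges to 1024, an idle CPU stays at 0, and a
CPU running every other period settles close to 512.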
Change since V8
- reorder patches
Change since V7
- add freq invariance for usage tracking
- add freq invariance for scale_rt
- update comments and commits' message
- fix init of utilization_avg_contrib
- fix prefer_sibling
Change since V6
- add group usage tracking
- fix some commits' messages
- minor fix like comments and argument order
Change since V5
- remove patches that have been merged since v5 : patches 01, 02, 03, 04, 05, 07
- update commit log and add more details on the purpose of the patches
- fix/remove useless code with the rebase on patchset [2]
- remove capacity_orig in sched_group_capacity as it is not used
- move code in the right patch
- add some helper function to factorize code
Change since V4
- rebase to manage conflicts with changes in selection of busiest group
Change since V3:
- add usage_avg_contrib statistic which sums the running time of tasks on a rq
- use usage_avg_contrib instead of runnable_avg_sum for cpu_utilization
- fix replacement power by capacity
- update some comments
Change since V2:
- rebase on top of capacity renaming
- fix wake_affine statistic update
- rework nohz_kick_needed
- optimize the active migration of a task from CPU with reduced capacity
- rename group_activity by group_utilization and remove unused total_utilization
- repair SD_PREFER_SIBLING and use it for SMT level
- reorder patchset to gather patches with same topics
Change since V1:
- add 3 fixes
- correct some commit messages
- replace capacity computation by activity
- take into account current cpu capacity
[1] https://lkml.org/lkml/2014/10/10/131
[2] https://lkml.org/lkml/2014/7/25/589
Morten Rasmussen (2):
sched: Track group sched_entity usage contributions
sched: Make sched entity usage tracking scale-invariant
Vincent Guittot (8):
sched: add utilization_avg_contrib
sched: remove frequency scaling from cpu_capacity
sched: make scale_rt invariant with frequency
sched: add per rq cpu_capacity_orig
sched: get CPU's usage statistic
sched: replace capacity_factor by usage
sched: add SD_PREFER_SIBLING for SMT level
sched: move cfs task on a CPU with higher capacity
include/linux/sched.h | 21 ++-
kernel/sched/core.c | 15 +-
kernel/sched/debug.c | 12 +-
kernel/sched/fair.c | 369 ++++++++++++++++++++++++++++++++------------------
kernel/sched/sched.h | 15 +-
5 files changed, 276 insertions(+), 156 deletions(-)
--
1.9.1
Tree/Branch: master
Git describe: v3.18-rc6-22-gb914c5b21302
Commit: b914c5b213 Merge branch 'for-3.18' of git://linux-nfs.org/~bfields/linux
Build Time: 23 min 15 sec
Passed: 8 / 8 (100.00 %)
Failed: 0 / 8 ( 0.00 %)
Errors: 0
Warnings: 29
Section Mismatches: 0
-------------------------------------------------------------------------------
defconfigs with issues (other than build errors):
1 warnings 0 mismatches : x86_64-allnoconfig
11 warnings 0 mismatches : arm64-allmodconfig
1 warnings 0 mismatches : arm-allnoconfig
20 warnings 0 mismatches : arm-allmodconfig
1 warnings 0 mismatches : arm64-defconfig
-------------------------------------------------------------------------------
Warnings Summary: 29
3 ../scripts/kconfig/menu.c:590:18: warning: 'jump' may be used uninitialized in this function [-Wmaybe-uninitialized]
2 ../include/linux/dynamic_debug.h:78:3: warning: unsupported argument to '__builtin_return_address'
2 ../drivers/scsi/ips.c:210:2: warning: #warning "This driver has only been tested on the x86/ia64/x86_64 platforms" [-Wcpp]
2 ../drivers/net/ethernet/dec/tulip/winbond-840.c:910:2: warning: #warning Processor architecture undefined [-Wcpp]
1 ../net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
1 ../lib/lz4/lz4hc_compress.c:514:1: warning: the frame size of 1480 bytes is larger than 1024 bytes [-Wframe-larger-than=]
1 ../include/uapi/linux/swab.h:13:15: warning: integer overflow in expression [-Woverflow]
1 ../include/linux/kernel.h:707:17: warning: comparison of distinct pointer types lacks a cast
1 ../include/linux/dynamic_debug.h:64:16: warning: format '%d' expects argument of type 'int', but argument 4 has type 'long unsigned int' [-Wformat=]
1 ../fs/btrfs/extent_io.c:2166:13: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/usb/renesas_usbhs/common.c:469:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
1 ../drivers/usb/gadget/udc/udc-xilinx.c:2136:34: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
1 ../drivers/staging/vt6655/device_main.c:2993:1: warning: the frame size of 1304 bytes is larger than 1024 bytes [-Wframe-larger-than=]
1 ../drivers/staging/bcm/CmHost.c:1564:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/staging/bcm/CmHost.c:1546:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/staging/bcm/CmHost.c:1503:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:467:46: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:307:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
1 ../drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:303:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
1 ../drivers/net/ethernet/dec/tulip/tulip_core.c:101:2: warning: #warning Processor architecture undefined! [-Wcpp]
1 ../drivers/mtd/chips/cfi_cmdset_0020.c:651:1: warning: the frame size of 1208 bytes is larger than 1024 bytes [-Wframe-larger-than=]
1 ../drivers/infiniband/ulp/iser/iser_verbs.c:1203:14: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/infiniband/ulp/iser/iser_verbs.c:1198:14: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/infiniband/ulp/iser/iser_verbs.c:1172:31: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/infiniband/ulp/iser/iser_verbs.c:1171:33: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/infiniband/hw/qib/qib_qp.c:44:0: warning: "BITS_PER_PAGE" redefined
1 ../drivers/block/drbd/drbd_bitmap.c:483:0: warning: "BITS_PER_PAGE_MASK" redefined
1 ../arch/arm/mach-cns3xxx/pcie.c:313:1: warning: the frame size of 1080 bytes is larger than 1024 bytes [-Wframe-larger-than=]
1 ....../drivers/block/drbd/drbd_bitmap.c:482:0: warning: "BITS_PER_PAGE" redefined
===============================================================================
Detailed per-defconfig build reports below:
-------------------------------------------------------------------------------
x86_64-allnoconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../scripts/kconfig/menu.c:590:18: warning: 'jump' may be used uninitialized in this function [-Wmaybe-uninitialized]
-------------------------------------------------------------------------------
arm64-allmodconfig : PASS, 0 errors, 11 warnings, 0 section mismatches
Warnings:
....../drivers/block/drbd/drbd_bitmap.c:482:0: warning: "BITS_PER_PAGE" redefined
../drivers/block/drbd/drbd_bitmap.c:483:0: warning: "BITS_PER_PAGE_MASK" redefined
../drivers/infiniband/hw/qib/qib_qp.c:44:0: warning: "BITS_PER_PAGE" redefined
../drivers/net/ethernet/dec/tulip/winbond-840.c:910:2: warning: #warning Processor architecture undefined [-Wcpp]
../drivers/net/ethernet/dec/tulip/tulip_core.c:101:2: warning: #warning Processor architecture undefined! [-Wcpp]
../drivers/staging/bcm/CmHost.c:1503:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../drivers/staging/bcm/CmHost.c:1546:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../drivers/staging/bcm/CmHost.c:1564:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../drivers/scsi/ips.c:210:2: warning: #warning "This driver has only been tested on the x86/ia64/x86_64 platforms" [-Wcpp]
../drivers/usb/gadget/udc/udc-xilinx.c:2136:34: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
../drivers/usb/renesas_usbhs/common.c:469:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
-------------------------------------------------------------------------------
arm-allnoconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../scripts/kconfig/menu.c:590:18: warning: 'jump' may be used uninitialized in this function [-Wmaybe-uninitialized]
-------------------------------------------------------------------------------
arm-allmodconfig : PASS, 0 errors, 20 warnings, 0 section mismatches
Warnings:
../arch/arm/mach-cns3xxx/pcie.c:313:1: warning: the frame size of 1080 bytes is larger than 1024 bytes [-Wframe-larger-than=]
../fs/btrfs/extent_io.c:2166:13: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../lib/lz4/lz4hc_compress.c:514:1: warning: the frame size of 1480 bytes is larger than 1024 bytes [-Wframe-larger-than=]
../net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
../drivers/infiniband/ulp/iser/iser_verbs.c:1171:33: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../drivers/infiniband/ulp/iser/iser_verbs.c:1172:31: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../drivers/infiniband/ulp/iser/iser_verbs.c:1198:14: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../drivers/infiniband/ulp/iser/iser_verbs.c:1203:14: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../include/linux/kernel.h:707:17: warning: comparison of distinct pointer types lacks a cast
../include/linux/dynamic_debug.h:78:3: warning: unsupported argument to '__builtin_return_address'
../include/linux/dynamic_debug.h:78:3: warning: unsupported argument to '__builtin_return_address'
../drivers/mtd/chips/cfi_cmdset_0020.c:651:1: warning: the frame size of 1208 bytes is larger than 1024 bytes [-Wframe-larger-than=]
../include/linux/dynamic_debug.h:64:16: warning: format '%d' expects argument of type 'int', but argument 4 has type 'long unsigned int' [-Wformat=]
../include/uapi/linux/swab.h:13:15: warning: integer overflow in expression [-Woverflow]
../drivers/net/ethernet/dec/tulip/winbond-840.c:910:2: warning: #warning Processor architecture undefined [-Wcpp]
../drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:303:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
../drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:307:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
../drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:467:46: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../drivers/scsi/ips.c:210:2: warning: #warning "This driver has only been tested on the x86/ia64/x86_64 platforms" [-Wcpp]
../drivers/staging/vt6655/device_main.c:2993:1: warning: the frame size of 1304 bytes is larger than 1024 bytes [-Wframe-larger-than=]
-------------------------------------------------------------------------------
arm64-defconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../scripts/kconfig/menu.c:590:18: warning: 'jump' may be used uninitialized in this function [-Wmaybe-uninitialized]
-------------------------------------------------------------------------------
Passed with no errors, warnings or mismatches:
arm64-allnoconfig
arm-multi_v7_defconfig
x86_64-defconfig
Giving a warning when duplicate OPPs are added doesn't work out that well. For
example, just loading and unloading the cpufreq-dt driver as a module results
in this:
$ modprobe cpufreq-dt
$ modprobe -r cpufreq-dt
$ modprobe cpufreq-dt
cpu cpu0: dev_pm_opp_add: duplicate OPPs detected. Existing:
freq: 261819000, volt: 1350000, enabled: 1. New: freq: 261819000, volt: 1350000,
enabled: 1
cpu cpu0: dev_pm_opp_add: duplicate OPPs detected. Existing:
freq: 360000000, volt: 1350000, enabled: 1. New: freq: 360000000, volt: 1350000,
enabled: 1
cpu cpu0: dev_pm_opp_add: duplicate OPPs detected. Existing:
freq: 392728000, volt: 1450000, enabled: 1. New: freq: 392728000, volt: 1450000,
enabled: 1
cpu cpu0: dev_pm_opp_add: duplicate OPPs detected. Existing:
freq: 454737000, volt: 1550000, enabled: 1. New: freq: 454737000, volt: 1550000,
enabled: 1
This happens because we don't destroy OPPs (created during ->init()) while
unloading modules.
Now the question is: Should we destroy these OPPs?
Logically kernel drivers *must* free resources they acquired. But in this
particular case, the OPPs are created using a static list present in device
tree. Destroying and then allocating them again isn't of much benefit. The only
benefit of removing OPPs is to save some space if the driver isn't loaded again.
This has its own complications. OPPs can be created either from DT (static) or
from platform code (dynamic). A driver should only remove static OPPs and not
the dynamic ones, as those are controlled from platform code. But there is no
field in 'struct dev_pm_opp' that distinguishes between the different kinds of
OPPs.
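For illustration only, here is a userspace sketch of what such a distinction
could look like. The 'dynamic' field, the simplified 'struct sketch_opp' and
all helper names are hypothetical; they are not part of this patch or of the
current OPP code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-in for 'struct dev_pm_opp'. */
struct sketch_opp {
	unsigned long rate;
	unsigned long u_volt;
	bool dynamic;		/* hypothetical: true if added by platform code */
	struct sketch_opp *next;
};

/* Add an OPP at the head of the list and return the new head. */
static struct sketch_opp *sketch_opp_add(struct sketch_opp *head,
					 unsigned long rate,
					 unsigned long u_volt, bool dynamic)
{
	struct sketch_opp *opp = malloc(sizeof(*opp));

	assert(opp != NULL);
	opp->rate = rate;
	opp->u_volt = u_volt;
	opp->dynamic = dynamic;
	opp->next = head;
	return opp;
}

/* Free only the static (DT-created) OPPs and return the new head. */
static struct sketch_opp *sketch_opp_remove_static(struct sketch_opp *head)
{
	struct sketch_opp **link = &head;

	while (*link) {
		struct sketch_opp *opp = *link;

		if (!opp->dynamic) {
			*link = opp->next;
			free(opp);
		} else {
			link = &opp->next;
		}
	}
	return head;
}

/* Count the OPPs remaining in the list. */
static unsigned int sketch_opp_count(const struct sketch_opp *head)
{
	unsigned int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

With such a flag, a driver's remove path could drop the DT-created entries
and leave the platform-created ones untouched.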
Because of all this, I wasn't sure if drivers should remove static OPPs during
their removal. So this patch just fixes the reported issue by issuing a
dev_dbg() instead of a dev_warn().
Reported-by: Stefan Wahren <stefan.wahren(a)i2se.com>
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
drivers/base/power/opp.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c
index 89ced95..490e9db 100644
--- a/drivers/base/power/opp.c
+++ b/drivers/base/power/opp.c
@@ -466,9 +466,9 @@ int dev_pm_opp_add(struct device *dev, unsigned long freq, unsigned long u_volt)
int ret = opp->available && new_opp->u_volt == opp->u_volt ?
0 : -EEXIST;
- dev_warn(dev, "%s: duplicate OPPs detected. Existing: freq: %lu, volt: %lu, enabled: %d. New: freq: %lu, volt: %lu, enabled: %d\n",
- __func__, opp->rate, opp->u_volt, opp->available,
- new_opp->rate, new_opp->u_volt, new_opp->available);
+ dev_dbg(dev, "%s: duplicate OPPs detected. Existing: freq: %lu, volt: %lu, enabled: %d. New: freq: %lu, volt: %lu, enabled: %d\n",
+ __func__, opp->rate, opp->u_volt, opp->available,
+ new_opp->rate, new_opp->u_volt, new_opp->available);
mutex_unlock(&dev_opp_list_lock);
kfree(new_opp);
return ret;
--
2.0.3.693.g996b0fd
From: "zhichang.yuan" <zhichang.yuan(a)linaro.org>
This patch makes the processing of map_mem more general and supports more
discrete memory layout cases.
In the current map_mem, the processing is based on two hypotheses:
1) no early page allocations occur before the first PMD or PUD region,
where the kernel image is located, is successfully mapped;
2) there are sufficient available pages in that PMD or PUD region to satisfy
the page table needs of mapping the other memory ranges.
Current SoC and hardware platform designs have not broken these constraints,
but we can make the software more versatile.
In addition, on 4K page systems, to comply with constraint No.1, the
start address of some memory ranges is forced to align to a PMD boundary, which
means that some marginal pages of those ranges are skipped when building the
PTEs. This is not reasonable.
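A small worked example of the skipped pages, as a userspace sketch. It assumes
a 2 MiB PMD size (4K pages); the macro and function names and the start
address are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: PMD size with 4K pages is 2 MiB. */
#define SKETCH_PMD_SIZE		(2ULL << 20)
#define SKETCH_ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((a) - 1))

/* Bytes lost at the start of a range when it is aligned up to a PMD boundary. */
static uint64_t skipped_bytes(uint64_t start)
{
	uint64_t aligned = SKETCH_ALIGN_UP(start, SKETCH_PMD_SIZE);

	assert(aligned >= start);
	return aligned - start;
}
```

For a range starting at 0x80001000, aligning up to 0x80200000 leaves 0x1ff000
bytes (511 pages of 4K) unmapped.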
This patch relieves the system from those constraints. You can load the
kernel image in any memory range; the memory range can be small, can start at
a non-aligned boundary, and so on.
In this patch, the kernel space mapping may scan all memory ranges
twice. In the first scan, the memory ranges whose size is larger than a
threshold are mapped; the second scan then maps the smaller memory
ranges. Since the threshold is so small, in most cases the second scan is
a no-op.
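The two scans can be sketched in userspace as follows. This is a
simplification: it uses a plain size threshold to decide which ranges are
deferred, whereas the patch decides this inside map_onerng_reverse(). The
struct, the function name and the threshold value are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_range {
	uint64_t base;
	uint64_t size;
	bool deferred;	/* too small to be mapped in the first scan */
	int map_order;	/* order in which the range was mapped, -1 if not yet */
};

/* Map all ranges in two scans; return how many were left for the second one. */
static size_t map_two_pass(struct sketch_range *r, size_t n, uint64_t threshold)
{
	size_t deferred = 0;
	int order = 0;

	assert(r != NULL || n == 0);

	/* First scan: highest range first, map only the large ranges. */
	for (size_t i = n; i-- > 0; ) {
		if (r[i].size >= threshold) {
			r[i].deferred = false;
			r[i].map_order = order++;
		} else {
			r[i].deferred = true;
			r[i].map_order = -1;
			deferred++;
		}
	}

	/* Second scan: the large ranges now supply pages for the small ones. */
	for (size_t i = 0; i < n; i++) {
		if (!r[i].deferred)
			continue;
		r[i].map_order = order++;
		r[i].deferred = false;
	}

	return deferred;
}
```

When every range is above the threshold, the function returns 0 and the
second loop does nothing, which matches the common case described above.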
The patch is also available at:
https://git.linaro.org/people/zhichang.yuan/pgalloc.git/shortlog/refs/heads…
Signed-off-by: Zhichang Yuan <zhichang.yuan(a)linaro.org>
---
arch/arm64/include/asm/page.h | 10 ++
arch/arm64/include/asm/pgtable.h | 3 +
arch/arm64/kernel/vmlinux.lds.S | 4 +
arch/arm64/mm/mmu.c | 230 ++++++++++++++++++++++++++++++++------
include/linux/memblock.h | 5 +
5 files changed, 217 insertions(+), 35 deletions(-)
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 22b1623..7c55e11 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -44,6 +44,16 @@
#define SWAPPER_DIR_SIZE (SWAPPER_PGTABLE_LEVELS * PAGE_SIZE)
#define IDMAP_DIR_SIZE (SWAPPER_DIR_SIZE)
+/*This macro has strong dependency with BLOCK_SIZE in head.S...*/
+#ifdef CONFIG_ARM64_64K_PAGES
+#define INIT_MAP_PGSZ (PAGE_SIZE)
+/*we prepare one more page for probable memblock space extension*/
+#define PGT_BRK_SIZE ((SWAPPER_PGTABLE_LEVELS) << PAGE_SHIFT)
+#else
+#define INIT_MAP_PGSZ (SECTION_SIZE)
+#define PGT_BRK_SIZE ((SWAPPER_PGTABLE_LEVELS + 1) << PAGE_SHIFT)
+#endif
+
#ifndef __ASSEMBLY__
#include <asm/pgtable-types.h>
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 41a43bf..9f96c6c 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -464,6 +464,9 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
extern pgd_t idmap_pg_dir[PTRS_PER_PGD];
+/*define for kernel direct space mapping*/
+extern char pgtbrk_base[], pgtbrk_end[];
+
/*
* Encode and decode a swap entry:
* bits 0-1: present (must be zero)
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index edf8715..ca5b69c 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -113,6 +113,10 @@ SECTIONS
swapper_pg_dir = .;
. += SWAPPER_DIR_SIZE;
+ pgtbrk_base = .;
+ . += PGT_BRK_SIZE;
+ pgtbrk_end = .;
+
_end = .;
STABS_DEBUG
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index f4f8b50..e56fbc8 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -67,6 +67,12 @@ static struct cachepolicy cache_policies[] __initdata = {
};
/*
+ * points to the specific brk regions. Those pages can be allocated for
+ * page tables usage.
+ */
+static unsigned long pgtbrk_sp = (unsigned long)pgtbrk_base;
+
+/*
* These are useful for identifying cache coherency problems by allowing the
* cache or the cache and writebuffer to be turned off. It changes the Normal
* memory caching attributes in the MAIR_EL1 register.
@@ -131,7 +137,18 @@ EXPORT_SYMBOL(phys_mem_access_prot);
static void __init *early_alloc(unsigned long sz)
{
- void *ptr = __va(memblock_alloc(sz, sz));
+ void *ptr;
+
+ if (!(sz & (~PAGE_MASK)) &&
+ pgtbrk_sp + sz <= (unsigned long)pgtbrk_end) {
+ ptr = (void *)pgtbrk_sp;
+ pgtbrk_sp += sz;
+ pr_info("BRK [0x%p, 0x%lx] PGTABLE\n", ptr, pgtbrk_sp);
+
+ } else {
+ ptr = __va(memblock_alloc(sz, sz));
+ }
+
memset(ptr, 0, sz);
return ptr;
}
@@ -287,52 +304,195 @@ void __init create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io)
addr, addr, size, map_io);
}
+/*
+* In the worst case, mapping one memory range or sub-range will consume
+* MIN_MAP_INCRSZ pages. To guarantee there are sufficient mapped pages
+* for the ranges to be mapped, priority is given to mapping those ranges
+* that can supply available pages over this macro value.
+* For 64K, one page less than SWAPPER_PGTABLE_LEVELS;
+* For 4K, SWAPPER_PGTABLE_LEVELS pages
+*/
+#ifdef CONFIG_ARM64_64K_PAGES
+#define MIN_MAP_INCRSZ ((SWAPPER_PGTABLE_LEVELS - 1) << PAGE_SHIFT)
+#else
+#define MIN_MAP_INCRSZ (SWAPPER_PGTABLE_LEVELS << PAGE_SHIFT)
+#endif
+
+static inline void __init map_cont_memseg(phys_addr_t start,
+ phys_addr_t end, phys_addr_t *plimit)
+{
+ create_mapping(start, __phys_to_virt(start), end - start);
+ if (*plimit < end) {
+ *plimit = end;
+ memblock_set_current_limit(end);
+ }
+}
+
+/*
+* This function will map the designated memory range. If successfully do
+* the mapping, will update the current_limit as the maximal mapped address.
+*
+* each mapped memory range should at least supply (SWAPPER_PGTABLE_LEVELS - 1)
+* new mapped pages for the next range. Otherwise, that range should be reserved
+* for delay mapping.
+* The memory range will probably be divided into several sub-ranges.
+* The division will occur at the PMD, PUD boundaries.
+* In the worst case, one sub-range will spend (SWAPPER_PGTABLE_LEVELS - 1)
+* pages as page tables, we firstly map the sub-range that can provide enough
+* pages for the remaining sub-ranges.
+*/
+static size_t __init map_onerng_reverse(phys_addr_t start,
+ phys_addr_t end, phys_addr_t *plimit)
+{
+ phys_addr_t blk_start, blk_end;
+ phys_addr_t delimit = 0;
+
+ blk_start = round_up(start, PMD_SIZE);
+ blk_end = round_down(end, PMD_SIZE);
+
+ /*
+ * first case: start and end are spread in adjacent PMD
+ * second case: start and end are separated by at least one PMD
+ * third case: start and end are in same PMD
+ */
+ if (blk_start == blk_end &&
+ blk_start != start && blk_end != end) {
+ delimit = blk_start;
+ /*blk_start is the minimum, blk_end is the maximum*/
+ if (end - delimit >= delimit - start) {
+ blk_end = end - delimit;
+ blk_start = delimit - start;
+ } else {
+ blk_end = delimit - start;
+ blk_start = end - delimit;
+ }
+ /*both sub-ranges can supply enough pages*/
+ if (blk_start >= MIN_MAP_INCRSZ) {
+ map_cont_memseg(delimit, end, plimit);
+ map_cont_memseg(start, delimit, plimit);
+ } else if (blk_end >= (MIN_MAP_INCRSZ << 1)) {
+ if (blk_end == end - delimit) {
+ map_cont_memseg(delimit, end, plimit);
+ map_cont_memseg(start, delimit, plimit);
+ } else {
+ map_cont_memseg(start, delimit, plimit);
+ map_cont_memseg(delimit, end, plimit);
+ }
+ } else
+ return 0;
+ } else if (blk_start < blk_end) {
+ /*
+ * In one PUD regime, only can mapping the sub-range that has
+ * one non-PMD alignment edge at most. Otherwise, the mapping
+ * will probably consume over MIN_MAP_INCRSZ space.
+ */
+ phys_addr_t pud_start, pud_end;
+
+ pud_end = round_down(blk_end, PUD_SIZE);
+ pud_start = round_up(blk_start, PUD_SIZE);
+ /*first case: [blk_start, blk_end) spread in adjacent PUD */
+ if ((pud_start == pud_end) &&
+ pud_start != blk_start && pud_end != blk_end)
+ delimit = (blk_end > pud_end) ?
+ (blk_end = end, pud_end) : blk_start;
+ else if (pud_start < pud_end)
+ /*spread among multiple PUD*/
+ delimit = (blk_end > pud_end) ?
+ (blk_end = end, pud_end) : pud_start;
+ else {
+ /*
+ * spread in same PUD:
+ * if blk_end aligns to PUD boundary, mapping of
+ * [start,blk_end) should has higher priority.
+ */
+ blk_end = (blk_end & ~PUD_MASK) ? end : blk_end;
+ delimit = ((blk_start & ~PUD_MASK) && !(blk_end & ~PMD_MASK)) ?
+ start : blk_start;
+ }
+ /*adjust the blk_end, try to map a bigger memory range*/
+ if (end - blk_end >= MIN_MAP_INCRSZ)
+ blk_end = end;
+
+ map_cont_memseg(delimit, blk_end, plimit);
+ /*
+ * now, at least one PMD was mapped. sufficient pages is ready
+ * for mapping the remaining sub-ranges.
+ */
+ if (blk_end < end)
+ map_cont_memseg(blk_end, end, plimit);
+ if (start < delimit)
+ map_cont_memseg(start, delimit, plimit);
+ } else {
+ if (end - start < MIN_MAP_INCRSZ)
+ return 0;
+ map_cont_memseg(start, end, plimit);
+ }
+
+ return end - start;
+}
+
+
static void __init map_mem(void)
{
struct memblock_region *reg;
- phys_addr_t limit;
- /*
- * Temporarily limit the memblock range. We need to do this as
- * create_mapping requires puds, pmds and ptes to be allocated from
- * memory addressable from the initial direct kernel mapping.
- *
- * The initial direct kernel mapping, located at swapper_pg_dir, gives
- * us PUD_SIZE (4K pages) or PMD_SIZE (64K pages) memory starting from
- * PHYS_OFFSET (which must be aligned to 2MB as per
- * Documentation/arm64/booting.txt).
- */
- if (IS_ENABLED(CONFIG_ARM64_64K_PAGES))
- limit = PHYS_OFFSET + PMD_SIZE;
- else
- limit = PHYS_OFFSET + PUD_SIZE;
- memblock_set_current_limit(limit);
+ size_t incr;
+ size_t mapped_sz = 0;
+ phys_addr_t limit = 0;
- /* map all the memory banks */
- for_each_memblock(memory, reg) {
- phys_addr_t start = reg->base;
- phys_addr_t end = start + reg->size;
+ phys_addr_t start, end;
- if (start >= end)
+ /*set current_limit as the maximum addr mapped in head.S*/
+ limit = round_up(__pa_symbol(_end), INIT_MAP_PGSZ);
+ memblock_set_current_limit(limit);
+
+ for_each_memblock_reverse(memory, reg) {
+ start = reg->base;
+ end = start + reg->size;
+ /*
+ * a range that does not cover even one page is invalid.
+ * wrap-around is invalid too.
+ */
+ if (PFN_UP(start) >= PFN_DOWN(end))
break;
-#ifndef CONFIG_ARM64_64K_PAGES
+ incr = map_onerng_reverse(start, end, &limit);
/*
- * For the first memory bank align the start address and
- * current memblock limit to prevent create_mapping() from
- * allocating pte page tables from unmapped memory.
- * When 64K pages are enabled, the pte page table for the
- * first PGDIR_SIZE is already present in swapper_pg_dir.
- */
- if (start < limit)
- start = ALIGN(start, PMD_SIZE);
- if (end < limit) {
- limit = end & PMD_MASK;
- memblock_set_current_limit(limit);
+ * if CONFIG_HAVE_MEMBLOCK_NODE_MAP is supported in future, the
+ * nid input parameter needs to be changed.
+ * incr being zero means the range is too small to be mapped
+ * in this scan. To avoid it being allocated by memblock APIs,
+ * temporarily reserve this range and set the flag in
+ * memblock.memory for the second scan.
+ */
+ if (!incr) {
+ memblock_add_range(&memblock.reserved, reg->base,
+ reg->size, NUMA_NO_NODE, reg->flags);
+ memblock_set_region_flags(reg, MEMBLOCK_TMP_UNMAP);
+ } else {
+ mapped_sz += incr;
}
-#endif
+ }
+ /*
+ * The second scanning. Supposed there are large memory ranges,
+ * after the first scanning, those large memory ranges were mapped,
+ * and supply sufficient pages to map the remaining small ranges.
+ */
+ for_each_memblock(memory, reg) {
+ if (!(reg->flags & MEMBLOCK_TMP_UNMAP))
+ continue;
+
+ start = reg->base;
+ end = start + reg->size;
+
+ if (PFN_UP(start) >= PFN_DOWN(end))
+ break;
create_mapping(start, __phys_to_virt(start), end - start);
+ memblock_clear_region_flags(reg, MEMBLOCK_TMP_UNMAP);
+
+ memblock_remove_range(&memblock.reserved, reg->base,
+ reg->size);
}
/* Limit no longer required. */
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index e8cc453..4c09f7c 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -22,6 +22,7 @@
/* Definition of memblock flags. */
#define MEMBLOCK_HOTPLUG 0x1 /* hotpluggable region */
+#define MEMBLOCK_TMP_UNMAP 0x2 /* can not be mapped in first scan*/
struct memblock_region {
phys_addr_t base;
@@ -356,6 +357,10 @@ static inline unsigned long memblock_region_reserved_end_pfn(const struct memblo
region < (memblock.memblock_type.regions + memblock.memblock_type.cnt); \
region++)
+#define for_each_memblock_reverse(memblock_type, region) \
+ for (region = memblock.memblock_type.regions + memblock.memblock_type.cnt - 1; \
+ region >= memblock.memblock_type.regions; \
+ region--)
#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
#define __init_memblock __meminit
--
1.7.9.5
This patch series enables secure computing (system call filtering) on arm64,
and contains related enhancements and bug fixes.
NOTE: This version contains a workaround for a possible BUG_ON() failure
at audit_syscall_exit(), but doesn't contain the extra optimization that I
submitted for arm, which excludes syscall enter/exit tracing for invalid
system calls, because of an issue that I reported in:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/292170.h…
The code was tested on ARMv8 fast model with 64-bit/32-bit userspace using:
* libseccomp v2.1.1 with modifications for arm64, especially its "live"
tests: No.20, 21 and 24.
* modified version of Kees' seccomp test for 'changing/skipping a syscall'
and seccomp() system call
* in-house tests for 'changing/skipping a system call' by tracing with
ptrace(SETREGSET, NT_ARM_SYSTEM_CALL) (that is, not via seccomp filter),
with and without audit tracing.
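For reference, a tracer could change the system call number of a stopped
tracee roughly as sketched below. This is a minimal sketch, not the in-house
test code: the helper name set_tracee_syscall() is hypothetical, the fallback
value of NT_ARM_SYSTEM_CALL is assumed to match include/uapi/linux/elf.h, and
the call is only meaningful on arm64 with a tracee stopped at syscall entry.

```c
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef NT_ARM_SYSTEM_CALL
#define NT_ARM_SYSTEM_CALL 0x404	/* assumed value, see include/uapi/linux/elf.h */
#endif

/*
 * Ask the kernel to replace the pending system call number of a stopped
 * tracee. Passing -1 as syscallno is meant to skip the system call.
 * Returns 0 on success, -1 on failure with errno set by ptrace().
 */
static long set_tracee_syscall(pid_t pid, int syscallno)
{
	struct iovec iov = {
		.iov_base = &syscallno,
		.iov_len = sizeof(syscallno),
	};

	return ptrace(PTRACE_SETREGSET, pid,
		      (void *)(unsigned long)NT_ARM_SYSTEM_CALL, &iov);
}
```

The regset carries a single int, so the iovec length is sizeof(int); on other
architectures, or for a process that is not being traced, the call simply
fails.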
Changes v7 -> v8:
* changed the interface for changing a syscall number from ptrace(SET_SYSCALL)
to ptrace(SETREGSET, NT_ARM_SYSTEM_CALL) [1/6]
* removed IS_SKILL_SYSCALL macro [2/6]
* clarified comments in syscall_trace_enter() [2/6]
* changed unsigned int to compat_uint_t in compat_siginfo._sigsys [5/6]
* moved to a new calling interface of secure_computing(void) [6/6]
Changes v6 -> v7:
* simplified the condition of checking for user-issued syscall(-1) at
syscall_trace_enter() [2/6]
* defines __NR_seccomp_sigreturn only if arch-specific def doesn't exist.
As Kees suggests, this is necessary for x86 and others. [3/6]
* removed "#ifdef __ARCH_SIGSYS" which is always true on arm64. [5/6]
* changed to call syscall_trace_exit() even if secure_computing fails. [6/6]
In v6, syscall_trace_enter() returns RET_SYSCALL_SKIP_TRACE (== -2) and
skips syscall_trace_exit() to minimize the overhead, but this case can be
easily confused with user-issued (and invalid) syscall(-2).
Anyway, this behavior is now consistent with arm and other archs.
Changes v5 -> v6:
* rebased to v3.17-rc
* changed the interface of changing/skipping a system call from re-writing
x8 register [v5 1/3] to using dedicated PTRACE_SET_SYSCALL command
[1/6, 2/6]
Patch [1/6] contains a checkpatch error around a switch statement, but it
won't be fixed as in compat_arch_ptrace().
* added a new system call, seccomp(), for compat task [4/6]
* added SIGSYS siginfo for compat task [5/6]
* changed to always execute audit exit tracing to avoid OOPs [2/6, 6/6]
Changes v4 -> v5:
* rebased to v3.16-rc
* add patch [1/3] to allow ptrace to change a system call
(please note that this patch should be applied even without seccomp.)
Changes v3 -> v4:
* removed the following patch and moved it to "arm64: prerequisites for
audit and ftrace" patchset since it is required for audit and ftrace in
case of !COMPAT, too.
"arm64: is_compat_task is defined both in asm/compat.h and linux/compat.h"
Changes v2 -> v3:
* removed unnecessary 'type cast' operations [2/3]
* check for a return value (-1) of secure_computing() explicitly [2/3]
* aligned with the patch, "arm64: split syscall_trace() into separate
functions for enter/exit" [2/3]
* changed default of CONFIG_SECCOMP to n [2/3]
Changes v1 -> v2:
* added generic seccomp.h for arm64 to utilize it [1,2/3]
* changed syscall_trace() to return more meaningful value (-EPERM)
on seccomp failure case [2/3]
* aligned with the change in "arm64: make a single hook to syscall_trace()
for all syscall features" v2 [2/3]
* removed is_compat_task() definition from compat.h [3/3]
AKASHI Takahiro (6):
arm64: ptrace: add NT_ARM_SYSTEM_CALL regset
arm64: ptrace: allow tracer to skip a system call
asm-generic: add generic seccomp.h for secure computing mode 1
arm64: add seccomp syscall for compat task
arm64: add SIGSYS siginfo for compat task
arm64: add seccomp support
arch/arm64/Kconfig | 14 +++++++++
arch/arm64/include/asm/compat.h | 7 +++++
arch/arm64/include/asm/seccomp.h | 25 ++++++++++++++++
arch/arm64/include/asm/unistd.h | 3 ++
arch/arm64/include/asm/unistd32.h | 3 +-
arch/arm64/kernel/entry.S | 3 ++
arch/arm64/kernel/ptrace.c | 58 +++++++++++++++++++++++++++++++++++++
arch/arm64/kernel/signal32.c | 6 ++++
include/asm-generic/seccomp.h | 30 +++++++++++++++++++
include/uapi/linux/elf.h | 1 +
10 files changed, 149 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/include/asm/seccomp.h
create mode 100644 include/asm-generic/seccomp.h
--
1.7.9.5
Tree/Branch: next-20141125
Git describe: next-20141125
Commit: 7cc2ecdb63 Add linux-next specific files for 20141125
Build Time: 21 min 59 sec
Passed: 8 / 8 (100.00 %)
Failed: 0 / 8 ( 0.00 %)
Errors: 0
Warnings: 35
Section Mismatches: 0
-------------------------------------------------------------------------------
defconfigs with issues (other than build errors):
10 warnings 0 mismatches : arm64-allmodconfig
3 warnings 0 mismatches : arm-multi_v7_defconfig
2 warnings 0 mismatches : x86_64-defconfig
25 warnings 0 mismatches : arm-allmodconfig
2 warnings 0 mismatches : arm-allnoconfig
1 warnings 0 mismatches : x86_64-allnoconfig
3 warnings 0 mismatches : arm64-defconfig
-------------------------------------------------------------------------------
Warnings Summary: 35
5 <stdin>:1250:2: warning: #warning syscall execveat not implemented [-Wcpp]
3 ../scripts/kconfig/menu.c:590:18: warning: 'jump' may be used uninitialized in this function [-Wmaybe-uninitialized]
3 ../mm/memcontrol.c:1629:13: warning: 'test_mem_cgroup_node_reclaimable' defined but not used [-Wunused-function]
2 ../include/linux/dynamic_debug.h:78:3: warning: unsupported argument to '__builtin_return_address'
2 ../drivers/scsi/ips.c:210:2: warning: #warning "This driver has only been tested on the x86/ia64/x86_64 platforms" [-Wcpp]
2 ../drivers/net/ethernet/dec/tulip/winbond-840.c:910:2: warning: #warning Processor architecture undefined [-Wcpp]
1 ../net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
1 ../lib/lz4/lz4hc_compress.c:514:1: warning: the frame size of 1480 bytes is larger than 1024 bytes [-Wframe-larger-than=]
1 ../include/uapi/linux/swab.h:13:15: warning: integer overflow in expression [-Woverflow]
1 ../include/linux/spinlock.h:364:95: warning: 'flags' may be used uninitialized in this function [-Wmaybe-uninitialized]
1 ../include/linux/kernel.h:710:17: warning: comparison of distinct pointer types lacks a cast
1 ../include/linux/dynamic_debug.h:64:16: warning: format '%d' expects argument of type 'int', but argument 4 has type 'long unsigned int' [-Wformat=]
1 ../fs/btrfs/extent_io.c:2166:13: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/usb/renesas_usbhs/common.c:471:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
1 ../drivers/usb/gadget/udc/udc-xilinx.c:2135:34: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
1 ../drivers/tty/serial/imx.c:315:13: warning: 'imx_port_ucrs_restore' defined but not used [-Wunused-function]
1 ../drivers/tty/serial/imx.c:306:13: warning: 'imx_port_ucrs_save' defined but not used [-Wunused-function]
1 ../drivers/net/wireless/ath/wil6210/fw_inc.c:447:4: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'int' [-Wformat=]
1 ../drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:467:46: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:307:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
1 ../drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:303:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
1 ../drivers/net/ethernet/dec/tulip/tulip_core.c:101:2: warning: #warning Processor architecture undefined! [-Wcpp]
1 ../drivers/mtd/chips/cfi_cmdset_0020.c:651:1: warning: the frame size of 1208 bytes is larger than 1024 bytes [-Wframe-larger-than=]
1 ../drivers/infiniband/ulp/iser/iser_verbs.c:1203:14: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/infiniband/ulp/iser/iser_verbs.c:1198:14: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/infiniband/ulp/iser/iser_verbs.c:1172:31: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/infiniband/ulp/iser/iser_verbs.c:1171:33: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/infiniband/hw/qib/qib_qp.c:44:0: warning: "BITS_PER_PAGE" redefined
1 ../drivers/gpu/drm/i915/i915_debugfs.c:1803:5: warning: ignoring return value of 'i915_gem_obj_ggtt_pin', declared with attribute warn_unused_result [-Wunused-result]
1 ../drivers/gpio/gpio-74xx-mmio.c:132:16: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
1 ../drivers/block/drbd/drbd_bitmap.c:483:0: warning: "BITS_PER_PAGE_MASK" redefined
1 ../drivers/block/drbd/drbd_bitmap.c:482:0: warning: "BITS_PER_PAGE" redefined
1 ../arch/arm64/kernel/efi.c:276:20: warning: 'free_end' may be used uninitialized in this function [-Wmaybe-uninitialized]
1 ../arch/arm/mach-omap2/board-rx51-peripherals.c:1000:36: warning: 'rx51_si4713_platform_data' defined but not used [-Wunused-variable]
1 ../arch/arm/mach-cns3xxx/pcie.c:313:1: warning: the frame size of 1080 bytes is larger than 1024 bytes [-Wframe-larger-than=]
===============================================================================
Detailed per-defconfig build reports below:
-------------------------------------------------------------------------------
arm64-allmodconfig : PASS, 0 errors, 10 warnings, 0 section mismatches
Warnings:
../mm/memcontrol.c:1629:13: warning: 'test_mem_cgroup_node_reclaimable' defined but not used [-Wunused-function]
../drivers/block/drbd/drbd_bitmap.c:482:0: warning: "BITS_PER_PAGE" redefined
../drivers/block/drbd/drbd_bitmap.c:483:0: warning: "BITS_PER_PAGE_MASK" redefined
../drivers/gpio/gpio-74xx-mmio.c:132:16: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
../drivers/infiniband/hw/qib/qib_qp.c:44:0: warning: "BITS_PER_PAGE" redefined
../drivers/net/ethernet/dec/tulip/winbond-840.c:910:2: warning: #warning Processor architecture undefined [-Wcpp]
../drivers/net/ethernet/dec/tulip/tulip_core.c:101:2: warning: #warning Processor architecture undefined! [-Wcpp]
../drivers/usb/gadget/udc/udc-xilinx.c:2135:34: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
../drivers/scsi/ips.c:210:2: warning: #warning "This driver has only been tested on the x86/ia64/x86_64 platforms" [-Wcpp]
../drivers/usb/renesas_usbhs/common.c:471:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
-------------------------------------------------------------------------------
arm-multi_v7_defconfig : PASS, 0 errors, 3 warnings, 0 section mismatches
Warnings:
<stdin>:1250:2: warning: #warning syscall execveat not implemented [-Wcpp]
../arch/arm/mach-omap2/board-rx51-peripherals.c:1000:36: warning: 'rx51_si4713_platform_data' defined but not used [-Wunused-variable]
<stdin>:1250:2: warning: #warning syscall execveat not implemented [-Wcpp]
-------------------------------------------------------------------------------
x86_64-defconfig : PASS, 0 errors, 2 warnings, 0 section mismatches
Warnings:
../include/linux/spinlock.h:364:95: warning: 'flags' may be used uninitialized in this function [-Wmaybe-uninitialized]
../drivers/gpu/drm/i915/i915_debugfs.c:1803:5: warning: ignoring return value of 'i915_gem_obj_ggtt_pin', declared with attribute warn_unused_result [-Wunused-result]
-------------------------------------------------------------------------------
arm-allmodconfig : PASS, 0 errors, 25 warnings, 0 section mismatches
Warnings:
<stdin>:1250:2: warning: #warning syscall execveat not implemented [-Wcpp]
../arch/arm/mach-cns3xxx/pcie.c:313:1: warning: the frame size of 1080 bytes is larger than 1024 bytes [-Wframe-larger-than=]
../mm/memcontrol.c:1629:13: warning: 'test_mem_cgroup_node_reclaimable' defined but not used [-Wunused-function]
../lib/lz4/lz4hc_compress.c:514:1: warning: the frame size of 1480 bytes is larger than 1024 bytes [-Wframe-larger-than=]
../fs/btrfs/extent_io.c:2166:13: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
../drivers/infiniband/ulp/iser/iser_verbs.c:1171:33: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../drivers/infiniband/ulp/iser/iser_verbs.c:1172:31: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../drivers/infiniband/ulp/iser/iser_verbs.c:1198:14: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../drivers/infiniband/ulp/iser/iser_verbs.c:1203:14: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../include/linux/kernel.h:710:17: warning: comparison of distinct pointer types lacks a cast
../include/linux/dynamic_debug.h:78:3: warning: unsupported argument to '__builtin_return_address'
../include/linux/dynamic_debug.h:78:3: warning: unsupported argument to '__builtin_return_address'
../drivers/mtd/chips/cfi_cmdset_0020.c:651:1: warning: the frame size of 1208 bytes is larger than 1024 bytes [-Wframe-larger-than=]
../include/linux/dynamic_debug.h:64:16: warning: format '%d' expects argument of type 'int', but argument 4 has type 'long unsigned int' [-Wformat=]
../drivers/tty/serial/imx.c:306:13: warning: 'imx_port_ucrs_save' defined but not used [-Wunused-function]
../drivers/tty/serial/imx.c:315:13: warning: 'imx_port_ucrs_restore' defined but not used [-Wunused-function]
../drivers/net/ethernet/dec/tulip/winbond-840.c:910:2: warning: #warning Processor architecture undefined [-Wcpp]
../include/uapi/linux/swab.h:13:15: warning: integer overflow in expression [-Woverflow]
../drivers/scsi/ips.c:210:2: warning: #warning "This driver has only been tested on the x86/ia64/x86_64 platforms" [-Wcpp]
../drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:303:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
../drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:307:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
../drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:467:46: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../drivers/net/wireless/ath/wil6210/fw_inc.c:447:4: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'int' [-Wformat=]
<stdin>:1250:2: warning: #warning syscall execveat not implemented [-Wcpp]
-------------------------------------------------------------------------------
arm-allnoconfig : PASS, 0 errors, 2 warnings, 0 section mismatches
Warnings:
../scripts/kconfig/menu.c:590:18: warning: 'jump' may be used uninitialized in this function [-Wmaybe-uninitialized]
<stdin>:1250:2: warning: #warning syscall execveat not implemented [-Wcpp]
-------------------------------------------------------------------------------
x86_64-allnoconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../scripts/kconfig/menu.c:590:18: warning: 'jump' may be used uninitialized in this function [-Wmaybe-uninitialized]
-------------------------------------------------------------------------------
arm64-defconfig : PASS, 0 errors, 3 warnings, 0 section mismatches
Warnings:
../scripts/kconfig/menu.c:590:18: warning: 'jump' may be used uninitialized in this function [-Wmaybe-uninitialized]
../arch/arm64/kernel/efi.c:276:20: warning: 'free_end' may be used uninitialized in this function [-Wmaybe-uninitialized]
../mm/memcontrol.c:1629:13: warning: 'test_mem_cgroup_node_reclaimable' defined but not used [-Wunused-function]
-------------------------------------------------------------------------------
Passed with no errors, warnings or mismatches:
arm64-allnoconfig
Part of this patchset was previously part of the larger tasks packing patchset
[1]. I have split the latter into (at least) 3 different patchsets to make
things easier.
-configuration of sched_domain topology [2]
-update and consolidation of cpu_capacity (this patchset)
-tasks packing algorithm
SMT systems are no longer the only systems that can have CPUs with an original
capacity that is different from the default value. We need to extend the use of
cpu_capacity_orig to all kinds of platforms so the scheduler will have both the
maximum capacity (cpu_capacity_orig/capacity_orig) and the current capacity
(cpu_capacity/capacity) of CPUs and sched_groups. A new function
arch_scale_cpu_capacity has been created and replaces arch_scale_smt_capacity,
which is SMT specific, in the computation of the capacity of a CPU.
During load balance, the scheduler evaluates the number of tasks that a group
of CPUs can handle. The current method assumes that tasks have a fixed load of
SCHED_LOAD_SCALE and CPUs have a default capacity of SCHED_CAPACITY_SCALE.
This assumption generates wrong decisions by creating ghost cores and by
removing real ones when the original capacity of CPUs is different from the
default SCHED_CAPACITY_SCALE. We no longer try to evaluate the number of
available cores based on the group_capacity; instead, we detect when the
group is fully utilized.
Now that we have the original capacity of CPUs and their activity/utilization,
we can evaluate more accurately the capacity and the level of utilization of a
group of CPUs.
This patchset mainly replaces the old capacity method with a new one and keeps
the policy almost unchanged, although we could certainly take advantage of this
new statistic in several other places in the load balancer.
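As a rough illustration of the "ghost cores" problem described above, the
sketch below models the old estimate as the group capacity divided by
SCHED_CAPACITY_SCALE, rounded to nearest, and contrasts it with a comparison
of usage against capacity. It is not taken from the patches: the function
names, the rounding rule and the example capacities (1441 and 606) are
assumptions chosen for illustration only.

```python
SCHED_CAPACITY_SCALE = 1024  # default capacity of one CPU


def capacity_factor(cpu_capacities):
    """Old-style estimate (simplified): how many 'default' cores is the group worth?"""
    total = sum(cpu_capacities)
    # Round to nearest; a simplified stand-in for the kernel's rounding.
    return (total + SCHED_CAPACITY_SCALE // 2) // SCHED_CAPACITY_SCALE


def group_has_spare_capacity(cpu_capacities, cpu_usages):
    """New-style check (simplified): compare the group's usage with its capacity."""
    return sum(cpu_usages) < sum(cpu_capacities)
```

With three CPUs of capacity 1441 the old-style estimate yields 4, one more
than the number of real cores; with four CPUs of capacity 606 it yields 2, so
two real cores are ignored. The usage-based check does not depend on the
number of cores at all.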
Test results:
I have put below the results of 4 kinds of tests:
- hackbench -l 500 -s 4096
- perf bench sched pipe -l 400000
- scp of 100MB file on the platform
- ebizzy with various number of threads
on 4 kernels:
- tip = tip/sched/core
- step1 = tip + patches(1-8)
- patchset = tip + whole patchset
- patchset+irq = tip + this patchset + irq accounting
Each test has been run 6 times and the figures below show the stdev and the
diff compared to the tip kernel.
Dual A7                               tip | +step1               | +patchset            | patchset+irq
                                    stdev | results        stdev | results        stdev | results        stdev
hackbench (lower is better)    (+/-)0.64% |  -0.19%   (+/-)0.73% |   0.58%   (+/-)1.29% |   0.20%   (+/-)1.00%
perf (lower is better)         (+/-)0.28% |   1.22%   (+/-)0.17% |   1.29%   (+/-)0.06% |   2.85%   (+/-)0.33%
scp                            (+/-)4.81% |   2.61%   (+/-)0.28% |   2.39%   (+/-)0.22% |  82.18%   (+/-)3.30%
ebizzy -t 1                    (+/-)2.31% |  -1.32%   (+/-)1.90% |  -0.79%   (+/-)2.88% |   3.10%   (+/-)2.32%
ebizzy -t 2                    (+/-)0.70% |   8.29%   (+/-)6.66% |   1.93%   (+/-)5.47% |   2.72%   (+/-)5.72%
ebizzy -t 4                    (+/-)3.54% |   5.57%   (+/-)8.00% |   0.36%   (+/-)9.00% |   2.53%   (+/-)3.17%
ebizzy -t 6                    (+/-)2.36% |  -0.43%   (+/-)3.29% |  -1.93%   (+/-)3.47% |   0.57%   (+/-)0.75%
ebizzy -t 8                    (+/-)1.65% |  -0.45%   (+/-)0.93% |  -1.95%   (+/-)1.52% |  -1.18%   (+/-)1.61%
ebizzy -t 10                   (+/-)2.55% |  -0.98%   (+/-)3.06% |  -1.18%   (+/-)6.17% |  -2.33%   (+/-)3.28%
ebizzy -t 12                   (+/-)6.22% |   0.17%   (+/-)5.63% |   2.98%   (+/-)7.11% |   1.19%   (+/-)4.68%
ebizzy -t 14                   (+/-)5.38% |  -0.14%   (+/-)5.33% |   2.49%   (+/-)4.93% |   1.43%   (+/-)6.55%
Quad A15                              tip | +patchset1           | +patchset2           | patchset+irq
                                    stdev | results        stdev | results        stdev | results        stdev
hackbench (lower is better)    (+/-)0.78% |   0.87%   (+/-)1.72% |   0.91%   (+/-)2.02% |   3.30%   (+/-)2.02%
perf (lower is better)         (+/-)2.03% |  -0.31%   (+/-)0.76% |  -2.38%   (+/-)1.37% |   1.42%   (+/-)3.14%
scp                            (+/-)0.04% |   0.51%   (+/-)1.37% |   1.79%   (+/-)0.84% |   1.77%   (+/-)0.38%
ebizzy -t 1                    (+/-)0.41% |   2.05%   (+/-)0.38% |   2.08%   (+/-)0.24% |   0.17%   (+/-)0.62%
ebizzy -t 2                    (+/-)0.78% |   0.60%   (+/-)0.63% |   0.43%   (+/-)0.48% |   1.61%   (+/-)0.38%
ebizzy -t 4                    (+/-)0.58% |  -0.10%   (+/-)0.97% |  -0.65%   (+/-)0.76% |  -0.75%   (+/-)0.86%
ebizzy -t 6                    (+/-)0.31% |   1.07%   (+/-)1.12% |  -0.16%   (+/-)0.87% |  -0.76%   (+/-)0.22%
ebizzy -t 8                    (+/-)0.95% |  -0.30%   (+/-)0.85% |  -0.79%   (+/-)0.28% |  -1.66%   (+/-)0.21%
ebizzy -t 10                   (+/-)0.31% |   0.04%   (+/-)0.97% |  -1.44%   (+/-)1.54% |  -0.55%   (+/-)0.62%
ebizzy -t 12                   (+/-)8.35% |  -1.89%   (+/-)7.64% |   0.75%   (+/-)5.30% |  -1.18%   (+/-)8.16%
ebizzy -t 14                  (+/-)13.17% |   6.22%   (+/-)4.71% |   5.25%   (+/-)9.14% |   5.87%   (+/-)5.77%
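The stdev and diff columns above can be derived from raw runs along the
following lines. This is a minimal sketch, assuming the sample standard
deviation expressed as a percentage of the mean and a diff computed as
(patched - tip) / tip; the exact formulas and sign convention used for the
tables are not stated here, and the function name is hypothetical.

```python
import statistics


def summarize(tip_runs, patched_runs):
    """Return (diff_pct, stdev_pct) of patched_runs relative to tip_runs."""
    tip_mean = statistics.mean(tip_runs)
    patched_mean = statistics.mean(patched_runs)
    # Relative difference of the means, in percent of the tip mean.
    diff_pct = (patched_mean - tip_mean) / tip_mean * 100.0
    # Sample standard deviation of the patched runs, in percent of their mean.
    stdev_pct = statistics.stdev(patched_runs) / patched_mean * 100.0
    return diff_pct, stdev_pct
```

Each list is expected to hold one measurement per run, for example the 6
hackbench times of one kernel.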
I haven't been able to fully test the patchset on an SMT system to check that
the regression that has been reported by Preethi has been solved, but the
various tests that I have done don't show any regression so far.
The correction of SD_PREFER_SIBLING mode and the use of the latter at SMT level
should have fixed the regression.
The usage_avg_contrib is based on the current implementation of the
load avg tracking. I also have a version of the usage_avg_contrib that is based
on the new implementation [3] but haven't provided the patches and results as
[3] is still under review. I can provide changes on top of [3] to change how
usage_avg_contrib is computed and adapt it to the new mechanism.
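To make the idea of a usage based on running time concrete, here is a
simplified floating-point model in the style of per-entity load tracking:
time is split into periods of 1024us and older periods are decayed
geometrically so that a contribution halves after 32 periods. It is only a
sketch; the function name is hypothetical, the kernel uses fixed-point
arithmetic, and the exact computation of usage_avg_contrib in the patches may
differ.

```python
# Decay factor per 1024us period, chosen so that Y ** 32 == 0.5.
Y = 0.5 ** (1.0 / 32)


def usage_ratio(ran_per_period):
    """Decayed fraction of time spent running.

    ran_per_period: iterable of booleans, oldest period first, True when the
    task was running during that whole period.
    """
    running_sum = 0.0
    period_sum = 0.0
    for ran in ran_per_period:
        # Decay the history by one period, then add the newest period.
        running_sum = running_sum * Y + (1024 if ran else 0)
        period_sum = period_sum * Y + 1024
    return running_sum / period_sum if period_sum else 0.0
```

A task that has always been running gets a ratio of 1.0, and a task that ran
for a long time and then stayed off the CPU for 32 periods drops to about 0.5.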
TODO: manage conflict with the next version of [4]
Change since V3:
- add usage_avg_contrib statistic which sums the running time of tasks on a rq
- use usage_avg_contrib instead of runnable_avg_sum for cpu_utilization
- fix replacement power by capacity
- update some comments
Change since V2:
- rebase on top of capacity renaming
- fix wake_affine statistic update
- rework nohz_kick_needed
- optimize the active migration of a task from CPU with reduced capacity
- rename group_activity to group_utilization and remove unused total_utilization
- repair SD_PREFER_SIBLING and use it for SMT level
- reorder patchset to gather patches with same topics
Change since V1:
- add 3 fixes
- correct some commit messages
- replace capacity computation by activity
- take into account current cpu capacity
[1] https://lkml.org/lkml/2013/10/18/121
[2] https://lkml.org/lkml/2014/3/19/377
[3] https://lkml.org/lkml/2014/7/18/110
[4] https://lkml.org/lkml/2014/7/25/589
Vincent Guittot (12):
sched: fix imbalance flag reset
sched: remove a wake_affine condition
sched: fix avg_load computation
sched: Allow all archs to set the capacity_orig
ARM: topology: use new cpu_capacity interface
sched: add per rq cpu_capacity_orig
sched: test the cpu's capacity in wake affine
sched: move cfs task on a CPU with higher capacity
sched: add usage_load_avg
sched: get CPU's utilization statistic
sched: replace capacity_factor by utilization
sched: add SD_PREFER_SIBLING for SMT level
arch/arm/kernel/topology.c | 4 +-
include/linux/sched.h | 4 +-
kernel/sched/core.c | 3 +-
kernel/sched/fair.c | 350 ++++++++++++++++++++++++++-------------------
kernel/sched/sched.h | 3 +-
5 files changed, 207 insertions(+), 157 deletions(-)
--
1.9.1
Part of this patchset was previously part of the larger tasks packing patchset
[1]. I have split the latter into (at least) 3 different patchsets to make
things easier.
-configuration of sched_domain topology [2]
-update and consolidation of cpu_power (this patchset)
-tasks packing algorithm
SMT systems are no longer the only systems that can have CPUs with an original
capacity that is different from the default value. We need to extend the use of
cpu_power_orig to all kinds of platforms so the scheduler will have both the
maximum capacity (cpu_power_orig/power_orig) and the current capacity
(cpu_power/power) of CPUs and sched_groups. A new function arch_scale_cpu_power
has been created and replaces arch_scale_smt_power, which is SMT specific, in
the computation of the capacity of a CPU.
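The difference between the original and the current capacity can be sketched
as follows, assuming that the current capacity is the original one scaled down
by the fraction of time taken by RT tasks and IRQ handling. The function name
and the scaling rule are illustrative assumptions, not code from the patches.

```python
SCHED_POWER_SCALE = 1024  # default original capacity of a CPU


def current_power(power_orig, rt_irq_fraction):
    """Capacity left for regular tasks once RT/IRQ time is removed (simplified)."""
    if not 0.0 <= rt_irq_fraction <= 1.0:
        raise ValueError("rt_irq_fraction must be between 0 and 1")
    # Keep at least 1 so that the capacity never becomes zero.
    return max(1, int(power_orig * (1.0 - rt_irq_fraction)))
```

For example, a CPU with the default original capacity that spends a quarter
of its time in RT tasks and IRQs is left with a current capacity of 768.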
During load balance, the scheduler evaluates the number of tasks that a group
of CPUs can handle. The current method assumes that tasks have a fixed load of
SCHED_LOAD_SCALE and CPUs have a default capacity of SCHED_POWER_SCALE.
This assumption generates wrong decisions by creating ghost cores and by
removing real ones when the original capacity of CPUs is different from the
default SCHED_POWER_SCALE.
Now that we have the original capacity of a CPU and its activity/utilization,
we can evaluate more accurately the capacity of a group of CPUs.
This patchset mainly replaces the old capacity method with a new one and keeps
the policy almost unchanged, although we can certainly take advantage of this
new statistic in several other places in the load balancer.
TODO:
- align variable's and field's name with the renaming [3]
Test results:
I have put below the results of 2 tests:
- hackbench -l 500 -s 4096
- scp of 100MB file on the platform
on a dual cortex-A7
                    hackbench          scp
tip/master          25.75s(+/-0.25)    5.16MB/s(+/-1.49)
+ patches 1,2       25.89s(+/-0.31)    5.18MB/s(+/-1.45)
+ patches 3-10      25.68s(+/-0.22)    7.00MB/s(+/-1.88)
+ irq accounting    25.80s(+/-0.25)    8.06MB/s(+/-0.05)
on a quad cortex-A15
                    hackbench          scp
tip/master          15.69s(+/-0.16)    9.70MB/s(+/-0.04)
+ patches 1,2       15.53s(+/-0.13)    9.72MB/s(+/-0.05)
+ patches 3-10      15.56s(+/-0.22)    9.88MB/s(+/-0.05)
+ irq accounting    15.99s(+/-0.08)   10.37MB/s(+/-0.03)
The improvement of scp bandwidth happens when tasks and irq are using
different CPUs, which is a bit random without the irq accounting config.
Change since V1:
- add 3 fixes
- correct some commit messages
- replace capacity computation by activity
- take into account current cpu capacity
[1] https://lkml.org/lkml/2013/10/18/121
[2] https://lkml.org/lkml/2014/3/19/377
[3] https://lkml.org/lkml/2014/5/14/622
Vincent Guittot (11):
sched: fix imbalance flag reset
sched: remove a wake_affine condition
sched: fix avg_load computation
sched: Allow all archs to set the power_orig
ARM: topology: use new cpu_power interface
sched: add per rq cpu_power_orig
Revert "sched: Put rq's sched_avg under CONFIG_FAIR_GROUP_SCHED"
sched: get CPU's activity statistic
sched: test the cpu's capacity in wake affine
sched: move cfs task on a CPU with higher capacity
sched: replace capacity by activity
arch/arm/kernel/topology.c | 4 +-
kernel/sched/core.c | 2 +-
kernel/sched/fair.c | 229 ++++++++++++++++++++++-----------------------
kernel/sched/sched.h | 5 +-
4 files changed, 118 insertions(+), 122 deletions(-)
--
1.9.1
Tree/Branch: next-20141124
Git describe: next-20141124
Commit: a4cfa44aa2 Add linux-next specific files for 20141124
Build Time: 21 min 7 sec
Passed: 8 / 8 (100.00 %)
Failed: 0 / 8 ( 0.00 %)
Errors: 0
Warnings: 35
Section Mismatches: 0
-------------------------------------------------------------------------------
defconfigs with issues (other than build errors):
10 warnings 0 mismatches : arm64-allmodconfig
3 warnings 0 mismatches : arm-multi_v7_defconfig
2 warnings 0 mismatches : x86_64-defconfig
25 warnings 0 mismatches : arm-allmodconfig
2 warnings 0 mismatches : arm-allnoconfig
1 warnings 0 mismatches : x86_64-allnoconfig
3 warnings 0 mismatches : arm64-defconfig
-------------------------------------------------------------------------------
Warnings Summary: 35
5 <stdin>:1250:2: warning: #warning syscall execveat not implemented [-Wcpp]
3 ../scripts/kconfig/menu.c:590:18: warning: 'jump' may be used uninitialized in this function [-Wmaybe-uninitialized]
3 ../mm/memcontrol.c:1629:13: warning: 'test_mem_cgroup_node_reclaimable' defined but not used [-Wunused-function]
2 ../include/linux/dynamic_debug.h:78:3: warning: unsupported argument to '__builtin_return_address'
2 ../drivers/scsi/ips.c:210:2: warning: #warning "This driver has only been tested on the x86/ia64/x86_64 platforms" [-Wcpp]
2 ../drivers/net/ethernet/dec/tulip/winbond-840.c:910:2: warning: #warning Processor architecture undefined [-Wcpp]
1 ../net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
1 ../lib/lz4/lz4hc_compress.c:514:1: warning: the frame size of 1480 bytes is larger than 1024 bytes [-Wframe-larger-than=]
1 ../include/uapi/linux/swab.h:13:15: warning: integer overflow in expression [-Woverflow]
1 ../include/linux/spinlock.h:364:95: warning: 'flags' may be used uninitialized in this function [-Wmaybe-uninitialized]
1 ../include/linux/kernel.h:710:17: warning: comparison of distinct pointer types lacks a cast
1 ../include/linux/dynamic_debug.h:64:16: warning: format '%d' expects argument of type 'int', but argument 4 has type 'long unsigned int' [-Wformat=]
1 ../fs/btrfs/extent_io.c:2166:13: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/usb/renesas_usbhs/common.c:471:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
1 ../drivers/usb/gadget/udc/udc-xilinx.c:2135:34: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
1 ../drivers/tty/serial/imx.c:315:13: warning: 'imx_port_ucrs_restore' defined but not used [-Wunused-function]
1 ../drivers/tty/serial/imx.c:306:13: warning: 'imx_port_ucrs_save' defined but not used [-Wunused-function]
1 ../drivers/net/wireless/ath/wil6210/fw_inc.c:447:4: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'int' [-Wformat=]
1 ../drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:467:46: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:307:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
1 ../drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:303:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
1 ../drivers/net/ethernet/dec/tulip/tulip_core.c:101:2: warning: #warning Processor architecture undefined! [-Wcpp]
1 ../drivers/mtd/chips/cfi_cmdset_0020.c:651:1: warning: the frame size of 1208 bytes is larger than 1024 bytes [-Wframe-larger-than=]
1 ../drivers/infiniband/ulp/iser/iser_verbs.c:1203:14: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/infiniband/ulp/iser/iser_verbs.c:1198:14: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/infiniband/ulp/iser/iser_verbs.c:1172:31: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/infiniband/ulp/iser/iser_verbs.c:1171:33: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
1 ../drivers/infiniband/hw/qib/qib_qp.c:44:0: warning: "BITS_PER_PAGE" redefined
1 ../drivers/gpu/drm/i915/i915_debugfs.c:1805:5: warning: ignoring return value of 'i915_gem_obj_ggtt_pin', declared with attribute warn_unused_result [-Wunused-result]
1 ../drivers/gpio/gpio-74xx-mmio.c:132:16: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
1 ../drivers/block/drbd/drbd_bitmap.c:483:0: warning: "BITS_PER_PAGE_MASK" redefined
1 ../drivers/block/drbd/drbd_bitmap.c:482:0: warning: "BITS_PER_PAGE" redefined
1 ../arch/arm64/kernel/efi.c:276:20: warning: 'free_end' may be used uninitialized in this function [-Wmaybe-uninitialized]
1 ../arch/arm/mach-omap2/board-rx51-peripherals.c:1000:36: warning: 'rx51_si4713_platform_data' defined but not used [-Wunused-variable]
1 ../arch/arm/mach-cns3xxx/pcie.c:313:1: warning: the frame size of 1080 bytes is larger than 1024 bytes [-Wframe-larger-than=]
===============================================================================
Detailed per-defconfig build reports below:
-------------------------------------------------------------------------------
arm64-allmodconfig : PASS, 0 errors, 10 warnings, 0 section mismatches
Warnings:
../mm/memcontrol.c:1629:13: warning: 'test_mem_cgroup_node_reclaimable' defined but not used [-Wunused-function]
../drivers/block/drbd/drbd_bitmap.c:482:0: warning: "BITS_PER_PAGE" redefined
../drivers/block/drbd/drbd_bitmap.c:483:0: warning: "BITS_PER_PAGE_MASK" redefined
../drivers/gpio/gpio-74xx-mmio.c:132:16: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
../drivers/infiniband/hw/qib/qib_qp.c:44:0: warning: "BITS_PER_PAGE" redefined
../drivers/net/ethernet/dec/tulip/winbond-840.c:910:2: warning: #warning Processor architecture undefined [-Wcpp]
../drivers/net/ethernet/dec/tulip/tulip_core.c:101:2: warning: #warning Processor architecture undefined! [-Wcpp]
../drivers/scsi/ips.c:210:2: warning: #warning "This driver has only been tested on the x86/ia64/x86_64 platforms" [-Wcpp]
../drivers/usb/gadget/udc/udc-xilinx.c:2135:34: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
../drivers/usb/renesas_usbhs/common.c:471:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
-------------------------------------------------------------------------------
arm-multi_v7_defconfig : PASS, 0 errors, 3 warnings, 0 section mismatches
Warnings:
<stdin>:1250:2: warning: #warning syscall execveat not implemented [-Wcpp]
../arch/arm/mach-omap2/board-rx51-peripherals.c:1000:36: warning: 'rx51_si4713_platform_data' defined but not used [-Wunused-variable]
<stdin>:1250:2: warning: #warning syscall execveat not implemented [-Wcpp]
-------------------------------------------------------------------------------
x86_64-defconfig : PASS, 0 errors, 2 warnings, 0 section mismatches
Warnings:
../include/linux/spinlock.h:364:95: warning: 'flags' may be used uninitialized in this function [-Wmaybe-uninitialized]
../drivers/gpu/drm/i915/i915_debugfs.c:1805:5: warning: ignoring return value of 'i915_gem_obj_ggtt_pin', declared with attribute warn_unused_result [-Wunused-result]
-------------------------------------------------------------------------------
arm-allmodconfig : PASS, 0 errors, 25 warnings, 0 section mismatches
Warnings:
<stdin>:1250:2: warning: #warning syscall execveat not implemented [-Wcpp]
../arch/arm/mach-cns3xxx/pcie.c:313:1: warning: the frame size of 1080 bytes is larger than 1024 bytes [-Wframe-larger-than=]
../fs/btrfs/extent_io.c:2166:13: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../mm/memcontrol.c:1629:13: warning: 'test_mem_cgroup_node_reclaimable' defined but not used [-Wunused-function]
../lib/lz4/lz4hc_compress.c:514:1: warning: the frame size of 1480 bytes is larger than 1024 bytes [-Wframe-larger-than=]
../net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
../drivers/infiniband/ulp/iser/iser_verbs.c:1171:33: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../drivers/infiniband/ulp/iser/iser_verbs.c:1172:31: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../drivers/infiniband/ulp/iser/iser_verbs.c:1198:14: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../drivers/infiniband/ulp/iser/iser_verbs.c:1203:14: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../include/linux/kernel.h:710:17: warning: comparison of distinct pointer types lacks a cast
../include/linux/dynamic_debug.h:78:3: warning: unsupported argument to '__builtin_return_address'
../include/linux/dynamic_debug.h:78:3: warning: unsupported argument to '__builtin_return_address'
../drivers/mtd/chips/cfi_cmdset_0020.c:651:1: warning: the frame size of 1208 bytes is larger than 1024 bytes [-Wframe-larger-than=]
../include/linux/dynamic_debug.h:64:16: warning: format '%d' expects argument of type 'int', but argument 4 has type 'long unsigned int' [-Wformat=]
../drivers/net/ethernet/dec/tulip/winbond-840.c:910:2: warning: #warning Processor architecture undefined [-Wcpp]
../drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:303:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
../drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:307:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
../drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:467:46: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
../drivers/tty/serial/imx.c:306:13: warning: 'imx_port_ucrs_save' defined but not used [-Wunused-function]
../drivers/tty/serial/imx.c:315:13: warning: 'imx_port_ucrs_restore' defined but not used [-Wunused-function]
../include/uapi/linux/swab.h:13:15: warning: integer overflow in expression [-Woverflow]
../drivers/scsi/ips.c:210:2: warning: #warning "This driver has only been tested on the x86/ia64/x86_64 platforms" [-Wcpp]
../drivers/net/wireless/ath/wil6210/fw_inc.c:447:4: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'int' [-Wformat=]
<stdin>:1250:2: warning: #warning syscall execveat not implemented [-Wcpp]
-------------------------------------------------------------------------------
arm-allnoconfig : PASS, 0 errors, 2 warnings, 0 section mismatches
Warnings:
../scripts/kconfig/menu.c:590:18: warning: 'jump' may be used uninitialized in this function [-Wmaybe-uninitialized]
<stdin>:1250:2: warning: #warning syscall execveat not implemented [-Wcpp]
-------------------------------------------------------------------------------
x86_64-allnoconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../scripts/kconfig/menu.c:590:18: warning: 'jump' may be used uninitialized in this function [-Wmaybe-uninitialized]
-------------------------------------------------------------------------------
arm64-defconfig : PASS, 0 errors, 3 warnings, 0 section mismatches
Warnings:
../scripts/kconfig/menu.c:590:18: warning: 'jump' may be used uninitialized in this function [-Wmaybe-uninitialized]
../arch/arm64/kernel/efi.c:276:20: warning: 'free_end' may be used uninitialized in this function [-Wmaybe-uninitialized]
../mm/memcontrol.c:1629:13: warning: 'test_mem_cgroup_node_reclaimable' defined but not used [-Wunused-function]
-------------------------------------------------------------------------------
Passed with no errors, warnings or mismatches:
arm64-allnoconfig
During load balance, the scheduler evaluates the number of tasks that a group
of CPUs can handle. The current method assumes that tasks have a fixed load of
SCHED_LOAD_SCALE and CPUs have a default capacity of SCHED_CAPACITY_SCALE.
This assumption leads to wrong decisions by creating ghost cores or by
removing real ones when the original capacity of CPUs is different from the
default SCHED_CAPACITY_SCALE. We no longer try to evaluate the number of
available cores based on the group_capacity; instead we evaluate the usage
of a group and compare it with its capacity.
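
To make the ghost-core problem concrete, here is a minimal userspace sketch.
It is not the kernel code: the helper names, the rounding rule and the CPU
capacities are assumptions chosen for illustration, and the usage check omits
any imbalance margin the real load balancer may apply.

```c
#define SCHED_CAPACITY_SCALE 1024UL

/*
 * Old-style estimate (illustrative): derive a number of "cores" by
 * rounding the group capacity to the nearest multiple of
 * SCHED_CAPACITY_SCALE.
 */
unsigned long capacity_factor(unsigned int nr_cpus,
			      unsigned long cpu_capacity_orig)
{
	unsigned long group_capacity = nr_cpus * cpu_capacity_orig;

	return (group_capacity + SCHED_CAPACITY_SCALE / 2) /
	       SCHED_CAPACITY_SCALE;
}

/*
 * New-style check (illustrative): a group has room left as long as
 * its usage is below its capacity, whatever the per-CPU capacity is.
 */
int group_has_free_capacity(unsigned long group_usage,
			    unsigned long group_capacity)
{
	return group_usage < group_capacity;
}
```

With these hypothetical numbers, capacity_factor(3, 640) returns 2, so one
real core is dropped, while capacity_factor(4, 1280) returns 5, so one ghost
core appears. The usage comparison has no such rounding step.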
This patchset mainly replaces the old capacity method with a new one and keeps
the policy almost unchanged, although we could certainly take advantage of this
new statistic in several other places in the load balancer.
The utilization_avg_contrib is based on the current implementation of the
load avg tracking. I also have a version of the utilization_avg_contrib that
is based on the new implementation proposal [1], but I haven't provided the
patches and results as [1] is still under review. I can provide changes on top
of [1] to change how utilization_avg_contrib is computed and to adapt it to the
new mechanism.
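
As a rough illustration of a running-time based average, the sketch below
decays the time spent running and the elapsed time with the same factor and
reports their ratio on a 0..1024 scale. It is a simplified userspace model,
not the code from these patches: the struct, the function names and the use
of floating point are assumptions (the kernel uses integer arithmetic); only
the 1024us period and the half-life of 32 periods follow the existing load
tracking.

```c
#define PERIOD_US	1024	/* length of one tracking period */
#define UTIL_SCALE	1024	/* value reported for an always-running entity */

/* Decay factor y per period, chosen so that y^32 is about 0.5. */
static const double DECAY_Y = 0.978572062;

/* Hypothetical container for the two decayed sums. */
struct util_avg {
	double running_sum;	/* decayed time spent running */
	double period_sum;	/* decayed elapsed time */
};

/* Account one period in which the entity ran for running_us microseconds. */
void util_update(struct util_avg *ua, unsigned int running_us)
{
	if (running_us > PERIOD_US)
		running_us = PERIOD_US;

	ua->running_sum = ua->running_sum * DECAY_Y + running_us;
	ua->period_sum = ua->period_sum * DECAY_Y + PERIOD_US;
}

/* Running-time ratio scaled to UTIL_SCALE, rounded to nearest. */
unsigned long util_contrib(const struct util_avg *ua)
{
	if (ua->period_sum <= 0.0)
		return 0;

	return (unsigned long)(ua->running_sum * UTIL_SCALE /
			       ua->period_sum + 0.5);
}
```

In this model an entity that runs for the whole of every period converges to
UTIL_SCALE, and one that runs for half of every period converges to half of it,
independently of how many tasks are merely runnable.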
Change since V6
- add group usage tracking
- fix some commits' messages
- minor fixes such as comments and argument order
Change since V5
- remove patches that have been merged since V5: patches 01, 02, 03, 04, 05, 07
- update commit log and add more details on the purpose of the patches
- fix/remove useless code with the rebase on patchset [2]
- remove capacity_orig in sched_group_capacity as it is not used
- move code in the right patch
- add some helper functions to factorize code
Change since V4
- rebase to manage conflicts with changes in selection of busiest group [4]
Change since V3:
- add usage_avg_contrib statistic which sums the running time of tasks on a rq
- use usage_avg_contrib instead of runnable_avg_sum for cpu_utilization
- fix replacement of power by capacity
- update some comments
Change since V2:
- rebase on top of capacity renaming
- fix wake_affine statistic update
- rework nohz_kick_needed
- optimize the active migration of a task from CPU with reduced capacity
- rename group_activity to group_utilization and remove unused total_utilization
- repair SD_PREFER_SIBLING and use it for SMT level
- reorder patchset to gather patches with same topics
Change since V1:
- add 3 fixes
- correct some commit messages
- replace capacity computation by activity
- take into account current cpu capacity
[1] https://lkml.org/lkml/2014/7/18/110
[2] https://lkml.org/lkml/2014/7/25/589
Morten Rasmussen (1):
sched: Track group sched_entity usage contributions
Vincent Guittot (6):
sched: add per rq cpu_capacity_orig
sched: move cfs task on a CPU with higher capacity
sched: add utilization_avg_contrib
sched: get CPU's usage statistic
sched: replace capacity_factor by usage
sched: add SD_PREFER_SIBLING for SMT level
include/linux/sched.h | 21 +++-
kernel/sched/core.c | 15 +--
kernel/sched/debug.c | 12 ++-
kernel/sched/fair.c | 283 ++++++++++++++++++++++++++++++--------------------
kernel/sched/sched.h | 11 +-
5 files changed, 209 insertions(+), 133 deletions(-)
--
1.9.1