(This patchset was already acked by the maintainers and is now being
re-targeted at v3.17. See the change history.)
(I don't think that the ptrace() discussion at the link below has any
impact on this patchset:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/268923.html
)
This patchset adds system call audit support on arm64.
Both 32-bit (AUDIT_ARCH_ARM) and 64-bit (AUDIT_ARCH_AARCH64) tasks are
supported. Since arm64 has exactly the same set of system calls on LE
and BE, we don't care about endianness (or, more specifically, about
the __AUDIT_ARCH_LE bit in AUDIT_ARCH_*).
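Concretely, the arch reported to the audit subsystem ends up looking
roughly like this (a sketch only; the actual hunk in asm/syscall.h may
differ slightly, and is_compat_task() is assumed to be available
there):

/* sketch for arch/arm64/include/asm/syscall.h, with <uapi/linux/audit.h>
 * and <linux/compat.h> included */
static inline int syscall_get_arch(void)
{
	if (is_compat_task())
		return AUDIT_ARCH_ARM;		/* 32-bit (compat) task */

	return AUDIT_ARCH_AARCH64;		/* native 64-bit task */
}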
This patchset should work correctly with:
* the userspace audit tool (v2.3.6 or later)
This code was tested on both 32-bit and 64-bit LE userland
in the following two ways:
1) basic operations with auditctl/autrace
# auditctl -a exit,always -S openat -F path=/etc/inittab
# auditctl -a exit,always -F dir=/tmp -F perm=rw
# auditctl -a task,always
# autrace /bin/ls
by comparing the output of autrace with that of strace
2) audit-test-code (+ my workarounds for arm/arm64)
by running "audit-tool", "filter" and "syscalls" test categories.
Changes v9 -> v10:
* rebased on 3.16-rc3
* included Catalin's patch[1/3] and added more syscall definitions for 3.16
Changes v8 -> v9:
* rebased on 3.15-rc, especially due to the change of syscall_get_arch()
interface [1,2/2]
Changes v7 -> v8:
* aligned with the change in "audit: generic compat system call audit
support" v5 [1/2]
* aligned with the change in "arm64: split syscall_trace() into separate
functions for enter/exit" v5 [2/2]
Changes v6 -> v7:
* changed an include file in syscall.h from <linux/audit.h> to
<uapi/linux/audit.h> [1/2]
* aligned with the patch, "arm64: split syscall_trace() into separate
functions for enter/exit" [2/2]
Changes v5 -> v6:
* moved the "arm64: Add regs_return_value() in syscall.h" patch into a
separate patchset
* aligned with the change in "arm64: make a single hook to syscall_trace()
for all syscall features" v3 [1/2]
Changes v4 -> v5:
* rebased to 3.14-rcX
* added a guard against TIF_SYSCALL_AUDIT [3/3]
* aligned with the change in "arm64: make a single hook to syscall_trace()
for all syscall features" v2 [3/3]
Changes v3 -> v4:
* Modified to sync with the patch, "make a single hook to syscall_trace()
for all syscall features"
* aligned with "audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL" patch
Changes v2 -> v3:
* Remove asm/audit.h.
See "generic compat syscall audit support" patch v4
* Remove endianness dependency, ie. AUDIT_ARCH_ARMEB/AARCH64EB.
* Remove kernel/syscalls/Makefile which was used to create unistd32.h.
See Catalin's "Add __NR_* definitions for compat syscalls" patch
Changes v1 -> v2:
* Modified to utilize "generic compat system call audit" [3/6, 4/6, 5/6]
Please note that a required header, unistd_32.h, is automatically
generated from unistd32.h.
* Refer to regs->orig_x0 instead of regs->x0 as the first argument of
system call in audit_syscall_entry() [6/6]
* Include "Add regs_return_value() in syscall.h" patch [2/6],
which was not intentionally included in v1 because it could be added
by "kprobes support".
AKASHI Takahiro (2):
arm64: Add audit support
arm64: audit: Add audit hook in syscall_trace_enter/exit()
Catalin Marinas (1):
arm64: Add __NR_* definitions for compat syscalls
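For reference, the hooks added by the "Add audit hook in
syscall_trace_enter/exit()" patch above roughly amount to the
following (a sketch, not the literal diff; the register usage and the
3.16-era audit_syscall_entry() prototype taking the arch are my
assumptions and may differ from the actual hunks):

asmlinkage int syscall_trace_enter(struct pt_regs *regs)
{
	/* ... existing tracehook/tracepoint handling ... */

	audit_syscall_entry(syscall_get_arch(), regs->syscallno,
			    regs->orig_x0, regs->regs[1],
			    regs->regs[2], regs->regs[3]);

	return regs->syscallno;
}

asmlinkage void syscall_trace_exit(struct pt_regs *regs)
{
	audit_syscall_exit(regs);

	/* ... existing tracehook/tracepoint handling ... */
}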
arch/arm64/Kconfig | 2 +
arch/arm64/include/asm/syscall.h | 14 +
arch/arm64/include/asm/unistd.h | 17 +
arch/arm64/include/asm/unistd32.h | 1166 ++++++++++++++++++++++++-------------
arch/arm64/kernel/entry.S | 1 -
arch/arm64/kernel/kuser32.S | 2 +-
arch/arm64/kernel/ptrace.c | 7 +
arch/arm64/kernel/signal32.c | 2 +-
arch/arm64/kernel/sys_compat.c | 2 +-
include/uapi/linux/audit.h | 1 +
10 files changed, 810 insertions(+), 404 deletions(-)
--
1.7.9.5
This patchset implements a "kiosk" mode for the KDB debugger and is a
continuation of previous work by Anton Vorontsov (dating back to late
2012).
When kiosk mode is engaged, several kdb commands are disabled, leaving
only the status-reporting functions working normally. In particular,
arbitrary memory read/write is prevented and it is no longer possible
to alter program flow.
Note that the commands that remain enabled are sufficient to run the
post-mortem macro commands dumpcommon, dumpall and dumpcpu. One of the
motivating use-cases for this work is to enable post-mortem debugging
on embedded devices (such as phones) without allowing the debug
facility to be easily exploited to compromise user privacy. In
principle this means the feature can be enabled on production devices.
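As a rough sketch of the filtering idea (the flag names follow the
patch titles below; the kdb_kiosk toggle and the helper itself are
assumptions, not the literal implementation):

/* Hypothetical helper: in kiosk mode only flag-whitelisted commands run. */
static bool kdb_command_permitted(const kdbtab_t *cmd, int nargs)
{
	if (!kdb_kiosk)				/* assumed on/off toggle */
		return true;
	if (cmd->cmd_flags & KDB_SAFE)		/* safe with any arguments */
		return true;
	if ((cmd->cmd_flags & KDB_SAFE_NO_ARGS) && nargs == 0)
		return true;			/* safe only with no arguments */
	return false;				/* blocked in kiosk mode */
}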
There are a few patches: some are simple cleanups, some are churn-ish
but inevitable cleanups, and the rest implement the mode itself; after
all the preparations, everything is pretty straightforward. The first
patch is actually a pure bug fix (arguably unrelated to kiosk mode),
but it collides with the kiosk code that honours the sysrq mask, so I
have included it here.
Changes since v1 (circa 2012):
* ef (Display exception frame) is essentially an overly complex peek
and has therefore been marked unsafe
* bt (Stack traceback) has been marked safe only with no arguments
* sr (Magic SysRq key) honours the sysrq mask when called in kiosk
mode
* Fixed over-zealous blocking of macro commands
* Symbol lookup is forbidden by kdbgetaddrarg (more robust, better
error reporting to user)
* Fix deadlock in sr (Magic SysRq key)
* Better help text in kiosk mode
* The default (kiosk on/off) can be changed from the config file.
Anton Vorontsov (7):
kdb: Remove currently unused kdbtab_t->cmd_flags
kdb: Rename kdb_repeat_t to kdb_cmdflags_t, cmd_repeat to cmd_flags
kdb: Rename kdb_register_repeat() to kdb_register_flags()
kdb: Use KDB_REPEAT_* values as flags
kdb: Remove KDB_REPEAT_NONE flag
kdb: Mark safe commands as KDB_SAFE and KDB_SAFE_NO_ARGS
kdb: Add kiosk mode
Daniel Thompson (3):
sysrq: Implement __handle_sysrq_nolock to avoid recursive locking in
kdb
kdb: Improve usability of help text when running in kiosk mode
kdb: Allow access to sensitive commands to be restricted by default
drivers/tty/sysrq.c | 11 ++-
include/linux/kdb.h | 20 ++--
include/linux/sysrq.h | 1 +
kernel/debug/kdb/kdb_bp.c | 22 ++---
kernel/debug/kdb/kdb_main.c | 207 +++++++++++++++++++++++------------------
kernel/debug/kdb/kdb_private.h | 3 +-
kernel/trace/trace_kdb.c | 4 +-
lib/Kconfig.kgdb | 21 +++++
8 files changed, 172 insertions(+), 117 deletions(-)
--
1.9.0
Currently, if an active CPU fails to respond to a roundup request, the
CPU that requested the roundup becomes stuck. This needlessly reduces
the robustness of the debugger.
This patch introduces a timeout, allowing the system state to be
examined even when the system contains unresponsive processors. It also
modifies kdb's cpu command to reject attempts to switch to unresponsive
processors and to report their state as (D)ead.
Signed-off-by: Daniel Thompson <daniel.thompson(a)linaro.org>
Cc: Jason Wessel <jason.wessel(a)windriver.com>
Cc: Mike Travis <travis(a)sgi.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Dimitri Sivanich <sivanich(a)sgi.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: kgdb-bugreport(a)lists.sourceforge.net
---
kernel/debug/debug_core.c | 9 +++++++--
kernel/debug/kdb/kdb_main.c | 4 +++-
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 1adf62b..acd7497 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -471,6 +471,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
int cpu;
int trace_on = 0;
int online_cpus = num_online_cpus();
+ u64 time_left;
kgdb_info[ks->cpu].enter_kgdb++;
kgdb_info[ks->cpu].exception_state |= exception_state;
@@ -595,9 +596,13 @@ return_normal:
/*
* Wait for the other CPUs to be notified and be waiting for us:
*/
- while (kgdb_do_roundup && (atomic_read(&masters_in_kgdb) +
- atomic_read(&slaves_in_kgdb)) != online_cpus)
+ time_left = loops_per_jiffy * HZ;
+ while (kgdb_do_roundup && --time_left &&
+ (atomic_read(&masters_in_kgdb) + atomic_read(&slaves_in_kgdb)) !=
+ online_cpus)
cpu_relax();
+ if (!time_left)
+ pr_crit("KGDB: Timed out waiting for secondary CPUs.\n");
/*
* At this point the primary processor is completely
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 2f7c760..49f2425 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2157,6 +2157,8 @@ static void kdb_cpu_status(void)
for (start_cpu = -1, i = 0; i < NR_CPUS; i++) {
if (!cpu_online(i)) {
state = 'F'; /* cpu is offline */
+ } else if (!kgdb_info[i].enter_kgdb) {
+ state = 'D'; /* cpu is online but unresponsive */
} else {
state = ' '; /* cpu is responding to kdb */
if (kdb_task_state_char(KDB_TSK(i)) == 'I')
@@ -2210,7 +2212,7 @@ static int kdb_cpu(int argc, const char **argv)
/*
* Validate cpunum
*/
- if ((cpunum > NR_CPUS) || !cpu_online(cpunum))
+ if ((cpunum > NR_CPUS) || !kgdb_info[cpunum].enter_kgdb)
return KDB_BADCPUNUM;
dbg_switch_cpu = cpunum;
--
1.9.3
Issuing a stack dump feels ergonomically wrong when entering due to
NMI. Entry due to NMI is normally a reaction to a user request, either
the NMI button on a server or a "magic knock" on a UART. Therefore the
backtrace behaviour on entry due to NMI should be like SysRq-g (no
stack dump) rather than like an oops.
Note also that the stack dump does not offer any information that
cannot be trivially retrieved using the 'bt' command.
Signed-off-by: Daniel Thompson <daniel.thompson(a)linaro.org>
Cc: Jason Wessel <jason.wessel(a)windriver.com>
Cc: Mike Travis <travis(a)sgi.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: kgdb-bugreport(a)lists.sourceforge.net
---
kernel/debug/kdb/kdb_main.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 49f2425..6d19905 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -1207,7 +1207,6 @@ static int kdb_local(kdb_reason_t reason, int error, struct pt_regs *regs,
kdb_printf("due to NonMaskable Interrupt @ "
kdb_machreg_fmt "\n",
instruction_pointer(regs));
- kdb_dumpregs(regs);
break;
case KDB_REASON_SSTEP:
case KDB_REASON_BREAK:
--
1.9.3
Part of this patchset was previously part of the larger tasks packing
patchset [1]. I have split the latter into (at least) 3 different
patchsets to make things easier:
-configuration of sched_domain topology [2]
-update and consolidation of cpu_capacity (this patchset)
-tasks packing algorithm
SMT systems are no longer the only systems that can have CPUs with an
original capacity that differs from the default value. We need to
extend the use of cpu_capacity_orig to all kinds of platforms so that
the scheduler has both the maximum capacity
(cpu_capacity_orig/capacity_orig) and the current capacity
(cpu_capacity/capacity) of CPUs and sched_groups. A new function,
arch_scale_cpu_capacity, has been created and replaces
arch_scale_smt_capacity, which was SMT-specific, in the computation of
the capacity of a CPU.
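A hedged sketch of what such a hook could look like (the weak-default
form and the exact prototype are assumptions, not necessarily what the
patches use):

/* Default: a CPU has full capacity unless the architecture overrides it. */
unsigned long __weak arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
{
	return SCHED_CAPACITY_SCALE;
}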
During load balancing, the scheduler evaluates the number of tasks
that a group of CPUs can handle. The current method assumes that tasks
have a fixed load of SCHED_LOAD_SCALE and that CPUs have a default
capacity of SCHED_CAPACITY_SCALE. This assumption generates wrong
decisions, creating ghost cores or removing real ones, when the
original capacity of the CPUs differs from the default
SCHED_CAPACITY_SCALE. We no longer try to evaluate the number of
available cores based on group_capacity; instead we detect when the
group is fully utilized.
Now that we have the original capacity of CPUs and their
activity/utilization, we can evaluate the capacity and the level of
utilization of a group of CPUs more accurately.
This patchset mainly replaces the old capacity method with a new one
and keeps the policy almost unchanged, although we could certainly take
advantage of this new statistic in several other places in the load
balancer.
Test results:
Below are the results of 4 kinds of tests:
- hackbench -l 500 -s 4096
- perf bench sched pipe -l 400000
- scp of 100MB file on the platform
- ebizzy with various number of threads
on 4 kernels:
- tip = tip/sched/core
- step1 = tip + patches(1-8)
- patchset = tip + whole patchset
- patchset+irq = tip + this patchset + irq accounting
Each test has been run 6 times and the tables below show the stdev and
the diff compared to the tip kernel.
Dual A7 tip | +step1 | +patchset | patchset+irq
stdev | results stdev | results stdev | results stdev
hackbench (lower is better) (+/-)0.64% | -0.19% (+/-)0.73% | 0.58% (+/-)1.29% | 0.20% (+/-)1.00%
perf (lower is better) (+/-)0.28% | 1.22% (+/-)0.17% | 1.29% (+/-)0.06% | 2.85% (+/-)0.33%
scp (+/-)4.81% | 2.61% (+/-)0.28% | 2.39% (+/-)0.22% | 82.18% (+/-)3.30%
ebizzy -t 1 (+/-)2.31% | -1.32% (+/-)1.90% | -0.79% (+/-)2.88% | 3.10% (+/-)2.32%
ebizzy -t 2 (+/-)0.70% | 8.29% (+/-)6.66% | 1.93% (+/-)5.47% | 2.72% (+/-)5.72%
ebizzy -t 4 (+/-)3.54% | 5.57% (+/-)8.00% | 0.36% (+/-)9.00% | 2.53% (+/-)3.17%
ebizzy -t 6 (+/-)2.36% | -0.43% (+/-)3.29% | -1.93% (+/-)3.47% | 0.57% (+/-)0.75%
ebizzy -t 8 (+/-)1.65% | -0.45% (+/-)0.93% | -1.95% (+/-)1.52% | -1.18% (+/-)1.61%
ebizzy -t 10 (+/-)2.55% | -0.98% (+/-)3.06% | -1.18% (+/-)6.17% | -2.33% (+/-)3.28%
ebizzy -t 12 (+/-)6.22% | 0.17% (+/-)5.63% | 2.98% (+/-)7.11% | 1.19% (+/-)4.68%
ebizzy -t 14 (+/-)5.38% | -0.14% (+/-)5.33% | 2.49% (+/-)4.93% | 1.43% (+/-)6.55%
Quad A15 tip | +patchset1 | +patchset2 | patchset+irq
stdev | results stdev | results stdev | results stdev
hackbench (lower is better) (+/-)0.78% | 0.87% (+/-)1.72% | 0.91% (+/-)2.02% | 3.30% (+/-)2.02%
perf (lower is better) (+/-)2.03% | -0.31% (+/-)0.76% | -2.38% (+/-)1.37% | 1.42% (+/-)3.14%
scp (+/-)0.04% | 0.51% (+/-)1.37% | 1.79% (+/-)0.84% | 1.77% (+/-)0.38%
ebizzy -t 1 (+/-)0.41% | 2.05% (+/-)0.38% | 2.08% (+/-)0.24% | 0.17% (+/-)0.62%
ebizzy -t 2 (+/-)0.78% | 0.60% (+/-)0.63% | 0.43% (+/-)0.48% | 1.61% (+/-)0.38%
ebizzy -t 4 (+/-)0.58% | -0.10% (+/-)0.97% | -0.65% (+/-)0.76% | -0.75% (+/-)0.86%
ebizzy -t 6 (+/-)0.31% | 1.07% (+/-)1.12% | -0.16% (+/-)0.87% | -0.76% (+/-)0.22%
ebizzy -t 8 (+/-)0.95% | -0.30% (+/-)0.85% | -0.79% (+/-)0.28% | -1.66% (+/-)0.21%
ebizzy -t 10 (+/-)0.31% | 0.04% (+/-)0.97% | -1.44% (+/-)1.54% | -0.55% (+/-)0.62%
ebizzy -t 12 (+/-)8.35% | -1.89% (+/-)7.64% | 0.75% (+/-)5.30% | -1.18% (+/-)8.16%
ebizzy -t 14 (+/-)13.17% | 6.22% (+/-)4.71% | 5.25% (+/-)9.14% | 5.87% (+/-)5.77%
I haven't been able to fully test the patchset on an SMT system to
check that the regression reported by Preethi has been solved, but the
various tests that I have done don't show any regression so far. The
correction of SD_PREFER_SIBLING mode, and its use at the SMT level,
should have fixed the regression.
The usage_avg_contrib is based on the current implementation of
load-average tracking. I also have a version of usage_avg_contrib that
is based on the new implementation [3], but I haven't provided the
patches and results as [3] is still under review. I can provide changes
on top of [3] to adapt how usage_avg_contrib is computed to the new
mechanism.
TODO: manage conflicts with the next version of [4]
Change since V3:
- add usage_avg_contrib statistic which sums the running time of tasks on a rq
- use usage_avg_contrib instead of runnable_avg_sum for cpu_utilization
- fix the replacement of power by capacity
- update some comments
Change since V2:
- rebase on top of capacity renaming
- fix wake_affine statistic update
- rework nohz_kick_needed
- optimize the active migration of a task from CPU with reduced capacity
- rename group_activity to group_utilization and remove the unused
total_utilization
- repair SD_PREFER_SIBLING and use it for SMT level
- reorder patchset to gather patches with same topics
Change since V1:
- add 3 fixes
- correct some commit messages
- replace capacity computation by activity
- take into account current cpu capacity
[1] https://lkml.org/lkml/2013/10/18/121
[2] https://lkml.org/lkml/2014/3/19/377
[3] https://lkml.org/lkml/2014/7/18/110
[4] https://lkml.org/lkml/2014/7/25/589
Vincent Guittot (12):
sched: fix imbalance flag reset
sched: remove a wake_affine condition
sched: fix avg_load computation
sched: Allow all archs to set the capacity_orig
ARM: topology: use new cpu_capacity interface
sched: add per rq cpu_capacity_orig
sched: test the cpu's capacity in wake affine
sched: move cfs task on a CPU with higher capacity
sched: add usage_load_avg
sched: get CPU's utilization statistic
sched: replace capacity_factor by utilization
sched: add SD_PREFER_SIBLING for SMT level
arch/arm/kernel/topology.c | 4 +-
include/linux/sched.h | 4 +-
kernel/sched/core.c | 3 +-
kernel/sched/fair.c | 350 ++++++++++++++++++++++++++-------------------
kernel/sched/sched.h | 3 +-
5 files changed, 207 insertions(+), 157 deletions(-)
--
1.9.1
Part of this patchset was previously part of the larger tasks packing
patchset [1]. I have split the latter into (at least) 3 different
patchsets to make things easier:
-configuration of sched_domain topology [2]
-update and consolidation of cpu_power (this patchset)
-tasks packing algorithm
SMT systems are no longer the only systems that can have CPUs with an
original capacity that differs from the default value. We need to
extend the use of cpu_power_orig to all kinds of platforms so that the
scheduler has both the maximum capacity (cpu_power_orig/power_orig) and
the current capacity (cpu_power/power) of CPUs and sched_groups. A new
function, arch_scale_cpu_power, has been created and replaces
arch_scale_smt_power, which was SMT-specific, in the computation of the
capacity of a CPU.
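On the ARM side ("ARM: topology: use new cpu_power interface" in the
series below), the override presumably boils down to returning the
precomputed per-cpu scale; a sketch under that assumption:

/* arch/arm/kernel/topology.c sketch: report the per-cpu scale computed
 * from the DT efficiency tables. */
unsigned long arch_scale_cpu_power(struct sched_domain *sd, int cpu)
{
	return per_cpu(cpu_scale, cpu);
}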
During load balancing, the scheduler evaluates the number of tasks
that a group of CPUs can handle. The current method assumes that tasks
have a fixed load of SCHED_LOAD_SCALE and that CPUs have a default
capacity of SCHED_POWER_SCALE. This assumption generates wrong
decisions, creating ghost cores or removing real ones, when the
original capacity of the CPUs differs from the default
SCHED_POWER_SCALE.
Now that we have the original capacity of a CPU and its
activity/utilization, we can evaluate the capacity of a group of CPUs
more accurately.
This patchset mainly replaces the old capacity method with a new one
and keeps the policy almost unchanged, although we can certainly take
advantage of this new statistic in several other places in the load
balancer.
TODO:
- align variable and field names with the renaming in [3]
Test results:
Below are the results of 2 tests:
- hackbench -l 500 -s 4096
- scp of 100MB file on the platform
on a dual cortex-A7
hackbench scp
tip/master 25.75s(+/-0.25) 5.16MB/s(+/-1.49)
+ patches 1,2 25.89s(+/-0.31) 5.18MB/s(+/-1.45)
+ patches 3-10 25.68s(+/-0.22) 7.00MB/s(+/-1.88)
+ irq accounting 25.80s(+/-0.25) 8.06MB/s(+/-0.05)
on a quad cortex-A15
hackbench scp
tip/master 15.69s(+/-0.16) 9.70MB/s(+/-0.04)
+ patches 1,2 15.53s(+/-0.13) 9.72MB/s(+/-0.05)
+ patches 3-10 15.56s(+/-0.22) 9.88MB/s(+/-0.05)
+ irq accounting 15.99s(+/-0.08) 10.37MB/s(+/-0.03)
The improvement in scp bandwidth happens when the tasks and the irq
are using different CPUs, which is a bit random without the irq
accounting config.
Change since V1:
- add 3 fixes
- correct some commit messages
- replace capacity computation by activity
- take into account current cpu capacity
[1] https://lkml.org/lkml/2013/10/18/121
[2] https://lkml.org/lkml/2014/3/19/377
[3] https://lkml.org/lkml/2014/5/14/622
Vincent Guittot (11):
sched: fix imbalance flag reset
sched: remove a wake_affine condition
sched: fix avg_load computation
sched: Allow all archs to set the power_orig
ARM: topology: use new cpu_power interface
sched: add per rq cpu_power_orig
Revert "sched: Put rq's sched_avg under CONFIG_FAIR_GROUP_SCHED"
sched: get CPU's activity statistic
sched: test the cpu's capacity in wake affine
sched: move cfs task on a CPU with higher capacity
sched: replace capacity by activity
arch/arm/kernel/topology.c | 4 +-
kernel/sched/core.c | 2 +-
kernel/sched/fair.c | 229 ++++++++++++++++++++++-----------------------
kernel/sched/sched.h | 5 +-
4 files changed, 118 insertions(+), 122 deletions(-)
--
1.9.1
From: Mark Brown <broonie(a)linaro.org>
The only requirement the scheduler has on cluster IDs is that they
must be unique. When enumerating the topology based on MPIDR
information, the kernel currently generates cluster IDs using the first
level of affinity above the core ID (either level one or two, depending
on whether the core has multiple threads); however, the ARMv8
architecture allows for up to three levels of affinity. This means that
an ARMv8 system may contain cores whose MPIDRs are identical except for
affinity level three, which with the current code will cause us to
report multiple cores with the same identification to the scheduler, in
violation of its uniqueness requirement.
Ensure that we do not violate the scheduler's requirements on systems
that use all the affinity levels by incorporating both affinity levels
two and three into the cluster ID when the cores are not threaded.
While no currently known hardware uses multi-level clusters, it is
better to program defensively: this will help ease bring-up of systems
that have them and will ensure that things like distribution install
media do not need to be respun to replace kernels in order to deploy
such systems. In the worst case the system will work but perform
suboptimally until a kernel modified to handle the new topology better
is installed; in the best case this will be an adequate description of
such topologies for the scheduler to perform well.
Signed-off-by: Mark Brown <broonie(a)linaro.org>
---
arch/arm64/kernel/topology.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index b6ee26b..5752c1b 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -255,7 +255,8 @@ void store_cpu_topology(unsigned int cpuid)
/* Multiprocessor system : Multi-threads per core */
cpuid_topo->thread_id = MPIDR_AFFINITY_LEVEL(mpidr, 0);
cpuid_topo->core_id = MPIDR_AFFINITY_LEVEL(mpidr, 1);
- cpuid_topo->cluster_id = MPIDR_AFFINITY_LEVEL(mpidr, 2);
+ cpuid_topo->cluster_id = MPIDR_AFFINITY_LEVEL(mpidr, 2) |
+ MPIDR_AFFINITY_LEVEL(mpidr, 3) << 8;
} else {
/* Multiprocessor system : Single-thread per core */
cpuid_topo->thread_id = -1;
--
2.1.0.rc1
This patch series enables secure computing (system call filtering) on
arm64 and contains related enhancements and bug fixes.
This code was tested on ARMv8 fast model with 64-bit/32-bit userspace
using
* libseccomp v2.1.1 with modifications for arm64, especially its "live"
tests: No.20, 21 and 24.
* a modified version of Kees' seccomp test for 'changing/skipping a
syscall' and the seccomp() system call
* in-house tests for 'changing/skipping a system call' under tracing
with ptrace(PTRACE_SYSCALL) (that is, without seccomp),
with and without audit tracing.
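For the 'changing/skipping a system call' case above, a user-space
sketch of how a tracer might use the new PTRACE_SET_SYSCALL request
(the request number and the -1 'skip' convention are assumptions
borrowed from the 32-bit ARM precedent):

#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

#ifndef PTRACE_SET_SYSCALL
#define PTRACE_SET_SYSCALL 23	/* assumed: mirrors the arch/arm request */
#endif

/* Stop the child at its next syscall entry and make the kernel skip it. */
static void skip_next_syscall(pid_t child)
{
	int status;

	ptrace(PTRACE_SYSCALL, child, NULL, NULL);	/* run to syscall entry */
	waitpid(child, &status, 0);
	ptrace(PTRACE_SET_SYSCALL, child, NULL, (void *)-1L); /* -1: skip */
	ptrace(PTRACE_SYSCALL, child, NULL, NULL);	/* resume past the skip */
	waitpid(child, &status, 0);
}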
Changes v5 -> v6:
* rebased to v3.17-rc
* changed the interface for changing/skipping a system call from
re-writing the x8 register [v5 1/3] to using a dedicated
PTRACE_SET_SYSCALL command [1/6, 2/6]
Patch [1/6] contains a checkpatch error around a switch statement, but
it won't be fixed, matching the existing code in compat_arch_ptrace().
* added a new system call, seccomp(), for compat task [4/6]
* added SIGSYS siginfo for compat task [5/6]
* changed to always execute audit exit tracing to avoid OOPs [2/6, 6/6]
Changes v4 -> v5:
* rebased to v3.16-rc
* add patch [1/3] to allow ptrace to change a system call
(please note that this patch should be applied even without seccomp.)
Changes v3 -> v4:
* removed the following patch and moved it to "arm64: prerequisites for
audit and ftrace" patchset since it is required for audit and ftrace in
case of !COMPAT, too.
"arm64: is_compat_task is defined both in asm/compat.h and linux/compat.h"
Changes v2 -> v3:
* removed unnecessary 'type cast' operations [2/3]
* check for a return value (-1) of secure_computing() explicitly [2/3]
* aligned with the patch, "arm64: split syscall_trace() into separate
functions for enter/exit" [2/3]
* changed default of CONFIG_SECCOMP to n [2/3]
Changes v1 -> v2:
* added generic seccomp.h for arm64 to utilize it [1,2/3]
* changed syscall_trace() to return more meaningful value (-EPERM)
on seccomp failure case [2/3]
* aligned with the change in "arm64: make a single hook to syscall_trace()
for all syscall features" v2 [2/3]
* removed is_compat_task() definition from compat.h [3/3]
AKASHI Takahiro (6):
arm64: ptrace: add PTRACE_SET_SYSCALL
arm64: ptrace: allow tracer to skip a system call
asm-generic: add generic seccomp.h for secure computing mode 1
arm64: add seccomp syscall for compat task
arm64: add SIGSYS siginfo for compat task
arm64: add seccomp support
arch/arm64/Kconfig | 14 ++++++++++++
arch/arm64/include/asm/compat.h | 7 ++++++
arch/arm64/include/asm/ptrace.h | 9 ++++++++
arch/arm64/include/asm/seccomp.h | 25 ++++++++++++++++++++++
arch/arm64/include/asm/unistd.h | 5 ++++-
arch/arm64/include/asm/unistd32.h | 3 +++
arch/arm64/include/uapi/asm/ptrace.h | 1 +
arch/arm64/kernel/entry.S | 6 ++++++
arch/arm64/kernel/ptrace.c | 39 +++++++++++++++++++++++++++++++++-
arch/arm64/kernel/signal32.c | 8 +++++++
include/asm-generic/seccomp.h | 28 ++++++++++++++++++++++++
11 files changed, 143 insertions(+), 2 deletions(-)
create mode 100644 arch/arm64/include/asm/seccomp.h
create mode 100644 include/asm-generic/seccomp.h
--
1.7.9.5
Part of this patchset was previously part of the larger tasks packing
patchset [1]. I have split the latter into (at least) 3 different
patchsets to make things easier:
-configuration of sched_domain topology [2]
-update and consolidation of cpu_capacity (this patchset)
-tasks packing algorithm
SMT systems are no longer the only systems that can have CPUs with an
original capacity that differs from the default value. We need to
extend the use of (cpu_)capacity_orig to all kinds of platforms so that
the scheduler has both the maximum capacity
(cpu_capacity_orig/capacity_orig) and the current capacity
(cpu_capacity/capacity) of CPUs and sched_groups. A new function,
arch_scale_cpu_capacity, has been created and replaces
arch_scale_smt_capacity, which was SMT-specific, in the computation of
the capacity of a CPU.
During load balancing, the scheduler evaluates the number of tasks
that a group of CPUs can handle. The current method assumes that tasks
have a fixed load of SCHED_LOAD_SCALE and that CPUs have a default
capacity of SCHED_CAPACITY_SCALE. This assumption generates wrong
decisions, creating ghost cores or removing real ones, when the
original capacity of the CPUs differs from the default
SCHED_CAPACITY_SCALE. We no longer try to evaluate the number of
available cores based on group_capacity; instead we detect when the
group is fully utilized.
Now that we have the original capacity of CPUs and their
activity/utilization, we can evaluate the capacity and the level of
utilization of a group of CPUs more accurately.
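As a hedged illustration of how the new statistic might be consumed
(the field and helper names below follow the patch titles, e.g.
usage_load_avg and cpu_capacity_orig, and are assumptions rather than
the literal patch content):

/* Hypothetical helper: a CPU's utilization is the usage contribution of
 * the tasks running on it, capped by the CPU's original capacity. */
static unsigned long get_cpu_utilization(int cpu)
{
	unsigned long util = cpu_rq(cpu)->cfs.usage_load_avg;	/* assumed */
	unsigned long cap = cpu_rq(cpu)->cpu_capacity_orig;	/* assumed */

	return min(util, cap);
}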
This patchset mainly replaces the old capacity method with a new one
and keeps the policy almost unchanged, although we could certainly take
advantage of this new statistic in several other places in the load
balancer.
Test results (done on v4; no tests have been run on v5, which is only
a rebase):
Below are the results of 4 kinds of tests:
- hackbench -l 500 -s 4096
- perf bench sched pipe -l 400000
- scp of 100MB file on the platform
- ebizzy with various number of threads
on 4 kernels:
- tip = tip/sched/core
- step1 = tip + patches(1-8)
- patchset = tip + whole patchset
- patchset+irq = tip + this patchset + irq accounting
Each test has been run 6 times and the tables below show the stdev and
the diff compared to the tip kernel.
Dual A7 tip | +step1 | +patchset | patchset+irq
stdev | results stdev | results stdev | results stdev
hackbench (lower is better) (+/-)0.64% | -0.19% (+/-)0.73% | 0.58% (+/-)1.29% | 0.20% (+/-)1.00%
perf (lower is better) (+/-)0.28% | 1.22% (+/-)0.17% | 1.29% (+/-)0.06% | 2.85% (+/-)0.33%
scp (+/-)4.81% | 2.61% (+/-)0.28% | 2.39% (+/-)0.22% | 82.18% (+/-)3.30%
ebizzy -t 1 (+/-)2.31% | -1.32% (+/-)1.90% | -0.79% (+/-)2.88% | 3.10% (+/-)2.32%
ebizzy -t 2 (+/-)0.70% | 8.29% (+/-)6.66% | 1.93% (+/-)5.47% | 2.72% (+/-)5.72%
ebizzy -t 4 (+/-)3.54% | 5.57% (+/-)8.00% | 0.36% (+/-)9.00% | 2.53% (+/-)3.17%
ebizzy -t 6 (+/-)2.36% | -0.43% (+/-)3.29% | -1.93% (+/-)3.47% | 0.57% (+/-)0.75%
ebizzy -t 8 (+/-)1.65% | -0.45% (+/-)0.93% | -1.95% (+/-)1.52% | -1.18% (+/-)1.61%
ebizzy -t 10 (+/-)2.55% | -0.98% (+/-)3.06% | -1.18% (+/-)6.17% | -2.33% (+/-)3.28%
ebizzy -t 12 (+/-)6.22% | 0.17% (+/-)5.63% | 2.98% (+/-)7.11% | 1.19% (+/-)4.68%
ebizzy -t 14 (+/-)5.38% | -0.14% (+/-)5.33% | 2.49% (+/-)4.93% | 1.43% (+/-)6.55%
Quad A15 tip | +patchset1 | +patchset2 | patchset+irq
stdev | results stdev | results stdev | results stdev
hackbench (lower is better) (+/-)0.78% | 0.87% (+/-)1.72% | 0.91% (+/-)2.02% | 3.30% (+/-)2.02%
perf (lower is better) (+/-)2.03% | -0.31% (+/-)0.76% | -2.38% (+/-)1.37% | 1.42% (+/-)3.14%
scp (+/-)0.04% | 0.51% (+/-)1.37% | 1.79% (+/-)0.84% | 1.77% (+/-)0.38%
ebizzy -t 1 (+/-)0.41% | 2.05% (+/-)0.38% | 2.08% (+/-)0.24% | 0.17% (+/-)0.62%
ebizzy -t 2 (+/-)0.78% | 0.60% (+/-)0.63% | 0.43% (+/-)0.48% | 1.61% (+/-)0.38%
ebizzy -t 4 (+/-)0.58% | -0.10% (+/-)0.97% | -0.65% (+/-)0.76% | -0.75% (+/-)0.86%
ebizzy -t 6 (+/-)0.31% | 1.07% (+/-)1.12% | -0.16% (+/-)0.87% | -0.76% (+/-)0.22%
ebizzy -t 8 (+/-)0.95% | -0.30% (+/-)0.85% | -0.79% (+/-)0.28% | -1.66% (+/-)0.21%
ebizzy -t 10 (+/-)0.31% | 0.04% (+/-)0.97% | -1.44% (+/-)1.54% | -0.55% (+/-)0.62%
ebizzy -t 12 (+/-)8.35% | -1.89% (+/-)7.64% | 0.75% (+/-)5.30% | -1.18% (+/-)8.16%
ebizzy -t 14 (+/-)13.17% | 6.22% (+/-)4.71% | 5.25% (+/-)9.14% | 5.87% (+/-)5.77%
I haven't been able to fully test the patchset on an SMT system to
check that the regression reported by Preethi has been solved, but the
various tests that I have done don't show any regression so far. The
correction of SD_PREFER_SIBLING mode, and its use at the SMT level,
should have fixed the regression.
The usage_avg_contrib is based on the current implementation of
load-average tracking. I also have a version of usage_avg_contrib that
is based on the new implementation [3], but I haven't provided the
patches and results as [3] is still under review. I can provide changes
on top of [3] to adapt how usage_avg_contrib is computed to the new
mechanism.
Change since V4
- rebase to manage conflicts with changes in selection of busiest group [4]
Change since V3:
- add usage_avg_contrib statistic which sums the running time of tasks on a rq
- use usage_avg_contrib instead of runnable_avg_sum for cpu_utilization
- fix the replacement of power by capacity
- update some comments
Change since V2:
- rebase on top of capacity renaming
- fix wake_affine statistic update
- rework nohz_kick_needed
- optimize the active migration of a task from CPU with reduced capacity
- rename group_activity to group_utilization and remove the unused
total_utilization
- repair SD_PREFER_SIBLING and use it for SMT level
- reorder patchset to gather patches with same topics
Change since V1:
- add 3 fixes
- correct some commit messages
- replace capacity computation by activity
- take into account current cpu capacity
[1] https://lkml.org/lkml/2013/10/18/121
[2] https://lkml.org/lkml/2014/3/19/377
[3] https://lkml.org/lkml/2014/7/18/110
[4] https://lkml.org/lkml/2014/7/25/589
Vincent Guittot (12):
sched: fix imbalance flag reset
sched: remove a wake_affine condition
sched: fix avg_load computation
sched: Allow all archs to set the capacity_orig
ARM: topology: use new cpu_capacity interface
sched: add per rq cpu_capacity_orig
sched: test the cpu's capacity in wake affine
sched: move cfs task on a CPU with higher capacity
sched: add usage_load_avg
sched: get CPU's utilization statistic
sched: replace capacity_factor by utilization
sched: add SD_PREFER_SIBLING for SMT level
arch/arm/kernel/topology.c | 4 +-
include/linux/sched.h | 4 +-
kernel/sched/core.c | 3 +-
kernel/sched/fair.c | 356 ++++++++++++++++++++++++++-------------------
kernel/sched/sched.h | 3 +-
5 files changed, 211 insertions(+), 159 deletions(-)
--
1.9.1