On Fri, Oct 18, 2013 at 03:36:38PM +0300, Taras Kondratiuk wrote:
> On 10/18/2013 02:03 PM, Jon Medhurst (Tixy) wrote:
> > On Thu, 2013-10-17 at 13:55 +0100, Dave Martin wrote:
> >> On Thu, Oct 17, 2013 at 01:17:23PM +0100, Jon Medhurst (Tixy) wrote:
> >>> I'll send a patch proposing that fix after I've worked out how to test
> >>> it on a big-endian kernel. Or if someone else sends a patch for that
> >>> with a good commit message that explains what's going on I'll happily
> >>> ack that.
> >>
> >> Disassemble-testing may be enough -- but I advise to dump the relevant
> >> code section with objdump -s as well as reading the code disassembly.
> >> The hex dump does not lie, whereas the disassembler does sometimes get
> >> confused in cases like this.
> >>
> >> Building and disassembling in BE8 and LE configurations will exercise
> >> the two main cases (linker swabbing versus no swabbing) -- comparing
> >> the dump of vmlinux with the dump of the .o file shows what the linker
> >> is doing.
> >
> > Thanks, that's the clue I was missing. I was looking at objdumps of .o
> > files and wasn't seeing any byte changes caused by this bug, just the
> > disassembler treating data as code. Doing a dump of the final linked
> > vmlinux I now see that exactly because the data is being treated as
> > code, the linking is swapping the byte order of the data, (because ARM
> > instructions are always little-endian and need to be fixed from their
> > big-endian representation in the .o file?).
> >
> > In another branch of this thread, Taras said he had patches that were
> > tested, so I'll wait for those rather than doing one myself. Taras
> > should get the main credit for this investigation anyway :-)
> >
>
> I've just sent patches to Ben.
>
> I was looking into gas code and found an opposite bug:
> If NOP-padding is used to align data it can be treated as data
> and saved in BE format.
> So we definitely have to explicitly specify fill value for data
> alignment in code section.
I included that in my original report to the gas folks, so hopefully
that should get fixed too.
But specifying the padding value definitely seems the way to go here.
Cheers
---Dave
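As a rough illustration of the .o versus vmlinux comparison suggested above
(a sketch only, not something posted in this thread): the C below classifies
each 32-bit word of a section as unchanged, byte-reversed, or otherwise
different between two dumps. The names classify_word and count_swabbed are
made up here, and the sketch assumes both dumps cover the same bytes and that
no relocation has altered them in between.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum word_change { WORD_SAME, WORD_SWABBED, WORD_DIFFERENT };

/*
 * Compare one 4-byte word from the object file dump with the same word
 * from the linked image dump. A palindromic word (e.g. all zeros) is
 * reported as WORD_SAME because a swap cannot be observed on it.
 */
static enum word_change classify_word(const uint8_t *obj, const uint8_t *linked)
{
	assert(obj != NULL && linked != NULL);

	if (obj[0] == linked[0] && obj[1] == linked[1] &&
	    obj[2] == linked[2] && obj[3] == linked[3])
		return WORD_SAME;
	if (obj[0] == linked[3] && obj[1] == linked[2] &&
	    obj[2] == linked[1] && obj[3] == linked[0])
		return WORD_SWABBED;
	return WORD_DIFFERENT;
}

/* Count the words the link step byte-reversed; trailing bytes are ignored. */
static size_t count_swabbed(const uint8_t *obj, const uint8_t *linked, size_t len)
{
	size_t i, n = 0;

	for (i = 0; i + 4 <= len; i += 4)
		if (classify_word(obj + i, linked + i) == WORD_SWABBED)
			n++;
	return n;
}
```

In a BE8 build one would expect the "bx lr" word to be reported as
byte-reversed and the .byte/.word data to be unchanged; a byte-reversed data
word is the symptom discussed in this thread.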
Hi,
I have revised the previous power scheduler proposal[1] trying to address as
many of the comments as possible. The overall idea was discussed at LPC[2,3].
The revised design has removed the power scheduler and replaced it with a
high-level power driver interface that allows the scheduler to query the power
driver for information and to provide hints that guide power management
decisions in the power driver.
The power driver is going to be a unified platform power driver that can
replace cpufreq and cpuidle drivers. Generic power policies will be optional
helper functions called from the power driver. Platforms may choose to
implement their own policies as part of their power driver.
This RFC series prototypes a part of the power driver interface (cpu capacity
hints) and shows how they can be used from the scheduler. More extensive use of
the power driver hints and queries is left for later. The focus for now is the
power driver interface. The patch series includes a power driver/cpufreq
governor that can use existing cpufreq drivers as backend. It has been tested
(not thoroughly) on ARM TC2. The cpufreq governor power driver implementation
is rather horrible, but it illustrates how the power driver interface can be
used. Native power drivers are on the todo list.
The power driver interface is still missing quite a few calls: handling idle,
adding extra information to the sched_domain hierarchy to guide scheduling
decisions (packing), and possibly scaling tracked load to compensate for
frequency changes and asymmetric systems (big.LITTLE).
This set is based on 3.11. I have done ARM TC2 testing based on linux-linaro
2013.08[4] to get cpufreq support for TC2.
Morten
[1] https://lkml.org/lkml/2013/7/9/314
[2] http://etherpad.osuosl.org/lpc2013-power-efficient-scheduling
[3] http://www.linuxplumbersconf.org/2013/ocw//system/presentations/1263/origin…
[4] http://git.linaro.org/gitweb?p=kernel/linux-linaro-tracking.git
Morten Rasmussen (7):
  Initial power driver interface infrastructure
  sched: power: Power driver late callback interface
  sched: power: go_faster/slower power driver hints
  sched: power: Remove power capacity hints for kworker threads
  sched: power: Increase cpu capacity based on rq tracked load
  sched: power: cpufreq: Initial schedpower cpufreq governor/power
    driver
  sched: power: Let the power driver choose the best wake-up cpu

 arch/arm/Kconfig                     |   4 +
 drivers/cpufreq/Kconfig              |  11 ++
 drivers/cpufreq/Makefile             |   1 +
 drivers/cpufreq/cpufreq_schedpower.c | 218 ++++++++++++++++++++++++++++++++++
 include/linux/sched/power.h          |  37 ++++++
 kernel/sched/Makefile                |   1 +
 kernel/sched/core.c                  |   1 +
 kernel/sched/fair.c                  |  53 ++++++++-
 kernel/sched/power.c                 |  73 ++++++++++++
 kernel/sched/sched.h                 |  32 +++++
 10 files changed, 430 insertions(+), 1 deletion(-)
 create mode 100644 drivers/cpufreq/cpufreq_schedpower.c
 create mode 100644 include/linux/sched/power.h
 create mode 100644 kernel/sched/power.c
--
1.7.9.5
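To give a feel for the kind of capacity hints described in the cover letter
above, here is a stand-alone toy in C. It is only a sketch under stated
assumptions and is not taken from the series: struct power_driver_ops, the
callback signatures, the toy_* driver, the frequency table and the 80%/20%
load thresholds are all hypothetical, since include/linux/sched/power.h is
not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hint callbacks a power driver might register. */
struct power_driver_ops {
	void (*go_faster)(int cpu);
	void (*go_slower)(int cpu);
};

/* Toy driver state: one operating-point index per cpu (made-up values). */
#define TOY_NR_CPUS 4
static const unsigned int toy_freq_khz[] = { 500000, 800000, 1000000, 1200000 };
#define TOY_NR_OPPS (sizeof(toy_freq_khz) / sizeof(toy_freq_khz[0]))
static size_t toy_opp[TOY_NR_CPUS];

/* Step one operating point up, saturating at the highest one. */
static void toy_go_faster(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	if (toy_opp[cpu] + 1 < TOY_NR_OPPS)
		toy_opp[cpu]++;
}

/* Step one operating point down, saturating at the lowest one. */
static void toy_go_slower(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	if (toy_opp[cpu] > 0)
		toy_opp[cpu]--;
}

static const struct power_driver_ops toy_power_driver = {
	.go_faster = toy_go_faster,
	.go_slower = toy_go_slower,
};

static unsigned int toy_cur_freq_khz(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	return toy_freq_khz[toy_opp[cpu]];
}

/* Scheduler side: turn a tracked load percentage into a hint (arbitrary thresholds). */
static void sched_hint_capacity(const struct power_driver_ops *ops, int cpu,
				unsigned int load_pct)
{
	if (load_pct > 80)
		ops->go_faster(cpu);
	else if (load_pct < 20)
		ops->go_slower(cpu);
}
```

The point of the split is that the scheduler only expresses a direction; the
driver decides what, if anything, to do with the hint, which is why the toy
driver is free to saturate at either end of its table.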
On Thu, Oct 17, 2013 at 01:17:23PM +0100, Jon Medhurst (Tixy) wrote:
> On Wed, 2013-10-16 at 01:38 +0300, Taras Kondratiuk wrote:
> > Hi
> >
> > I was debugging kprobes-test for BE8 and noticed that some data fields
> > are stored in LE instead of BE. It happens because these data fields
> > get interpreted as instructions.
> >
> > Is it a known issue?
> >
> > For example:
> > test_align_fail_data:
> > bx lr
> > .byte 0xaa
> > .align
> > .word 0x12345678
> >
> > I would expect to see something like this:
> > 00000000 <test_align_fail_data>:
> > 0: e12fff1e bx lr
> > 4: aa .byte 0xaa
> > 5: 00 .byte 0x00
> > 6: 0000 .short 0x0000
> > 8: 12345678 .word 0x12345678
> >
> > But instead I have:
> > 00000000 <test_align_fail_data>:
> > 0: e12fff1e bx lr
> > 4: aa .byte 0xaa
> > 5: 00 .byte 0x00
> > 6: 0000 .short 0x0000
> > 8: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
> >
> > As a result the word 0x12345678 will be stored in LE.
> >
> > I've run several tests and here are my observations:
> > - Double ".align" fixes the issue :)
> > - Behavior is the same for LE/BE, ARM/Thumb, GCC 4.4.1/4.6.x/4.8.2
> > - Size of alignment doesn't matter.
> > - Issue happens only if previous data is not instruction-aligned and
> > 0's are added before NOPs.
> > - Explicit filling with 0's (.align , 0) fixes the issue, but as a side
> > effect data @0x4 is interpreted as a single ".word 0xaa000000"
> > instead of ".byte .byte .short". I'm not sure if there can be any
> > functional difference because of this.
>
> After thinking about things overnight, I believe that this is the fix we
> should go with. We want to stick alignment padding between data laid
> down with .byte and .word so it makes sense to explicitly ask the
> toolchain to pad with zeros rather than leaving it the opportunity to
> get confused. (.align in the text section probably means it wants to
> align with nops, but then sees the initial alignment and/or surrounding
> statements look like binary data, not code, and then...)
I believe this workaround will work.
Looking at the gas code, if .align is given an explicit fill value, the
special NOP-padding code in gas is bypassed, and padding works just the
same as it would in a data section.
Obviously, this only works in situations where the bytes emitted by
.align are never executed by the CPU.
> I'll send a patch proposing that fix after I've worked out how to test
> it on a big-endian kernel. Or if someone else sends a patch for that
> with a good commit message that explains what's going on I'll happily
> ack that.
Disassemble-testing may be enough -- but I advise to dump the relevant
code section with objdump -s as well as reading the code disassembly.
The hex dump does not lie, whereas the disassembler does sometimes get
confused in cases like this.
Building and disassembling in BE8 and LE configurations will exercise
the two main cases (linker swabbing versus no swabbing) -- comparing
the dump of vmlinux with the dump of the .o file shows what the linker
is doing.
Cheers
---Dave
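For what it's worth, the data-section style padding that an explicit fill
value selects can be modelled in a few lines of C. This is only a sketch of
the arithmetic, not gas code: pad_with_fill is a hypothetical helper, and the
4-byte alignment (power-of-two exponent 2) is assumed from the expected
disassembly quoted earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Pad buf from 'offset' up to the next multiple of 2^p2align using 'fill'
 * bytes and return the new offset. 'cap' is the buffer capacity; the
 * caller must have left room for the padding.
 */
static size_t pad_with_fill(uint8_t *buf, size_t cap, size_t offset,
			    unsigned int p2align, uint8_t fill)
{
	size_t align = (size_t)1 << p2align;
	size_t aligned = (offset + align - 1) & ~(align - 1);

	assert(buf != NULL && aligned <= cap);
	memset(buf + offset, fill, aligned - offset);
	return aligned;
}
```

With the test case from the report, "bx lr" occupies offsets 0-3 and the
.byte sits at offset 4, so padding starts at offset 5 and three zero bytes
bring the following .word to offset 8.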
ARM platforms take advantage of packing tasks on a few cores if those cores
can be power-gated independently. We use DT and the cpu topology description
to define at which level a core can be power-gated independently of the others,
and SD_SHARE_POWERDOMAIN will be set accordingly at the MC and CPU sched_domain
levels.
The power-gate property should be added with the value 1 to the cpu and cluster
nodes when they can power gate independently of the others.
As an example, for a quad-core system that can power gate each core
independently, the DT should look similar to the example below:
	cpus {
		#address-cells = <1>;
		#size-cells = <0>;
		cpu-map {
			cluster0 {
				power-gate = <1>;
				core0 {
					cpu = <&cpu0>;
					power-gate = <1>;
				};
				core1 {
					cpu = <&cpu1>;
					power-gate = <1>;
				};
				core2 {
					cpu = <&cpu2>;
					power-gate = <1>;
				};
				core3 {
					cpu = <&cpu3>;
					power-gate = <1>;
				};
			};
		};
		...
	};
Signed-off-by: Vincent Guittot <vincent.guittot(a)linaro.org>
---
arch/arm/include/asm/topology.h | 4 ++++
arch/arm/kernel/topology.c | 50 ++++++++++++++++++++++++++++++++++++++-
2 files changed, 53 insertions(+), 1 deletion(-)
diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h
index 58b8b84..5102847 100644
--- a/arch/arm/include/asm/topology.h
+++ b/arch/arm/include/asm/topology.h
@@ -5,12 +5,16 @@
#include <linux/cpumask.h>
+#define CPU_CORE_GATE 0x1
+#define CPU_CLUSTER_GATE 0x2
+
struct cputopo_arm {
int thread_id;
int core_id;
int socket_id;
cpumask_t thread_sibling;
cpumask_t core_sibling;
+ int flags;
};
extern struct cputopo_arm cpu_topology[NR_CPUS];
diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 85a8737..f38f1f9 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -24,6 +24,7 @@
#include <asm/cputype.h>
#include <asm/topology.h>
+#include <asm/smp_plat.h>
/*
* cpu power scale management
@@ -79,6 +80,51 @@ unsigned long *__cpu_capacity;
unsigned long middle_capacity = 1;
+static int __init get_dt_power_topology(struct device_node *topo)
+{
+	const u32 *reg;
+	int len, power = 0;
+	int flag = CPU_CORE_GATE;
+
+	for (; topo; topo = of_get_next_parent(topo)) {
+		reg = of_get_property(topo, "power-gate", &len);
+		if (reg && len == 4 && be32_to_cpup(reg))
+			power |= flag;
+		flag <<= 1;
+	}
+
+	return power;
+}
+
+#define for_each_subnode_with_property(dn, pn, prop_name) \
+	for (dn = of_find_node_with_property(pn, prop_name); dn; \
+	     dn = of_find_node_with_property(dn, prop_name))
+
+static void __init init_dt_power_topology(void)
+{
+	struct device_node *cn, *topo;
+
+	/* Get power domain topology information */
+	cn = of_find_node_by_path("/cpus/cpu-map");
+	if (!cn) {
+		pr_warn("Missing cpu-map node, bailing out\n");
+		return;
+	}
+
+	for_each_subnode_with_property(topo, cn, "cpu") {
+		struct device_node *cpu;
+
+		cpu = of_parse_phandle(topo, "cpu", 0);
+		if (cpu) {
+			u32 hwid;
+
+			of_property_read_u32(cpu, "reg", &hwid);
+			cpu_topology[get_logical_index(hwid)].flags = get_dt_power_topology(topo);
+
+		}
+	}
+}
+
/*
* Iterate all CPUs' descriptor in DT and compute the efficiency
* (as per table_efficiency). Also calculate a middle efficiency
@@ -151,6 +197,8 @@ static void __init parse_dt_topology(void)
middle_capacity = ((max_capacity / 3)
>> (SCHED_POWER_SHIFT-1)) + 1;
+ /* Retrieve power topology information from DT */
+ init_dt_power_topology();
}
/*
@@ -283,7 +331,7 @@ void __init init_cpu_topology(void)
cpu_topo->socket_id = -1;
cpumask_clear(&cpu_topo->core_sibling);
cpumask_clear(&cpu_topo->thread_sibling);
-
+ cpu_topo->flags = 0;
set_power_scale(cpu, SCHED_POWER_SCALE);
}
smp_wmb();
--
1.7.9.5
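The flag accumulation in get_dt_power_topology() can be tried outside the
kernel with a small model. The sketch below is an approximation under stated
assumptions, not the patch itself: struct toy_node and toy_power_topology are
hypothetical stand-ins for struct device_node and the of_* helpers, and a
plain integer replaces the "power-gate" property lookup.

```c
#include <assert.h>
#include <stddef.h>

#define CPU_CORE_GATE		0x1
#define CPU_CLUSTER_GATE	0x2

/* Hypothetical stand-in for a DT node: a parent link and its power-gate value. */
struct toy_node {
	const struct toy_node *parent;
	int power_gate;
};

/*
 * Walk from a core node towards the root. Each level owns one bit,
 * starting with CPU_CORE_GATE; the bit is set when that level can be
 * power-gated on its own.
 */
static int toy_power_topology(const struct toy_node *topo)
{
	int power = 0;
	unsigned int flag = CPU_CORE_GATE;

	for (; topo; topo = topo->parent) {
		/* A tree deeper than the flag word is not modelled. */
		assert(flag != 0);
		if (topo->power_gate)
			power |= (int)flag;
		flag <<= 1;
	}

	return power;
}
```

For the quad-core example in the commit message, each core would end up with
CPU_CORE_GATE | CPU_CLUSTER_GATE. In the real DT the walk continues through
cpu-map and cpus up to the root, but those nodes are not expected to carry a
power-gate property, so no higher bits get set.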
The Power State and Coordination Interface (PSCI) specification defines
SYSTEM_OFF and SYSTEM_RESET functions for system poweroff and reboot.
This patchset adds:
1. Emulation of PSCI SYSTEM_OFF and SYSTEM_RESET functions in
KVM ARM/ARM64 by forwarding them to user space (QEMU or KVMTOOL)
2. System poweroff and reboot using PSCI for ARM/ARM64 kernel
Anup Patel (5):
  ARM/ARM64: KVM: Update user space API header for PSCI emulation
  ARM/ARM64: KVM: Forward PSCI SYSTEM_OFF and SYSTEM_RESET to user
    space
  KVM: Add documentation for KVM_EXIT_PSCI exit reason
  ARM: psci: Add support for system reboot and poweroff
  ARM64: psci: Add support for system reboot and poweroff

 Documentation/virtual/kvm/api.txt | 13 +++++++++
 arch/arm/include/asm/kvm_psci.h   | 25 +++++++++++++++-
 arch/arm/include/uapi/asm/kvm.h   |  2 ++
 arch/arm/kernel/psci.c            | 36 +++++++++++++++++++++++
 arch/arm/kvm/arm.c                | 12 ++++++--
 arch/arm/kvm/handle_exit.c        | 12 ++++++--
 arch/arm/kvm/psci.c               | 57 ++++++++++++++++++++++++++++++++++---
 arch/arm64/include/asm/kvm_psci.h | 25 +++++++++++++++-
 arch/arm64/include/uapi/asm/kvm.h |  2 ++
 arch/arm64/kernel/psci.c          | 36 +++++++++++++++++++++++
 arch/arm64/kvm/handle_exit.c      | 24 ++++++++++++----
 include/uapi/linux/kvm.h          |  7 +++++
 12 files changed, 233 insertions(+), 18 deletions(-)
--
1.7.9.5
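As a minimal sketch of the forwarding decision described in item 1 above,
the C below sorts a PSCI function ID into "handle in the kernel" or "forward
to user space". It is not taken from the patches: the enum and function
names are hypothetical, and the function IDs are the SMC32 values from PSCI
v0.2, which may well differ from the IDs this series actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* SMC32 function IDs as defined by PSCI v0.2 (assumed, see above). */
#define PSCI_0_2_FN_BASE		0x84000000u
#define PSCI_0_2_FN_SYSTEM_OFF		(PSCI_0_2_FN_BASE + 8)
#define PSCI_0_2_FN_SYSTEM_RESET	(PSCI_0_2_FN_BASE + 9)

/* Hypothetical outcome of looking at a guest's PSCI call. */
enum psci_action {
	PSCI_HANDLE_IN_KERNEL,	/* e.g. CPU on/off, dealt with by KVM */
	PSCI_FORWARD_POWEROFF,	/* exit to user space: power the VM off */
	PSCI_FORWARD_RESET,	/* exit to user space: reset the VM */
};

static enum psci_action classify_psci_call(uint32_t fn)
{
	switch (fn) {
	case PSCI_0_2_FN_SYSTEM_OFF:
		return PSCI_FORWARD_POWEROFF;
	case PSCI_0_2_FN_SYSTEM_RESET:
		return PSCI_FORWARD_RESET;
	default:
		return PSCI_HANDLE_IN_KERNEL;
	}
}

/* Small self-check showing the intended use. */
void psci_classify_selfcheck(void)
{
	assert(classify_psci_call(PSCI_0_2_FN_SYSTEM_OFF) == PSCI_FORWARD_POWEROFF);
	assert(classify_psci_call(PSCI_0_2_FN_SYSTEM_RESET) == PSCI_FORWARD_RESET);
}
```

Only system-wide events need user space (QEMU or KVMTOOL), because it owns
the VM's lifetime; everything else can stay in the kernel.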