== GDB ==
* Russell King now wants to revert my kernel patch that
fixed #615974; discussed alternative options.
== GCC ==
* Patch review week.
* Analyzed root cause of ICE when building Linux kernel
with mainline GCC (reported by Arnd).
Best Regards
Ulrich Weigand
== QEmu ==
* Sent 64bit atomic helper fix upstream
* Basic boot time and simple benchmarks vs. Panda board
* Tested prebuilt images and Peter's latest post-merge QEmu tree
- The full Ubuntu desktop on an emulated Overo is a bit slow -
it's rather short on RAM
- The full Ubuntu desktop on an emulated VExpress isn't bad; it's
got the full 1G (mounting the vexpress images took a particularly
grim line of awk, based on Peter's suggestion of using 'file')
== String routines ==
* Pushed memcpy and memset up to cortex-strings bzr
* Working through memset issue with Michael
- Made my code a little less sensitive to initial alignment
== Hard float ==
* Testing libffi 3.0.11rc1 - still hasn't got the variadic patch in, but
hoping it will land later in the cycle.
== Other ==
* Excavating inbox after week off.
* Built LMbench and kicked off a run on Panda. (Got stuck in some
heuristics under emulation)
Dave
== This week ==
* Looked at the get_arm_condition_code ICE. Seems to be a popular bug:
was reported as #589887, #823708 and #809761 in Launchpad and as PR49030
in bugzilla. Sent a patch upstream.
* Submitted SMS register-dependency patch upstream.
* Reviewed Bernd's new shrink-wrap patch.
* Tried to clean up my microbenchmarks. Found that preloading the caches
at the start of the benchmark fixed the variations I was seeing on a
Beagleboard. (As Dave says, it seems that there's no allocation
on write.) Added code to check the results of each loop. Packaged
it up and pushed into bzr.
* An IBM colleague kindly tried my -fsched-pressure patch/hack
on s390. Although it performed the best of the three runs
(trunk, -fsched-pressure, patch+-fsched-pressure), there were
some disappointing outliers. -fsched-pressure still introduces a 7%
regression in one test (down from a 14% regression without the patch).
Another test benefited from -fsched-pressure without my patch but
regressed with it.
== Next week ==
* Look more at SMS.
* Look more at the sched-pressure thing (if I get time).
Richard
* SPEC2K week. Experimented with building and running and finally did a full
run on the Panda board.
* Running SPEC2K on the Snowball board as well. It is troublesome to work
with the board because of the ethernet problem that makes the board freeze
after some time. I have to use the SD-card for file transfer. It seems to be
a known issue on the Snowball V3. Tested a V5 board which was much more
stable. Have asked for a V5 board from ST-E Linaro internal project manager.
(Do not know when I will get it.)
* Preliminary travel booking done for Linaro Connect in Orlando.
Best Regards
Åsa
I've tried to clean up the libav microbenchmarks that I did for the strided
load/store stuff. They're on Launchpad at:
lp:~rsandifo/+junk/loop-microbenchmarks
The main changes are that the benchmarks now preload the caches (for CPUs
that don't allocate on write) and that they now check the optimised loop
against an unoptimised one.
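
As a rough illustration of those two changes (this is not the code in the
bzr branch; the loop bodies, the buffer size N and all function names are
made up for the example), a harness might read every buffer once before
timing so the data is already in cache, and then compare the optimised
loop's output against a plain reference loop:

```c
#include <stddef.h>
#include <string.h>

#define N 1024  /* hypothetical buffer size, chosen only for the example */

static int src[N], out_ref[N], out_opt[N];

/* Reference (unoptimised) loop: the result we trust. */
static void kernel_ref(int *dst, const int *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = s[i] * 3 + 1;
}

/* Stand-in for the "optimised" loop: same computation, unrolled by two. */
static void kernel_opt(int *dst, const int *s, size_t n)
{
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        dst[i] = s[i] * 3 + 1;
        dst[i + 1] = s[i + 1] * 3 + 1;
    }
    if (i < n)
        dst[i] = s[i] * 3 + 1;
}

/*
 * Read every element once.  On a CPU that does not allocate cache lines
 * on write, reading is what brings the output buffers into cache.
 */
static unsigned preload(const int *buf, size_t n)
{
    volatile unsigned sink = 0;
    for (size_t i = 0; i < n; i++)
        sink += (unsigned)buf[i];
    return sink;
}

/* Returns 1 if the optimised loop matches the reference, 0 otherwise. */
int run_checked(void)
{
    for (size_t i = 0; i < N; i++)
        src[i] = (int)i;

    (void)preload(src, N);
    (void)preload(out_ref, N);
    (void)preload(out_opt, N);

    /* A real benchmark would start its timer here. */
    kernel_ref(out_ref, src, N);
    kernel_opt(out_opt, src, N);

    return memcmp(out_ref, out_opt, sizeof out_ref) == 0;
}
```

Calling run_checked() before reporting any timing means a miscompiled or
wrongly transformed loop shows up as a failure rather than as a
suspiciously fast result.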
The usual big caveat applies: these loops were chosen because they were
affected by strided load/stores. They aren't necessarily interesting
for any other reason, and some were even explicitly marked as cold.
I'm going to add some of the video decode routines from Michael's
benchmark soon. These microbenchmarks aren't supposed to be
libav-specific though, so if you have other interesting ones,
please do add them.
Richard
Trip report: KVM Forum/LinuxCon NA 2011
KVM Forum is an annual conference; this year it was colocated with
LinuxCon NA in Vancouver. There were about 150 attendees; many of them
are simply users of KVM and so many of the talks are aimed at KVM
users. However it's also an opportunity for the KVM and QEMU developer
community to get together, with a number of informal BoF sessions and
an all-day hackathon later in the week.
The talk schedule is here, together with the slides for all talks:
http://www.linux-kvm.org/page/KVM_Forum_2011
Some brief highlights:
* Keynotes
ARM/Linaro got positive mentions in both keynotes; Avi Kivity said of
the ARM/A15 KVM work that he had "every reason to expect it to be very
successful". Anthony Liguori's keynote summarising the year in QEMU
development included some statistics about commits: Linaro came third
in the list of "companies with most commits", behind only Red Hat and
IBM; I came top of the "individual authors with most commits" list,
being apparently responsible for 7% of all QEMU patches this year :-)
* KVM on POWER/PPC
There were several talks about KVM on PPC architectures; interestingly
this is seeing use not just on the server end but also in the
embedded/realtime space (including a talk from Freescale where they
said they are working on KVM on embedded PPC because of customer
demand for KVM). It was also reassuring to see that another
architecture has preceded us in shaking out x86-isms from KVM.
* KVM Tool
This has got headlines recently as a potential replacement for the
userspace launcher/device model role which QEMU currently plays when
starting guest OSes under KVM. It's intended to be minimal and
lightweight and only to run Linux guests (with paravirtualised devices
for most purposes). The general reaction seemed to be that although
the implementation is currently minimal it will become larger and
bloatier as they add features off their wishlist. There's also some
ill-feeling about the effective namespace grab of calling the
userspace binary "kvm". From an ARM-centric point of view we can just
wait and see whether it gets much traction. Possibly it may turn into
a testbed for technology which is easier to develop on than the
'mature' QEMU which has to deal with backwards compatibility and
supporting users.
* QEMU Object Model and Device Model issues and redesign
For me one of the most important strands of conversation at the
conference was replacing QEMU's device model abstraction with
something better. QEMU's current device model abstraction is "qdev";
this is the (vaguely object-oriented) framework which lets you create
devices, configure them and connect them together. It models the world
as a tree: a root device exposes a bus, to which child devices can
connect; those child devices may expose further buses, and so on.
This works quite well in the PC world where mostly you're interested
in plugging in USB devices, PCI cards, etc; it is rather less well
matched to the embedded board models where things are much less
hierarchical. qdev's major flaws include:
+ insists on bus hierarchy, but not everything is a bus, and in any
case there are often several trees (memory transactions, clock,
interrupts) which don't necessarily coincide
+ no support for composition ("device foo is actually devices bar
and baz glued into one box")
+ just barely supports having devices expose signal (gpio/irq) lines
and memory regions (typically registers), but doesn't let you give
them useful names, so you have to access them by index number
We spent just about all of Thursday's hacking session going through
this. I felt we got good agreement on the problems, and perhaps
80-90% agreement on Anthony Liguori's proposed new QEMU Object Model
as a solution to them; some loose ends still need to be worked
through.
* LinuxCon NA
KVM Forum was colocated with LinuxCon NA this year. My opinion (which
seemed to be shared with the other Linaro attendees I talked to about it)
was that LinuxCon NA suffered from being not very technical and not
very focused -- it wasn't clear to me who they thought their target
audience was. A few points of interest:
+ Linus Torvalds' keynote was reported in some places as more
complaints about ARM hardware but I actually thought it was pretty
positive about the progress we're making in sorting out the issues
+ Matthew Garrett's talk about x86 platform drivers (those things that
deal with LEDs, funny keys, batteries and other odd laptop hardware)
revealed that actually PC hardware manufacturers do just as much
random non-standard undocumented silliness, it's just that accident
of history has limited them to only doing so in the minor bits at
the edges...
-- PMM
Hello Ulrich (or anyone else acquainted with gdb),
Could the gdb test suite be run on a kernel with the below patch applied
please? A confirmation that this patch doesn't regress gdb is required
before this can move ahead. Quick feedback would be greatly
appreciated.
Thanks.
---------- Forwarded message ----------
Date: Thu, 25 Aug 2011 15:55:58 +0100
From: Russell King - ARM Linux <linux(a)arm.linux.org.uk>
To: Tejun Heo <tj(a)kernel.org>, Arnd Bergmann <arnd(a)arndb.de>,
Mark Brown <broonie(a)opensource.wolfsonmicro.com>
Cc: Rafael J. Wysocki <rjw(a)sisk.pl>, linux-kernel(a)vger.kernel.org,
linux-arm-kernel(a)lists.infradead.org
Subject: Re: try_to_freeze() called with IRQs disabled on ARM
On Thu, Aug 25, 2011 at 03:09:07PM +0200, Tejun Heo wrote:
> Hey, Russell.
>
> If you can fix it properly without going through temporary step,
> that's awesome. Let's put the arguments behind, okay?
Here's the patch. As the kernel I've run this against doesn't have the
change to try_to_freeze(), I added a might_sleep() in do_signal() during
my testing to verify that it fixes Mark's problem (which it does.)
I've tested functions returning -ERESTARTSYS, -ERESTARTNOHAND and
-ERESTART_RESTARTBLOCK, all of which seem to behave as expected with
signals such as SIGCONT (without handler) and SIGALRM (with handler).
I haven't tested -ERESTARTNOINTR.
I don't have a test case for the race condition I mentioned (which is
admittedly pretty difficult to construct, requiring an explicit
signal, schedule, signal sequence) but this should plug that too.
How do we achieve this? Effectively the steps in this patch are:
1. Undo Arnd's fixups to the syscall restart processing (but don't worry,
we restore it in step 3).
2. Introduce TIF_SYS_RESTART, which is set when we enter signal handling
and the syscall has returned one of the restart codes. This is used
as a flag to indicate that we have some syscall restart processing to
do at some point.
3. Clear TIF_SYS_RESTART whenever ptrace is used to set the GP registers
(thereby restoring Arnd's fixup for his gdb testsuite problem - it
would be good if Arnd could reconfirm that.)
4. When we setup a user handler to run, check TIF_SYS_RESTART and clear it.
If it was set, we need to set things up to return -EINTR or restart the
syscall as appropriate. As we've cleared it, no further restart
processing will occur.
5. Once we've run all work (signal delivery, and rescheduling events), and
we're about to return to userspace, make a final check for TIF_SYS_RESTART.
If it's still set, then we're returning to userspace having not setup
any user handlers, and we need to restart the syscall. This is mostly
trivial, except for OABI restartblock which requires the user stack to
be written. We have to re-enable IRQs for this write, which means we
have to manually re-check for rescheduling events, abort the restart,
and try again later.
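
The outcome these steps aim for can be modelled in plain userspace C. The
sketch below is only an illustration, not kernel code: it follows the
per-code rules spelled out in the "Syscall restarting codes" comment in the
signal.c hunk of the patch, the numeric values mirror the kernel-internal
definitions in include/linux/errno.h, and the enum, the function name and
its parameters are hypothetical.

```c
#include <stdbool.h>

/* Kernel-internal restart codes; values as in include/linux/errno.h. */
#define ERESTARTSYS            512
#define ERESTARTNOINTR         513
#define ERESTARTNOHAND         514
#define ERESTART_RESTARTBLOCK  516

/* Hypothetical labels for what userspace ends up observing. */
enum restart_outcome {
    OUTCOME_NONE,           /* no restart processing at all */
    OUTCOME_RESTART,        /* re-execute the same syscall */
    OUTCOME_RESTART_BLOCK,  /* go through restart_syscall instead */
    OUTCOME_EINTR           /* syscall returns -EINTR */
};

/*
 * retval:       value the syscall returned (negative restart code or other)
 * ptrace_wrote: a debugger set the GP registers, so the restart flag was
 *               cleared (step 3)
 * handler_runs: a user signal handler is about to be invoked (step 4);
 *               false models the final return to userspace (step 5)
 * sa_restart:   the handler was installed with SA_RESTART
 */
enum restart_outcome classify_restart(long retval, bool ptrace_wrote,
                                      bool handler_runs, bool sa_restart)
{
    if (ptrace_wrote)
        return OUTCOME_NONE;

    switch (-retval) {
    case ERESTARTNOINTR:
        return OUTCOME_RESTART;
    case ERESTARTSYS:
        return (!handler_runs || sa_restart) ? OUTCOME_RESTART
                                             : OUTCOME_EINTR;
    case ERESTARTNOHAND:
        return handler_runs ? OUTCOME_EINTR : OUTCOME_RESTART;
    case ERESTART_RESTARTBLOCK:
        return handler_runs ? OUTCOME_EINTR : OUTCOME_RESTART_BLOCK;
    default:
        return OUTCOME_NONE;  /* not a restart code: flag is never set */
    }
}
```

For example, a syscall that returned -ERESTARTNOHAND and is interrupted by
a signal with a handler (the SIGALRM case tested above) maps to
OUTCOME_EINTR, while the same return code with a handler-less SIGCONT maps
to OUTCOME_RESTART.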
One of the side effects of reverting Arnd's patch is that we restore the
strace behaviour which we've had for years on ARM, and can still be seen
on x86: strace can see the -ERESTART return codes from the kernel syscalls,
rather than what seems to be the signal number:
Before:
rt_sigsuspend([] <unfinished ...>
--- SIGIO (I/O possible) ---
<... rt_sigsuspend resumed> ) = 29
sigreturn() = ? (mask now [])
vs:
rt_sigsuspend([]) = ? ERESTARTNOHAND (To be restarted)
--- SIGIO (I/O possible) @ 0 (0) ---
sigreturn() = ? (mask now [])
x86:
rt_sigsuspend([]) = ? ERESTARTNOHAND (To be restarted)
--- {si_signo=SIGIO, si_code=SI_USER} (I/O possible) ---
sigreturn() = ? (mask now [])
So, this patch should fix:
1. The race which I identified in the signal handling code (I think x86
and other architectures can suffer from it too.)
2. The warning from try_to_freeze.
3. The unanticipated change to strace output.
Arnd, can you test this to make sure your gdb test case still works, and
Mark, can you test this to make sure it fixes your problem please?
Thanks.
arch/arm/include/asm/thread_info.h | 3 +
arch/arm/kernel/entry-common.S | 11 ++
arch/arm/kernel/ptrace.c | 2 +
arch/arm/kernel/signal.c | 209 ++++++++++++++++++++++++------------
4 files changed, 155 insertions(+), 70 deletions(-)
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index 7b5cc8d..40df533 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -129,6 +129,7 @@ extern void vfp_flush_hwstate(struct thread_info *);
/*
* thread information flags:
* TIF_SYSCALL_TRACE - syscall trace active
+ * TIF_SYS_RESTART - syscall restart processing
* TIF_SIGPENDING - signal pending
* TIF_NEED_RESCHED - rescheduling necessary
* TIF_NOTIFY_RESUME - callback before returning to user
@@ -139,6 +140,7 @@ extern void vfp_flush_hwstate(struct thread_info *);
#define TIF_NEED_RESCHED 1
#define TIF_NOTIFY_RESUME 2 /* callback before returning to user */
#define TIF_SYSCALL_TRACE 8
+#define TIF_SYS_RESTART 9
#define TIF_POLLING_NRFLAG 16
#define TIF_USING_IWMMXT 17
#define TIF_MEMDIE 18 /* is terminating due to OOM killer */
@@ -147,6 +149,7 @@ extern void vfp_flush_hwstate(struct thread_info *);
#define TIF_SECCOMP 21
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
+#define _TIF_SYS_RESTART (1 << TIF_SYS_RESTART)
#define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index b2a27b6..e922b85 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -45,6 +45,7 @@ ret_fast_syscall:
fast_work_pending:
str r0, [sp, #S_R0+S_OFF]! @ returned r0
work_pending:
+ enable_irq
tst r1, #_TIF_NEED_RESCHED
bne work_resched
tst r1, #_TIF_SIGPENDING|_TIF_NOTIFY_RESUME
@@ -56,6 +57,13 @@ work_pending:
bl do_notify_resume
b ret_slow_syscall @ Check work again
+work_syscall_restart:
+ mov r0, sp @ 'regs'
+ bl syscall_restart @ process system call restart
+ teq r0, #0 @ if ret=0 -> success, so
+ beq ret_restart @ return to userspace directly
+ b ret_slow_syscall @ otherwise, we have a segfault
+
work_resched:
bl schedule
/*
@@ -69,6 +77,9 @@ ENTRY(ret_to_user_from_irq)
tst r1, #_TIF_WORK_MASK
bne work_pending
no_work_pending:
+ tst r1, #_TIF_SYS_RESTART
+ bne work_syscall_restart
+ret_restart:
#if defined(CONFIG_IRQSOFF_TRACER)
asm_trace_hardirqs_on
#endif
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 2491f3b..ac8c34e 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -177,6 +177,7 @@ put_user_reg(struct task_struct *task, int offset, long data)
if (valid_user_regs(&newregs)) {
regs->uregs[offset] = data;
+ clear_ti_thread_flag(task_thread_info(task), TIF_SYS_RESTART);
ret = 0;
}
@@ -604,6 +605,7 @@ static int gpr_set(struct task_struct *target,
return -EINVAL;
*task_pt_regs(target) = newregs;
+ clear_ti_thread_flag(task_thread_info(target), TIF_SYS_RESTART);
return 0;
}
diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index 0340224..42a1521 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -649,6 +649,135 @@ handle_signal(unsigned long sig, struct k_sigaction *ka,
}
/*
+ * Syscall restarting codes
+ *
+ * -ERESTARTSYS: restart system call if no handler, or if there is a
+ * handler but it's marked SA_RESTART. Otherwise return -EINTR.
+ * -ERESTARTNOINTR: always restart system call
+ * -ERESTARTNOHAND: restart system call only if no handler, otherwise
+ * return -EINTR if invoking a user signal handler.
+ * -ERESTART_RESTARTBLOCK: call restart syscall if no handler, otherwise
+ * return -EINTR if invoking a user signal handler.
+ */
+static void setup_syscall_restart(struct pt_regs *regs)
+{
+ regs->ARM_r0 = regs->ARM_ORIG_r0;
+ regs->ARM_pc -= thumb_mode(regs) ? 2 : 4;
+}
+
+/*
+ * Depending on the signal settings we may need to revert the decision
+ * to restart the system call. But skip this if a debugger has chosen
+ * to restart at a different PC.
+ */
+static void syscall_restart_handler(struct pt_regs *regs, struct k_sigaction *ka)
+{
+ if (test_and_clear_thread_flag(TIF_SYS_RESTART)) {
+ long r0 = regs->ARM_r0;
+
+ /*
+ * By default, return -EINTR to the user process for any
+ * syscall which would otherwise be restarted.
+ */
+ regs->ARM_r0 = -EINTR;
+
+ if (r0 == -ERESTARTNOINTR ||
+ (r0 == -ERESTARTSYS && !(ka->sa.sa_flags & SA_RESTART)))
+ setup_syscall_restart(regs);
+ }
+}
+
+/*
+ * Handle syscall restarting when there is no user handler in place for
+ * a delivered signal. Rather than doing this as part of the normal
+ * signal processing, we do this on the final return to userspace, after
+ * we've finished handling signals and checking for schedule events.
+ *
+ * This avoids bad behaviour such as:
+ * - syscall returns -ERESTARTNOHAND
+ * - signal with no handler (so we set things up to restart the syscall)
+ * - schedule
+ * - signal with handler (eg, SIGALRM)
+ * - we call the handler and then restart the syscall
+ *
+ * In order to avoid races with TIF_NEED_RESCHED, IRQs must be disabled
+ * when this function is called and remain disabled until we exit to
+ * userspace.
+ */
+asmlinkage int syscall_restart(struct pt_regs *regs)
+{
+ struct thread_info *thread = current_thread_info();
+
+ clear_ti_thread_flag(thread, TIF_SYS_RESTART);
+
+ /*
+ * Restart the system call. We haven't setup a signal handler
+ * to invoke, and the regset hasn't been usurped by ptrace.
+ */
+ if (regs->ARM_r0 == -ERESTART_RESTARTBLOCK) {
+ if (thumb_mode(regs)) {
+ regs->ARM_r7 = __NR_restart_syscall - __NR_SYSCALL_BASE;
+ regs->ARM_pc -= 2;
+ } else {
+#if defined(CONFIG_AEABI) && !defined(CONFIG_OABI_COMPAT)
+ regs->ARM_r7 = __NR_restart_syscall;
+ regs->ARM_pc -= 4;
+#else
+ u32 sp = regs->ARM_sp - 4;
+ u32 __user *usp = (u32 __user *)sp;
+ int ret;
+
+ /*
+ * For OABI, we need to play some extra games, because
+ * we need to write to the users stack, which we can't
+ * do reliably from IRQs-disabled context. Temporarily
+ * re-enable IRQs, perform the store, and then plug
+ * the resulting race afterwards.
+ */
+ local_irq_enable();
+ ret = put_user(regs->ARM_pc, usp);
+ local_irq_disable();
+
+ /*
+ * Plug the reschedule race - if we need to reschedule,
+ * abort the syscall restarting. We haven't modified
+ * anything other than the attempted write to the stack
+ * so we can merely retry later.
+ */
+ if (need_resched()) {
+ set_ti_thread_flag(thread, TIF_SYS_RESTART);
+ return -EINTR;
+ }
+
+ /*
+ * We failed (for some reason) to write to the stack.
+ * Terminate the task.
+ */
+ if (ret) {
+ force_sigsegv(0, current);
+ return -EFAULT;
+ }
+
+ /*
+ * Success, update the stack pointer and point the
+ * PC at the restarting code.
+ */
+ regs->ARM_sp = sp;
+ regs->ARM_pc = KERN_RESTART_CODE;
+#endif
+ }
+ } else {
+ /*
+ * Simple restart - just back up and re-execute the last
+ * instruction.
+ */
+ setup_syscall_restart(regs);
+ }
+
+ return 0;
+}
+
+/*
* Note that 'init' is a special process: it doesn't get signals it doesn't
* want to handle. Thus you cannot kill init even with a SIGKILL even by
* mistake.
@@ -659,7 +788,6 @@ handle_signal(unsigned long sig, struct k_sigaction *ka,
*/
static void do_signal(struct pt_regs *regs, int syscall)
{
- unsigned int retval = 0, continue_addr = 0, restart_addr = 0;
struct k_sigaction ka;
siginfo_t info;
int signr;
@@ -674,32 +802,16 @@ static void do_signal(struct pt_regs *regs, int syscall)
return;
/*
- * If we were from a system call, check for system call restarting...
+ * Set the SYS_RESTART flag to indicate that we have some
+ * cleanup of the restart state to perform when returning to
+ * userspace.
*/
- if (syscall) {
- continue_addr = regs->ARM_pc;
- restart_addr = continue_addr - (thumb_mode(regs) ? 2 : 4);
- retval = regs->ARM_r0;
-
- /*
- * Prepare for system call restart. We do this here so that a
- * debugger will see the already changed PSW.
- */
- switch (retval) {
- case -ERESTARTNOHAND:
- case -ERESTARTSYS:
- case -ERESTARTNOINTR:
- regs->ARM_r0 = regs->ARM_ORIG_r0;
- regs->ARM_pc = restart_addr;
- break;
- case -ERESTART_RESTARTBLOCK:
- regs->ARM_r0 = -EINTR;
- break;
- }
- }
-
- if (try_to_freeze())
- goto no_signal;
+ if (syscall &&
+ (regs->ARM_r0 == -ERESTARTSYS ||
+ regs->ARM_r0 == -ERESTARTNOINTR ||
+ regs->ARM_r0 == -ERESTARTNOHAND ||
+ regs->ARM_r0 == -ERESTART_RESTARTBLOCK))
+ set_thread_flag(TIF_SYS_RESTART);
/*
* Get the signal to deliver. When running under ptrace, at this
@@ -709,19 +821,7 @@ static void do_signal(struct pt_regs *regs, int syscall)
if (signr > 0) {
sigset_t *oldset;
- /*
- * Depending on the signal settings we may need to revert the
- * decision to restart the system call. But skip this if a
- * debugger has chosen to restart at a different PC.
- */
- if (regs->ARM_pc == restart_addr) {
- if (retval == -ERESTARTNOHAND
- || (retval == -ERESTARTSYS
- && !(ka.sa.sa_flags & SA_RESTART))) {
- regs->ARM_r0 = -EINTR;
- regs->ARM_pc = continue_addr;
- }
- }
+ syscall_restart_handler(regs, &ka);
if (test_thread_flag(TIF_RESTORE_SIGMASK))
oldset = &current->saved_sigmask;
@@ -740,38 +840,7 @@ static void do_signal(struct pt_regs *regs, int syscall)
return;
}
- no_signal:
if (syscall) {
- /*
- * Handle restarting a different system call. As above,
- * if a debugger has chosen to restart at a different PC,
- * ignore the restart.
- */
- if (retval == -ERESTART_RESTARTBLOCK
- && regs->ARM_pc == continue_addr) {
- if (thumb_mode(regs)) {
- regs->ARM_r7 = __NR_restart_syscall - __NR_SYSCALL_BASE;
- regs->ARM_pc -= 2;
- } else {
-#if defined(CONFIG_AEABI) && !defined(CONFIG_OABI_COMPAT)
- regs->ARM_r7 = __NR_restart_syscall;
- regs->ARM_pc -= 4;
-#else
- u32 __user *usp;
-
- regs->ARM_sp -= 4;
- usp = (u32 __user *)regs->ARM_sp;
-
- if (put_user(regs->ARM_pc, usp) == 0) {
- regs->ARM_pc = KERN_RESTART_CODE;
- } else {
- regs->ARM_sp += 4;
- force_sigsegv(0, current);
- }
-#endif
- }
- }
-
/* If there's no signal to deliver, we just put the saved sigmask
* back.
*/
Hi,
the current gcc-4.6 packages build for both softfp and hard, so that the armel
and (not yet existing) armhf packages can be installed together in the system.
To enable multilib, I currently use the rather complicated arm-multilib.diff,
which works, but doesn't seem to be correct. With the much simpler arm-ml2.diff,
the directory for the default multilib is not resolved to . (as done for e.g.
amd64).
[amd64] $ gcc -print-multi-directory
.
[armel] $ gcc -print-multi-directory
sf
If I understand the code correctly, this comes from the hard-coded setting of
MULTILIB_DEFAULTS in the arm target. If you look at mips, you see
#ifndef MULTILIB_DEFAULTS
#define MULTILIB_DEFAULTS \
{ MULTILIB_ENDIAN_DEFAULT, MULTILIB_ISA_DEFAULT, MULTILIB_ABI_DEFAULT }
#endif
which records the proper selected defaults.
Should something similar be done for arm?
Matthias
Hi; I've just completed a tricky rebase of qemu-linaro on upstream;
there were several invasive upstream changes which have landed
recently and which meant that I had to tweak a lot of the qemu-linaro
patches as I did the rebase. I've tested the results but it's possible
that some breakage may have slipped through...
So if you're a regular user of qemu-linaro's system mode and feel
like checking the sources out of git:
git://git.linaro.org/qemu/qemu-linaro.git
building them and testing that the things you regularly do with it
haven't regressed, then I'd appreciate it, and you can help us avoid
any nasty surprises in the next (2011.09) release.
Thanks!
-- Peter Maydell
See:
http://builds.linaro.org/toolchain/gcc-4.7~svn178154
The problem is -Werror triggering on:
../../../gcc-4.7~/gcc/config/arm/arm.c: In function 'int
optimal_immediate_sequence_1(rtx_code, long long unsigned int,
four_ints*, int)':
../../../gcc-4.7~/gcc/config/arm/arm.c:2690:46: error: comparison
between signed and unsigned integer expressions [-Werror=sign-compare]
../../../gcc-4.7~/gcc/config/arm/arm.c:2690:60: error: comparison
between signed and unsigned integer expressions [-Werror=sign-compare]
../../../gcc-4.7~/gcc/config/arm/arm.c:2691:20: error: comparison
between signed and unsigned integer expressions [-Werror=sign-compare]
../../../gcc-4.7~/gcc/config/arm/arm.c:2691:34: error: comparison
between signed and unsigned integer expressions [-Werror=sign-compare]
../../../gcc-4.7~/gcc/config/arm/arm.c:2701:16: error: comparison
between signed and unsigned integer expressions [-Werror=sign-compare]
../../../gcc-4.7~/gcc/config/arm/arm.c:2702:18: error: comparison
between signed and unsigned integer expressions [-Werror=sign-compare]
../../../gcc-4.7~/gcc/config/arm/arm.c:2703:18: error: comparison
between signed and unsigned integer expressions [-Werror=sign-compare]
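
For anyone unfamiliar with this diagnostic, here is a generic illustration
in C. It is not the code at arm.c:2690 (which I have not reproduced here)
and the function name is made up; it only shows the kind of comparison
that -Wsign-compare flags and one common way to write it so the intent is
explicit:

```c
/*
 * The pattern that triggers -Wsign-compare looks like:
 *
 *     int a;  unsigned int b;
 *     if (a < b) ...
 *
 * The signed operand is converted to unsigned before the comparison, so a
 * negative 'a' is treated as a very large value and the test quietly
 * gives the "wrong" answer.
 */

/* Returns 1 if a is mathematically less than b, 0 otherwise. */
int less_than_mixed(int a, unsigned int b)
{
    if (a < 0)
        return 1;                  /* any negative value is below any unsigned */
    return (unsigned int)a < b;    /* both operands unsigned: no warning */
}
```

Whether the right fix in a given case is an explicit check like this, a
cast, or changing the type of one of the variables depends on what the
surrounding code means.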
-- Michael
Booked hotel and travel for Linaro Connect in Orlando.
Fixed a couple of bugs in my thumb2 constants patch and retested. The
test results came back clean, so I've committed it upstream.
Bernd claimed he has found some test failures that might be caused by my
patch, but I couldn't reproduce them at first. I've now got the failure,
but I've not yet investigated the cause. Next week ...
Committed my widening multiplies patches to Linaro GCC, after first
convincing Richard Sandiford that it wasn't totally bonkers.
Started work on ARM GCC tuning options:
* Submitted a patch for -m{arch,cpu,tune}=native
http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg14225.html
* Submitted a patch for -m{arch,cpu,tune}=generic-armv7-a
http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg14231.html
Joseph found an issue with those patches, but that was easily resolved
and I've reposted both.
RAG:
Red:
Amber: OMAP3 patch upstreaming is (still) slower progress than hoped
Green:
Current Milestones:
|| || Planned || Estimate || Actual ||
||qemu-linaro 2011-09 || 2011-09-15 || 2011-09-15 || ||
Historical Milestones:
||qemu-linaro 2011-04 || 2011-04-21 || 2011-04-21 || 2011-04-21 ||
||qemu-linaro 2011-05 || 2011-05-19 || 2011-05-19 || n/a ||
||close out 1105 blueprints || 2011-05-28 || 2011-05-28 || 2011-05-19 ||
||complete 1111 planning || 2011-05-28 || 2011-05-28 || 2011-05-27 ||
||qemu-linaro-2011-06 || 2011-06-16 || 2011-06-16 || 2011-06-16 ||
||qemu-linaro-2011-07 || 2011-07-21 || 2011-07-21 || 2011-07-21 ||
||qemu-linaro 2011-08 || 2011-08-18 || 2011-08-18 || 2011-08-18 ||
== upstream-omap3-patches ==
* finished the fairly nasty rebase of qemu-linaro onto upstream master
(several invasive changes went into master that meant a number of
our local patches needed updating)
* omap_gpmc changes cleaned up, updated to use MemoryRegions, and
submitted to upstream (17 patch patchset)
== gsoc-support ==
* final meeting/evaluation writeup now the GSoC project has ended
* we now have a first pass at what some upstream-acceptable versions
of the Android goldfish platform devices might look like, and a
much better idea of the degree of difference between the android
and upstream qemu trees, and where the pitfalls/issues lie
== other ==
* fixed some breakage upstream in n810 and integratorcp models caused
by landing of MemoryRegion changes
* trying to write up what my preferred model of device connections
would look like in concrete C implementation terms
* interesting discussion on boot-architecture list about how boot
loaders should start hypervisor-aware software (xen, kvm kernel):
http://www.mail-archive.com/boot-architecture@lists.linaro.org/msg00053.html
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
Oct 30-Nov 04: Linaro Connect Q4.11
== This week ==
* Wrote some patches to make SMS schedule register moves. They made a
significant difference to some libav loops. I'm running a regression
test on powerpc-ibm-aix5.3.0 and will submit upstream next week if
all goes OK.
* Looked at why mjpegenc was so much worse with SMS. Turned out to be
a register spilling problem. Found that -fira-algorithm=priority
avoids the regression and makes several other tests better too.
(I just tested that to see whether there was a feasible register
allocation for these cases; -fira-algorithm=priority isn't the
way to go.)
* Saw that the register allocator seemed to be tripping over the
XImode "structure" values, and that we still had one vector move
per structure element by the time we get to the scheduling passes.
Eliminated those with a combination of one fix and one hack.
It seemed to avoid the allocation problems.
* Patch review (Linaro and upstream).
* Backported libgcc visibility fix to 4.6 and 4.5.
== Next week ==
* Submit register-scheduling patch.
* Submit memory cost patch (from auto-inc-dec changes)
* Possibly submit the auto-inc-dec changes themselves, depending on
how the rtx cost discussion goes.
Richard
==GCC==
===Progress===
* Looked at the vectorize_with_neon_quad failure again and decided
that I had to handle another case, but I'm not convinced that the extra
stall we'd get in this case was worth it. In any case it would only have
been a workaround; Richard Sandiford fixed this by getting df to do
the right thing, which is the right fix.
* Backported tbh patch.
* Backported conditional execution improvements patch from Jiangning
to Linaro 4.6 branch.
* Committed the LTO + Neon / Android intrinsics patch.
* Panda seems more reliable this week, but I suspect that's because the
room is being cooled more.
* Broke up a few blueprints and marked some as done.
* BRANCH_COST results show no huge variation in SPEC, and some of the
results are inconsistent. Need to run a few benchmarks
again. Sigh :(
* Finished the A9 scheduler patch for smull and friends and committed
upstream and into Linaro 4.6.
* Briefly reviewed the shrink-wrapping patch and the widening multiplies
patch.
* Looked at the failures in the "popular embedded benchmark" for
some time with Åsa.
* Tried one of the ICE patches and that seemed to work just fine with
bootstrap on FSF trunk. Need to figure out why this was breaking in
the Linaro 4.6 tree. https://bugs.launchpad.net/gcc-linaro/+bug/689887
=== Plans ===
Next Week - Holiday :) Feet not up but walking in what looks like
typical bank holiday weather ... Might check email later in the week.
Meetings:
* 1-1s
* TCWG calls
* Thumb2 performance call.
Absences.
* 29th Aug - Sept. 2 - Holiday booked and approved.
* 31st Oct - 4th Nov - Linaro Summit Orlando - Travel booked - hotel
to be booked.
* Investigated the errors in the automotive test and concluded that they are
CRC-errors, but not depending on the test case result (non intrusive crc
check). We decided these errors need to be cleared out once and for all.
Michael and Ramana helping out with continued investigation.
* EEMBC run on both Panda and Snowball with gcc4.5.2. Results look
reasonable, but Michael will also have a look. I will spend a little more
time comparing the results from the two boards.
* Started to run SPEC2K on the Panda board.
Best Regards
Åsa
Following on from yesterday's call about what it would take to enable
SMS by default: one of the problems I was seeing with the SMS+IV patch
was that we ended up with excessive moves. E.g. a loop such as:
void
foo (int *__restrict a, int n)
{
  int i;
  for (i = 0; i < n; i += 2)
    a[i] = a[i] * a[i + 1];
}
would end up being scheduled with an ii of 3, which means that in the
ideal case, each loop iteration would take 3 cycles. However, we then
added ~8 register moves to the loop in order to satisfy dependencies.
Obviously those 8 moves add considerably to the iteration time.
I played around with a heuristic to see whether there were enough
free slots in the original schedule to accommodate the moves.
That avoided the problem, but it was a hack: the moves weren't
actually scheduled in those slots. (In current trunk, the moves
generated for an instruction are inserted immediately before that
instruction.)
I mentioned this to Revital, who told me that Mustafa Hagog had
tried a more complete approach that really did schedule the moves.
That patch was quite old, so I ended up reimplementing the same kind
of idea in a slightly different way. (The main functional changes
from Mustafa's version were to schedule from the end of the window
rather than the start, and to use a cyclic window. E.g. moves for
an instruction in row 0 column 0 should be scheduled starting at
row ii-1 downwards.)
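The cyclic-window ordering described above can be sketched roughly as follows. This only illustrates the row order and is not code from the patch. The name `move_candidate_row` and the choice to let the window wrap all the way back to the instruction's own row are assumptions made for the example.

```c
#include <assert.h>

/* Hypothetical helper, not taken from the patch: return the K-th row to
   try when placing a register move for an instruction in row INSN_ROW
   of a schedule with initiation interval II.  Candidates start in the
   row just before the instruction and walk towards lower row numbers,
   wrapping cyclically from row 0 to row II - 1.  */
int
move_candidate_row (int ii, int insn_row, int k)
{
  assert (ii > 0);
  assert (insn_row >= 0 && insn_row < ii);
  assert (k >= 0 && k < ii);

  /* Add a multiple of II before taking the remainder so that the
     left-hand operand of % is never negative.  */
  return (insn_row - 1 - k + 2 * ii) % ii;
}
```

With ii = 3 and an instruction in row 0, the candidate rows come out as 2, 1, 0, which matches the "row ii-1 downwards" example.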
The effect on my flawed libav microbenchmarks was much greater
than I imagined. I used the options:
-mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -mvectorize-with-neon-quad
-fmodulo-sched -fmodulo-sched-allow-regmoves -fno-auto-inc-dec
The "before" code was from trunk, the "after" code was trunk + the
register scheduling patch alone (not the IV patch). Only the tests
that have different "before" and "after" code are run. The results were:
a3dec
    before: 500000 runs take 4.68384s
    after: 500000 runs take 4.61395s
    speedup: x1.02
aes
    before: 500000 runs take 20.0523s
    after: 500000 runs take 16.9722s
    speedup: x1.18
avs
    before: 1000000 runs take 15.4698s
    after: 1000000 runs take 2.23676s
    speedup: x6.92
dxa
    before: 2000000 runs take 18.5848s
    after: 2000000 runs take 4.40607s
    speedup: x4.22
mjpegenc
    before: 500000 runs take 28.6987s
    after: 500000 runs take 7.31342s
    speedup: x3.92
resample
    before: 1000000 runs take 10.418s
    after: 1000000 runs take 1.91016s
    speedup: x5.45
rgb2rgb-rgb24tobgr16
    before: 1000000 runs take 1.60513s
    after: 1000000 runs take 1.15643s
    speedup: x1.39
rgb2rgb-yv12touyvy
    before: 1500000 runs take 3.50122s
    after: 1500000 runs take 3.49887s
    speedup: x1
twinvq
    before: 500000 runs take 0.452423s
    after: 500000 runs take 0.452454s
    speedup: x1
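For reference, the speedup figures above are consistent with dividing the "before" time by the "after" time. A minimal sketch of that arithmetic follows; the function names are illustrative and are not part of the benchmark harness.

```c
#include <assert.h>

/* Speedup factor: "before" wall-clock time divided by "after" time.
   Values above 1 mean the "after" code is faster.  */
double
speedup (double before_s, double after_s)
{
  assert (after_s > 0.0);
  return before_s / after_s;
}

/* Average time per run in microseconds, given the total time in
   seconds and the number of runs.  */
double
per_run_us (double total_s, long runs)
{
  assert (runs > 0);
  return total_s / (double) runs * 1e6;
}
```

For resample, `speedup (10.418, 1.91016)` lies between 5.45 and 5.46, in line with the reported figure, and the "before" code averages about 10.4 microseconds per run.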
Taking resample as an example: before the patch we had an ii of 27,
stage count of 6, and 12 vector moves. Vector moves can't be dual
issued, and there was only one free slot, so even in theory, this loop
takes 27 + 12 - 1 = 38 cycles. Unfortunately, there were so many new
registers that we spilled quite a few.
After the patch we have an ii of 28, a stage count of 3, and no moves,
so in theory, one iteration should take 28 cycles. We also don't spill.
So I think the difference really is genuine. (The large difference
in moves between ii=27 and ii=28 is because in the ii=27 schedule,
a lot of A--(T,N,0)-->B (intra-cycle true) dependencies were scheduled
with time(B) == time(A) + ii + 1.)
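The theoretical cycle counts quoted for resample can be reproduced with a small sketch. It assumes, as stated above, that the moves cannot be dual-issued, so each move costs an extra cycle unless it fits into a free slot. It ignores spills and every other real pipeline effect, and the function name is purely illustrative.

```c
#include <assert.h>

/* Rough per-iteration cycle estimate: the initiation interval plus one
   cycle for every register move that does not fit into a free slot of
   the existing schedule.  Illustrative only; spills are not modelled.  */
int
estimated_cycles (int ii, int moves, int free_slots)
{
  assert (ii > 0 && moves >= 0 && free_slots >= 0);

  int extra = moves > free_slots ? moves - free_slots : 0;
  return ii + extra;
}
```

`estimated_cycles (27, 12, 1)` gives 38 for the "before" schedule and `estimated_cycles (28, 0, 0)` gives 28 for the "after" schedule.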
I also saw benefits in one test in a "real" benchmark, which I can't
post here.
Richard
Hello,
Following today's performance call
(https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2011-08-23)
here are some points raised regarding the steps towards enabling SMS by default:
* Benchmarks testing:
-- Running benchmarks such as EEMBC and SPEC2006 with SMS enabled is
crucial to expose loops where SMS degrades performance. Those
loops need to be analysed to construct a cost model.
-- SMS increases code size by introducing a prologue and an epilogue to the
loop kernel. This should also be measured.
-- Measure the increase in compile time: on a native or a cross build?
Currently SMS fails to bootstrap trunk on an ARM machine. This should
also be taken into account when considering enabling it by default.
Should it be turned on with -O2 or -O3?
SMS flags to use for testing:
-O3 -fmodulo-sched-allow-regmoves -fmodulo-sched
-funsafe-loop-optimizations -fno-auto-inc-dec
Thanks,
Revital
Hi
Some time ago we agreed that not everyone here uses the Ubuntu distribution
and decided to provide a so-called 'generic linux' cross toolchain.
Recently I managed to get it done, and now I need brave testers to tell me
whether it works or not.
Get it here: http://people.linaro.org/~hrw/generic-linux/ (64bit only)
The files needed are toolchain-11.07.tar.xz and the init.sh script. Unpack the
tarball from / so that /opt/linaro/11.07/ is populated, and put init.sh
anywhere you want (it will be integrated into the tarball later).
How to use:
$ source init.sh
This adds the cross toolchain to PATH and also sets LD_LIBRARY_PATH to
two directories:
- one with the binutils libraries
- a second with all the extra libraries which may be needed
Feel free to experiment with the second dir by removing files from there and
checking whether the system-provided libs work too.
So far I have checked this toolchain under a few distributions:
- Ubuntu 10.04 'lucid' LTS
- Ubuntu 11.04 'natty'
- Fedora 14
- OpenSUSE 11.4
- CentOS 5.6
It failed only under CentOS (which was expected due to its age).
How did I check? So far I have tested compiling 'gpm' and 'zlib'.
==GCC==
===Progress===
* Continued to look at the test failure with -mvectorize-with-neon-quad.
Should be able to commit the backend workaround on Monday.
* Having some problems getting my Panda board working reliably. I'm
not sure whether it's the temperature or something else, but when it gets
hot in the office, as it did on Tuesday, keeping it working reliably is
hard. The board locks up and then crashes quite often.
* Looked at VFP moves again for some more time.
* Committed tbh range change.
* Committed fixes for PR50022.
=== Plans ===
* Finish off VFP moves patch.
* Look at BRANCH_COST results.
* Break down the T2 performance blueprints into smaller blueprints.
* Backport tbh range changes to Linaro 4.6.
* Test the intrinsics patch with some more intrinsics tests and
then merge it into Linaro GCC 4.6.
Meetings:
* 1-1s
* TCWG calls
Absences:
* 29th Aug - Sept. 2 - Holiday booked and approved.
* 31st Oct - 4th Nov - Linaro Summit Orlando - Travel booked - hotel
to be booked.
Hi all,
I'm having real trouble here :(
I just can't seem to get bzr to work! I've tried to branch
gcc-linaro/4.6 again and again, and it just won't. My other machine
refuses to do the merge from lp:gcc/4.6, presumably because the bzr on
there is too old.
I'm stuck. Can anybody else do the merge from upstream?
I'm going to keep trying.
Andrew