Continued work on my constant reuse optimizations. Not too much this
week though. I've now fixed some issues with the ARM size-costs code
that were causing it to wildly over-estimate the cost of a MOVT
instruction. I'll have to post this upstream sometime soon.
Took another look at the shift-amount bug. Discussed the issue with Paul
Brook. I've now fixed the original bug, and fixed the new bug introduced
by Paul's original fix, and committed that upstream. I still need to
backport it to Linaro GCC though, and the latent bug that Richard S
spotted is still being analysed.
Did a merge from FSF 4.5 & 4.6 to Linaro, and pushed them to the Launchpad
branches for testing.
Began work on benchmarking different setups for the generic tuning patches.
I had a lot of trouble trying to set up SPEC2000 though. Hopefully these
issues are now resolved, with some help from Michael, and I have
established some baseline figures on both A8 and A9 to work from.
No progress on native tuning. I'm still waiting for upstream review.
In other news: Mentor's contract with Linaro has now been extended for
another 6 months. :)
== String Routines ==
* Built and tested a newlib with my memchr in - ready to go with a
bit of tidy up.
* Followed up on my eglibc patch submission after a comment suggested
the use of --with-cpu, pointing back at the previous discussion.
== 64 Bit atomics ==
* Updated gcc patch based on Ramana's comments, retested and posted
new version
- Lost half a day to a failing SD card in our panda.
== QEMU ==
* Posted a patch that made one variable thread-local using __thread,
which fixes multi-threaded user mode ARM programs (e.g. firefox); this
seems to have mutated on the list into a patch for more general
thread-local support.
Dave
== GDB ==
* Reimplemented patch to disable address space randomization
in gdbserver to respect the "set disable-randomization"
command, and checked it in to mainline and Linaro GDB 7.3
* Worked on support for cross-platform core file generation.
== GCC ==
* Checked in mainline fix for PR 50305.
Best Regards
Ulrich Weigand
Hi,
- worked on the RTL part of the widen-shift patch
- backported to Linaro two of the three SLP patches, and proposed the third one
- worked on additional SLP improvements:
  - swap operands to make statements isomorphic
  - support load with offset 1 (after load from 0)
- started working on presentation for NEON forum
Upcoming holidays:
Oct 12, Wed - half day
Oct 13, Thu
Oct 16-19, Sun-Wed - half day
Oct 20, Thu
Ira
Hi,
So one of the things Michael pointed out in today's call was that the
ARM backend doesn't generate vcvt.f32.s<type> for the idiom that
converts from fixed to floating point, as in the example below. I've
chosen to implement this in the backend as follows, using
interfaces from real.c. The reason I've chosen not to allow
this transformation when flag_rounding_math is true is that this
instruction always rounds using round-to-nearest rather than
obeying what's in the FPSCR, and thus is not safe for programs that want
to set their rounding modes dynamically.
The benefits are quite obvious: we eliminate a load from the
constant pool and a floating point multiply, essentially
shaving a floating point multiply plus load latency off these
sequences. This instruction can only write the output into the same
register as the input register, which is why I've modelled it as below
by tying op1 to op0.
If there's a simpler way of using the interfaces into real.c then I'm all ears.
Thoughts? I believe such idioms are used in libav, from where the
original report appears to have come, and thus it's a worthwhile gain
where we can have it. Are there any other places where folks might have
noticed this?
I will post upstream as well once I finish testing this patch. I'm
posting this here to get some feedback, and also to let anyone who is
really keen to try this out have a go, given I'm out
tomorrow.
(I took a quick look at the short -> f32 case as well, but the fact
remains that loads either zero or sign extend anyway, so there's
probably not much gain in modelling that right away; the win really
is in getting rid of that fp mul and the constant pool load. There's
probably some gain in going from i64 -> f64 as well, so those patterns
need to be written up at some point for completeness.)
cheers
Ramana
2011-10-04 Ramana Radhakrishnan <ramana.radhakrishnan(a)linaro.org>
* config/arm/arm.c (vfp3_const_double_for_fract_bits): Define.
* config/arm/arm-protos.h (vfp3_const_double_for_fract_bits): Declare.
* config/arm/constraints.md ("Dt"): New constraint.
* config/arm/predicates.md (const_double_vcvt_power_of_two_reciprocal):
New.
* config/arm/vfp.md (*arm_combine_vcvt_f32_s32): New.
(*arm_combine_vcvt_f32_u32): New.
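As a rough illustration of the condition that the predicate name const_double_vcvt_power_of_two_reciprocal suggests: the transformation can only apply when the constant multiplier is an exact reciprocal of a power of two, and the exponent then becomes the #fbits operand of vcvt. The sketch below is not the patch's code. It works on a host double rather than on GCC's REAL_VALUE_TYPE, the helper name fract_bits_for_reciprocal is hypothetical, and the accepted range of 1 to 32 fraction bits is an assumption.

```c
/* Hypothetical helper (not from the patch): if value is exactly 1 / 2^n
   for some n in 1..32, return n; otherwise return 0.  The 1..32 range
   is an assumption about which fraction-bit counts are usable.  */
static int
fract_bits_for_reciprocal (double value)
{
  double scaled = value;

  for (int n = 1; n <= 32; n++)
    {
      /* Doubling only changes the binary exponent, so it is exact;
         scaled reaches exactly 1.0 only if value was exactly 2^-n.  */
      scaled *= 2.0;
      if (scaled == 1.0)
        return n;
    }

  /* Not a reciprocal of a power of two in range (this also covers
     negative values, zero and NaN).  */
  return 0;
}
```

With this sketch, fract_bits_for_reciprocal (1.0 / (1 << 11)) gives 11 and fract_bits_for_reciprocal (1.0 / (1 << 5)) gives 5, while a constant such as 0.3 gives 0 and would be left to the ordinary convert-and-multiply sequence.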
For the following testcases I see the code below with
-mfloat-abi=hard -mfpu=vfpv3 and -mcpu=cortex-a9:
float foo (int i)
{
  float v = (float)i / (1 << 11);
  return v;
}
float foa_unsigned (unsigned int i)
{
  float v = (float)i / (1 << 5);
  return v;
}
After the patch:
foo:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        fmsr s0, r0 @ int
        vcvt.f32.s32 s0, s0, #11
        bx lr
        .size foo, .-foo
        .align 2
        .global foa_unsigned
        .type foa_unsigned, %function
foa_unsigned:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        fmsr s0, r0 @ int
        vcvt.f32.u32 s0, s0, #5
        bx lr
        .size foa_unsigned, .-foa_unsigned
        .align 2
        .global foo1
        .type foo1, %function
rather than:
        .type foo, %function
foo:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        fmsr s15, r0 @ int
        fsitos s0, s15
        flds s15, .L2
        fmuls s0, s0, s15
        bx lr
.L3:
        .align 2
.L2:
        .word 973078528
        .size foo, .-foo
        .align 2
        .global foa_unsigned
        .type foa_unsigned, %function
foa_unsigned:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        fmsr s15, r0 @ int
        fuitos s0, s15
        flds s15, .L5
        fmuls s0, s0, s15
        bx lr
.L6:
        .align 2
.L5:
        .word 1023410176
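The two .word values in the pre-patch output can be decoded to confirm that they are the multipliers the patch removes. A minimal sketch, assuming the host float is IEEE-754 single precision (float_from_bits is a hypothetical helper name, not something from the patch):

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float.  Assumes the host float is
   IEEE-754 single precision, as it is on ARM VFP.  memcpy avoids the
   aliasing problems of a pointer cast.  */
static float
float_from_bits (uint32_t bits)
{
  float f;

  memcpy (&f, &bits, sizeof f);
  return f;
}
```

Under that assumption, float_from_bits (973078528) is 2^-11 (1/2048) and float_from_bits (1023410176) is 2^-5 (1/32), the reciprocals of 1 << 11 and 1 << 5 from the two testcases.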
* Vacation
Monday, Tuesday, and Wednesday.
* GCC
Continued work on my constant reuse optimizations. Disappointingly, I've
found that there are very few optimization opportunities in EEMBC
(ARM/Thumb V7-A), although it's not difficult to write testcases that
the optimization could improve. I also discovered that the data-flow
chains don't work exactly how I thought (with respect to if-then-else
cases) so I need to do a little more work.
Pinged the native tuning patches; they're still waiting for upstream review.
Still can't get the generic tuning work done as the CodeSourcery panda
boards appear to be still offline.
Committed to mainline the patch to support instructions with auto-inc
operations in SMS after addressing Ayal's comments. The patch contains
two parts; one of them fixes a bug revealed during bootstrapping with
the patch and SMS flags.
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01988.html
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01987.html
Looking at estimating register pressure with SMS: based on previous
discussion with Richard, the current approach is to try to use the
register pressure estimation in the loop invariant pass.
I'm compiling an application built with TI's DVSDK 3 [0] and get this error:
/home/user/ti/dvsdk/dvsdk_3_01_00_10/linuxutils_2_25_02_08/packages/ti/sdo/linuxutils/cmem/lib/cmem.a470MV(cmem.o470MV):(.ARM.exidx+0x0):
undefined reference to `__aeabi_unwind_cpp_pr0'
arm-linux-gnueabi-gcc --version
arm-linux-gnueabi-gcc (Ubuntu/Linaro 4.5.2-5ubuntu2~ppa1) 4.5.2
arm-linux-gnueabi-ld --version
GNU ld (GNU Binutils for Ubuntu) 2.21.0.20110302
Fuller output is here (though it isn't particularly helpful, due to the
black magic in TI's RTSC make system):
https://gist.github.com/925674
FYI: the MV in cmem.a470MV stands for MontaVista.
This name is hard-coded somewhere even though it's not being linked against
a MontaVista system.
I believe the 470 means that it should work with ARMv4 through ARMv7, but
I'm not positive.
My googling suggests that this is a toolchain bug and that the best way
around the issue is to create a file which defines the function as a void
dummy and include it.
http://www.codesourcery.com/archives/arm-gnu/msg03604.html
http://comments.gmane.org/gmane.comp.boot-loaders.u-boot/78649
http://www.cs.fsu.edu/~baker/devices/lxr/http/ident?i=__aeabi_unwind_cpp_pr0
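The dummy-definition workaround described above might look like the following. This is only a sketch: the file name is hypothetical, and the stub merely satisfies the linker, so it is only reasonable if nothing in the program actually relies on unwinding through the affected code.

```c
/* eabi_compat.c (hypothetical file name): add this file to the link.
   Empty stand-in for the EABI unwinder personality routine that
   cmem.o470MV references from its .ARM.exidx section.  It only keeps
   the linker quiet; it does not provide real unwinding.  */
void __aeabi_unwind_cpp_pr0 (void);

void
__aeabi_unwind_cpp_pr0 (void)
{
}
```

If the toolchain's own runtime libraries can be linked in instead, that would presumably be preferable, since they carry a real definition of this symbol.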
I have a script that I'll post shortly with instructions as to how to set up
TI's DVSDK with Linaro.
AJ ONeal
[0] I'm not using the latest DVSDK version 4 because the paths and such are
so hard-coded for the 2009q3 version of codesourcery on ubuntu 10.04 LTS
that I don't know where to start fixing it.
=== Progress ===
* Patch review week.
* Looked at bootstrap issue for a while but Richard Sandiford picked
it up and sorted it out (Thanks Richard).
* Fun and games with some paperwork.
* Some backporting and testing of patches (50099 and 50186) underway.
=== Plans ===
* Clear out some of the old patches (POST_MODIFY_DISP for vfp,
BRANCH_COST) and finish the auto-inc-dec patch from last week.
* Away for 1 day next week.
Meetings:
* 1-1s
* TCWG calls
Absences:
* 5th October - Out of Office.
* 13th - 14th October - Internal training.
* 31st Oct - 4th Nov - Linaro Summit Orlando.
* 08 Nov - 11 Nov - Tentatively booked
* Dec 19 - 31st Dec - Tentatively booked