The Linaro Toolchain Working Group is pleased to announce the availability
of the Linaro Stable Binary Toolchain GCC 5.2-2015.11 Release Archives.
http://releases.linaro.org/components/toolchain/binaries/5.2-2015.11/
http://releases.linaro.org/components/toolchain/gcc-linaro/5.2-2015.11/
These archives provide cross-toolchain executables (compiler, debugger,
linker, etc.) and shared libraries (libstdc++, libc, etc.) that target ARM
or AArch64 GNU/Linux and bare-metal environments. The cross-toolchain
binaries execute on a Linux or MS Windows (under mingw32) host
operating-system.
Please evaluate this release-candidate for correctness. Linaro will
shortly spin the Linaro GCC 5.2-2015.11 release if this release-candidate
passes stakeholder validation.
For bugs related to this release-candidate please email
linaro-toolchain(a)lists.linaro.org or file a bug at
https://bugs.linaro.org/enter_bug.cgi?product=Linux%20Binary%20toolchain
NEWS
* GCC 5.2 2015.11
The Linaro GCC 5.2 2015.11 binary toolchain release is built from the
Linaro GCC-5.2-2015.11 release source archive. The Linaro GCC-5.2-2015.11
release source archive is derived from the same sources as the Linaro
GCC-5.2-2015.10 snapshot source archive.
* GCC 5.2 2015.11-rc1
The Linaro GCC 5.2 2015.11-rc1 binary toolchain release-candidate is built
from the Linaro GCC-5.2-2015.11 release-candidate source archive. The
Linaro GCC-5.2-2015.11-rc1 release-candidate source archive is derived from
the same sources as the Linaro GCC-5.2-2015.10 snapshot source archive.
--
Ryan S. Arnold
Linaro Toolchain Working Group - Engineering Manager
www.linaro.org
Hello,
I am trying to create gcc 4.9.x toolchains for ARM v7 and v8 based on Linaro's sources. At first Linaro's 4.9-2015.05 binary release looked suitable, but then one of my colleagues noticed that it had an incompatibility with Red Hat Enterprise Linux 6. Linaro has decided not to fix this incompatibility (see https://bugs.linaro.org/show_bug.cgi?id=1869 ).
So, I tried to work around that bug by rebuilding the toolchains myself on RHEL6 using Linaro's new ABE script. I initially tried to recreate the builds by using ABE's --manifest <manifest_file> command line option. I experienced problems with that, though, including it building gcc version 6.x instead of 4.9.x. I eventually gave up on that approach. Instead, I extracted the required branches and revisions from the manifest files and put them into ABE command line options, like this:
abe.sh --target aarch64-elf --build all --parallel --dump --tarball --release fsl-2015.11.16 --set libc=newlib binutils=binutils-gdb.git~linaro_binutils-2_24-branch@a93e252ee5250dba831e54f98336b40c7210dac7 gcc=gcc-linaro-4.9-2015.05 gmp=5.1.3 gdb=binutils-gdb.git~gdb-7.10-branch@ef5fa52ac9ab68b505b52acb2d2068b366ba8bf2 mpfr=3.1.2 mpc=1.0.1 newlib=newlib.git~linaro_newlib-branch@136b66e404df41435bdec4630c0787b0bc7e7580
abe.sh --target aarch64-linux-gnu --build all --parallel --dump --tarball --release fsl-2015.11.16 --set libc=glibc binutils=binutils-gdb.git~linaro_binutils-2_24-branch@a93e252ee5250dba831e54f98336b40c7210dac7 gcc=gcc-linaro-4.9-2015.05 gmp=5.1.3 gdb=binutils-gdb.git~gdb-7.10-branch@ef5fa52ac9ab68b505b52acb2d2068b366ba8bf2 mpfr=3.1.2 mpc=1.0.1 glibc=glibc-linaro-2.20-2014.11.tar.xz
abe.sh --target arm-eabi --build all --parallel --dump --tarball --release fsl-2015.11.16 --set libc=newlib binutils=binutils-gdb.git~linaro_binutils-2_24-branch@a93e252ee5250dba831e54f98336b40c7210dac7 gcc=gcc-linaro-4.9-2015.05 gmp=5.1.3 gdb=binutils-gdb.git~gdb-7.10-branch@ef5fa52ac9ab68b505b52acb2d2068b366ba8bf2 mpfr=3.1.2 mpc=1.0.1 newlib=newlib.git~linaro_newlib-branch@136b66e404df41435bdec4630c0787b0bc7e7580
abe.sh --target arm-linux-gnueabihf --build all --parallel --dump --tarball --release fsl-2015.11.16 --set libc=glibc binutils=binutils-gdb.git~linaro_binutils-2_24-branch@a93e252ee5250dba831e54f98336b40c7210dac7 gcc=gcc-linaro-4.9-2015.05 gmp=5.1.3 gdb=binutils-gdb.git~gdb-7.10-branch@ef5fa52ac9ab68b505b52acb2d2068b366ba8bf2 mpfr=3.1.2 mpc=1.0.1 glibc=glibc-linaro-2.20-2014.11.tar.xz
That worked, and the resulting toolchains ran without error under RHEL6. Note that I deliberately chose to switch to glibc in the *-linux-* toolchains, whereas the manifest files had them using eglibc.
At least one serious problem remained. The toolchains supported different multilibs than previous releases. For example, arm-eabi-gcc reported that it supported only three sets of libraries:
$ arm-eabi-gcc -print-multi-lib
.;
thumb;@mthumb
fpu;@mfloat-abi=hard
Linaro's 2015.05 build of the toolchain gives the same output. However, previous releases of this toolchain supported a much larger set of multilibs. A build from 2014.08 reports:
$ arm-none-eabi-gcc --print-multi-lib
.;
thumb;@mthumb
v7-a;@march=armv7-a
v7ve;@march=armv7ve
v8-a;@march=armv8-a
v7-a/fpv3/softfp;@march=armv7-a@mfpu=vfpv3-d16@mfloat-abi=softfp
v7-a/fpv3/hard;@march=armv7-a@mfpu=vfpv3-d16@mfloat-abi=hard
v7-a/simdv1/softfp;@march=armv7-a@mfpu=neon@mfloat-abi=softfp
v7-a/simdv1/hard;@march=armv7-a@mfpu=neon@mfloat-abi=hard
v7ve/fpv4/softfp;@march=armv7ve@mfpu=vfpv4-d16@mfloat-abi=softfp
v7ve/fpv4/hard;@march=armv7ve@mfpu=vfpv4-d16@mfloat-abi=hard
v7ve/simdvfpv4/softfp;@march=armv7ve@mfpu=neon-vfpv4@mfloat-abi=softfp
v7ve/simdvfpv4/hard;@march=armv7ve@mfpu=neon-vfpv4@mfloat-abi=hard
v8-a/simdv8/softfp;@march=armv8-a@mfpu=neon-fp-armv8@mfloat-abi=softfp
v8-a/simdv8/hard;@march=armv8-a@mfpu=neon-fp-armv8@mfloat-abi=hard
thumb/v7-a;@mthumb@march=armv7-a
thumb/v7ve;@mthumb@march=armv7ve
thumb/v8-a;@mthumb@march=armv8-a
thumb/v7-a/fpv3/softfp;@mthumb@march=armv7-a@mfpu=vfpv3-d16@mfloat-abi=softfp
thumb/v7-a/fpv3/hard;@mthumb@march=armv7-a@mfpu=vfpv3-d16@mfloat-abi=hard
thumb/v7-a/simdv1/softfp;@mthumb@march=armv7-a@mfpu=neon@mfloat-abi=softfp
thumb/v7-a/simdv1/hard;@mthumb@march=armv7-a@mfpu=neon@mfloat-abi=hard
thumb/v7ve/fpv4/softfp;@mthumb@march=armv7ve@mfpu=vfpv4-d16@mfloat-abi=softfp
thumb/v7ve/fpv4/hard;@mthumb@march=armv7ve@mfpu=vfpv4-d16@mfloat-abi=hard
thumb/v7ve/simdvfpv4/softfp;@mthumb@march=armv7ve@mfpu=neon-vfpv4@mfloat-abi=softfp
thumb/v7ve/simdvfpv4/hard;@mthumb@march=armv7ve@mfpu=neon-vfpv4@mfloat-abi=hard
thumb/v8-a/simdv8/softfp;@mthumb@march=armv8-a@mfpu=neon-fp-armv8@mfloat-abi=softfp
thumb/v8-a/simdv8/hard;@mthumb@march=armv8-a@mfpu=neon-fp-armv8@mfloat-abi=hard
I found that the file that encodes this older set of multilib mappings is gcc-linaro-4.9-2015.05/gcc/config/arm/t-aprofile. Based on some comments in gcc-linaro-4.9-2015.05/gcc/config.gcc, I guessed that ABE should have configured gcc with "--with-multilib-list=aprofile", and without "--with-arch=armv7-a" or "--with-fpu=vfpv3-d16". I quickly hacked these changes into abe/config/gcc.conf like this:
diff --git a/config/gcc.conf b/config/gcc.conf
index 19c44ca..4cc5eaf
--- a/config/gcc.conf
+++ b/config/gcc.conf
@@ -111,9 +111,9 @@ if test x"${build}" != x"${target}"; then
default_configure_flags="${default_configure_flags} --with-tune=cortex-a9"
fi
if test x"${override_arch}" = x -a x"${override_cpu}" = x; then
- default_configure_flags="${default_configure_flags} --with-arch=armv7-a"
+ default_configure_flags="${default_configure_flags}"
fi
- default_configure_flags="${default_configure_flags} --enable-threads=no --with-fpu=vfpv3-d16 --enable-multilib --disable-multiarch"
+ default_configure_flags="${default_configure_flags} --enable-threads=no --with-multilib-list=aprofile --enable-multilib --disable-multiarch"
languages="c,c++,lto"
;;
aarch64*-*elf)
After rebuilding the toolchain, I found it had the desired older set of multilibs.
I hope that this mail will help anyone who experiences similar problems. I have filed a bug report for the multilib issue. See https://bugs.linaro.org/show_bug.cgi?id=1920 .
While validating the toolchains, dejagnu reports a few unexpected failures. Does the TCWG publish their validation results anywhere for comparison? That would be very helpful.
Thanks,
Fred Peterson
Freescale Developer Tools
== Progress ==
o Validation and Infra (2/10)
* Some fixes in our release script
* Look at refactoring our publishing snapshot job
o Linaro GCC (4/10)
* Start backports for 2015.12
* Tracking dependencies
o Upstream work (1/10)
* Continue on sanitizing gfortran testsuite
o Misc (3/10)
* Various meetings
* Internal support
== Plan ==
o Continue on-going tasks
Controlled image builds - TCWG-360 [2/10]
* A few more test/debug cycles with ci-loop-built image
Jenkins benchmarking job - TCWG-348 [3/10]
* YAML-ised Jenkins job, more test/debug cycles
Juno crashdump [1/10]
* Got a usable dump (via alt-sysrq-c) with latest patches plus some fiddling
SPEC-on-Android [1/10]
* Looked at Qian's work to date, didn't come up with any bright ideas
Misc [3/10]
=Plan=
Review security with shared uinstance/main instance code
Expose more data, benchmarks to bundles
Continue debug/test of Jenkins job
Create bootable image for at least 1 target, or know what the problems are
Write up noise control report (if time)
Set Juno off, try to get a dump of my crash
Probably more support for SPEC-on-Android
=Absences=
'ARM Day' next Monday (30th)
== This week ==
* TCWG-317 - Exploit wide add operations when appropriate for AArch32 (5/10)
- Blocked as I have not yet determined why the pattern fails on big
endian targets
* TCWG-369 - Exploit wide add operations when appropriate for AArch64 (1/10)
- Modified code based on minor code style comments
* TCWG-316 - Exploit vector multiply by scalar instructions (3/10)
- Discovered relevant previous RFC:
https://gcc.gnu.org/ml/gcc/2013-09/msg00061.html
- Coded subset of vector patterns
- Debugging combine phase to determine why patterns are not
being utilized
* Misc (1/10)
- Conference calls
== Next week ==
- TCWG-369 - Submit modified patch upstream for final approval
- TCWG-316 - Determine if rtl patterns can be used by combine
- TCWG-317 - Need feedback
- USA Thanksgiving Holidays (November 26-27)
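For readers unfamiliar with the term used in TCWG-317 and TCWG-369: a wide add widens a narrow operand and adds it to a wider one in a single step. Below is a minimal sketch of source code with that shape. The function name is made up for illustration, and whether a given compiler selects a widening add instruction for this loop depends on the target and options; nothing here is taken from the patches themselves.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum 16-bit samples into a 32-bit accumulator.  Each iteration widens
   a narrow element and adds it to a wider value, which is the operation
   a "wide add" instruction performs in one step.
   sum_widening is a hypothetical name used only for this sketch. */
int32_t sum_widening(const int16_t *x, size_t len)
{
    int32_t result = 0;
    for (size_t i = 0; i < len; i++)
        result += x[i];   /* int16_t is promoted to int, then added */
    return result;
}
```

Calling sum_widening on a small array is enough to check the arithmetic; inspecting the generated assembly would be the way to see which instructions a particular compiler chose.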
== This Week ==
* TCWG-72 (2/10)
- Rebased patch
- Fixed ICE for x86-gcc with -m32 following Jim's suggestions.
* Target hook conversion (6/10)
- Converted ASM_FORMAT_PRIVATE_NAME, ASM_LABEL_OUTPUT_LABEL,
ASM_OUTPUT_LABELREF to hook
* TCWG-319 (1/10)
- Benchmark jobs for fp in progress on a53, a57.
* Misc (1/10)
- Meetings
== Next Week ==
- Test and send updated patch upstream for tcwg-72
- TCWG-319 benchmarking on cortex-a15
- Holidays from 23-25th November (Mon-Wed).
# Progress #
* TCWG-332, done. [1/10]
Fix GDB bug on stepping over breakpoint on ARM. Patch is pushed in.
* TCWG-423, patches are posted. [5/10].
Support gnu vector in inferior call in AArch64 GDB.
Also correctly handle HVA (homogeneous vector aggregate) in inferior
call.
* TCWG-433, done. [2/10]
All memory issues found by -fsanitize=address in GDB are fixed.
* TCWG-447, done. [1/10]
Fix GDB mainline build warnings and errors in C++ mode on ARM and
AArch64.
* Discussion on the approach of building GDB in C++. Need to test GDB
built in C++ on both ARM and AArch64, from my side. [1/10]
# Plan #
* Understand ST's jtag probe and help them to make use of multi-arch
in GDB.
* Fix GDB internal error in gdb.thread/watchpoint-fork.exp on AArch64.
* TCWG-156, GDB test parity between AArch64 and X86_64.
--
Yao
== Progress ==
* Validation (6/10)
- a few improvements in the validations using the ST compute farm
- thinking about appropriate ways of sharing validation
reports with the GCC community without flooding gcc-testresults
- moved results comparison scripts to a dedicated repo
and updated Jenkins jobs accordingly
* GCC (1/10)
- bug #1869 / glibc dependency on RHEL6
proof of concept to force use of old memcpy
but it will be much safer to build the toolchain
in a suitable container with the right distro
* Misc (conf calls, meetings, emails, ...) (3/10)
- patches and backports reviews
== Next ==
* Validation
- continue preparation of switch, as dev-01 is now back
- improve reporting
* GCC:
- check Neon tests cleanup
- bug #1869
- look at how to send valuable reports to gcc-regression
Hi,
We're currently running into issues with the OE builds due to OE-core
having moved to glibc 2.22. So what's the plan for glibc-linaro 2.22?
--
Koen Kooi
Builds and Baselines | Release Manager
Linaro.org | Open source software for ARM SoCs
Hi,
This question has arisen in the ODP project and the thought is that a 'best
practices' answer would be more likely to be found on this list.
We have a component that wants to make use of specialized instructions for
performing CRC and/or AES computations and was wondering what is the
recommended way for an application to determine whether such instructions
are available in the toolchain and whether the user has overruled their use?
Thanks for any insight you can provide.
Bill
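One possible approach to the question above, sketched under assumptions: at compile time, the ACLE feature macros __ARM_FEATURE_CRC32 and __ARM_FEATURE_CRYPTO reflect whether the compiler options enabled those instructions (a user building with +nocrc leaves the CRC macro undefined); at run time on AArch64 Linux, getauxval(AT_HWCAP) reports what the CPU offers. The helper names and the user_disabled switch are hypothetical and are not part of ODP or any toolchain.

```c
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_CRC32, HWCAP_AES */
#endif

/* Compile time: true if this file was built with CRC32 instructions
   enabled, for example with -march=armv8-a+crc. */
bool built_with_crc32(void)
{
#if defined(__ARM_FEATURE_CRC32)
    return true;
#else
    return false;
#endif
}

/* Compile time: true if the crypto (AES/SHA) extension was enabled. */
bool built_with_crypto(void)
{
#if defined(__ARM_FEATURE_CRYPTO)
    return true;
#else
    return false;
#endif
}

/* Run time: does the CPU we are running on report CRC32 support? */
bool cpu_has_crc32(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_CRC32) != 0;
#else
    return false;   /* not AArch64 Linux: report "not available" */
#endif
}

/* Run time: does the CPU report AES instruction support? */
bool cpu_has_aes(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_AES) != 0;
#else
    return false;
#endif
}

/* Hypothetical policy helper: an application-level "off" switch wins
   over whatever the hardware reports. */
bool use_hw_crc32(bool user_disabled)
{
    return !user_disabled && cpu_has_crc32();
}
```

The compile-time helpers answer "did the toolchain and the user's flags allow these instructions in this translation unit", while the run-time helpers answer "can this machine execute them". A component that must run on CPUs with and without the extensions would need both checks.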
I think there are many issues with binary compatibility beyond
function inlining. An ODP application cannot expect all ODP
implementations to support the same number of ODP queues or
classification rules or even which classification terms (fields) are
supported (efficiently/in HW) etc. Is there some kind of lowest common
denominator an application should expect? Do we want to make
guarantees of an ODP implementation stricter? What are the
consequences of such strict functional guarantees?
I think an application that requires binary compatibility over ARMv8.1
platforms should compile and link against a specific ODP SW
implementation (possibly with some well-defined HW offloads where the
underlying platform can provide the relevant drivers). I.e. more of a
(user-space) Linux architecture than standard ODP (as influenced by
OpenGL). The important binary interfaces then become the interfaces
to these offloads/drivers.
On 16 November 2015 at 14:23, Nicolas Morey-Chaisemartin
<nmorey(a)kalray.eu> wrote:
>
>
> On 11/11/2015 09:45 AM, Savolainen, Petri (Nokia - FI/Espoo) wrote:
>>
>>> -----Original Message-----
>>> From: lng-odp [mailto:lng-odp-bounces@lists.linaro.org] On Behalf Of
>>> EXT Nicolas Morey-Chaisemartin
>>> Sent: Tuesday, November 10, 2015 5:13 PM
>>> To: Zoltan Kiss; linaro-toolchain(a)lists.linaro.org
>>> Cc: lng-odp
>>> Subject: Re: [lng-odp] Runtime inlining
>>>
>>> As I said in the call last week, the problem is wider than that.
>>>
>>> ODP specifies a lot of types but not their sizes, a lot of
>>> enums/defines (things like ODP_PKTIO_INVALID) but not their value
>>> either.
>>> For our port a lot of those values were changed for
>>> performance/implementation reason. So I'm not even compatible between
>>> one version of our ODP port and another one.
>>>
>>> The only way I can see to solve this is for ODP to fix the size of all
>>> these types.
>>> Default/Invalid values are not that easy, as a pointer would have a
>>> completely different behaviour from structs/bitfields
>>>
>>> Nicolas
>>>
>> Type sizes do not need to be fixed in general, but only when an application is built for binary compatibility (the use case we are talking about here). Binary compatibility and thus the fixed type sizes are defined per ISA.
>>
>> We can e.g. define a configure target (for our reference implementation == linux-generic) "--binary-compatible=armv8.x" or "--binary-compatible=x86_64". When you build your application with that option, "platform dependent" types and constants would be fixed to pre-defined values specified in (new) ODP API arch files.
>>
>> So instead of building against odp/platform/linux-generic/include/odp/plat/queue_types.h ...
>>
>> typedef ODP_HANDLE_T(odp_queue_t);
>> #define ODP_QUEUE_INVALID _odp_cast_scalar(odp_queue_t, 0)
>> #define ODP_QUEUE_NAME_LEN 32
>>
>>
>> ... you'd build against odp/arch/armv8.x/include/odp/queue_types.h ...
>>
>> typedef uintptr_t odp_queue_t;
>> #define ODP_QUEUE_INVALID ((uintptr_t)0)
>> #define ODP_QUEUE_NAME_LEN 64
>>
>>
>> ... or odp/arch/x86_64/include/odp/queue_types.h
>>
>> typedef uint64_t odp_queue_t;
>> #define ODP_QUEUE_INVALID ((uint64_t)0xffffffffffffffff)
>> #define ODP_QUEUE_NAME_LEN 32
>>
>>
>> For highest performance on a fixed target platform, you'd still build against the platform directly
>>
>> odp/platform/<soc_vendor_xyz>/include/odp/plat/queue_types.h
>>
>> typedef xyz_queue_desc_t * odp_queue_t;
>> #define ODP_QUEUE_INVALID ((xyz_queue_desc_t *)0xdeadbeef)
>> #define ODP_QUEUE_NAME_LEN 20
>>
>>
>> -Petri
>>
>
> It still means that you need to enforce a type for all ODP implementation on a given arch. Which could be problematic.
> As a precise example: the way handles are used now for odp_packet_t brings some useful features for checks and memory savings, but performance-wise, they are a "disaster". One of the first things I did was to switch them to pointers. And if I wanted a high perf linux x86_64 implementation, I'd probably do the same.
>
> Nicolas
> _______________________________________________
> lng-odp mailing list
> lng-odp(a)lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/lng-odp
== Progress ==
LLDB development
-- Root Google Nexus devices and read debug module configuration with
kernel module [TCWG-429] [7/10]
-- Figure out steps to unlock and root Nexus S
-- Figure out steps to build kernel and kernel module for Nexus S
-- Tried out lldb watchpoints with custom kernel on Nexus S
-- Tried out reaching debug coprocessors without ptrace using kernel module.
-- Identify mix-mode debugging problems (ARM & Thumb) [TCWG-229] [2/10]
-- Ongoing initial investigation and identifying code areas needing changes
Miscellaneous [1/10]
-- Meetings, emails, discussions etc.
== Plan ==
-- Root Google Nexus devices and read debug module configuration with
kernel module [TCWG-429]
-- Complete app and kernel module to read debug coprocessor registers.
-- Try them out on remaining Android devices.
-- Identify mix-mode debugging problems (ARM & Thumb) [TCWG-229]
-- Further investigation and testing a mix mode application.
== Progress ==
* Buildbots (4/10)
- Found culprit for self-hosting breakages
- Bot didn't get right because of dirty builds
- Moving all self-hosting bots to clean builds (~3h)
- More work on MIPS patch breaking self-hosting
- Several breakages and bisections
- Adding first cloud (Scaleway) buildbot to local master
- No NEON, so we can't replace the Chromebooks
* Infrastructure (4/10)
- Power cut in Cambridge Lab, no generator yet
- Chromebooks fail at the time of the cut, even with the UPS
batteries still holding. I'm guessing the power regulator
depends on the internal battery to work (and we removed them)
- Bringing all bots up, etc.
- Setting up a HiKey/AMD for benchmarks (APMs are too different)
- Running EEMBC and SPEC on AMD
* Background (2/10)
- Code review, meetings, discussions, general support, etc.
- Upstreaming -meabi, which may fix builds of kernel, android, bsd
- Compiling aarch64-linux-gnu-gcc by hand because Arch pkg didn't work
o 1 day off (2/10)
== Progress ==
o Linaro GCC (6/10)
* FSF branch merge into linaro GCC 5 branch
* Troubleshot various regressions after the merge
* Delivered GCC 5.2 2015.11 snapshot
o Upstream work (1/10)
* Sanitizing gfortran testsuite
o Misc (1/10)
* Various meetings
== Plan ==
o Continue on sanitizing testsuite
o Backports, infra, ...
Implement LAVA jobs for microinstance - TCWG-432 [6/10]
* Refactoring to permit sharing of code between uinstance & main
instance, as far as possible
* Further refactoring for sane submission of bundles without inserting
LAVA assumptions in the wrong places
* Tested as far as possible in main instance, using light hacks and fakebench
Jenkins benchmarking job - TCWG-348 [1/10]
* Converted pbl hacks into a sane patch for yaml-to-json.py
Controlled image builds - TCWG-360 [1/10]
* Submitted aarch64 filesystem build for review
* Generated armhf and amd64 filesystems
* Started learning how to generate hwpack
Misc [2/10]
=Plan=
Review security with shared uinstance/main instance code
Expose more data, benchmarks to bundles
Create YAML definition for Jenkins benchmarking job
Generate (controlled) hwpack for at least one target, or know what the
problems are
Write up noise control report (if time)
Have another go at crashdump (if time, if new kexec patches)
* TCWG-72 (3/10)
- divmod transform approved by Richard
- builds cleanly on arm-linux-gnueabihf, aarch64-linux-gnu
- Investigating segfault with bid64_div.c
happens when mode == DImode and libval_mode == TImode
- Found another segfault on x86 with TImode, on arm
TImode is not supported and compiler aborts. Perhaps we should
not do the transform when mode is TImode ?
- Had a look at expand_binop_twoval_libfunc().
Wrote a similar function to obtain both results but this resulted
in infinite loop in emit_libcall_block_1
- Strangely the bug is reproducible only during the build and doesn't
trigger when compiled with preprocessed version of bid64_div.c
(passing the same set of options).
- waiting for upstream comments
* TCWG-319 (1/10)
- Submitted jobs for fp benchmark on a53, a57
* Misc:
- PR66214 appears to have gone (fixed or became latent), that was
blocking firefox LTO build with trunk
- PR65837 still appears to be present after r230327
* Public Holidays (6/10)
- Diwali festival
== Next Week ==
- Continue with TCWG-72, TCWG-319 benchmarking, target hook conversion
- Run SPEC2k6 with LTO
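For context on the divmod transform mentioned under TCWG-72: the general idea is that when code computes both a / b and a % b on the same operands, the compiler can emit one combined operation instead of two separate ones. On ARM EABI targets without a hardware divide instruction, for example, the run-time library provides __aeabi_idivmod, which returns quotient and remainder together. Below is a minimal sketch of source code with that shape; the struct and function names are made up for illustration and do not come from the patch.

```c
#include <stdint.h>

/* Hypothetical result type holding both values. */
struct quot_rem {
    int32_t quot;
    int32_t rem;
};

/* Both the quotient and the remainder of the same operands are needed,
   so a compiler that supports a divmod transform may compute them with
   a single combined operation or library call. */
struct quot_rem div_and_mod(int32_t a, int32_t b)
{
    struct quot_rem r;
    r.quot = a / b;   /* C99: truncates toward zero */
    r.rem  = a % b;   /* sign follows the dividend */
    return r;
}
```

Comparing the assembly for this function with and without the transform would show whether one or two division calls are emitted on a given target.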
== Progress ==
- Widening pass (TCWG-547) - 6/10
* Bootstrapped latest patch on ppc64-linux-gnu, aarch64-linux-gnu and
x86_64-linux-gnu.
* Regression testing on ppc64-linux-gnu,
aarch64-linux-gnu, arm64-linux-gnu and x86_64-linux-gnu.
* Fixed all of the execution issues
* Posted updated patch to the list
- Misc (4/10)
* Linaro bug 1900
* Continued looking at LuaJIT code-base
* gcc/bug list
== Plan ==
* bug 1900
* Look at implementing LuaJIT for aarch64
* LTO
Hi,
We have a packaging/linking/optimization problem at LNG, I hope you guys
can give us some advice on that. (Cc'ing ODP list in case someone wants
to add something)
We have OpenDataPlane (ODP), an API stretching between userspace
applications and hardware SDKs. It's defined in the form of C headers,
and we already have several implementations to face SDKs (or whatever
is actually controlling the hardware), e.g. linux-generic, a DPDK one etc.
And we have applications, like Open vSwitch (OVS), which now is able to
work with any ODP platform implementation which implements this API.
When it comes to packaging, the ideal scenario would be to create one
package for the application, e.g. openvswitch.deb, and one for each
platform, e.g odp-generic.deb, odp-dpdk.deb. The latter would contain
the implementations in the form of a libodp.so file, so the application
can dynamically load the actually installed platform's library runtime,
with all the benefits of dynamic linking.
The trouble is that we have several accessor functions in the API which
are very short and __very__ frequently used. The best example is
"uint32_t odp_packet_len(odp_packet_t pkt)", which returns the length of
the packet. odp_packet_t is an opaque type defined by the
implementation, often a pointer to the packet's actual metadata, so the
actual function call yields to a simple load from that metadata pointer
(+offset). Having it wrapped into a function call brings a significant
performance decrease: when forwarding 64-byte packets at 10 Gbps, I got
13.2 Mpps with function calls. When I inlined that function, I got
13.8 Mpps, a difference of ~5%. And there are a lot of other
frequently used short accessor functions with the same problem.
But obviously if I inline these functions I break the ABI, and I need to
compile the application for each platform (and create packages like
openvswitch-odp-dpdk.deb, containing the platform statically linked).
I've tried to look around on Google and in gcc manual, but I couldn't
find a good solution for this kind of problem.
I've checked link time optimization (-flto), but it only helps with
static linking. Is there any way to keep the ODP application and
platform implementation binaries in separate files while having the
performance benefit of inlining?
Regards,
Zoltan
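The trade-off described above can be sketched in a few lines of C. The metadata layout, the field name and the _inline suffix are hypothetical; real ODP implementations define odp_packet_t differently. The first variant is what a shared libodp.so would export (stable ABI, one call per access); the second is what a platform header could provide (no call overhead, but the application is then compiled against that platform's layout).

```c
#include <stdint.h>

/* Hypothetical packet metadata; real implementations differ. */
struct pkt_hdr {
    uint32_t frame_len;
};

/* In the real API this type is opaque to applications. */
typedef struct pkt_hdr *odp_packet_t;

/* Variant 1: out-of-line accessor, as exported from a shared library.
   The application only depends on the function's signature. */
uint32_t odp_packet_len(odp_packet_t pkt)
{
    return pkt->frame_len;
}

/* Variant 2: header-level inline accessor.  The load is compiled into
   the application, so the struct layout becomes part of the ABI. */
static inline uint32_t odp_packet_len_inline(odp_packet_t pkt)
{
    return pkt->frame_len;
}
```

Both variants return the same value; the difference is only where the load ends up, which is why inlining ties the application binary to one platform implementation.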
The Linaro Toolchain Working Group (TCWG) is pleased to announce the
2015.11 snapshot of the Linaro GCC 5 source package.
This monthly snapshot[1] is based on FSF GCC 5.2+svn230068 and
includes performance improvements and bug fixes backported from
mainline GCC. This snapshot's contents will be part of the 2015.11
stable [1] quarterly release.
This snapshot tarball is available on:
http://snapshots.linaro.org/components/toolchain/gcc-linaro/5.2-2015.11/
Interesting changes in this GCC source package snapshot include:
* Updates to GCC 5.2+svn230068
* Backport of [Bugfix] [AArch32] fp16 Fix PR 67624 - Incorrect
conversion of float Infinity to __fp16
* Backport of [Bugfix] [AArch64] PR 66776 Add cmovdi_insn_uxtw pattern
* Backport of [Bugfix] [AArch64] PR rtl-optimization/68106 LRA
* Backport of [Bugfix] PR48052 fix testcase
* Backport of [Bugfix] PR other/57195
* Backport of [Bugfix] PR rtl-optim/67421 Cost instruction sequences
when doing left wide shift
* Backport of [Bugfix] PR rtl-optimization/67103 Improve conditional
select ops on immediates
* Backport of [Bugfix] PR rtl-optimization/67756
* Backport of [Bugfix] PR target/61578
* Backport of [Bugfix] PR target/61578
* Backport of [Bugfix] PR target/61578
* Backport of [Bugfix] PR tree-optimization/48052 IVOPTS
* Backport of [Bugfix] PR tree-optimization/52563 and 62173 IVOPTS
* Backport of [Bugfix] PR tree-optimization/64454
* Backport of [Bugfix] PR tree-optimization/66449
* Backport of [AArch32] 1/2 Record FPU features as a bit-set
* Backport of [AArch32] 2/2 Use new FPU features representation
* Backport of [AArch32] 1/5 Make room for more CPU feature flags
* Backport of [AArch32] 2/5 Add feature set definitions
* Backport of [AArch32] 3/5 Use new feature set representation
* Backport of [AArch32] 4/5 Use features sets for builtins
* Backport of [AArch32] 5/5 Move initializer into arm-cores.def and
arm-arches.def
* Backport of [AArch32] Add earlyclobber modifier for neon_(vtrn,
vuzp, vzip)<mode>_insn rtx pattern
* Backport of [AArch32] Add missing is_neon_type types
* Backport of [AArch32] arm memcpy of aligned data
* Backport of [AArch32] Fix arm bootstrap failure due to
-Werror=shift-negative-value
* Backport of [AArch32] fix vget_lane on big-endian
* Backport of [AArch32] Use %wd format for lane printing in bounds_check
* Backport of [AArch32/AArch64] 1/15 [FP16] Hide existing float16
intrinsics unless we have a scalar __fp16 type
* Backport of [AArch32/AArch64] 2/15 [fp16] float16x4_t intrinsics in arm_neon.h
* Backport of [AArch32/AArch64] 3/15 Add V8HFmode and float16x8_t type
* Backport of [AArch32/AArch64] 4/15 float16x8_t intrinsics in arm_neon.h
* Backport of [AArch32/AArch64] 5/15 Remaining intrinsics
* Backport of [AArch32/AArch64] 6/15 Add basic FP16 support
* Backport of [AArch32/AArch64] 8/15 Add support for float16x{4,8}_t
vectors/builtins
* Backport of [AArch32/AArch64] 9/15 vld{2,3,4}{,_lane,_dup}, vcombine, vcreate
* Backport of [AArch32/AArch64] 10/15 Implement vcvt_{,high_}f16_f32
* Backport of [AArch32/AArch64] 11/15 vreinterpret(q?),
vget_(low|high), vld1(q?)_dup
* Backport of [AArch32/AArch64] 12/15 Add vcvt(_high)?_f32_f16
intrinsics, with BE RTL fix
* Backport of [AArch32/AArch64] 13/15 Add float16 tests to
advsimd-intrinsics testsuite
* Backport of [AArch32/AArch64] 14/15 Add test of
vcvt{,_high}_i{f32_f16,f16_f32}
* Backport of [AArch32/AArch64] 15/15 Update sourcebuild.texi with
testsuite/effective-target hooks
* Backport of [AArch64] 1/5 Reimplement aarch64_bitmask_imm
* Backport of [AArch64] 2/5 Improve aarch64_internal_mov_immediate by
using faster algorithm
* Backport of [AArch64] 3/5 Remove dead code
* Backport of [AArch64] 4/5 Remove redundant code
* Backport of [AArch64] 5/5 Cleanup immediate generation code in
aarch64_internal_mov_immediate
* Backport of [AArch64] 1/14 Add ident field to struct processor
* Backport of [AArch64] 2/14 Refactor arches handling, add arch enum identifier
* Backport of [AArch64] 3/14 Refactor option override code
* Backport of [AArch64] 4/14 Create TARGET_FIX_ERR_A53_835769 and use
that instead of aarch64_fix_a53_err835769
* Backport of [AArch64] 5/14 Make flag_omit_leaf_frame_pointer
initialize to 2. Define and use TARGET_OMIT_LEAF_FRAME
* Backport of [AArch64] 6/14 Implement TARGET_OPTION_SAVE/TARGET_OPTION_RESTORE
* Backport of [AArch64] 7/14 Implement TARGET_SET_CURRENT_FUNCTION
* Backport of [AArch64] 8/14 Implement TARGET_OPTION_VALID_ATTRIBUTE_P
* Backport of [AArch64] 9/14 Implement TARGET_CAN_INLINE_P
* Backport of [AArch64] 10/14 Implement target pragmas
* Backport of [AArch64] 11/14 Re-layout SIMD builtin types on builtin expansion
* Backport of [AArch64] 12/14 Target attributes and target pragmas tests
* Backport of [AArch64] 13/14 Document AArch64 target attributes and pragmas
* Backport of [AArch64] 14/14 Reuse target_option_current_node when
passing pragma string to target attribute
* Backport of [AArch64] vtbl[34] and vtbx4
* Backport of [AArch64] Add backend aarch64_bfi pattern
* Backport of [AArch64] Add csneg3_uxtw_insn pattern
* Backport of [AArch64] Add support for 64-bit vector-mode ldp/stp
* Backport of [AArch64] Adjust tests to take LSE extension into account
* Backport of [AArch64] [array_mode 1/8] Rename
vec_store_lanes<mode>_lane to aarch64_vec_store_lanes<mode>_lane
* Backport of [AArch64] [array_mode 2/8] Remove VSTRUCT_DREG, use
BLKmode for d-reg aarch64_st/ld expands
* Backport of [AArch64] [array_mode 3/8] Stop using EImode in
aarch64-simd.md and iterators.md
* Backport of [AArch64] [array_mode 4/8] Remove EImode
* Backport of [AArch64] [array_mode 5/8] Remove V_FOUR_ELEM, again
using BLKmode + set_mem_size.
* Backport of [AArch64] [array_mode 6/8] Remove V_TWO_ELEM, again
using BLKmode + set_mem_size.
* Backport of [AArch64] [array_mode 7/8] Combine the expanders using
VSTRUCT:nregs
* Backport of [AArch64] [array_mode 8/8] Add d-registers to
TARGET_ARRAY_MODE_SUPPORTED_P
* Backport of [AArch64] Break -mcpu tie between the compiler and assembler
* Backport of [AArch64] [expand] Check gimple statement to improve
LSHIFT_EXP expand
* Backport of [AArch64] Fix FAIL:
gcc.target/aarch64/target_attr_crypto_ice_1.c (internal compiler
error)
* Backport of [AArch64] Fix vcvt_high_f64_f32 and vcvt_high_f32_f64 intrinsics
* Backport of [AArch64] Fix vldX/vstX AdvSIMD intrinsics
* Backport of [AArch64] Followup to [AArch64_be] Fix vtbl[34] and vtbx4
* Backport of [AArch64] Force __builtin_aarch64_fp[sc]r argument into a REG
* Backport of [AArch64] Handle const address in aarch64_print_operand
* Backport of [AArch64] Implement copysign[ds]f3
* Backport of [AArch64] Improve code generation for float16 vector code
* Backport of [AArch64] Improve SIMD concatenation with zeroes
* Backport of [AArch64] Remove index from AARCH64_FUSION_PAIR
* Backport of [AArch64] Remove obsolete comment in aarch64-option-extensions.def
* Backport of [AArch64] Remove separate movtf pattern - Use an
iterator for all FP modes
* Backport of [AArch64] Remove the hack for AARCH64_EXTRA_TUNE_ALL
* Backport of [AArch64] TLSLE 1,2 and 3/N
* Backport of [AArch64] Use default_elf_asm_named_section instead of
special cased hook
* Backport of [AArch64] Use default_elf_asm_named_section instead of
special cased hook
* Backport of [AArch64] Use logics_imm type for 2nd alternative of
*and<mode>3nr_compare0
* Backport of [AArch64] Use popcount_hwi instead of homebrew version
* Backport of [Testsuite] Fix race on temp file in gfortran streamio_*.f90 tests
* Backport of [Testsuite] Fix race on temp file in gfortran tests
* Backport of [Testsuite] Fix typo in vcvt_f16.c testcase
* Backport of [Testsuite] Adjust compiling options for
gcc.target/arm/unsigned-float.c
* Backport of [Testsuite] [AArch32] gcc.target/arm/pr67756.c: Fixed warnings
* Backport of [Testsuite] [AArch64] 7/15 Add basic fp16 tests
* Backport of [Testsuite] [AArch64] Adjust some arith+compare tests
for potentially more aggressive if-conversion
* Backport of [Testsuite] [AArch64] Make arm_align_max_stack_pwr.c and
arm_align_max_pwr.c compile testcase, instead of execution
* Backport of [Testsuite] [AArch64] Mark target_attr_1.c as compile-only
* Backport of [Testsuite] [AArch64] Remove divisions-to-produce-NaN
from vdiv_f.c
* Backport of [Testsuite] Add float16 lane_f16_indices tests
* Backport of [Testsuite] auto-wipe dump files
* Backport of [Testsuite] Clean up effective_target cache
* Backport of [Testsuite] Clean up effective_target cache
* Backport of [Testsuite] Fix order of dg-do and
dg-require-effective-target directives
* Backport of [testsuite] gcc.dg/builtins-20.c: Remove undefined behavior
* Backport of [Testsuite] gcc.dg/tree-ssa/pr65447.c: Increase searching number
* Backport of [Misc] add separate insn sched class for vector LDP & STP
* Backport of [Misc] correct ChangeLog dates+address
* Backport of [Misc] fix typo in 223858 1/2
* Backport of [Misc] fix typo in 223858 2/2
* Backport of [Misc] Fix bigendian HFmode in native_interpret_real
* Backport of [Misc] model load/store multiples properly in
autoprefetcher scheduling
* Backport of [Misc] Improve auto-increment addressing mode support in
IVO by refactoring add candidate logic
* Backport of [Misc] Improve bound information in loop niter analysis
* Backport of [Misc] Improve conditional select ops on immediates
* Backport of [Misc] Improve loop bound info by simplifying
conversions in iv base
* Backport of [Misc] IVOPS
* Backport of [Misc] Look into unnecessary conversion when checking
mult_op in get_shiftadd_cost
* Backport of [Misc] Allow REG_EQUAL for ZERO_EXTRACT
* Backport of [Misc] mark libstdc++ tests unsupported if they fail
with relocation truncated
* Backport of [Misc] Rerun loop-header-copying just before vectorization
* Backport of [Misc] Allow PLUS+immediate expression in
noce_try_store_flag_constants
* Backport of [Doc] Clarify feature modifiers {no,}{fp,simd,crypto}
Feedback and Support
Subscribe to the important Linaro mailing lists and join our IRC
channels to stay on top of Linaro development.
* Linaro Toolchain Development mailing list:
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
* Linaro Toolchain IRC channel on irc.freenode.net at #linaro-tcwg
* Bug reports should be filed in bugzilla against GCC product:
http://bugs.linaro.org/enter_bug.cgi?product=GCC
* Interested in commercial support? Inquire at Linaro support:
mailto:support@linaro.org
[1]. Stable source package releases are defined as releases where the
full Linaro Toolchain validation plan is executed.
[2]. Source package snapshots are defined when the compiler is only
put through unit-testing and full validation is not performed.
1 day off (Wednesday) (2/10)
== Progress ==
* Validation
- Jenkins jobs maintenance & cleanup
- comparison of build times between old & new lab
- dedicated slave for results comparison works well
* GCC
- trunk monitoring, reported a few new failures.
- high rate of commits before the end of stage1 means
lots of patches to check
- infrastructure problems in the ST compute farm
mean a few false errors needed analysis
- looked at bug #1869 (problem with binary toolsets
on RHEL6). Made some progress
== Next ==
* Validation:
- continue preparation of switch, as dev-01 is now back
- improve reporting
* GCC:
- check Neon tests cleanup
- bug #1869
- look at how to send valuable reports to gcc-regression
* Off on Wed afternoon [1/10].
# Progress #
* Fails in gdb.threads/multiple-step-overs.exp, (TCWG-332) [1/10]
Patch V2 is posted, pending for review.
* TCWG-422, patch is committed. Done. [2/10].
* TCWG-423, patches are ready, being regression tested. [2/10]
* TCWG-433, built GDB with -fsanitize=address, which exposed many memory
issues. Some of them are fixed. [2/10].
* Upstream patch review, [1/10]
* Misc, meeting, [1/10]
# Plan #
* TCWG-423, Post patches upstream.
* Understand ST's jtag probe and help them to make use of multi-arch
with GDB.
* TCWG-433, Continue fixing memory issues exposed by
-fsanitize=address.
--
Yao
Hi Albert,
On Thu, Nov 12, 2015 at 08:20:18AM +0100, Albert ARIBAUD wrote:
> Can you provide the target name and commit ID that you are building,
> as well as the version of the toolchain that you are building with?
> Without being able to reproduce your issue, it's kind of hard to
> diagnose it.
With the explanation from Ard, I understand the thing now. But thanks
for the reply anyway.
Shawn
On 11 November 2015 at 00:45, Savolainen, Petri (Nokia - FI/Espoo) <
petri.savolainen(a)nokia.com> wrote:
>
>
> > -----Original Message-----
> > From: lng-odp [mailto:lng-odp-bounces@lists.linaro.org] On Behalf Of
> > EXT Nicolas Morey-Chaisemartin
> > Sent: Tuesday, November 10, 2015 5:13 PM
> > To: Zoltan Kiss; linaro-toolchain(a)lists.linaro.org
> > Cc: lng-odp
> > Subject: Re: [lng-odp] Runtime inlining
> >
> > As I said in the call last week, the problem is wider than that.
> >
> > ODP specifies a lot of types but not their sizes, a lot of
> > enums/defines (things like ODP_PKTIO_INVALID) but not their value
> > either.
> > For our port a lot of those values were changed for
> > performance/implementation reason. So I'm not even compatible between
> > one version of our ODP port and another one.
> >
> > The only way I can see to solve this is for ODP to fix the size of all
> > these types.
> > Default/Invalid values are not that easy, as a pointer would have a
> > completely different behaviour from structs/bitfields
> >
> > Nicolas
> >
>
> Type sizes do not need to be fixed in general, but only when an
> application is built for binary compatibility (the use case we are talking
> about here). Binary compatibility and thus the fixed type sizes are defined
> per ISA.
>
> We can e.g. define a configure target (for our reference implementation ==
> linux-generic) "--binary-compatible=armv8.x" or
> "--binary-compatible=x86_64". When you build your application with that
> option, "platform dependent" types and constants would be fixed to
> pre-defined values specified in (new) ODP API arch files.
>
> So instead of building against
> odp/platform/linux-generic/include/odp/plat/queue_types.h ...
>
> typedef ODP_HANDLE_T(odp_queue_t);
> #define ODP_QUEUE_INVALID _odp_cast_scalar(odp_queue_t, 0)
> #define ODP_QUEUE_NAME_LEN 32
>
>
> ... you'd build against odp/arch/armv8.x/include/odp/queue_types.h ...
>
With the introduction of odp/arch at the top level I think we should also
move platform/linux-generic/arch to the same location
> typedef uintptr_t odp_queue_t;
> #define ODP_QUEUE_INVALID ((uintptr_t)0)
> #define ODP_QUEUE_NAME_LEN 64
>
>
> ... or odp/arch/x86_64/include/odp/queue_types.h
>
> typedef uint64_t odp_queue_t;
> #define ODP_QUEUE_INVALID ((uint64_t)0xffffffffffffffff)
> #define ODP_QUEUE_NAME_LEN 32
>
>
> For highest performance on a fixed target platform, you'd still build
> against the platform directly
>
> odp/platform/<soc_vendor_xyz>/include/odp/plat/queue_types.h
>
> typedef xyz_queue_desc_t * odp_queue_t;
> #define ODP_QUEUE_INVALID ((xyz_queue_desc_t *)0xdeadbeef)
> #define ODP_QUEUE_NAME_LEN 20
>
>
> -Petri
>
>
>
>
> _______________________________________________
> lng-odp mailing list
> lng-odp(a)lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/lng-odp
>
--
Mike Holmes
Technical Manager - Linaro Networking Group
Linaro.org <http://www.linaro.org/> │ Open source software for ARM SoCs
Holiday [2/10]
Juno crash analysis [2/10]
* Spent some time fiddling with kexec on AArch64
* Worked in one very specific case
* Another patch series is (apparently) coming, will look out for it
and try again
SPEC-on-Android [2/10]
* Supporting Qian on getting this working
* Wrote a readme for the repository, fixed a Makefile bug that Qian's
cross-compiler happened to tickle
Jenkins benchmarking job - TCWG-348 [1/10]
* Tested, tidied up pbl hacks to generate JSON
* Tested my pbl with Jenkins prototype jobs
* A few minor bug fixes/enhancements for pbl
LAVA jobs for uinstance - TCWG-432 [1/10]
* Reworked jobs to support uinstance, maintaining backward
compatibility as far as possible
* Started adding support to submit results to bundle stream
Misc [2/10]
* Debian FS ready to submit
* Usual meetings/mail/etc background
=Plan=
Look at doing pbl hacks properly in Fathi's in-development refactored p-b-l
Pull together Jenkins/LAVA/pbl, ready to test when uinstance is available
Write up noise control report
(If time, if patches land) have another go at crashdump
== Progress ==
o Linaro GCC (4/10)
* Delivered GCC 4.9 2015.10 snapshot
* More backports for GCC 5 2015.11
* Many instabilities on Hetzner this week
o Upstream work (2/10)
* Sanitizing gfortran testsuite
o Release tools (2/10)
* Added RCs and binaries support to our snapshot.linaro.org
publishing job
o Misc (2/10)
* Various meetings
* Some support
== Plan ==
o Track missing backports dependencies
o Continue ongoing tasks.
== This week ==
* TCWG-369 - Exploit wide add operations when appropriate for Aarch64 (4/10)
- Determined that the vectorizer is failing for all targets that have
widening adds with V8HI to V4SI support (aarch64, ia64, PowerPC).
- Modified test cases to indicate expected failure with wide add
V8HI to V4SI support
- Patch sent upstream for approval
* Bugzilla 68223 - arm_[su]min_cmp pattern fails (3/10)
- Resolved by reverting the patch for TCWG-146, as the pattern fails in
some corner cases.
- Reverted patch checked in upstream
* Misc (1/10)
- Conference calls
* Illness, November 2nd (2/10)
== Next week ==
- TCWG-317 - Resolve lto big endian failures
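The widening adds mentioned under TCWG-369 arise from loops that accumulate
narrow elements into a wider type. A minimal sketch of such a loop, assuming
16-bit inputs and a 32-bit accumulator (the function name sum_widen is
illustrative only, and whether a given compiler vectorizes it with widening
add instructions is not verified here):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum 16-bit elements into a 32-bit accumulator. Each addition widens
 * a 16-bit value to 32 bits, which is the shape a vectorizer can map
 * onto V8HI-to-V4SI widening add operations where the target has them. */
static int32_t sum_widen(const int16_t *a, size_t n)
{
    int32_t sum = 0;
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```

Comparing the vectorizer dump for this loop with and without the patch is one
way to see whether the wide add pattern is being picked up.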
== Progress ==
- Leave (2/10)
- Widening pass (TCWG-547) - 5/10
* Made the latest changes requested in the review
* Fixed bootstrap and bootstrap mis-compare for ppc64-linux-gnu
* Making uninitialized variables anonymous SSA names (as asked in review)
results in a few ICEs.
* Posted updated patch for feedback
- Misc (3/10)
* started looking into LTO status
* Looked at LuaJIT for arm
* gcc/bug list
== Plan ==
* continue with widening pass based on feedback
* Look at implementing LuaJIT for aarch64
* LTO
== This week ==
* TCWG-72 (6/10)
- 5 iterations since the original patch. Changes include:
a) Integration into widening_mul pass
b) Rewriting the divmod transform so DIVMOD() is placed before the topmost
div/mod stmt
c) Removed check for widening mode and optab handler check in expand_DIVMOD
d) Fixed ICE when constant is one of the operands to div/mod stmt.
e) Fixed mis-compilation with a test-case when operands matched but in
opposite order.
f) Formatting nits and fixed test-cases.
- Richard suggested there is no need to check for post-domination conditions.
- Not sure on what condition to gate the transform.
Checking for availability of divmod/div/mod is not sufficient because arm
defines an optab handler for mod which only matches r0 % n where n is a
constant power of 2; for other cases it's expanded via the divmod libcall
through expand_divmod. We would rather need to check whether the template
for mod/div gets matched than just check whether an optab handler exists.
AFAIK this cannot be done during tree-ssa passes.
I can think of two approaches:
a) Do the transform to DIVMOD representation unconditionally in the
widening_mul pass, and then in expand_DIVMOD check if the template for mod
can be matched. If it does match, undo the transform from DIVMOD to the
original representation and expand.
I am not sure how feasible it is to undo the transform at expansion
time, and start expanding the modified cfg.
b) Define a new target hook combine_divmod.
The default implementation could check for an optab handler for
div/mod/divmod, and I could override it for the arm backend to additionally
check if the second operand is a constant power of 2 and fail for this case
(since we want this to be expanded from the modsi3 pattern).
Not sure if this is a good idea, as I am replicating the information from
the modsi3 pattern.
If the pattern changes, the hook would also need to be changed.
* Convert ASM_FORMAT_PRIVATE_NAME to hook (2/10)
* TCWG-319 (1/10)
- Benchmarking for patch in progress
* Misc (1/10)
- Meetings
- Sync with Kugan
== Next Week ==
- Continue with TCWG-72
- Complete the patch with build, test and config-builds for
ASM_FORMAT_PRIVATE_NAME and submit upstream
- Continue benchmarking TCWG-319, TCWG-310
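The TCWG-72 discussion above concerns code that computes both a quotient and a
remainder from the same operands, and approach (b) proposes gating the
transform on more than optab availability. The sketch below shows the source
pattern and a stand-alone model of that gating decision. The names
combine_divmod_ok, have_divmod, divisor_is_const and divisor are hypothetical
and chosen for illustration; this is not GCC's hook interface, only the
condition described in the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Source pattern the transform targets: a division and a modulo on the
 * same operands. The standard div() call stands in for a combined
 * DIVMOD operation that yields both results at once. */
static void quot_rem(int a, int b, int *q, int *r)
{
    assert(b != 0);
    div_t qr = div(a, b);   /* one combined operation */
    *q = qr.quot;           /* same value as a / b */
    *r = qr.rem;            /* same value as a % b */
}

/* True when n is a positive power of two. */
static bool is_pow2(long n)
{
    return n > 0 && (n & (n - 1)) == 0;
}

/* Hypothetical model of the gate from approach (b): combine only when a
 * divmod operation is available, and keep a modulo by a constant power
 * of two separate so the dedicated mod pattern can still match it. */
static bool combine_divmod_ok(bool have_divmod, bool divisor_is_const,
                              long divisor)
{
    if (!have_divmod)
        return false;
    if (divisor_is_const && is_pow2(divisor))
        return false;
    return true;
}
```

The drawback noted in the report shows up here too: the power-of-two test
duplicates knowledge that already lives in the machine description, so the two
would have to be kept in sync.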
== Progress ==
* Buildbots (5/10)
- Some broken bots, bisecting, etc
- Helping a MIPS patch pass on ARM bot
* Maintenance (2/10)
- SciMark2 no longer seems to be unstable or slow on ARM64
- Some more investigations on Loop Load Elimination
- Profiling bigfib on APM and HiKey
* Background (3/10)
- Code review, meetings, discussions, general support, etc.
- Some FOSDEM fiddling
- Some power issues
== Progress ==
* Validation
- moved list of unstable tests to a separate repo, to make
maintenance easier (TCWG-425)
- Jenkins jobs maintenance & cleanup
- a few ABE reporting patches
- comparison of results between old & new lab
* GCC
- trunk monitoring, reported a few new failures.
- Sent patch to fix vqtb[lx][34] intrinsics on aarch64_be
* Binutils
- Added a Jenkins job to build+check binutils on
a variety of configurations:
https://ci.linaro.org/view/tcwg-ci/job/tcwg-binutils/
- sent a small patch to fix a bug in the recent STM32L4XX erratum patch
== Next ==
* Validation:
- work on the switch to the new lab, once dev-01 is back online
- more tuning to avoid deadlocks
- re-measure build time on dev-01, to better tune other build jobs
* Two half days off. [2/10]
# Progress #
* TCWG-332, fails in gdb.threads/multiple-step-overs.exp. [1/10]
Testing the simpler approach suggested during the review.
* TCWG-387, done. [1/10] GDB patches are pushed in.
* TCWG-422, GNU vector extension support in ARM GDB. [2/10]
Patches are done, and being tested.
* TCWG-423, GNU vector extension support in AArch64 GDB. [2/10]
Writing patches. Found more issues for AArch64: GDB doesn't fully
understand the AArch64 calling convention. More work is needed here.
* Review ARM GDBserver software single step patch. [1/10]
* Misc, meeting, email, [1/10]
# Plan #
* Off on Wed afternoon.
* TCWG-422, post patches
* TCWG-423, continue.
--
Yao
The Linaro Toolchain Working Group is pleased to announce the availability
of the Linaro Stable Binary Toolchain GCC 5.2-2015.11-rc1
Release-Candidate Archives.
http://snapshots.linaro.org/components/toolchain/binaries/5.2-2015.11-rc1/
http://snapshots.linaro.org/components/toolchain/gcc-linaro/5.2-2015.11-rc1/
These archives provide cross-toolchain executables (compiler, debugger,
linker, etc.) and shared libraries (libstdc++, libc, etc.) that target ARM
or Aarch64 GNU/Linux and bare-metal environments. The cross-toolchain
binaries execute on a Linux or MS Windows (under mingw32) host
operating-system.
Please evaluate this release-candidate for correctness. Linaro will
shortly spin the Linaro GCC 5.2-2015.11 release if this release-candidate
passes stakeholder validation.
For bugs related to this release-candidate please email
linaro-toolchain(a)lists.linaro.org or file a bug at
https://bugs.linaro.org/enter_bug.cgi?product=Linux%20Binary%20toolchain
NEWS
* GCC 5.2 2015.11-rc1
The Linaro GCC 5.2 2015.11-rc1 binary toolchain release-candidate is
built from the Linaro GCC-5.2-2015.11 release-candidate source archive.
The Linaro GCC-5.2-2015.11-rc1 release-candidate source archive is derived
from the same sources as the Linaro GCC-5.2-2015.10 snapshot source archive.
--
Ryan S. Arnold
Linaro Toolchain Working Group - Engineering Manager
www.linaro.org
Dear List,
I'm new to this list and have some questions.
Looking at the code GCC generates for ARMv8, we noticed some areas where there is room for performance improvements.
I assume that these items might already have been noticed by you guys.
For example:
1) We noticed that when writing typical DGEMM-like code, GCC includes unnecessary DUP instructions.
2) GCC seems unwilling to use LDP loads.
3) For optimal FPU performance on some A57s it is necessary to interleave instructions working on ODD and EVEN registers.
GCC does not seem to support this properly. Here a different instruction interleaving can sometimes give a 100% performance increase.
4) Some work loops benefit greatly from interleaving FPU instructions and loads.
GCC seems to like to rearrange the code so that most or all loads are placed at the top of the loop.
This can significantly reduce the performance of a well-written work loop.
I have no patches to fix this.
But I can produce C code and ASM output which will show these performance issues.
Please tell me what the next recommended step will be now.
Are all these items known already, or shall I provide code examples to further explain them?
Kind regards
Gunnar von Boehn
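A minimal sketch of the kind of DGEMM-like kernel item 1 refers to, assuming
row-major storage and a hypothetical function name dgemm_naive. A scalar from A
is multiplied against a row of B, which is the broadcast shape where a DUP can
appear in vectorized AArch64 code; the exact instructions depend on compiler
version and flags and are not verified here.

```c
#include <assert.h>
#include <stddef.h>

/* C[m x n] += A[m x k] * B[k x n], all matrices row-major. */
static void dgemm_naive(size_t m, size_t n, size_t k,
                        const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < m; i++) {
        for (size_t p = 0; p < k; p++) {
            const double aip = a[i * k + p];  /* scalar reused across a row */
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aip * b[p * n + j];
        }
    }
}
```

A reduced test case of this size, together with the compiler flags and the
resulting assembly, is usually enough to discuss a code-generation issue on a
list like this one.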
== Progress ==
* Validation
- comparing results and build times between the 2 labs
- tuning jobs scheduling to avoid deadlocks
* GCC trunk monitoring
- lots of validation results to check after 1 week of holidays :-)
- a few regressions/new failures/wrong tests reported
* Backports
- a few reviews
== Next ==
* Infrastructure/Validation
* GCC dev: try to fix vqtbl intrinsics for aarch64_be before e/o stage1
The Linaro Toolchain Working Group (TCWG) is pleased to announce the
2015.10 snapshot of the Linaro GCC 4.9 source package.
This snapshot[2] is based on FSF GCC 4.9.4-pre+svn229467 and includes
performance improvements and bug fixes backported from mainline GCC.
The contents of this snapshot will be part of the 2015.11 stable [1]
quarterly release.
This snapshot tarball is available on:
http://snapshots.linaro.org/components/toolchain/gcc-linaro/4.9-2015.10/
Interesting changes in this GCC source package snapshot include:
* Updates to GCC 4.9.4-pre+svn229467
* Backport of [Bugfix] PR tree-optimization/65735
* Backport of [Bugfix] PR tree-optimization/65177
* Backport of [Bugfix] PR tree-optimization/65048
Feedback and Support
Subscribe to the important Linaro mailing lists and join our IRC
channels to stay on top of Linaro development.
* Linaro Toolchain Development mailing list:
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
* Linaro Toolchain IRC channel on irc.freenode.net at #linaro-tcwg
* Bug reports should be filed in bugzilla against GCC product:
http://bugs.linaro.org/enter_bug.cgi?product=GCC
* Interested in commercial support? Inquire at Linaro support:
mailto:support@linaro.org
[1]. Stable source package releases are defined as releases where the
full Linaro Toolchain validation plan is executed.
[2]. Source package snapshots are defined when the compiler is only
put through unit-testing and full validation is not performed.
== Progress ==
o Linaro GCC (9/10)
* Backports and reviews for our GCC 5 2015.11 snapshot
* FSF branch merge and needed backports for our GCC 4.9 2015.10 snapshot
o Misc (1/10)
* Various meetings
== Plan ==
o Complete 4.9 snapshot
Noise control experiments - TCWG-358 [3/10]
* Some analysis of data to date
Debian filesystem - TCWG-360 [3/10]
* Got stuck on LAVA interactions
* Now booting-to-LAVA-usability, needs some cleanup and testing with
real benchmark runs
Benchmarking-via-Jenkins - TCWG-348 [1/10]
* Picked back up on understanding that LAVA uinstance is a-coming
* Hacked pbl.py (post-build-lava) to generate suitable JSON
** As a bonus, this can work as a CLI job-submission tool
=Plan=
Holiday Friday (pending approval)
Set up crashdumping on my Juno, try to learn why it crashes
Finish Debian filesystem
Get Jenkins generating and submitting jobs suitable for uinstance
Write up noise control report (probably will get bumped to next week)
== This week ==
* TCWG-317 - Exploit wide add operations when appropriate for Aarch32 (3/10)
- Could not reproduce test failure in qemu
- Waiting on feedback from Maxim who is running patch on Linaro
validation
* TCWG-369 - Exploit wide add operations when appropriate for Aarch64 (4/10)
- Debugging tree vectorizer to determine why loops with no wide add
operations are no longer being vectorized
* TCWG-146 - Debugging regression on arm big endian with Linaro 5 branch (2/10)
* Misc (1/10)
- Conference calls
== Next week ==
- TCWG-317 - Resolve lto big endian failures
- TCWG-369 - Identify why loops are not being vectorized
- TCWG-146 - Resolve big endian failures
== This Week ==
* TCWG-72 / PR43721 (8/10)
- Completed writing expand_DIVMOD() and submitted patch upstream
- Reworked patch according to Richard Biener's comments.
(http://people.linaro.org/~prathamesh.kulkarni/pr43721-patch-v2.diff)
- Wrote test-cases.
* TCWG-319 (1/10)
- Benchmark run without patch complete for "int" benchmarks on a53, a57
- Benchmark run with patch in progress for "int" benchmarks on a53, a57
* Misc (1/10)
- Meetings
== Next Week ==
- Continue with TCWG-72, TCWG-310, TCWG-319 benchmarking
# Progress #
* TCWG-332, [1/10], patch is posted for review.
* TCWG-187, [1/10], looks like a kernel (~3.4) bug on setting VFP
registers through ptrace. Fails go away after I upgrade the kernel.
* TCWG-387, [2/10], binutils patch is committed, while GDB
patches are being reviewed.
* TCWG-422, [3/10], Read AAPCS and gcc source code to understand the
calling convention. On going.
* FSF patch review, [2/10]
** Review patch set "all-stop on top of non-stop for remote".
** Review ARM fast tracepoint patches.
** Reopen PR 15564, as the fail isn't fixed.
* Misc, meeting, [1/10]
# Plan #
* TCWG-422
* Support ST on using AArch64 multi-arch GDB if needed.
* Review ARM GDBserver software single step patch.
--
Yao
== This week ==
* TCWG-317 - Exploit wide add operations when appropriate for Aarch32 (5/10)
- Continued to debug big endian/lto failure of four torture tests
- Differences in lto vs. non-lto executables showed no significant
differences
* TCWG-369 - Exploit wide add operations when appropriate for Aarch64 (4/10)
- Resolved three of six test suite failures
* Misc (1/10)
- Conference calls
== Next week ==
- TCWG-317 - Resolve lto big endian failures by debugging test cases
with qemu
- TCWG-369 - Resolve remaining test suite failures
--
Michael Collison
Linaro Toolchain Working Group
michael.collison(a)linaro.org
Hello all,
When upgrading meta-linaro/meta-linaro-toolchain from daisy to fido, we saw a drastic performance regression (-34.06%) on Linux IPv4 forwarding. The board is ls1021atwr (ARM Cortex-A7 MPCore compliant with ARMv7-A architecture).
We did some investigation and suspect that it is caused by the toolchain upgrade. Has anyone seen a similar issue?
The detailed toolchain info:
https://git.linaro.org/openembedded/meta-linaro.git/shortlog/refs/heads/fido
gcc-linaro-4.9.3-2015.03, glibc-linaro-2.20-2014.11, binutils-linaro-2.25-2015.01
https://git.linaro.org/openembedded/meta-linaro.git/shortlog/refs/heads/dai…
gcc-linaro-4.8.3-2014.04, eglibc-linaro-2.19-r2014.04, binutils-linaro-2.24-r2014.03
thanks.
-Ting
o Week 42 off (10/10)
== Progress ==
o Linaro GCC (9/10)
* Completed backports and reviews
* Delivered GCC 5 2015.10 snapshot
* Completed TCWG-389 (deploy snapshot job)
o Misc (1/10)
* Various meetings
== Plan ==
o Back to work
Mustang benchmarking bringup - (no ticket) [2/10]
Controlled Debian build - TCWG-360 [1/10]
* Constructed filesystem, works OK as chroot
* Next step is to boot it
Investigate Workload Automation framework - TCWG-361 [1/10]
* Seems to basically work
* SPEC support dubious
Misc
* LAVA uinstance background (TCWG-396, < 1/10)
* Noise control experiments (TCWG-358, < 1/10)
* ARM management duties [2/10]
== Progress ==
* Maintenance (5/10)
- Investigating some LivermoreLoops that don't vectorise
- Reducing cases to single specific causes
* Background (5/10)
- Code review, meetings, discussions, general support, etc.
- More code of conduct stuff, license change to Apache 2
- Following up on sanitizers VMA discussion