Hi,
We have a packaging/linking/optimization problem at LNG, I hope you guys
can give us some advice on that. (Cc'ing ODP list in case someone want
to add something)
We have OpenDataPlane (ODP), an API stretching between userspace
applications and hardware SDKs. It's defined in the form of C headers,
and we already have several implementations to face SDKs (or whathever
is actually controlling the hardware), e.g. linux-generic, a DPDK one etc.
And we have applications, like Open vSwitch (OVS), which now is able to
work with any ODP platform implementation which implements this API
When it comes to packaging, the ideal scenario would be to create one
package for the application, e.g. openvswitch.deb, and one for each
platform, e.g odp-generic.deb, odp-dpdk.deb. The latter would contain
the implementations in the form of a libodp.so file, so the application
can dynamically load the actually installed platform's library runtime,
with all the benefits of dynamic linking.
The trouble is that we have several accessor functions in the API which
are very short and __very__ frequently used. The best example is
"uint32_t odp_packet_len(odp_packet_t pkt)", which returns the length of
the packet. odp_packet_t is an opaque type defined by the
implementation, often a pointer to the packet's actual metadata, so the
actual function call yields to a simple load from that metadata pointer
(+offset). Having it wrapped into a function call brings a significant
performance decrease: when forwarding 64 byte packets at 10 Gbps, I got
13.2 Mpps with function calls. When I've inlined that function it
brought 13.8 Mpps, that's ~5% difference. And there are a lot of other
frequently used short accessor functions with the same problem.
But obviously if I inline these functions I break the ABI, and I need to
compile the application for each platform (and create packages like
openvswitch-odp-dpdk.deb, containing the platform statically linked).
I've tried to look around on Google and in gcc manual, but I couldn't
find a good solution for this kind of problem.
I've checked link time optimization (-flto), but it only helps with
static linking. Is there any way to keep the ODP application and
platform implementation binaries in separate files while having the
performance benefit of inlining?
Regards,
Zoltan
The Linaro Toolchain Working Group (TCWG) is pleased to announce the
2015.11 snapshot of the Linaro GCC 5 source package.
This monthly snapshot[1] is based on FSF GCC 5.2+svn230068 and
includes performance improvements and bug fixes backported from
mainline GCC. This snapshot contents will be part of the 2015.11
stable [1] quarterly release.
This snapshot tarball is available on:
http://snapshots.linaro.org/components/toolchain/gcc-linaro/5.2-2015.11/
Interesting changes in this GCC source package snapshot include:
* Updates to GCC 5.2+svn230068
* Backport of [Bugfix] [AArch32] fp16 Fix PR 67624 - Incorrect
conversion of float Infinity to __fp16
* Backport of [Bugfix] [AArch64] PR 66776 Add cmovdi_insn_uxtw pattern
* Backport of [Bugfix] [AArch64] PR rtl-optimization/68106 LRA
* Backport of [Bugfix] PR48052 fix testcase
* Backport of [Bugfix] PR other/57195
* Backport of [Bugfix] PR rtl-optim/67421 Cost instruction sequences
when doing left wide shift
* Backport of [Bugfix] PR rtl-optimization/67103 Improve conditional
select ops on immediates
* Backport of [Bugfix] PR rtl-optimization/67756
* Backport of [Bugfix] PR target/61578
* Backport of [Bugfix] PR target/61578
* Backport of [Bugfix] PR target/61578
* Backport of [Bugfix] PR tree-optimization/48052 IVOPTS
* Backport of [Bugfix] PR tree-optimization/52563 and 62173 IVOPTS
* Backport of [Bugfix] PR tree-optimization/64454
* Backport of [Bugfix] PR tree-optimization/66449
* Backport of [AArch32] 1/2 Record FPU features as a bit-set
* Backport of [AArch32] 2/2 Use new FPU features representation
* Backport of [AArch32] 1/5 Make room for more CPU feature flags
* Backport of [AArch32] 2/5 Add feature set definitions
* Backport of [AArch32] 3/5 Use new feature set representation
* Backport of [AArch32] 4/5 Use features sets for builtins
* Backport of [AArch32] 5/5 Move initializer into arm-cores.def and
arm-arches.def
* Backport of [AArch32] Add earlyclobber modifier for neon_(vtrn,
vuzp, vzip)<mode>_insn rtx pattern
* Backport of [AArch32] Add missing is_neon_type types
* Backport of [AArch32] arm memcpy of aligned data
* Backport of [AArch32] Fix arm bootstrap failure due to
-Werror=shift-negative-value
* Backport of [AArch32] fix vget_lane on big-endian
* Backport of [AArch32] Use %wd format for lane printing in bounds_check
* Backport of [AArch32/AArch64] 1/15 [FP16] Hide existing float16
intrinsics unless we have a scalar __fp16 type
* Backport of [AArch32/AArch64] 2/15 [fp16] float16x4_t intrinsics in arm_neon.h
* Backport of [AArch32/AArch64] 3/15 Add V8HFmode and float16x8_t type
* Backport of [AArch32/AArch64] 4/15 float16x8_t intrinsics in arm_neon.h
* Backport of [AArch32/AArch64] 5/15 Remaining intrinsics
* Backport of [AArch32/AArch64] 6/15 Add basic FP16 support
* Backport of [AArch32/AArch64] 8/15 Add support for float16x{4,8}_t
vectors/builtins
* Backport of [AArch32/AArch64] 9/15 vld{2,3,4}{,_lane,_dup}, vcombine, vcreate
* Backport of [AArch32/AArch64] 10/15 Implement vcvt_{,high_}f16_f32
* Backport of [AArch32/AArch64] 11/15 vreinterpret(q?),
vget_(low|high), vld1(q?)_dup
* Backport of [AArch32/AArch64] 12/15 Add vcvt(_high)?_f32_f16
intrinsics, with BE RTL fix
* Backport of [AArch32/AArch64] 13/15 Add float16 tests to
advsimd-intrinsics testsuite
* Backport of [AArch32/AArch64] 14/15 Add test of
vcvt{,_high}_i{f32_f16,f16_f32}
* Backport of [AArch32/AArch64] 15/15 Update sourcebuild.texi with
testsuite/effective-target hooks
* Backport of [AArch64] 1/5 Reimplement aarch64_bitmask_imm
* Backport of [AArch64] 2/5 Improve aarch64_internal_mov_immediate by
using faster algorithm
* Backport of [AArch64] 3/5 Remove dead code
* Backport of [AArch64] 4/5 Remove redundant code
* Backport of [AArch64] 5/5 Cleanup immediate generation code in
aarch64_internal_mov_immediate
* Backport of [AArch64] 1/14 Add ident field to struct processor
* Backport of [AArch64] 2/14 Refactor arches handling, add arch enum identifier
* Backport of [AArch64] 3/14 Refactor option override code
* Backport of [AArch64] 4/14 Create TARGET_FIX_ERR_A53_835769 and use
that instead of aarch64_fix_a53_err835769
* Backport of [AArch64] 5/14 Make flag_omit_leaf_frame_pointer
intialize to 2. Define and use TARGET_OMIT_LEAF_FRAME
* Backport of [AArch64] 6/14 Implement TARGET_OPTION_SAVE/TARGET_OPTION_RESTORE
* Backport of [AArch64] 7/14 Implement TARGET_SET_CURRENT_FUNCTION
* Backport of [AArch64] 8/14 Implement TARGET_OPTION_VALID_ATTRIBUTE_P
* Backport of [AArch64] 9/14 Implement TARGET_CAN_INLINE_P
* Backport of [AArch64] 10/14 Implement target pragmas
* Backport of [AArch64] 11/14 Re-layout SIMD builtin types on builtin expansion
* Backport of [AArch64] 12/14 Target attributes and target pragmas tests
* Backport of [AArch64] 13/14 Document AArch64 target attributes and pragmas
* Backport of [AArch64] 14/14 Reuse target_option_current_node when
passing pragma string to target attribute
* Backport of [AArch64] vtbl[34] and vtbx4
* Backport of [AArch64] Add backend aarch64_bfi pattern
* Backport of [AArch64] Add csneg3_uxtw_insn pattern
* Backport of [AArch64] Add support for 64-bit vector-mode ldp/stp
* Backport of [AArch64] Adjust tests to take LSE extension into account
* Backport of [AArch64] [array_mode 1/8] Rename
vec_store_lanes<mode>_lane to aarch64_vec_store_lanes<mode>_lane
* Backport of [AArch64] [array_mode 2/8] Remove VSTRUCT_DREG, use
BLKmode for d-reg aarch64_st/ld expands
* Backport of [AArch64] [array_mode 3/8] Stop using EImode in
aarch64-simd.md and iterators.md
* Backport of [AArch64] [array_mode 4/8] Remove EImode
* Backport of [AArch64] [array_mode 5/8] Remove V_FOUR_ELEM, again
using BLKmode + set_mem_size.
* Backport of [AArch64] [array_mode 6/8] Remove V_TWO_ELEM, again
using BLKmode + set_mem_size.
* Backport of [AArch64] [array_mode 7/8] Combine the expanders using
VSTRUCT:nregs
* Backport of [AArch64] [array_mode 8/8] Add d-registers to
TARGET_ARRAY_MODE_SUPPORTED_P
* Backport of [AArch64] Break -mcpu tie between the compiler and assembler
* Backport of [AArch64] [expand] Check gimple statement to improve
LSHIFT_EXP expand
* Backport of [AArch64] Fix FAIL:
gcc.target/aarch64/target_attr_crypto_ice_1.c (internal compiler
error)
* Backport of [AArch64] Fix vcvt_high_f64_f32 and vcvt_figh_f32_f64 intrinsics
* Backport of [AArch64] Fix vldX/vstX AdvSIMD intrinsics
* Backport of [AArch64] Followup to [AArch64_be] Fix vtbl[34] and vtbx4
* Backport of [AArch64] Force __builtin_aarch64_fp[sc]r argument into a REG
* Backport of [AArch64] Handle const address in aarch64_print_operand
* Backport of [AArch64] Implement copysign[ds]f3
* Backport of [AArch64] Improve code generation for float16 vector code
* Backport of [AArch64] Improve SIMD concatenation with zeroes
* Backport of [AArch64] Remove index from AARCH64_FUSION_PAIR
* Backport of [AArch64] Remove obsolete comment in aarch64-option-extensions.def
* Backport of [AArch64] Remove separate movtf pattern - Use an
iterator for all FP modes
* Backport of [AArch64] Remove the hack for AARCH64_EXTRA_TUNE_ALL
* Backport of [AArch64] TLSLE 1,2 and 3/N
* Backport of [AArch64] Use default_elf_asm_named_section instead of
special cased hook
* Backport of [AArch64] Use default_elf_asm_named_section instead of
special cased hook
* Backport of [AArch64] Use logics_imm type for 2nd alternative of
*and<mode>3nr_compare0
* Backport of [AArch64] Use popcount_hwi instead of homebrew version
* Backport of [Testsuite] Fix race on temp file in gfortran streamio_*.f90 tests
* Backport of [Testsuite] Fix race on temp file in gfortran tests
* Backport of [Testsuite] Fix typo in vcvt_f16.c testcase
* Backport of [Testsuite] Adjust compiling options for
gcc.target/arm/unsigned-float.c
* Backport of [Testsuite] [AArch32] gcc.target/arm/pr67756.c: Fixed warnings
* Backport of [Testsuite] [AArch64] 7/15 Add basic fp16 tests
* Backport of [Testsuite] [AArch64] Adjust some arith+compare tests
for potentially more aggressive if-conversion
* Backport of [Testsuite] [AArch64] Make arm_align_max_stack_pwr.c and
arm_align_max_pwr.c compile testcase, instead of execution
* Backport of [Testsuite] [AArch64] Mark target_attr_1.c as compile-only
* Backport of [testsuite] [AArch64] Remove divisions-to-produce-NaN
from vdiv_f.c
* Backport of [Testsuite] Add float16 lane_f16_indices tests
* Backport of [Testsuite] auto-wipe dump files
* Backport of [Testsuite] Clean up effective_target cache
* Backport of [Testsuite] Clean up effective_target cache
* Backport of [Testsuite] Fix order of dg-do and
dg-require-effective-target directives
* Backport of [testsuite] gcc.dg/builtins-20.c: Remove undefined behavior
* Backport of [Testsuite] gcc.dg/tree-ssa/pr65447.c: Increase searching number
* Backport of [Misc] add separate insn sched class for vector LDP & STP
* Backport of [Misc] ccorrect ChangeLog dates+address
* Backport of [Misc] fix typo in 223858 1/2
* Backport of [Misc] fix typo in 223858 2/2
* Backport of [Misc] Fix bigendian HFmode in native_interpret_real
* Backport of [Misc] model load/store multiples properly in
autoprefetcher scheduling
* Backport of [Misc] Improve auto-increment addressing mode support in
IVO by refactoring add candiate logic
* Backport of [Misc] Improve bound information in loop niter analysis
* Backport of [Misc] Improve conditional select ops on immediates
* Backport of [Misc] Improve loop bound info by simplifying
conversions in iv base
* Backport of [Misc] IVOPS
* Backport of [Misc] Look into unnecessary conversion when checking
mult_op in get_shiftadd_cost
* Backport of [Misc] Allow REG_EQUAL for ZERO_EXTRACT
* Backport of [Misc] mark libstdc++ tests unsupported if they fail
with relocation truncated
* Backport of [Misc] Rerun loop-header-copying just before vectorization
* Backport of [Misc] Allow PLUS+immediate expression in
noce_try_store_flag_constants
* Backport of [Doc] Clarify feature modifiers {no,}{fp,simd,crypto}
Feedback and Support
Subscribe to the important Linaro mailing lists and join our IRC
channels to stay on top of Linaro development.
** Linaro Toolchain Development "mailing list":
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
** Linaro Toolchain IRC channel on irc.freenode.net at @#linaro-tcwg@
* Bug reports should be filed in bugzilla against GCC product:
http://bugs.linaro.org/enter_bug.cgi?product=GCC
* Interested in commercial support? inquire at "Linaro support":
mailto:support@linaro.org
[1]. Stable source package releases are defined as releases where the
full Linaro Toolchain validation plan is executed.
[2]. Source package snapshots are defined when the compiler is only
put through unit-testing and full validation is not performed.
1 day off (Wednesday) (2/10)
== Progress ==
* Validation
- Jenkins jobs maintenance & cleanup
- comparison of build times between old & new lab
- dedicated slave for results comparison works well
* GCC
- trunk monitoring, reported a few new failures.
- high rate of commits before e/o stage1 means
lots of patches to check
- infrastructure problems in the ST compute farm
mean a few false errors needed analysis
- looked at bug #1869, (problem with binary toolsets
on RHEL6). Made some progress
== Next ==
* Validation:
- continue preparation of switch, as dev-01 is now back
- improve reporting
* GCC:
- check Neon tests cleanup
- bug #1869
- look at how to send valuable reports to gcc-regression
* Off on Wed afternoon [1/10].
# Progress #
* Fails in gdb.threads/multiple-step-overs.exp, (TCWG-332) [1/10]
Patch V2 is posted, pending for review.
* TCWG-422, patch is committed. Done. [2/10].
* TCWG-423, patches are ready, being regression tested. [2/10]
* TCWG-433, build GDB with -fsanitize=address, and exposes many memory
issues. Some of them are fixed. [2/10].
* Upstream patch review, [1/10]
* Misc, meeting, [1/10]
# Plan #
* TCWG-423, Post patches upstream.
* Understand ST's jtag probe and help them to make use of multi-arch
with GDB.
* TCWG-433, Continue fixing memory issues exposed by
-fsanitize=address.
--
Yao
Hi Albert,
On Thu, Nov 12, 2015 at 08:20:18AM +0100, Albert ARIBAUD wrote:
> Can you provide the target name and commit ID that you are building,
> s well as the version of the toolchain that you are building with?
> Without being able to reproduce your issue, it's kind of hard to
> diagnose it.
With the explanation from Ard, I understand the thing now. But thanks
for the reply anyway.
Shawn
On 11 November 2015 at 00:45, Savolainen, Petri (Nokia - FI/Espoo) <
petri.savolainen(a)nokia.com> wrote:
>
>
> > -----Original Message-----
> > From: lng-odp [mailto:lng-odp-bounces@lists.linaro.org] On Behalf Of
> > EXT Nicolas Morey-Chaisemartin
> > Sent: Tuesday, November 10, 2015 5:13 PM
> > To: Zoltan Kiss; linaro-toolchain(a)lists.linaro.org
> > Cc: lng-odp
> > Subject: Re: [lng-odp] Runtime inlining
> >
> > As I said in the call last week, the problem is wider than that.
> >
> > ODP specifies a lot of types but not their sizes, a lot of
> > enums/defines (things like ODP_PKTIO_INVALID) but not their value
> > either.
> > For our port a lot of those values were changed for
> > performance/implementation reason. So I'm not even compatible between
> > one version of our ODP port and another one.
> >
> > The only way I can see to solve this is for ODP to fix the size of all
> > these types.
> > Default/Invalid values are not that easy, as a pointer would have a
> > completely different behaviour from structs/bitfields
> >
> > Nicolas
> >
>
> Type sizes do not need to be fixed in general, but only when an
> application is build for binary compatibility (the use case we are talking
> here). Binary compatibility and thus the fixed type sizes are defined per
> ISA.
>
> We can e.g. define a configure target (for our reference implementation ==
> linux-generic) "--binary-compatible=armv8.x" or
> "--binary-compatible=x86_64". When you build your application with that
> option, "platform dependent" types and constants would be fixed to
> pre-defined values specified in (new) ODP API arch files.
>
> So instead of building against
> odp/platform/linux-generic/include/odp/plat/queue_types.h ...
>
> typedef ODP_HANDLE_T(odp_queue_t);
> #define ODP_QUEUE_INVALID _odp_cast_scalar(odp_queue_t, 0)
> #define ODP_QUEUE_NAME_LEN 32
>
>
> ... you'd build against odp/arch/armv8.x/include/odp/queue_types.h ...
>
With the introduction of odp/arch at the top level I think we should also
move platform/linux-generic/arch to the same location
> typedef uintptr_t odp_queue_t;
> #define ODP_QUEUE_INVALID ((uintptr_t)0)
> #define ODP_QUEUE_NAME_LEN 64
>
>
> ... or odp/arch/x86_64/include/odp/queue_types.h
>
> typedef uint64_t odp_queue_t;
> #define ODP_QUEUE_INVALID ((uint64_t)0xffffffffffffffff)
> #define ODP_QUEUE_NAME_LEN 32
>
>
> For highest performance on a fixed target platform, you'd still build
> against the platform directly
>
> odp/platform/<soc_vendor_xyz>/include/odp/plat/queue_types.h
>
> typedef xyz_queue_desc_t * odp_queue_t;
> #define ODP_QUEUE_INVALID ((xyz_queue_desc_t *)0xdeadbeef)
> #define ODP_QUEUE_NAME_LEN 20
>
>
> -Petri
>
>
>
>
> _______________________________________________
> lng-odp mailing list
> lng-odp(a)lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/lng-odp
>
--
Mike Holmes
Technical Manager - Linaro Networking Group
Linaro.org <http://www.linaro.org/> *│ *Open source software for ARM SoCs
Holiday [2/10]
Juno crash analysis [2/10]
* Spent some time fiddling with kexec on AArch64
* Worked in one very specific case
* Another patch series is (apparently) coming, will look out for it
and try again
SPEC-on-Android [2/10]
* Supporting Qian on getting this working
* Wrote a readme for the repository, fixed a Makefile bug that Qian's
cross-compiler happened to tickle
Jenkins benchmarking job - TCWG-348 [1/10]
* Tested, tidied up pbl hacks to generate JSON
* Tested my pbl with Jenkins prototype jobs
* A few minor bug fixes/enhancements for pbl
LAVA jobs for uinstance - TCWG-432 [1/10]
* Reworked jobs to support uinstance, maintaining backward
compatibility as far as possible
* Started adding support to submit results to bundle stream
Misc [2/10]
* Debian FS ready to submit
* Usual meetings/mail/etc background
=Plan=
Look at doing pbl hacks properly in Fathi's in-development refactored p-b-l
Pull together Jenkins/LAVA/pbl, ready to test when uinstance is available
Write up noise control report
(If time, if patches land) have another go at crashdump
== Progress ==
o Linaro GCC (4/10)
* Delivered GCC 4.9 2015.10 snapshot
* More backports forGCC 5 2015.11
* Many instabilities on Hetzner this week
o Upstream work (2/10)
* Sanitizing gfortran testsuite
o Release tools (2/10)
* Added RCs and binaries support to our snapshot.linaro.org
publishing job
o Misc (2/10)
* Various meetings
* Some support
== Plan ==
o Track missing backports dependencies
o Continue ongoing tasks.
== This week ==
* TCWG-369 - Exploit wide add operations when appropriate for Aarch64 (4/10)
- Determined that vectorizer is failing for all targets that have
widening adds with
V8HI to V4SI support (aarch64, ia64, powerPC).
- Modified test cases to indicate expected failure with wide add
V8HI to V4SI support
- Patch sent upstream for approval
* Bugzilla 68223 - arm_[su]min_cmp pattern fails
- Resolved by reverting patch for tcwg-146 as pattern fail in some
corner cases. (3/10)
- Reverted patch checked in upstream
* Misc (1/10)
- Conference calls
* Illness, November 2nd (2/10)
== Next week ==
- TCWG-317 - Resolve lto big endian failures
== Progress ==
- Leave (2/10)
- Widening pass (TCWG-547) - 5/10
* Made the latest changes requested in the review
* Fixed bootstrap and bootstrap mis-compare for ppc64-linux-gnu
* Making uninitialized variable as anonymous ssa (as asked in review)
results in few ICEs.
* Posted updated patch for feedback
- Misc (3/10)
* started looking into LTO status
* Looked at LuaJIT for arm
* gcc/bug list
== Plan ==
* continue with widening pass based on feedback
* Look at implementing LuaJIT for aarch64
* LTO
== This week ==
* TCWG-72 (6/10)
- 5 iterations since the original patch. Changes include:
a) Integration into widening_mul patch
b) Rewriting the divmod transform so DIVMOD() is placed before the topmost
div/mod stmt
c) Removed check for widening mode and optab handler check in expand_DIVMOD
d) Fixed ICE when constant is one of the operands to div/mod stmt.
e) Fixed mis-compilation with a test-case when operands matched but in
opposite order.
f) Formatting nits and fixed test-cases.
- Richard suggested no need to check for post-domination conditions.
- Not sure on what condition to gate the transform.
Checking for availability of divmod/div/mod is not sufficient because arm
defines optab handler for mod which only matches r0 % n where n is
constant and power of 2
for other cases it's expanded via divmod libcall thru expand_divmod.
We would rather need
to check if the template for mod/div gets matched than just to check
if optab handler exists.
AFAIK this cannot be done during tree-ssa passes.
I can think of two approaches:
a) Do the transform to DIVMOD representation unconditionally in
widening_mul pass.
And then in expand_DIVMOD check if the template for mod can be matched.
If it does match then undo the transform from DIVMOD to original
representation and expand.
I am not sure how feasible it is to undo the transform at expansion
time, and start expanding the modified cfg.
b) Define a new target hook combine_divmod.
Default implementation could check for optab handler for div/mod/divmod.
and I could override it for arm-backend to additionally check if the
second operand is a constant and power of 2 and fail for this case
(since we want this to be expanded from modsi3 pattern).
Not sure if this is a good idea, I am replicating the information from
the modsi3 pattern.
If the pattern changes, the hook would also need to be changed.
* Convert ASM_FORMAT_PRIVATE_NAME to hook (2/10)
* TCWG-319 (1/10)
- Bencharmking for patch in progress
* Misc (1/10)
- Meetings
- Sync with Kugan
== Next Week ==
- Continue with TCWG-72
- Complete the patch with build, test and config-builds for
ASM_FROMAT_PRIVATE_NAME and submit upstream
- Continue benchmarking TCWG-319, TCWG-310
== Progress ==
* Buildbots (5/10)
- Some broken bots, bisecting, etc
- Helping a MIPS patch pass on ARM bot
* Maintenance (2/10)
- SciMark2 seems not to be unstable or slow any more in ARM64
- Some more investigations on Loop Load Elimination
- Profiling bigfib on APM and HiKey
* Background (3/10)
- Code review, meetings, discussions, general support, etc.
- Some FOSDEM fiddling
- Some power issues
== Progress ==
* Validation
- moved list of unstable tests to a separate repo, to make
maintenance easier (TCWG-425)
- Jenkins jobs maintenance & cleanup
- a few ABE reporting patches
- comparison of results between old & new lab
* GCC
- trunk monitoring, reported a few new failures.
- Send patch to fix vqtb[lx][34] intrinsics on aarch64_be
* Binutils
- Added a Jenkins job to build+check binutils on
a variety of configurations:
https://ci.linaro.org/view/tcwg-ci/job/tcwg-binutils/
- sent a small patch to fix a bug in the recent STM32L4XX erratum patch
== Next ==
* Validation:
- work on the switch to the new lab, once dev-01 is back online
- more tuning to avoid deadlocks
- re-measure build time on dev-01, to better tune other build jobs
* Two half day off. [2/10]
# Progress #
* TCWG-332, fails in gdb.threads/multiple-step-overs.exp. [1/10]
Testing the simpler approach suggested during the review.
* TCWG-387, done. [1/10] GDB patches are pushed in.
* TCWG-422, GNU vector extension support in ARM GDB. [2/10]
Patches are done, and being tested.
* TCWG-423, GNU vector extension support in AArch64 GDB. [2/10]
Writing patches. Find more issues for AArch64 that GDB doesn't
fully understand the AArch64 calling convention. Need more work here.
* Review ARM GDBserver software single step patch. [1/10]
* Misc, meeting, email, [1/10]
# Plan #
* Off on Wed afternoon.
* TCWG-422, post patches
* TCWG-423, continue.
--
Yao
The Linaro Toolchain Working Group is pleased to announce the availability
of the Linaro Stable Binary Toolchain GCC 5.2-2015.11-rc1
Release-Candidate Archives.
http://snapshots.linaro.org/components/toolchain/binaries/5.2-2015.11-rc1/http://snapshots.linaro.org/components/toolchain/gcc-linaro/5.2-2015.11-rc1/
These archives provide cross-toolchain executables (compiler, debugger,
linker, etc.) and shared libraries (libstdc++, libc, etc.) that target ARM
or Aarch64 GNU/Linux and bare-metal environments. The cross-toolchain
binaries execute on a Linux or MS Windows (under mingw32) host
operating-system.
Please evaluate this release-candidate for correctness. Linaro will
shortly spin the Linaro GCC 5.2-2015.11 release if this release-candidate
passes stakeholder validation.
For bugs related to this release-candidate please email
linaro-toolchain(a)lists.linaro.org or file a bug at
https://bugs.linaro.org/enter_bug.cgi?product=Linux%20Binary%20toolchain
NEWS
* GCC 5.2 2015.11-rc1
The Linaro GCC 5.2 2015.11-rc1 binary toolchain release-candidate is
built from the Linaro GCC-5.2-2015.11 release-candidate source archive.
The Linaro GCC-5.2-2015.11 release source archive is derived from the same
sources as the Linaro GCC-5.2-2015.10 snapshot source archive.
--
Ryan S. Arnold
Linaro Toolchain Working Group - Engineering Manager
www.linaro.org
Dear List,
I'm new to this list and have some questions.
Looking at the created code of GCC on ARMv8, we noticed some areas where there is room for performance improvements.
I assume that these items might already be noticed by you guys.
For example:
1) We noticed that when writing typical DGEMM like code, GCC includes unnecessary DUP instruction
2) GCC seems unwilling to use LDP loads
3) For optimal FPU performance on some A57 its needed to interleave instruction working on ODD and EVEN registers
GCC seem not properly support this. Here sometimes 100% performance increase could be reached by different instruction interleaving.
4) Some work loops highly benefit of interleaving of FPU instructinons and loads.
GCC seems to likes to re-arrange the code so that most or all loads are put on top of the loop.
This can reduce the performance of a well written workloop significantly.
I have no patches to fix this.
But I can produce C- code and ASM output which will show these performance issues.
Please tell me what the next recommended step will be now.
Are all these items known already, or shall I provide code examples to further explain them?
Kind regards
Gunnar von Boehn
== Progress ==
* Validation
- comparing results and build times between the 2 labs
- tuning jobs scheduling to avoid deadlocks
* GCC trunk monitoring
- lots of validation results to check after 1 week of holidays :-)
- a few regressions/new failures/wrong tests reported
* Backports
- a few reviews
== Next ==
* Infrastructure/Validation
* GCC dev: try to fix vqtbl intrinsics for aarch64_be before e/o stage1
The Linaro Toolchain Working Group (TCWG) is pleased to announce the
2015.10 snapshot of the Linaro GCC 4.9 source package.
This snapshot[1] is based on FSF GCC 4.9.4-pre+svn229467 and includes
performance improvements and bug fixes backported from mainline GCC.
This snapshot contents will be part of the 2015.11 stable [1]
quarterly release.
This snapshot tarball is available on:
http://snapshots.linaro.org/components/toolchain/gcc-linaro/4.9-2015.10/
Interesting changes in this GCC source package snapshot include:
* Updates to GCC 4.9.4-pre+svn229467
* Backport of [Bugfix] PR tree-optimization/65735
* Backport of [Bugfix] PR tree-optimization/65177
* Backport of [Bugfix] PR tree-optimization/65048
Feedback and Support
Subscribe to the important Linaro mailing lists and join our IRC
channels to stay on top of Linaro development.
** Linaro Toolchain Development "mailing list":
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
** Linaro Toolchain IRC channel on irc.freenode.net at @#linaro-tcwg@
* Bug reports should be filed in bugzilla against GCC product:
http://bugs.linaro.org/enter_bug.cgi?product=GCC
* Interested in commercial support? inquire at "Linaro support":
mailto:support@linaro.org
[1]. Stable source package releases are defined as releases where the
full Linaro Toolchain validation plan is executed.
[2]. Source package snapshots are defined when the compiler is only
put through unit-testing and full validation is not performed.
== Progress ==
o Linaro GCC (9/10)
* Backports and reviews for our GCC 5 2015.11 snapshot
* FSF branch merge and needed backports for our GCC 4.9 2015.10 snapshot
o Misc (1/10)
* Various meetings
== Plan ==
o Complete 4.9 snapshot
Noise control experiments - TCWG-358 [3/10]
* Some analysis of data to date
Debian filesystem - TCWG-360 [3/10]
* Got stuck on LAVA interactions
* Now booting-to-LAVA-usability, needs some cleanup and testing with
real benchmark runs
Benchmarking-via-Jenkins - TCWG-348 [1/10]
* Picked back up on understanding that LAVA uinstance is a-coming
* Hacked pbl.py (post-build-lava) to generate suitable JSON
** As a bonus, this can work as a CLI job-submission tool
=Plan=
Holiday Friday (pending approval)
Set up crashdumping on my Juno, try to learn why it crashes
Finish Debian filesystem
Get Jenkins generating and submitting jobs suitable for uinstance
Write up noise control report (probably will get bumped to next week)
== This week ==
* TCWG-317 - Exploit wide add operations when appropriate for Aarch32 (3/10)
- Could not reproduce test failure in qemu
- Waiting on feedback from Maxim who is running patch on Linaro
validation
* TCWG-369 - Exploit wide add operations when appropriate for Aarch64 (4/10)
- Debugging tree vectorizer to determine why loops with no wide add
operations are no longer being vectorized
* TCWG 146 - Debugging regression on arm big endian with Linaro 5 branch
(2/10)
* Misc (1/10)
- Conference calls
== Next week ==
- TCWG-317 - Resolve lto big endian failures
- TCWG-369 - Identify why loops are not being vectorized
- TCWG-146 - Resolve big endian failures
== This Week ==
* TCWG-72 / PR43721 (8/10)
- Completed writing expand_DIVMOD() and submitted patch upstream
- Reworked patch according to Richard Biener's comments.
(http://people.linaro.org/~prathamesh.kulkarni/pr43721-patch-v2.diff)
- Wrote test-cases.
* TCWG-319 (1/10)
- Benchmark run without patch complete for "int" benchmarks on a53, a57
- Benchmark run with patch in progress for "int" benchmarks on a53, a57
* Misc (1/10)
- Meetings
== Next Week ==
- Continue with TCWG-72, TCWG-310li, TCWG-319 benchmarking
# Progress #
* TCWG-332, [1/10], patch is posted for review.
* TCWG-187, [1/10], looks like a kernel (~3.4) bug on setting VFP
registers through ptrace. Fails go away after I upgrade the kernel.
* TCWG-387, [2/10], binutils patch is committed, while GDB
patches are being reviewed.
* TCWG-422, [3/10], Read AAPCS and gcc source code to understand the
calling convention. On going.
* FSF patch review, [2/10]
** Review patch set "all-stop on top of non-stop for remote".
** Review ARM fast tracepoint patches.
** Reopen PR 15564, as the fail isn't fixed.
* Misc, meeting, [1/10]
# Plan #
* TCWG-422
* Support ST on using AArch64 multi-arch GDB if needed.
* Review ARM GDBserver software single step patch.
--
Yao