On 26 November 2012 16:54, 王韬 <wangtao2010@ict.ac.cn> wrote:
Hi Michael,

I have also noticed regressions in some EEMBC test cases, and I'm trying to find the cause by profiling. So far I have found that:
- In some cases, ARMCC can vectorize loops that neither GCC nor Linaro GCC can.
- In the ip_pktcheckb** test cases (eembc/networking), none of ARMCC, GCC or Linaro GCC makes the right vectorization decision. On x86, GCC has a cost model for x86 instructions (the function ix86_builtin_vectorization_cost), but there is no ARM-specific counterpart; only the generic default (default_builtin_vectorization_cost) is used.
So my preliminary plan is:
Step 1. Implement an ARM-specific vectorization cost function and tune it.
Sounds good.
Step 2. Enable GCC and Linaro GCC to vectorize the loops that ARMCC can already vectorize.
Worth checking.
I'd also like to know your opinion of this plan, and how you plan to tune loop peeling.
The first step with peeling is to disable it and show that no benchmarks regress. After that we can look into specifics, since a vld1 with an alignment assertion is faster than an unaligned vld1.
-- Michael