Hi all,
I am working on enhancing the vectorizer cost model for NEON.

I note that related work has been discussed:
- 2011-05-26, https://blueprints.launchpad.net/gcc-linaro/+spec/neon-regressions-11.11
- 2012-06-12, https://blueprints.launchpad.net/gcc-linaro/+spec/vectorizer-cost-model
- 2012-06-24, https://blueprints.launchpad.net/gcc-linaro/+spec/refactor-backend-cost-mode...

So, is there any further work on this topic? Who is working on this? Maybe I can make some contributions to this problem.
Best regards, Tao Wang
On 21 November 2012 22:45, 王韬 wangtao2010@ict.ac.cn wrote:
Hi there. Thanks for your interest. I find the vectoriser quite interesting as our team is all about performance and, when it kicks in, the vectoriser can give big gains for little developer effort.
The vectoriser currently does quite well and has few regressions on the SPEC and EEMBC benchmarks that we normally run. The next step is tuning peeling as it causes the few remaining regressions and generally doesn't benefit us on ARM. The vectoriser cost model is interesting past that as it helps the vectoriser decide if vectorisation for a certain piece of code is profitable.
We develop upstream on the gcc-patches@gcc.gnu.org list. That's a good place to discuss anything. At the moment I don't have any vectoriser experts on my team but we'll help where we can.
-- Michael
Hi Michael,

I also note that some EEMBC test cases show regressions, and I'm trying to figure out the reason by profiling.

I note that:
- In some cases, ARMCC can vectorize loops that GCC and Linaro GCC cannot.
- In the test cases ip_pktcheckb**(eembc/networking), ARMCC, GCC, and Linaro GCC all fail to make the right vectorization decision. On x86, GCC has a cost model for the x86 instructions (the function ix86_builtin_vectorization_cost), but there is no specific counterpart for ARM; only the default (default_builtin_vectorization_cost) is used.

So, my preliminary plan is:
Step 1. Implement an ARM-specific cost modeling function and tune it.
Step 2. Enable GCC and Linaro GCC to vectorize the loops that ARMCC can handle now.

Besides, I'd like to know your opinion on this plan and how you will tune peeling.
Thanks.
Wang, Tao
-----Original Messages----- From: "Michael Hope" michael.hope@linaro.org Sent Time: Monday, November 26, 2012 To: "王韬" wangtao2010@ict.ac.cn Cc: linaro-neon-opt@lists.linaro.org Subject: Re: [Linaro-neon-opt] how to enhance vectorizer cost model for neon
On 26 November 2012 16:54, 王韬 wangtao2010@ict.ac.cn wrote:
Hi Michael, I also note that some EEMBC test cases show regressions, and I'm trying to figure out the reason by profiling.
I note that:
- In some cases, ARMCC can vectorize loops that GCC and Linaro GCC cannot.
- In the test cases ip_pktcheckb**(eembc/networking), ARMCC, GCC, and Linaro GCC all fail to make the right vectorization decision. On x86, GCC has a cost model for the x86 instructions (the function ix86_builtin_vectorization_cost), but there is no specific counterpart for ARM; only the default (default_builtin_vectorization_cost) is used.
So, my preliminary plan is: Step 1. Implement an ARM-specific cost modeling function and tune it.
Sounds good.
Step 2. Enable GCC and Linaro GCC to vectorize loops that ARMCC can handle now.
Worth checking.
Besides, I'd like to know your opinion on this plan and how you will tune peeling.
The first step with peeling is to disable it and show that no benchmarks regress. Past that we can look into specifics, as a vld1 with an alignment assertion is faster than an unaligned vld1.
-- Michael