Thoughts?
Thanx, Paul
----- Forwarded message from Jesper Juhl jj@chaosbits.net -----
Date: Sun, 6 Mar 2011 00:49:58 +0100 (CET)
From: Jesper Juhl jj@chaosbits.net
To: linux-kernel@vger.kernel.org
cc: Andrew Morton akpm@linux-foundation.org, "Paul E. McKenney" paulmck@linux.vnet.ibm.com, Ingo Molnar mingo@elte.hu, Daniel Lezcano daniel.lezcano@free.fr, Eric Paris eparis@redhat.com, Roman Zippel zippel@linux-m68k.org
Subject: [PATCH][RFC] CC_OPTIMIZE_FOR_SIZE should default to N
I believe that the majority of systems the kernel is built for want a -O2 compiled kernel. Optimizing for size (-Os) is mainly beneficial for embedded systems and systems with very small CPU caches (correct me if I'm wrong).
So it seems wrong to me that CC_OPTIMIZE_FOR_SIZE defaults to 'y' and recommends saying 'Y' if unsure. I believe it should default to 'n' and recommend that if unsure. People who benefit from -Os know who they are and can enable the option if needed/wanted - the majority shouldn't select this. Right?
Signed-off-by: Jesper Juhl jj@chaosbits.net
---
 Kconfig |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index be788c0..7e16268 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -886,12 +886,12 @@ endif
 
 config CC_OPTIMIZE_FOR_SIZE
 	bool "Optimize for size"
-	default y
+	default n
 	help
 	  Enabling this option will pass "-Os" instead of "-O2" to gcc
 	  resulting in a smaller kernel.
 
-	  If unsure, say Y.
+	  If unsure, say N.
 
 config SYSCTL
 	bool
To quote the GCC manual:
-Os Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size. -Os disables the following optimization flags:
-falign-functions -falign-jumps -falign-loops -falign-labels -freorder-blocks -freorder-blocks-and-partition -fprefetch-loop-arrays -ftree-vect-loop-version
That said (and unless there are other undocumented differences), it would seem to take some expertise to comment on the trade-off between memory effects (cache misses, TLB, etc.) and instruction-level effects of having these optimizations off.
I am certainly not that expert, but I suspect that, given the memory bus speeds typical ARM hardware has, we'd want -Os over -O2.
Regards, Tom
On Sat, Mar 5, 2011 at 7:55 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote:
> Thoughts?
>
> Thanx, Paul
>
> ----- Forwarded message from Jesper Juhl jj@chaosbits.net -----
>
> Date: Sun, 6 Mar 2011 00:49:58 +0100 (CET)
> From: Jesper Juhl jj@chaosbits.net
> To: linux-kernel@vger.kernel.org
> cc: Andrew Morton akpm@linux-foundation.org, "Paul E. McKenney" paulmck@linux.vnet.ibm.com, Ingo Molnar mingo@elte.hu, Daniel Lezcano daniel.lezcano@free.fr, Eric Paris eparis@redhat.com, Roman Zippel zippel@linux-m68k.org
> Subject: [PATCH][RFC] CC_OPTIMIZE_FOR_SIZE should default to N
>
> I believe that the majority of systems the kernel is built for want a -O2 compiled kernel. Optimizing for size (-Os) is mainly beneficial for embedded systems and systems with very small CPU caches (correct me if I'm wrong).
>
> So it seems wrong to me that CC_OPTIMIZE_FOR_SIZE defaults to 'y' and recommends saying 'Y' if unsure. I believe it should default to 'n' and recommend that if unsure. People who benefit from -Os know who they are and can enable the option if needed/wanted - the majority shouldn't select this. Right?
>
> Signed-off-by: Jesper Juhl jj@chaosbits.net
> ---
>  Kconfig |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index be788c0..7e16268 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -886,12 +886,12 @@ endif
>
>  config CC_OPTIMIZE_FOR_SIZE
>  	bool "Optimize for size"
> -	default y
> +	default n
>  	help
>  	  Enabling this option will pass "-Os" instead of "-O2" to gcc
>  	  resulting in a smaller kernel.
>
> -	  If unsure, say Y.
> +	  If unsure, say N.
>
>  config SYSCTL
>  	bool
>
> --
> Jesper Juhl jj@chaosbits.net http://www.chaosbits.net/
> Plain text mails only, please.
> Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
>
> ----- End forwarded message -----
On Tue, Mar 8, 2011 at 8:47 AM, Tom Gall tom.gall@linaro.org wrote:
> To quote the GCC manual:
>
> -Os Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size. -Os disables the following optimization flags:
>
> -falign-functions -falign-jumps -falign-loops -falign-labels -freorder-blocks -freorder-blocks-and-partition -fprefetch-loop-arrays -ftree-vect-loop-version
>
> That said (and unless there are other undocumented differences), it would seem to take some expertise to comment on the trade-off between memory effects (cache misses, TLB, etc.) and instruction-level effects of having these optimizations off.
>
> I am certainly not that expert, but I suspect that, given the memory bus speeds typical ARM hardware has, we'd want -Os over -O2.
I don't know about the kernel, but here's the difference for some other programs:
- pybench: -Os is 24% slower than -O2
- skia: -Os is 18% slower than -O2
- CoreMark is similar (I've lost the numbers)
...so you're going to need a large bandwidth saving to beat the core speed improvement of -O2.
-- Michael
On Tue, Mar 08, 2011 at 09:07:10AM +1300, Michael Hope wrote:
> On Tue, Mar 8, 2011 at 8:47 AM, Tom Gall tom.gall@linaro.org wrote:
>> To quote the GCC manual:
>>
>> -Os Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size. -Os disables the following optimization flags:
>>
>> -falign-functions -falign-jumps -falign-loops -falign-labels -freorder-blocks -freorder-blocks-and-partition -fprefetch-loop-arrays -ftree-vect-loop-version
>>
>> That said (and unless there are other undocumented differences), it would seem to take some expertise to comment on the trade-off between memory effects (cache misses, TLB, etc.) and instruction-level effects of having these optimizations off.
>>
>> I am certainly not that expert, but I suspect that, given the memory bus speeds typical ARM hardware has, we'd want -Os over -O2.
>
> I don't know about the kernel, but here's the difference for some other programs:
> - pybench: -Os is 24% slower than -O2
> - skia: -Os is 18% slower than -O2
> - CoreMark is similar (I've lost the numbers)
>
> ...so you're going to need a large bandwidth saving to beat the core speed improvement of -O2.
So no objections to Jesper Juhl's proposal to make -Os not be default, then?
Thanx, Paul
On 03/07/2011 01:47 PM, Tom Gall wrote:
> To quote the GCC manual:
>
> -Os Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size. -Os disables the following optimization flags:
>
> -falign-functions -falign-jumps -falign-loops -falign-labels -freorder-blocks -freorder-blocks-and-partition -fprefetch-loop-arrays -ftree-vect-loop-version
>
> That said (and unless there are other undocumented differences), it would seem to take some expertise to comment on the trade-off between memory effects (cache misses, TLB, etc.) and instruction-level effects of having these optimizations off.
>
> I am certainly not that expert, but I suspect that, given the memory bus speeds typical ARM hardware has, we'd want -Os over -O2.
In my experience, -Os also turns off inlining of functions. The case I have looked at is the swab32 function in U-Boot. With -Os (and -march=armv7-a), I get calls to this function:
__fswab32:
	rev	r0, r0
	bx	lr
With -O2, I get rev inlined.
Rob