Hi all,
I've recently become aware that a few packages are causing alignment faults on ARM, and are relying on the alignment fixup emulation code in the kernel in order to work.
Such faults are very expensive in terms of CPU cycles, and can generally only result from wrong code (for example, C/C++ code which violates the relevant language standards, assembler which makes invalid assumptions, or functions called with misaligned pointers due to other bugs).
Currently, on a natty Ubuntu desktop image I observe no faults except from firefox and mono-based apps (see below).
As part of the general effort to make open source on ARM better, I think it would be great if we can disable the alignment fixups (or at least enable logging) and work with upstreams to get the affected packages fixed.
For release images we might want to be more forgiving, but for development we have the option of being more aggressive.
The number of affected packages and bugs appears small enough for the fixing effort to be feasible, without temporarily breaking whole distros.
For ARM, we can achieve the goal by adding one of the following to the default kernel command-line options:
alignment=3 Fix up each alignment fault, but also log the faulting address and name of the offending process to dmesg.
alignment=5 Pass each alignment fault to the user process as SIGBUS (fatal by default) and log the faulting address and name of the offending process to dmesg.
Fault statistics can also be obtained at runtime by reading /proc/cpu/alignment.
For other architectures, there may be other arch-specific ways of achieving something similar.
I'd be interested in people's views on this.
Cheers ---Dave
More background:
Two known instances of misbehaving userland apps are:
1) firefox-4.x (bug report pending)
A char array declared as a container for C++ objects is cast directly to an object pointer type and dereferenced, without ensuring proper alignment.
By sheer luck, the presence of an extra member in the containing class in firefox-3.x means that the char array has a different alignment and so the faults don't occur.
2) gtk-sharp2 (https://bugs.launchpad.net/bugs/798315) (affecting mono-based GUI apps such as banshee and tomboy)
char pointers are cast to 64-bit integer pointers and dereferenced, as an attempt at comparing string prefixes faster.
These apps typically generate hundreds or thousands of faults per session rather than millions, but that is still quite a lot of noise in syslog.
I think these are likely to be representative of typical causes of alignment faults: i.e., attempted optimisations which break the rules of the language, and which only show in certain builds, or as side-effects of routine maintenance.
Code like that is going to be a massive own goal for performance on ARM and other architectures which fault unaligned accesses, since the resulting faults are likely to cost thousands of cycles per instance.
On Friday 17 June 2011 14:10:11 Dave Martin wrote:
As part of the general effort to make open source on ARM better, I think it would be great if we can disable the alignment fixups (or at least enable logging) and work with upstreams to get the affected packages fixed.
For release images we might want to be more forgiving, but for development we have the option of being more aggressive.
Yes, makes sense.
These apps typically generate hundreds or thousands of faults per session, but not millions, but it's still quite a lot of noise in syslog.
Then we should make sure that an appropriate rate limiting is in place, like the patch below (untested) would do.
Arnd
8<--- ARM: Add rate-limiting to alignment trap printk
Malicious or buggy applications can easily flood syslog by accessing unaligned data. Better use printk_ratelimited for the warning to prevent this while also allowing us to see the important output.
Signed-off-by: Arnd Bergmann arnd@arndb.de
diff --git a/arch/arm/mm/alignment.c b/arch/arm/mm/alignment.c
index 724ba3b..462b98d 100644
--- a/arch/arm/mm/alignment.c
+++ b/arch/arm/mm/alignment.c
@@ -873,9 +873,9 @@ do_alignment(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 		ai_user += 1;
 
 	if (ai_usermode & UM_WARN)
-		printk("Alignment trap: %s (%d) PC=0x%08lx Instr=0x%0*lx "
-		       "Address=0x%08lx FSR 0x%03x\n", current->comm,
-			task_pid_nr(current), instrptr,
+		printk_ratelimited("Alignment trap: %s (%d) PC=0x%08lx "
+		       "Instr=0x%0*lx Address=0x%08lx FSR 0x%03x\n",
+		       current->comm, task_pid_nr(current), instrptr,
 		       isize << 1,
 		       isize == 2 ? tinstr : instr,
 		       addr, fsr);
On Fri, 17 Jun 2011, Arnd Bergmann wrote:
On Friday 17 June 2011 14:10:11 Dave Martin wrote:
As part of the general effort to make open source on ARM better, I think it would be great if we can disable the alignment fixups (or at least enable logging) and work with upstreams to get the affected packages fixed.
For release images we might want to be more forgiving, but for development we have the option of being more aggressive.
Yes, makes sense.
These apps typically generate hundreds or thousands of faults per session, but not millions, but it's still quite a lot of noise in syslog.
Then we should make sure that an appropriate rate limiting is in place, like the patch below (untested) would do.
Arnd
8<--- ARM: Add rate-limiting to alignment trap printk
Malicious or buggy applications can easily flood syslog by accessing unaligned data. Better use printk_ratelimited for the warning to prevent this while also allowing us to see the important output.
No. The logging doesn't happen by default. You have to set the appropriate flag via the kernel cmdline or at runtime by echoing that flag to /proc/cpu/alignment which can be done by root only. This is therefore a debugging facility that should not be rate limited otherwise it loses its purpose.
The only effective rate limiting configuration I would recommend is to SIGBUS misaligned accesses by default. And that's also supported already with the right flag.
Nicolas
On Friday 17 June 2011, Nicolas Pitre wrote:
On Fri, 17 Jun 2011, Arnd Bergmann wrote:
On Friday 17 June 2011 14:10:11 Dave Martin wrote:
As part of the general effort to make open source on ARM better, I think it would be great if we can disable the alignment fixups (or at least enable logging) and work with upstreams to get the affected packages fixed.
The only effective rate limiting configuration I would recommend is to SIGBUS misaligned accesses by default. And that's also supported already with the right flag.
So should we change the default in the prerelease kernels to enable SIGBUS? The immediate result of that would be to break firefox, which would cause a lot of questions on the mailing list.
Arnd
On Sat, 18 Jun 2011, Arnd Bergmann wrote:
On Friday 17 June 2011, Nicolas Pitre wrote:
On Fri, 17 Jun 2011, Arnd Bergmann wrote:
On Friday 17 June 2011 14:10:11 Dave Martin wrote:
As part of the general effort to make open source on ARM better, I think it would be great if we can disable the alignment fixups (or at least enable logging) and work with upstreams to get the affected packages fixed.
The only effective rate limiting configuration I would recommend is to SIGBUS misaligned accesses by default. And that's also supported already with the right flag.
So should we change the default in the prerelease kernels to enable SIGBUS? The immediate result of that would be to break firefox, which would cause a lot of questions on the mailing list.
Only if we really plan on fixing Firefox, and upstream is interested in accepting the fix. Otherwise there is no point, especially when it is possible for those actually interested in this issue to change the misaligned access behavior at run time for themselves.
Nicolas
On Sat, Jun 18, 2011 at 10:48:16AM -0400, Nicolas Pitre wrote:
On Sat, 18 Jun 2011, Arnd Bergmann wrote:
On Friday 17 June 2011, Nicolas Pitre wrote:
On Fri, 17 Jun 2011, Arnd Bergmann wrote:
On Friday 17 June 2011 14:10:11 Dave Martin wrote:
As part of the general effort to make open source on ARM better, I think it would be great if we can disable the alignment fixups (or at least enable logging) and work with upstreams to get the affected packages fixed.
The only effective rate limiting configuration I would recommend is to SIGBUS misaligned accesses by default. And that's also supported already with the right flag.
So should we change the default in the prerelease kernels to enable SIGBUS? The immediate result of that would be to break firefox, which would cause a lot of questions on the mailing list.
Only if we really plan on fixing Firefox, and upstream is interested in accepting the fix. Otherwise there is no point, especially when it is possible for those actually interested in this issue to change the misaligned access behavior at run time for themselves.
It looks like the maemo guys beat us to this whole exercise.
This bug has already been reported and fixed upstream.
https://bugzilla.mozilla.org/show_bug.cgi?id=634594
It looks like it's fixed in mozilla-5.0 (and Ubuntu oneiric)
Cheers ---Dave
On 06/17/2011 01:10 PM, Somebody in the thread at some point said:
Hi -
I've recently become aware that a few packages are causing alignment faults on ARM, and are relying on the alignment fixup emulation code in the kernel in order to work.
Just a FYI a lot of later ARM chips are solving alignment fixups in hardware in the Bus Interface Unit, so the problems won't show up in kernel stats.
Such faults are very expensive in terms of CPU cycles, and can generally only result from wrong code (for example, C/C++ code which violates the relevant language standards, assembler which makes invalid assumptions, or functions called with misaligned pointers due to other bugs).
Agreed it's usually evidence of something broken and / or evil in the code.
There is still going to be a small cost even with hardware fixup, so this is very much worth solving despite its "becoming invisible" because the chips are hiding / solving it already.
-Andy
On 06/17/2011 08:11 AM, Andy Green wrote:
On 06/17/2011 01:10 PM, Somebody in the thread at some point said:
Hi -
I've recently become aware that a few packages are causing alignment faults on ARM, and are relying on the alignment fixup emulation code in the kernel in order to work.
Just a FYI a lot of later ARM chips are solving alignment fixups in hardware in the Bus Interface Unit, so the problems won't show up in kernel stats.
Such faults are very expensive in terms of CPU cycles, and can generally only result from wrong code (for example, C/C++ code which violates the relevant language standards, assembler which makes invalid assumptions, or functions called with misaligned pointers due to other bugs).
Agreed it's usually evidence of something broken and / or evil in the code.
There is still going to be a small cost even with hardware fixup, so this is very much worth solving despite its "becoming invisible" because the chips are hiding / solving it already.
But I believe that h/w feature is turned off in Linux by default. You have to add noalign to the kernel command line to enable.
Rob
On 06/17/2011 08:17 PM, Somebody in the thread at some point said:
On 06/17/2011 08:11 AM, Andy Green wrote:
On 06/17/2011 01:10 PM, Somebody in the thread at some point said:
Hi -
I've recently become aware that a few packages are causing alignment faults on ARM, and are relying on the alignment fixup emulation code in the kernel in order to work.
Just a FYI a lot of later ARM chips are solving alignment fixups in hardware in the Bus Interface Unit, so the problems won't show up in kernel stats.
Such faults are very expensive in terms of CPU cycles, and can generally only result from wrong code (for example, C/C++ code which violates the relevant language standards, assembler which makes invalid assumptions, or functions called with misaligned pointers due to other bugs).
Agreed it's usually evidence of something broken and / or evil in the code.
There is still going to be a small cost even with hardware fixup, so this is very much worth solving despite its "becoming invisible" because the chips are hiding / solving it already.
But I believe that h/w feature is turned off in Linux by default. You have to add noalign to the kernel command line to enable.
I think you'll find the hardware fixups are enabled by default on CPUs with that design, quite possibly it can't be turned off.
This test was done on a Panda:
root@linaro:~# cat /proc/cmdline
console=tty0 console=ttyO2,115200n8 earlycon=ttyO2,115200n8 earlyprintk=1 root=/dev/mmcblk0p2 rootwait ro fixrtc vram=32M omapfb.vram=0:16M,1:16M mem=456M
ie, nothing about "noalign".
Test code crafted to blow alignment faults on the dereference:
#include <stdio.h>
#include <string.h>

int main(int argc, char * argv[])
{
        char buf[8];
        void *v = &buf[1];
        unsigned int *p = (unsigned int *)v;

        strcpy(buf, "abcdefg");

        printf("0x%08x\n", *p);

        return 0;
}
Default case with soft fixup enabled:
root@linaro:~# cat /proc/cpu/alignment
User: 0
System: 0
Skipped: 0
Half: 0
Word: 0
DWord: 0
Multi: 0
User faults: 2 (fixup)

root@linaro:~# gcc test.c
root@linaro:~# ./a.out
0x65646362
Test case with soft fixup disabled:
root@linaro:~# echo 0 > /proc/cpu/alignment
root@linaro:~# cat /proc/cpu/alignment
User: 0
System: 0
Skipped: 0
Half: 0
Word: 0
DWord: 0
Multi: 0
User faults: 0 (ignored)

root@linaro:~# ./a.out
0x65646362
ie, the result is always fixed up on OMAP4 and kernel fixup code never gets an exception even.
-Andy
There is still going to be a small cost even with hardware fixup, so this is very much worth solving despite its "becoming invisible" because the chips are hiding / solving it already.
But I believe that h/w feature is turned off in Linux by default. You have to add noalign to the kernel command line to enable.
I think you'll find the hardware fixups are enabled by default on CPUs with that design, quite possibly it can't be turned off.
The situation is much more complicated than that. There are two completely different models for misaligned accesses (v6 and pre-v6). v7 cores are only required to support the former.
However even under the v6 model some instructions will fault on misaligned addresses, and the CPU may be configured to fault many others. The exact behavior depends on the particular instruction chosen by the compiler. I don't know whether Linux currently knows how to enable alignment checking on v6/v7 hardware.
int main(int argc, char * argv[]) {
char buf[8]; void *v = &buf[1]; unsigned int *p = (unsigned int *)v;
This does not (reliably) do what you expect. The compiler need not align buf.
Paul
On Fri, 17 Jun 2011, Paul Brook wrote:
There is still going to be a small cost even with hardware fixup, so this is very much worth solving despite its "becoming invisible" because the chips are hiding / solving it already.
But I believe that h/w feature is turned off in Linux by default. You have to add noalign to the kernel command line to enable.
I think you'll find the hardware fixups are enabled by default on CPUs with that design, quite possibly it can't be turned off.
The situation is much more complicated than that. There are two completely different models for misaligned accesses (v6 and pre-v6). v7 cores are only required to support the former.
However even under the v6 model some instructions will fault on misaligned addresses, and the CPU may be configured to fault many others. The exact behavior depends on the particular instruction chosen by the compiler. I don't know whether Linux currently knows how to enable alignment checking on v6/v7 hardware.
It does. Has done for a long time. Please see arch/arm/mm/alignment.c:
static int __init alignment_init(void)
{
	[...]
	/*
	 * ARMv6 and later CPUs can perform unaligned accesses for
	 * most single load and store instructions up to word size.
	 * LDM, STM, LDRD and STRD still need to be handled.
	 *
	 * Ignoring the alignment fault is not an option on these
	 * CPUs since we spin re-faulting the instruction without
	 * making any progress.
	 */
	if (cpu_architecture() >= CPU_ARCH_ARMv6 && (cr_alignment & CR_U)) {
		cr_alignment &= ~CR_A;
		cr_no_alignment &= ~CR_A;
		set_cr(cr_alignment);
		ai_usermode = UM_FIXUP;
	}
And if you look at arch/arm/mm/proc-v6.S in __v6_setup, you'll find out that the CR_U bit is set.
int main(int argc, char * argv[]) {
char buf[8]; void *v = &buf[1]; unsigned int *p = (unsigned int *)v;
This does not (reliably) do what you expect. The compiler need not align buf.
Printing the value of p should clarify this.
And, as we can see above, the "simple" accesses are left to the hardware to fix up. However, if the misaligned access is performed using a 64-bit value pointer, then the kernel will trap an exception and the access will be simulated.
Nicolas
char buf[8]; void *v = &buf[1]; unsigned int *p = (unsigned int *)v;
This does not (reliably) do what you expect. The compiler need not align buf.
Printing the value of p should clarify this.
And, as we can see above, the "simple" accesses are left to the hardware to fix up. However, if the misaligned access is performed using a 64-bit value pointer, then the kernel will trap an exception and the access will be simulated.
I think you've missed my point. gcc may (though unlikely in this case) choose to place buf at an odd address. In which case p will happen to be properly aligned.
I'm not sure where you get "64-bit value pointer" from. *p is only a word sized access, and memcpy is defined in terms of bytes so will only be promoted to wider accesses when the compiler believes it is safe.
Paul
On Sat, 18 Jun 2011, Paul Brook wrote:
char buf[8]; void *v = &buf[1]; unsigned int *p = (unsigned int *)v;
This does not (reliably) do what you expect. The compiler need not align buf.
Printing the value of p should clarify this.
And, as we can see above, the "simple" accesses are left to the hardware to fix up. However, if the misaligned access is performed using a 64-bit value pointer, then the kernel will trap an exception and the access will be simulated.
I think you've missed my point. gcc may (though unlikely in this case) choose to place buf at an odd address. In which case p will happen to be properly aligned.
Sorry for being too vague.
My point is to print the value of p i.e. the actual address used to perform the access, which would confirm that the access is truly misaligned or not. That won't force any particular alignment on the buffer obviously, but at least this would clear any doubt as to the validity of the test.
I'm not sure where you get "64-bit value pointer" from. *p is only a word sized access, and memcpy is defined in terms of bytes so will only be promoted to wider accesses when the compiler believes it is safe.
Again I probably was too vague. So let me provide the actual code modified from Andy's expressing what I mean:
int main(int argc, char * argv[])
{
        char buf[8];
        void *v = &buf[1];
        unsigned int *p = (unsigned int *)v;

        strcpy(buf, "abcdefg");

        printf("*%p = 0x%08x\n", p, *p);

        return 0;
}
That's the original, modified to print the actual address used, which should confirm there is actually a misaligned access performed. And in this case, confirmed by the kernel code I quoted previously, the hardware will perform the misaligned access by itself on ARMv6 and above.
Now, if we use this code instead:
int main(int argc, char * argv[])
{
        char buf[8];
        void *v = &buf[1];
        unsigned long long *p = (unsigned long long *)v;

        strcpy(buf, "abcdefg");

        printf("*%p = 0x%016llx\n", p, *p);

        return 0;
}
In this case the kernel alignment trap will be involved, and the stats in /proc/cpu/alignment will increase, as the hardware won't perform the access automatically here.
In both cases the result would be what people expect, although the second case will be far more expensive.
Nicolas
On Sat, 18 Jun 2011, Nicolas Pitre wrote:
int main(int argc, char * argv[])
{
        char buf[8];
        void *v = &buf[1];
        unsigned int *p = (unsigned int *)v;

        strcpy(buf, "abcdefg");
        printf("*%p = 0x%08x\n", p, *p);
        return 0;
}
Obviously, there is a buffer overflow here, so the buf array should be enlarged.
Nicolas
On 06/17/2011 11:53 PM, Somebody in the thread at some point said:
Hi -
int main(int argc, char * argv[]) {
char buf[8]; void *v =&buf[1]; unsigned int *p = (unsigned int *)v;
This does not (reliably) do what you expect. The compiler need not align buf.
What? Somebody complaining my code does not blow enough faults and exceptions? ^^
If I retry the same test with this, which is definitely proof against such doubts -->
#include <stdio.h>
#include <string.h>

int main(int argc, char * argv[])
{
        char buf[8];
        void *v = &buf[1];
        void *v1 = &buf[2];
        unsigned int *p = (unsigned int *)v;
        unsigned int *p1 = (unsigned int *)v1;

        strcpy(buf, "abcdefg");

        printf("0x%08x\n", *p);
        printf("0x%08x\n", *p1);

        return 0;
}
I get
root@linaro:~# echo 2 > /proc/cpu/alignment
root@linaro:~# ./a.out
0x65646362
0x66656463
root@linaro:~# echo 0 > /proc/cpu/alignment
root@linaro:~# ./a.out
0x65646362
0x66656463
ie, it is still always fixed up.
Let's not lose sight of the point of the thread. Dave Martin wants to root out the remaining alignment faults in userland, which is a great idea. I was warning him that, depending on what he tests on (e.g. Panda), by default he won't see any alignment faults reach the soft fixup code, which is what would let him get a signal and find the bad code in gdb. And this code does prove that to be the case.
-Andy
On Fri, Jun 17, 2011 at 11:53:21PM +0100, Paul Brook wrote:
There is still going to be a small cost even with hardware fixup, so this is very much worth solving despite its "becoming invisible" because the chips are hiding / solving it already.
But I believe that h/w feature is turned off in Linux by default. You have to add noalign to the kernel command line to enable.
I think you'll find the hardware fixups are enabled by default on CPUs with that design, quite possibly it can't be turned off.
The situation is much more complicated than that. There are two completely different models for misaligned accesses (v6 and pre-v6). v7 cores are only required to support the former.
However even under the v6 model some instructions will fault on misaligned addresses, and the CPU may be configured to fault many others. The exact behavior depends on the particular instruction chosen by the compiler. I don't know whether Linux currently knows how to enable alignment checking on v6/v7 hardware.
Last time I looked into this, I think the conclusion was that you would have to hack the kernel in order to enable full alignment faulting on v6/v7. The CR.A bit setting may be hardcoded for these arches. (But my knowledge could be out of date...)
However, this slightly misses the point: for linaro, ubuntu and armhf at least, we are interested in fixing apps which incur the cost of alignment faults. Non-faulting misaligned instructions on v6/v7 are therefore much less of a problem, since these are much less expensive.
int main(int argc, char * argv[]) {
char buf[8]; void *v = &buf[1]; unsigned int *p = (unsigned int *)v;
This does not (reliably) do what you expect. The compiler need not align buf.
In fact, dereferencing p is unlikely to cause a fault on v6/v7.
Specifically, LDM/STM and LDRD/STRD, plus VFP register loads and stores (but not NEON-specific loads and stores) need to be word aligned and will fault otherwise. These faults are unconditional: the CPU won't do these accesses misaligned natively.
Typically, the compiler emits these instructions when:
a) copying small objects larger than 32 bits (for large objects, memcpy would be used); or
b) accessing fundamental 64-bit types.
Firefox fails under (a); gtk-sharp2 fails under (b). Both cases arise from acting on a misaligned pointer due to an unsafe attempt at optimisation.
Cheers ---Dave
On Fri, Jun 17, 2011 at 01:10:11PM +0100, Dave Martin wrote:
For ARM, we can achieve the goal by adding one of the following to the default kernel command-line options:
alignment=3 Fix up each alignment fault, but also log the faulting address and name of the offending process to dmesg.
alignment=5 Pass each alignment fault to the user process as SIGBUS (fatal by default) and log the faulting address and name of the offending process to dmesg.
Fault statistics can also be obtained at runtime by reading /proc/cpu/alignment.
For other architectures, there may be other arch-specific ways of achieving something similar.
Other architectures[1] use the 'prctl' tool, which uses the prctl(PR_SET_UNALIGN,...) kernel interface to control the unaligned trap behavior for the process. If this can be sanely togglable on ARM at runtime, it would be keen to use the same interface on this arch.
HTH,
On Fri, Jun 17, 2011 at 03:23:49PM -0700, Steve Langasek wrote:
On Fri, Jun 17, 2011 at 01:10:11PM +0100, Dave Martin wrote:
For ARM, we can achieve the goal by adding one of the following to the default kernel command-line options:
alignment=3 Fix up each alignment fault, but also log the faulting address and name of the offending process to dmesg.
alignment=5 Pass each alignment fault to the user process as SIGBUS (fatal by default) and log the faulting address and name of the offending process to dmesg.
Fault statistics can also be obtained at runtime by reading /proc/cpu/alignment.
For other architectures, there may be other arch-specific ways of achieving something similar.
Other architectures[1] use the 'prctl' tool, which uses the prctl(PR_SET_UNALIGN,...) kernel interface to control the unaligned trap behavior for the process. If this can be sanely togglable on ARM at runtime, it would be keen to use the same interface on this arch.
I guess that would be a new task for someone, if we consider it important enough -- I'll raise it at today's linaro kernel working group meeting.
I don't know if that prctl call is wired up at all on arm kernels, but it certainly won't work for v6/v7 at present. Enabling it looks fairly straightforward, though.
Cheers ---Dave
Dave Martin dave.martin@linaro.org writes: Hi,
Hi all,
I've recently become aware that a few packages are causing alignment faults on ARM, and are relying on the alignment fixup emulation code in the kernel in order to work.
Such faults are very expensive in terms of CPU cycles, and can generally only result from wrong code (for example, C/C++ code which violates the relevant language standards, assembler which makes invalid assumptions, or functions called with misaligned pointers due to other bugs).
Currently, on a natty Ubuntu desktop image I observe no faults except from firefox and mono-based apps (see below).
As part of the general effort to make open source on ARM better, I think it would be great if we can disable the alignment fixups (or at least enable logging) and work with upstreams to get the affected packages fixed.
For release images we might want to be more forgiving, but for development we have the option of being more aggressive.
The number of affected packages and bugs appears small enough for the fixing effort to be feasible, without temporarily breaking whole distros.
For ARM, we can achieve the goal by adding one of the following to the default kernel command-line options:
alignment=3 Fix up each alignment fault, but also log the faulting address and name of the offending process to dmesg.
alignment=5 Pass each alignment fault to the user process as SIGBUS (fatal by default) and log the faulting address and name of the offending process to dmesg.
IIRC, someone sent a patch some months/years ago to change the default, but it was rejected because some libc implementations, including glibc, are (were?) doing unaligned accesses [1], and this can happen early in the boot process. In that kind of case, getting a SIGBUS would hurt.
Also, as noted by someone else in the thread, you do want to test on something like armv5* or v4*, because there is a high chance that the trap used by the alignment fixup won't be triggered at all on >= armv6.
Arnaud
[1] See commit log of commit d944d549aa86e08cba080396513234cf048fee1f.
On Sat, 18 Jun 2011, Arnaud Patard wrote:
Dave Martin dave.martin@linaro.org writes: Hi,
Hi all,
I've recently become aware that a few packages are causing alignment faults on ARM, and are relying on the alignment fixup emulation code in the kernel in order to work.
Such faults are very expensive in terms of CPU cycles, and can generally only result from wrong code (for example, C/C++ code which violates the relevant language standards, assembler which makes invalid assumptions, or functions called with misaligned pointers due to other bugs).
Currently, on a natty Ubuntu desktop image I observe no faults except from firefox and mono-based apps (see below).
As part of the general effort to make open source on ARM better, I think it would be great if we can disable the alignment fixups (or at least enable logging) and work with upstreams to get the affected packages fixed.
For release images we might want to be more forgiving, but for development we have the option of being more aggressive.
The number of affected packages and bugs appears small enough for the fixing effort to be feasible, without temporarily breaking whole distros.
For ARM, we can achieve the goal by adding one of the following to the default kernel command-line options:
alignment=3 Fix up each alignment fault, but also log the faulting address and name of the offending process to dmesg.
alignment=5 Pass each alignment fault to the user process as SIGBUS (fatal by default) and log the faulting address and name of the offending process to dmesg.
iirc, someone sent some months/years ago a patch to change the default
That was me.
but it was rejected because some libc implementations, including glibc, are (were?) doing unaligned accesses [1], and this can happen early in the boot process. In that kind of case, getting a SIGBUS would hurt.
This is only partly true.
Rewind about 15 years ago when all that Linux supported was ARMv3. On ARMv3 there is no instruction for doing half-word loads/stores, and no instruction to sign extend a loaded byte.
In those days, the compiler was relying on a documented and architecturally defined behavior of misaligned loads/stores which is to rotate the bytes comprising the otherwise aligned word, the rotation position being defined by the sub-word offset. Doing so allowed for certain optimizations to avoid extra shifts and masks.
Then a bunch of binaries were built with a version of GCC making use of those misaligned access tricks.
Then came along ARMv4 with its LDRH, LDRSH, and LDRSB instructions, making those misaligned tricks unnecessary. Hence GCC deprecated those optimizations. Today only the old farts amongst us still remember about this.
So for quite a while now, having a misaligned access on ARM before ARMv6 is quite likely to not produce the commonly expected result. That's why there is code in the kernel to trap and fix up misaligned accesses. However, it is turned off by default for user space. Why?
Turns out that a prominent ARM developer still has binaries from the ARMv3 era around, and the default of not fixing up misaligned user space accesses is for remaining compatible with them.
So if you do have a version of glibc that is not from 15 years ago (that would have to be a.out and not ELF if it was) then you do not want to let misaligned accesses go through unfixed, otherwise you'll simply have latent data corruption somewhere.
Also, as noted by someone else in the thread, you do want to test on something like armv5* or v4*, because there is a high chance that the trap used by the alignment fixup won't be triggered at all on >= armv6.
Given that Linaro is working only with Thumb2-compiled user space, that implies ARMv6 and above only.
[1] See commit log of commit d944d549aa86e08cba080396513234cf048fee1f.
And note the "if not fixed up, results in segfaults" in that log, meaning that the current default is wrong for that case.
Nicolas
On 18 June 2011 22:17, Nicolas Pitre nicolas.pitre@linaro.org wrote:
Turns out that a prominent ARM developer still has binaries from the ARMv3 era around, and the default of not fixing up misaligned user space accesses is for remaining compatible with them.
So if you do have a version of glibc that is not from 15 years ago (that would have to be a.out and not ELF if it was) then you do not want to let misaligned accesses go through unfixed, otherwise you'll simply have latent data corruption somewhere.
Can we make the alignment correction default depend on whether a.out support is in the kernel or not?
Riku
On Sat, Jun 18, 2011 at 03:17:59PM -0400, Nicolas Pitre wrote:
On Sat, 18 Jun 2011, Arnaud Patard wrote:
Dave Martin <dave.martin@linaro.org> writes:
Hi,
Hi all,
I've recently become aware that a few packages are causing alignment faults on ARM, and are relying on the alignment fixup emulation code in the kernel in order to work.
Such faults are very expensive in terms of CPU cycles, and can generally only result from wrong code (for example, C/C++ code which violates the relevant language standards, assembler which makes invalid assumptions, or functions called with misaligned pointers due to other bugs).
Currently, on a natty Ubuntu desktop image I observe no faults except from firefox and mono-based apps (see below).
As part of the general effort to make open source on ARM better, I think it would be great if we can disable the alignment fixups (or at least enable logging) and work with upstreams to get the affected packages fixed.
For release images we might want to be more forgiving, but for development we have the option of being more aggressive.
The number of affected packages and bugs appears small enough for the fixing effort to be feasible, without temporarily breaking whole distros.
For ARM, we can achieve the goal by augmenting the default kernel command-line options with either:
alignment=3 Fix up each alignment fault, but also log the faulting address and name of the offending process to dmesg.
alignment=5 Pass each alignment fault to the user process as SIGBUS (fatal by default) and log the faulting address and name of the offending process to dmesg.
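As a reading aid for these values: the mode number is a small bitmask, as I understand arch/arm/mm/alignment.c (bit 0 warn, bit 1 fixup, bit 2 signal). The sketch below decodes the documented values 0 to 5. The names decode_alignment_mode and struct alignment_mode are invented for illustration and are not kernel interfaces.

```c
#include <stdbool.h>

/* Bit meanings assumed from the kernel's UM_WARN / UM_FIXUP / UM_SIGNAL. */
enum {
    MODE_WARN   = 1 << 0,   /* log faulting address and process to dmesg */
    MODE_FIXUP  = 1 << 1,   /* emulate the access in the kernel */
    MODE_SIGNAL = 1 << 2    /* deliver SIGBUS to the process */
};

struct alignment_mode {
    const char *action;     /* "ignore", "fixup", "signal" or "unspecified" */
    bool logs;              /* true if faults are also logged */
};

/* Hypothetical helper: decode an alignment= / /proc/cpu/alignment value. */
static struct alignment_mode decode_alignment_mode(unsigned mode)
{
    struct alignment_mode m;

    m.logs = (mode & MODE_WARN) != 0;
    if (mode > 5)
        m.action = "unspecified";   /* fixup+signal together: not covered here */
    else if (mode & MODE_SIGNAL)
        m.action = "signal";
    else if (mode & MODE_FIXUP)
        m.action = "fixup";
    else
        m.action = "ignore";
    return m;
}
```

So alignment=3 decodes to fixup with logging, and alignment=5 to SIGBUS with logging, matching the two options proposed above.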
iirc, someone sent a patch some months/years ago to change the default
That was me.
but it was rejected because some libcs, including glibc, are (were?) doing unaligned accesses [1], and this can happen early in the boot process. In that kind of case, getting a SIGBUS would hurt.
This is only partly true.
Rewind about 15 years ago when all that Linux supported was ARMv3. On ARMv3 there is no instruction for doing half-word loads/stores, and no instruction to sign extend a loaded byte.
In those days, the compiler was relying on a documented and architecturally defined behavior of misaligned loads/stores which is to rotate the bytes comprising the otherwise aligned word, the rotation position being defined by the sub-word offset. Doing so allowed for certain optimizations to avoid extra shifts and masks.
Then a bunch of binaries were built with a version of GCC making use of those misaligned access tricks.
Then came along ARMv4 with its LDRH, LDRSH, and LDRSB instructions, making those misaligned tricks unnecessary. Hence GCC deprecated those optimizations. Today only the old farts amongst us still remember about this.
So for quite a while now, having a misaligned access on ARM before ARMv6 is quite likely to not produce the commonly expected result. That's why there is code in the kernel to trap and fix up misaligned accesses. However, it is turned off by default for user space. Why?
Turns out that a prominent ARM developer still has binaries from the ARMv3 era around, and the default of not fixing up misaligned user space accesses is for remaining compatible with them.
The default /proc/cpu/alignment mode seems to be 2 (fixup) on v6/v7, provided that the v6 unaligned access model (CR_U) is supported by the CPU:
arch/arm/mm/alignment.c:
	if (cpu_architecture() >= CPU_ARCH_ARMv6 && (cr_alignment & CR_U)) {
		cr_alignment &= ~CR_A;
		cr_no_alignment &= ~CR_A;
		set_cr(cr_alignment);
		ai_usermode = UM_FIXUP;
	}
This suggests that by default, ancient binaries will actually silently misbehave when running on a v6 or later CPU.
So if you do have a version of glibc that is not from 15 years ago (that would have to be a.out and not ELF if it was) then you do not want to let misaligned accesses go through unfixed, otherwise you'll simply have latent data corruption somewhere.
Note that if we enable SIGBUS instead of fixing up, this is "safe" in the sense of preferring a fatal signal to incorrect results.
In the linaro/ubuntu/armhf context, I think we'd have few things to fix, but for debian armel, there is likely to be much more alignment faulting and there might be too much software to fix for this to be easily achieved.
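To give an idea of what "fixing" usually means at the source level: the typical culprit dereferences a cast pointer into a byte buffer, and the portable replacement copies through memcpy. This is a generic sketch, not a patch for any of the packages mentioned, and load_u32 is a made-up name.

```c
#include <stdint.h>
#include <string.h>

/* Faulting pattern (do not use): p may not be 4-byte aligned, so
 *     uint32_t v = *(const uint32_t *)p;
 * is undefined behaviour in C and can trap or misread on ARM. */

/* Hypothetical helper: read a host-endian 32-bit value from any address.
 * memcpy has no alignment requirement; compilers typically turn this into
 * a single load where the target allows unaligned access. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);
    return v;
}
```

A call such as load_u32(buf + 1) is then safe regardless of the alignment mode the kernel is running in.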
Also, as noted by someone else in the thread, you do want to test on something like armv5* or v4* because there is a high chance that the trap used by the alignment fixup won't be triggered at all on >= armv6.
Given that Linaro is working only with Thumb2-compiled user space, that implies ARMv6 and above only.
Note that debian-arm is on CC -- this argument applies to the armhf port under development, since this targets v7+.
For the Debian armel port though, the pros and cons are somewhat different since this distro may run on v4/v5.
Cheers ---Dave
[1] See commit log of commit d944d549aa86e08cba080396513234cf048fee1f.
And note the "if not fixed up, results in segfaults" in that log, meaning that the current default is wrong for that case.
Nicolas