Hi all,
I've recently become aware that a few packages are causing alignment faults on ARM, and are relying on the alignment fixup emulation code in the kernel in order to work.
Such faults are very expensive in terms of CPU cycles, and can generally only result from wrong code (for example, C/C++ code which violates the relevant language standards, assembler which makes invalid assumptions, or functions called with misaligned pointers due to other bugs).
Currently, on a natty Ubuntu desktop image I observe no faults except from firefox and mono-based apps (see below).
As part of the general effort to make open source on ARM better, I think it would be great if we can disable the alignment fixups (or at least enable logging) and work with upstreams to get the affected packages fixed.
For release images we might want to be more forgiving, but for development we have the option of being more aggressive.
The number of affected packages and bugs appears small enough for the fixing effort to be feasible, without temporarily breaking whole distros.
For ARM, we can achieve the goal by adding one of the following to the default kernel command-line options:
alignment=3 Fix up each alignment fault, but also log the faulting address and name of the offending process to dmesg.
alignment=5 Pass each alignment fault to the user process as SIGBUS (fatal by default) and log the faulting address and name of the offending process to dmesg.
Fault statistics can also be obtained at runtime by reading /proc/cpu/alignment.
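For anyone who wants to collect the statistics from a test harness rather than by hand, here is a minimal sketch of reading that file. The helper name dump_alignment_stats is hypothetical, and the file only exists on ARM kernels built with alignment trap support, so the function reports failure instead of assuming it is present.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: copy the contents of `path` (normally
// /proc/cpu/alignment on ARM) to `out`.
// Returns the number of bytes copied, or -1 if the file cannot be opened,
// which is the expected result on kernels that do not provide it.
long dump_alignment_stats(const char *path, std::FILE *out)
{
    assert(path != nullptr && out != nullptr);

    std::FILE *in = std::fopen(path, "r");
    if (in == nullptr)
        return -1;

    char buf[256];
    long total = 0;
    std::size_t n;
    // Read in small chunks; the file is only a few lines long.
    while ((n = std::fread(buf, 1, sizeof buf, in)) > 0) {
        std::fwrite(buf, 1, n, out);
        total += static_cast<long>(n);
    }
    std::fclose(in);
    return total;
}
```

A typical call would be dump_alignment_stats("/proc/cpu/alignment", stdout), run before and after the application under test so the counters can be compared.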
For other architectures, there may be other arch-specific ways of achieving something similar.
I'd be interested in people's views on this.
Cheers ---Dave
More background:
Two known instances of misbehaving userland apps are:
1) firefox-4.x (bug report pending)
A char array declared as a container for C++ objects is cast directly to an object pointer type and dereferenced, without ensuring proper alignment.
By sheer luck, the presence of an extra member in the containing class in firefox-3.x means that the char array has a different alignment and so the faults don't occur.
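To illustrate the general pattern (this is not the firefox code, and Widget and Holder are made-up names), here is a sketch of how a byte array used as object storage can be given the right alignment explicitly in C++11, so that it no longer depends on whatever members happen to precede it:

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical payload type; the double member gives it an alignment
// requirement stricter than that of a plain char array.
struct Widget {
    double value;
    explicit Widget(double v) : value(v) {}
};

// Hypothetical container. The leading 1-byte member means that, without
// alignas, `storage` could legally be placed at an odd offset.
struct Holder {
    char flag;
    alignas(Widget) unsigned char storage[sizeof(Widget)];

    // Construct the object in place instead of casting the raw bytes.
    Widget *construct(double v) {
        return new (storage) Widget(v);
    }
};

// True if p satisfies Widget's alignment requirement.
bool is_aligned_for_widget(const void *p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(Widget) == 0;
}
```

With alignas in place the compiler pads the struct as needed, so adding or removing unrelated members can no longer change whether the accesses fault.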
2) gtk-sharp2 (https://bugs.launchpad.net/bugs/798315) (affecting mono-based GUI apps such as banshee and tomboy)
char pointers are cast to 64-bit integer pointers and dereferenced, as an attempt at comparing string prefixes faster.
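As a sketch of a conforming alternative (not the gtk-sharp2 code; prefix8_equal is a hypothetical name), the 8-byte comparison can be kept by copying the bytes into integers with memcpy instead of dereferencing a cast pointer. Compilers commonly turn such fixed-size copies into whatever load sequence is safe for the target, though that is worth confirming in the generated code for a given toolchain.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: compare the first 8 bytes of two buffers.
// The caller must guarantee that both buffers have at least 8 readable
// bytes; no alignment is required of either pointer.
bool prefix8_equal(const char *a, const char *b)
{
    assert(a != nullptr && b != nullptr);

    std::uint64_t wa, wb;
    // memcpy has no alignment requirement on its source, unlike
    // *(const uint64_t *)a, which is undefined for misaligned a.
    std::memcpy(&wa, a, sizeof wa);
    std::memcpy(&wb, b, sizeof wb);
    return wa == wb;
}
```

Only equality is tested here, so byte order does not matter; an ordered comparison would need extra care on little-endian targets.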
These apps typically generate hundreds or thousands of faults per session rather than millions, but that is still quite a lot of noise in syslog.
I think these are likely to be representative of typical causes of alignment faults: i.e., attempted optimisations which break the rules of the language, and which only show in certain builds, or as side-effects of routine maintenance.
Code like that is going to be a massive own goal for performance on ARM and other architectures which fault unaligned accesses, since the resulting faults are likely to cost thousands of cycles per instance.