On Fri, Jun 17, 2011 at 11:53:21PM +0100, Paul Brook wrote:
There is still going to be a small cost even with hardware fixup, so this is very much worth solving despite its "becoming invisible" because the chips are hiding / solving it already.
But I believe that h/w feature is turned off in Linux by default. You have to add noalign to the kernel command line to enable it.
I think you'll find the hardware fixups are enabled by default on CPUs with that design; quite possibly they can't be turned off.
The situation is much more complicated than that. There are two completely different models for misaligned accesses (v6 and pre-v6). v7 cores are only required to support the former.
However even under the v6 model some instructions will fault on misaligned addresses, and the CPU may be configured to fault many others. The exact behavior depends on the particular instruction chosen by the compiler. I don't know whether Linux currently knows how to enable alignment checking on v6/v7 hardware.
Last time I looked into this, I think the conclusion was that you would have to hack the kernel in order to enable full alignment faulting on v6/v7. The CR.A bit setting may be hardcoded for these arches. (But my knowledge could be out of date...)
However, this slightly misses the point: for Linaro, Ubuntu and armhf at least, we are interested in fixing apps which incur the cost of alignment faults. Non-faulting misaligned accesses on v6/v7 are therefore much less of a problem, since these are much less expensive.
int main(int argc, char * argv[]) {
char buf[8]; void *v = &buf[1]; unsigned int *p = (unsigned int *)v;
This does not (reliably) do what you expect. The compiler need not align buf.
In fact, dereferencing p is unlikely to cause a fault on v6/v7.
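For what it's worth, here is a rough sketch of how a test case could pin the alignment down, so that &buf[1] really is misaligned. It assumes a C11 compiler (for alignas); the helper names is_word_aligned and roundtrip_at_odd_offset are made up for illustration and are not from the original test program. The access itself goes through memcpy rather than a cast pointer, so the sketch stays well defined whether or not the CPU would fault.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if ptr is suitably aligned for a 32-bit access. */
static int is_word_aligned(const void *ptr)
{
    return ((uintptr_t)ptr % sizeof(uint32_t)) == 0;
}

/* Hypothetical helper: store and reload a 32-bit value at a deliberately odd address. */
static uint32_t roundtrip_at_odd_offset(uint32_t value)
{
    /* alignas(8) forces buf onto an 8-byte boundary, so &buf[1] is
     * guaranteed to be misaligned instead of misaligned only by luck. */
    alignas(8) unsigned char buf[8] = {0};
    void *v = &buf[1];
    uint32_t out;

    assert(!is_word_aligned(v));

    /* memcpy lets the compiler choose an access sequence that is valid
     * for any alignment; no misaligned unsigned int pointer is dereferenced. */
    memcpy(v, &value, sizeof value);
    memcpy(&out, v, sizeof out);
    return out;
}
```

Swapping the memcpy calls for a plain dereference of (uint32_t *)v would turn this back into the misaligned access under discussion, which on v6/v7 will most likely just be done natively as an LDR/STR.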
Specifically, LDM/STM and LDRD/STRD, plus VFP register loads and stores (but not NEON-specific loads and stores) need to be word aligned and will fault otherwise. These faults are unconditional: the CPU won't do these accesses misaligned natively.
Typically, the compiler emits these instructions when:
a) copying small objects larger than 32 bits (for large objects, memcpy would be used); or
b) accessing fundamental 64-bit types.
Firefox fails under (a); gtk-sharp2 fails under (b). Both cases arise from acting on a misaligned pointer due to an unsafe attempt at optimisation.
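To make (a) and (b) concrete, a minimal sketch of the safe pattern follows. It is not the actual Firefox or gtk-sharp2 code; struct pair, load_u64 and load_pair are hypothetical names. The point is only that going through memcpy tells the compiler the source may be misaligned, so it should not pick LDRD or LDM for the access the way it may for a dereferenced uint64_t * or a struct assignment through a cast pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical small object larger than 32 bits, as in case (a). */
struct pair {
    uint32_t lo;
    uint32_t hi;
};

/* Case (b): read a 64-bit value from an address of unknown alignment. */
static uint64_t load_u64(const void *src)
{
    uint64_t v;

    assert(src != NULL);
    memcpy(&v, src, sizeof v);   /* valid for any alignment of src */
    return v;
}

/* Case (a): copy a small struct from an address of unknown alignment. */
static struct pair load_pair(const void *src)
{
    struct pair p;

    assert(src != NULL);
    memcpy(&p, src, sizeof p);   /* avoids *(const struct pair *)src */
    return p;
}
```

By contrast, *(const uint64_t *)src promises the compiler that src is 8-byte aligned, and that promise is what lets it emit the instructions that fault unconditionally.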
Cheers ---Dave