On Sat, 18 Jun 2011, Paul Brook wrote:
char buf[8]; void *v = &buf[1]; unsigned int *p = (unsigned int *)v;
This does not (reliably) do what you expect. The compiler need not align buf.
Printing the value of p should clarify this.
And, as we can see above, the "simple" accesses are left to the hardware to fix up. However, if the misaligned access is performed using a 64-bit value pointer, then the kernel will trap an exception and the access will be simulated.
I think you've missed my point. gcc may (though unlikely in this case) choose to place buf at an odd address, in which case p will happen to be properly aligned.
Sorry for being too vague.
My point is to print the value of p, i.e. the actual address used to perform the access, which would confirm whether or not the access is truly misaligned. That obviously won't force any particular alignment on the buffer, but at least it would clear any doubt as to the validity of the test.
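If reading the printed address by eye is too error-prone, the same check can be done in code. This is only a minimal sketch; is_aligned() is a name made up for illustration and is not part of Andy's test:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the original test program):
 * returns 1 if addr is a multiple of align, 0 otherwise.
 * align is assumed to be a power of two, e.g. 4 for a 32-bit word.
 */
static int is_aligned(const void *addr, size_t align)
{
        return ((uintptr_t)addr & (align - 1)) == 0;
}
```

Calling is_aligned(p, sizeof(*p)) just before the dereference would tell whether the access is really misaligned for that particular build.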
I'm not sure where you get "64-bit value pointer" from. *p is only a word sized access, and memcpy is defined in terms of bytes so will only be promoted to wider accesses when the compiler believes it is safe.
Again I probably was too vague. So let me provide the actual code modified from Andy's expressing what I mean:
	#include <stdio.h>
	#include <string.h>

	int main(int argc, char * argv[])
	{
		char buf[8];
		void *v = &buf[1];
		unsigned int *p = (unsigned int *)v;

		strcpy(buf, "abcdefg");
		printf("*%p = 0x%08x\n", (void *)p, *p);
		return 0;
	}
That's the original, modified to print the actual address used, which should confirm there is actually a misaligned access performed. And in this case, confirmed by the kernel code I quoted previously, the hardware will perform the misaligned access by itself on ARMv6 and above.
Now, if we use this code instead:
	#include <stdio.h>
	#include <string.h>

	int main(int argc, char * argv[])
	{
		char buf[8];
		void *v = &buf[1];
		unsigned long long *p = (unsigned long long *)v;

		strcpy(buf, "abcdefg");
		printf("*%p = 0x%016llx\n", (void *)p, *p);
		return 0;
	}
In this case the kernel alignment trap will be involved, and the stats in /proc/cpu/alignment will increase, as the hardware won't perform the access automatically here.
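To watch those counters from a program rather than from the shell, something like the following could be run before and after the test. It is just a sketch: dump_file() is a hypothetical helper, and /proc/cpu/alignment is only present on ARM kernels built with alignment trap support, so the function reports failure when the file cannot be opened.

```c
#include <stdio.h>

/*
 * Hypothetical helper: copy the contents of path to out, line by line.
 * Returns 0 on success, -1 if the file cannot be opened (for example
 * when the kernel does not provide /proc/cpu/alignment).
 */
static int dump_file(const char *path, FILE *out)
{
        char line[256];
        FILE *in = fopen(path, "r");

        if (in == NULL)
                return -1;
        while (fgets(line, sizeof line, in) != NULL)
                fputs(line, out);
        fclose(in);
        return 0;
}
```

For example, dump_file("/proc/cpu/alignment", stdout) called on each side of the misaligned access should show which counters moved.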
In both cases the result would be what people expect, although the second case will be far more expensive.
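For comparison, the memcpy route mentioned earlier in the thread avoids depending on either the hardware or the kernel fixup, since the copy is defined in terms of bytes. A minimal sketch, with load_u32() being a name made up here for illustration:

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read a 32-bit value from any address,
 * aligned or not. memcpy is defined in terms of bytes, so the
 * compiler only widens the access when it knows that is safe.
 */
static uint32_t load_u32(const void *src)
{
        uint32_t value;

        memcpy(&value, src, sizeof value);
        return value;
}
```

With the buffer from the test, load_u32(&buf[1]) would yield the same bytes as *p without performing a misaligned word access in the source.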
Nicolas