Oh, btw, how bad would it be to just do
    #define FASTOP_SIZE 16
    static_assert(FASTOP_SIZE >= FASTOP_LENGTH)
and leave it at that?
Afaik both gcc and clang default to -falign-functions=16 *anyway*, and while on 32-bit x86 we have options to minimize alignment, we don't do that on x86-64 afaik.
In fact, we have an option to force *bigger* alignment (DEBUG_FORCE_FUNCTION_ALIGN_64B) but not any way to make it less.
And we use
.p2align 4
in most of our asm, along with
#define __ALIGN .p2align 4, 0x90
So all the *normal* functions already get 16-byte alignment anyway.
So yeah, it would be less dense, but do we care? Wouldn't the "this is really simple" be a nice thing? It's not like there are a ton of those fastop functions anyway. 128 of them? Plus 16 of the "setCC" ones?
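To put a rough number on the density cost, here is a back-of-the-envelope sketch in plain C. The stub counts are just the guesses from above, the FASTOP_LENGTH value is a made-up placeholder, and fastop_offset() / fastop_table_bytes() are hypothetical helpers for illustration, not anything in the tree:

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder: the real worst-case stub length is whatever the build computes. */
#define FASTOP_LENGTH 11

/* Fixed slot size, matching the usual 16-byte function alignment. */
#define FASTOP_SIZE 16
static_assert(FASTOP_SIZE >= FASTOP_LENGTH, "fastop stub does not fit in its slot");

/* Guessed counts, not verified against the emulator source. */
#define NR_FASTOP_STUBS 128
#define NR_SETCC_STUBS  16

/* Byte offset of stub 'index' from the start of its table. */
static inline size_t fastop_offset(size_t index)
{
	return index * FASTOP_SIZE;
}

/* Total text consumed by 'nr_stubs' fixed-size slots. */
static inline size_t fastop_table_bytes(size_t nr_stubs)
{
	return nr_stubs * FASTOP_SIZE;
}
```

With those guesses, fastop_table_bytes(NR_FASTOP_STUBS + NR_SETCC_STUBS) is 144 * 16 = 2304 bytes of text in total, which is the upper bound on what the fixed 16-byte slots would cost.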
Linus