On Fri, Aug 14, 2020 at 09:21:05PM +0800, Yang Weijiang wrote:
If debug_regs.c is built with newer gcc, e.g., 8.3.1 on my side, then the generated binary looks like over-optimized by gcc:
asm volatile("ss_start: " "xor %%rax,%%rax\n\t" "cpuid\n\t" "movl $0x1a0,%%ecx\n\t" "rdmsr\n\t" : : : "rax", "ecx");
is translated to :
000000000040194e <ss_start>: 40194e: 31 c0 xor %eax,%eax <----- rax->eax? 401950: 0f a2 cpuid 401952: b9 a0 01 00 00 mov $0x1a0,%ecx 401957: 0f 32 rdmsr
As you can see rax is replaced with eax in taret binary code.
It's an optimization. `xor rax, rax` and `xor eax, eax` yield the exact same result, as writing the lower 32 bits of a GPR in 64-bit mode clears the upper 32 bits. Using the eax variant avoids the REX prefix and saves a byte of code.
But if I replace %%rax with %%r8 or any GPR from r8~15, then I get below expected binary:
0000000000401950 <ss_start>: 401950: 45 31 ff xor %r15d,%r15d
This is not replacing %rax with %r15, it's replacing it with %r15d, which is the equivalent of %eax. But that's beside the point. Encoding GPRs r8-r15 requires a REX prefix, so even though you avoid REX.W you still need REX.R, and thus end up with a 3 byte instruction.
401953: 0f a2 cpuid
Note, CPUID consumes EAX. It doesn't look like the code actually consumes the CPUID output, but switching to r15 is at best bizarre.
401955: b9 a0 01 00 00 mov $0x1a0,%ecx 40195a: 0f 32 rdmsr
The difference is the length of xor instruction(2 Byte vs 3 Byte), so this makes below hard-coded instruction length cannot pass runtime check:
/* Instruction lengths starting at ss_start */ int ss_size[4] = { 3, /* xor */ <-------- 2 or 3? 2, /* cpuid */ 5, /* mov */ 2, /* rdmsr */ };
Note: Use 8.2.1 or older gcc, it generates expected 3 bytes xor target code.
I use the default Makefile to build the binaries, and I cannot figure out why this happens, so it comes this patch, maybe you have better solution to resolve the issue. If you know how things work in this way, please let me know, thanks!
Use `xor %%eax, %%eax`. That should always generate a 2 byte instruction. Encoding a 64-bit operation would technically be legal, but I doubt any compiler would do that in practice.