On 17/08/20 18:42, Sean Christopherson wrote:
On Fri, Aug 14, 2020 at 09:21:05PM +0800, Yang Weijiang wrote:
If debug_regs.c is built with a newer gcc, e.g., 8.3.1 on my side, the generated binary looks over-optimized by gcc:
    asm volatile("ss_start: "
                 "xor %%rax,%%rax\n\t"
                 "cpuid\n\t"
                 "movl $0x1a0,%%ecx\n\t"
                 "rdmsr\n\t"
                 : : : "rax", "ecx");
is translated to:
    000000000040194e <ss_start>:
      40194e:  31 c0            xor    %eax,%eax    <----- rax->eax?
      401950:  0f a2            cpuid
      401952:  b9 a0 01 00 00   mov    $0x1a0,%ecx
      401957:  0f 32            rdmsr
As you can see, rax is replaced with eax in the target binary code.
It's an optimization. `xor rax, rax` and `xor eax, eax` yield the exact same result, as writing the lower 32 bits of a GPR in 64-bit mode clears the upper 32 bits. Using the eax variant avoids the REX prefix and saves a byte of code.
I would have expected that from binutils though, not GCC.
Use `xor %%eax, %%eax`. That should always generate a 2-byte instruction. Encoding a 64-bit operation would technically be legal, but I doubt any compiler would do that in practice.
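To illustrate the equivalence described above, here is a minimal sketch, assuming an x86-64 target and a GCC-compatible compiler. The function names are hypothetical and are not part of debug_regs.c. Both helpers start from a register with all 64 bits set and zero it, once with the 32-bit form and once with the 64-bit form:

```c
#include <assert.h>
#include <stdint.h>

/* Zero a register with the 32-bit form.  %k0 prints the 32-bit name
 * of operand 0; the write to the low half clears the upper 32 bits. */
uint64_t zero_via_xor32(uint64_t seed)
{
    uint64_t val = seed;
#if defined(__x86_64__)
    __asm__ volatile("xorl %k0, %k0" : "+r"(val) : : "cc");
#else
    val = 0; /* fallback so the sketch still builds on other targets */
#endif
    return val;
}

/* Zero the same register with the 64-bit form (%q0 prints the 64-bit
 * name), which needs a REX.W prefix and is one byte longer. */
uint64_t zero_via_xor64(uint64_t seed)
{
    uint64_t val = seed;
#if defined(__x86_64__)
    __asm__ volatile("xorq %q0, %q0" : "+r"(val) : : "cc");
#else
    val = 0;
#endif
    return val;
}

/* Usage: both encodings must leave the full 64-bit register at zero. */
void check_xor_variants(void)
{
    assert(zero_via_xor32(UINT64_MAX) == 0);
    assert(zero_via_xor64(UINT64_MAX) == 0);
}
```

Spelling out the 32-bit form in the source removes any dependence on whether the toolchain shortens the encoding on its own.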
Indeed, and in addition the clobbers are incorrect since they omit rbx and rdx (cpuid writes eax, ebx, ecx and edx, and rdmsr writes eax and edx). I've sent a patch.
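As a rough sketch of what a complete clobber list looks like (this is not the patch mentioned above, and the function name is hypothetical), the snippet below runs only the unprivileged xor/cpuid part of the sequence, since rdmsr faults in user mode. It assumes x86-64 and a GCC-compatible compiler. A value is kept live in a register across the asm; listing all four registers tells the compiler not to place it in any of them:

```c
#include <assert.h>
#include <stdint.h>

/* Keep `canary` in a register across cpuid.  The clobber list names
 * every register cpuid writes, so the compiler must pick another
 * register for `kept`. */
uint64_t survive_cpuid(uint64_t canary)
{
    uint64_t kept = canary;
#if defined(__x86_64__)
    __asm__ volatile("xor %%eax, %%eax\n\t"   /* leaf 0, 2-byte encoding */
                     "cpuid\n\t"
                     : "+r"(kept)
                     :
                     : "rax", "rbx", "rcx", "rdx", "cc");
#endif
    return kept;
}

/* Usage: the value must come back unchanged. */
void check_cpuid_clobbers(void)
{
    assert(survive_cpuid(0x1122334455667788ULL) == 0x1122334455667788ULL);
}
```

With an incomplete list such as "rax", "ecx", the compiler would be free to keep `kept` in rbx or rdx, and cpuid would silently overwrite it.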
Paolo