I figured it out!
GCC assumes that the stack is 16-byte aligned **before** the call instruction. Since call pushes rip onto the stack, GCC compiles code assuming that on entry to a function, rsp is 8 bytes below a 16-byte aligned address.
For TDs, we ljmp to guest code using a function's address, so no call instruction has pushed rip onto the stack. The stack is therefore 16-byte aligned when the guest code starts running, instead of the 16-byte aligned -8 that GCC expects.
For VMs, we set rip to a function pointer, and the VM starts running with a 16-byte aligned stack too.
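To make the arithmetic concrete, here is a minimal sketch of the entry condition described above. The helper names are hypothetical and are not part of the selftests; they only model what call does to rsp and what compiled code expects at function entry.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: models the effect of a call instruction on rsp,
 * which pushes an 8-byte return address onto the stack.
 */
static uint64_t rsp_after_call(uint64_t rsp_before_call)
{
	return rsp_before_call - 8;
}

/*
 * Hypothetical helper: true if rsp has the value the compiler assumes at
 * function entry, i.e. 8 bytes below a 16-byte aligned address.
 */
static bool rsp_ok_at_function_entry(uint64_t rsp)
{
	return (rsp % 16) == 8;
}
```

With a 16-byte aligned rsp such as 0x7000, rsp_ok_at_function_entry(rsp_after_call(0x7000)) holds, while rsp_ok_at_function_entry(0x7000) does not. The second case is the one the guest hits when it is entered without a call.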
To fix this, I propose that in vm_arch_vcpu_add(), we align the allocated stack address and then subtract 8 from that:
@@ -573,10 +573,13 @@ struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id,
vcpu_init_cpuid(vcpu, kvm_get_supported_cpuid());
vcpu_setup(vm, vcpu);
stack_vaddr += (DEFAULT_STACK_PGS * getpagesize());
stack_vaddr = ALIGN_DOWN(stack_vaddr, 16) - 8;
The ALIGN_DOWN should be unnecessary; we've got larger issues if getpagesize() isn't 16-byte aligned and/or if __vm_vaddr_alloc() returns anything but a page-aligned address. Maybe add a TEST_ASSERT() sanity check that stack_vaddr is page-aligned at this point?
And in addition to the comment suggested by Maciej, can you also add a comment explaining the -8 adjust? Yeah, someone can go read the changelog, but I think this is worth explicitly documenting in code.
Lastly, can you post it as a standalone patch?
Many thanks!
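For readers following along, the requested shape (sanity check plus an in-code comment for the -8) might look roughly like the sketch below. This is not the posted patch: the function name and both constants are hypothetical, and a plain assert() stands in for the selftests' TEST_ASSERT() so the snippet builds on its own.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED	4096ULL	/* assumption: 4 KiB pages */
#define STACK_PAGES_ASSUMED	5ULL	/* hypothetical stack size in pages */

/* Hypothetical helper: compute the initial guest rsp from the stack base. */
static uint64_t initial_guest_rsp(uint64_t stack_base)
{
	uint64_t stack_top = stack_base + STACK_PAGES_ASSUMED * PAGE_SIZE_ASSUMED;

	/* Stand-in for TEST_ASSERT(): the top of the stack must be page-aligned. */
	assert(stack_top % PAGE_SIZE_ASSUMED == 0);

	/*
	 * Guest code is entered by setting rip (or via ljmp), not via call, so
	 * no return address is pushed. Compiled code assumes rsp is 16-byte
	 * aligned minus 8 at function entry, so subtract 8 to mimic the push.
	 */
	return stack_top - 8;
}
```

Because a page-aligned address is already 16-byte aligned, no ALIGN_DOWN is needed in this sketch; the assertion catches the case where that assumption breaks.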
Thanks Maciej and Sean, I've made the changes you requested and posted it as a standalone patch at https://lore.kernel.org/lkml/32866e5d00174697730d6231d2fb81f6b8d98c8a.167665...