Re: [RFC PATCH v3 08/31] KVM: selftests: Require GCC to realign stacks on function entry

15 Feb 2023


      On 15.02.2023 01:50, Ackerley Tng wrote:
...
...
On Mon, Jan 23, 2023, Erdem Aktas wrote:
...
On Mon, Jan 23, 2023 at 10:53 AM Sean Christopherson <seanjc@google.com> wrote:
...
On Mon, Jan 23, 2023, Maciej S. Szmigiero wrote:
...
On 23.01.2023 19:30, Erdem Aktas wrote:
...
On Fri, Jan 20, 2023 at 4:28 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Sat, Jan 21, 2023, Ackerley Tng wrote:
> > Some SSE instructions assume a 16-byte aligned stack, and GCC compiles
> > assuming the stack is aligned:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838. This combination
> > results in a #GP in guests.
> >
> > Adding this compiler flag will generate an alternate prologue and
> > epilogue to realign the runtime stack, which makes selftest code
> > slower and bigger, but this is okay since we do not need selftest code
> > to be extremely performant.
>
> Huh, I had completely forgotten that this is why SSE is problematic.  I ran into
> this with the base UPM selftests and just disabled SSE.  /facepalm.
>
> We should figure out exactly what is causing a misaligned stack.  As you've noted,
> the x86-64 ABI requires a 16-byte aligned RSP.  Unless I'm misreading vm_arch_vcpu_add(),
> the starting stack should be page aligned, which means something is causing the
> stack to become unaligned at runtime.  I'd rather hunt down that something than
> paper over it by having the compiler force realignment.
Isn't it due to the 32-bit execution part of the guest code at boot
time? Any push/pop of 32-bit registers might leave the stack not
16-byte aligned.
32-bit stack needs to be 16-byte aligned, too (at function call boundaries) -
see [1] chapter 2.2.2 "The Stack Frame"
And this showing up in the non-TDX selftests rules that out as the sole problem;
the selftests stuff 64-bit mode directly, i.e. they don't have 32-bit boot code.
Thanks Maciej and Sean for the clarification. I suspected the
hand-coded assembly that we have for the TDX tests, but the fact that
it also happens in the non-TDX selftests disproves that.
...
Not necessarily, it could be both.  Goofs in the handcoded assembly and PEBKAC
on my end :-)
I figured it out!
GCC assumes that the stack is 16-byte aligned **before** the call
instruction. Since call pushes rip to the stack, GCC will compile code
assuming that on entrance to the function, the stack is -8 from a
16-byte aligned address.
Since for TDs we do a ljmp to guest code, providing a function's
address, the stack was not modified by a call instruction pushing rip to
the stack, so the stack is 16-byte aligned when the guest code starts
running, instead of 16-byte aligned -8 that GCC expects.
For VMs, we set rip to a function pointer, and the VM starts running
with a 16-byte aligned stack too.
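The alignment arithmetic described above can be sketched in C. This is only
an illustration: rsp_after_call() and rsp_ok_at_entry() are hypothetical
helper names, not part of the selftests, and the model assumes a 64-bit
near call that pushes an 8-byte return address.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the RSP value seen by the first instruction of a
 * callee, given the RSP value just before a 64-bit `call`. The call pushes
 * an 8-byte return address, so RSP drops by 8.
 */
static inline uint64_t rsp_after_call(uint64_t rsp_before_call)
{
	/* The caller is expected to have a 16-byte aligned RSP here. */
	assert(rsp_before_call % 16 == 0);
	return rsp_before_call - 8;
}

/*
 * Hypothetical helper: nonzero if `rsp` matches what compiled x86-64 code
 * assumes on function entry, i.e. 8 bytes below a 16-byte boundary.
 */
static inline int rsp_ok_at_entry(uint64_t rsp)
{
	return rsp % 16 == 8;
}
```

With these helpers, rsp_ok_at_entry(rsp_after_call(0x7000)) holds, while
rsp_ok_at_entry(0x7000) does not. The second case is the one we hit when
the entry point is reached by ljmp or by setting rip directly.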
To fix this, I propose that in vm_arch_vcpu_add(), we align the
allocated stack address and then subtract 8 from that:
Note that if this code is ever used to launch a vCPU with a 32-bit entry
point, it will need to subtract 4 bytes instead of 8 bytes.
I think it would be worthwhile to at least place a comment mentioning
this near the stack aligning expression so nobody misses this fact.
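As a rough sketch of what such a commented expression could look like
(initial_stack_pointer() and its parameters are hypothetical names for
illustration, not the actual vm_arch_vcpu_add() code):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the initial stack pointer for guest code
 * that is entered without a `call` instruction.
 *
 * The compiler assumes the stack was 16-byte aligned *before* the call
 * that entered the function, so on entry the stack pointer must sit one
 * return address below a 16-byte boundary:
 *   - 8 bytes for a 64-bit entry point
 *   - 4 bytes for a 32-bit entry point
 */
static inline uint64_t initial_stack_pointer(uint64_t stack_base,
					     uint64_t stack_size,
					     uint64_t ret_addr_size)
{
	uint64_t top = stack_base + stack_size;

	assert(ret_addr_size == 4 || ret_addr_size == 8);

	/* Round the top of the stack down to a 16-byte boundary. */
	top &= ~(uint64_t)15;

	/* Mimic the return address that a call would have pushed. */
	return top - ret_addr_size;
}
```

For a 64-bit entry point, initial_stack_pointer(base, size, 8) yields a
value that is 8 modulo 16; passing 4 covers the 32-bit case mentioned above.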
Thanks,
Maciej
