On Mon, Apr 14, 2025 at 06:28:14PM +0000, Edgecombe, Rick P wrote:
First of all, sorry for not contributing on this since v9. I've had an unusually enormous project conflict (TDX) combined with my test HW dying.
No worries, there doesn't seem to have been huge urgency on this one :(
Both fixed in the diff below, but in debugging the off-by-one errors I've realized this implementation wastes a shadow stack frame.
I rolled your diff into the series, thanks.
Do we want this? On arm there is SHADOW_STACK_SET_MARKER, which leaves a marker token. But on clone3 it will also leave behind a zero frame from the CMPXCHGed token. So if you use SHADOW_STACK_SET_MARKER you get two marker tokens. And on x86 you will get one for clone3 but not others, until x86 implements SHADOW_STACK_SET_MARKER. At which point x86 has to diverge from arm (bad) or also have the double marker frame.
The below fixes the x86 functionally, but what do you think of the wasted frame? One fix would be to change shadow_stack_pointer to shadow_stack_token, and then have each arch consume it in the normal HW way, leaving the new thread with:
SSP = clone_args->shadow_stack_token + 8
I think that's a good point about the extra frame, and your suggestion is sensible. This didn't translate well when refactoring away from specifying the extent of the shadow stack.
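
Just to check we mean the same thing, here is a rough userspace model of the semantics suggested above. It is not kernel code and not taken from the series: struct fake_shstk, FAKE_TOKEN and consume_token() are made-up names, and the token value is a placeholder rather than either architecture's real token format. It only shows the token slot being consumed and the new thread starting with SSP = token address + 8, so no zeroed frame is left on the stack.

```c
#include <stdbool.h>
#include <stdint.h>

#define SS_SLOT_SIZE   8u                      /* one 64-bit shadow stack slot */
#define SS_NR_SLOTS    4u                      /* tiny model stack */
#define FAKE_TOKEN     0xfeedfacecafef00dULL   /* placeholder, NOT a real token encoding */

/* Hypothetical model: "base" is the address of slots[0]. */
struct fake_shstk {
	uint64_t base;
	uint64_t slots[SS_NR_SLOTS];
};

/*
 * Model of consuming the token named by clone_args->shadow_stack_token.
 * The slot must still hold the expected token; it is then zeroed (standing
 * in for the CMPXCHG) and the new SSP is set just above it, so the zeroed
 * slot is popped rather than left behind as an extra frame.
 */
static bool consume_token(struct fake_shstk *s, uint64_t token_addr,
			  uint64_t expected, uint64_t *new_ssp)
{
	uint64_t idx;

	/* Reject unaligned or out-of-range token addresses. */
	if (token_addr < s->base || (token_addr % SS_SLOT_SIZE) != 0)
		return false;

	idx = (token_addr - s->base) / SS_SLOT_SIZE;
	if (idx >= SS_NR_SLOTS)
		return false;

	/* Token already consumed, or never valid. */
	if (s->slots[idx] != expected)
		return false;

	s->slots[idx] = 0;
	*new_ssp = token_addr + SS_SLOT_SIZE;
	return true;
}
```

In this model a second attempt to consume the same token fails, because the slot no longer holds the expected value.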