On Thu, Apr 10, 2025 at 11:56:44AM +0200, Radim Krčmář wrote:
2025-03-14T14:39:29-07:00, Deepak Gupta <debug@rivosinc.com>:
As discussed extensively in the changelog for the addition of this syscall on x86 ("x86/shstk: Introduce map_shadow_stack syscall"), the existing mmap() and madvise() syscalls do not map entirely well onto the security requirements for shadow stack memory, since they lead to windows where memory is allocated but not yet protected, or to stacks which are not properly and safely initialised. Instead, a new syscall, map_shadow_stack(), has been defined which allocates and initialises a shadow stack page.
This patch implements this syscall for riscv. riscv doesn't require the token to be set up by the kernel because user mode can do that by itself. However, to provide compatibility and portability with other architectures, user mode can specify the token set flag.
RISC-V shadow stack could use mmap() and madvise() perfectly well.
Deviating from what other arches are doing will create more thrash. I expect there will be merging of common logic between x86, arm64 and riscv. In fact, I did post one such RFC patch set last year (I didn't follow up on it). Using `mmap/madvise` defeats the purpose of creating common logic between arches.
There are pitfalls, as mentioned, with respect to mmap/madvise because of the unique nature of shadow stack. That is why a new syscall was accepted for creating such mappings. RISC-V will stick to that.
Userspace can always initialize the shadow stack properly and the shadow stack memory is never protected from other malicious threads.
Shadow stack memory is protected from inadvertent stores (be it from the same thread or from a different thread in the same address space). Malicious code which can execute `sspush/ssamoswap` presupposes that code integrity policies are already broken.
I think that the compatibility argument is reasonable. We'd need to modify the other syscalls to allow a write-only mapping anyway.
diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c
+static noinline unsigned long amo_user_shstk(unsigned long *addr, unsigned long val)
+{
+	/*
+	 * Never expect -1 on shadow stack. Expect return addresses and zero
+	 */
+	unsigned long swap = -1;
+	__enable_user_access();
+	asm goto(
+		".option push\n"
+		".option arch, +zicfiss\n"
Shouldn't the compiler accept the ssamoswap.d opcode even without the zicfiss arch?
It's an illegal instruction if shadow stacks aren't available. The current toolchain emits it only if zicfiss is specified in -march.
+		"1: ssamoswap.d %[swap], %[val], %[addr]\n"
+		_ASM_EXTABLE(1b, %l[fault])
+		RISCV_ACQUIRE_BARRIER
Why is the barrier here?
IIRC, I was following `arch_cmpxchg_acquire`, but I think the barrier is not needed. What we are doing is equivalent to `arch_xchg_relaxed`, which needs no barrier.
I did consider adding it to arch/riscv/include/asm/cmpxchg.h, but this primitive has limited usage, so I kept it local to usercfi.c.
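As a userspace analogy for the acquire-versus-relaxed distinction above, here is a minimal sketch using C11 atomics rather than the kernel's `arch_xchg_relaxed`/`arch_cmpxchg_acquire` primitives. The function names are hypothetical and the code does not touch shadow stack memory; it only shows that both variants perform the same atomic swap and differ solely in the ordering they impose on surrounding accesses.

```c
#include <assert.h>
#include <stdatomic.h>

/* Relaxed swap: atomic, but imposes no ordering on surrounding accesses. */
static unsigned long swap_relaxed(_Atomic unsigned long *slot, unsigned long val)
{
	return atomic_exchange_explicit(slot, val, memory_order_relaxed);
}

/* Acquire swap: same exchange, but later accesses cannot be reordered before it. */
static unsigned long swap_acquire(_Atomic unsigned long *slot, unsigned long val)
{
	return atomic_exchange_explicit(slot, val, memory_order_acquire);
}

/* Usage sketch: both variants return the previous value of the slot. */
static void swap_example(void)
{
	_Atomic unsigned long slot = 5;

	assert(swap_relaxed(&slot, 7) == 5);
	assert(swap_acquire(&slot, 9) == 7);
	assert(atomic_load(&slot) == 9);
}
```

If no later access depends on ordering against the swap, the relaxed form is sufficient, which matches the reasoning for dropping the barrier.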
Anyway, I'll re-spin with the barrier removed.
+		".option pop\n"
+		: [swap] "=r" (swap), [addr] "+A" (*addr)
+		: [val] "r" (val)
+		: "memory"
+		: fault
+		);
+	__disable_user_access();
+	return swap;
+fault:
+	__disable_user_access();
+	return -1;
I think we should return 0 and -EFAULT. We can ignore the swapped value, or return it through a pointer.
The consumer of this detects -1 and then returns -EFAULT. We will eventually need this when creating shadow stack tokens for the kernel shadow stack. I believe `-1` is a safe return value which can't be construed as a negative kernel address (-EFAULT could be).
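The sentinel convention described above can be sketched in plain userspace C. This is only a model under stated assumptions: `model_amo_swap()` stands in for `amo_user_shstk()`, a NULL pointer stands in for a faulting user access, and all names here are hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical name for the "-1 means fault" sentinel used by the helper. */
#define MODEL_SWAP_FAULT ((unsigned long)-1)

/*
 * Stand-in for amo_user_shstk(): swap *addr with val and return the old
 * value, or return the sentinel if the access "faults" (modelled as NULL).
 */
static unsigned long model_amo_swap(unsigned long *addr, unsigned long val)
{
	unsigned long old;

	if (addr == NULL)
		return MODEL_SWAP_FAULT;
	old = *addr;
	*addr = val;
	return old;
}

/* Consumer: translate the sentinel into -EFAULT for its own caller. */
static int model_write_token(unsigned long *slot, unsigned long token)
{
	if (model_amo_swap(slot, token) == MODEL_SWAP_FAULT)
		return -EFAULT;
	return 0;
}

/* Usage sketch: a good slot succeeds, a "faulting" slot yields -EFAULT. */
static void model_example(void)
{
	unsigned long slot = 0;

	assert(model_write_token(&slot, 0x1000) == 0);
	assert(slot == 0x1000);
	assert(model_write_token(NULL, 0x1000) == -EFAULT);
}
```

The sentinel only works because, as the comment in the patch says, -1 is never expected on a shadow stack, which holds return addresses and zero.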
+}
+static unsigned long allocate_shadow_stack(unsigned long addr, unsigned long size,
+					   unsigned long token_offset, bool set_tok)
+{
+	int flags = MAP_ANONYMOUS | MAP_PRIVATE;
Is MAP_GROWSDOWN pointless?
Not sure. I didn't see that in x86 or arm64 shadow stack creation. Let me know if it's useful.
+	struct mm_struct *mm = current->mm;
+	unsigned long populate, tok_loc = 0;
+	if (addr)
+		flags |= MAP_FIXED_NOREPLACE;
+	mmap_write_lock(mm);
+	addr = do_mmap(NULL, addr, size, PROT_READ, flags,
PROT_READ implies VM_READ, so won't this select PAGE_COPY in the protection_map instead of PAGE_SHADOWSTACK?
PROT_READ is redundant here. I haven't checked what happens if I remove it.
`VM_SHADOW_STACK` takes precedence (take a look at pte_mkwrite and pmd_mkwrite). The only way `VM_SHADOW_STACK` can appear in the VMA flags is via `map_shadow_stack` or `fork/clone` on an existing task with shadow stack enabled.
In a nutshell, the user can't specify `VM_SHADOW_STACK` directly (only indirectly, via the map_shadow_stack syscall or fork/clone). But if it is set in the VMA flags, it takes precedence.
Wouldn't avoiding VM_READ also allow us to get rid of the ugly hack in pte_mkwrite? (VM_WRITE would naturally select the right XWR flags.)
+		       VM_SHADOW_STACK | VM_WRITE, 0, &populate, NULL);
+	mmap_write_unlock(mm);
+SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags)
+{
[...]
+	if (addr && (addr & (PAGE_SIZE - 1)))
if (!PAGE_ALIGNED(addr))
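To see why the suggested form is a drop-in replacement, here is a small userspace sketch comparing the two checks. `DEMO_PAGE_SIZE` and `DEMO_PAGE_ALIGNED()` are hypothetical stand-ins for the kernel's `PAGE_SIZE` and `PAGE_ALIGNED()`, and a 4 KiB page size is assumed. The point is that the `addr &&` guard is redundant, because zero already has no low bits set.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages; the kernel takes the real value from the arch. */
#define DEMO_PAGE_SIZE 4096UL
/* Userspace stand-in for the kernel's PAGE_ALIGNED() helper. */
#define DEMO_PAGE_ALIGNED(addr) \
	((((unsigned long)(addr)) & (DEMO_PAGE_SIZE - 1)) == 0)

/* Open-coded check as written in the patch. */
static bool misaligned_open_coded(unsigned long addr)
{
	return addr && (addr & (DEMO_PAGE_SIZE - 1));
}

/* Check as suggested in the review. */
static bool misaligned_helper(unsigned long addr)
{
	return !DEMO_PAGE_ALIGNED(addr);
}

/* Both forms agree on every sample, including addr == 0. */
static void check_forms_agree(void)
{
	const unsigned long samples[] = {
		0, 1, DEMO_PAGE_SIZE - 1, DEMO_PAGE_SIZE,
		DEMO_PAGE_SIZE + 8, 16 * DEMO_PAGE_SIZE,
	};

	for (unsigned int i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(misaligned_open_coded(samples[i]) ==
		       misaligned_helper(samples[i]));
}
```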