Sending out v3 for cpu assisted riscv user mode control flow integrity.
v2 [9] was sent a week ago for this riscv usermode control flow integrity enabling. The RFC patchset (v1) was sent early this year, in January [7].
changes in v3
--------------
envcfg: logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been picked on per task basis, even though CPU didn't implement it. Fixed in this series.
dt-bindings: As suggested, split into a separate commit. Fixed the messaging to say that the spec is in public review.
arch_is_shadow_stack change: arch_is_shadow_stack changed to vma_is_shadow_stack
hwprobe: zicfiss / zicfilp if present will get enumerated in hwprobe
selftests: As suggested, added object and binary filenames to .gitignore. The selftest binary needs to be compiled with a cfi enabled compiler anyway, which makes sure that landing pad and shadow stack are enabled. Thus removed the separate enable/disable tests. Cleaned up the tests a bit.
changes in v2
---------------
As part of testing effort, compiled a rootfs with shadow stack and landing pad enabled (libraries and binaries) and booted to shell. As part of long running tests, I have been able to run some SPEC 2006 benchmarks [8]. (The link is provided only for the list of benchmarks that were run as long running tests; the spreadsheet itself holds static stats such as code size growth on SPEC binaries.) Thus converting from RFC to regular patchset.
Securing control-flow integrity for usermode requires the following:
- Securing forward control flow: all callsites must reach a target that they actually intend to reach.
- Securing backward control flow: all function returns must return to the location they were called from.
This patch series uses the riscv cpu extension `zicfilp` [2] to secure forward control flow and `zicfiss` [2] to secure backward control flow. `zicfilp` enforces that all indirect calls or jumps must land on a landing pad instruction, and that the label embedded in the landing pad instruction must match a value programmed in the `x7` register (at the callsite, by the compiler). `zicfiss` introduces a shadow stack which is writable only via shadow stack instructions (sspush and ssamoswap) and thus can't be tampered with via inadvertent stores. More details about the extensions can be read from [2], and there are details in the documentation as well (in this patch series).
Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow integrity for user mode programs can be compiled in the kernel.
Enabling of control flow integrity for user programs is left to the user runtime (specifically, it is expected from the dynamic loader). There has been a lot of earlier discussion on the enabling topic around x86 shadow stack enabling [3, 4, 5], and the overall consensus was to let the dynamic loader (or usermode) decide whether to enable the feature.
This patch series introduces arch agnostic `prctls` to enable shadow stack and indirect branch tracking, and implements them on riscv. arm64 is expected to implement the shadow stack part of these arch agnostic `prctls` [6].
Changes since last time
***********************
Spec changes
------------
- Forward cfi spec has become much simpler. `lpad` instruction is pseudo for `auipc rd, <20bit_imm>`. `lpad` checks x7 against 20bit embedded in instr. Thus label width is 20bit.
- Shadow stack management instructions are reduced to
    sspush - to push x1/x5 on shadow stack
    sspopchk - pops from shadow stack and compares with x1/x5.
    ssamoswap - atomically swap value on shadow stack.
    rdssp - reads current shadow stack pointer
- Shadow stack accesses on readonly memory always raise AMO/store page fault. `sspopchk` is load but if underlying page is readonly, it'll raise a store page fault. It simplifies hardware and kernel for COW handling for shadow stack pages.
- riscv defines a new exception type `software check exception` and control flow violations raise software check exception.
- Enabling controls for shadow stack and landing pad are in the xenvcfg CSR, which controls enabling for the next lower privilege mode. As an example, senvcfg controls enabling for U mode and menvcfg controls enabling for S mode.
core mm shadow stack enabling
-----------------------------
Shadow stack for x86 usermode is now in mainline, and thus this patch series builds on top of that for arch-agnostic mm related changes. Big thanks and shout out to Rick Edgecombe for that.
selftests
---------
Created some minimal selftests to test the patch series.
[1] - https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/
[2] - https://github.com/riscv/riscv-cfi
[3] - https://lore.kernel.org/lkml/ZWHcBq0bJ+15eeKs@finisterre.sirena.org.uk/T/#mb...
[4] - https://lore.kernel.org/all/20220130211838.8382-1-rick.p.edgecombe@intel.com...
[5] - https://lore.kernel.org/lkml/CAHk-=wgP5mk3poVeejw16Asbid0ghDt4okHnWaWKLBkRhQ...
[6] - https://lore.kernel.org/linux-mm/20231122-arm64-gcs-v7-2-201c483bd775@kernel...
[7] - https://lore.kernel.org/lkml/20240125062739.1339782-1-debug@rivosinc.com/
[8] - https://docs.google.com/spreadsheets/d/1_cHGH4ctNVvFRiS7hW9dEGKtXLAJ3aX4Z_iT...
[9] - https://lore.kernel.org/lkml/20240329044459.3990638-1-debug@rivosinc.com/
envcfg CSR defines enabling bits for cache management instructions and soon will control enabling for control flow integrity and pointer masking features.
Control flow integrity enabling for forward cfi and backward cfi are controlled via envcfg and thus need to be enabled on per thread basis.
This patch creates a place holder for envcfg CSR in `thread_info` and adds logic to save and restore on task switching.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/asm/switch_to.h   | 10 ++++++++++
 arch/riscv/include/asm/thread_info.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index 7efdb0584d47..2d9a00a30394 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -69,6 +69,15 @@ static __always_inline bool has_fpu(void) { return false; }
 #define __switch_to_fpu(__prev, __next) do { } while (0)
 #endif
 
+static inline void __switch_to_envcfg(struct task_struct *next)
+{
+	register unsigned long envcfg = next->thread_info.envcfg;
+
+	asm volatile (ALTERNATIVE("nop", "csrw " __stringify(CSR_ENVCFG) ", %0", 0,
+				  RISCV_ISA_EXT_XLINUXENVCFG, 1)
+			:: "r" (envcfg) : "memory");
+}
+
 extern struct task_struct *__switch_to(struct task_struct *,
 				       struct task_struct *);
 
@@ -80,6 +89,7 @@ do {							\
 	__switch_to_fpu(__prev, __next);		\
 	if (has_vector())				\
 		__switch_to_vector(__prev, __next);	\
+	__switch_to_envcfg(__next);			\
 	((last) = __switch_to(__prev, __next));		\
 } while (0)
 
diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
index 5d473343634b..a503bdc2f6dd 100644
--- a/arch/riscv/include/asm/thread_info.h
+++ b/arch/riscv/include/asm/thread_info.h
@@ -56,6 +56,7 @@ struct thread_info {
 	long			user_sp;	/* User stack pointer */
 	int			cpu;
 	unsigned long		syscall_work;	/* SYSCALL_WORK_ flags */
+	unsigned long envcfg;
 #ifdef CONFIG_SHADOW_CALL_STACK
 	void			*scs_base;
 	void			*scs_sp;
On Wed, Apr 03, 2024 at 04:34:49PM -0700, Deepak Gupta wrote:
> envcfg CSR defines enabling bits for cache management instructions and soon will control enabling for control flow integrity and pointer masking features.
>
> Control flow integrity enabling for forward cfi and backward cfi are controlled via envcfg and thus need to be enabled on per thread basis.
>
> This patch creates a place holder for envcfg CSR in `thread_info` and adds logic to save and restore on task switching.
>
> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> ---
>  arch/riscv/include/asm/switch_to.h   | 10 ++++++++++
>  arch/riscv/include/asm/thread_info.h |  1 +
>  2 files changed, 11 insertions(+)
>
> diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
> index 7efdb0584d47..2d9a00a30394 100644
> --- a/arch/riscv/include/asm/switch_to.h
> +++ b/arch/riscv/include/asm/switch_to.h
> @@ -69,6 +69,15 @@ static __always_inline bool has_fpu(void) { return false; }
>  #define __switch_to_fpu(__prev, __next) do { } while (0)
>  #endif
>
> +static inline void __switch_to_envcfg(struct task_struct *next)
> +{
> +	register unsigned long envcfg = next->thread_info.envcfg;

This doesn't need the register storage class.

> +
> +	asm volatile (ALTERNATIVE("nop", "csrw " __stringify(CSR_ENVCFG) ", %0", 0,
> +				  RISCV_ISA_EXT_XLINUXENVCFG, 1)
> +			:: "r" (envcfg) : "memory");
> +}

Something like:

static inline void __switch_to_envcfg(struct task_struct *next)
{
	if (riscv_has_extension_unlikely(RISCV_ISA_EXT_XLINUXENVCFG))
		csr_write(CSR_ENVCFG, next->thread_info.envcfg);
}

would be easier to read, but the alternative you have written doesn't have the jump that riscv_has_extension_unlikely has so what you have will be more performant.

Does envcfg need to be save/restored always or just with CONFIG_RISCV_USER_CFI?

- Charlie

> +
>  extern struct task_struct *__switch_to(struct task_struct *,
>  				       struct task_struct *);
>
> @@ -80,6 +89,7 @@ do {							\
>  	__switch_to_fpu(__prev, __next);		\
>  	if (has_vector())				\
>  		__switch_to_vector(__prev, __next);	\
> +	__switch_to_envcfg(__next);			\
>  	((last) = __switch_to(__prev, __next));		\
>  } while (0)
>
> diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
> index 5d473343634b..a503bdc2f6dd 100644
> --- a/arch/riscv/include/asm/thread_info.h
> +++ b/arch/riscv/include/asm/thread_info.h
> @@ -56,6 +56,7 @@ struct thread_info {
>  	long			user_sp;	/* User stack pointer */
>  	int			cpu;
>  	unsigned long		syscall_work;	/* SYSCALL_WORK_ flags */
> +	unsigned long envcfg;
>  #ifdef CONFIG_SHADOW_CALL_STACK
>  	void			*scs_base;
>  	void			*scs_sp;
> --
> 2.43.2
On Wed, May 08, 2024 at 05:10:46PM -0700, Charlie Jenkins wrote:
> On Wed, Apr 03, 2024 at 04:34:49PM -0700, Deepak Gupta wrote:
> > envcfg CSR defines enabling bits for cache management instructions and soon will control enabling for control flow integrity and pointer masking features.
> >
> > Control flow integrity enabling for forward cfi and backward cfi are controlled via envcfg and thus need to be enabled on per thread basis.
> >
> > This patch creates a place holder for envcfg CSR in `thread_info` and adds logic to save and restore on task switching.
> >
> > Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> > ---
> >  arch/riscv/include/asm/switch_to.h   | 10 ++++++++++
> >  arch/riscv/include/asm/thread_info.h |  1 +
> >  2 files changed, 11 insertions(+)
> >
> > diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
> > index 7efdb0584d47..2d9a00a30394 100644
> > --- a/arch/riscv/include/asm/switch_to.h
> > +++ b/arch/riscv/include/asm/switch_to.h
> > @@ -69,6 +69,15 @@ static __always_inline bool has_fpu(void) { return false; }
> >  #define __switch_to_fpu(__prev, __next) do { } while (0)
> >  #endif
> >
> > +static inline void __switch_to_envcfg(struct task_struct *next)
> > +{
> > +	register unsigned long envcfg = next->thread_info.envcfg;
>
> This doesn't need the register storage class.

yeah. will fix it. thanks.

> > +
> > +	asm volatile (ALTERNATIVE("nop", "csrw " __stringify(CSR_ENVCFG) ", %0", 0,
> > +				  RISCV_ISA_EXT_XLINUXENVCFG, 1)
> > +			:: "r" (envcfg) : "memory");
> > +}
>
> Something like:
>
> static inline void __switch_to_envcfg(struct task_struct *next)
> {
> 	if (riscv_has_extension_unlikely(RISCV_ISA_EXT_XLINUXENVCFG))
> 		csr_write(CSR_ENVCFG, next->thread_info.envcfg);
> }
>
> would be easier to read, but the alternative you have written doesn't have the jump that riscv_has_extension_unlikely has so what you have will be more performant.

Yeah, I looked at the codegen of `riscv_has_extension_unlikely` and I didn't like the unnecessary jumps, especially in the switch_to path. All I want is a CSR write. So I used an alternative to patch the nop with a CSR write.

> Does envcfg need to be save/restored always or just with CONFIG_RISCV_USER_CFI?

There is no save (no read of CSR). Only restore (writes to CSR).

There are pointer masking patches from Samuel Holland where senvcfg needs to be context switched on per task basis.
https://lore.kernel.org/lkml/20240319215915.832127-1-samuel.holland@sifive.c...

Given that this CSR controls the user execution environment and is per task, I thought it's better to not wrap it under CONFIG_RISCV_USER_CFI and rather make it dependent on RISCV_ISA_EXT_XLINUXENVCFG. If any of the extensions that require senvcfg is present, then simply restore this CSR on per task basis.

> - Charlie
>
> > +
> >  extern struct task_struct *__switch_to(struct task_struct *,
> >  				       struct task_struct *);
> >
> > @@ -80,6 +89,7 @@ do {							\
> >  	__switch_to_fpu(__prev, __next);		\
> >  	if (has_vector())				\
> >  		__switch_to_vector(__prev, __next);	\
> > +	__switch_to_envcfg(__next);			\
> >  	((last) = __switch_to(__prev, __next));		\
> >  } while (0)
> >
> > diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
> > index 5d473343634b..a503bdc2f6dd 100644
> > --- a/arch/riscv/include/asm/thread_info.h
> > +++ b/arch/riscv/include/asm/thread_info.h
> > @@ -56,6 +56,7 @@ struct thread_info {
> >  	long			user_sp;	/* User stack pointer */
> >  	int			cpu;
> >  	unsigned long		syscall_work;	/* SYSCALL_WORK_ flags */
> > +	unsigned long envcfg;
> >  #ifdef CONFIG_SHADOW_CALL_STACK
> >  	void			*scs_base;
> >  	void			*scs_sp;
> > --
> > 2.43.2
Defines a base default value for envcfg per task. By default all tasks should have cache zeroing capability. Any future base capabilities that apply to all tasks can be turned on same way.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/asm/csr.h | 2 ++
 arch/riscv/kernel/process.c  | 6 ++++++
 2 files changed, 8 insertions(+)

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 2468c55933cd..bbd2207adb39 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -202,6 +202,8 @@
 #define ENVCFG_CBIE_FLUSH		_AC(0x1, UL)
 #define ENVCFG_CBIE_INV			_AC(0x3, UL)
 #define ENVCFG_FIOM			_AC(0x1, UL)
+/* by default all threads should be able to zero cache */
+#define ENVCFG_BASE			ENVCFG_CBZE
 
 /* Smstateen bits */
 #define SMSTATEEN0_AIA_IMSIC_SHIFT	58
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 92922dbd5b5c..d3109557f951 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -152,6 +152,12 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
 	else
 		regs->status |= SR_UXL_64;
 #endif
+	/*
+	 * read current envcfg settings, AND it with base settings applicable
+	 * for all the tasks. Base settings should've been set up during CPU
+	 * bring up.
+	 */
+	current->thread_info.envcfg = csr_read(CSR_ENVCFG) & ENVCFG_BASE;
 }
 
 void flush_thread(void)
On Wed, Apr 03, 2024 at 04:34:50PM -0700, Deepak Gupta wrote:
> Defines a base default value for envcfg per task. By default all tasks should have cache zeroing capability. Any future base capabilities that apply to all tasks can be turned on same way.
>
> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> ---
>  arch/riscv/include/asm/csr.h | 2 ++
>  arch/riscv/kernel/process.c  | 6 ++++++
>  2 files changed, 8 insertions(+)
>
> diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> index 2468c55933cd..bbd2207adb39 100644
> --- a/arch/riscv/include/asm/csr.h
> +++ b/arch/riscv/include/asm/csr.h
> @@ -202,6 +202,8 @@
>  #define ENVCFG_CBIE_FLUSH		_AC(0x1, UL)
>  #define ENVCFG_CBIE_INV			_AC(0x3, UL)
>  #define ENVCFG_FIOM			_AC(0x1, UL)
> +/* by default all threads should be able to zero cache */
> +#define ENVCFG_BASE			ENVCFG_CBZE
>
>  /* Smstateen bits */
>  #define SMSTATEEN0_AIA_IMSIC_SHIFT	58
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 92922dbd5b5c..d3109557f951 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -152,6 +152,12 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
>  	else
>  		regs->status |= SR_UXL_64;
>  #endif
> +	/*
> +	 * read current envcfg settings, AND it with base settings applicable
> +	 * for all the tasks. Base settings should've been set up during CPU
> +	 * bring up.
> +	 */
> +	current->thread_info.envcfg = csr_read(CSR_ENVCFG) & ENVCFG_BASE;

This needs to be gated on xlinuxenvcfg.

- Charlie

>  }
>
>  void flush_thread(void)
> --
> 2.43.2
On Fri, May 10, 2024 at 03:33:36PM -0700, Charlie Jenkins wrote:
> On Wed, Apr 03, 2024 at 04:34:50PM -0700, Deepak Gupta wrote:
> > Defines a base default value for envcfg per task. By default all tasks should have cache zeroing capability. Any future base capabilities that apply to all tasks can be turned on same way.
> >
> > Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> > ---
> >  arch/riscv/include/asm/csr.h | 2 ++
> >  arch/riscv/kernel/process.c  | 6 ++++++
> >  2 files changed, 8 insertions(+)
> >
> > diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> > index 2468c55933cd..bbd2207adb39 100644
> > --- a/arch/riscv/include/asm/csr.h
> > +++ b/arch/riscv/include/asm/csr.h
> > @@ -202,6 +202,8 @@
> >  #define ENVCFG_CBIE_FLUSH		_AC(0x1, UL)
> >  #define ENVCFG_CBIE_INV			_AC(0x3, UL)
> >  #define ENVCFG_FIOM			_AC(0x1, UL)
> > +/* by default all threads should be able to zero cache */
> > +#define ENVCFG_BASE			ENVCFG_CBZE
> >
> >  /* Smstateen bits */
> >  #define SMSTATEEN0_AIA_IMSIC_SHIFT	58
> > diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> > index 92922dbd5b5c..d3109557f951 100644
> > --- a/arch/riscv/kernel/process.c
> > +++ b/arch/riscv/kernel/process.c
> > @@ -152,6 +152,12 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
> >  	else
> >  		regs->status |= SR_UXL_64;
> >  #endif
> > +	/*
> > +	 * read current envcfg settings, AND it with base settings applicable
> > +	 * for all the tasks. Base settings should've been set up during CPU
> > +	 * bring up.
> > +	 */
> > +	current->thread_info.envcfg = csr_read(CSR_ENVCFG) & ENVCFG_BASE;
>
> This needs to be gated on xlinuxenvcfg.

You're right. This csr read should be gated on xlinuxenvcfg. Will fix it.

> - Charlie
>
> >  }
> >
> >  void flush_thread(void)
> > --
> > 2.43.2
riscv will need an implementation for exit_thread to clean up shadow stack when thread exits. If current thread had shadow stack enabled, shadow stack is allocated by default for any new thread.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/Kconfig          | 1 +
 arch/riscv/kernel/process.c | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e3142ce531a0..7e0b2bcc388f 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -149,6 +149,7 @@ config RISCV
 	select HAVE_SAMPLE_FTRACE_DIRECT_MULTI
 	select HAVE_STACKPROTECTOR
 	select HAVE_SYSCALL_TRACEPOINTS
+	select HAVE_EXIT_THREAD
 	select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
 	select IRQ_DOMAIN
 	select IRQ_FORCED_THREADING
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index d3109557f951..ce577cdc2af3 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -200,6 +200,11 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	return 0;
 }
 
+void exit_thread(struct task_struct *tsk)
+{
+
+}
+
 int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 {
 	unsigned long clone_flags = args->flags;
On Wed, Apr 03, 2024 at 04:34:51PM -0700, Deepak Gupta wrote:
> riscv will need an implementation for exit_thread to clean up shadow stack when thread exits. If current thread had shadow stack enabled, shadow stack is allocated by default for any new thread.
>
> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> ---
>  arch/riscv/Kconfig          | 1 +
>  arch/riscv/kernel/process.c | 5 +++++
>  2 files changed, 6 insertions(+)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index e3142ce531a0..7e0b2bcc388f 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -149,6 +149,7 @@ config RISCV
>  	select HAVE_SAMPLE_FTRACE_DIRECT_MULTI
>  	select HAVE_STACKPROTECTOR
>  	select HAVE_SYSCALL_TRACEPOINTS
> +	select HAVE_EXIT_THREAD
>  	select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
>  	select IRQ_DOMAIN
>  	select IRQ_FORCED_THREADING
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index d3109557f951..ce577cdc2af3 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -200,6 +200,11 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>  	return 0;
>  }
>
> +void exit_thread(struct task_struct *tsk)
> +{
> +
> +}
> +
>  int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>  {
>  	unsigned long clone_flags = args->flags;
> --
> 2.43.2

Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Make an entry for cfi extensions in extensions.yaml.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 .../devicetree/bindings/riscv/extensions.yaml | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
index 63d81dc895e5..45b87ad6cc1c 100644
--- a/Documentation/devicetree/bindings/riscv/extensions.yaml
+++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
@@ -317,6 +317,16 @@ properties:
             The standard Zicboz extension for cache-block zeroing as ratified
             in commit 3dd606f ("Create cmobase-v1.0.pdf") of riscv-CMOs.
 
+        - const: zicfilp
+          description:
+            The standard Zicfilp extension for enforcing forward edge control-flow
+            integrity in commit 3a20dc9 of riscv-cfi and is in public review.
+
+        - const: zicfiss
+          description:
+            The standard Zicfiss extension for enforcing backward edge control-flow
+            integrity in commit 3a20dc9 of riscv-cfi and is in public review.
+
         - const: zicntr
           description:
             The standard Zicntr extension for base counters and timers, as
On Wed, Apr 03, 2024 at 04:34:52PM -0700, Deepak Gupta wrote:
> Make an entry for cfi extensions in extensions.yaml.
>
> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> ---
>  .../devicetree/bindings/riscv/extensions.yaml | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
> index 63d81dc895e5..45b87ad6cc1c 100644
> --- a/Documentation/devicetree/bindings/riscv/extensions.yaml
> +++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
> @@ -317,6 +317,16 @@ properties:
>              The standard Zicboz extension for cache-block zeroing as ratified
>              in commit 3dd606f ("Create cmobase-v1.0.pdf") of riscv-CMOs.
>
> +        - const: zicfilp
> +          description:
> +            The standard Zicfilp extension for enforcing forward edge control-flow
> +            integrity in commit 3a20dc9 of riscv-cfi and is in public review.

Does in public review mean the commit sha is going to change?

> +
> +        - const: zicfiss
> +          description:
> +            The standard Zicfiss extension for enforcing backward edge control-flow
> +            integrity in commit 3a20dc9 of riscv-cfi and is in public review.
> +
>          - const: zicntr
>            description:
>              The standard Zicntr extension for base counters and timers, as
> --
> 2.43.2
On Wed, Apr 10, 2024 at 4:58 AM Rob Herring <robh@kernel.org> wrote:
> On Wed, Apr 03, 2024 at 04:34:52PM -0700, Deepak Gupta wrote:
> > Make an entry for cfi extensions in extensions.yaml.
> >
> > Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> > ---
> >  .../devicetree/bindings/riscv/extensions.yaml | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
> > index 63d81dc895e5..45b87ad6cc1c 100644
> > --- a/Documentation/devicetree/bindings/riscv/extensions.yaml
> > +++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
> > @@ -317,6 +317,16 @@ properties:
> >              The standard Zicboz extension for cache-block zeroing as ratified
> >              in commit 3dd606f ("Create cmobase-v1.0.pdf") of riscv-CMOs.
> >
> > +        - const: zicfilp
> > +          description:
> > +            The standard Zicfilp extension for enforcing forward edge control-flow
> > +            integrity in commit 3a20dc9 of riscv-cfi and is in public review.
>
> Does in public review mean the commit sha is going to change?

Less likely. Next step after public review is to gather comments from public review. If something is really pressing and needs to be addressed, then yes this will change. Else this gets ratified as it is.
On Wed, Apr 10, 2024 at 02:37:21PM -0700, Deepak Gupta wrote:
> On Wed, Apr 10, 2024 at 4:58 AM Rob Herring <robh@kernel.org> wrote:
> > On Wed, Apr 03, 2024 at 04:34:52PM -0700, Deepak Gupta wrote:
> > > Make an entry for cfi extensions in extensions.yaml.
> > >
> > > Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> > > ---
> > >  .../devicetree/bindings/riscv/extensions.yaml | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > >
> > > diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
> > > index 63d81dc895e5..45b87ad6cc1c 100644
> > > --- a/Documentation/devicetree/bindings/riscv/extensions.yaml
> > > +++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
> > > @@ -317,6 +317,16 @@ properties:
> > >              The standard Zicboz extension for cache-block zeroing as ratified
> > >              in commit 3dd606f ("Create cmobase-v1.0.pdf") of riscv-CMOs.
> > >
> > > +        - const: zicfilp
> > > +          description:
> > > +            The standard Zicfilp extension for enforcing forward edge control-flow
> > > +            integrity in commit 3a20dc9 of riscv-cfi and is in public review.
> >
> > Does in public review mean the commit sha is going to change?
>
> Less likely. Next step after public review is to gather comments from public review. If something is really pressing and needs to be addressed, then yes this will change. Else this gets ratified as it is.

If the commit sha can change, then it is useless. What's the guarantee someone is going to remember to update it if it changes?

Rob
On Mon, Apr 15, 2024 at 02:41:05PM -0500, Rob Herring wrote:
> On Wed, Apr 10, 2024 at 02:37:21PM -0700, Deepak Gupta wrote:
> > On Wed, Apr 10, 2024 at 4:58 AM Rob Herring <robh@kernel.org> wrote:
> > > On Wed, Apr 03, 2024 at 04:34:52PM -0700, Deepak Gupta wrote:
> > > > Make an entry for cfi extensions in extensions.yaml.
> > > >
> > > > Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> > > > ---
> > > >  .../devicetree/bindings/riscv/extensions.yaml | 10 ++++++++++
> > > >  1 file changed, 10 insertions(+)
> > > >
> > > > diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
> > > > index 63d81dc895e5..45b87ad6cc1c 100644
> > > > --- a/Documentation/devicetree/bindings/riscv/extensions.yaml
> > > > +++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
> > > > @@ -317,6 +317,16 @@ properties:
> > > >              The standard Zicboz extension for cache-block zeroing as ratified
> > > >              in commit 3dd606f ("Create cmobase-v1.0.pdf") of riscv-CMOs.
> > > >
> > > > +        - const: zicfilp
> > > > +          description:
> > > > +            The standard Zicfilp extension for enforcing forward edge control-flow
> > > > +            integrity in commit 3a20dc9 of riscv-cfi and is in public review.
> > >
> > > Does in public review mean the commit sha is going to change?
> >
> > Less likely. Next step after public review is to gather comments from public review. If something is really pressing and needs to be addressed, then yes this will change. Else this gets ratified as it is.
>
> If the commit sha can change, then it is useless. What's the guarantee someone is going to remember to update it if it changes?

Sorry for late reply.

I was following existing wordings and patterns for messaging in this file. You would rather have me remove sha and only mention that spec is in public review?

> Rob
On Tue, Apr 16, 2024 at 08:44:16AM -0700, Deepak Gupta wrote:
> On Mon, Apr 15, 2024 at 02:41:05PM -0500, Rob Herring wrote:
> > On Wed, Apr 10, 2024 at 02:37:21PM -0700, Deepak Gupta wrote:
> > > On Wed, Apr 10, 2024 at 4:58 AM Rob Herring <robh@kernel.org> wrote:
> > > > On Wed, Apr 03, 2024 at 04:34:52PM -0700, Deepak Gupta wrote:
> > > > > Make an entry for cfi extensions in extensions.yaml.
> > > > >
> > > > > Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> > > > > ---
> > > > >  .../devicetree/bindings/riscv/extensions.yaml | 10 ++++++++++
> > > > >  1 file changed, 10 insertions(+)
> > > > >
> > > > > diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
> > > > > index 63d81dc895e5..45b87ad6cc1c 100644
> > > > > --- a/Documentation/devicetree/bindings/riscv/extensions.yaml
> > > > > +++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
> > > > > @@ -317,6 +317,16 @@ properties:
> > > > >              The standard Zicboz extension for cache-block zeroing as ratified
> > > > >              in commit 3dd606f ("Create cmobase-v1.0.pdf") of riscv-CMOs.
> > > > >
> > > > > +        - const: zicfilp
> > > > > +          description:
> > > > > +            The standard Zicfilp extension for enforcing forward edge control-flow
> > > > > +            integrity in commit 3a20dc9 of riscv-cfi and is in public review.
> > > >
> > > > Does in public review mean the commit sha is going to change?
> > >
> > > Less likely. Next step after public review is to gather comments from public review. If something is really pressing and needs to be addressed, then yes this will change. Else this gets ratified as it is.
> >
> > If the commit sha can change, then it is useless. What's the guarantee someone is going to remember to update it if it changes?
>
> Sorry for late reply.
>
> I was following existing wordings and patterns for messaging in this file. You would rather have me remove sha and only mention that spec is in public review?

Nope, having a commit sha is desired. None of this is mergeable until at least the spec becomes frozen, so the sha can be updated at that point to the freeze state - or better yet to the ratified state. Being in public review is not sufficient.

Cheers,
Conor
On Thu, May 09, 2024 at 07:14:26PM +0100, Conor Dooley wrote:
> On Tue, Apr 16, 2024 at 08:44:16AM -0700, Deepak Gupta wrote:
> > On Mon, Apr 15, 2024 at 02:41:05PM -0500, Rob Herring wrote:
> > > On Wed, Apr 10, 2024 at 02:37:21PM -0700, Deepak Gupta wrote:
> > > > On Wed, Apr 10, 2024 at 4:58 AM Rob Herring <robh@kernel.org> wrote:
> > > > > On Wed, Apr 03, 2024 at 04:34:52PM -0700, Deepak Gupta wrote:
> > > > > > Make an entry for cfi extensions in extensions.yaml.
> > > > > >
> > > > > > Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> > > > > > ---
> > > > > >  .../devicetree/bindings/riscv/extensions.yaml | 10 ++++++++++
> > > > > >  1 file changed, 10 insertions(+)
> > > > > >
> > > > > > diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
> > > > > > index 63d81dc895e5..45b87ad6cc1c 100644
> > > > > > --- a/Documentation/devicetree/bindings/riscv/extensions.yaml
> > > > > > +++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
> > > > > > @@ -317,6 +317,16 @@ properties:
> > > > > >              The standard Zicboz extension for cache-block zeroing as ratified
> > > > > >              in commit 3dd606f ("Create cmobase-v1.0.pdf") of riscv-CMOs.
> > > > > >
> > > > > > +        - const: zicfilp
> > > > > > +          description:
> > > > > > +            The standard Zicfilp extension for enforcing forward edge control-flow
> > > > > > +            integrity in commit 3a20dc9 of riscv-cfi and is in public review.
> > > > >
> > > > > Does in public review mean the commit sha is going to change?
> > > >
> > > > Less likely. Next step after public review is to gather comments from public review. If something is really pressing and needs to be addressed, then yes this will change. Else this gets ratified as it is.
> > >
> > > If the commit sha can change, then it is useless. What's the guarantee someone is going to remember to update it if it changes?
> >
> > Sorry for late reply.
> >
> > I was following existing wordings and patterns for messaging in this file. You would rather have me remove sha and only mention that spec is in public review?
>
> Nope, having a commit sha is desired. None of this is mergeable until at least the spec becomes frozen, so the sha can be updated at that point to the freeze state - or better yet to the ratified state. Being in public review is not sufficient.

Spec is frozen. As per RVI spec lifecycle, spec freeze is a prior step to public review. Public review concluded on 25th April https://lists.riscv.org/g/tech-ss-lp-cfi/message/91

Next step is ratification whenever board meets.

> Cheers,
> Conor
On Thu, May 09, 2024 at 11:46:26AM -0700, Deepak Gupta wrote:
> On Thu, May 09, 2024 at 07:14:26PM +0100, Conor Dooley wrote:
> > On Tue, Apr 16, 2024 at 08:44:16AM -0700, Deepak Gupta wrote:
> > > On Mon, Apr 15, 2024 at 02:41:05PM -0500, Rob Herring wrote:
> > > > On Wed, Apr 10, 2024 at 02:37:21PM -0700, Deepak Gupta wrote:
> > > > > On Wed, Apr 10, 2024 at 4:58 AM Rob Herring <robh@kernel.org> wrote:
> > > > > > On Wed, Apr 03, 2024 at 04:34:52PM -0700, Deepak Gupta wrote:
> > > > > > > Make an entry for cfi extensions in extensions.yaml.
> > > > > > >
> > > > > > > Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> > > > > > > ---
> > > > > > >  .../devicetree/bindings/riscv/extensions.yaml | 10 ++++++++++
> > > > > > >  1 file changed, 10 insertions(+)
> > > > > > >
> > > > > > > diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
> > > > > > > index 63d81dc895e5..45b87ad6cc1c 100644
> > > > > > > --- a/Documentation/devicetree/bindings/riscv/extensions.yaml
> > > > > > > +++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
> > > > > > > @@ -317,6 +317,16 @@ properties:
> > > > > > >              The standard Zicboz extension for cache-block zeroing as ratified
> > > > > > >              in commit 3dd606f ("Create cmobase-v1.0.pdf") of riscv-CMOs.
> > > > > > >
> > > > > > > +        - const: zicfilp
> > > > > > > +          description:
> > > > > > > +            The standard Zicfilp extension for enforcing forward edge control-flow
> > > > > > > +            integrity in commit 3a20dc9 of riscv-cfi and is in public review.
> > > > > >
> > > > > > Does in public review mean the commit sha is going to change?
> > > > >
> > > > > Less likely. Next step after public review is to gather comments from public review. If something is really pressing and needs to be addressed, then yes this will change. Else this gets ratified as it is.
> > > >
> > > > If the commit sha can change, then it is useless. What's the guarantee someone is going to remember to update it if it changes?
> > >
> > > Sorry for late reply.
> > >
> > > I was following existing wordings and patterns for messaging in this file. You would rather have me remove sha and only mention that spec is in public review?
> >
> > Nope, having a commit sha is desired. None of this is mergeable until at least the spec becomes frozen, so the sha can be updated at that point to the freeze state - or better yet to the ratified state. Being in public review is not sufficient.
>
> Spec is frozen. As per RVI spec lifecycle, spec freeze is a prior step to public review. Public review concluded on 25th April https://lists.riscv.org/g/tech-ss-lp-cfi/message/91
>
> Next step is ratification whenever board meets.

Ah, I did the "silly" thing of looking on the RVI website at extension status (because I never know the order of things) and these two extensions were marked on there as being in the inception phase, so I incorrectly assumed that "public review" came before freeze. Freeze is the standard that we have been applying so far, but if ratification is imminent, and nothing has changed in the review period, then it seems sane to just pick the freeze point for the definition.

Cheers,
Conor.
On Thu, May 09, 2024 at 09:32:49PM +0100, Conor Dooley wrote:
On Thu, May 09, 2024 at 11:46:26AM -0700, Deepak Gupta wrote:
On Thu, May 09, 2024 at 07:14:26PM +0100, Conor Dooley wrote:
On Tue, Apr 16, 2024 at 08:44:16AM -0700, Deepak Gupta wrote:
On Mon, Apr 15, 2024 at 02:41:05PM -0500, Rob Herring wrote:
On Wed, Apr 10, 2024 at 02:37:21PM -0700, Deepak Gupta wrote:
On Wed, Apr 10, 2024 at 4:58 AM Rob Herring robh@kernel.org wrote: > > On Wed, Apr 03, 2024 at 04:34:52PM -0700, Deepak Gupta wrote: > > Make an entry for cfi extensions in extensions.yaml. > > > > Signed-off-by: Deepak Gupta debug@rivosinc.com > > --- > > .../devicetree/bindings/riscv/extensions.yaml | 10 ++++++++++ > > 1 file changed, 10 insertions(+) > > > > diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml > > index 63d81dc895e5..45b87ad6cc1c 100644 > > --- a/Documentation/devicetree/bindings/riscv/extensions.yaml > > +++ b/Documentation/devicetree/bindings/riscv/extensions.yaml > > @@ -317,6 +317,16 @@ properties: > > The standard Zicboz extension for cache-block zeroing as ratified > > in commit 3dd606f ("Create cmobase-v1.0.pdf") of riscv-CMOs. > > > > + - const: zicfilp > > + description: > > + The standard Zicfilp extension for enforcing forward edge control-flow > > + integrity in commit 3a20dc9 of riscv-cfi and is in public review. > > Does in public review mean the commit sha is going to change? >
Less likely. Next step after public review is to gather comments from public review. If something is really pressing and needs to be addressed, then yes this will change. Else this gets ratified as it is.
If the commit sha can change, then it is useless. What's the guarantee someone is going to remember to update it if it changes?
Sorry for late reply.
I was following existing wordings and patterns for messaging in this file. You would rather have me remove sha and only mention that spec is in public review?
Nope, having a commit sha is desired. None of this is mergeable until at least the spec becomes frozen, so the sha can be updated at that point to the freeze state - or better yet to the ratified state. Being in public review is not sufficient.
Spec is frozen. As per RVI spec lifecycle, spec freeze is a prior step to public review. Public review concluded on 25th April https://lists.riscv.org/g/tech-ss-lp-cfi/message/91
Next step is ratification whenever board meets.
Ah, I did the "silly" thing of looking on the RVI website at extension status (because I never know the order of things) and these two extensions were marked on there as being in the inception phase, so I incorrectly assumed that "public review" came before freeze. Freeze is the standard that we have been applying so far, but if ratification is imminent, and nothing has changed in the review period, then it seems sane to just pick the freeze point for the definition.
Yeah I don't think wiki is that regularly updated. But take a look at Ratification-Ready list of specs here https://wiki.riscv.org/display/HOME/RISC-V+Specification+Status
Cheers, Conor.
This patch adds support for detecting zicfiss and zicfilp. zicfiss and zicfilp stand for the unprivileged integer spec extensions for shadow stack and branch tracking on indirect branches, respectively.
This patch looks for zicfiss and zicfilp in the device tree and accordingly lights up the corresponding bits in the cpu feature bitmap. Furthermore, this patch adds detection utility functions that return whether shadow stack or landing pads are supported by the cpu.
Signed-off-by: Deepak Gupta debug@rivosinc.com --- arch/riscv/include/asm/cpufeature.h | 13 +++++++++++++ arch/riscv/include/asm/hwcap.h | 2 ++ arch/riscv/include/asm/processor.h | 1 + arch/riscv/kernel/cpufeature.c | 2 ++ 4 files changed, 18 insertions(+)
diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h index 0bd11862b760..f0fb8d8ae273 100644 --- a/arch/riscv/include/asm/cpufeature.h +++ b/arch/riscv/include/asm/cpufeature.h @@ -8,6 +8,7 @@
#include <linux/bitmap.h> #include <linux/jump_label.h> +#include <linux/smp.h> #include <asm/hwcap.h> #include <asm/alternative-macros.h> #include <asm/errno.h> @@ -137,4 +138,16 @@ static __always_inline bool riscv_cpu_has_extension_unlikely(int cpu, const unsi
DECLARE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key);
+static inline bool cpu_supports_shadow_stack(void) +{ + return (IS_ENABLED(CONFIG_RISCV_USER_CFI) && + riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICFISS)); +} + +static inline bool cpu_supports_indirect_br_lp_instr(void) +{ + return (IS_ENABLED(CONFIG_RISCV_USER_CFI) && + riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICFILP)); +} + #endif diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h index 1f2d2599c655..74b6c727f545 100644 --- a/arch/riscv/include/asm/hwcap.h +++ b/arch/riscv/include/asm/hwcap.h @@ -80,6 +80,8 @@ #define RISCV_ISA_EXT_ZFA 71 #define RISCV_ISA_EXT_ZTSO 72 #define RISCV_ISA_EXT_ZACAS 73 +#define RISCV_ISA_EXT_ZICFILP 74 +#define RISCV_ISA_EXT_ZICFISS 75
#define RISCV_ISA_EXT_XLINUXENVCFG 127
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h index a8509cc31ab2..6c5b3d928b12 100644 --- a/arch/riscv/include/asm/processor.h +++ b/arch/riscv/include/asm/processor.h @@ -13,6 +13,7 @@ #include <vdso/processor.h>
#include <asm/ptrace.h> +#include <asm/hwcap.h>
#ifdef CONFIG_64BIT #define DEFAULT_MAP_WINDOW (UL(1) << (MMAP_VA_BITS - 1)) diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c index 79a5a35fab96..d052cad5b82f 100644 --- a/arch/riscv/kernel/cpufeature.c +++ b/arch/riscv/kernel/cpufeature.c @@ -263,6 +263,8 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = { __RISCV_ISA_EXT_DATA(h, RISCV_ISA_EXT_h), __RISCV_ISA_EXT_SUPERSET(zicbom, RISCV_ISA_EXT_ZICBOM, riscv_xlinuxenvcfg_exts), __RISCV_ISA_EXT_SUPERSET(zicboz, RISCV_ISA_EXT_ZICBOZ, riscv_xlinuxenvcfg_exts), + __RISCV_ISA_EXT_SUPERSET(zicfilp, RISCV_ISA_EXT_ZICFILP, riscv_xlinuxenvcfg_exts), + __RISCV_ISA_EXT_SUPERSET(zicfiss, RISCV_ISA_EXT_ZICFISS, riscv_xlinuxenvcfg_exts), __RISCV_ISA_EXT_DATA(zicntr, RISCV_ISA_EXT_ZICNTR), __RISCV_ISA_EXT_DATA(zicond, RISCV_ISA_EXT_ZICOND), __RISCV_ISA_EXT_DATA(zicsr, RISCV_ISA_EXT_ZICSR),
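For readers following the thread, here is a rough userspace model of the detection path described in the commit message: extension names found in the device tree set bits in a feature bitmap, and small helpers test those bits. Only the two extension IDs (74 and 75) come from the patch. The bitmap layout and the names `isa_bitmap`, `isa_set`, `isa_test`, `isa_parse` and `model_cpu_supports_*` are hypothetical stand-ins, not the kernel's implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Extension IDs as added to hwcap.h by this patch. */
#define RISCV_ISA_EXT_ZICFILP 74
#define RISCV_ISA_EXT_ZICFISS 75
#define MODEL_ISA_EXT_MAX 128 /* hypothetical bitmap size */

/* Hypothetical stand-in for the kernel's per-cpu ISA bitmap. */
struct isa_bitmap {
	unsigned long long bits[MODEL_ISA_EXT_MAX / 64];
};

static inline void isa_set(struct isa_bitmap *bm, unsigned int id)
{
	bm->bits[id / 64] |= 1ULL << (id % 64);
}

static inline bool isa_test(const struct isa_bitmap *bm, unsigned int id)
{
	return (bm->bits[id / 64] >> (id % 64)) & 1ULL;
}

/* Light up a bit for each CFI extension name found in a devicetree-style list. */
static inline void isa_parse(struct isa_bitmap *bm, const char *const *exts, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!strcmp(exts[i], "zicfilp"))
			isa_set(bm, RISCV_ISA_EXT_ZICFILP);
		else if (!strcmp(exts[i], "zicfiss"))
			isa_set(bm, RISCV_ISA_EXT_ZICFISS);
	}
}

/* Mirrors the shape of cpu_supports_shadow_stack(): config gate AND feature bit. */
static inline bool model_cpu_supports_shadow_stack(const struct isa_bitmap *bm, bool user_cfi)
{
	return user_cfi && isa_test(bm, RISCV_ISA_EXT_ZICFISS);
}

/* Mirrors the shape of cpu_supports_indirect_br_lp_instr(). */
static inline bool model_cpu_supports_landing_pad(const struct isa_bitmap *bm, bool user_cfi)
{
	return user_cfi && isa_test(bm, RISCV_ISA_EXT_ZICFILP);
}
```

The `user_cfi` argument plays the role of `IS_ENABLED(CONFIG_RISCV_USER_CFI)`: with the config off, both helpers return false regardless of what the device tree advertises.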
Hi Deepak,
On Thu, Apr 4, 2024 at 7:41 AM Deepak Gupta debug@rivosinc.com wrote:
This patch adds support for detecting zicfiss and zicfilp. zicfiss and zicfilp stand for the unprivileged integer spec extensions for shadow stack and branch tracking on indirect branches, respectively.
This patch looks for zicfiss and zicfilp in the device tree and accordingly lights up the corresponding bits in the cpu feature bitmap. Furthermore, this patch adds detection utility functions that return whether shadow stack or landing pads are supported by the cpu.
Signed-off-by: Deepak Gupta debug@rivosinc.com
arch/riscv/include/asm/cpufeature.h | 13 +++++++++++++ arch/riscv/include/asm/hwcap.h | 2 ++ arch/riscv/include/asm/processor.h | 1 + arch/riscv/kernel/cpufeature.c | 2 ++ 4 files changed, 18 insertions(+)
diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h index 0bd11862b760..f0fb8d8ae273 100644 --- a/arch/riscv/include/asm/cpufeature.h +++ b/arch/riscv/include/asm/cpufeature.h @@ -8,6 +8,7 @@
#include <linux/bitmap.h> #include <linux/jump_label.h> +#include <linux/smp.h> #include <asm/hwcap.h> #include <asm/alternative-macros.h> #include <asm/errno.h> @@ -137,4 +138,16 @@ static __always_inline bool riscv_cpu_has_extension_unlikely(int cpu, const unsi
DECLARE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key);
+static inline bool cpu_supports_shadow_stack(void)
+{
+	return (IS_ENABLED(CONFIG_RISCV_USER_CFI) &&
+		riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICFISS));
+}
+
+static inline bool cpu_supports_indirect_br_lp_instr(void)
+{
+	return (IS_ENABLED(CONFIG_RISCV_USER_CFI) &&
+		riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICFILP));
+}
+
#endif diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h index 1f2d2599c655..74b6c727f545 100644 --- a/arch/riscv/include/asm/hwcap.h +++ b/arch/riscv/include/asm/hwcap.h @@ -80,6 +80,8 @@ #define RISCV_ISA_EXT_ZFA 71 #define RISCV_ISA_EXT_ZTSO 72 #define RISCV_ISA_EXT_ZACAS 73
nit: two tabs for alignment
+#define RISCV_ISA_EXT_ZICFILP 74 +#define RISCV_ISA_EXT_ZICFISS 75
#define RISCV_ISA_EXT_XLINUXENVCFG 127
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h index a8509cc31ab2..6c5b3d928b12 100644 --- a/arch/riscv/include/asm/processor.h +++ b/arch/riscv/include/asm/processor.h @@ -13,6 +13,7 @@ #include <vdso/processor.h>
#include <asm/ptrace.h> +#include <asm/hwcap.h>
#ifdef CONFIG_64BIT #define DEFAULT_MAP_WINDOW (UL(1) << (MMAP_VA_BITS - 1)) diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c index 79a5a35fab96..d052cad5b82f 100644 --- a/arch/riscv/kernel/cpufeature.c +++ b/arch/riscv/kernel/cpufeature.c @@ -263,6 +263,8 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = { __RISCV_ISA_EXT_DATA(h, RISCV_ISA_EXT_h), __RISCV_ISA_EXT_SUPERSET(zicbom, RISCV_ISA_EXT_ZICBOM, riscv_xlinuxenvcfg_exts), __RISCV_ISA_EXT_SUPERSET(zicboz, RISCV_ISA_EXT_ZICBOZ, riscv_xlinuxenvcfg_exts),
+	__RISCV_ISA_EXT_SUPERSET(zicfilp, RISCV_ISA_EXT_ZICFILP, riscv_xlinuxenvcfg_exts),
+	__RISCV_ISA_EXT_SUPERSET(zicfiss, RISCV_ISA_EXT_ZICFISS, riscv_xlinuxenvcfg_exts),
__RISCV_ISA_EXT_DATA(zicntr, RISCV_ISA_EXT_ZICNTR), __RISCV_ISA_EXT_DATA(zicond, RISCV_ISA_EXT_ZICOND), __RISCV_ISA_EXT_DATA(zicsr, RISCV_ISA_EXT_ZICSR),
-- 2.43.2
Thanks, Andy
On Thu, May 09, 2024 at 08:00:00AM +0800, Andy Chiu wrote:
Hi Deepak,
On Thu, Apr 4, 2024 at 7:41 AM Deepak Gupta debug@rivosinc.com wrote:
This patch adds support for detecting zicfiss and zicfilp. zicfiss and zicfilp stand for the unprivileged integer spec extensions for shadow stack and branch tracking on indirect branches, respectively.
This patch looks for zicfiss and zicfilp in the device tree and accordingly lights up the corresponding bits in the cpu feature bitmap. Furthermore, this patch adds detection utility functions that return whether shadow stack or landing pads are supported by the cpu.
Signed-off-by: Deepak Gupta debug@rivosinc.com
arch/riscv/include/asm/cpufeature.h | 13 +++++++++++++ arch/riscv/include/asm/hwcap.h | 2 ++ arch/riscv/include/asm/processor.h | 1 + arch/riscv/kernel/cpufeature.c | 2 ++ 4 files changed, 18 insertions(+)
diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h index 0bd11862b760..f0fb8d8ae273 100644 --- a/arch/riscv/include/asm/cpufeature.h +++ b/arch/riscv/include/asm/cpufeature.h @@ -8,6 +8,7 @@
#include <linux/bitmap.h> #include <linux/jump_label.h> +#include <linux/smp.h> #include <asm/hwcap.h> #include <asm/alternative-macros.h> #include <asm/errno.h> @@ -137,4 +138,16 @@ static __always_inline bool riscv_cpu_has_extension_unlikely(int cpu, const unsi
DECLARE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key);
+static inline bool cpu_supports_shadow_stack(void)
+{
+	return (IS_ENABLED(CONFIG_RISCV_USER_CFI) &&
+		riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICFISS));
+}
+
+static inline bool cpu_supports_indirect_br_lp_instr(void)
+{
+	return (IS_ENABLED(CONFIG_RISCV_USER_CFI) &&
+		riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICFILP));
+}
+
#endif diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h index 1f2d2599c655..74b6c727f545 100644 --- a/arch/riscv/include/asm/hwcap.h +++ b/arch/riscv/include/asm/hwcap.h @@ -80,6 +80,8 @@ #define RISCV_ISA_EXT_ZFA 71 #define RISCV_ISA_EXT_ZTSO 72 #define RISCV_ISA_EXT_ZACAS 73
nit: two tabs for alignment
Deepak, I think you might be using a tab display width of 4 spaces. That leaves a couple of places misaligned even though they look correct at a width of 4. Linux uses a tab width of 8.
- Charlie
+#define RISCV_ISA_EXT_ZICFILP 74 +#define RISCV_ISA_EXT_ZICFISS 75
#define RISCV_ISA_EXT_XLINUXENVCFG 127
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h index a8509cc31ab2..6c5b3d928b12 100644 --- a/arch/riscv/include/asm/processor.h +++ b/arch/riscv/include/asm/processor.h @@ -13,6 +13,7 @@ #include <vdso/processor.h>
#include <asm/ptrace.h> +#include <asm/hwcap.h>
#ifdef CONFIG_64BIT #define DEFAULT_MAP_WINDOW (UL(1) << (MMAP_VA_BITS - 1)) diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c index 79a5a35fab96..d052cad5b82f 100644 --- a/arch/riscv/kernel/cpufeature.c +++ b/arch/riscv/kernel/cpufeature.c @@ -263,6 +263,8 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = { __RISCV_ISA_EXT_DATA(h, RISCV_ISA_EXT_h), __RISCV_ISA_EXT_SUPERSET(zicbom, RISCV_ISA_EXT_ZICBOM, riscv_xlinuxenvcfg_exts), __RISCV_ISA_EXT_SUPERSET(zicboz, RISCV_ISA_EXT_ZICBOZ, riscv_xlinuxenvcfg_exts),
+	__RISCV_ISA_EXT_SUPERSET(zicfilp, RISCV_ISA_EXT_ZICFILP, riscv_xlinuxenvcfg_exts),
+	__RISCV_ISA_EXT_SUPERSET(zicfiss, RISCV_ISA_EXT_ZICFISS, riscv_xlinuxenvcfg_exts),
__RISCV_ISA_EXT_DATA(zicntr, RISCV_ISA_EXT_ZICNTR), __RISCV_ISA_EXT_DATA(zicond, RISCV_ISA_EXT_ZICOND), __RISCV_ISA_EXT_DATA(zicsr, RISCV_ISA_EXT_ZICSR),
-- 2.43.2
Thanks, Andy
zicfiss and zicfilp extensions get enabled via b3 and b2 in *envcfg CSR. menvcfg controls enabling for S/HS mode. henvcfg controls enabling for VS while senvcfg controls enabling for U/VU mode.
zicfilp extension extends *status CSR to hold `expected landing pad` bit. A trap or interrupt can occur between an indirect jmp/call and target instr. `expected landing pad` bit from CPU is recorded into xstatus CSR so that when supervisor performs xret, `expected landing pad` state of CPU can be restored.
zicfiss adds one new CSR - CSR_SSP: CSR_SSP contains current shadow stack pointer.
Signed-off-by: Deepak Gupta debug@rivosinc.com --- arch/riscv/include/asm/csr.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h index bbd2207adb39..3bb126d1c5ff 100644 --- a/arch/riscv/include/asm/csr.h +++ b/arch/riscv/include/asm/csr.h @@ -18,6 +18,15 @@ #define SR_MPP _AC(0x00001800, UL) /* Previously Machine */ #define SR_SUM _AC(0x00040000, UL) /* Supervisor User Memory Access */
+/* zicfilp landing pad status bit */ +#define SR_SPELP _AC(0x00800000, UL) +#define SR_MPELP _AC(0x020000000000, UL) +#ifdef CONFIG_RISCV_M_MODE +#define SR_ELP SR_MPELP +#else +#define SR_ELP SR_SPELP +#endif + #define SR_FS _AC(0x00006000, UL) /* Floating-point Status */ #define SR_FS_OFF _AC(0x00000000, UL) #define SR_FS_INITIAL _AC(0x00002000, UL) @@ -196,6 +205,8 @@ #define ENVCFG_PBMTE (_AC(1, ULL) << 62) #define ENVCFG_CBZE (_AC(1, UL) << 7) #define ENVCFG_CBCFE (_AC(1, UL) << 6) +#define ENVCFG_LPE (_AC(1, UL) << 2) +#define ENVCFG_SSE (_AC(1, UL) << 3) #define ENVCFG_CBIE_SHIFT 4 #define ENVCFG_CBIE (_AC(0x3, UL) << ENVCFG_CBIE_SHIFT) #define ENVCFG_CBIE_ILL _AC(0x0, UL) @@ -216,6 +227,11 @@ #define SMSTATEEN0_HSENVCFG (_ULL(1) << SMSTATEEN0_HSENVCFG_SHIFT) #define SMSTATEEN0_SSTATEEN0_SHIFT 63 #define SMSTATEEN0_SSTATEEN0 (_ULL(1) << SMSTATEEN0_SSTATEEN0_SHIFT) +/* + * zicfiss user mode csr + * CSR_SSP holds current shadow stack pointer. + */ +#define CSR_SSP 0x011
/* symbolic CSR names: */ #define CSR_CYCLE 0xc00
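As a quick sanity check on the constants in this patch, the sketch below restates the four masks and asserts at compile time which bit each one selects. The mask values are copied from the diff. The helpers `model_envcfg_for_task` and `model_elp_expected` are hypothetical, added only to show how the bits might be consumed, and are not part of the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the csr.h hunk above. */
#define SR_SPELP   UINT64_C(0x00800000)
#define SR_MPELP   UINT64_C(0x020000000000)
#define ENVCFG_LPE (UINT64_C(1) << 2)
#define ENVCFG_SSE (UINT64_C(1) << 3)

/* The hex masks are single bits: 23 for SPELP and 41 for MPELP. */
_Static_assert(SR_SPELP == (UINT64_C(1) << 23), "SPELP selects bit 23");
_Static_assert(SR_MPELP == (UINT64_C(1) << 41), "MPELP selects bit 41");

/*
 * Hypothetical helper: build a task's envcfg value, only setting an
 * enable bit when the task wants the feature and the cpu has it.
 */
static inline uint64_t model_envcfg_for_task(uint64_t base, bool want_lp, bool want_ss,
					     bool cpu_has_lp, bool cpu_has_ss)
{
	uint64_t v = base;

	if (want_lp && cpu_has_lp)
		v |= ENVCFG_LPE;
	if (want_ss && cpu_has_ss)
		v |= ENVCFG_SSE;
	return v;
}

/* Hypothetical helper: was a landing pad expected when the trap was taken? */
static inline bool model_elp_expected(uint64_t sstatus)
{
	return (sstatus & SR_SPELP) != 0;
}
```

Gating each enable bit on cpu support is the same concern as the v3 envcfg fix mentioned in the cover letter, where a bit could be picked per task even though the CPU did not implement it.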
On Wed, Apr 03, 2024 at 04:34:54PM -0700, Deepak Gupta wrote:
zicfiss and zicfilp extensions get enabled via b3 and b2 in *envcfg CSR. menvcfg controls enabling for S/HS mode. henvcfg controls enabling for VS while senvcfg controls enabling for U/VU mode.
zicfilp extension extends *status CSR to hold `expected landing pad` bit. A trap or interrupt can occur between an indirect jmp/call and target instr. `expected landing pad` bit from CPU is recorded into xstatus CSR so that when supervisor performs xret, `expected landing pad` state of CPU can be restored.
zicfiss adds one new CSR
- CSR_SSP: CSR_SSP contains current shadow stack pointer.
Signed-off-by: Deepak Gupta debug@rivosinc.com
arch/riscv/include/asm/csr.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h index bbd2207adb39..3bb126d1c5ff 100644 --- a/arch/riscv/include/asm/csr.h +++ b/arch/riscv/include/asm/csr.h @@ -18,6 +18,15 @@ #define SR_MPP _AC(0x00001800, UL) /* Previously Machine */ #define SR_SUM _AC(0x00040000, UL) /* Supervisor User Memory Access */ +/* zicfilp landing pad status bit */ +#define SR_SPELP _AC(0x00800000, UL) +#define SR_MPELP _AC(0x020000000000, UL) +#ifdef CONFIG_RISCV_M_MODE +#define SR_ELP SR_MPELP +#else +#define SR_ELP SR_SPELP +#endif
#define SR_FS _AC(0x00006000, UL) /* Floating-point Status */ #define SR_FS_OFF _AC(0x00000000, UL) #define SR_FS_INITIAL _AC(0x00002000, UL) @@ -196,6 +205,8 @@ #define ENVCFG_PBMTE (_AC(1, ULL) << 62) #define ENVCFG_CBZE (_AC(1, UL) << 7) #define ENVCFG_CBCFE (_AC(1, UL) << 6) +#define ENVCFG_LPE (_AC(1, UL) << 2) +#define ENVCFG_SSE (_AC(1, UL) << 3) #define ENVCFG_CBIE_SHIFT 4 #define ENVCFG_CBIE (_AC(0x3, UL) << ENVCFG_CBIE_SHIFT) #define ENVCFG_CBIE_ILL _AC(0x0, UL) @@ -216,6 +227,11 @@ #define SMSTATEEN0_HSENVCFG (_ULL(1) << SMSTATEEN0_HSENVCFG_SHIFT) #define SMSTATEEN0_SSTATEEN0_SHIFT 63 #define SMSTATEEN0_SSTATEEN0 (_ULL(1) << SMSTATEEN0_SSTATEEN0_SHIFT) +/*
+ * zicfiss user mode csr
+ * CSR_SSP holds current shadow stack pointer.
+ */
+#define CSR_SSP 0x011 /* symbolic CSR names: */
#define CSR_CYCLE 0xc00
2.43.2
Reviewed-by: Charlie Jenkins charlie@rivosinc.com
Carves out space in arch specific thread struct for cfi status and shadow stack in usermode on riscv.
This patch does the following:
- defines a new structure cfi_status with status bit for cfi feature
- defines shadow stack pointer, base and size in cfi_status structure
- defines offsets to new member fields in thread in asm-offsets.c
- saves and restores shadow stack pointer on trap entry (U --> S) and exit (S --> U)
Shadow stack save/restore is gated on feature availability and implemented using alternative. CSR can be context switched in `switch_to` as well, but as soon as kernel shadow stack support gets rolled in, shadow stack pointer will need to be switched at trap entry/exit point (much like `sp`). It can be argued that a kernel using shadow stack may not be as prevalent a deployment scenario as user mode using this feature. But even if there is some minimal deployment of kernel shadow stack, it needs to be supported. Thus the shadow stack pointer is saved/restored in entry.S instead of in `switch_to.h`.
Signed-off-by: Deepak Gupta debug@rivosinc.com --- arch/riscv/include/asm/processor.h | 1 + arch/riscv/include/asm/thread_info.h | 3 +++ arch/riscv/include/asm/usercfi.h | 24 ++++++++++++++++++++++++ arch/riscv/kernel/asm-offsets.c | 4 ++++ arch/riscv/kernel/entry.S | 26 ++++++++++++++++++++++++++ 5 files changed, 58 insertions(+) create mode 100644 arch/riscv/include/asm/usercfi.h
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h index 6c5b3d928b12..f8decf357804 100644 --- a/arch/riscv/include/asm/processor.h +++ b/arch/riscv/include/asm/processor.h @@ -14,6 +14,7 @@
#include <asm/ptrace.h> #include <asm/hwcap.h> +#include <asm/usercfi.h>
#ifdef CONFIG_64BIT #define DEFAULT_MAP_WINDOW (UL(1) << (MMAP_VA_BITS - 1)) diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h index a503bdc2f6dd..f1dee307806e 100644 --- a/arch/riscv/include/asm/thread_info.h +++ b/arch/riscv/include/asm/thread_info.h @@ -57,6 +57,9 @@ struct thread_info { int cpu; unsigned long syscall_work; /* SYSCALL_WORK_ flags */ unsigned long envcfg; +#ifdef CONFIG_RISCV_USER_CFI + struct cfi_status user_cfi_state; +#endif #ifdef CONFIG_SHADOW_CALL_STACK void *scs_base; void *scs_sp; diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h new file mode 100644 index 000000000000..4fa201b4fc4e --- /dev/null +++ b/arch/riscv/include/asm/usercfi.h @@ -0,0 +1,24 @@ +/* SPDX-License-Identifier: GPL-2.0 + * Copyright (C) 2024 Rivos, Inc. + * Deepak Gupta debug@rivosinc.com + */ +#ifndef _ASM_RISCV_USERCFI_H +#define _ASM_RISCV_USERCFI_H + +#ifndef __ASSEMBLY__ +#include <linux/types.h> + +#ifdef CONFIG_RISCV_USER_CFI +struct cfi_status { + unsigned long ubcfi_en : 1; /* Enable for backward cfi. */ + unsigned long rsvd : ((sizeof(unsigned long)*8) - 1); + unsigned long user_shdw_stk; /* Current user shadow stack pointer */ + unsigned long shdw_stk_base; /* Base address of shadow stack */ + unsigned long shdw_stk_size; /* size of shadow stack */ +}; + +#endif /* CONFIG_RISCV_USER_CFI */ + +#endif /* __ASSEMBLY__ */ + +#endif /* _ASM_RISCV_USERCFI_H */ diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c index a03129f40c46..5c5ea015c776 100644 --- a/arch/riscv/kernel/asm-offsets.c +++ b/arch/riscv/kernel/asm-offsets.c @@ -44,6 +44,10 @@ void asm_offsets(void) #endif
OFFSET(TASK_TI_CPU_NUM, task_struct, thread_info.cpu); +#ifdef CONFIG_RISCV_USER_CFI + OFFSET(TASK_TI_CFI_STATUS, task_struct, thread_info.user_cfi_state); + OFFSET(TASK_TI_USER_SSP, task_struct, thread_info.user_cfi_state.user_shdw_stk); +#endif OFFSET(TASK_THREAD_F0, task_struct, thread.fstate.f[0]); OFFSET(TASK_THREAD_F1, task_struct, thread.fstate.f[1]); OFFSET(TASK_THREAD_F2, task_struct, thread.fstate.f[2]); diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S index 9d1a305d5508..7245a0ea25c1 100644 --- a/arch/riscv/kernel/entry.S +++ b/arch/riscv/kernel/entry.S @@ -60,6 +60,20 @@ SYM_CODE_START(handle_exception)
REG_L s0, TASK_TI_USER_SP(tp) csrrc s1, CSR_STATUS, t0 + /* + * If previous mode was U, capture shadow stack pointer and save it away + * Zero CSR_SSP at the same time for sanitization. + */ + ALTERNATIVE("nop; nop; nop; nop", + __stringify( \ + andi s2, s1, SR_SPP; \ + bnez s2, skip_ssp_save; \ + csrrw s2, CSR_SSP, x0; \ + REG_S s2, TASK_TI_USER_SSP(tp); \ + skip_ssp_save:), + 0, + RISCV_ISA_EXT_ZICFISS, + CONFIG_RISCV_USER_CFI) csrr s2, CSR_EPC csrr s3, CSR_TVAL csrr s4, CSR_CAUSE @@ -141,6 +155,18 @@ SYM_CODE_START_NOALIGN(ret_from_exception) * structures again. */ csrw CSR_SCRATCH, tp + + /* + * Going back to U mode, restore shadow stack pointer + */ + ALTERNATIVE("nop; nop", + __stringify( \ + REG_L s3, TASK_TI_USER_SSP(tp); \ + csrw CSR_SSP, s3), + 0, + RISCV_ISA_EXT_ZICFISS, + CONFIG_RISCV_USER_CFI) + 1: #ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE move a0, sp
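The entry/exit hunks above can be read as the following C model. This is only a sketch of the control flow, not kernel code: `model_cpu`, `model_task`, `model_trap_entry` and `model_trap_exit_to_user` are hypothetical names, and `SR_SPP` (0x100, "Previously Supervisor") is taken from the existing csr.h rather than from this patch.

```c
#include <stdint.h>

#define SR_SPP UINT64_C(0x00000100) /* Previously Supervisor (existing csr.h define) */

/* Hypothetical stand-ins for the hart state and the per-task save slot. */
struct model_cpu {
	uint64_t status; /* models CSR_STATUS as read at trap entry */
	uint64_t ssp;    /* models CSR_SSP */
};

struct model_task {
	uint64_t user_ssp; /* models thread_info.user_cfi_state.user_shdw_stk */
};

/*
 * Trap entry: if the previous mode was U (SPP clear), save the user
 * shadow stack pointer and zero CSR_SSP, like the csrrw/REG_S pair.
 */
static inline void model_trap_entry(struct model_cpu *cpu, struct model_task *t)
{
	if (cpu->status & SR_SPP)
		return; /* came from S mode: corresponds to skip_ssp_save */
	t->user_ssp = cpu->ssp;
	cpu->ssp = 0;
}

/* Return to U mode: reload CSR_SSP from the saved slot (REG_L + csrw). */
static inline void model_trap_exit_to_user(struct model_cpu *cpu, const struct model_task *t)
{
	cpu->ssp = t->user_ssp;
}
```

In the real patch both sequences are wrapped in ALTERNATIVE, so on hardware without zicfiss they are patched to nops and none of this runs.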
On Wed, Apr 03, 2024 at 04:34:55PM -0700, Deepak Gupta wrote:
Carves out space in arch specific thread struct for cfi status and shadow stack in usermode on riscv.
This patch does the following:
- defines a new structure cfi_status with status bit for cfi feature
- defines shadow stack pointer, base and size in cfi_status structure
- defines offsets to new member fields in thread in asm-offsets.c
- saves and restores shadow stack pointer on trap entry (U --> S) and exit (S --> U)
Shadow stack save/restore is gated on feature availability and implemented using alternative. CSR can be context switched in `switch_to` as well, but as soon as kernel shadow stack support gets rolled in, shadow stack pointer will need to be switched at trap entry/exit point (much like `sp`). It can be argued that a kernel using shadow stack may not be as prevalent a deployment scenario as user mode using this feature. But even if there is some minimal deployment of kernel shadow stack, it needs to be supported. Thus the shadow stack pointer is saved/restored in entry.S instead of in `switch_to.h`.
Signed-off-by: Deepak Gupta debug@rivosinc.com
arch/riscv/include/asm/processor.h | 1 + arch/riscv/include/asm/thread_info.h | 3 +++ arch/riscv/include/asm/usercfi.h | 24 ++++++++++++++++++++++++ arch/riscv/kernel/asm-offsets.c | 4 ++++ arch/riscv/kernel/entry.S | 26 ++++++++++++++++++++++++++ 5 files changed, 58 insertions(+) create mode 100644 arch/riscv/include/asm/usercfi.h
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h index 6c5b3d928b12..f8decf357804 100644 --- a/arch/riscv/include/asm/processor.h +++ b/arch/riscv/include/asm/processor.h @@ -14,6 +14,7 @@ #include <asm/ptrace.h> #include <asm/hwcap.h> +#include <asm/usercfi.h> #ifdef CONFIG_64BIT #define DEFAULT_MAP_WINDOW (UL(1) << (MMAP_VA_BITS - 1)) diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h index a503bdc2f6dd..f1dee307806e 100644 --- a/arch/riscv/include/asm/thread_info.h +++ b/arch/riscv/include/asm/thread_info.h @@ -57,6 +57,9 @@ struct thread_info { int cpu; unsigned long syscall_work; /* SYSCALL_WORK_ flags */ unsigned long envcfg; +#ifdef CONFIG_RISCV_USER_CFI
+	struct cfi_status user_cfi_state;
+#endif #ifdef CONFIG_SHADOW_CALL_STACK void *scs_base; void *scs_sp; diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h new file mode 100644 index 000000000000..4fa201b4fc4e --- /dev/null +++ b/arch/riscv/include/asm/usercfi.h @@ -0,0 +1,24 @@ +/* SPDX-License-Identifier: GPL-2.0
+ * Copyright (C) 2024 Rivos, Inc.
+ * Deepak Gupta debug@rivosinc.com
+ */
+#ifndef _ASM_RISCV_USERCFI_H +#define _ASM_RISCV_USERCFI_H
+#ifndef __ASSEMBLY__ +#include <linux/types.h>
+#ifdef CONFIG_RISCV_USER_CFI +struct cfi_status {
+	unsigned long ubcfi_en : 1; /* Enable for backward cfi. */
+	unsigned long rsvd : ((sizeof(unsigned long)*8) - 1);
+	unsigned long user_shdw_stk; /* Current user shadow stack pointer */
+	unsigned long shdw_stk_base; /* Base address of shadow stack */
+	unsigned long shdw_stk_size; /* size of shadow stack */
+};
+#endif /* CONFIG_RISCV_USER_CFI */
+#endif /* __ASSEMBLY__ */
+#endif /* _ASM_RISCV_USERCFI_H */ diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c index a03129f40c46..5c5ea015c776 100644 --- a/arch/riscv/kernel/asm-offsets.c +++ b/arch/riscv/kernel/asm-offsets.c @@ -44,6 +44,10 @@ void asm_offsets(void) #endif OFFSET(TASK_TI_CPU_NUM, task_struct, thread_info.cpu); +#ifdef CONFIG_RISCV_USER_CFI
+	OFFSET(TASK_TI_CFI_STATUS, task_struct, thread_info.user_cfi_state);
+	OFFSET(TASK_TI_USER_SSP, task_struct, thread_info.user_cfi_state.user_shdw_stk);
+#endif OFFSET(TASK_THREAD_F0, task_struct, thread.fstate.f[0]); OFFSET(TASK_THREAD_F1, task_struct, thread.fstate.f[1]); OFFSET(TASK_THREAD_F2, task_struct, thread.fstate.f[2]); diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S index 9d1a305d5508..7245a0ea25c1 100644 --- a/arch/riscv/kernel/entry.S +++ b/arch/riscv/kernel/entry.S @@ -60,6 +60,20 @@ SYM_CODE_START(handle_exception) REG_L s0, TASK_TI_USER_SP(tp) csrrc s1, CSR_STATUS, t0
+	/*
+	 * If previous mode was U, capture shadow stack pointer and save it away
+	 * Zero CSR_SSP at the same time for sanitization.
+	 */
+	ALTERNATIVE("nop; nop; nop; nop",
+		__stringify( \
+		andi s2, s1, SR_SPP; \
+		bnez s2, skip_ssp_save; \
+		csrrw s2, CSR_SSP, x0; \
+		REG_S s2, TASK_TI_USER_SSP(tp); \
+		skip_ssp_save:),
+		0,
+		RISCV_ISA_EXT_ZICFISS,
+		CONFIG_RISCV_USER_CFI)
csrr s2, CSR_EPC
csrr s3, CSR_TVAL
csrr s4, CSR_CAUSE
@@ -141,6 +155,18 @@ SYM_CODE_START_NOALIGN(ret_from_exception) * structures again. */ csrw CSR_SCRATCH, tp
+
+	/*
+	 * Going back to U mode, restore shadow stack pointer
+	 */
+	ALTERNATIVE("nop; nop",
+		__stringify( \
+		REG_L s3, TASK_TI_USER_SSP(tp); \
+		csrw CSR_SSP, s3),
+		0,
+		RISCV_ISA_EXT_ZICFISS,
+		CONFIG_RISCV_USER_CFI)
+
1: #ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE move a0, sp -- 2.43.2
Reviewed-by: Charlie Jenkins charlie@rivosinc.com
VM_SHADOW_STACK is defined by x86 as vm flag to mark a shadow stack vma.
x86 uses VM_HIGH_ARCH_5 bit but that limits shadow stack vma to 64bit only. arm64 follows same path (see links)
To keep things simple, RISC-V follows the same. This patch adds `ss` for shadow stack in process maps.
Links: https://lore.kernel.org/lkml/20231009-arm64-gcs-v6-12-78e55deaa4dd@kernel.or...
Signed-off-by: Deepak Gupta debug@rivosinc.com --- fs/proc/task_mmu.c | 3 +++ include/linux/mm.h | 11 ++++++++++- 2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 3f78ebbb795f..d9d63eb74f0d 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -702,6 +702,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ #ifdef CONFIG_X86_USER_SHADOW_STACK [ilog2(VM_SHADOW_STACK)] = "ss", +#endif +#ifdef CONFIG_RISCV_USER_CFI + [ilog2(VM_SHADOW_STACK)] = "ss", #endif }; size_t i; diff --git a/include/linux/mm.h b/include/linux/mm.h index f5a97dec5169..64109f6c70f5 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -352,7 +352,16 @@ extern unsigned int kobjsize(const void *objp); * for more details on the guard size. */ # define VM_SHADOW_STACK VM_HIGH_ARCH_5 -#else +#endif + +#ifdef CONFIG_RISCV_USER_CFI +/* + * RISC-V is going along with using VM_HIGH_ARCH_5 bit position for shadow stack + */ +#define VM_SHADOW_STACK VM_HIGH_ARCH_5 +#endif + +#ifndef VM_SHADOW_STACK # define VM_SHADOW_STACK VM_NONE #endif
On 04.04.24 01:34, Deepak Gupta wrote:
VM_SHADOW_STACK is defined by x86 as vm flag to mark a shadow stack vma.
x86 uses VM_HIGH_ARCH_5 bit but that limits shadow stack vma to 64bit only. arm64 follows same path (see links)
To keep things simple, RISC-V follows the same. This patch adds `ss` for shadow stack in process maps.
Links: https://lore.kernel.org/lkml/20231009-arm64-gcs-v6-12-78e55deaa4dd@kernel.or...
Signed-off-by: Deepak Gupta debug@rivosinc.com
fs/proc/task_mmu.c | 3 +++ include/linux/mm.h | 11 ++++++++++- 2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 3f78ebbb795f..d9d63eb74f0d 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -702,6 +702,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ #ifdef CONFIG_X86_USER_SHADOW_STACK [ilog2(VM_SHADOW_STACK)] = "ss", +#endif +#ifdef CONFIG_RISCV_USER_CFI
+ [ilog2(VM_SHADOW_STACK)] = "ss",
#endif }; size_t i;
diff --git a/include/linux/mm.h b/include/linux/mm.h index f5a97dec5169..64109f6c70f5 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -352,7 +352,16 @@ extern unsigned int kobjsize(const void *objp);
* for more details on the guard size.
*/ # define VM_SHADOW_STACK VM_HIGH_ARCH_5 -#else +#endif
+#ifdef CONFIG_RISCV_USER_CFI +/*
+ * RISC-V is going along with using VM_HIGH_ARCH_5 bit position for shadow stack
+ */
Wow, really?! I could never have guessed that from the code :P
Just drop that comment. Are the semantics the same as documented for the x86 variant? ("VM_SHADOW_STACK should not be set with VM_SHARED because of lack of")
I assume so. So it might now make sense to merge both paths
#if defined(CONFIG_X86_USER_SHADOW_STACK) || defined(CONFIG_RISCV_USER_CFI)
or even introduce some ARCH_HAS_SHADOW_STACK so we can remove these arch-specific thingies here.
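As a rough illustration of the merged path suggested above, the two per-architecture blocks could collapse into a single conditional. This is only a sketch: the `CONFIG_RISCV_USER_CFI` define and the bit value used for `VM_HIGH_ARCH_5` are stand-ins so that the fragment compiles on its own, and `shadow_stack_flag()` is a hypothetical helper, not a kernel function.

```c
/* Stand-ins so the fragment builds outside the kernel tree (assumptions). */
#define CONFIG_RISCV_USER_CFI 1
#define VM_NONE        0x0ULL
#define VM_HIGH_ARCH_5 (1ULL << 37) /* illustrative bit position */

/* One definition shared by every architecture that opts in. */
#if defined(CONFIG_X86_USER_SHADOW_STACK) || defined(CONFIG_RISCV_USER_CFI)
# define VM_SHADOW_STACK VM_HIGH_ARCH_5
#else
# define VM_SHADOW_STACK VM_NONE
#endif

/* Hypothetical helper, only here so the selected value can be inspected. */
static inline unsigned long long shadow_stack_flag(void)
{
	return VM_SHADOW_STACK;
}
```

With an `ARCH_HAS_*` style Kconfig symbol, the `#if` line would presumably test that one symbol instead of listing architectures.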
On Thu, Apr 04, 2024 at 08:58:06PM +0200, David Hildenbrand wrote:
or even introduce some ARCH_HAS_SHADOW_STACK so we can remove these arch-specific thingies here.
It would be convenient if you could pull the ARCH_HAS_USER_SHADOW_STACK patch out of my clone3 series to do that:
https://lore.kernel.org/all/20240203-clone3-shadow-stack-v5-3-322c69598e4b@k...
On 04.04.24 21:04, Mark Brown wrote:
On Thu, Apr 04, 2024 at 08:58:06PM +0200, David Hildenbrand wrote:
or even introduce some ARCH_HAS_SHADOW_STACK so we can remove these arch-specific thingies here.
It would be convenient if you could pull the ARCH_HAS_USER_SHADOW_STACK patch out of my clone3 series to do that:
https://lore.kernel.org/all/20240203-clone3-shadow-stack-v5-3-322c69598e4b@kernel.org/
Crazy, I completely forgot about that one. Yes!
On Thu, Apr 4, 2024 at 12:15 PM David Hildenbrand david@redhat.com wrote:
Crazy, I completely forgot about that one. Yes!
I missed that. Roger. Will do that in the next series. Thanks.
VM_SHADOW_STACK (alias to VM_HIGH_ARCH_5) to encode shadow stack VMA.
This patch changes checks of the VM_SHADOW_STACK flag in generic code into calls to a function `vma_is_shadow_stack`, which returns true if it's a shadow stack vma; the default stub (when support doesn't exist) returns false.
Signed-off-by: Deepak Gupta debug@rivosinc.com
Suggested-by: Mike Rapoport rppt@kernel.org
---
 include/linux/mm.h | 13 ++++++++++++-
 mm/gup.c           |  5 +++--
 mm/internal.h      |  2 +-
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 64109f6c70f5..9952937be659 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -363,8 +363,19 @@ extern unsigned int kobjsize(const void *objp);

 #ifndef VM_SHADOW_STACK
 # define VM_SHADOW_STACK	VM_NONE
+
+static inline bool vma_is_shadow_stack(vm_flags_t vm_flags)
+{
+	return false;
+}
+#else
+static inline bool vma_is_shadow_stack(vm_flags_t vm_flags)
+{
+	return (vm_flags & VM_SHADOW_STACK);
+}
 #endif

+
 #if defined(CONFIG_X86)
 # define VM_PAT		VM_ARCH_1	/* PAT reserves whole VMA at once (x86) */
 #elif defined(CONFIG_PPC)
@@ -3473,7 +3484,7 @@ static inline unsigned long stack_guard_start_gap(struct vm_area_struct *vma)
 		return stack_guard_gap;

 	/* See reasoning around the VM_SHADOW_STACK definition */
-	if (vma->vm_flags & VM_SHADOW_STACK)
+	if (vma->vm_flags && vma_is_shadow_stack(vma->vm_flags))
 		return PAGE_SIZE;

 	return 0;
diff --git a/mm/gup.c b/mm/gup.c
index df83182ec72d..a7a02eb0a6b3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1053,7 +1053,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 	    !writable_file_mapping_allowed(vma, gup_flags))
 		return -EFAULT;

-	if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
+	if (!(vm_flags & VM_WRITE) || vma_is_shadow_stack(vm_flags)) {
 		if (!(gup_flags & FOLL_FORCE))
 			return -EFAULT;
 		/* hugetlb does not support FOLL_FORCE|FOLL_WRITE. */
@@ -1071,7 +1071,8 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 			if (!is_cow_mapping(vm_flags))
 				return -EFAULT;
 		}
-	} else if (!(vm_flags & VM_READ)) {
+	} else if (!(vm_flags & VM_READ) && !vma_is_shadow_stack(vm_flags)) {
+		/* reads allowed if its shadow stack vma */
 		if (!(gup_flags & FOLL_FORCE))
 			return -EFAULT;
 		/*
diff --git a/mm/internal.h b/mm/internal.h
index f309a010d50f..5035b5a58df0 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -572,7 +572,7 @@ static inline bool is_exec_mapping(vm_flags_t flags)
  */
 static inline bool is_stack_mapping(vm_flags_t flags)
 {
-	return ((flags & VM_STACK) == VM_STACK) || (flags & VM_SHADOW_STACK);
+	return ((flags & VM_STACK) == VM_STACK) || vma_is_shadow_stack(flags);
 }

 /*
On 04.04.24 01:34, Deepak Gupta wrote:
VM_SHADOW_STACK (alias to VM_HIGH_ARCH_5) to encode shadow stack VMA.
This patch changes checks of the VM_SHADOW_STACK flag in generic code into calls to a function `vma_is_shadow_stack`, which returns true if it's a shadow stack vma; the default stub (when support doesn't exist) returns false.
Signed-off-by: Deepak Gupta debug@rivosinc.com Suggested-by: Mike Rapoport rppt@kernel.org
include/linux/mm.h | 13 ++++++++++++- mm/gup.c | 5 +++-- mm/internal.h | 2 +- 3 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 64109f6c70f5..9952937be659 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -363,8 +363,19 @@ extern unsigned int kobjsize(const void *objp);
 #ifndef VM_SHADOW_STACK
 # define VM_SHADOW_STACK	VM_NONE
+
+static inline bool vma_is_shadow_stack(vm_flags_t vm_flags)
+{
+	return false;
+}
+#else
+static inline bool vma_is_shadow_stack(vm_flags_t vm_flags)
+{
+	return (vm_flags & VM_SHADOW_STACK);
+}
 #endif
You can simply do outside the ifdef
static inline bool vma_is_shadow_stack(vm_flags_t vm_flags) { return !!(vm_flags & VM_SHADOW_STACK); }
This will work even when VM_SHADOW_STACK is defined to be VM_NONE.
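A small userspace sketch of why a single definition is enough: masking with `VM_NONE` (zero) can never yield a set bit, so the same body returns false on architectures without shadow stack support. The typedef, the bit position and the two wrapper names below are assumptions made for the demo; only the `!!(vm_flags & mask)` body mirrors the suggestion above.

```c
#include <stdbool.h>

typedef unsigned long long vm_flags_t; /* wide enough for a high bit */

#define VM_NONE  0x0ULL
#define VM_WRITE 0x2ULL
#define DEMO_VM_SHADOW_STACK (1ULL << 37) /* illustrative bit position */

/* Same body as the suggested helper, with the mask made a parameter. */
static inline bool flags_test(vm_flags_t vm_flags, vm_flags_t mask)
{
	return !!(vm_flags & mask);
}

/* Models an architecture where VM_SHADOW_STACK is defined as VM_NONE. */
static inline bool is_shadow_stack_unsupported(vm_flags_t vm_flags)
{
	return flags_test(vm_flags, VM_NONE);
}

/* Models an architecture where VM_SHADOW_STACK is a real flag bit. */
static inline bool is_shadow_stack_supported(vm_flags_t vm_flags)
{
	return flags_test(vm_flags, DEMO_VM_SHADOW_STACK);
}
```

Because the zero mask already forces a false result, the separate stub under `#ifndef` adds nothing.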
+

unrelated code change

 #if defined(CONFIG_X86)
 # define VM_PAT		VM_ARCH_1	/* PAT reserves whole VMA at once (x86) */
 #elif defined(CONFIG_PPC)
@@ -3473,7 +3484,7 @@ static inline unsigned long stack_guard_start_gap(struct vm_area_struct *vma)
 		return stack_guard_gap;
 	/* See reasoning around the VM_SHADOW_STACK definition */
-	if (vma->vm_flags & VM_SHADOW_STACK)
+	if (vma->vm_flags && vma_is_shadow_stack(vma->vm_flags))
Pretty sure:
if (vma_is_shadow_stack(vma->vm_flags))
return PAGE_SIZE;
 	return 0;
diff --git a/mm/gup.c b/mm/gup.c
index df83182ec72d..a7a02eb0a6b3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1053,7 +1053,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 	    !writable_file_mapping_allowed(vma, gup_flags))
 		return -EFAULT;
-	if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
+	if (!(vm_flags & VM_WRITE) || vma_is_shadow_stack(vm_flags)) {
 		if (!(gup_flags & FOLL_FORCE))
 			return -EFAULT;
 		/* hugetlb does not support FOLL_FORCE|FOLL_WRITE. */
@@ -1071,7 +1071,8 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 			if (!is_cow_mapping(vm_flags))
 				return -EFAULT;
 		}
-	} else if (!(vm_flags & VM_READ)) {
+	} else if (!(vm_flags & VM_READ) && !vma_is_shadow_stack(vm_flags)) {
+		/* reads allowed if its shadow stack vma */
 		if (!(gup_flags & FOLL_FORCE))
 			return -EFAULT;
 		/*
Unless I am missing something, this is not a simple cleanup. It should go into a separate patch with a clearly documented reason for that change.
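To make the behavioral difference concrete, here is a simplified userspace model of only the first test on the GUP read path, before and after the hunk. It is a sketch under assumptions: the flag values are illustrative, the `read_needs_force_*` names are hypothetical, and the remaining checks in `check_vma_flags` are left out.

```c
#include <stdbool.h>

typedef unsigned long long vm_flags_t;

#define VM_READ  0x1ULL
#define VM_WRITE 0x2ULL
#define DEMO_VM_SHADOW_STACK (1ULL << 37) /* illustrative bit position */

static bool vma_is_shadow_stack(vm_flags_t vm_flags)
{
	return !!(vm_flags & DEMO_VM_SHADOW_STACK);
}

/* Before the hunk: any VMA without VM_READ requires FOLL_FORCE to be read. */
static bool read_needs_force_old(vm_flags_t vm_flags)
{
	return !(vm_flags & VM_READ);
}

/* After the hunk: a shadow stack VMA is exempt even without VM_READ. */
static bool read_needs_force_new(vm_flags_t vm_flags)
{
	return !(vm_flags & VM_READ) && !vma_is_shadow_stack(vm_flags);
}
```

For a VMA that carries the shadow stack flag but not `VM_READ`, the old test demands `FOLL_FORCE` and the new one does not, which is why this hunk changes behavior rather than just renaming a check.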
On Thu, Apr 04, 2024 at 09:02:17PM +0200, David Hildenbrand wrote:
On 04.04.24 01:34, Deepak Gupta wrote:
 		}
-	} else if (!(vm_flags & VM_READ)) {
+	} else if (!(vm_flags & VM_READ) && !vma_is_shadow_stack(vm_flags)) {
+		/* reads allowed if its shadow stack vma */
 		if (!(gup_flags & FOLL_FORCE))
 			return -EFAULT;
 		/*
Unless I am missing something, this is not a simple cleanup. It should go into a separate patch with a clearly documented reason for that change.
I tried that here https://lore.kernel.org/linux-mm/CAKC1njTPBqtsAOn-CWhB+-8FaZ2KWkkz-vRZr7MZq=... But at that time, VM_SHADOW_STACK for riscv meant only VM_WRITE. So I think there was obvious uneasiness with that part.
Now we have VM_SHADOW_STACK pretty much same for all arches and only 64bit. I'll try it again as a separate patch.
-- Cheers,
David / dhildenb
`arch_calc_vm_prot_bits` is implemented on risc-v to return VM_READ | VM_WRITE if PROT_WRITE is specified. Similarly `riscv_sys_mmap` is updated to convert all incoming PROT_WRITE to (PROT_WRITE | PROT_READ). This is to make sure that any existing apps using PROT_WRITE still work.
Earlier `protection_map[VM_WRITE]` used to pick read-write PTE encodings. Now `protection_map[VM_WRITE]` will always pick PAGE_SHADOWSTACK PTE encodings for shadow stack. Above changes ensure that existing apps continue to work because underneath kernel will be picking `protection_map[VM_WRITE|VM_READ]` PTE encodings.
Signed-off-by: Deepak Gupta debug@rivosinc.com
---
 arch/riscv/include/asm/mman.h    | 24 ++++++++++++++++++++++++
 arch/riscv/include/asm/pgtable.h |  1 +
 arch/riscv/kernel/sys_riscv.c    | 11 +++++++++++
 arch/riscv/mm/init.c             |  2 +-
 mm/mmap.c                        |  1 +
 5 files changed, 38 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/include/asm/mman.h

diff --git a/arch/riscv/include/asm/mman.h b/arch/riscv/include/asm/mman.h
new file mode 100644
index 000000000000..ef9fedf32546
--- /dev/null
+++ b/arch/riscv/include/asm/mman.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_MMAN_H__
+#define __ASM_MMAN_H__
+
+#include <linux/compiler.h>
+#include <linux/types.h>
+#include <uapi/asm/mman.h>
+
+static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
+	unsigned long pkey __always_unused)
+{
+	unsigned long ret = 0;
+
+	/*
+	 * If PROT_WRITE was specified, force it to VM_READ | VM_WRITE.
+	 * Only VM_WRITE means shadow stack.
+	 */
+	if (prot & PROT_WRITE)
+		ret = (VM_READ | VM_WRITE);
+	return ret;
+}
+#define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)
+
+#endif /* ! __ASM_MMAN_H__ */
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 6066822e7396..4d5983bc6766 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -184,6 +184,7 @@ extern struct pt_alloc_ops pt_ops __initdata;
 #define PAGE_READ_EXEC		__pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC)
 #define PAGE_WRITE_EXEC		__pgprot(_PAGE_BASE | _PAGE_READ |	\
 					 _PAGE_EXEC | _PAGE_WRITE)
+#define PAGE_SHADOWSTACK	__pgprot(_PAGE_BASE | _PAGE_WRITE)

 #define PAGE_COPY		PAGE_READ
 #define PAGE_COPY_EXEC		PAGE_READ_EXEC
diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
index f1c1416a9f1e..846c36b1b3d5 100644
--- a/arch/riscv/kernel/sys_riscv.c
+++ b/arch/riscv/kernel/sys_riscv.c
@@ -8,6 +8,8 @@
 #include <linux/syscalls.h>
 #include <asm/cacheflush.h>
 #include <asm-generic/mman-common.h>
+#include <vdso/vsyscall.h>
+#include <asm/mman.h>

 static long riscv_sys_mmap(unsigned long addr, unsigned long len,
 			   unsigned long prot, unsigned long flags,
@@ -17,6 +19,15 @@ static long riscv_sys_mmap(unsigned long addr, unsigned long len,
 	if (unlikely(offset & (~PAGE_MASK >> page_shift_offset)))
 		return -EINVAL;

+	/*
+	 * If only PROT_WRITE is specified then extend that to PROT_READ
+	 * protection_map[VM_WRITE] is now going to select shadow stack encodings.
+	 * So specifying PROT_WRITE actually should select protection_map [VM_WRITE | VM_READ]
+	 * If user wants to create shadow stack then they should use `map_shadow_stack` syscall.
+	 */
+	if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ)))
+		prot |= PROT_READ;
+
 	return ksys_mmap_pgoff(addr, len, prot, flags, fd,
 			       offset >> (PAGE_SHIFT - page_shift_offset));
 }
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index fa34cf55037b..98e5ece4052a 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -299,7 +299,7 @@ pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
 static const pgprot_t protection_map[16] = {
 	[VM_NONE]					= PAGE_NONE,
 	[VM_READ]					= PAGE_READ,
-	[VM_WRITE]					= PAGE_COPY,
+	[VM_WRITE]					= PAGE_SHADOWSTACK,
 	[VM_WRITE | VM_READ]				= PAGE_COPY,
 	[VM_EXEC]					= PAGE_EXEC,
 	[VM_EXEC | VM_READ]				= PAGE_READ_EXEC,
diff --git a/mm/mmap.c b/mm/mmap.c
index d89770eaab6b..57a974f49b00 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -47,6 +47,7 @@
 #include <linux/oom.h>
 #include <linux/sched/mm.h>
 #include <linux/ksm.h>
+#include <linux/processor.h>

 #include <linux/uaccess.h>
 #include <asm/cacheflush.h>
On Wed, Apr 03, 2024 at 04:34:58PM -0700, Deepak Gupta wrote:
`arch_calc_vm_prot_bits` is implemented on risc-v to return VM_READ | VM_WRITE if PROT_WRITE is specified. Similarly `riscv_sys_mmap` is updated to convert all incoming PROT_WRITE to (PROT_WRITE | PROT_READ). This is to make sure that any existing apps using PROT_WRITE still work.
Earlier `protection_map[VM_WRITE]` used to pick read-write PTE encodings. Now `protection_map[VM_WRITE]` will always pick PAGE_SHADOWSTACK PTE encodings for shadow stack. Above changes ensure that existing apps continue to work because underneath kernel will be picking `protection_map[VM_WRITE|VM_READ]` PTE encodings.
Signed-off-by: Deepak Gupta debug@rivosinc.com
arch/riscv/include/asm/mman.h | 24 ++++++++++++++++++++++++ arch/riscv/include/asm/pgtable.h | 1 + arch/riscv/kernel/sys_riscv.c | 11 +++++++++++ arch/riscv/mm/init.c | 2 +- mm/mmap.c | 1 + 5 files changed, 38 insertions(+), 1 deletion(-) create mode 100644 arch/riscv/include/asm/mman.h
diff --git a/arch/riscv/include/asm/mman.h b/arch/riscv/include/asm/mman.h
new file mode 100644
index 000000000000..ef9fedf32546
--- /dev/null
+++ b/arch/riscv/include/asm/mman.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_MMAN_H__
+#define __ASM_MMAN_H__
+
+#include <linux/compiler.h>
+#include <linux/types.h>
+#include <uapi/asm/mman.h>
+
+static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
+	unsigned long pkey __always_unused)
+{
+	unsigned long ret = 0;
+
+	/*
+	 * If PROT_WRITE was specified, force it to VM_READ | VM_WRITE.
+	 * Only VM_WRITE means shadow stack.
+	 */
+	if (prot & PROT_WRITE)
+		ret = (VM_READ | VM_WRITE);
+	return ret;
+}
+#define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)
+
+#endif /* ! __ASM_MMAN_H__ */
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 6066822e7396..4d5983bc6766 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -184,6 +184,7 @@ extern struct pt_alloc_ops pt_ops __initdata;
 #define PAGE_READ_EXEC		__pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC)
 #define PAGE_WRITE_EXEC		__pgprot(_PAGE_BASE | _PAGE_READ |	\
 					 _PAGE_EXEC | _PAGE_WRITE)
+#define PAGE_SHADOWSTACK	__pgprot(_PAGE_BASE | _PAGE_WRITE)
 #define PAGE_COPY		PAGE_READ
 #define PAGE_COPY_EXEC		PAGE_READ_EXEC
diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
index f1c1416a9f1e..846c36b1b3d5 100644
--- a/arch/riscv/kernel/sys_riscv.c
+++ b/arch/riscv/kernel/sys_riscv.c
@@ -8,6 +8,8 @@
 #include <linux/syscalls.h>
 #include <asm/cacheflush.h>
 #include <asm-generic/mman-common.h>
+#include <vdso/vsyscall.h>
+#include <asm/mman.h>
 static long riscv_sys_mmap(unsigned long addr, unsigned long len,
 			   unsigned long prot, unsigned long flags,
@@ -17,6 +19,15 @@ static long riscv_sys_mmap(unsigned long addr, unsigned long len,
 	if (unlikely(offset & (~PAGE_MASK >> page_shift_offset)))
 		return -EINVAL;
+	/*
+	 * If only PROT_WRITE is specified then extend that to PROT_READ
+	 * protection_map[VM_WRITE] is now going to select shadow stack encodings.
+	 * So specifying PROT_WRITE actually should select protection_map [VM_WRITE | VM_READ]
+	 * If user wants to create shadow stack then they should use `map_shadow_stack` syscall.
+	 */
+	if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ)))
The comment says that this should extend to PROT_READ if only PROT_WRITE is specified. This condition instead checks whether PROT_WRITE is selected and PROT_READ is not. If prot is (VM_EXEC | VM_WRITE) then it would be extended to (VM_EXEC | VM_WRITE | VM_READ). This will not currently cause any issues because both map to the same value in the protection_map, PAGE_COPY_EXEC; however, this does not seem to be the intention of this change.
prot == PROT_WRITE better suits the condition explained in the comment.
+		prot |= PROT_READ;
+
 	return ksys_mmap_pgoff(addr, len, prot, flags, fd,
 			       offset >> (PAGE_SHIFT - page_shift_offset));
 }
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index fa34cf55037b..98e5ece4052a 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -299,7 +299,7 @@ pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
 static const pgprot_t protection_map[16] = {
 	[VM_NONE]					= PAGE_NONE,
 	[VM_READ]					= PAGE_READ,
-	[VM_WRITE]					= PAGE_COPY,
+	[VM_WRITE]					= PAGE_SHADOWSTACK,
 	[VM_WRITE | VM_READ]				= PAGE_COPY,
 	[VM_EXEC]					= PAGE_EXEC,
 	[VM_EXEC | VM_READ]				= PAGE_READ_EXEC,
diff --git a/mm/mmap.c b/mm/mmap.c
index d89770eaab6b..57a974f49b00 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -47,6 +47,7 @@
 #include <linux/oom.h>
 #include <linux/sched/mm.h>
 #include <linux/ksm.h>
+#include <linux/processor.h>
It doesn't seem like this is necessary for this patch.
- Charlie
#include <linux/uaccess.h>
#include <asm/cacheflush.h>
2.43.2
On Fri, May 10, 2024 at 02:02:54PM -0700, Charlie Jenkins wrote:
On Wed, Apr 03, 2024 at 04:34:58PM -0700, Deepak Gupta wrote:
+	if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ)))
The comment says that this should extend to PROT_READ if only PROT_WRITE is specified. This condition instead checks whether PROT_WRITE is selected and PROT_READ is not. If prot is (VM_EXEC | VM_WRITE) then it would be extended to (VM_EXEC | VM_WRITE | VM_READ). This will not currently cause any issues because both map to the same value in the protection_map, PAGE_COPY_EXEC; however, this does not seem to be the intention of this change.
prot == PROT_WRITE better suits the condition explained in the comment.
If someone specifies (PROT_EXEC | PROT_WRITE) today, it works because of the way permissions are set up in `protection_map`. On RISC-V there is no way to have a page that is execute and write only. So the expectation is that if some apps were using `PROT_EXEC | PROT_WRITE` today, they were working because internally it was translating to read, write and execute at the page-permissions level. This patch makes sure that it stays the same from a page-permissions perspective.
If someone was using PROT_EXEC, it may translate to execute only and this change doesn't impact that.
The patch simply looks for the presence of `PROT_WRITE` and the absence of `PROT_READ` in the protection flags and, if that condition is satisfied, it assumes that the caller expected the page to be readable as well.
+		prot |= PROT_READ;
+
 	return ksys_mmap_pgoff(addr, len, prot, flags, fd,
 			       offset >> (PAGE_SHIFT - page_shift_offset));
 }
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index fa34cf55037b..98e5ece4052a 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -299,7 +299,7 @@ pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
 static const pgprot_t protection_map[16] = {
 	[VM_NONE]					= PAGE_NONE,
 	[VM_READ]					= PAGE_READ,
-	[VM_WRITE]					= PAGE_COPY,
+	[VM_WRITE]					= PAGE_SHADOWSTACK,
 	[VM_WRITE | VM_READ]				= PAGE_COPY,
 	[VM_EXEC]					= PAGE_EXEC,
 	[VM_EXEC | VM_READ]				= PAGE_READ_EXEC,
diff --git a/mm/mmap.c b/mm/mmap.c
index d89770eaab6b..57a974f49b00 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -47,6 +47,7 @@
 #include <linux/oom.h>
 #include <linux/sched/mm.h>
 #include <linux/ksm.h>
+#include <linux/processor.h>
It doesn't seem like this is necessary for this patch.
Thanks. Yeah it looks like I forgot to remove this over the churn. Will fix it.
- Charlie
#include <linux/uaccess.h>
#include <asm/cacheflush.h>
2.43.2
On Mon, May 13, 2024 at 10:47:25AM -0700, Deepak Gupta wrote:
On Fri, May 10, 2024 at 02:02:54PM -0700, Charlie Jenkins wrote:
On Wed, Apr 03, 2024 at 04:34:58PM -0700, Deepak Gupta wrote:
The purpose of this change is for compatibility with shadow stack pages but this affects flags for pages that are not shadow stack pages. Adding PROT_READ to the other cases is redundant as protection_map already handles that mapping. Permissions being strictly PROT_WRITE is the only case that needs to be handled, and is the only case that is called out in the commit message and in the comment.
- Charlie
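The redundancy argument can be checked against a toy copy of the table. This is a sketch only: entries are stored as names rather than `pgprot_t` values, the first six rows follow the hunk quoted in this thread, the two `VM_EXEC | VM_WRITE` rows follow the statement above that both map to PAGE_COPY_EXEC, and `demo_protection_map` / `demo_lookup` are hypothetical names.

```c
#define VM_READ  0x1U
#define VM_WRITE 0x2U
#define VM_EXEC  0x4U

/* Names only; the real table holds pgprot_t values and has 16 entries. */
static const char *const demo_protection_map[8] = {
	[0]                            = "PAGE_NONE",
	[VM_READ]                      = "PAGE_READ",
	[VM_WRITE]                     = "PAGE_SHADOWSTACK", /* was PAGE_COPY */
	[VM_WRITE | VM_READ]           = "PAGE_COPY",
	[VM_EXEC]                      = "PAGE_EXEC",
	[VM_EXEC | VM_READ]            = "PAGE_READ_EXEC",
	[VM_EXEC | VM_WRITE]           = "PAGE_COPY_EXEC",
	[VM_EXEC | VM_WRITE | VM_READ] = "PAGE_COPY_EXEC",
};

/* Look up the entry selected by the read/write/exec bits of vm_flags. */
static const char *demo_lookup(unsigned int vm_flags)
{
	return demo_protection_map[vm_flags & (VM_READ | VM_WRITE | VM_EXEC)];
}
```

In this model only the plain `VM_WRITE` row changes meaning; the `VM_EXEC | VM_WRITE` row already resolves to the same entry as the row with `VM_READ` added.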
On Mon, May 13, 2024 at 11:36:49AM -0700, Charlie Jenkins wrote:
On Mon, May 13, 2024 at 10:47:25AM -0700, Deepak Gupta wrote:
On Fri, May 10, 2024 at 02:02:54PM -0700, Charlie Jenkins wrote:
On Wed, Apr 03, 2024 at 04:34:58PM -0700, Deepak Gupta wrote:
Yeah that's fine. I can change the commit message or just strictly check for PROT_WRITE. It doesn't change bottomline, I am fine with either option.
Let me know your preference.
- Charlie
prot |= PROT_READ;
- return ksys_mmap_pgoff(addr, len, prot, flags, fd, offset >> (PAGE_SHIFT - page_shift_offset));
} diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c index fa34cf55037b..98e5ece4052a 100644 --- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -299,7 +299,7 @@ pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE); static const pgprot_t protection_map[16] = { [VM_NONE] = PAGE_NONE, [VM_READ] = PAGE_READ,
- [VM_WRITE] = PAGE_COPY,
- [VM_WRITE] = PAGE_SHADOWSTACK, [VM_WRITE | VM_READ] = PAGE_COPY, [VM_EXEC] = PAGE_EXEC, [VM_EXEC | VM_READ] = PAGE_READ_EXEC,
diff --git a/mm/mmap.c b/mm/mmap.c index d89770eaab6b..57a974f49b00 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -47,6 +47,7 @@ #include <linux/oom.h> #include <linux/sched/mm.h> #include <linux/ksm.h> +#include <linux/processor.h>
It doesn't seem like this is necessary for this patch.
Thanks. Yeah it looks like I forgot to remove this over the churn. Will fix it.
- Charlie
#include <linux/uaccess.h>
#include <asm/cacheflush.h>
2.43.2
On Mon, May 13, 2024 at 11:41:34AM -0700, Deepak Gupta wrote:
On Mon, May 13, 2024 at 11:36:49AM -0700, Charlie Jenkins wrote:
On Mon, May 13, 2024 at 10:47:25AM -0700, Deepak Gupta wrote:
On Fri, May 10, 2024 at 02:02:54PM -0700, Charlie Jenkins wrote:
On Wed, Apr 03, 2024 at 04:34:58PM -0700, Deepak Gupta wrote:
`arch_calc_vm_prot_bits` is implemented on risc-v to return VM_READ | VM_WRITE if PROT_WRITE is specified. Similarly `riscv_sys_mmap` is updated to convert all incoming PROT_WRITE to (PROT_WRITE | PROT_READ). This is to make sure that any existing apps using PROT_WRITE still work.
Earlier `protection_map[VM_WRITE]` used to pick read-write PTE encodings. Now `protection_map[VM_WRITE]` will always pick PAGE_SHADOWSTACK PTE encodings for shadow stack. Above changes ensure that existing apps continue to work because underneath kernel will be picking `protection_map[VM_WRITE|VM_READ]` PTE encodings.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/asm/mman.h    | 24 ++++++++++++++++++++++++
 arch/riscv/include/asm/pgtable.h |  1 +
 arch/riscv/kernel/sys_riscv.c    | 11 +++++++++++
 arch/riscv/mm/init.c             |  2 +-
 mm/mmap.c                        |  1 +
 5 files changed, 38 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/include/asm/mman.h

diff --git a/arch/riscv/include/asm/mman.h b/arch/riscv/include/asm/mman.h
new file mode 100644
index 000000000000..ef9fedf32546
--- /dev/null
+++ b/arch/riscv/include/asm/mman.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_MMAN_H__
+#define __ASM_MMAN_H__
+
+#include <linux/compiler.h>
+#include <linux/types.h>
+#include <uapi/asm/mman.h>
+
+static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
+	unsigned long pkey __always_unused)
+{
+	unsigned long ret = 0;
+
+	/*
+	 * If PROT_WRITE was specified, force it to VM_READ | VM_WRITE.
+	 * Only VM_WRITE means shadow stack.
+	 */
+	if (prot & PROT_WRITE)
+		ret = (VM_READ | VM_WRITE);
+	return ret;
+}
+#define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)
+
+#endif /* ! __ASM_MMAN_H__ */
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 6066822e7396..4d5983bc6766 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -184,6 +184,7 @@ extern struct pt_alloc_ops pt_ops __initdata;
 #define PAGE_READ_EXEC		__pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC)
 #define PAGE_WRITE_EXEC		__pgprot(_PAGE_BASE | _PAGE_READ |	\
 					 _PAGE_EXEC | _PAGE_WRITE)
+#define PAGE_SHADOWSTACK	__pgprot(_PAGE_BASE | _PAGE_WRITE)
 
 #define PAGE_COPY		PAGE_READ
 #define PAGE_COPY_EXEC		PAGE_READ_EXEC
diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
index f1c1416a9f1e..846c36b1b3d5 100644
--- a/arch/riscv/kernel/sys_riscv.c
+++ b/arch/riscv/kernel/sys_riscv.c
@@ -8,6 +8,8 @@
 #include <linux/syscalls.h>
 #include <asm/cacheflush.h>
 #include <asm-generic/mman-common.h>
+#include <vdso/vsyscall.h>
+#include <asm/mman.h>
 
 static long riscv_sys_mmap(unsigned long addr, unsigned long len,
 			   unsigned long prot, unsigned long flags,
@@ -17,6 +19,15 @@ static long riscv_sys_mmap(unsigned long addr, unsigned long len,
 	if (unlikely(offset & (~PAGE_MASK >> page_shift_offset)))
 		return -EINVAL;
 
+	/*
+	 * If only PROT_WRITE is specified then extend that to PROT_READ
+	 * protection_map[VM_WRITE] is now going to select shadow stack encodings.
+	 * So specifying PROT_WRITE actually should select protection_map [VM_WRITE | VM_READ]
+	 * If user wants to create shadow stack then they should use `map_shadow_stack` syscall.
+	 */
+	if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ)))
The comment says that this should extend to PROT_READ if only PROT_WRITE is specified. This condition instead checks whether PROT_WRITE is set and PROT_READ is not. If prot is (PROT_EXEC | PROT_WRITE), it would be extended to (PROT_EXEC | PROT_WRITE | PROT_READ). This does not currently cause any issues because both map to the same protection_map value, PAGE_COPY_EXEC, but it does not seem to be the intention of this change.
prot == PROT_WRITE better suits the condition explained in the comment.
If someone specifies (PROT_EXEC | PROT_WRITE) today, it works because of the way permissions are set up in `protection_map`. On risc-v there is no way to have a page which is execute and write only. So the expectation is that apps using `PROT_EXEC | PROT_WRITE` today work because internally it translates to read, write and execute at the page permission level. This patch makes sure that this stays the same from the page permissions perspective.
If someone was using PROT_EXEC, it may translate to execute only and this change doesn't impact that.
The patch simply looks for the presence of `PROT_WRITE` and the absence of `PROT_READ` in the protection flags, and if that condition is satisfied, it assumes that the caller expected the page to be readable as well.
The purpose of this change is compatibility with shadow stack pages, but it also affects flags for pages that are not shadow stack pages. Adding PROT_READ in the other cases is redundant, as protection_map already handles that mapping. Permissions being strictly PROT_WRITE is the only case that needs to be handled, and it is the only case called out in the commit message and in the comment.
Yeah that's fine. I can change the commit message or just strictly check for PROT_WRITE. It doesn't change the bottom line, I am fine with either option.
Let me know your preference.
I would prefer the strict check. This is not critical though, so I will support whatever you decide!
- Charlie
- Charlie
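To make the difference between the two conditions discussed above concrete, here is a minimal userspace sketch. It is not kernel code: the function names `fixup_prot_loose` and `fixup_prot_strict` are hypothetical, and only the two `if` conditions are taken from the thread.

```c
#include <assert.h>
#include <sys/mman.h>	/* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Condition as posted in the patch: PROT_WRITE present, PROT_READ absent. */
unsigned long fixup_prot_loose(unsigned long prot)
{
	if ((prot & PROT_WRITE) && !(prot & PROT_READ))
		prot |= PROT_READ;
	return prot;
}

/* Strict variant suggested in review: only a bare PROT_WRITE is extended. */
unsigned long fixup_prot_strict(unsigned long prot)
{
	if (prot == PROT_WRITE)
		prot |= PROT_READ;
	return prot;
}

/* Hypothetical self-check showing where the two variants differ. */
void check_prot_fixups(void)
{
	/* Both variants extend a bare PROT_WRITE. */
	assert(fixup_prot_loose(PROT_WRITE) == (PROT_READ | PROT_WRITE));
	assert(fixup_prot_strict(PROT_WRITE) == (PROT_READ | PROT_WRITE));

	/* Only the posted condition touches PROT_EXEC | PROT_WRITE. */
	assert(fixup_prot_loose(PROT_EXEC | PROT_WRITE) ==
	       (PROT_READ | PROT_WRITE | PROT_EXEC));
	assert(fixup_prot_strict(PROT_EXEC | PROT_WRITE) ==
	       (PROT_EXEC | PROT_WRITE));
}
```

The two variants agree on a bare `PROT_WRITE` and differ only for combinations such as `PROT_EXEC | PROT_WRITE`, which, per the discussion, already resolve to the same `protection_map` entry.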
+		prot |= PROT_READ;
+
 	return ksys_mmap_pgoff(addr, len, prot, flags, fd,
 			       offset >> (PAGE_SHIFT - page_shift_offset));
 }
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index fa34cf55037b..98e5ece4052a 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -299,7 +299,7 @@ pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
 static const pgprot_t protection_map[16] = {
 	[VM_NONE]					= PAGE_NONE,
 	[VM_READ]					= PAGE_READ,
-	[VM_WRITE]					= PAGE_COPY,
+	[VM_WRITE]					= PAGE_SHADOWSTACK,
 	[VM_WRITE | VM_READ]				= PAGE_COPY,
 	[VM_EXEC]					= PAGE_EXEC,
 	[VM_EXEC | VM_READ]				= PAGE_READ_EXEC,
diff --git a/mm/mmap.c b/mm/mmap.c
index d89770eaab6b..57a974f49b00 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -47,6 +47,7 @@
 #include <linux/oom.h>
 #include <linux/sched/mm.h>
 #include <linux/ksm.h>
+#include <linux/processor.h>
It doesn't seem like this is necessary for this patch.
Thanks. Yeah it looks like I forgot to remove this over the churn. Will fix it.
- Charlie
 #include <linux/uaccess.h>
 #include <asm/cacheflush.h>
-- 
2.43.2
On Sun, May 12, 2024 at 06:24:45PM +0200, Alexandre Ghiti wrote:
Hi Deepak,
On 04/04/2024 01:34, Deepak Gupta wrote:
`arch_calc_vm_prot_bits` is implemented on risc-v to return VM_READ | VM_WRITE if PROT_WRITE is specified. Similarly `riscv_sys_mmap` is updated to convert all incoming PROT_WRITE to (PROT_WRITE | PROT_READ). This is to make sure that any existing apps using PROT_WRITE still work.
Earlier `protection_map[VM_WRITE]` used to pick read-write PTE encodings. Now `protection_map[VM_WRITE]` will always pick PAGE_SHADOWSTACK PTE encodings for shadow stack. Above changes ensure that existing apps continue to work because underneath kernel will be picking `protection_map[VM_WRITE|VM_READ]` PTE encodings.
What happens if someone restricts the permission to PROT_WRITE using mprotect()? I would say this is an issue since it would turn the pages into shadow stack pages.
Look at this patch in the series: "riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE"
It implements `arch_calc_vm_prot_bits` for risc-v and enforces that an incoming PROT_WRITE is converted to VM_READ | VM_WRITE, so the mapping becomes regular read/write memory. This way `mprotect` can be used to convert a shadow stack page to read/write memory, but not to convert regular memory into a shadow stack page.
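The argument above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the `M_*` constants and `model_*` functions are hypothetical stand-ins, the generic PROT-to-VM translation is simplified to the three R/W/X bits, and only the body of `model_arch_calc_vm_prot_bits` follows the patch.

```c
#include <assert.h>

/* Local stand-ins for the kernel constants (assumed values for this sketch). */
#define M_PROT_READ	0x1UL
#define M_PROT_WRITE	0x2UL
#define M_PROT_EXEC	0x4UL
#define M_VM_READ	0x1UL
#define M_VM_WRITE	0x2UL
#define M_VM_EXEC	0x4UL

/* Mirrors the riscv hook from the patch: PROT_WRITE forces VM_READ | VM_WRITE. */
unsigned long model_arch_calc_vm_prot_bits(unsigned long prot)
{
	return (prot & M_PROT_WRITE) ? (M_VM_READ | M_VM_WRITE) : 0;
}

/* Simplified generic translation, OR-ed with the arch hook. */
unsigned long model_calc_vm_prot_bits(unsigned long prot)
{
	unsigned long vm = 0;

	if (prot & M_PROT_READ)
		vm |= M_VM_READ;
	if (prot & M_PROT_WRITE)
		vm |= M_VM_WRITE;
	if (prot & M_PROT_EXEC)
		vm |= M_VM_EXEC;
	return vm | model_arch_calc_vm_prot_bits(prot);
}

/* protection_map[VM_WRITE] is the only index that selects PAGE_SHADOWSTACK. */
int model_selects_shadow_stack(unsigned long vm_flags)
{
	return (vm_flags & (M_VM_READ | M_VM_WRITE | M_VM_EXEC)) == M_VM_WRITE;
}

/* Returns 1 if no combination of PROT_READ/WRITE/EXEC reaches the shadow stack index. */
int model_no_prot_reaches_shadow_stack(void)
{
	unsigned long prot;

	for (prot = 0; prot < 8; prot++)
		if (model_selects_shadow_stack(model_calc_vm_prot_bits(prot)))
			return 0;
	return 1;
}
```

In this model, no value of `prot` passed through the translation ends up at the write-only index, which is the property the reply relies on for `mprotect(PROT_WRITE)`.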
This patch implements creating a shadow stack PTE (on riscv). Creating a shadow stack PTE on riscv means clearing RWX and then setting W=1.

Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/asm/pgtable.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 4d5983bc6766..6362407f1e83 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -408,6 +408,12 @@ static inline pte_t pte_mkwrite_novma(pte_t pte)
 	return __pte(pte_val(pte) | _PAGE_WRITE);
 }
 
+static inline pte_t pte_mkwrite_shstk(pte_t pte)
+{
+	/* shadow stack on risc-v is XWR = 010. Clear everything and only set _PAGE_WRITE */
+	return __pte((pte_val(pte) & ~(_PAGE_LEAF)) | _PAGE_WRITE);
+}
+
 /* static inline pte_t pte_mkexec(pte_t pte) */
 
 static inline pte_t pte_mkdirty(pte_t pte)
@@ -693,6 +699,12 @@ static inline pmd_t pmd_mkwrite_novma(pmd_t pmd)
 	return pte_pmd(pte_mkwrite_novma(pmd_pte(pmd)));
 }
 
+static inline pmd_t pmd_mkwrite_shstk(pmd_t pte)
+{
+	/* shadow stack on risc-v is XWR = 010. Clear everything and only set _PAGE_WRITE */
+	return __pmd((pmd_val(pte) & ~(_PAGE_LEAF)) | _PAGE_WRITE);
+}
+
 static inline pmd_t pmd_wrprotect(pmd_t pmd)
 {
 	return pte_pmd(pte_wrprotect(pmd_pte(pmd)));
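As a rough illustration of the bit manipulation in `pte_mkwrite_shstk`, the sketch below models a PTE as a plain `unsigned long`. The `SK_*` macros and `sk_*` helpers are hypothetical names local to this example; the bit positions follow the RISC-V PTE layout (V at bit 0, R/W/X at bits 1 to 3).

```c
#include <assert.h>

/* Local stand-ins for the riscv PTE permission bits (sketch only). */
#define SK_PAGE_PRESENT	(1UL << 0)
#define SK_PAGE_READ	(1UL << 1)
#define SK_PAGE_WRITE	(1UL << 2)
#define SK_PAGE_EXEC	(1UL << 3)
#define SK_PAGE_LEAF	(SK_PAGE_READ | SK_PAGE_WRITE | SK_PAGE_EXEC)

/* Regular writeable mapping: just OR in W, as pte_mkwrite_novma() does. */
unsigned long sk_mkwrite_novma(unsigned long pte)
{
	return pte | SK_PAGE_WRITE;
}

/* Shadow stack mapping: clear R, W and X, then set only W (XWR = 010). */
unsigned long sk_mkwrite_shstk(unsigned long pte)
{
	return (pte & ~SK_PAGE_LEAF) | SK_PAGE_WRITE;
}

/* Extract the permission field as a 3-bit XWR value (X is the high bit). */
unsigned long sk_xwr(unsigned long pte)
{
	return (pte & SK_PAGE_LEAF) >> 1;
}

/* Hypothetical self-check using a read+exec PTE with some PFN bits set. */
void sk_check(void)
{
	unsigned long pte = (0x1234UL << 10) | SK_PAGE_PRESENT |
			    SK_PAGE_READ | SK_PAGE_EXEC;

	/* XWR becomes 010 and all non-permission bits are preserved. */
	assert(sk_xwr(sk_mkwrite_shstk(pte)) == 0x2UL);
	assert((sk_mkwrite_shstk(pte) & ~SK_PAGE_LEAF) == (pte & ~SK_PAGE_LEAF));
}
```

The point of clearing `_PAGE_LEAF` first is that a plain OR, as in `pte_mkwrite_novma`, would leave any existing R or X bit set and would not produce the write-only encoding.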
On 04/04/2024 01:34, Deepak Gupta wrote:
This patch implements creating shadow stack pte (on riscv). Creating shadow stack PTE on riscv means that clearing RWX and then setting W=1.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/asm/pgtable.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 4d5983bc6766..6362407f1e83 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -408,6 +408,12 @@ static inline pte_t pte_mkwrite_novma(pte_t pte)
 	return __pte(pte_val(pte) | _PAGE_WRITE);
 }
 
+static inline pte_t pte_mkwrite_shstk(pte_t pte)
+{
+	/* shadow stack on risc-v is XWR = 010. Clear everything and only set _PAGE_WRITE */

Nit: Not sure the comment is necessary

+	return __pte((pte_val(pte) & ~(_PAGE_LEAF)) | _PAGE_WRITE);
+}
+
 /* static inline pte_t pte_mkexec(pte_t pte) */
 
 static inline pte_t pte_mkdirty(pte_t pte)
@@ -693,6 +699,12 @@ static inline pmd_t pmd_mkwrite_novma(pmd_t pmd)
 	return pte_pmd(pte_mkwrite_novma(pmd_pte(pmd)));
 }
 
+static inline pmd_t pmd_mkwrite_shstk(pmd_t pte)
+{
+	/* shadow stack on risc-v is XWR = 010. Clear everything and only set _PAGE_WRITE */
+	return __pmd((pmd_val(pte) & ~(_PAGE_LEAF)) | _PAGE_WRITE);
+}
+
 static inline pmd_t pmd_wrprotect(pmd_t pmd)
 {
 	return pte_pmd(pte_wrprotect(pmd_pte(pmd)));

Otherwise:

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Thanks,
Alex
pte_mkwrite creates PTEs with WRITE encodings for the underlying arch. The underlying arch can have two types of writeable mappings: one that can be written using regular store instructions, and another that can only be written using specialized store instructions (like shadow stack stores). pte_mkwrite can select the write PTE encoding based on the VMA range (i.e. VM_SHADOW_STACK).

Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/asm/pgtable.h |  7 +++++++
 arch/riscv/mm/pgtable.c          | 21 +++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 6362407f1e83..9b837239d3e8 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -403,6 +403,10 @@ static inline pte_t pte_wrprotect(pte_t pte)
 
 /* static inline pte_t pte_mkread(pte_t pte) */
 
+struct vm_area_struct;
+pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma);
+#define pte_mkwrite pte_mkwrite
+
 static inline pte_t pte_mkwrite_novma(pte_t pte)
 {
 	return __pte(pte_val(pte) | _PAGE_WRITE);
@@ -694,6 +698,9 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd)
 	return pte_pmd(pte_mkyoung(pmd_pte(pmd)));
 }
 
+pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
+#define pmd_mkwrite pmd_mkwrite
+
 static inline pmd_t pmd_mkwrite_novma(pmd_t pmd)
 {
 	return pte_pmd(pte_mkwrite_novma(pmd_pte(pmd)));
diff --git a/arch/riscv/mm/pgtable.c b/arch/riscv/mm/pgtable.c
index ef887efcb679..c84ae2e0424d 100644
--- a/arch/riscv/mm/pgtable.c
+++ b/arch/riscv/mm/pgtable.c
@@ -142,3 +142,24 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
 	return pmd;
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma)
+{
+	if (vma_is_shadow_stack(vma->vm_flags))
+		return pte_mkwrite_shstk(pte);
+
+	pte = pte_mkwrite_novma(pte);
+
+	return pte;
+}
+
+pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
+{
+	if (vma_is_shadow_stack(vma->vm_flags))
+		return pmd_mkwrite_shstk(pmd);
+
+	pmd = pmd_mkwrite_novma(pmd);
+
+	return pmd;
+}
+
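The VMA-based selection described in the commit message can be sketched in userspace as follows. Everything prefixed `dm_` or `DM_` is hypothetical, including the bit chosen for `DM_VM_SHADOW_STACK`; only the control flow mirrors `pte_mkwrite` from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for PTE permission bits (RISC-V layout). */
#define DM_PAGE_PRESENT	(1UL << 0)
#define DM_PAGE_READ	(1UL << 1)
#define DM_PAGE_WRITE	(1UL << 2)
#define DM_PAGE_EXEC	(1UL << 3)
#define DM_PAGE_LEAF	(DM_PAGE_READ | DM_PAGE_WRITE | DM_PAGE_EXEC)

/* Placeholder flag bit; the real VM_SHADOW_STACK value is not assumed here. */
#define DM_VM_SHADOW_STACK	(1UL << 5)

/* Minimal stand-in for struct vm_area_struct. */
struct dm_vma {
	unsigned long vm_flags;
};

bool dm_vma_is_shadow_stack(unsigned long vm_flags)
{
	return (vm_flags & DM_VM_SHADOW_STACK) != 0;
}

/* Pick the write encoding from the VMA type: XWR = 010 or a plain OR of W. */
unsigned long dm_pte_mkwrite(unsigned long pte, const struct dm_vma *vma)
{
	if (dm_vma_is_shadow_stack(vma->vm_flags))
		return (pte & ~DM_PAGE_LEAF) | DM_PAGE_WRITE;

	return pte | DM_PAGE_WRITE;
}
```

For example, with `struct dm_vma shstk = { DM_VM_SHADOW_STACK };`, calling `dm_pte_mkwrite(DM_PAGE_PRESENT | DM_PAGE_READ, &shstk)` yields a write-only entry, while the same PTE in a VMA without the flag becomes read/write.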
On Sun, May 12, 2024 at 06:28:59PM +0200, Alexandre Ghiti wrote:
On 04/04/2024 01:35, Deepak Gupta wrote:
pte_mkwrite creates PTEs with WRITE encodings for underlying arch. Underlying arch can have two types of writeable mappings. One that can be written using regular store instructions. Another one that can only be written using specialized store instructions (like shadow stack stores). pte_mkwrite can select write PTE encoding based on VMA range (i.e. VM_SHADOW_STACK)
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/asm/pgtable.h |  7 +++++++
 arch/riscv/mm/pgtable.c          | 21 +++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 6362407f1e83..9b837239d3e8 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -403,6 +403,10 @@ static inline pte_t pte_wrprotect(pte_t pte)
 
 /* static inline pte_t pte_mkread(pte_t pte) */
 
+struct vm_area_struct;
+pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma);
+#define pte_mkwrite pte_mkwrite
+
 static inline pte_t pte_mkwrite_novma(pte_t pte)
 {
 	return __pte(pte_val(pte) | _PAGE_WRITE);
@@ -694,6 +698,9 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd)
 	return pte_pmd(pte_mkyoung(pmd_pte(pmd)));
 }
 
+pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
+#define pmd_mkwrite pmd_mkwrite
+
 static inline pmd_t pmd_mkwrite_novma(pmd_t pmd)
 {
 	return pte_pmd(pte_mkwrite_novma(pmd_pte(pmd)));
diff --git a/arch/riscv/mm/pgtable.c b/arch/riscv/mm/pgtable.c
index ef887efcb679..c84ae2e0424d 100644
--- a/arch/riscv/mm/pgtable.c
+++ b/arch/riscv/mm/pgtable.c
@@ -142,3 +142,24 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
 	return pmd;
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma)
+{
+	if (vma_is_shadow_stack(vma->vm_flags))
+		return pte_mkwrite_shstk(pte);
+
+	pte = pte_mkwrite_novma(pte);

I would directly return pte_mkwrite_novma(pte) instead of assigning pte.

noted.

+
+	return pte;
+}
+
+pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
+{
+	if (vma_is_shadow_stack(vma->vm_flags))
+		return pmd_mkwrite_shstk(pmd);
+
+	pmd = pmd_mkwrite_novma(pmd);

Ditto here.

noted here too.

+
+	return pmd;
+}
+

Otherwise:

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Thanks,
Alex
`fork` implements copy on write (COW) by making pages read-only in both child and parent.

ptep_set_wrprotect and pte_wrprotect clear _PAGE_WRITE in the PTE. The assumption is that the page is readable and that copy on write happens on a fault.

To implement COW on such pages, clearing the W bit makes them XWR = 000. This results in a wrong PTE setting which says no permissions but has V=1 and a PFN field pointing to the final page. Instead, the desired behavior is to turn it into a readable page, take an access (load/store) fault on sspush/sspop (shadow stack) and then perform COW on such pages. This way regular reads are still allowed and do not lead to COW, which maintains the current COW behavior for non-shadow-stack but writeable memory.

On the other hand, it doesn't interfere with existing COW for read-write memory. The assumption there is always that _PAGE_READ must already have been set, and thus setting _PAGE_READ is harmless.

Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/asm/pgtable.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 9b837239d3e8..7a1c2a98d272 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -398,7 +398,7 @@ static inline int pte_special(pte_t pte)
 
 static inline pte_t pte_wrprotect(pte_t pte)
 {
-	return __pte(pte_val(pte) & ~(_PAGE_WRITE));
+	return __pte((pte_val(pte) & ~(_PAGE_WRITE)) | (_PAGE_READ));
 }
 
 /* static inline pte_t pte_mkread(pte_t pte) */
@@ -581,7 +581,15 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long address, pte_t *ptep)
 {
-	atomic_long_and(~(unsigned long)_PAGE_WRITE, (atomic_long_t *)ptep);
+	volatile pte_t read_pte = *ptep;
+	/*
+	 * ptep_set_wrprotect can be called for shadow stack ranges too.
+	 * shadow stack memory is XWR = 010 and thus clearing _PAGE_WRITE will lead to
+	 * encoding 000b which is wrong encoding with V = 1. This should lead to page fault
+	 * but we dont want this wrong configuration to be set in page tables.
+	 */
+	atomic_long_set((atomic_long_t *)ptep,
+			((pte_val(read_pte) & ~(unsigned long)_PAGE_WRITE) | _PAGE_READ));
 }
 
 #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
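The effect of the `pte_wrprotect` change on the permission bits can be modelled in a few lines. This is a sketch, not the kernel implementation: the `WP_*` macros and `wp_*` helpers are hypothetical, and atomicity of the page-table update is deliberately left out. For context, in the RISC-V privileged specification a valid PTE with R=W=X=0 is interpreted as a pointer to the next page-table level, which is why XWR = 000 is not a usable leaf encoding.

```c
#include <assert.h>

/* Local stand-ins for PTE permission bits (RISC-V layout). */
#define WP_PAGE_PRESENT	(1UL << 0)
#define WP_PAGE_READ	(1UL << 1)
#define WP_PAGE_WRITE	(1UL << 2)
#define WP_PAGE_EXEC	(1UL << 3)
#define WP_PAGE_LEAF	(WP_PAGE_READ | WP_PAGE_WRITE | WP_PAGE_EXEC)

/* Behaviour before the patch: only clear W. */
unsigned long wp_wrprotect_old(unsigned long pte)
{
	return pte & ~WP_PAGE_WRITE;
}

/* Behaviour after the patch: clear W and force R. */
unsigned long wp_wrprotect_new(unsigned long pte)
{
	return (pte & ~WP_PAGE_WRITE) | WP_PAGE_READ;
}

/* Hypothetical self-check covering a shadow stack PTE and a regular RW PTE. */
void wp_check(void)
{
	unsigned long shstk = WP_PAGE_PRESENT | WP_PAGE_WRITE;	/* XWR = 010 */
	unsigned long rw = WP_PAGE_PRESENT | WP_PAGE_READ | WP_PAGE_WRITE;

	/* Old behaviour leaves a valid PTE with no R/W/X bits set. */
	assert((wp_wrprotect_old(shstk) & WP_PAGE_LEAF) == 0);
	/* New behaviour turns the shadow stack PTE into a read-only one. */
	assert((wp_wrprotect_new(shstk) & WP_PAGE_LEAF) == WP_PAGE_READ);
	/* Regular read/write memory is write-protected identically either way. */
	assert(wp_wrprotect_old(rw) == wp_wrprotect_new(rw));
}
```

The last assertion corresponds to the commit message's claim that setting `_PAGE_READ` is harmless for memory that was already readable.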
On 04/04/2024 01:35, Deepak Gupta wrote:
`fork` implements copy on write (COW) by making pages readonly in child and parent both.
ptep_set_wrprotect and pte_wrprotect clears _PAGE_WRITE in PTE. Assumption is that page is readable and on fault copy on write happens.
To implement COW on such pages,
I guess you mean "shadow stack pages" here.
clearing up W bit makes them XWR = 000. This will result in wrong PTE setting which says no perms but V=1 and PFN field pointing to final page. Instead desired behavior is to turn it into a readable page, take an access (load/store) fault on sspush/sspop (shadow stack) and then perform COW on such pages. This way regular reads would still be allowed and not lead to COW maintaining current behavior of COW on non-shadow stack but writeable memory.
On the other hand it doesn't interfere with existing COW for read-write memory. Assumption is always that _PAGE_READ must have been set and thus setting _PAGE_READ is harmless.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/asm/pgtable.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 9b837239d3e8..7a1c2a98d272 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -398,7 +398,7 @@ static inline int pte_special(pte_t pte)
 
 static inline pte_t pte_wrprotect(pte_t pte)
 {
-	return __pte(pte_val(pte) & ~(_PAGE_WRITE));
+	return __pte((pte_val(pte) & ~(_PAGE_WRITE)) | (_PAGE_READ));
 }
 
 /* static inline pte_t pte_mkread(pte_t pte) */
@@ -581,7 +581,15 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long address, pte_t *ptep)
 {
-	atomic_long_and(~(unsigned long)_PAGE_WRITE, (atomic_long_t *)ptep);
+	volatile pte_t read_pte = *ptep;
+	/*
+	 * ptep_set_wrprotect can be called for shadow stack ranges too.
+	 * shadow stack memory is XWR = 010 and thus clearing _PAGE_WRITE will lead to
+	 * encoding 000b which is wrong encoding with V = 1. This should lead to page fault
+	 * but we dont want this wrong configuration to be set in page tables.
+	 */
+	atomic_long_set((atomic_long_t *)ptep,
+			((pte_val(read_pte) & ~(unsigned long)_PAGE_WRITE) | _PAGE_READ));
 }
 
 #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
Doesn't making the shadow stack page readable allow "normal" loads to access the page? If it does, isn't that an issue (security-wise)?
On Sun, May 12, 2024 at 06:31:24PM +0200, Alexandre Ghiti wrote:
On 04/04/2024 01:35, Deepak Gupta wrote:
`fork` implements copy on write (COW) by making pages readonly in child and parent both.
ptep_set_wrprotect and pte_wrprotect clears _PAGE_WRITE in PTE. Assumption is that page is readable and on fault copy on write happens.
To implement COW on such pages,
I guess you mean "shadow stack pages" here.
Yes I meant shadow stack pages. Will fix the message.
clearing up W bit makes them XWR = 000. This will result in wrong PTE setting which says no perms but V=1 and PFN field pointing to final page. Instead desired behavior is to turn it into a readable page, take an access (load/store) fault on sspush/sspop (shadow stack) and then perform COW on such pages. This way regular reads would still be allowed and not lead to COW maintaining current behavior of COW on non-shadow stack but writeable memory.
On the other hand it doesn't interfere with existing COW for read-write memory. Assumption is always that _PAGE_READ must have been set and thus setting _PAGE_READ is harmless.
Signed-off-by: Deepak Gupta debug@rivosinc.com
 arch/riscv/include/asm/pgtable.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 9b837239d3e8..7a1c2a98d272 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -398,7 +398,7 @@ static inline int pte_special(pte_t pte)
 static inline pte_t pte_wrprotect(pte_t pte)
 {
-	return __pte(pte_val(pte) & ~(_PAGE_WRITE));
+	return __pte((pte_val(pte) & ~(_PAGE_WRITE)) | (_PAGE_READ));
 }

 /* static inline pte_t pte_mkread(pte_t pte) */
@@ -581,7 +581,15 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long address, pte_t *ptep)
 {
-	atomic_long_and(~(unsigned long)_PAGE_WRITE, (atomic_long_t *)ptep);
+	volatile pte_t read_pte = *ptep;
+	/*
+	 * ptep_set_wrprotect can be called for shadow stack ranges too.
+	 * shadow stack memory is XWR = 010 and thus clearing _PAGE_WRITE will lead to
+	 * encoding 000b which is wrong encoding with V = 1. This should lead to page fault
+	 * but we dont want this wrong configuration to be set in page tables.
+	 */
+	atomic_long_set((atomic_long_t *)ptep,
+			((pte_val(read_pte) & ~(unsigned long)_PAGE_WRITE) | _PAGE_READ));
 }

 #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
Doesn't making the shadow stack page readable allow "normal" loads to access the page? If it does, isn't that an issue (security-wise)?
Even with shadow stack permissions in place (i.e. R=0, W=1, X=0), the shadow stack is already readable through "normal" loads. So from a page-permissions perspective nothing changes when the page is converted into a read-only page.
Security-wise it's not a concern. From a threat-modeling perspective, if an attacker had read-write primitives (via some bug in the program) on the address space of the process/task, they would already have access to the return addresses on the normal stack. It's the write primitive that is concerning and has to be protected against, and that's why the shadow stack is not writeable using "normal" stores.
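To make the encoding argument concrete, here is a small userspace sketch (not kernel code) of the two write-protect variants discussed in this patch. The PTE_* macros and the function names are made up for illustration; the bit positions follow the RISC-V privileged spec (V = bit 0, R = bit 1, W = bit 2, X = bit 3).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names; bit positions as in the RISC-V privileged spec. */
#define PTE_V UINT64_C(0x1)
#define PTE_R UINT64_C(0x2)
#define PTE_W UINT64_C(0x4)
#define PTE_X UINT64_C(0x8)

static_assert((PTE_R | PTE_W | PTE_X) == 0xe, "XWR occupy PTE bits 3:1");

/* Behaviour before the patch: only clear W. */
static uint64_t wrprotect_old(uint64_t pte)
{
	return pte & ~PTE_W;
}

/* Behaviour after the patch: clear W and force R. */
static uint64_t wrprotect_new(uint64_t pte)
{
	return (pte & ~PTE_W) | PTE_R;
}

/*
 * With V=1, XWR=000 encodes a pointer to the next page-table level,
 * so it cannot describe a leaf mapping.
 */
static bool pte_is_leaf(uint64_t pte)
{
	return (pte & PTE_V) && (pte & (PTE_R | PTE_W | PTE_X));
}
```

For a shadow stack PTE (V=1, XWR=010) the old variant yields XWR=000, which is no longer a leaf, while the new variant yields a plain read-only leaf (XWR=001). For ordinary read-write memory both variants give the same result, which matches the "setting _PAGE_READ is harmless" argument in the commit message.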
Hi Deepak,
On Mon, May 13, 2024 at 7:32 PM Deepak Gupta debug@rivosinc.com wrote:
On Sun, May 12, 2024 at 06:31:24PM +0200, Alexandre Ghiti wrote:
On 04/04/2024 01:35, Deepak Gupta wrote:
`fork` implements copy on write (COW) by making pages readonly in child and parent both.
ptep_set_wrprotect and pte_wrprotect clears _PAGE_WRITE in PTE. Assumption is that page is readable and on fault copy on write happens.
To implement COW on such pages,
I guess you mean "shadow stack pages" here.
Yes I meant shadow stack pages. Will fix the message.
clearing up W bit makes them XWR = 000. This will result in wrong PTE setting which says no perms but V=1 and PFN field pointing to final page. Instead desired behavior is to turn it into a readable page, take an access (load/store) fault on sspush/sspop (shadow stack) and then perform COW on such pages. This way regular reads would still be allowed and not lead to COW maintaining current behavior of COW on non-shadow stack but writeable memory.
On the other hand it doesn't interfere with existing COW for read-write memory. Assumption is always that _PAGE_READ must have been set and thus setting _PAGE_READ is harmless.
Signed-off-by: Deepak Gupta debug@rivosinc.com
 arch/riscv/include/asm/pgtable.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 9b837239d3e8..7a1c2a98d272 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -398,7 +398,7 @@ static inline int pte_special(pte_t pte)
 static inline pte_t pte_wrprotect(pte_t pte)
 {
-	return __pte(pte_val(pte) & ~(_PAGE_WRITE));
+	return __pte((pte_val(pte) & ~(_PAGE_WRITE)) | (_PAGE_READ));
 }

 /* static inline pte_t pte_mkread(pte_t pte) */
@@ -581,7 +581,15 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long address, pte_t *ptep)
 {
-	atomic_long_and(~(unsigned long)_PAGE_WRITE, (atomic_long_t *)ptep);
+	volatile pte_t read_pte = *ptep;
Sorry I missed this ^. You need to use ptep_get() to get the value of a pte. And why do you need the volatile here?
+	/*
+	 * ptep_set_wrprotect can be called for shadow stack ranges too.
+	 * shadow stack memory is XWR = 010 and thus clearing _PAGE_WRITE will lead to
+	 * encoding 000b which is wrong encoding with V = 1. This should lead to page fault
+	 * but we dont want this wrong configuration to be set in page tables.
+	 */
+	atomic_long_set((atomic_long_t *)ptep,
+			((pte_val(read_pte) & ~(unsigned long)_PAGE_WRITE) | _PAGE_READ));
 }

 #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
Doesn't making the shadow stack page readable allow "normal" loads to access the page? If it does, isn't that an issue (security-wise)?
When shadow stack permissions are there (i.e. R=0, W=1, X=0), then also shadow stack is readable through "normal" loads. So nothing changes when it converts into a readonly page from page permissions perspective.
Security-wise it's not a concern because from threat modeling perspective, if attacker had read-write primitives (via some bug in program) available to read and write address space of process/task; then they would have availability of return addresses on normal stack. It's the write primitive that is concerning and to be protected against. And that's why shadow stack is not writeable using "normal" stores.
Thanks for the explanation!
With the use of ptep_get(), you can add:
Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com
Thanks,
Alex
On Thu, May 23, 2024 at 04:59:30PM +0200, Alexandre Ghiti wrote:
Hi Deepak,
On Mon, May 13, 2024 at 7:32 PM Deepak Gupta debug@rivosinc.com wrote:
On Sun, May 12, 2024 at 06:31:24PM +0200, Alexandre Ghiti wrote:
On 04/04/2024 01:35, Deepak Gupta wrote:
`fork` implements copy on write (COW) by making pages readonly in child and parent both.
ptep_set_wrprotect and pte_wrprotect clears _PAGE_WRITE in PTE. Assumption is that page is readable and on fault copy on write happens.
To implement COW on such pages,
I guess you mean "shadow stack pages" here.
Yes I meant shadow stack pages. Will fix the message.
clearing up W bit makes them XWR = 000. This will result in wrong PTE setting which says no perms but V=1 and PFN field pointing to final page. Instead desired behavior is to turn it into a readable page, take an access (load/store) fault on sspush/sspop (shadow stack) and then perform COW on such pages. This way regular reads would still be allowed and not lead to COW maintaining current behavior of COW on non-shadow stack but writeable memory.
On the other hand it doesn't interfere with existing COW for read-write memory. Assumption is always that _PAGE_READ must have been set and thus setting _PAGE_READ is harmless.
Signed-off-by: Deepak Gupta debug@rivosinc.com
 arch/riscv/include/asm/pgtable.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 9b837239d3e8..7a1c2a98d272 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -398,7 +398,7 @@ static inline int pte_special(pte_t pte)
 static inline pte_t pte_wrprotect(pte_t pte)
 {
-	return __pte(pte_val(pte) & ~(_PAGE_WRITE));
+	return __pte((pte_val(pte) & ~(_PAGE_WRITE)) | (_PAGE_READ));
 }

 /* static inline pte_t pte_mkread(pte_t pte) */
@@ -581,7 +581,15 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long address, pte_t *ptep)
 {
-	atomic_long_and(~(unsigned long)_PAGE_WRITE, (atomic_long_t *)ptep);
+	volatile pte_t read_pte = *ptep;
Sorry I missed this ^. You need to use ptep_get() to get the value of a pte.
Noted. will fix it.
And why do you need the volatile here?
I don't remember the reason. It's probably not needed here. But I am sure I was debugging something and trying everything. And this probably slipped sanitization before sending patches.
Will fix it.
+	/*
+	 * ptep_set_wrprotect can be called for shadow stack ranges too.
+	 * shadow stack memory is XWR = 010 and thus clearing _PAGE_WRITE will lead to
+	 * encoding 000b which is wrong encoding with V = 1. This should lead to page fault
+	 * but we dont want this wrong configuration to be set in page tables.
+	 */
+	atomic_long_set((atomic_long_t *)ptep,
+			((pte_val(read_pte) & ~(unsigned long)_PAGE_WRITE) | _PAGE_READ));
 }

 #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
Doesn't making the shadow stack page readable allow "normal" loads to access the page? If it does, isn't that an issue (security-wise)?
When shadow stack permissions are there (i.e. R=0, W=1, X=0), then also shadow stack is readable through "normal" loads. So nothing changes when it converts into a readonly page from page permissions perspective.
Security-wise it's not a concern because from threat modeling perspective, if attacker had read-write primitives (via some bug in program) available to read and write address space of process/task; then they would have availability of return addresses on normal stack. It's the write primitive that is concerning and to be protected against. And that's why shadow stack is not writeable using "normal" stores.
Thanks for the explanation!
With the use of ptep_get(), you can add:
Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com
Thanks,
Alex
As discussed extensively in the changelog for the addition of this syscall on x86 ("x86/shstk: Introduce map_shadow_stack syscall") the existing mmap() and madvise() syscalls do not map entirely well onto the security requirements for shadow stack memory since they lead to windows where memory is allocated but not yet protected or stacks which are not properly and safely initialised. Instead a new syscall map_shadow_stack() has been defined which allocates and initialises a shadow stack page.
This patch implements this syscall for riscv. riscv doesn't require the token to be set up by the kernel because user mode can do that by itself. However, to provide compatibility and portability with other architectures, user mode can specify the token set flag.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/kernel/Makefile      |   2 +
 arch/riscv/kernel/usercfi.c     | 149 ++++++++++++++++++++++++++++++++
 include/uapi/asm-generic/mman.h |   1 +
 3 files changed, 152 insertions(+)
 create mode 100644 arch/riscv/kernel/usercfi.c

diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 604d6bf7e476..3bec82f4e94c 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -107,3 +107,5 @@ obj-$(CONFIG_COMPAT) += compat_vdso/

 obj-$(CONFIG_64BIT) += pi/
 obj-$(CONFIG_ACPI) += acpi.o
+
+obj-$(CONFIG_RISCV_USER_CFI) += usercfi.o
diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c
new file mode 100644
index 000000000000..c4ed0d4e33d6
--- /dev/null
+++ b/arch/riscv/kernel/usercfi.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2024 Rivos, Inc.
+ * Deepak Gupta <debug@rivosinc.com>
+ */
+
+#include <linux/sched.h>
+#include <linux/bitops.h>
+#include <linux/types.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/uaccess.h>
+#include <linux/sizes.h>
+#include <linux/user.h>
+#include <linux/syscalls.h>
+#include <linux/prctl.h>
+#include <asm/csr.h>
+#include <asm/usercfi.h>
+
+#define SHSTK_ENTRY_SIZE sizeof(void *)
+
+/*
+ * Writes on shadow stack can either be `sspush` or `ssamoswap`. `sspush` can happen
+ * implicitly on current shadow stack pointed to by CSR_SSP. `ssamoswap` takes pointer to
+ * shadow stack. To keep it simple, we plan to use `ssamoswap` to perform writes on shadow
+ * stack.
+ */
+static noinline unsigned long amo_user_shstk(unsigned long *addr, unsigned long val)
+{
+	/*
+	 * Since shadow stack is supported only in 64bit configuration,
+	 * ssamoswap.d is used below. CONFIG_RISCV_USER_CFI is dependent
+	 * on 64BIT and compile of this file is dependent on CONFIG_RISCV_USER_CFI
+	 * In case ssamoswap faults, return -1.
+	 * Never expect -1 on shadow stack. Expect return addresses and zero
+	 */
+	unsigned long swap = -1;
+
+	__enable_user_access();
+	asm goto(
+		".option push\n"
+		".option arch, +zicfiss\n"
+		"1: ssamoswap.d %[swap], %[val], %[addr]\n"
+		_ASM_EXTABLE(1b, %l[fault])
+		RISCV_ACQUIRE_BARRIER
+		".option pop\n"
+		: [swap] "=r" (swap), [addr] "+A" (*addr)
+		: [val] "r" (val)
+		: "memory"
+		: fault
+		);
+	__disable_user_access();
+	return swap;
+fault:
+	__disable_user_access();
+	return -1;
+}
+
+/*
+ * Create a restore token on the shadow stack. A token is always XLEN wide
+ * and aligned to XLEN.
+ */
+static int create_rstor_token(unsigned long ssp, unsigned long *token_addr)
+{
+	unsigned long addr;
+
+	/* Token must be aligned */
+	if (!IS_ALIGNED(ssp, SHSTK_ENTRY_SIZE))
+		return -EINVAL;
+
+	/* On RISC-V we're constructing token to be function of address itself */
+	addr = ssp - SHSTK_ENTRY_SIZE;
+
+	if (amo_user_shstk((unsigned long __user *)addr, (unsigned long) ssp) == -1)
+		return -EFAULT;
+
+	if (token_addr)
+		*token_addr = addr;
+
+	return 0;
+}
+
+static unsigned long allocate_shadow_stack(unsigned long addr, unsigned long size,
+					   unsigned long token_offset,
+					   bool set_tok)
+{
+	int flags = MAP_ANONYMOUS | MAP_PRIVATE;
+	struct mm_struct *mm = current->mm;
+	unsigned long populate, tok_loc = 0;
+
+	if (addr)
+		flags |= MAP_FIXED_NOREPLACE;
+
+	mmap_write_lock(mm);
+	addr = do_mmap(NULL, addr, size, PROT_READ, flags,
+		       VM_SHADOW_STACK | VM_WRITE, 0, &populate, NULL);
+	mmap_write_unlock(mm);
+
+	if (!set_tok || IS_ERR_VALUE(addr))
+		goto out;
+
+	if (create_rstor_token(addr + token_offset, &tok_loc)) {
+		vm_munmap(addr, size);
+		return -EINVAL;
+	}
+
+	addr = tok_loc;
+
+out:
+	return addr;
+}
+
+SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags)
+{
+	bool set_tok = flags & SHADOW_STACK_SET_TOKEN;
+	unsigned long aligned_size = 0;
+
+	if (!cpu_supports_shadow_stack())
+		return -EOPNOTSUPP;
+
+	/* Anything other than set token should result in invalid param */
+	if (flags & ~SHADOW_STACK_SET_TOKEN)
+		return -EINVAL;
+
+	/*
+	 * Unlike other architectures, on RISC-V, SSP pointer is held in CSR_SSP and is available
+	 * CSR in all modes. CSR accesses are performed using 12bit index programmed in instruction
+	 * itself. This provides static property on register programming and writes to CSR can't
+	 * be unintentional from programmer's perspective. As long as programmer has guarded areas
+	 * which perform writes to CSR_SSP properly, shadow stack pivoting is not possible. Since
+	 * CSR_SSP is writeable by user mode, it itself can setup a shadow stack token subsequent
+	 * to allocation. Although in order to provide portablity with other architecture (because
+	 * `map_shadow_stack` is arch agnostic syscall), RISC-V will follow expectation of a token
+	 * flag in flags and if provided in flags, setup a token at the base.
+	 */
+
+	/* If there isn't space for a token */
+	if (set_tok && size < SHSTK_ENTRY_SIZE)
+		return -ENOSPC;
+
+	if (addr && (addr % PAGE_SIZE))
+		return -EINVAL;
+
+	aligned_size = PAGE_ALIGN(size);
+	if (aligned_size < size)
+		return -EOVERFLOW;
+
+	return allocate_shadow_stack(addr, aligned_size, size, set_tok);
+}
diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
index 57e8195d0b53..0c0ac6214de6 100644
--- a/include/uapi/asm-generic/mman.h
+++ b/include/uapi/asm-generic/mman.h
@@ -19,4 +19,5 @@
 #define MCL_FUTURE 2 /* lock all future mappings */
 #define MCL_ONFAULT 4 /* lock all pages that are faulted in */

+#define SHADOW_STACK_SET_TOKEN (1ULL << 0) /* Set up a restore token in the shadow stack */
 #endif /* __ASM_GENERIC_MMAN_H */
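For readers following along, the argument checks and the token placement done by this patch can be modelled in plain userspace C. This is only a sketch of my reading of the code above, not the kernel implementation: the MODEL_* constants and shstk_* helper names are hypothetical, a 4 KiB page size and 8-byte entries (64-bit) are assumed, and the alignment test uses the mask form suggested in review.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE  UINT64_C(4096)   /* assumption: 4 KiB pages */
#define MODEL_ENTRY_SIZE UINT64_C(8)      /* assumption: XLEN = 64 */
#define MODEL_SET_TOKEN  (UINT64_C(1) << 0) /* mirrors SHADOW_STACK_SET_TOKEN */

static_assert((MODEL_PAGE_SIZE & (MODEL_PAGE_SIZE - 1)) == 0,
	      "the mask-based checks need a power-of-two page size");

/* Round size up to a page multiple; wraps around on overflow. */
static uint64_t shstk_aligned_size(uint64_t size)
{
	return (size + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1);
}

/* Same order of checks as the syscall body (minus the CPU feature check). */
static int shstk_check_args(uint64_t addr, uint64_t size, uint64_t flags)
{
	if (flags & ~MODEL_SET_TOKEN)
		return -EINVAL;
	if ((flags & MODEL_SET_TOKEN) && size < MODEL_ENTRY_SIZE)
		return -ENOSPC;
	if (addr && (addr & (MODEL_PAGE_SIZE - 1)))
		return -EINVAL;
	if (shstk_aligned_size(size) < size)
		return -EOVERFLOW;
	return 0;
}

/* The token value is the shadow stack pointer just past the requested size. */
static uint64_t shstk_token_value(uint64_t base, uint64_t size)
{
	return base + size;
}

/* The token is stored in the entry right below that pointer. */
static uint64_t shstk_token_addr(uint64_t base, uint64_t size)
{
	return shstk_token_value(base, size) - MODEL_ENTRY_SIZE;
}
```

Note that the token offset is derived from the caller's (unaligned) size, not from the page-aligned mapping size, so a size that is not a multiple of the entry size fails the alignment check in create_rstor_token().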
On 04/04/2024 01:35, Deepak Gupta wrote:
As discussed extensively in the changelog for the addition of this syscall on x86 ("x86/shstk: Introduce map_shadow_stack syscall") the existing mmap() and madvise() syscalls do not map entirely well onto the security requirements for shadow stack memory since they lead to windows where memory is allocated but not yet protected or stacks which are not properly and safely initialised. Instead a new syscall map_shadow_stack() has been defined which allocates and initialises a shadow stack page.
This patch implements this syscall for riscv. riscv doesn't require the token to be set up by the kernel because user mode can do that by itself. However, to provide compatibility and portability with other architectures, user mode can specify the token set flag.
Signed-off-by: Deepak Gupta debug@rivosinc.com
arch/riscv/kernel/Makefile | 2 + arch/riscv/kernel/usercfi.c | 149 ++++++++++++++++++++++++++++++++ include/uapi/asm-generic/mman.h | 1 + 3 files changed, 152 insertions(+) create mode 100644 arch/riscv/kernel/usercfi.c
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index 604d6bf7e476..3bec82f4e94c 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -107,3 +107,5 @@ obj-$(CONFIG_COMPAT) += compat_vdso/ obj-$(CONFIG_64BIT) += pi/ obj-$(CONFIG_ACPI) += acpi.o
+obj-$(CONFIG_RISCV_USER_CFI) += usercfi.o diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c new file mode 100644 index 000000000000..c4ed0d4e33d6 --- /dev/null +++ b/arch/riscv/kernel/usercfi.c @@ -0,0 +1,149 @@ +// SPDX-License-Identifier: GPL-2.0 +/*
- Copyright (C) 2024 Rivos, Inc.
- Deepak Gupta debug@rivosinc.com
- */
+#include <linux/sched.h> +#include <linux/bitops.h> +#include <linux/types.h> +#include <linux/mm.h> +#include <linux/mman.h> +#include <linux/uaccess.h> +#include <linux/sizes.h> +#include <linux/user.h> +#include <linux/syscalls.h> +#include <linux/prctl.h> +#include <asm/csr.h> +#include <asm/usercfi.h>
+#define SHSTK_ENTRY_SIZE sizeof(void *)
+/*
- Writes on shadow stack can either be `sspush` or `ssamoswap`. `sspush` can happen
- implicitly on current shadow stack pointed to by CSR_SSP. `ssamoswap` takes pointer to
- shadow stack. To keep it simple, we plan to use `ssamoswap` to perform writes on shadow
- stack.
- */
+static noinline unsigned long amo_user_shstk(unsigned long *addr, unsigned long val) +{
- /*
* Since shadow stack is supported only in 64bit configuration,
* ssamoswap.d is used below.
* CONFIG_RISCV_USER_CFI is dependent
* on 64BIT and compile of this file is dependent on CONFIG_RISCV_USER_CFI
* In case ssamoswap faults, return -1.
To me, this part of the comment is not needed.
* Never expect -1 on shadow stack. Expect return addresses and zero
In that case, should we BUG() instead?
*/
- unsigned long swap = -1;
- __enable_user_access();
- asm goto(
".option push\n"
".option arch, +zicfiss\n"
"1: ssamoswap.d %[swap], %[val], %[addr]\n"
_ASM_EXTABLE(1b, %l[fault])
RISCV_ACQUIRE_BARRIER
".option pop\n"
: [swap] "=r" (swap), [addr] "+A" (*addr)
: [val] "r" (val)
: "memory"
: fault
);
- __disable_user_access();
- return swap;
+fault:
- __disable_user_access();
- return -1;
+}
+/*
- Create a restore token on the shadow stack. A token is always XLEN wide
- and aligned to XLEN.
- */
+static int create_rstor_token(unsigned long ssp, unsigned long *token_addr) +{
- unsigned long addr;
- /* Token must be aligned */
- if (!IS_ALIGNED(ssp, SHSTK_ENTRY_SIZE))
return -EINVAL;
- /* On RISC-V we're constructing token to be function of address itself */
- addr = ssp - SHSTK_ENTRY_SIZE;
- if (amo_user_shstk((unsigned long __user *)addr, (unsigned long) ssp) == -1)
return -EFAULT;
- if (token_addr)
*token_addr = addr;
- return 0;
+}
+static unsigned long allocate_shadow_stack(unsigned long addr, unsigned long size,
unsigned long token_offset,
bool set_tok)
+{
- int flags = MAP_ANONYMOUS | MAP_PRIVATE;
- struct mm_struct *mm = current->mm;
- unsigned long populate, tok_loc = 0;
- if (addr)
flags |= MAP_FIXED_NOREPLACE;
- mmap_write_lock(mm);
- addr = do_mmap(NULL, addr, size, PROT_READ, flags,
Hmmm why do you map the shadow stack as PROT_READ here?
VM_SHADOW_STACK | VM_WRITE, 0, &populate, NULL);
- mmap_write_unlock(mm);
- if (!set_tok || IS_ERR_VALUE(addr))
goto out;
- if (create_rstor_token(addr + token_offset, &tok_loc)) {
vm_munmap(addr, size);
return -EINVAL;
- }
- addr = tok_loc;
+out:
- return addr;
+}
+SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags) +{
- bool set_tok = flags & SHADOW_STACK_SET_TOKEN;
- unsigned long aligned_size = 0;
- if (!cpu_supports_shadow_stack())
return -EOPNOTSUPP;
- /* Anything other than set token should result in invalid param */
- if (flags & ~SHADOW_STACK_SET_TOKEN)
return -EINVAL;
- /*
* Unlike other architectures, on RISC-V, SSP pointer is held in CSR_SSP and is available
* CSR in all modes. CSR accesses are performed using 12bit index programmed in instruction
* itself. This provides static property on register programming and writes to CSR can't
* be unintentional from programmer's perspective. As long as programmer has guarded areas
* which perform writes to CSR_SSP properly, shadow stack pivoting is not possible. Since
* CSR_SSP is writeable by user mode, it itself can setup a shadow stack token subsequent
* to allocation. Although in order to provide portablity with other architecture (because
* `map_shadow_stack` is arch agnostic syscall), RISC-V will follow expectation of a token
* flag in flags and if provided in flags, setup a token at the base.
*/
- /* If there isn't space for a token */
- if (set_tok && size < SHSTK_ENTRY_SIZE)
return -ENOSPC;
- if (addr && (addr % PAGE_SIZE))
I would use:
if (addr && (addr & (PAGE_SIZE - 1)))
return -EINVAL;
- aligned_size = PAGE_ALIGN(size);
- if (aligned_size < size)
return -EOVERFLOW;
- return allocate_shadow_stack(addr, aligned_size, size, set_tok);
+} diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h index 57e8195d0b53..0c0ac6214de6 100644 --- a/include/uapi/asm-generic/mman.h +++ b/include/uapi/asm-generic/mman.h @@ -19,4 +19,5 @@ #define MCL_FUTURE 2 /* lock all future mappings */ #define MCL_ONFAULT 4 /* lock all pages that are faulted in */ +#define SHADOW_STACK_SET_TOKEN (1ULL << 0) /* Set up a restore token in the shadow stack */ #endif /* __ASM_GENERIC_MMAN_H */
Don't we need to advertise this new syscall to the man pages?
On Sun, May 12, 2024 at 06:50:18PM +0200, Alexandre Ghiti wrote:
On 04/04/2024 01:35, Deepak Gupta wrote:
As discussed extensively in the changelog for the addition of this syscall on x86 ("x86/shstk: Introduce map_shadow_stack syscall") the existing mmap() and madvise() syscalls do not map entirely well onto the security requirements for shadow stack memory since they lead to windows where memory is allocated but not yet protected or stacks which are not properly and safely initialised. Instead a new syscall map_shadow_stack() has been defined which allocates and initialises a shadow stack page.
This patch implements this syscall for riscv. riscv doesn't require the token to be set up by the kernel because user mode can do that by itself. However, to provide compatibility and portability with other architectures, user mode can specify the token set flag.
Signed-off-by: Deepak Gupta debug@rivosinc.com
arch/riscv/kernel/Makefile | 2 + arch/riscv/kernel/usercfi.c | 149 ++++++++++++++++++++++++++++++++ include/uapi/asm-generic/mman.h | 1 + 3 files changed, 152 insertions(+) create mode 100644 arch/riscv/kernel/usercfi.c
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index 604d6bf7e476..3bec82f4e94c 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -107,3 +107,5 @@ obj-$(CONFIG_COMPAT) += compat_vdso/ obj-$(CONFIG_64BIT) += pi/ obj-$(CONFIG_ACPI) += acpi.o
+obj-$(CONFIG_RISCV_USER_CFI) += usercfi.o diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c new file mode 100644 index 000000000000..c4ed0d4e33d6 --- /dev/null +++ b/arch/riscv/kernel/usercfi.c @@ -0,0 +1,149 @@ +// SPDX-License-Identifier: GPL-2.0 +/*
- Copyright (C) 2024 Rivos, Inc.
- Deepak Gupta debug@rivosinc.com
- */
+#include <linux/sched.h> +#include <linux/bitops.h> +#include <linux/types.h> +#include <linux/mm.h> +#include <linux/mman.h> +#include <linux/uaccess.h> +#include <linux/sizes.h> +#include <linux/user.h> +#include <linux/syscalls.h> +#include <linux/prctl.h> +#include <asm/csr.h> +#include <asm/usercfi.h>
+#define SHSTK_ENTRY_SIZE sizeof(void *)
+/*
- Writes on shadow stack can either be `sspush` or `ssamoswap`. `sspush` can happen
- implicitly on current shadow stack pointed to by CSR_SSP. `ssamoswap` takes pointer to
- shadow stack. To keep it simple, we plan to use `ssamoswap` to perform writes on shadow
- stack.
- */
+static noinline unsigned long amo_user_shstk(unsigned long *addr, unsigned long val) +{
- /*
* Since shadow stack is supported only in 64bit configuration,
* ssamoswap.d is used below.
* CONFIG_RISCV_USER_CFI is dependent
* on 64BIT and compile of this file is dependent on CONFIG_RISCV_USER_CFI
* In case ssamoswap faults, return -1.
To me, this part of the comment is not needed.
Ok, will remove it.
* Never expect -1 on shadow stack. Expect return addresses and zero
In that case, should we BUG() instead?
The caller of `amo_user_shstk` (create_rstor_token) returns -EFAULT. That will translate to signal (SIGSEGV) delivery to the user app, or to its termination.
*/
- unsigned long swap = -1;
- __enable_user_access();
- asm goto(
".option push\n"
".option arch, +zicfiss\n"
"1: ssamoswap.d %[swap], %[val], %[addr]\n"
_ASM_EXTABLE(1b, %l[fault])
RISCV_ACQUIRE_BARRIER
".option pop\n"
: [swap] "=r" (swap), [addr] "+A" (*addr)
: [val] "r" (val)
: "memory"
: fault
);
- __disable_user_access();
- return swap;
+fault:
- __disable_user_access();
- return -1;
+}
+/*
- Create a restore token on the shadow stack. A token is always XLEN wide
- and aligned to XLEN.
- */
+static int create_rstor_token(unsigned long ssp, unsigned long *token_addr) +{
- unsigned long addr;
- /* Token must be aligned */
- if (!IS_ALIGNED(ssp, SHSTK_ENTRY_SIZE))
return -EINVAL;
- /* On RISC-V we're constructing token to be function of address itself */
- addr = ssp - SHSTK_ENTRY_SIZE;
- if (amo_user_shstk((unsigned long __user *)addr, (unsigned long) ssp) == -1)
return -EFAULT;
- if (token_addr)
*token_addr = addr;
- return 0;
+}
+static unsigned long allocate_shadow_stack(unsigned long addr, unsigned long size,
unsigned long token_offset,
bool set_tok)
+{
- int flags = MAP_ANONYMOUS | MAP_PRIVATE;
- struct mm_struct *mm = current->mm;
- unsigned long populate, tok_loc = 0;
- if (addr)
flags |= MAP_FIXED_NOREPLACE;
- mmap_write_lock(mm);
- addr = do_mmap(NULL, addr, size, PROT_READ, flags,
Hmmm why do you map the shadow stack as PROT_READ here?
I believe it's redundant here. I followed what x86 did for their shadow stack creation. The GCS (arm shadow stack) patches also do the same thing. Collectively, we think that at some point in the future many of these flows will become generic (arch agnostic).
VM_SHADOW_STACK | VM_WRITE, 0, &populate, NULL);
- mmap_write_unlock(mm);
- if (!set_tok || IS_ERR_VALUE(addr))
goto out;
- if (create_rstor_token(addr + token_offset, &tok_loc)) {
vm_munmap(addr, size);
return -EINVAL;
- }
- addr = tok_loc;
+out:
- return addr;
+}
+SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags) +{
- bool set_tok = flags & SHADOW_STACK_SET_TOKEN;
- unsigned long aligned_size = 0;
- if (!cpu_supports_shadow_stack())
return -EOPNOTSUPP;
- /* Anything other than set token should result in invalid param */
- if (flags & ~SHADOW_STACK_SET_TOKEN)
return -EINVAL;
- /*
* Unlike other architectures, on RISC-V, SSP pointer is held in CSR_SSP and is available
* CSR in all modes. CSR accesses are performed using 12bit index programmed in instruction
* itself. This provides static property on register programming and writes to CSR can't
* be unintentional from programmer's perspective. As long as programmer has guarded areas
* which perform writes to CSR_SSP properly, shadow stack pivoting is not possible. Since
* CSR_SSP is writeable by user mode, it itself can setup a shadow stack token subsequent
* to allocation. Although in order to provide portablity with other architecture (because
* `map_shadow_stack` is arch agnostic syscall), RISC-V will follow expectation of a token
* flag in flags and if provided in flags, setup a token at the base.
*/
- /* If there isn't space for a token */
- if (set_tok && size < SHSTK_ENTRY_SIZE)
return -ENOSPC;
- if (addr && (addr % PAGE_SIZE))
I would use:
if (addr && (addr & (PAGE_SIZE - 1)))
noted.
return -EINVAL;
- aligned_size = PAGE_ALIGN(size);
- if (aligned_size < size)
return -EOVERFLOW;
- return allocate_shadow_stack(addr, aligned_size, size, set_tok);
+} diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h index 57e8195d0b53..0c0ac6214de6 100644 --- a/include/uapi/asm-generic/mman.h +++ b/include/uapi/asm-generic/mman.h @@ -19,4 +19,5 @@ #define MCL_FUTURE 2 /* lock all future mappings */ #define MCL_ONFAULT 4 /* lock all pages that are faulted in */ +#define SHADOW_STACK_SET_TOKEN (1ULL << 0) /* Set up a restore token in the shadow stack */ #endif /* __ASM_GENERIC_MMAN_H */
Don't we need to advertise this new syscall to the man pages?
`map_shadow_stack` is already in mainline as part of the x86 work. I am assuming there is a man page for it. I'll check to be sure and confirm here.
Userspace specifies CLONE_VM to share address space and spawn a new thread. `clone` allows userspace to specify a new stack for the new thread. However there is no way to specify a new shadow stack base address without changing the API. This patch allocates a new shadow stack whenever CLONE_VM is given.
In case of CLONE_VFORK, the parent is suspended until the child finishes, and thus the child can use the parent's shadow stack. In case of !CLONE_VM, COW kicks in because the entire address space is copied from parent to child.
`clone3` is extensible and can provide a mechanism through which a shadow stack can be passed as an input parameter. This is not settled yet and is being extensively discussed on the mailing list. Once that's settled, this commit will adapt to it.
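The three cases described above can be summarised as a small decision function. This is a userspace sketch of my reading of the commit message, not the code in this patch: the enum and function names are hypothetical, the two flag values are copied from the Linux uapi headers, and checking the vfork case first is an assumption based on how the x86 shadow stack code orders the same checks.

```c
#include <assert.h>

/* Values as in include/uapi/linux/sched.h; redefined to stay self-contained. */
#define MODEL_CLONE_VM    0x00000100UL
#define MODEL_CLONE_VFORK 0x00004000UL

enum shstk_action {
	SHSTK_SHARE_PARENT, /* vfork: parent is suspended, child reuses its stack */
	SHSTK_COW_COPY,     /* fork: address space is copied, COW handles it */
	SHSTK_ALLOC_NEW,    /* thread: shared address space, needs its own stack */
};

static enum shstk_action shstk_clone_action(unsigned long clone_flags)
{
	if (clone_flags & MODEL_CLONE_VFORK)
		return SHSTK_SHARE_PARENT;

	if (!(clone_flags & MODEL_CLONE_VM))
		return SHSTK_COW_COPY;

	return SHSTK_ALLOC_NEW;
}
```

Only the last case needs a fresh allocation; the other two rely on the parent's shadow stack, either directly or through copy on write.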
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/asm/usercfi.h |  39 ++++++++++
 arch/riscv/kernel/process.c      |  12 ++-
 arch/riscv/kernel/usercfi.c      | 121 +++++++++++++++++++++++++++++++
 3 files changed, 171 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h
index 4fa201b4fc4e..b47574a7a8c9 100644
--- a/arch/riscv/include/asm/usercfi.h
+++ b/arch/riscv/include/asm/usercfi.h
@@ -8,6 +8,9 @@
 #ifndef __ASSEMBLY__
 #include <linux/types.h>

+struct task_struct;
+struct kernel_clone_args;
+
 #ifdef CONFIG_RISCV_USER_CFI
 struct cfi_status {
 	unsigned long ubcfi_en : 1; /* Enable for backward cfi. */
@@ -17,6 +20,42 @@ struct cfi_status {
 	unsigned long shdw_stk_size; /* size of shadow stack */
 };

+unsigned long shstk_alloc_thread_stack(struct task_struct *tsk,
+				       const struct kernel_clone_args *args);
+void shstk_release(struct task_struct *tsk);
+void set_shstk_base(struct task_struct *task, unsigned long shstk_addr, unsigned long size);
+void set_active_shstk(struct task_struct *task, unsigned long shstk_addr);
+bool is_shstk_enabled(struct task_struct *task);
+
+#else
+
+static inline unsigned long shstk_alloc_thread_stack(struct task_struct *tsk,
+						     const struct kernel_clone_args *args)
+{
+	return 0;
+}
+
+static inline void shstk_release(struct task_struct *tsk)
+{
+
+}
+
+static inline void set_shstk_base(struct task_struct *task, unsigned long shstk_addr,
+				  unsigned long size)
+{
+
+}
+
+static inline void set_active_shstk(struct task_struct *task, unsigned long shstk_addr)
+{
+
+}
+
+static inline bool is_shstk_enabled(struct task_struct *task)
+{
+	return false;
+}
+
 #endif /* CONFIG_RISCV_USER_CFI */

 #endif /* __ASSEMBLY__ */
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index ce577cdc2af3..ef48a25b0eff 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -26,6 +26,7 @@
 #include <asm/cpuidle.h>
 #include <asm/vector.h>
 #include <asm/cpufeature.h>
+#include <asm/usercfi.h>

 register unsigned long gp_in_global __asm__("gp");

@@ -202,7 +203,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)

 void exit_thread(struct task_struct *tsk)
 {
-
+	if (IS_ENABLED(CONFIG_RISCV_USER_CFI))
+		shstk_release(tsk);
 }

 int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
@@ -210,6 +212,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	unsigned long clone_flags = args->flags;
 	unsigned long usp = args->stack;
 	unsigned long tls = args->tls;
+	unsigned long ssp = 0;
 	struct pt_regs *childregs = task_pt_regs(p);

 	memset(&p->thread.s, 0, sizeof(p->thread.s));
@@ -225,11 +228,18 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 		p->thread.s[0] = (unsigned long)args->fn;
 		p->thread.s[1] = (unsigned long)args->fn_arg;
 	} else {
+		/* allocate new shadow stack if needed. In case of CLONE_VM we have to */
+		ssp = shstk_alloc_thread_stack(p, args);
+		if (IS_ERR_VALUE(ssp))
+			return PTR_ERR((void *)ssp);
+
 		*childregs = *(current_pt_regs());
 		/* Turn off status.VS */
 		riscv_v_vstate_off(childregs);
 		if (usp) /* User fork */
 			childregs->sp = usp;
+		if (ssp) /* if needed, set new ssp */
+			set_active_shstk(p, ssp);
 		if (clone_flags & CLONE_SETTLS)
 			childregs->tp = tls;
 		childregs->a0 = 0; /* Return value of fork() */
diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c
index c4ed0d4e33d6..11ef7ab925c9 100644
--- a/arch/riscv/kernel/usercfi.c
+++ b/arch/riscv/kernel/usercfi.c
@@ -19,6 +19,41 @@

 #define SHSTK_ENTRY_SIZE sizeof(void *)

+bool is_shstk_enabled(struct task_struct *task)
+{
+	return task->thread_info.user_cfi_state.ubcfi_en ? true : false;
+}
+
+void set_shstk_base(struct task_struct *task, unsigned long shstk_addr, unsigned long size)
+{
+	task->thread_info.user_cfi_state.shdw_stk_base = shstk_addr;
+	task->thread_info.user_cfi_state.shdw_stk_size = size;
+}
+
+unsigned long get_shstk_base(struct task_struct *task, unsigned long *size)
+{
+	if (size)
+		*size = task->thread_info.user_cfi_state.shdw_stk_size;
+	return task->thread_info.user_cfi_state.shdw_stk_base;
+}
+
+void set_active_shstk(struct task_struct *task, unsigned long shstk_addr)
+{
+	task->thread_info.user_cfi_state.user_shdw_stk = shstk_addr;
+}
+
+/*
+ * If size is 0, then to be compatible with regular stack we want it to be as big as
+ * regular stack. Else PAGE_ALIGN it and return back
+ */
+static unsigned long calc_shstk_size(unsigned long size)
+{
+	if (size)
+		return PAGE_ALIGN(size);
+
+	return PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G));
+}
+
 /*
  * Writes on shadow stack can either be `sspush` or `ssamoswap`. `sspush` can happen
  * implicitly on current shadow stack pointed to by CSR_SSP. `ssamoswap` takes pointer to
@@ -147,3 +182,89 @@ SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsi

 	return allocate_shadow_stack(addr, aligned_size, size, set_tok);
 }
+
+/*
+ * This gets called during clone/clone3/fork. And is needed to allocate a shadow stack for
+ * cases where CLONE_VM is specified and thus a different stack is specified by user. We
+ * thus need a separate shadow stack too. How does separate shadow stack is specified by
+ * user is still being debated. Once that's settled, remove this part of the comment.
+ * This function simply returns 0 if shadow stack are not supported or if separate shadow
+ * stack allocation is not needed (like in case of !CLONE_VM)
+ */
+unsigned long shstk_alloc_thread_stack(struct task_struct *tsk,
+				       const struct kernel_clone_args *args)
+{
+	unsigned long addr, size;
+
+	/* If shadow stack is not supported, return 0 */
+	if (!cpu_supports_shadow_stack())
+		return 0;
+
+	/*
+	 * If shadow stack is not enabled on the new thread, skip any
+	 * switch to a new shadow stack.
+	 */
+	if (!is_shstk_enabled(tsk))
+		return 0;
+
+	/*
+	 * For CLONE_VFORK the child will share the parents shadow stack.
+	 * Set base = 0 and size = 0, this is special means to track this state
+	 * so the freeing logic run for child knows to leave it alone.
+	 */
+	if (args->flags & CLONE_VFORK) {
+		set_shstk_base(tsk, 0, 0);
+		return 0;
+	}
+
+	/*
+	 * For !CLONE_VM the child will use a copy of the parents shadow
+	 * stack.
+	 */
+	if (!(args->flags & CLONE_VM))
+		return 0;
+
+	/*
+	 * reaching here means, CLONE_VM was specified and thus a separate shadow
+	 * stack is needed for new cloned thread. Note: below allocation is happening
+	 * using current mm.
+	 */
+	size = calc_shstk_size(args->stack_size);
+	addr = allocate_shadow_stack(0, size, 0, false);
+	if (IS_ERR_VALUE(addr))
+		return addr;
+
+	set_shstk_base(tsk, addr, size);
+
+	return addr + size;
+}
+
+void shstk_release(struct task_struct *tsk)
+{
+	unsigned long base = 0, size = 0;
+	/* If shadow stack is not supported or not enabled, nothing to release */
+	if (!cpu_supports_shadow_stack() ||
+	    !is_shstk_enabled(tsk))
+		return;
+
+	/*
+	 * When fork() with CLONE_VM fails, the child (tsk) already has a
+	 * shadow stack allocated, and exit_thread() calls this function to
+	 * free it. In this case the parent (current) and the child share
+	 * the same mm struct. Move forward only when they're same.
+	 */
+	if (!tsk->mm || tsk->mm != current->mm)
+		return;
+
+	/*
+	 * We know shadow stack is enabled but if base is NULL, then
+	 * this task is not managing its own shadow stack (CLONE_VFORK). So
+	 * skip freeing it.
+	 */
+	base = get_shstk_base(tsk, &size);
+	if (!base)
+		return;
+
+	vm_munmap(base, size);
+	set_shstk_base(tsk, 0, 0);
+}
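
To make the clone-time policy easier to review, here is a small userspace model of the decision made in shstk_alloc_thread_stack() and of the default sizing in calc_shstk_size(). It is a sketch only: the `sketch_*` names, the enum and the 4 KiB page size are made up for illustration, the flag values mirror CLONE_VM and CLONE_VFORK from the uapi headers, and the "not enabled" case follows the intent stated in the code comments (skip when shadow stack is not enabled for the task).

```c
#include <stdbool.h>

/* Values mirror CLONE_VM / CLONE_VFORK from the Linux uapi headers. */
#define SKETCH_CLONE_VM    0x00000100UL
#define SKETCH_CLONE_VFORK 0x00004000UL

#define SKETCH_PAGE_SIZE 4096ULL        /* assumption: 4 KiB pages */
#define SKETCH_SZ_4G     (4ULL << 30)

/* Hypothetical names for the four outcomes described in the comments. */
enum sketch_shstk_action {
	SKETCH_SHSTK_NONE,          /* unsupported, or not enabled for the task */
	SKETCH_SHSTK_SHARE_PARENT,  /* CLONE_VFORK: child runs on parent's stack */
	SKETCH_SHSTK_COW_COPY,      /* !CLONE_VM: address space is copied */
	SKETCH_SHSTK_ALLOC_NEW,     /* CLONE_VM: new thread needs its own */
};

static enum sketch_shstk_action
sketch_clone_action(bool cpu_has_shstk, bool task_enabled, unsigned long flags)
{
	if (!cpu_has_shstk || !task_enabled)
		return SKETCH_SHSTK_NONE;

	/* CLONE_VFORK is tested first, as in the patch */
	if (flags & SKETCH_CLONE_VFORK)
		return SKETCH_SHSTK_SHARE_PARENT;

	if (!(flags & SKETCH_CLONE_VM))
		return SKETCH_SHSTK_COW_COPY;

	return SKETCH_SHSTK_ALLOC_NEW;
}

/*
 * Size policy: use the requested size if given, otherwise the stack
 * rlimit capped at 4G, and page-align the result in both cases.
 */
static unsigned long long
sketch_shstk_size(unsigned long long requested, unsigned long long stack_rlimit)
{
	unsigned long long size = requested;

	if (!size)
		size = stack_rlimit < SKETCH_SZ_4G ? stack_rlimit : SKETCH_SZ_4G;

	return (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}
```

Note that a caller passing both CLONE_VM and CLONE_VFORK ends up sharing the parent's shadow stack, because the CLONE_VFORK test comes first.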
On 04/04/2024 01:35, Deepak Gupta wrote:
Userspace specifies VM_CLONE to share address space and spawn new thread.
CLONE_VM?
`clone` allow userspace to specify a new stack for new thread. However there is no way to specify new shadow stack base address without changing API. This patch allocates a new shadow stack whenever VM_CLONE is given.
In case of VM_FORK, parent is suspended until child finishes and thus can
You mean CLONE_VFORK here right?
child use parent shadow stack. In case of !VM_CLONE, COW kicks in because entire address space is copied from parent to child.
`clone3` is extensible and can provide mechanisms using which shadow stack as an input parameter can be provided. This is not settled yet and being extensively discussed on mailing list. Once that's settled, this commit will adapt to that.
Signed-off-by: Deepak Gupta debug@rivosinc.com
On Sun, May 12, 2024 at 07:05:27PM +0200, Alexandre Ghiti wrote:
On 04/04/2024 01:35, Deepak Gupta wrote:
Userspace specifies VM_CLONE to share address space and spawn new thread.
CLONE_VM?
Yes I meant CLONE_VM, will fix it.
`clone` allow userspace to specify a new stack for new thread. However there is no way to specify new shadow stack base address without changing API. This patch allocates a new shadow stack whenever VM_CLONE is given.
In case of VM_FORK, parent is suspended until child finishes and thus can
You mean CLONE_VFORK here right?
Yes I meant CLONE_VFORK, will fix it.
child use parent shadow stack. In case of !VM_CLONE, COW kicks in because entire address space is copied from parent to child.
`clone3` is extensible and can provide mechanisms using which shadow stack as an input parameter can be provided. This is not settled yet and being extensively discussed on mailing list. Once that's settled, this commit will adapt to that.
Signed-off-by: Deepak Gupta debug@rivosinc.com
From: Mark Brown <broonie@kernel.org>
Three architectures (x86, aarch64, riscv) have announced support for shadow stacks with fairly similar functionality. While x86 is using arch_prctl() to control the functionality, neither arm64 nor riscv uses that interface. This patch therefore adds arch-agnostic prctl() support to get and set the status of shadow stacks and to lock the current configuration against further changes, with support for turning individual subfeatures on and off so applications can limit their exposure to features that they do not need. The features are:
- PR_SHADOW_STACK_ENABLE: Tracking and enforcement of shadow stacks, including allocation of a shadow stack if one is not already allocated.
- PR_SHADOW_STACK_WRITE: Writes to specific addresses in the shadow stack.
- PR_SHADOW_STACK_PUSH: Push additional values onto the shadow stack.
- PR_SHADOW_STACK_DISABLE: Allow to disable shadow stack. Note once locked, disable must fail.
These features are expected to be inherited by new threads and cleared on exec(). Unknown features should be rejected for enable but accepted for locking (in order to allow for future proofing).
This is based on a patch originally written by Deepak Gupta but later modified by Mark Brown for arm's GCS patch series.
Signed-off-by: Mark Brown <broonie@kernel.org>
Co-developed-by: Deepak Gupta <debug@rivosinc.com>
---
 include/linux/mm.h         |  3 +++
 include/uapi/linux/prctl.h | 22 ++++++++++++++++++++++
 kernel/sys.c               | 30 ++++++++++++++++++++++++++++++
 3 files changed, 55 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9952937be659..1d08e1fd2f6a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4201,5 +4201,8 @@ static inline bool pfn_is_unaccepted_memory(unsigned long pfn)

 	return range_contains_unaccepted_memory(paddr, paddr + PAGE_SIZE);
 }
+int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status);
+int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status);
+int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);

 #endif /* _LINUX_MM_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 370ed14b1ae0..3c66ed8f46d8 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -306,4 +306,26 @@ struct prctl_mm_map {
 # define PR_RISCV_V_VSTATE_CTRL_NEXT_MASK 0xc
 # define PR_RISCV_V_VSTATE_CTRL_MASK 0x1f

+/*
+ * Get the current shadow stack configuration for the current thread,
+ * this will be the value configured via PR_SET_SHADOW_STACK_STATUS.
+ */
+#define PR_GET_SHADOW_STACK_STATUS 71
+
+/*
+ * Set the current shadow stack configuration. Enabling the shadow
+ * stack will cause a shadow stack to be allocated for the thread.
+ */
+#define PR_SET_SHADOW_STACK_STATUS 72
+# define PR_SHADOW_STACK_ENABLE (1UL << 0)
+# define PR_SHADOW_STACK_WRITE (1UL << 1)
+# define PR_SHADOW_STACK_PUSH (1UL << 2)
+
+/*
+ * Prevent further changes to the specified shadow stack
+ * configuration. All bits may be locked via this call, including
+ * undefined bits.
+ */
+#define PR_LOCK_SHADOW_STACK_STATUS 73
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index f8e543f1e38a..242e9f147791 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2315,6 +2315,21 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
 	return -EINVAL;
 }

+int __weak arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status)
+{
+	return -EINVAL;
+}
+
+int __weak arch_set_shadow_stack_status(struct task_struct *t, unsigned long status)
+{
+	return -EINVAL;
+}
+
+int __weak arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status)
+{
+	return -EINVAL;
+}
+
 #define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)

 #ifdef CONFIG_ANON_VMA_NAME
@@ -2757,6 +2772,21 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case PR_RISCV_V_GET_CONTROL:
 		error = RISCV_V_GET_CONTROL();
 		break;
+	case PR_GET_SHADOW_STACK_STATUS:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = arch_get_shadow_stack_status(me, (unsigned long __user *) arg2);
+		break;
+	case PR_SET_SHADOW_STACK_STATUS:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = arch_set_shadow_stack_status(me, arg2);
+		break;
+	case PR_LOCK_SHADOW_STACK_STATUS:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = arch_lock_shadow_stack_status(me, arg2);
+		break;
 	default:
 		error = -EINVAL;
 		break;
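
The semantics in the commit message (unknown features rejected for enable but accepted for locking, and no changes to a bit once it is locked) can be illustrated with a small userspace model. This is a sketch under assumptions: the struct, the `sketch_*` names and the use of -EBUSY for "locked" are hypothetical and not taken from this patch; only the three feature bit values come from the prctl.h hunk above.

```c
#include <errno.h>

/* Bit values as in the patch's include/uapi/linux/prctl.h hunk. */
#define SKETCH_SHSTK_ENABLE (1UL << 0)
#define SKETCH_SHSTK_WRITE  (1UL << 1)
#define SKETCH_SHSTK_PUSH   (1UL << 2)
#define SKETCH_SHSTK_KNOWN \
	(SKETCH_SHSTK_ENABLE | SKETCH_SHSTK_WRITE | SKETCH_SHSTK_PUSH)

/* Hypothetical per-thread state an arch implementation might track. */
struct sketch_shstk_state {
	unsigned long status;  /* currently enabled features */
	unsigned long locked;  /* bits that may no longer change */
};

static int sketch_get_status(const struct sketch_shstk_state *s,
			     unsigned long *status)
{
	*status = s->status;
	return 0;
}

static int sketch_set_status(struct sketch_shstk_state *s, unsigned long status)
{
	/* Unknown features are rejected for enable */
	if (status & ~SKETCH_SHSTK_KNOWN)
		return -EINVAL;

	/* Any bit that would change must not be locked (errno is assumed) */
	if ((status ^ s->status) & s->locked)
		return -EBUSY;

	s->status = status;
	return 0;
}

static int sketch_lock_status(struct sketch_shstk_state *s, unsigned long bits)
{
	/* Unknown bits are accepted here, to allow for future features */
	s->locked |= bits;
	return 0;
}
```

With this model, a loader could enable the shadow stack, lock all bits, and any later attempt to clear the enable bit fails while a no-op set still succeeds.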
Three architectures (x86, aarch64, riscv) support indirect branch tracking in a very similar fashion. At a very high level, indirect branch tracking is a CPU feature where the CPU tracks branches which use a memory operand to perform control transfer in a program. As part of this tracking, on an indirect branch the CPU enters a state where it expects a landing pad instr at the target, and if none is found the CPU raises a fault (architecture dependent).
x86 landing pad instr - `ENDBRANCH`
aarch64 landing pad instr - `BTI`
riscv landing pad instr - `lpad`
Given that three major arches have support for indirect branch tracking, this patch makes the `prctl` for indirect branch tracking arch agnostic.
To allow userspace to enable this feature for itself, the following prctls are defined:
- PR_GET_INDIR_BR_LP_STATUS: Gets current configured status for indirect branch tracking.
- PR_SET_INDIR_BR_LP_STATUS: Sets a configuration for indirect branch tracking. Following status options are allowed:
  - PR_INDIR_BR_LP_ENABLE: Enables indirect branch tracking on user thread.
  - PR_INDIR_BR_LP_DISABLE: Disables indirect branch tracking on user thread.
- PR_LOCK_INDIR_BR_LP_STATUS: Locks configured status for indirect branch tracking for user thread.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 include/uapi/linux/prctl.h | 27 +++++++++++++++++++++++++++
 kernel/sys.c               | 30 ++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 3c66ed8f46d8..b7a8212a068e 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -328,4 +328,31 @@ struct prctl_mm_map {
  */
 #define PR_LOCK_SHADOW_STACK_STATUS 73

+/*
+ * Get the current indirect branch tracking configuration for the current
+ * thread, this will be the value configured via PR_SET_INDIR_BR_LP_STATUS.
+ */
+#define PR_GET_INDIR_BR_LP_STATUS 74
+
+/*
+ * Set the indirect branch tracking configuration. PR_INDIR_BR_LP_ENABLE will
+ * enable cpu feature for user thread, to track all indirect branches and ensure
+ * they land on arch defined landing pad instruction.
+ * x86 - If enabled, an indirect branch must land on `ENDBRANCH` instruction.
+ * aarch64 - If enabled, an indirect branch must land on `BTI` instruction.
+ * riscv - If enabled, an indirect branch must land on `lpad` instruction.
+ * PR_INDIR_BR_LP_DISABLE will disable feature for user thread and indirect
+ * branches will no more be tracked by cpu to land on arch defined landing pad
+ * instruction.
+ */
+#define PR_SET_INDIR_BR_LP_STATUS 75
+# define PR_INDIR_BR_LP_ENABLE (1UL << 0)
+
+/*
+ * Prevent further changes to the specified indirect branch tracking
+ * configuration. All bits may be locked via this call, including
+ * undefined bits.
+ */
+#define PR_LOCK_INDIR_BR_LP_STATUS 76
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 242e9f147791..c770060c3f06 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2330,6 +2330,21 @@ int __weak arch_lock_shadow_stack_status(struct task_struct *t, unsigned long st
 	return -EINVAL;
 }

+int __weak arch_get_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
+{
+	return -EINVAL;
+}
+
+int __weak arch_set_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
+{
+	return -EINVAL;
+}
+
+int __weak arch_lock_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
+{
+	return -EINVAL;
+}
+
 #define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)

 #ifdef CONFIG_ANON_VMA_NAME
@@ -2787,6 +2802,21 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			return -EINVAL;
 		error = arch_lock_shadow_stack_status(me, arg2);
 		break;
+	case PR_GET_INDIR_BR_LP_STATUS:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = arch_get_indir_br_lp_status(me, (unsigned long __user *) arg2);
+		break;
+	case PR_SET_INDIR_BR_LP_STATUS:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = arch_set_indir_br_lp_status(me, (unsigned long __user *) arg2);
+		break;
+	case PR_LOCK_INDIR_BR_LP_STATUS:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = arch_lock_indir_br_lp_status(me, (unsigned long __user *) arg2);
+		break;
 	default:
 		error = -EINVAL;
 		break;
On Wed, Apr 03, 2024 at 04:35:05PM -0700, Deepak Gupta wrote:
Three architectures (x86, aarch64, riscv) support indirect branch tracking in a very similar fashion. At a high level, indirect branch tracking is a CPU feature where the CPU tracks branches that use a memory operand to transfer control. While tracking an indirect branch, the CPU enters a state where it expects a landing pad instruction at the target; if none is found, the CPU raises a fault (architecture dependent).
x86 landing pad instr - `ENDBRANCH`
aarch64 landing pad instr - `BTI`
riscv landing pad instr - `lpad`
Given that three major arches support indirect branch tracking, this patch makes the `prctl` for indirect branch tracking arch agnostic.
To allow userspace to enable this feature for itself, the following prctls are defined:
- PR_GET_INDIR_BR_LP_STATUS: Gets the currently configured status for indirect branch tracking.
- PR_SET_INDIR_BR_LP_STATUS: Sets a configuration for indirect branch tracking. The following status options are allowed:
  - PR_INDIR_BR_LP_ENABLE: Enables indirect branch tracking on the user thread.
  - PR_INDIR_BR_LP_DISABLE: Disables indirect branch tracking on the user thread.
- PR_LOCK_INDIR_BR_LP_STATUS: Locks the configured status for indirect branch tracking for the user thread.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 include/uapi/linux/prctl.h | 27 +++++++++++++++++++++++++++
 kernel/sys.c               | 30 ++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 3c66ed8f46d8..b7a8212a068e 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -328,4 +328,31 @@ struct prctl_mm_map {
  */
 #define PR_LOCK_SHADOW_STACK_STATUS	73

+/*
+ * Get the current indirect branch tracking configuration for the current
+ * thread, this will be the value configured via PR_SET_INDIR_BR_LP_STATUS.
+ */
+#define PR_GET_INDIR_BR_LP_STATUS	74
+
+/*
+ * Set the indirect branch tracking configuration. PR_INDIR_BR_LP_ENABLE will
+ * enable cpu feature for user thread, to track all indirect branches and ensure
+ * they land on arch defined landing pad instruction.
+ * x86 - If enabled, an indirect branch must land on `ENDBRANCH` instruction.
+ * aarch64 - If enabled, an indirect branch must land on `BTI` instruction.
+ * riscv - If enabled, an indirect branch must land on `lpad` instruction.
+ * PR_INDIR_BR_LP_DISABLE will disable feature for user thread and indirect
+ * branches will no more be tracked by cpu to land on arch defined landing pad
+ * instruction.
+ */
+#define PR_SET_INDIR_BR_LP_STATUS	75
+# define PR_INDIR_BR_LP_ENABLE	(1UL << 0)
+
+/*
+ * Prevent further changes to the specified indirect branch tracking
+ * configuration. All bits may be locked via this call, including
+ * undefined bits.
+ */
+#define PR_LOCK_INDIR_BR_LP_STATUS	76
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 242e9f147791..c770060c3f06 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2330,6 +2330,21 @@ int __weak arch_lock_shadow_stack_status(struct task_struct *t, unsigned long st
 	return -EINVAL;
 }

+int __weak arch_get_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
+{
+	return -EINVAL;
+}
+
+int __weak arch_set_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
+{
+	return -EINVAL;
+}
+
+int __weak arch_lock_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
+{
+	return -EINVAL;
+}
These weak references each cause a warning:
kernel/sys.c:2333:12: warning: no previous prototype for 'arch_get_indir_br_lp_status' [-Wmissing-prototypes] 2333 | int __weak arch_get_indir_br_lp_status(struct task_struct *t, unsigned long __user *status) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/sys.c:2338:12: warning: no previous prototype for 'arch_set_indir_br_lp_status' [-Wmissing-prototypes] 2338 | int __weak arch_set_indir_br_lp_status(struct task_struct *t, unsigned long __user *status) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/sys.c:2343:12: warning: no previous prototype for 'arch_lock_indir_br_lp_status' [-Wmissing-prototypes] 2343 | int __weak arch_lock_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
Can the definitions be added to include/linux/mm.h alongside the *_shadow_stack_status() definitions?
- Charlie
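One way to address the warning above, sketched as a standalone userspace model: declare each hook before its weak default so the compiler sees a prior prototype. The signatures copy the ones in this patch's kernel/sys.c; the `WEAK_SYM` macro and the dropped `__user` annotation are assumptions made so the snippet builds outside the kernel, and the actual header placement (include/linux/mm.h, as suggested) is not shown.

```c
#include <errno.h>
#include <stddef.h>

/* Opaque stand-in for the kernel's task_struct; only pointers are used. */
struct task_struct;

/*
 * Prototypes, as they might appear in a shared header. Having these in
 * scope before the definitions is what silences -Wmissing-prototypes.
 */
int arch_get_indir_br_lp_status(struct task_struct *t, unsigned long *status);
int arch_set_indir_br_lp_status(struct task_struct *t, unsigned long *status);
int arch_lock_indir_br_lp_status(struct task_struct *t, unsigned long *status);

/* Hypothetical userspace spelling of the kernel's __weak annotation. */
#define WEAK_SYM __attribute__((weak))

/* Weak defaults: an architecture that supports the feature overrides them. */
int WEAK_SYM arch_get_indir_br_lp_status(struct task_struct *t, unsigned long *status)
{
	(void)t;
	(void)status;
	return -EINVAL;
}

int WEAK_SYM arch_set_indir_br_lp_status(struct task_struct *t, unsigned long *status)
{
	(void)t;
	(void)status;
	return -EINVAL;
}

int WEAK_SYM arch_lock_indir_br_lp_status(struct task_struct *t, unsigned long *status)
{
	(void)t;
	(void)status;
	return -EINVAL;
}
```

With no strong definition linked in, each call falls through to the weak default and reports -EINVAL, which matches what the prctl() path in this patch returns on architectures without support.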
 #define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)

 #ifdef CONFIG_ANON_VMA_NAME
@@ -2787,6 +2802,21 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			return -EINVAL;
 		error = arch_lock_shadow_stack_status(me, arg2);
 		break;
+	case PR_GET_INDIR_BR_LP_STATUS:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = arch_get_indir_br_lp_status(me, (unsigned long __user *) arg2);
+		break;
+	case PR_SET_INDIR_BR_LP_STATUS:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = arch_set_indir_br_lp_status(me, (unsigned long __user *) arg2);
+		break;
+	case PR_LOCK_INDIR_BR_LP_STATUS:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = arch_lock_indir_br_lp_status(me, (unsigned long __user *) arg2);
+		break;
 	default:
 		error = -EINVAL;
 		break;
-- 2.43.2
On Fri, May 10, 2024 at 04:29:19PM -0700, Charlie Jenkins wrote:
On Wed, Apr 03, 2024 at 04:35:05PM -0700, Deepak Gupta wrote:
Three architectures (x86, aarch64, riscv) support indirect branch tracking in a very similar fashion. At a high level, indirect branch tracking is a CPU feature where the CPU tracks branches that use a memory operand to transfer control. While tracking an indirect branch, the CPU enters a state where it expects a landing pad instruction at the target; if none is found, the CPU raises a fault (architecture dependent).
x86 landing pad instr - `ENDBRANCH`
aarch64 landing pad instr - `BTI`
riscv landing pad instr - `lpad`
Given that three major arches support indirect branch tracking, this patch makes the `prctl` for indirect branch tracking arch agnostic.
To allow userspace to enable this feature for itself, the following prctls are defined:
- PR_GET_INDIR_BR_LP_STATUS: Gets the currently configured status for indirect branch tracking.
- PR_SET_INDIR_BR_LP_STATUS: Sets a configuration for indirect branch tracking. The following status options are allowed:
  - PR_INDIR_BR_LP_ENABLE: Enables indirect branch tracking on the user thread.
  - PR_INDIR_BR_LP_DISABLE: Disables indirect branch tracking on the user thread.
- PR_LOCK_INDIR_BR_LP_STATUS: Locks the configured status for indirect branch tracking for the user thread.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 include/uapi/linux/prctl.h | 27 +++++++++++++++++++++++++++
 kernel/sys.c               | 30 ++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 3c66ed8f46d8..b7a8212a068e 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -328,4 +328,31 @@ struct prctl_mm_map {
  */
 #define PR_LOCK_SHADOW_STACK_STATUS	73

+/*
+ * Get the current indirect branch tracking configuration for the current
+ * thread, this will be the value configured via PR_SET_INDIR_BR_LP_STATUS.
+ */
+#define PR_GET_INDIR_BR_LP_STATUS	74
+
+/*
+ * Set the indirect branch tracking configuration. PR_INDIR_BR_LP_ENABLE will
+ * enable cpu feature for user thread, to track all indirect branches and ensure
+ * they land on arch defined landing pad instruction.
+ * x86 - If enabled, an indirect branch must land on `ENDBRANCH` instruction.
+ * aarch64 - If enabled, an indirect branch must land on `BTI` instruction.
+ * riscv - If enabled, an indirect branch must land on `lpad` instruction.
+ * PR_INDIR_BR_LP_DISABLE will disable feature for user thread and indirect
+ * branches will no more be tracked by cpu to land on arch defined landing pad
+ * instruction.
+ */
+#define PR_SET_INDIR_BR_LP_STATUS	75
+# define PR_INDIR_BR_LP_ENABLE	(1UL << 0)
+
+/*
+ * Prevent further changes to the specified indirect branch tracking
+ * configuration. All bits may be locked via this call, including
+ * undefined bits.
+ */
+#define PR_LOCK_INDIR_BR_LP_STATUS	76
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 242e9f147791..c770060c3f06 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2330,6 +2330,21 @@ int __weak arch_lock_shadow_stack_status(struct task_struct *t, unsigned long st
 	return -EINVAL;
 }

+int __weak arch_get_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
+{
+	return -EINVAL;
+}
+
+int __weak arch_set_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
+{
+	return -EINVAL;
+}
+
+int __weak arch_lock_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
+{
+	return -EINVAL;
+}
These weak references each cause a warning:
kernel/sys.c:2333:12: warning: no previous prototype for 'arch_get_indir_br_lp_status' [-Wmissing-prototypes] 2333 | int __weak arch_get_indir_br_lp_status(struct task_struct *t, unsigned long __user *status) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/sys.c:2338:12: warning: no previous prototype for 'arch_set_indir_br_lp_status' [-Wmissing-prototypes] 2338 | int __weak arch_set_indir_br_lp_status(struct task_struct *t, unsigned long __user *status) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/sys.c:2343:12: warning: no previous prototype for 'arch_lock_indir_br_lp_status' [-Wmissing-prototypes] 2343 | int __weak arch_lock_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
Can the definitions be added to include/linux/mm.h alongside the *_shadow_stack_status() definitions?
Noted. Will work on a fix for this.
- Charlie
 #define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)

 #ifdef CONFIG_ANON_VMA_NAME
@@ -2787,6 +2802,21 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			return -EINVAL;
 		error = arch_lock_shadow_stack_status(me, arg2);
 		break;
+	case PR_GET_INDIR_BR_LP_STATUS:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = arch_get_indir_br_lp_status(me, (unsigned long __user *) arg2);
+		break;
+	case PR_SET_INDIR_BR_LP_STATUS:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = arch_set_indir_br_lp_status(me, (unsigned long __user *) arg2);
+		break;
+	case PR_LOCK_INDIR_BR_LP_STATUS:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = arch_lock_indir_br_lp_status(me, (unsigned long __user *) arg2);
+		break;
 	default:
 		error = -EINVAL;
 		break;
-- 2.43.2
Implement architecture agnostic prctls() interface for setting and getting shadow stack status.
prctls implemented are PR_GET_SHADOW_STACK_STATUS, PR_SET_SHADOW_STACK_STATUS and PR_LOCK_SHADOW_STACK_STATUS.
As part of PR_SET_SHADOW_STACK_STATUS/PR_GET_SHADOW_STACK_STATUS, only PR_SHADOW_STACK_ENABLE is implemented because RISCV allows each mode to write to their own shadow stack using `sspush` or `ssamoswap`.
PR_LOCK_SHADOW_STACK_STATUS locks current configuration of shadow stack enabling.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/asm/usercfi.h |  18 +++++-
 arch/riscv/kernel/process.c      |   8 +++
 arch/riscv/kernel/usercfi.c      | 107 +++++++++++++++++++++++++++++++
 3 files changed, 132 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h index b47574a7a8c9..a168ae0fa5d8 100644 --- a/arch/riscv/include/asm/usercfi.h +++ b/arch/riscv/include/asm/usercfi.h @@ -7,6 +7,7 @@
#ifndef __ASSEMBLY__ #include <linux/types.h> +#include <linux/prctl.h>
struct task_struct; struct kernel_clone_args; @@ -14,7 +15,8 @@ struct kernel_clone_args; #ifdef CONFIG_RISCV_USER_CFI struct cfi_status { unsigned long ubcfi_en : 1; /* Enable for backward cfi. */ - unsigned long rsvd : ((sizeof(unsigned long)*8) - 1); + unsigned long ubcfi_locked : 1; + unsigned long rsvd : ((sizeof(unsigned long)*8) - 2); unsigned long user_shdw_stk; /* Current user shadow stack pointer */ unsigned long shdw_stk_base; /* Base address of shadow stack */ unsigned long shdw_stk_size; /* size of shadow stack */ @@ -26,6 +28,10 @@ void shstk_release(struct task_struct *tsk); void set_shstk_base(struct task_struct *task, unsigned long shstk_addr, unsigned long size); void set_active_shstk(struct task_struct *task, unsigned long shstk_addr); bool is_shstk_enabled(struct task_struct *task); +bool is_shstk_locked(struct task_struct *task); +void set_shstk_status(struct task_struct *task, bool enable); + +#define PR_SHADOW_STACK_SUPPORTED_STATUS_MASK (PR_SHADOW_STACK_ENABLE)
#else
@@ -56,6 +62,16 @@ static inline bool is_shstk_enabled(struct task_struct *task) return false; }
+static inline bool is_shstk_locked(struct task_struct *task) +{ + return false; +} + +static inline void set_shstk_status(struct task_struct *task, bool enable) +{ + +} + #endif /* CONFIG_RISCV_USER_CFI */
#endif /* __ASSEMBLY__ */ diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c index ef48a25b0eff..3fb8b23f629b 100644 --- a/arch/riscv/kernel/process.c +++ b/arch/riscv/kernel/process.c @@ -145,6 +145,14 @@ void start_thread(struct pt_regs *regs, unsigned long pc, regs->epc = pc; regs->sp = sp;
+ /* + * clear shadow stack state on exec. + * libc will set it later via prctl. + */ + set_shstk_status(current, false); + set_shstk_base(current, 0, 0); + set_active_shstk(current, 0); + #ifdef CONFIG_64BIT regs->status &= ~SR_UXL;
diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c index 11ef7ab925c9..cdedf1f78b3e 100644 --- a/arch/riscv/kernel/usercfi.c +++ b/arch/riscv/kernel/usercfi.c @@ -24,6 +24,16 @@ bool is_shstk_enabled(struct task_struct *task) return task->thread_info.user_cfi_state.ubcfi_en ? true : false; }
+bool is_shstk_allocated(struct task_struct *task) +{ + return task->thread_info.user_cfi_state.shdw_stk_base ? true : false; +} + +bool is_shstk_locked(struct task_struct *task) +{ + return task->thread_info.user_cfi_state.ubcfi_locked ? true : false; +} + void set_shstk_base(struct task_struct *task, unsigned long shstk_addr, unsigned long size) { task->thread_info.user_cfi_state.shdw_stk_base = shstk_addr; @@ -42,6 +52,23 @@ void set_active_shstk(struct task_struct *task, unsigned long shstk_addr) task->thread_info.user_cfi_state.user_shdw_stk = shstk_addr; }
+void set_shstk_status(struct task_struct *task, bool enable) +{ + task->thread_info.user_cfi_state.ubcfi_en = enable ? 1 : 0; + + if (enable) + task->thread_info.envcfg |= ENVCFG_SSE; + else + task->thread_info.envcfg &= ~ENVCFG_SSE; + + csr_write(CSR_ENVCFG, task->thread_info.envcfg); +} + +void set_shstk_lock(struct task_struct *task) +{ + task->thread_info.user_cfi_state.ubcfi_locked = 1; +} + /* * If size is 0, then to be compatible with regular stack we want it to be as big as * regular stack. Else PAGE_ALIGN it and return back @@ -268,3 +295,83 @@ void shstk_release(struct task_struct *tsk) vm_munmap(base, size); set_shstk_base(tsk, 0, 0); } + +int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status) +{ + unsigned long bcfi_status = 0; + + if (!cpu_supports_shadow_stack()) + return -EINVAL; + + /* this means shadow stack is enabled on the task */ + bcfi_status |= (is_shstk_enabled(t) ? PR_SHADOW_STACK_ENABLE : 0); + + return copy_to_user(status, &bcfi_status, sizeof(bcfi_status)) ? -EFAULT : 0; +} + +int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status) +{ + unsigned long size = 0, addr = 0; + bool enable_shstk = false; + + if (!cpu_supports_shadow_stack()) + return -EINVAL; + + /* Reject unknown flags */ + if (status & ~PR_SHADOW_STACK_SUPPORTED_STATUS_MASK) + return -EINVAL; + + /* bcfi status is locked and further can't be modified by user */ + if (is_shstk_locked(t)) + return -EINVAL; + + enable_shstk = status & PR_SHADOW_STACK_ENABLE; + /* Request is to enable shadow stack and shadow stack is not enabled already */ + if (enable_shstk && !is_shstk_enabled(t)) { + /* shadow stack was allocated and enable request again + * no need to support such usecase and return EINVAL. 
+ */ + if (is_shstk_allocated(t)) + return -EINVAL; + + size = calc_shstk_size(0); + addr = allocate_shadow_stack(0, size, 0, false); + if (IS_ERR_VALUE(addr)) + return -ENOMEM; + set_shstk_base(t, addr, size); + set_active_shstk(t, addr + size); + } + + /* + * If a request to disable shadow stack happens, let's go ahead and release it + * Although, if CLONE_VFORKed child did this, then in that case we will end up + * not releasing the shadow stack (because it might be needed in parent). Although + * we will disable it for VFORKed child. And if VFORKed child tries to enable again + * then in that case, it'll get entirely new shadow stack because following condition + * are true + * - shadow stack was not enabled for vforked child + * - shadow stack base was anyways pointing to 0 + * This shouldn't be a big issue because we want parent to have availability of shadow + * stack whenever VFORKed child releases resources via exit or exec but at the same + * time we want VFORKed child to break away and establish new shadow stack if it desires + * + */ + if (!enable_shstk) + shstk_release(t); + + set_shstk_status(t, enable_shstk); + return 0; +} + +int arch_lock_shadow_stack_status(struct task_struct *task, + unsigned long arg) +{ + /* If shtstk not supported or not enabled on task, nothing to lock here */ + if (!cpu_supports_shadow_stack() || + !is_shstk_enabled(task)) + return -EINVAL; + + set_shstk_lock(task); + + return 0; +}
prctls implemented are: PR_SET_INDIR_BR_LP_STATUS, PR_GET_INDIR_BR_LP_STATUS and PR_LOCK_INDIR_BR_LP_STATUS.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/asm/usercfi.h | 22 ++++++++-
 arch/riscv/kernel/process.c      |  5 +++
 arch/riscv/kernel/usercfi.c      | 76 ++++++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h index a168ae0fa5d8..8accdc8ec164 100644 --- a/arch/riscv/include/asm/usercfi.h +++ b/arch/riscv/include/asm/usercfi.h @@ -16,7 +16,9 @@ struct kernel_clone_args; struct cfi_status { unsigned long ubcfi_en : 1; /* Enable for backward cfi. */ unsigned long ubcfi_locked : 1; - unsigned long rsvd : ((sizeof(unsigned long)*8) - 2); + unsigned long ufcfi_en : 1; /* Enable for forward cfi. Note that ELP goes in sstatus */ + unsigned long ufcfi_locked : 1; + unsigned long rsvd : ((sizeof(unsigned long)*8) - 4); unsigned long user_shdw_stk; /* Current user shadow stack pointer */ unsigned long shdw_stk_base; /* Base address of shadow stack */ unsigned long shdw_stk_size; /* size of shadow stack */ @@ -30,6 +32,9 @@ void set_active_shstk(struct task_struct *task, unsigned long shstk_addr); bool is_shstk_enabled(struct task_struct *task); bool is_shstk_locked(struct task_struct *task); void set_shstk_status(struct task_struct *task, bool enable); +bool is_indir_lp_enabled(struct task_struct *task); +bool is_indir_lp_locked(struct task_struct *task); +void set_indir_lp_status(struct task_struct *task, bool enable);
#define PR_SHADOW_STACK_SUPPORTED_STATUS_MASK (PR_SHADOW_STACK_ENABLE)
@@ -72,6 +77,21 @@ static inline void set_shstk_status(struct task_struct *task, bool enable)
}
+static inline bool is_indir_lp_enabled(struct task_struct *task) +{ + return false; +} + +static inline bool is_indir_lp_locked(struct task_struct *task) +{ + return false; +} + +static inline void set_indir_lp_status(struct task_struct *task, bool enable) +{ + +} + #endif /* CONFIG_RISCV_USER_CFI */
#endif /* __ASSEMBLY__ */ diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c index 3fb8b23f629b..ebed7589c51a 100644 --- a/arch/riscv/kernel/process.c +++ b/arch/riscv/kernel/process.c @@ -152,6 +152,11 @@ void start_thread(struct pt_regs *regs, unsigned long pc, set_shstk_status(current, false); set_shstk_base(current, 0, 0); set_active_shstk(current, 0); + /* + * disable indirect branch tracking on exec. + * libc will enable it later via prctl. + */ + set_indir_lp_status(current, false);
#ifdef CONFIG_64BIT regs->status &= ~SR_UXL; diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c index cdedf1f78b3e..13920b9d86f3 100644 --- a/arch/riscv/kernel/usercfi.c +++ b/arch/riscv/kernel/usercfi.c @@ -69,6 +69,32 @@ void set_shstk_lock(struct task_struct *task) task->thread_info.user_cfi_state.ubcfi_locked = 1; }
+bool is_indir_lp_enabled(struct task_struct *task) +{ + return task->thread_info.user_cfi_state.ufcfi_en ? true : false; +} + +bool is_indir_lp_locked(struct task_struct *task) +{ + return task->thread_info.user_cfi_state.ufcfi_locked ? true : false; +} + +void set_indir_lp_status(struct task_struct *task, bool enable) +{ + task->thread_info.user_cfi_state.ufcfi_en = enable ? 1 : 0; + + if (enable) + task->thread_info.envcfg |= ENVCFG_LPE; + else + task->thread_info.envcfg &= ~ENVCFG_LPE; + + csr_write(CSR_ENVCFG, task->thread_info.envcfg); +} + +void set_indir_lp_lock(struct task_struct *task) +{ + task->thread_info.user_cfi_state.ufcfi_locked = 1; +} /* * If size is 0, then to be compatible with regular stack we want it to be as big as * regular stack. Else PAGE_ALIGN it and return back @@ -375,3 +401,53 @@ int arch_lock_shadow_stack_status(struct task_struct *task,
return 0; } + +int arch_get_indir_br_lp_status(struct task_struct *t, unsigned long __user *status) +{ + unsigned long fcfi_status = 0; + + if (!cpu_supports_indirect_br_lp_instr()) + return -EINVAL; + + /* indirect branch tracking is enabled on the task or not */ + fcfi_status |= (is_indir_lp_enabled(t) ? PR_INDIR_BR_LP_ENABLE : 0); + + return copy_to_user(status, &fcfi_status, sizeof(fcfi_status)) ? -EFAULT : 0; +} + +int arch_set_indir_br_lp_status(struct task_struct *t, unsigned long status) +{ + bool enable_indir_lp = false; + + if (!cpu_supports_indirect_br_lp_instr()) + return -EINVAL; + + /* indirect branch tracking is locked and further can't be modified by user */ + if (is_indir_lp_locked(t)) + return -EINVAL; + + /* Reject unknown flags */ + if (status & ~PR_INDIR_BR_LP_ENABLE) + return -EINVAL; + + enable_indir_lp = (status & PR_INDIR_BR_LP_ENABLE) ? true : false; + set_indir_lp_status(t, enable_indir_lp); + + return 0; +} + +int arch_lock_indir_br_lp_status(struct task_struct *task, + unsigned long arg) +{ + /* + * If indirect branch tracking is not supported or not enabled on task, + * nothing to lock here + */ + if (!cpu_supports_indirect_br_lp_instr() || + !is_indir_lp_enabled(task)) + return -EINVAL; + + set_indir_lp_lock(task); + + return 0; +}
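The enable/lock ordering that the three handlers above enforce can be modelled in plain userspace C. This is only a sketch under stated assumptions: `struct lp_state`, `lp_set()`, `lp_lock()` and `lp_get()` are hypothetical stand-ins for the per-task `ufcfi_en` / `ufcfi_locked` bits and the arch hooks, `LP_ENABLE` mirrors `PR_INDIR_BR_LP_ENABLE` from this series, and the CPU is assumed to support landing pads (the `cpu_supports_indirect_br_lp_instr()` check is left out).

```c
#include <errno.h>
#include <stdbool.h>

#define LP_ENABLE (1UL << 0)	/* mirrors PR_INDIR_BR_LP_ENABLE in this series */

/* Hypothetical stand-in for the per-task forward-CFI bits. */
struct lp_state {
	bool enabled;
	bool locked;
};

/* Models arch_set_indir_br_lp_status(): a locked state can no longer change. */
static int lp_set(struct lp_state *s, unsigned long status)
{
	if (s->locked)
		return -EINVAL;
	/* Reject unknown flags */
	if (status & ~LP_ENABLE)
		return -EINVAL;
	s->enabled = (status & LP_ENABLE) != 0;
	return 0;
}

/* Models arch_lock_indir_br_lp_status(): only an enabled state can be locked. */
static int lp_lock(struct lp_state *s)
{
	if (!s->enabled)
		return -EINVAL;
	s->locked = true;
	return 0;
}

/* Models arch_get_indir_br_lp_status(): reports the enable bit only. */
static unsigned long lp_get(const struct lp_state *s)
{
	return s->enabled ? LP_ENABLE : 0;
}
```

In this model, locking before enabling fails, and once the state is locked a later attempt to disable tracking is rejected, which is the property a loader would rely on after it has enabled tracking.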
Updating __show_regs to print captured shadow stack pointer as well. On tasks where shadow stack is disabled, it'll simply print 0.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/kernel/process.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index ebed7589c51a..079fd6cd6446 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -89,8 +89,8 @@ void __show_regs(struct pt_regs *regs)
 		regs->s8, regs->s9, regs->s10);
 	pr_cont(" s11: " REG_FMT " t3 : " REG_FMT " t4 : " REG_FMT "\n",
 		regs->s11, regs->t3, regs->t4);
-	pr_cont(" t5 : " REG_FMT " t6 : " REG_FMT "\n",
-		regs->t5, regs->t6);
+	pr_cont(" t5 : " REG_FMT " t6 : " REG_FMT " ssp : " REG_FMT "\n",
+		regs->t5, regs->t6, get_active_shstk(current));

 	pr_cont("status: " REG_FMT " badaddr: " REG_FMT " cause: " REG_FMT "\n",
 		regs->status, regs->badaddr, regs->cause);
On 04/04/2024 01:35, Deepak Gupta wrote:
Updating __show_regs to print captured shadow stack pointer as well. On tasks where shadow stack is disabled, it'll simply print 0.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/kernel/process.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index ebed7589c51a..079fd6cd6446 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -89,8 +89,8 @@ void __show_regs(struct pt_regs *regs)
 		regs->s8, regs->s9, regs->s10);
 	pr_cont(" s11: " REG_FMT " t3 : " REG_FMT " t4 : " REG_FMT "\n",
 		regs->s11, regs->t3, regs->t4);
-	pr_cont(" t5 : " REG_FMT " t6 : " REG_FMT "\n",
-		regs->t5, regs->t6);
+	pr_cont(" t5 : " REG_FMT " t6 : " REG_FMT " ssp : " REG_FMT "\n",
+		regs->t5, regs->t6, get_active_shstk(current));

 	pr_cont("status: " REG_FMT " badaddr: " REG_FMT " cause: " REG_FMT "\n",
 		regs->status, regs->badaddr, regs->cause);
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
zicfiss / zicfilp introduce a new exception to the privileged ISA, the `software check exception`, with cause code = 18. This patch implements handling for the software check exception.
Additionally it implements a cfi violation handler which checks for code in xtval. If xtval=2, it means that sw check exception happened because of an indirect branch not landing on 4 byte aligned PC or not landing on `lpad` instruction or label value embedded in `lpad` not matching label value setup in `x7`. If xtval=3, it means that sw check exception happened because of mismatch between link register (x1 or x5) and top of shadow stack (on execution of `sspopchk`).
In case of cfi violation, SIGSEGV is raised with code=SEGV_CPERR. SEGV_CPERR was introduced by x86 shadow stack patches.
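The xtval decoding described above can be sketched as a small standalone function. This is a userspace model, not the handler from this patch: `enum cfi_violation` and `classify_sw_check()` are hypothetical names, the two booleans stand in for `cpu_supports_indirect_br_lp_instr()` and `cpu_supports_shadow_stack()`, and only the code values 2 and 3 come from the description.

```c
#include <stdbool.h>

#define CFI_TVAL_FCFI_CODE 2	/* landing pad (forward cfi) violation */
#define CFI_TVAL_BCFI_CODE 3	/* shadow stack (backward cfi) violation */

/* Hypothetical classification of a software check exception. */
enum cfi_violation {
	CFI_NONE,	/* not a cfi violation: falls back to the unknown-trap path */
	CFI_FORWARD,	/* indirect branch missed `lpad`, misaligned, or label mismatch */
	CFI_BACKWARD,	/* `sspopchk` found x1/x5 != top of shadow stack */
};

/*
 * A code only counts as a violation when the matching extension is
 * present, mirroring the condition in handle_user_cfi_violation().
 */
static enum cfi_violation classify_sw_check(unsigned long tval,
					    bool has_lpad, bool has_shstk)
{
	if (tval == CFI_TVAL_FCFI_CODE && has_lpad)
		return CFI_FORWARD;
	if (tval == CFI_TVAL_BCFI_CODE && has_shstk)
		return CFI_BACKWARD;
	return CFI_NONE;
}
```

In the patch, both violation kinds end in the same signal (SIGSEGV with SEGV_CPERR); the distinction above is only useful for illustrating which extension produced the trap.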
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/asm/asm-prototypes.h |  1 +
 arch/riscv/kernel/entry.S               |  3 ++
 arch/riscv/kernel/traps.c               | 38 +++++++++++++++++++++++++
 3 files changed, 42 insertions(+)
diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
index cd627ec289f1..5a27cefd7805 100644
--- a/arch/riscv/include/asm/asm-prototypes.h
+++ b/arch/riscv/include/asm/asm-prototypes.h
@@ -51,6 +51,7 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_u);
 DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
 DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
 DECLARE_DO_ERROR_INFO(do_trap_break);
+DECLARE_DO_ERROR_INFO(do_trap_software_check);

 asmlinkage void handle_bad_stack(struct pt_regs *regs);
 asmlinkage void do_page_fault(struct pt_regs *regs);
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 7245a0ea25c1..f97af4ff5237 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -374,6 +374,9 @@ SYM_DATA_START_LOCAL(excp_vect_table)
 	RISCV_PTR do_page_fault   /* load page fault */
 	RISCV_PTR do_trap_unknown
 	RISCV_PTR do_page_fault   /* store page fault */
+	RISCV_PTR do_trap_unknown /* cause=16 */
+	RISCV_PTR do_trap_unknown /* cause=17 */
+	RISCV_PTR do_trap_software_check /* cause=18 is sw check exception */
 SYM_DATA_END_LABEL(excp_vect_table, SYM_L_LOCAL, excp_vect_table_end)

 #ifndef CONFIG_MMU
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index a1b9be3c4332..9fba263428a1 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -339,6 +339,44 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)

 }

+#define CFI_TVAL_FCFI_CODE	2
+#define CFI_TVAL_BCFI_CODE	3
+/* handle cfi violations */
+bool handle_user_cfi_violation(struct pt_regs *regs)
+{
+	bool ret = false;
+	unsigned long tval = csr_read(CSR_TVAL);
+
+	if (((tval == CFI_TVAL_FCFI_CODE) && cpu_supports_indirect_br_lp_instr()) ||
+		((tval == CFI_TVAL_BCFI_CODE) && cpu_supports_shadow_stack())) {
+		do_trap_error(regs, SIGSEGV, SEGV_CPERR, regs->epc,
+					  "Oops - control flow violation");
+		ret = true;
+	}
+
+	return ret;
+}
+/*
+ * software check exception is defined with risc-v cfi spec. Software check
+ * exception is raised when:-
+ * a) An indirect branch doesn't land on 4 byte aligned PC or `lpad`
+ *    instruction or `label` value programmed in `lpad` instr doesn't
+ *    match with value setup in `x7`. reported code in `xtval` is 2.
+ * b) `sspopchk` instruction finds a mismatch between top of shadow stack (ssp)
+ *    and x1/x5. reported code in `xtval` is 3.
+ */
+asmlinkage __visible __trap_section void do_trap_software_check(struct pt_regs *regs)
+{
+	if (user_mode(regs)) {
+		/* not a cfi violation, then merge into flow of unknown trap handler */
+		if (!handle_user_cfi_violation(regs))
+			do_trap_unknown(regs);
+	} else {
+		/* sw check exception coming from kernel is a bug in kernel */
+		die(regs, "Kernel BUG");
+	}
+}
+
 #ifdef CONFIG_MMU
 asmlinkage __visible noinstr void do_page_fault(struct pt_regs *regs)
 {
Shadow stack needs to be saved and restored on signal delivery and signal return.
sigcontext embedded in ucontext is extendible. Adding cfi state in there which can be used to save cfi state before signal delivery and restore cfi state on sigreturn
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/uapi/asm/sigcontext.h | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/arch/riscv/include/uapi/asm/sigcontext.h b/arch/riscv/include/uapi/asm/sigcontext.h
index cd4f175dc837..5ccdd94a0855 100644
--- a/arch/riscv/include/uapi/asm/sigcontext.h
+++ b/arch/riscv/include/uapi/asm/sigcontext.h
@@ -21,6 +21,10 @@ struct __sc_riscv_v_state {
 	struct __riscv_v_ext_state v_state;
 } __attribute__((aligned(16)));

+struct __sc_riscv_cfi_state {
+	unsigned long ss_ptr;	/* shadow stack pointer */
+	unsigned long rsvd;	/* keeping another word reserved in case we need it */
+};
 /*
  * Signal context structure
  *
@@ -29,6 +33,7 @@ struct __sc_riscv_v_state {
  */
 struct sigcontext {
 	struct user_regs_struct sc_regs;
+	struct __sc_riscv_cfi_state sc_cfi_state;
 	union {
 		union __riscv_fp_state sc_fpregs;
 		struct __riscv_extra_ext_header sc_extdesc;
Hi Deepak,
On Thu, Apr 4, 2024 at 7:42 AM Deepak Gupta <debug@rivosinc.com> wrote:
Shadow stack needs to be saved and restored on signal delivery and signal return.
sigcontext embedded in ucontext is extendible. Adding cfi state in there which can be used to save cfi state before signal delivery and restore cfi state on sigreturn
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/uapi/asm/sigcontext.h | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/arch/riscv/include/uapi/asm/sigcontext.h b/arch/riscv/include/uapi/asm/sigcontext.h
index cd4f175dc837..5ccdd94a0855 100644
--- a/arch/riscv/include/uapi/asm/sigcontext.h
+++ b/arch/riscv/include/uapi/asm/sigcontext.h
@@ -21,6 +21,10 @@ struct __sc_riscv_v_state {
 	struct __riscv_v_ext_state v_state;
 } __attribute__((aligned(16)));

+struct __sc_riscv_cfi_state {
+	unsigned long ss_ptr;	/* shadow stack pointer */
+	unsigned long rsvd;	/* keeping another word reserved in case we need it */
+};
 /*
  * Signal context structure
  *
@@ -29,6 +33,7 @@ struct __sc_riscv_v_state {
  */
 struct sigcontext {
 	struct user_regs_struct sc_regs;
+	struct __sc_riscv_cfi_state sc_cfi_state;
I am concerned about this change, as it could potentially break the uabi. Let's say there is a pre-CFI program running on this kernel. It receives a signal, so the kernel lays out the signal stack as presented in this structure. If the program accesses sc_fpregs, it would now get sc_cfi_state, because the offset has changed and the pre-CFI program has not been recompiled.
union { union __riscv_fp_state sc_fpregs; struct __riscv_extra_ext_header sc_extdesc;
-- 2.43.2
There may be two ways to deal with this. One is to use a different signal ABI for CFI-enabled programs. This may complicate user space, because new programs will have to determine at run time whether they should use the CFI ABI. Another way is to follow what Vector does for the signal stack: it adds a way to introduce new extensions on the signal stack without impacting the ABI.
Please let me know if I misunderstand anything, thanks.
Cheers, Andy
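The offset concern raised above can be illustrated with `offsetof` on two simplified layouts. This is a sketch under stated assumptions: `regs_model`, `fp_model`, `cfi_model` and the two `sigcontext_*` structs are hypothetical stand-ins with illustrative sizes, not the real uapi definitions, so only the direction and size of the shift carry over.

```c
#include <stddef.h>

/* Hypothetical stand-ins; sizes are illustrative, not the real uapi layout. */
struct regs_model { unsigned long x[32]; };
struct fp_model   { unsigned long long f[32]; unsigned int fcsr; };
struct cfi_model  { unsigned long ss_ptr; unsigned long rsvd; };

/* Layout a pre-CFI binary was compiled against. */
struct sigcontext_old {
	struct regs_model sc_regs;
	struct fp_model   sc_fpregs;
};

/* Layout after inserting the cfi state ahead of the FP state. */
struct sigcontext_new {
	struct regs_model sc_regs;
	struct cfi_model  sc_cfi_state;
	struct fp_model   sc_fpregs;
};

/* Number of bytes by which sc_fpregs moves between the two layouts. */
static size_t fpregs_shift(void)
{
	return offsetof(struct sigcontext_new, sc_fpregs) -
	       offsetof(struct sigcontext_old, sc_fpregs);
}
```

A binary built against the old layout keeps reading at the old offset, which in the new layout falls inside `sc_cfi_state`; that is the breakage described in the review comment.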
On Fri, May 24, 2024 at 05:46:16PM +0800, Andy Chiu wrote:
Hi Deepak,
On Thu, Apr 4, 2024 at 7:42 AM Deepak Gupta debug@rivosinc.com wrote:
Shadow stack needs to be saved and restored on signal delivery and signal return.
The sigcontext embedded in ucontext is extensible. Add cfi state to it so that the cfi state can be saved before signal delivery and restored on sigreturn.
Signed-off-by: Deepak Gupta debug@rivosinc.com
arch/riscv/include/uapi/asm/sigcontext.h | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/arch/riscv/include/uapi/asm/sigcontext.h b/arch/riscv/include/uapi/asm/sigcontext.h
index cd4f175dc837..5ccdd94a0855 100644
--- a/arch/riscv/include/uapi/asm/sigcontext.h
+++ b/arch/riscv/include/uapi/asm/sigcontext.h
@@ -21,6 +21,10 @@ struct __sc_riscv_v_state {
 	struct __riscv_v_ext_state v_state;
 } __attribute__((aligned(16)));
 
+struct __sc_riscv_cfi_state {
+	unsigned long ss_ptr;	/* shadow stack pointer */
+	unsigned long rsvd;	/* keeping another word reserved in case we need it */
+};
 /*
  * Signal context structure
@@ -29,6 +33,7 @@ struct __sc_riscv_v_state {
  */
 struct sigcontext {
 	struct user_regs_struct sc_regs;
+	struct __sc_riscv_cfi_state sc_cfi_state;
I am concerned about this change, as it could potentially break the uabi. Let's say there is a pre-CFI program running on this kernel. It receives a signal, so the kernel lays out the sig-stack as presented in this structure. If the program accesses sc_fpregs, it would now get sc_cfi_state instead, because the offset has changed and the pre-CFI program has not been recompiled.
Yeah, this is a problem if the program was built against an older kernel or with an old (or cfi-unaware) toolchain. Thanks.
 	union {
 		union __riscv_fp_state sc_fpregs;
 		struct __riscv_extra_ext_header sc_extdesc;
--
2.43.2
There may be two ways to deal with this. One is to use a different signal ABI for CFI-enabled programs. This may complicate user space, because new programs will have to determine at run time whether they should use the CFI ABI. Another way is to follow what Vector does for the signal stack: it provides a way to introduce new extensions on the signal stack without impacting the ABI.
Please let me know if I misunderstand anything, thanks.
I think following how vector does would be cleaner. Let me munch on this a little bit.
Cheers, Andy
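For readers unfamiliar with the vector-style approach mentioned above, the following is a rough user-space sketch of the idea: extension state is appended as self-describing {magic, size} records, so older consumers can skip records they don't know. The names ctx_hdr_model, MODEL_CFI_MAGIC, append_record and find_ext_payload are made up for illustration, and the magic value is hypothetical; nothing here is the layout the series will actually adopt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Modeled on a {magic, size} record header; not the real uapi struct. */
struct ctx_hdr_model {
	uint32_t magic;
	uint32_t size;	/* header + payload, in bytes */
};

#define MODEL_END_MAGIC 0u		/* terminator record */
#define MODEL_CFI_MAGIC 0x43464921u	/* hypothetical value, illustration only */

/* Append one record at `off`; returns the offset just past it. */
static size_t append_record(unsigned char *buf, size_t off, uint32_t magic,
			    const void *payload, uint32_t payload_len)
{
	struct ctx_hdr_model hdr;

	hdr.magic = magic;
	hdr.size = (uint32_t)sizeof(struct ctx_hdr_model) + payload_len;
	memcpy(buf + off, &hdr, sizeof(hdr));
	if (payload_len)
		memcpy(buf + off + sizeof(hdr), payload, payload_len);
	return off + hdr.size;
}

/*
 * Walk the records in buf[0..len) and return the payload offset for
 * `magic`, or -1 if it is absent or the chain is malformed.
 */
static long find_ext_payload(const unsigned char *buf, size_t len,
			     uint32_t magic)
{
	size_t off = 0;

	while (off + sizeof(struct ctx_hdr_model) <= len) {
		struct ctx_hdr_model hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.magic == MODEL_END_MAGIC)
			return -1;
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;
		if (hdr.magic == magic)
			return (long)(off + sizeof(hdr));
		off += hdr.size;	/* unknown record: skip it */
	}
	return -1;
}
```

The point of the design is that a pre-CFI program never sees any existing member move; it simply ignores a record whose magic it does not recognize.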
Save shadow stack pointer in sigcontext structure while delivering signal. Restore shadow stack pointer from sigcontext on sigreturn.
As part of the save operation, the kernel uses `ssamoswap` to save a snapshot of the current shadow stack pointer on the shadow stack itself (this can be called a save token). During restore on sigreturn, the kernel retrieves the token from the top of the shadow stack and validates it. This ensures that user mode can't arbitrarily pivot to any shadow stack address without having a token, and thus provides strong security assurance in the window between signal delivery and sigreturn.
Signed-off-by: Deepak Gupta debug@rivosinc.com --- arch/riscv/include/asm/usercfi.h | 19 +++++++++++ arch/riscv/kernel/signal.c | 45 +++++++++++++++++++++++++ arch/riscv/kernel/usercfi.c | 57 ++++++++++++++++++++++++++++++++ 3 files changed, 121 insertions(+)
diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h index 8accdc8ec164..507a27d5f53c 100644 --- a/arch/riscv/include/asm/usercfi.h +++ b/arch/riscv/include/asm/usercfi.h @@ -8,6 +8,7 @@ #ifndef __ASSEMBLY__ #include <linux/types.h> #include <linux/prctl.h> +#include <linux/errno.h>
struct task_struct; struct kernel_clone_args; @@ -35,6 +36,9 @@ void set_shstk_status(struct task_struct *task, bool enable); bool is_indir_lp_enabled(struct task_struct *task); bool is_indir_lp_locked(struct task_struct *task); void set_indir_lp_status(struct task_struct *task, bool enable); +unsigned long get_active_shstk(struct task_struct *task); +int restore_user_shstk(struct task_struct *tsk, unsigned long shstk_ptr); +int save_user_shstk(struct task_struct *tsk, unsigned long *saved_shstk_ptr);
#define PR_SHADOW_STACK_SUPPORTED_STATUS_MASK (PR_SHADOW_STACK_ENABLE)
@@ -77,6 +81,16 @@ static inline void set_shstk_status(struct task_struct *task, bool enable)
}
+static inline int restore_user_shstk(struct task_struct *tsk, unsigned long shstk_ptr) +{ + return -EINVAL; +} + +static inline int save_user_shstk(struct task_struct *tsk, unsigned long *saved_shstk_ptr) +{ + return -EINVAL; +} + static inline bool is_indir_lp_enabled(struct task_struct *task) { return false; @@ -92,6 +106,11 @@ static inline void set_indir_lp_status(struct task_struct *task, bool enable)
}
+static inline unsigned long get_active_shstk(struct task_struct *task) +{ + return 0; +} + #endif /* CONFIG_RISCV_USER_CFI */
#endif /* __ASSEMBLY__ */ diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c index 501e66debf69..428a886ab6ef 100644 --- a/arch/riscv/kernel/signal.c +++ b/arch/riscv/kernel/signal.c @@ -22,6 +22,7 @@ #include <asm/vector.h> #include <asm/csr.h> #include <asm/cacheflush.h> +#include <asm/usercfi.h>
unsigned long signal_minsigstksz __ro_after_init;
@@ -232,6 +233,7 @@ SYSCALL_DEFINE0(rt_sigreturn) struct pt_regs *regs = current_pt_regs(); struct rt_sigframe __user *frame; struct task_struct *task; + unsigned long ss_ptr = 0; sigset_t set; size_t frame_size = get_rt_frame_size(false);
@@ -254,6 +256,26 @@ SYSCALL_DEFINE0(rt_sigreturn) if (restore_altstack(&frame->uc.uc_stack)) goto badframe;
+ /* + * Restore shadow stack as a form of token stored on shadow stack itself as a safe + * way to restore. + * A token on shadow gives following properties + * - Safe save and restore for shadow stack switching. Any save of shadow stack + * must have had saved a token on shadow stack. Similarly any restore of shadow + * stack must check the token before restore. Since writing to shadow stack with + * address of shadow stack itself is not easily allowed. A restore without a save + * is quite difficult for an attacker to perform. + * - A natural break. A token in shadow stack provides a natural break in shadow stack + * So a single linear range can be bucketed into different shadow stack segments. + * sspopchk will detect the condition and fault to kernel as sw check exception. + */ + if (__copy_from_user(&ss_ptr, &frame->uc.uc_mcontext.sc_cfi_state.ss_ptr, + sizeof(unsigned long))) + goto badframe; + + if (is_shstk_enabled(current) && restore_user_shstk(current, ss_ptr)) + goto badframe; + regs->cause = -1UL;
return regs->a0; @@ -323,6 +345,7 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set, struct rt_sigframe __user *frame; long err = 0; unsigned long __maybe_unused addr; + unsigned long ss_ptr = 0; size_t frame_size = get_rt_frame_size(false);
frame = get_sigframe(ksig, regs, frame_size); @@ -334,6 +357,23 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set, /* Create the ucontext. */ err |= __put_user(0, &frame->uc.uc_flags); err |= __put_user(NULL, &frame->uc.uc_link); + /* + * Save a pointer to shadow stack itself on shadow stack as a form of token. + * A token on shadow gives following properties + * - Safe save and restore for shadow stack switching. Any save of shadow stack + * must have had saved a token on shadow stack. Similarly any restore of shadow + * stack must check the token before restore. Since writing to shadow stack with + * address of shadow stack itself is not easily allowed. A restore without a save + * is quite difficult for an attacker to perform. + * - A natural break. A token in shadow stack provides a natural break in shadow stack + * So a single linear range can be bucketed into different shadow stack segments. Any + * sspopchk will detect the condition and fault to kernel as sw check exception. + */ + if (is_shstk_enabled(current)) { + err |= save_user_shstk(current, &ss_ptr); + err |= __put_user(ss_ptr, &frame->uc.uc_mcontext.sc_cfi_state.ss_ptr); + } + err |= __save_altstack(&frame->uc.uc_stack, regs->sp); err |= setup_sigcontext(frame, regs); err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set)); @@ -344,6 +384,11 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set, #ifdef CONFIG_MMU regs->ra = (unsigned long)VDSO_SYMBOL( current->mm->context.vdso, rt_sigreturn); + + /* if bcfi is enabled x1 (ra) and x5 (t0) must match. not sure if we need this? */ + if (is_shstk_enabled(current)) + regs->t0 = regs->ra; + #else /* * For the nommu case we don't have a VDSO. 
Instead we push two diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c index 13920b9d86f3..db5b32500050 100644 --- a/arch/riscv/kernel/usercfi.c +++ b/arch/riscv/kernel/usercfi.c @@ -52,6 +52,11 @@ void set_active_shstk(struct task_struct *task, unsigned long shstk_addr) task->thread_info.user_cfi_state.user_shdw_stk = shstk_addr; }
+unsigned long get_active_shstk(struct task_struct *task) +{ + return task->thread_info.user_cfi_state.user_shdw_stk; +} + void set_shstk_status(struct task_struct *task, bool enable) { task->thread_info.user_cfi_state.ubcfi_en = enable ? 1 : 0; @@ -168,6 +173,58 @@ static int create_rstor_token(unsigned long ssp, unsigned long *token_addr) return 0; }
+/*
+ * Save user shadow stack pointer on shadow stack itself and return pointer to saved location
+ * returns -EFAULT if operation was unsuccessful
+ */
+int save_user_shstk(struct task_struct *tsk, unsigned long *saved_shstk_ptr)
+{
+	unsigned long ss_ptr = 0;
+	unsigned long token_loc = 0;
+	int ret = 0;
+
+	if (saved_shstk_ptr == NULL)
+		return -EINVAL;
+
+	ss_ptr = get_active_shstk(tsk);
+	ret = create_rstor_token(ss_ptr, &token_loc);
+
+	if (!ret) {
+		*saved_shstk_ptr = token_loc;
+		set_active_shstk(tsk, token_loc);
+	}
+
+	return ret;
+}
+
+/*
+ * Restores user shadow stack pointer from token on shadow stack for task `tsk`
+ * returns -EFAULT if operation was unsuccessful
+ */
+int restore_user_shstk(struct task_struct *tsk, unsigned long shstk_ptr)
+{
+	unsigned long token = 0;
+
+	token = amo_user_shstk((unsigned long __user *)shstk_ptr, 0);
+
+	if (token == -1)
+		return -EFAULT;
+
+	/* invalid token, return EINVAL */
+	if ((token - shstk_ptr) != SHSTK_ENTRY_SIZE) {
+		pr_info_ratelimited(
+			"%s[%d]: bad restore token in %s: pc=%p sp=%p, token=%p, shstk_ptr=%p\n",
+			tsk->comm, task_pid_nr(tsk), __func__,
+			(void *)(task_pt_regs(tsk)->epc), (void *)(task_pt_regs(tsk)->sp),
+			(void *)token, (void *)shstk_ptr);
+		return -EINVAL;
+	}
+
+	/* all checks passed, set active shstk and return success */
+	set_active_shstk(tsk, token);
+	return 0;
+}
+
 static unsigned long allocate_shadow_stack(unsigned long addr, unsigned long size, unsigned long token_offset, bool set_tok)
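The token scheme used by save_user_shstk() / restore_user_shstk() above can be modeled in plain user-space C. This is only a sketch: shstk_model, model_save and model_restore are hypothetical names, an ordinary array stands in for shadow stack memory, and it assumes create_rstor_token() stores the old pointer one entry below it (which is what the `token - shstk_ptr == SHSTK_ENTRY_SIZE` check implies).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRY_SIZE sizeof(uintptr_t)
#define NSLOTS 16

struct shstk_model {
	uintptr_t slots[NSLOTS];	/* backing memory; the stack grows downward */
	uintptr_t ssp;			/* active shadow stack pointer */
};

static void shstk_init(struct shstk_model *s)
{
	for (size_t i = 0; i < NSLOTS; i++)
		s->slots[i] = 0;
	s->ssp = (uintptr_t)&s->slots[NSLOTS];
}

/* Push a token holding the old ssp; returns the token address, 0 if full. */
static uintptr_t model_save(struct shstk_model *s)
{
	uintptr_t old = s->ssp;
	uintptr_t *tok;

	if (old < (uintptr_t)&s->slots[1])
		return 0;
	tok = (uintptr_t *)(old - ENTRY_SIZE);
	*tok = old;
	s->ssp = (uintptr_t)tok;
	return (uintptr_t)tok;
}

/* Consume and validate the token at `ptr`; 0 on success, -1 otherwise. */
static int model_restore(struct shstk_model *s, uintptr_t ptr)
{
	uintptr_t val;

	if (ptr < (uintptr_t)&s->slots[0] ||
	    ptr >= (uintptr_t)&s->slots[NSLOTS] || ptr % ENTRY_SIZE)
		return -1;

	/* Mimic the swap-with-zero: the token is consumed even if invalid. */
	val = *(uintptr_t *)ptr;
	*(uintptr_t *)ptr = 0;

	if (val - ptr != ENTRY_SIZE)
		return -1;
	s->ssp = val;
	return 0;
}
```

Because the token is zeroed when read, replaying the same sigreturn pointer a second time fails in this model, and pointing at a slot that never held a token fails as well.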
Expose a new register type NT_RISCV_USER_CFI for risc-v cfi status and state. Both landing pad and shadow stack status and state are intentionally rolled into one cfi state; creating two different NT_RISCV_USER_XXX types would not be useful and would waste a note type. Enabling or disabling the feature is not allowed via the ptrace set interface. However, setting `elp` state or the shadow stack pointer is allowed via the ptrace set interface. It is expected that `gdb` might need this to fix up `elp` state or the `shadow stack` pointer.
Signed-off-by: Deepak Gupta debug@rivosinc.com --- arch/riscv/include/uapi/asm/ptrace.h | 18 ++++++ arch/riscv/kernel/ptrace.c | 83 ++++++++++++++++++++++++++++ include/uapi/linux/elf.h | 1 + 3 files changed, 102 insertions(+)
diff --git a/arch/riscv/include/uapi/asm/ptrace.h b/arch/riscv/include/uapi/asm/ptrace.h index a38268b19c3d..512be06a8661 100644 --- a/arch/riscv/include/uapi/asm/ptrace.h +++ b/arch/riscv/include/uapi/asm/ptrace.h @@ -127,6 +127,24 @@ struct __riscv_v_regset_state { */ #define RISCV_MAX_VLENB (8192)
+struct __cfi_status {
+	/* indirect branch tracking state */
+	__u64 lp_en : 1;
+	__u64 lp_lock : 1;
+	__u64 elp_state : 1;
+
+	/* shadow stack status */
+	__u64 shstk_en : 1;
+	__u64 shstk_lock : 1;
+
+	__u64 rsvd : sizeof(__u64) * 8 - 5; /* sizeof() is in bytes; remaining 59 bits */
+};
+
+struct user_cfi_state {
+	struct __cfi_status cfi_status;
+	__u64 shstk_ptr;
+};
+
 #endif /* __ASSEMBLY__ */
#endif /* _UAPI_ASM_RISCV_PTRACE_H */ diff --git a/arch/riscv/kernel/ptrace.c b/arch/riscv/kernel/ptrace.c index e8515aa9d80b..33d4b32cc6a7 100644 --- a/arch/riscv/kernel/ptrace.c +++ b/arch/riscv/kernel/ptrace.c @@ -19,6 +19,7 @@ #include <linux/regset.h> #include <linux/sched.h> #include <linux/sched/task_stack.h> +#include <asm/usercfi.h>
enum riscv_regset { REGSET_X, @@ -28,6 +29,9 @@ enum riscv_regset { #ifdef CONFIG_RISCV_ISA_V REGSET_V, #endif +#ifdef CONFIG_RISCV_USER_CFI + REGSET_CFI, +#endif };
static int riscv_gpr_get(struct task_struct *target, @@ -152,6 +156,75 @@ static int riscv_vr_set(struct task_struct *target, } #endif
+#ifdef CONFIG_RISCV_USER_CFI
+static int riscv_cfi_get(struct task_struct *target,
+			 const struct user_regset *regset,
+			 struct membuf to)
+{
+	struct user_cfi_state user_cfi = {}; /* zeroed so rsvd bits don't leak stack data */
+	struct pt_regs *regs;
+
+	regs = task_pt_regs(target);
+
+	user_cfi.cfi_status.lp_en = is_indir_lp_enabled(target);
+	user_cfi.cfi_status.lp_lock = is_indir_lp_locked(target);
+	/* normalize to 0/1; SR_ELP is not bit 0 and would be truncated otherwise */
+	user_cfi.cfi_status.elp_state = !!(regs->status & SR_ELP);
+
+	user_cfi.cfi_status.shstk_en = is_shstk_enabled(target);
+	user_cfi.cfi_status.shstk_lock = is_shstk_locked(target);
+	user_cfi.shstk_ptr = get_active_shstk(target);
+
+	return membuf_write(&to, &user_cfi, sizeof(user_cfi));
+}
+
+/*
+ * Does it make sense to allow enable / disable of cfi via ptrace?
+ * Not allowing enable / disable / locking control via ptrace for now.
+ * Setting shadow stack pointer is allowed. GDB might use it to unwind or
+ * some other fixup. Similarly gdb might want to suppress elp and may want
+ * to reset elp state.
+ */
+static int riscv_cfi_set(struct task_struct *target,
+			 const struct user_regset *regset,
+			 unsigned int pos, unsigned int count,
+			 const void *kbuf, const void __user *ubuf)
+{
+	int ret;
+	struct user_cfi_state user_cfi;
+	struct pt_regs *regs;
+
+	regs = task_pt_regs(target);
+
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &user_cfi, 0, -1);
+	if (ret)
+		return ret;
+
+	/*
+	 * Not allowing enabling or locking shadow stack or landing pad
+	 * There is no disabling of shadow stack or landing pad via ptrace
+	 * rsvd field must be zero so that those bits can be given a meaning in future
+	 */
+	if (user_cfi.cfi_status.lp_en || user_cfi.cfi_status.lp_lock ||
+	    user_cfi.cfi_status.shstk_en || user_cfi.cfi_status.shstk_lock ||
+	    user_cfi.cfi_status.rsvd)
+		return -EINVAL;
+
+	/* If lpad is enabled on target and ptrace requests to set / clear elp, do that */
+	if (is_indir_lp_enabled(target)) {
+		if (user_cfi.cfi_status.elp_state) /* set elp state */
+			regs->status |= SR_ELP;
+		else
+			regs->status &= ~SR_ELP; /* clear elp state */
+	}
+
+	/* If shadow stack enabled on target, set new shadow stack pointer */
+	if (is_shstk_enabled(target))
+		set_active_shstk(target, user_cfi.shstk_ptr);
+
+	return 0;
+}
+#endif
+
 static const struct user_regset riscv_user_regset[] = {
 	[REGSET_X] = {
 		.core_note_type = NT_PRSTATUS,
@@ -182,6 +255,16 @@ static const struct user_regset riscv_user_regset[] = {
 		.set = riscv_vr_set,
 	},
 #endif
+#ifdef CONFIG_RISCV_USER_CFI
+	[REGSET_CFI] = {
+		.core_note_type = NT_RISCV_USER_CFI,
+		.align = sizeof(__u64),
+		.n = sizeof(struct user_cfi_state) / sizeof(__u64),
+		.size = sizeof(__u64),
+		.regset_get = riscv_cfi_get,
+		.set = riscv_cfi_set,
+	}
+#endif
 };
static const struct user_regset_view riscv_user_native_view = { diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h index 9417309b7230..f60b2de66b1c 100644 --- a/include/uapi/linux/elf.h +++ b/include/uapi/linux/elf.h @@ -447,6 +447,7 @@ typedef struct elf64_shdr { #define NT_MIPS_MSA 0x802 /* MIPS SIMD registers */ #define NT_RISCV_CSR 0x900 /* RISC-V Control and Status Registers */ #define NT_RISCV_VECTOR 0x901 /* RISC-V vector registers */ +#define NT_RISCV_USER_CFI 0x902 /* RISC-V shadow stack state */ #define NT_LOONGARCH_CPUCFG 0xa00 /* LoongArch CPU config registers */ #define NT_LOONGARCH_CSR 0xa01 /* LoongArch control and status registers */ #define NT_LOONGARCH_LSX 0xa02 /* LoongArch Loongson SIMD Extension registers */
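As a rough illustration of the ptrace set policy described in the commit message (enable and lock bits may not be written, reserved bits must be zero, `elp` may be changed), a debugger could pre-check its request as sketched below. The mask names and cfi_set_request_ok() are hypothetical, and the bit positions assume the usual LSB-first bitfield packing on little-endian RISC-V, which the struct definition alone does not guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions of the __cfi_status members (LSB-first packing). */
#define CFI_LP_EN	(UINT64_C(1) << 0)
#define CFI_LP_LOCK	(UINT64_C(1) << 1)
#define CFI_ELP_STATE	(UINT64_C(1) << 2)
#define CFI_SHSTK_EN	(UINT64_C(1) << 3)
#define CFI_SHSTK_LOCK	(UINT64_C(1) << 4)
#define CFI_RSVD_MASK	(~UINT64_C(0) << 5)

/*
 * Returns 1 if a status word only touches fields a tracer may write
 * under the stated policy (elp state), 0 otherwise.
 */
static int cfi_set_request_ok(uint64_t status)
{
	const uint64_t forbidden = CFI_LP_EN | CFI_LP_LOCK |
				   CFI_SHSTK_EN | CFI_SHSTK_LOCK |
				   CFI_RSVD_MASK;

	return (status & forbidden) == 0;
}
```

A tracer would then pass the status word together with the new shadow stack pointer through PTRACE_SETREGSET using the NT_RISCV_USER_CFI note type added in this patch.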
Adding enumeration of zicfilp and zicfiss extensions in hwprobe syscall.
Signed-off-by: Deepak Gupta debug@rivosinc.com --- arch/riscv/include/uapi/asm/hwprobe.h | 2 ++ arch/riscv/kernel/sys_hwprobe.c | 2 ++ 2 files changed, 4 insertions(+)
diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
index 9f2a8e3ff204..4ffc6de1eed7 100644
--- a/arch/riscv/include/uapi/asm/hwprobe.h
+++ b/arch/riscv/include/uapi/asm/hwprobe.h
@@ -59,6 +59,8 @@ struct riscv_hwprobe {
 #define RISCV_HWPROBE_EXT_ZTSO (1ULL << 33)
 #define RISCV_HWPROBE_EXT_ZACAS (1ULL << 34)
 #define RISCV_HWPROBE_EXT_ZICOND (1ULL << 35)
+#define RISCV_HWPROBE_EXT_ZICFILP (1ULL << 36)
+#define RISCV_HWPROBE_EXT_ZICFISS (1ULL << 37)
 #define RISCV_HWPROBE_KEY_CPUPERF_0 5
 #define RISCV_HWPROBE_MISALIGNED_UNKNOWN (0 << 0)
 #define RISCV_HWPROBE_MISALIGNED_EMULATED (1 << 0)
diff --git a/arch/riscv/kernel/sys_hwprobe.c b/arch/riscv/kernel/sys_hwprobe.c
index a7c56b41efd2..ddc7a9612a90 100644
--- a/arch/riscv/kernel/sys_hwprobe.c
+++ b/arch/riscv/kernel/sys_hwprobe.c
@@ -111,6 +111,8 @@ static void hwprobe_isa_ext0(struct riscv_hwprobe *pair,
 		EXT_KEY(ZTSO);
 		EXT_KEY(ZACAS);
 		EXT_KEY(ZICOND);
+		EXT_KEY(ZICFILP);
+		EXT_KEY(ZICFISS);
if (has_vector()) { EXT_KEY(ZVBB);
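On the consumer side, user space would test the two new bits in the extension bitmap returned by hwprobe. A small sketch follows; the bit values are copied from the hunk above, while user_cfi_caps and decode_cfi_caps() are hypothetical helpers, and fetching the bitmap itself (via the riscv_hwprobe syscall with the RISCV_HWPROBE_KEY_IMA_EXT_0 key) is left out so the snippet builds on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as proposed in this patch. */
#define RISCV_HWPROBE_EXT_ZICFILP (1ULL << 36)
#define RISCV_HWPROBE_EXT_ZICFISS (1ULL << 37)

struct user_cfi_caps {
	int has_lpad;	/* zicfilp: landing pads / forward edge */
	int has_shstk;	/* zicfiss: shadow stack / backward edge */
};

/* Decode the cfi-related bits from an IMA_EXT_0 style bitmap. */
static struct user_cfi_caps decode_cfi_caps(uint64_t ima_ext_0)
{
	struct user_cfi_caps caps;

	caps.has_lpad = (ima_ext_0 & RISCV_HWPROBE_EXT_ZICFILP) != 0;
	caps.has_shstk = (ima_ext_0 & RISCV_HWPROBE_EXT_ZICFISS) != 0;
	return caps;
}
```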
This patch creates a config for shadow stack support and landing pad instr support. Both can be enabled by selecting `CONFIG_RISCV_USER_CFI`. Selecting `CONFIG_RISCV_USER_CFI` wires up the path to enumerate CPU support; if CPU support exists, the kernel will support cpu assisted user mode cfi.
Signed-off-by: Deepak Gupta debug@rivosinc.com --- arch/riscv/Kconfig | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 7e0b2bcc388f..d6f1303ef660 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -203,6 +203,24 @@ config ARCH_HAS_BROKEN_DWARF5 # https://github.com/llvm/llvm-project/commit/7ffabb61a5569444b5ac9322e22e5471... depends on LD_IS_LLD && LLD_VERSION < 180000
+config RISCV_USER_CFI
+	def_bool y
+	bool "riscv userspace control flow integrity"
+	depends on 64BIT && $(cc-option,-mabi=lp64 -march=rv64ima_zicfiss)
+	depends on RISCV_ALTERNATIVE
+	select ARCH_USES_HIGH_VMA_FLAGS
+	help
+	  Provides CPU assisted control flow integrity to userspace tasks.
+	  Control flow integrity is provided by implementing shadow stack for
+	  backward edge and indirect branch tracking for forward edge in program.
+	  Shadow stack protection is a hardware feature that detects function
+	  return address corruption. This helps mitigate ROP attacks.
+	  Indirect branch tracking enforces that all indirect branches must land
+	  on a landing pad instruction else CPU will fault. This mitigates against
+	  JOP / COP attacks. Applications must be enabled to use it, and old user-
+	  space does not get protection "for free".
+	default y
+
 config ARCH_MMAP_RND_BITS_MIN
 	default 18 if 64BIT
 	default 8
Adding documentation on landing pad aka indirect branch tracking on riscv and kernel interfaces exposed so that user tasks can enable it.
Signed-off-by: Deepak Gupta debug@rivosinc.com --- Documentation/arch/riscv/zicfilp.rst | 104 +++++++++++++++++++++++++++ 1 file changed, 104 insertions(+) create mode 100644 Documentation/arch/riscv/zicfilp.rst
diff --git a/Documentation/arch/riscv/zicfilp.rst b/Documentation/arch/riscv/zicfilp.rst new file mode 100644 index 000000000000..3007c81f0465 --- /dev/null +++ b/Documentation/arch/riscv/zicfilp.rst @@ -0,0 +1,104 @@ +.. SPDX-License-Identifier: GPL-2.0 + +:Author: Deepak Gupta debug@rivosinc.com +:Date: 12 January 2024 + +==================================================== +Tracking indirect control transfers on RISC-V Linux +==================================================== + +This document briefly describes the interface provided to userspace by Linux +to enable indirect branch tracking for user mode applications on RISV-V + +1. Feature Overview +-------------------- + +Memory corruption issues usually result in to crashes, however when in hands of +an adversary and if used creatively can result into variety security issues. + +One of those security issues can be code re-use attacks on program where adversary +can use corrupt function pointers and chain them together to perform jump oriented +programming (JOP) or call oriented programming (COP) and thus compromising control +flow integrity (CFI) of the program. + +Function pointers live in read-write memory and thus are susceptible to corruption +and allows an adversary to reach any program counter (PC) in address space. On +RISC-V zicfilp extension enforces a restriction on such indirect control transfers + + - indirect control transfers must land on a landing pad instruction `lpad`. + There are two exception to this rule + - rs1 = x1 or rs1 = x5, i.e. a return from a function and returns are + protected using shadow stack (see zicfiss.rst) + + - rs1 = x7. On RISC-V compiler usually does below to reach function + which is beyond the offset possible J-type instruction. + + "auipc x7, <imm>" + "jalr (x7)" + + Such form of indirect control transfer are still immutable and don't rely + on memory and thus rs1=x7 is exempted from tracking and considered software + guarded jumps. 
+ +`lpad` instruction is pseudo of `auipc rd, <imm_20bit>` and is a HINT nop. `lpad` +instruction must be aligned on 4 byte boundary and compares 20 bit immediate with x7. +If `imm_20bit` == 0, CPU don't perform any comparision with x7. If `imm_20bit` != 0, +then `imm_20bit` must match x7 else CPU will raise `software check exception` +(cause=18)with `*tval = 2`. + +Compiler can generate a hash over function signatures and setup them (truncated +to 20bit) in x7 at callsites and function proglogs can have `lpad` with same +function hash. This further reduces number of program counters a call site can +reach. + +2. ELF and psABI +----------------- + +Toolchain sets up `GNU_PROPERTY_RISCV_FEATURE_1_FCFI` for property +`GNU_PROPERTY_RISCV_FEATURE_1_AND` in notes section of the object file. + +3. Linux enabling +------------------ + +User space programs can have multiple shared objects loaded in its address space +and it's a difficult task to make sure all the dependencies have been compiled +with support of indirect branch. Thus it's left to dynamic loader to enable +indirect branch tracking for the program. + +4. prctl() enabling +-------------------- + +`PR_SET_INDIR_BR_LP_STATUS` / `PR_GET_INDIR_BR_LP_STATUS` / +`PR_LOCK_INDIR_BR_LP_STATUS` are three prctls added to manage indirect branch +tracking. prctls are arch agnostic and returns -EINVAL on other arches. + +`PR_SET_INDIR_BR_LP_STATUS`: If arg1 `PR_INDIR_BR_LP_ENABLE` and if CPU supports +`zicfilp` then kernel will enabled indirect branch tracking for the task. +Dynamic loader can issue this `prctl` once it has determined that all the objects +loaded in address space support indirect branch tracking. Additionally if there is +a `dlopen` to an object which wasn't compiled with `zicfilp`, dynamic loader can +issue this prctl with arg1 set to 0 (i.e. `PR_INDIR_BR_LP_ENABLE` being clear) + +`PR_GET_INDIR_BR_LP_STATUS`: Returns current status of indirect branch tracking. 
+If enabled it'll return `PR_INDIR_BR_LP_ENABLE` + +`PR_LOCK_INDIR_BR_LP_STATUS`: Locks current status of indirect branch tracking on +the task. User space may want to run with strict security posture and wouldn't want +loading of objects without `zicfilp` support in it and thus would want to disallow +disabling of indirect branch tracking. In that case user space can use this prctl +to lock current settings. + +5. violations related to indirect branch tracking +-------------------------------------------------- + +Pertaining to indirect branch tracking, CPU raises software check exception in +following conditions + - missing `lpad` after indirect call / jmp + - `lpad` not on 4 byte boundary + - `imm_20bit` embedded in `lpad` instruction doesn't match with `x7` + +In all 3 cases, `*tval = 2` is captured and software check exception is raised +(cause=18) + +Linux kernel will treat this as `SIGSEV`` with code = `SEGV_CPERR` and follow +normal course of signal delivery.
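The landing pad rule described in the documentation above can be restated as a small predicate. This is a behavioral model only, not how hardware or the kernel implements it; lpad_check() and its parameters are hypothetical, and the comparison against bits 31:12 of x7 is my reading of the zicfilp specification (the document itself only says the label is compared with x7).

```c
#include <assert.h>
#include <stdint.h>

enum lpad_result { LPAD_OK, LPAD_FAULT };

/*
 * target_pc: address reached by the tracked indirect branch
 * is_lpad:   non-zero if the instruction at target_pc decodes as lpad
 * imm20:     20-bit label embedded in that lpad
 * x7:        value of x7 at the time of the branch
 */
static enum lpad_result lpad_check(uint64_t target_pc, int is_lpad,
				   uint32_t imm20, uint64_t x7)
{
	uint32_t label;

	if (!is_lpad)
		return LPAD_FAULT;	/* missing lpad */
	if (target_pc & 3)
		return LPAD_FAULT;	/* lpad not 4 byte aligned */
	if ((imm20 & 0xFFFFF) == 0)
		return LPAD_OK;		/* zero label: no comparison */

	label = (uint32_t)(x7 >> 12) & 0xFFFFF;	/* assumed x7[31:12] */
	return label == (imm20 & 0xFFFFF) ? LPAD_OK : LPAD_FAULT;
}
```

Each LPAD_FAULT case corresponds to one of the three software check conditions listed in section 5 of the document.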
On Wed, Apr 03, 2024 at 04:35:15PM -0700, Deepak Gupta wrote:
Adding documentation on landing pad aka indirect branch tracking on riscv and kernel interfaces exposed so that user tasks can enable it.
Signed-off-by: Deepak Gupta debug@rivosinc.com
Documentation/arch/riscv/zicfilp.rst | 104 +++++++++++++++++++++++++++ 1 file changed, 104 insertions(+) create mode 100644 Documentation/arch/riscv/zicfilp.rst
diff --git a/Documentation/arch/riscv/zicfilp.rst b/Documentation/arch/riscv/zicfilp.rst new file mode 100644 index 000000000000..3007c81f0465 --- /dev/null +++ b/Documentation/arch/riscv/zicfilp.rst @@ -0,0 +1,104 @@ +.. SPDX-License-Identifier: GPL-2.0
+:Author: Deepak Gupta debug@rivosinc.com +:Date: 12 January 2024
+==================================================== +Tracking indirect control transfers on RISC-V Linux +====================================================
+This document briefly describes the interface provided to userspace by Linux +to enable indirect branch tracking for user mode applications on RISV-V
+1. Feature Overview +--------------------
+Memory corruption issues usually result in to crashes, however when in hands of +an adversary and if used creatively can result into variety security issues.
+One of those security issues can be code re-use attacks on program where adversary +can use corrupt function pointers and chain them together to perform jump oriented +programming (JOP) or call oriented programming (COP) and thus compromising control +flow integrity (CFI) of the program.
+Function pointers live in read-write memory and thus are susceptible to corruption +and allows an adversary to reach any program counter (PC) in address space. On +RISC-V zicfilp extension enforces a restriction on such indirect control transfers
- indirect control transfers must land on a landing pad instruction `lpad`.
There are two exception to this rule
- rs1 = x1 or rs1 = x5, i.e. a return from a function and returns are
What is a return that is not a return from a function?
protected using shadow stack (see zicfiss.rst)
- rs1 = x7. On RISC-V compiler usually does below to reach function
which is beyond the offset possible J-type instruction.
"auipc x7, <imm>"
"jalr (x7)"
Such form of indirect control transfer are still immutable and don't rely
on memory and thus rs1=x7 is exempted from tracking and considered software
guarded jumps.
+`lpad` instruction is pseudo of `auipc rd, <imm_20bit>` and is a HINT nop. `lpad`
I think this should say "x0" instead of "rd", or mention that rd=x0.
+instruction must be aligned on 4 byte boundary and compares 20 bit immediate with x7. +If `imm_20bit` == 0, CPU don't perform any comparision with x7. If `imm_20bit` != 0, +then `imm_20bit` must match x7 else CPU will raise `software check exception` +(cause=18)with `*tval = 2`.
+Compiler can generate a hash over function signatures and setup them (truncated +to 20bit) in x7 at callsites and function proglogs can have `lpad` with same
"prologues" instead of "proglogs"
+function hash. This further reduces number of program counters a call site can +reach.
+2. ELF and psABI +-----------------
+Toolchain sets up `GNU_PROPERTY_RISCV_FEATURE_1_FCFI` for property +`GNU_PROPERTY_RISCV_FEATURE_1_AND` in notes section of the object file.
+3. Linux enabling +------------------
+User space programs can have multiple shared objects loaded in its address space +and it's a difficult task to make sure all the dependencies have been compiled +with support of indirect branch. Thus it's left to dynamic loader to enable +indirect branch tracking for the program.
+4. prctl() enabling +--------------------
+`PR_SET_INDIR_BR_LP_STATUS` / `PR_GET_INDIR_BR_LP_STATUS` / +`PR_LOCK_INDIR_BR_LP_STATUS` are three prctls added to manage indirect branch +tracking. prctls are arch agnostic and returns -EINVAL on other arches.
+`PR_SET_INDIR_BR_LP_STATUS`: If arg1 `PR_INDIR_BR_LP_ENABLE` and if CPU supports +`zicfilp` then kernel will enabled indirect branch tracking for the task. +Dynamic loader can issue this `prctl` once it has determined that all the objects +loaded in address space support indirect branch tracking. Additionally if there is +a `dlopen` to an object which wasn't compiled with `zicfilp`, dynamic loader can +issue this prctl with arg1 set to 0 (i.e. `PR_INDIR_BR_LP_ENABLE` being clear)
+`PR_GET_INDIR_BR_LP_STATUS`: Returns current status of indirect branch tracking. +If enabled it'll return `PR_INDIR_BR_LP_ENABLE`
+`PR_LOCK_INDIR_BR_LP_STATUS`: Locks current status of indirect branch tracking on +the task. User space may want to run with strict security posture and wouldn't want +loading of objects without `zicfilp` support in it and thus would want to disallow +disabling of indirect branch tracking. In that case user space can use this prctl +to lock current settings.
+5. violations related to indirect branch tracking +--------------------------------------------------
+Pertaining to indirect branch tracking, CPU raises software check exception in +following conditions
- missing `lpad` after indirect call / jmp
- `lpad` not on 4 byte boundary
- `imm_20bit` embedded in `lpad` instruction doesn't match with `x7`
+In all 3 cases, `*tval = 2` is captured and software check exception is raised +(cause=18)
+Linux kernel will treat this as `SIGSEV`` with code = `SEGV_CPERR` and follow
+normal course of signal delivery.
2.43.2
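The set / get / lock semantics of the three prctls described in section 4 can be summarized with a small state model. This is a user-space sketch, not kernel code: lp_status_model and the model_* helpers are hypothetical, MODEL_LP_ENABLE merely stands in for `PR_INDIR_BR_LP_ENABLE`, and returning -1 for a locked or unsupported request is an assumption, since the document does not spell out those error paths.

```c
#include <assert.h>

#define MODEL_LP_ENABLE 1UL	/* stand-in for PR_INDIR_BR_LP_ENABLE */

struct lp_status_model {
	int enabled;	/* indirect branch tracking currently on */
	int locked;	/* further changes disallowed */
};

/* Models PR_SET_INDIR_BR_LP_STATUS; returns 0 on success, -1 on error. */
static int model_set(struct lp_status_model *t, unsigned long arg,
		     int cpu_has_zicfilp)
{
	if (!cpu_has_zicfilp)
		return -1;	/* assumed: no hardware support, no enabling */
	if (arg & ~MODEL_LP_ENABLE)
		return -1;	/* assumed: unknown bits rejected */
	if (t->locked)
		return -1;	/* assumed: locked status cannot change */
	t->enabled = (arg & MODEL_LP_ENABLE) != 0;
	return 0;
}

/* Models PR_GET_INDIR_BR_LP_STATUS. */
static unsigned long model_get(const struct lp_status_model *t)
{
	return t->enabled ? MODEL_LP_ENABLE : 0;
}

/* Models PR_LOCK_INDIR_BR_LP_STATUS. */
static int model_lock(struct lp_status_model *t)
{
	t->locked = 1;
	return 0;
}
```

In this model a dynamic loader enables tracking once all objects support it, may clear it again on a `dlopen` of an object built without `zicfilp`, and loses that ability once the status is locked.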
On Fri, May 10, 2024 at 01:30:32PM -0700, Charlie Jenkins wrote:
On Wed, Apr 03, 2024 at 04:35:15PM -0700, Deepak Gupta wrote:
Adding documentation on landing pad aka indirect branch tracking on riscv and kernel interfaces exposed so that user tasks can enable it.
Signed-off-by: Deepak Gupta debug@rivosinc.com
Documentation/arch/riscv/zicfilp.rst | 104 +++++++++++++++++++++++++++ 1 file changed, 104 insertions(+) create mode 100644 Documentation/arch/riscv/zicfilp.rst
diff --git a/Documentation/arch/riscv/zicfilp.rst b/Documentation/arch/riscv/zicfilp.rst new file mode 100644 index 000000000000..3007c81f0465 --- /dev/null +++ b/Documentation/arch/riscv/zicfilp.rst @@ -0,0 +1,104 @@ +.. SPDX-License-Identifier: GPL-2.0
+:Author: Deepak Gupta debug@rivosinc.com +:Date: 12 January 2024
+==================================================== +Tracking indirect control transfers on RISC-V Linux +====================================================
+This document briefly describes the interface provided to userspace by Linux +to enable indirect branch tracking for user mode applications on RISV-V
+1. Feature Overview +--------------------
+Memory corruption issues usually result in to crashes, however when in hands of +an adversary and if used creatively can result into variety security issues.
+One of those security issues can be code re-use attacks on program where adversary +can use corrupt function pointers and chain them together to perform jump oriented +programming (JOP) or call oriented programming (COP) and thus compromising control +flow integrity (CFI) of the program.
+Function pointers live in read-write memory and thus are susceptible to corruption +and allows an adversary to reach any program counter (PC) in address space. On +RISC-V zicfilp extension enforces a restriction on such indirect control transfers
- indirect control transfers must land on a landing pad instruction `lpad`.
There are two exception to this rule
- rs1 = x1 or rs1 = x5, i.e. a return from a function and returns are
What is a return that is not a return from a function?
Those would be a jump or a call (depending on the convention of whether the return address is saved in x1/x5).
protected using shadow stack (see zicfiss.rst)
- rs1 = x7. On RISC-V compiler usually does below to reach function
which is beyond the offset possible J-type instruction.
"auipc x7, <imm>"
"jalr (x7)"
Such forms of indirect control transfer are still immutable and don't rely
on memory, and thus rs1=x7 is exempted from tracking and considered a software
guarded jump.
+`lpad` instruction is pseudo of `auipc rd, <imm_20bit>` and is a HINT nop. `lpad`
I think this should say "x0" or instead of "rd", or mention that rd=x0.
Yeah I missed that. will fix it.
+instruction must be aligned on 4 byte boundary and compares 20 bit immediate with x7.
+If `imm_20bit` == 0, CPU doesn't perform any comparison with x7. If `imm_20bit` != 0,
+then `imm_20bit` must match x7 else CPU will raise `software check exception`
+(cause=18) with `*tval = 2`.
+Compiler can generate a hash over function signatures and setup them (truncated +to 20bit) in x7 at callsites and function proglogs can have `lpad` with same
"prologues" instead of "proglogs"
Will fix it.
+function hash. This further reduces number of program counters a call site can +reach.
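To make the `lpad` rules above concrete, here is a small software model of the check in C. It is only a sketch, not kernel or hardware code. The function name `lpad_check_passes` is made up for illustration, and the choice to compare the label against bits 31:12 of x7 is an assumption taken from the zicfilp specification (that is where `lui x7, <label>` places it); the text in this patch only says "compares 20 bit immediate with x7".

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model only; the real check is performed by the CPU. */
#define LPAD_LABEL_MASK 0xFFFFFu	/* 20-bit landing pad label */

/*
 * Returns true if an indirect branch landing on an `lpad` at `pc`, carrying
 * `lpad_label`, would pass the checks described in the document.
 * Assumption: the expected label lives in bits 31:12 of x7.
 */
bool lpad_check_passes(uint64_t pc, uint32_t lpad_label, uint64_t x7)
{
	uint32_t label = lpad_label & LPAD_LABEL_MASK;
	uint32_t expected = (uint32_t)(x7 >> 12) & LPAD_LABEL_MASK;

	if (pc & 0x3)		/* lpad must sit on a 4 byte boundary */
		return false;
	if (label == 0)		/* label 0: no comparison with x7 */
		return true;
	return label == expected;	/* mismatch models cause=18, *tval = 2 */
}
```

A label of 0 acts as a coarse "any indirect target" pad, while a non-zero label narrows the set of call sites that may land there.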
+2. ELF and psABI
+-----------------

+Toolchain sets up `GNU_PROPERTY_RISCV_FEATURE_1_FCFI` for property
+`GNU_PROPERTY_RISCV_FEATURE_1_AND` in notes section of the object file.

+3. Linux enabling
+------------------

+User space programs can have multiple shared objects loaded in their address space
+and it's a difficult task to make sure all the dependencies have been compiled
+with support for indirect branch tracking. Thus it's left to dynamic loader to enable
+indirect branch tracking for the program.

+4. prctl() enabling
+--------------------
+`PR_SET_INDIR_BR_LP_STATUS` / `PR_GET_INDIR_BR_LP_STATUS` /
+`PR_LOCK_INDIR_BR_LP_STATUS` are three prctls added to manage indirect branch
+tracking. prctls are arch agnostic and return -EINVAL on other arches.

+`PR_SET_INDIR_BR_LP_STATUS`: If arg1 is `PR_INDIR_BR_LP_ENABLE` and if CPU supports
+`zicfilp` then kernel will enable indirect branch tracking for the task.
+Dynamic loader can issue this `prctl` once it has determined that all the objects
+loaded in address space support indirect branch tracking. Additionally if there is
+a `dlopen` to an object which wasn't compiled with `zicfilp`, dynamic loader can
+issue this prctl with arg1 set to 0 (i.e. `PR_INDIR_BR_LP_ENABLE` being clear).

+`PR_GET_INDIR_BR_LP_STATUS`: Returns current status of indirect branch tracking.
+If enabled it'll return `PR_INDIR_BR_LP_ENABLE`.

+`PR_LOCK_INDIR_BR_LP_STATUS`: Locks current status of indirect branch tracking on
+the task. User space may want to run with strict security posture and wouldn't want
+loading of objects without `zicfilp` support in it and thus would want to disallow
+disabling of indirect branch tracking. In that case user space can use this prctl
+to lock current settings.
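For illustration, the enable-then-lock sequence a dynamic loader might issue could look roughly like the sketch below. This is not taken from glibc or from this series. The helper name `enable_and_lock_branch_tracking` is hypothetical, the zero padding of the unused prctl arguments is an assumption, and the code only does anything when the installed headers already define the proposed `PR_*_INDIR_BR_LP_STATUS` constants.

```c
#include <stdio.h>
#include <sys/prctl.h>

/*
 * Hypothetical helper: enable indirect branch tracking for the calling task
 * and then lock the setting.
 * Returns 0 on success, -1 if a prctl failed, 1 if the headers in use do
 * not define the proposed constants (nothing is attempted in that case).
 */
int enable_and_lock_branch_tracking(void)
{
#if defined(PR_SET_INDIR_BR_LP_STATUS) && \
    defined(PR_LOCK_INDIR_BR_LP_STATUS) && \
    defined(PR_INDIR_BR_LP_ENABLE)
	/* arg1 carries the enable bit; remaining args assumed to be zero */
	if (prctl(PR_SET_INDIR_BR_LP_STATUS, PR_INDIR_BR_LP_ENABLE, 0, 0, 0)) {
		perror("PR_SET_INDIR_BR_LP_STATUS");
		return -1;
	}
	/* after this, later attempts to disable tracking should be refused */
	if (prctl(PR_LOCK_INDIR_BR_LP_STATUS, 0, 0, 0, 0)) {
		perror("PR_LOCK_INDIR_BR_LP_STATUS");
		return -1;
	}
	return 0;
#else
	fprintf(stderr, "headers lack PR_*_INDIR_BR_LP_STATUS, nothing done\n");
	return 1;
#endif
}
```

Note that only code built with landing pads should call something like this; enabling tracking in a binary without `lpad` instructions would fault on the next indirect call.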
+5. violations related to indirect branch tracking
+--------------------------------------------------

+Pertaining to indirect branch tracking, CPU raises software check exception in
+following conditions:
- missing `lpad` after indirect call / jmp
- `lpad` not on 4 byte boundary
- `imm_20bit` embedded in `lpad` instruction doesn't match with `x7`
+In all 3 cases, `*tval = 2` is captured and software check exception is raised
+(cause=18).

+Linux kernel will treat this as `SIGSEGV` with code = `SEGV_CPERR` and follow
+normal course of signal delivery.
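From the user space side, telling a control flow violation apart from an ordinary segfault comes down to looking at `si_code`. A minimal sketch follows; the helper name `is_cfi_violation` is made up for illustration, and the fallback value 10 for `SEGV_CPERR` is what the generic Linux uapi siginfo header uses, assumed here only for C libraries that do not export the constant yet.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef SEGV_CPERR
#define SEGV_CPERR 10	/* assumed fallback, matches Linux uapi siginfo.h */
#endif

/*
 * Hypothetical helper for use inside an SA_SIGINFO handler: true when the
 * signal describes a control protection (landing pad / shadow stack) fault.
 */
bool is_cfi_violation(const siginfo_t *si)
{
	return si != NULL &&
	       si->si_signo == SIGSEGV &&
	       si->si_code == SEGV_CPERR;
}
```

A handler registered with `sigaction()` and `SA_SIGINFO` can call this on its `siginfo_t *` argument before deciding how to report the fault.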
2.43.2
Adding documentation on shadow stack for user mode on riscv and kernel interfaces exposed so that user tasks can enable it.
Signed-off-by: Deepak Gupta debug@rivosinc.com
---
 Documentation/arch/riscv/zicfiss.rst | 169 +++++++++++++++++++++++++++
 1 file changed, 169 insertions(+)
 create mode 100644 Documentation/arch/riscv/zicfiss.rst
diff --git a/Documentation/arch/riscv/zicfiss.rst b/Documentation/arch/riscv/zicfiss.rst
new file mode 100644
index 000000000000..f133b6af9c15
--- /dev/null
+++ b/Documentation/arch/riscv/zicfiss.rst
@@ -0,0 +1,169 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+:Author: Deepak Gupta debug@rivosinc.com
+:Date: 12 January 2024
+
+=========================================================
+Shadow stack to protect function returns on RISC-V Linux
+=========================================================
+
+This document briefly describes the interface provided to userspace by Linux
+to enable shadow stack for user mode applications on RISC-V.
+
+1. Feature Overview
+--------------------
+
+Memory corruption issues usually result in crashes; however, in the hands of
+an adversary and if used creatively, they can result in a variety of security issues.
+
+One of those security issues is a code re-use attack, where an adversary
+can corrupt return addresses present on stack and chain them together to perform
+return oriented programming (ROP), thus compromising the control flow integrity (CFI)
+of the program.
+
+Return addresses live on stack and thus in read-write memory, so they are
+susceptible to corruption, which allows an adversary to reach any program counter
+(PC) in address space. On RISC-V, the `zicfiss` extension provides an alternate stack,
+the `shadow stack`, on which return addresses can be safely placed in prolog of the
+function and retrieved in epilog. `zicfiss` extension makes following changes:
+
+  - PTE encodings for shadow stack virtual memory
+    An earlier reserved encoding in first stage translation i.e.
+    PTE.R=0, PTE.W=1, PTE.X=0 becomes PTE encoding for shadow stack pages.
+
+  - `sspush x1/x5` instruction pushes (stores) `x1/x5` to shadow stack.
+
+  - `sspopchk x1/x5` instruction pops (loads) from shadow stack and compares
+    with `x1/x5` and if un-equal, CPU raises `software check exception` with
+    `*tval = 3`
+
+Compiler toolchain makes sure that function prologs have `sspush x1/x5` to save return
+address on shadow stack in addition to regular stack. Similarly function epilogs have
+`ld x5, offset(x2)`; `sspopchk x5` to ensure that popped value from regular stack
+matches with popped value from shadow stack.
+
+2. Shadow stack protections and Linux memory manager
+-----------------------------------------------------
+
+As mentioned earlier, shadow stacks get new page table encodings and thus have some
+special properties assigned to them and to the instructions that operate on them, as below:
+
+  - Regular stores to shadow stack memory raise store access faults.
+    This way shadow stack memory is protected from stray inadvertent
+    writes.
+
+  - Regular loads from shadow stack memory are allowed.
+    This allows stack trace utilities or backtrace functions to read
+    true callstack (not tampered).
+
+  - Only shadow stack instructions can generate shadow stack load or
+    shadow stack store.
+
+  - Shadow stack load / shadow stack store on read-only memory raises
+    AMO/store page fault. Thus both `sspush x1/x5` and `sspopchk x1/x5`
+    will raise AMO/store page fault. This simplifies COW handling in kernel.
+    During fork, kernel can convert shadow stack pages into read-only
+    memory (as it does for regular read-write memory) and as soon as
+    subsequent `sspush` or `sspopchk` in userspace is encountered, then
+    kernel can perform COW.
+
+  - Shadow stack load / shadow stack store on read-write or
+    read-write-execute memory raises an access fault. This is a fatal
+    condition because shadow stack should never be operating on
+    read-write or read-write-execute memory.
+
+3. ELF and psABI
+-----------------
+
+Toolchain sets up `GNU_PROPERTY_RISCV_FEATURE_1_BCFI` for property
+`GNU_PROPERTY_RISCV_FEATURE_1_AND` in notes section of the object file.
+
+4. Linux enabling
+------------------
+
+User space programs can have multiple shared objects loaded in their address space
+and it's a difficult task to make sure all the dependencies have been compiled
+with support of shadow stack. Thus it's left to dynamic loader to enable
+shadow stack for the program.
+
+5. prctl() enabling
+--------------------
+
+`PR_SET_SHADOW_STACK_STATUS` / `PR_GET_SHADOW_STACK_STATUS` /
+`PR_LOCK_SHADOW_STACK_STATUS` are three prctls added to manage shadow stack
+enabling for tasks. prctls are arch agnostic and return -EINVAL on other arches.
+
+`PR_SET_SHADOW_STACK_STATUS`: If arg1 is `PR_SHADOW_STACK_ENABLE` and if CPU supports
+`zicfiss` then kernel will enable shadow stack for the task. Dynamic loader can
+issue this `prctl` once it has determined that all the objects loaded in address
+space have support for shadow stack. Additionally if there is a `dlopen` to an
+object which wasn't compiled with `zicfiss`, dynamic loader can issue this prctl
+with arg1 set to 0 (i.e. `PR_SHADOW_STACK_ENABLE` being clear).
+
+`PR_GET_SHADOW_STACK_STATUS`: Returns current status of shadow stack enabling.
+If enabled it'll return `PR_SHADOW_STACK_ENABLE`.
+
+`PR_LOCK_SHADOW_STACK_STATUS`: Locks current status of shadow stack enabling on the
+task. User space may want to run with strict security posture and wouldn't want
+loading of objects without `zicfiss` support in it and thus would want to disallow
+disabling of shadow stack on current task. In that case user space can use this prctl
+to lock current settings.
+
+6. violations related to returns with shadow stack enabled
+-----------------------------------------------------------
+
+Pertaining to shadow stack, CPU raises software check exception in following
+condition:
+
+  - On execution of `sspopchk x1/x5`, x1/x5 didn't match top of shadow stack.
+    If mismatch happens then cpu does `*tval = 3` and raises software check
+    exception.
+
+Linux kernel will treat this as `SIGSEGV` with code = `SEGV_CPERR` and follow
+normal course of signal delivery.
+
+7. Shadow stack tokens
+-----------------------
+Regular stores on shadow stacks are not allowed and thus shadow stacks can't be tampered
+with via arbitrary stray writes due to bugs. The method of pivoting / switching to a shadow
+stack is simply writing to csr `CSR_SSP`, which changes the active shadow stack. This can be
+problematic because usually the value to be written to `CSR_SSP` will be loaded from somewhere
+in writeable memory and thus allows an adversary to use a corruption bug in software to pivot
+to any address in shadow stack range. Shadow stack tokens can help mitigate this problem by
+making sure that:
+
+  - When software is switching away from a shadow stack, shadow stack pointer should be
+    saved on shadow stack itself and call it `shadow stack token`
+
+  - When software is switching to a shadow stack, it should read the `shadow stack token`
+    from shadow stack pointer and verify that `shadow stack token` itself is pointer to
+    shadow stack itself.
+
+  - Once the token verification is done, software can perform the write to `CSR_SSP` to
+    switch shadow stack.
+
+Here software can be user mode task runtime itself which is managing various contexts
+as part of single thread. Software can be kernel as well when kernel has to deliver a
+signal to user task and must save shadow stack pointer. Kernel can perform similar
+procedure by saving a token on user shadow stack itself. This way whenever sigreturn
+happens, kernel can read the token and verify the token and then switch to shadow stack.
+Using this mechanism, kernel helps user task so that any corruption issue in user task
+is not exploited by adversary by arbitrarily using `sigreturn`. Adversary will have to
+make sure that there is a `shadow stack token` in addition to invoking `sigreturn`.
+
+8. Signal shadow stack
+-----------------------
+Following structure has been added to sigcontext for RISC-V. `rsvd` field has been kept
+in case we need some extra information in future for landing pads / indirect branch
+tracking. It has been kept today in order to allow backward compatibility in future.
+
+struct __sc_riscv_cfi_state {
+	unsigned long ss_ptr;
+	unsigned long rsvd;
+};
+
+As part of signal delivery, shadow stack token is saved on current shadow stack itself and
+updated pointer is saved away in `ss_ptr` field in `__sc_riscv_cfi_state` under `sigcontext`.
+Existing shadow stack allocation is used for signal delivery. During `sigreturn`, kernel will
+obtain `ss_ptr` from `sigcontext` and verify the saved token on shadow stack itself and switch
+shadow stack.
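The token scheme described in the shadow stack document can be pictured with a toy model in plain C. This is only a sketch under stated assumptions: the shadow stack is modelled as an ordinary array growing downwards, the `ssp` field stands in for `CSR_SSP`, the names `shstk_model`, `shstk_save_token` and `shstk_restore_token` are invented for illustration, and the token format (the slot holds the address just above itself) is an assumption, since the document only says the token is a pointer to the shadow stack itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: an array of words that grows towards lower addresses. */
struct shstk_model {
	uintptr_t *base;	/* lowest slot of the shadow stack */
	size_t nslots;		/* number of slots */
	uintptr_t *ssp;		/* stands in for CSR_SSP */
};

/* Switching away: save the current ssp on the shadow stack itself. */
uintptr_t *shstk_save_token(struct shstk_model *s)
{
	uintptr_t *slot;

	if (s->ssp == s->base)		/* no room left for a token */
		return NULL;
	slot = s->ssp - 1;
	*slot = (uintptr_t)s->ssp;	/* assumed token format: pre-push ssp */
	s->ssp = slot;
	return slot;
}

/* Switching to: accept `candidate` only if a valid token is stored there. */
bool shstk_restore_token(struct shstk_model *s, uintptr_t *candidate)
{
	uintptr_t lo = (uintptr_t)s->base;
	uintptr_t hi = (uintptr_t)(s->base + s->nslots);
	uintptr_t addr = (uintptr_t)candidate;

	if (addr < lo || addr >= hi || addr % sizeof(uintptr_t))
		return false;		/* not a slot of this shadow stack */
	if (*candidate != addr + sizeof(uintptr_t))
		return false;		/* slot does not hold a token */
	s->ssp = candidate + 1;		/* models the write to CSR_SSP */
	return true;
}
```

The point of the check is that a corrupted pointer in writeable memory is not enough to pivot; the attacker would also need a matching token already present at that location in shadow stack memory, which regular stores cannot create. A real implementation would also consume the token after use, which this model leaves out.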
Adds kselftest for RISC-V control flow integrity implementation for user mode. There is not a lot going on in kernel for enabling landing pad for user mode. cfi selftests are intended to be compiled with a zicfilp and zicfiss enabled compiler. Thus kselftest simply checks if landing pad and shadow stack for the binary and process are enabled or not. selftest then registers a signal handler for SIGSEGV. Any control flow violation is reported as SIGSEGV with si_code = SEGV_CPERR. Test will fail on receiving any SEGV_CPERR. Shadow stack part has more changes in kernel and thus there are separate tests for that
- Exercise `map_shadow_stack` syscall
- `fork` test to make sure COW works for shadow stack pages
- gup tests. As of today kernel uses FOLL_FORCE when access happens to memory via /proc/<pid>/mem. Not breaking that for shadow stack
- signal test. Make sure signal delivery results in token creation on shadow stack and consumes (and verifies) token on sigreturn
- shadow stack protection test. attempts to write using regular store instruction on shadow stack memory must result in access faults
Signed-off-by: Deepak Gupta debug@rivosinc.com
---
 tools/testing/selftests/riscv/Makefile | 2 +-
 tools/testing/selftests/riscv/cfi/.gitignore | 3 +
 tools/testing/selftests/riscv/cfi/Makefile | 10 +
 .../testing/selftests/riscv/cfi/cfi_rv_test.h | 83 ++++
 .../selftests/riscv/cfi/riscv_cfi_test.c | 82 ++++
 .../testing/selftests/riscv/cfi/shadowstack.c | 362 ++++++++++++++++++
 .../testing/selftests/riscv/cfi/shadowstack.h | 37 ++
 7 files changed, 578 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/riscv/cfi/.gitignore
 create mode 100644 tools/testing/selftests/riscv/cfi/Makefile
 create mode 100644 tools/testing/selftests/riscv/cfi/cfi_rv_test.h
 create mode 100644 tools/testing/selftests/riscv/cfi/riscv_cfi_test.c
 create mode 100644 tools/testing/selftests/riscv/cfi/shadowstack.c
 create mode 100644 tools/testing/selftests/riscv/cfi/shadowstack.h
diff --git a/tools/testing/selftests/riscv/Makefile b/tools/testing/selftests/riscv/Makefile index 4a9ff515a3a0..867e5875b7ce 100644 --- a/tools/testing/selftests/riscv/Makefile +++ b/tools/testing/selftests/riscv/Makefile @@ -5,7 +5,7 @@ ARCH ?= $(shell uname -m 2>/dev/null || echo not)
ifneq (,$(filter $(ARCH),riscv)) -RISCV_SUBTARGETS ?= hwprobe vector mm +RISCV_SUBTARGETS ?= hwprobe vector mm cfi else RISCV_SUBTARGETS := endif diff --git a/tools/testing/selftests/riscv/cfi/.gitignore b/tools/testing/selftests/riscv/cfi/.gitignore new file mode 100644 index 000000000000..ce7623f9da28 --- /dev/null +++ b/tools/testing/selftests/riscv/cfi/.gitignore @@ -0,0 +1,3 @@ +cfitests +riscv_cfi_test +shadowstack \ No newline at end of file diff --git a/tools/testing/selftests/riscv/cfi/Makefile b/tools/testing/selftests/riscv/cfi/Makefile new file mode 100644 index 000000000000..b65f7ff38a32 --- /dev/null +++ b/tools/testing/selftests/riscv/cfi/Makefile @@ -0,0 +1,10 @@ +CFLAGS += -I$(top_srcdir)/tools/include + +CFLAGS += -march=rv64gc_zicfilp_zicfiss + +TEST_GEN_PROGS := cfitests + +include ../../lib.mk + +$(OUTPUT)/cfitests: riscv_cfi_test.c shadowstack.c + $(CC) -o$@ $(CFLAGS) $(LDFLAGS) $^ diff --git a/tools/testing/selftests/riscv/cfi/cfi_rv_test.h b/tools/testing/selftests/riscv/cfi/cfi_rv_test.h new file mode 100644 index 000000000000..fa1cf7183672 --- /dev/null +++ b/tools/testing/selftests/riscv/cfi/cfi_rv_test.h @@ -0,0 +1,83 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ + +#ifndef SELFTEST_RISCV_CFI_H +#define SELFTEST_RISCV_CFI_H +#include <stddef.h> +#include <sys/types.h> +#include "shadowstack.h" + +#define RISCV_CFI_SELFTEST_COUNT RISCV_SHADOW_STACK_TESTS + +#define CHILD_EXIT_CODE_SSWRITE 10 +#define CHILD_EXIT_CODE_SIG_TEST 11 + +#define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \ +({ \ + register long _num __asm__ ("a7") = (num); \ + register long _arg1 __asm__ ("a0") = (long)(arg1); \ + register long _arg2 __asm__ ("a1") = (long)(arg2); \ + register long _arg3 __asm__ ("a2") = (long)(arg3); \ + register long _arg4 __asm__ ("a3") = (long)(arg4); \ + register long _arg5 __asm__ ("a4") = (long)(arg5); \ + \ + __asm__ volatile ( \ + "ecall\n" \ + : "+r"(_arg1) \ + : "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \ + "r"(_num) \ + : 
"memory", "cc" \ + ); \ + _arg1; \ +}) + +#define my_syscall3(num, arg1, arg2, arg3) \ +({ \ + register long _num __asm__ ("a7") = (num); \ + register long _arg1 __asm__ ("a0") = (long)(arg1); \ + register long _arg2 __asm__ ("a1") = (long)(arg2); \ + register long _arg3 __asm__ ("a2") = (long)(arg3); \ + \ + __asm__ volatile ( \ + "ecall\n" \ + : "+r"(_arg1) \ + : "r"(_arg2), "r"(_arg3), \ + "r"(_num) \ + : "memory", "cc" \ + ); \ + _arg1; \ +}) + +#ifndef __NR_prctl +#define __NR_prctl 167 +#endif + +#ifndef __NR_map_shadow_stack +#define __NR_map_shadow_stack 453 +#endif + +#define CSR_SSP 0x011 + +#ifdef __ASSEMBLY__ +#define __ASM_STR(x) x +#else +#define __ASM_STR(x) #x +#endif + +#define csr_read(csr) \ +({ \ + register unsigned long __v; \ + __asm__ __volatile__ ("csrr %0, " __ASM_STR(csr) \ + : "=r" (__v) : \ + : "memory"); \ + __v; \ +}) + +#define csr_write(csr, val) \ +({ \ + unsigned long __v = (unsigned long) (val); \ + __asm__ __volatile__ ("csrw " __ASM_STR(csr) ", %0" \ + : : "rK" (__v) \ + : "memory"); \ +}) + +#endif diff --git a/tools/testing/selftests/riscv/cfi/riscv_cfi_test.c b/tools/testing/selftests/riscv/cfi/riscv_cfi_test.c new file mode 100644 index 000000000000..f22b3f0f24de --- /dev/null +++ b/tools/testing/selftests/riscv/cfi/riscv_cfi_test.c @@ -0,0 +1,82 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include "../../kselftest.h" +#include <signal.h> +#include <asm/ucontext.h> +#include <linux/prctl.h> +#include "cfi_rv_test.h" + +/* do not optimize cfi related test functions */ +#pragma GCC push_options +#pragma GCC optimize("O0") + +void sigsegv_handler(int signum, siginfo_t *si, void *uc) +{ + struct ucontext *ctx = (struct ucontext *) uc; + + if (si->si_code == SEGV_CPERR) { + printf("Control flow violation happened somewhere\n"); + printf("pc where violation happened %lx\n", ctx->uc_mcontext.gregs[0]); + exit(-1); + } + + printf("In sigsegv handler\n"); + /* all other cases are expected to be of shadow stack write case */ + 
exit(CHILD_EXIT_CODE_SSWRITE); +} + +bool register_signal_handler(void) +{ + struct sigaction sa = {}; + + sa.sa_sigaction = sigsegv_handler; + sa.sa_flags = SA_SIGINFO; + if (sigaction(SIGSEGV, &sa, NULL)) { + printf("registering signal handler for landing pad violation failed\n"); + return false; + } + + return true; +} + +int main(int argc, char *argv[]) +{ + int ret = 0; + unsigned long lpad_status = 0, ss_status = 0; + + ksft_print_header(); + + ksft_set_plan(RISCV_CFI_SELFTEST_COUNT); + + ksft_print_msg("starting risc-v tests\n"); + + /* + * Landing pad test. Not a lot of kernel changes to support landing + * pad for user mode except lighting up a bit in senvcfg via a prctl + * Enable landing pad through out the execution of test binary + */ + ret = my_syscall5(__NR_prctl, PR_GET_INDIR_BR_LP_STATUS, &lpad_status, 0, 0, 0); + if (ret) + ksft_exit_skip("Get landing pad status failed with %d\n", ret); + + if (!(lpad_status & PR_INDIR_BR_LP_ENABLE)) + ksft_exit_skip("landing pad is not enabled, should be enabled via glibc\n"); + + ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &ss_status, 0, 0, 0); + if (ret) + ksft_exit_skip("Get shadow stack failed with %d\n", ret); + + if (!(ss_status & PR_SHADOW_STACK_ENABLE)) + ksft_exit_skip("shadow stack is not enabled, should be enabled via glibc\n"); + + if (!register_signal_handler()) + ksft_exit_skip("registering signal handler for SIGSEGV failed\n"); + + ksft_print_msg("landing pad and shadow stack are enabled for binary\n"); + ksft_print_msg("starting risc-v shadow stack tests\n"); + execute_shadow_stack_tests(); + + ksft_finished(); +} + +#pragma GCC pop_options diff --git a/tools/testing/selftests/riscv/cfi/shadowstack.c b/tools/testing/selftests/riscv/cfi/shadowstack.c new file mode 100644 index 000000000000..2f65eb970c44 --- /dev/null +++ b/tools/testing/selftests/riscv/cfi/shadowstack.c @@ -0,0 +1,362 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include "../../kselftest.h" +#include <sys/wait.h> 
+#include <signal.h> +#include <fcntl.h> +#include <asm-generic/unistd.h> +#include <sys/mman.h> +#include "shadowstack.h" +#include "cfi_rv_test.h" + +/* do not optimize shadow stack related test functions */ +#pragma GCC push_options +#pragma GCC optimize("O0") + +void zar(void) +{ + unsigned long ssp = 0; + + ssp = csr_read(CSR_SSP); + printf("inside %s and shadow stack ptr is %lx\n", __func__, ssp); +} + +void bar(void) +{ + printf("inside %s\n", __func__); + zar(); +} + +void foo(void) +{ + printf("inside %s\n", __func__); + bar(); +} + +void zar_child(void) +{ + unsigned long ssp = 0; + + ssp = csr_read(CSR_SSP); + printf("inside %s and shadow stack ptr is %lx\n", __func__, ssp); +} + +void bar_child(void) +{ + printf("inside %s\n", __func__); + zar_child(); +} + +void foo_child(void) +{ + printf("inside %s\n", __func__); + bar_child(); +} + +typedef void (call_func_ptr)(void); +/* + * call couple of functions to test push pop. + */ +int shadow_stack_call_tests(call_func_ptr fn_ptr, bool parent) +{ + if (parent) + printf("call test for parent\n"); + else + printf("call test for child\n"); + + (fn_ptr)(); + + return 0; +} + +/* forks a thread, and ensure shadow stacks fork out */ +bool shadow_stack_fork_test(unsigned long test_num, void *ctx) +{ + int pid = 0, child_status = 0, parent_pid = 0, ret = 0; + unsigned long ss_status = 0; + + printf("exercising shadow stack fork test\n"); + + ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &ss_status, 0, 0, 0); + if (ret) { + printf("shadow stack get status prctl failed with errorcode %d\n", ret); + return false; + } + + if (!(ss_status & PR_SHADOW_STACK_ENABLE)) + ksft_exit_skip("shadow stack is not enabled, should be enabled via glibc\n"); + + parent_pid = getpid(); + pid = fork(); + + if (pid) { + printf("Parent pid %d and child pid %d\n", parent_pid, pid); + shadow_stack_call_tests(&foo, true); + } else + shadow_stack_call_tests(&foo_child, false); + + if (pid) { + printf("waiting on child to 
finish\n"); + wait(&child_status); + } else { + /* exit child gracefully */ + exit(0); + } + + if (pid && WIFSIGNALED(child_status)) { + printf("child faulted"); + return false; + } + + return true; +} + +/* exercise `map_shadow_stack`, pivot to it and call some functions to ensure it works */ +#define SHADOW_STACK_ALLOC_SIZE 4096 +bool shadow_stack_map_test(unsigned long test_num, void *ctx) +{ + unsigned long shdw_addr; + int ret = 0; + + shdw_addr = my_syscall3(__NR_map_shadow_stack, NULL, SHADOW_STACK_ALLOC_SIZE, 0); + + if (((long) shdw_addr) <= 0) { + printf("map_shadow_stack failed with error code %d\n", (int) shdw_addr); + return false; + } + + ret = munmap((void *) shdw_addr, SHADOW_STACK_ALLOC_SIZE); + + if (ret) { + printf("munmap failed with error code %d\n", ret); + return false; + } + + return true; +} + +/* + * shadow stack protection tests. map a shadow stack and + * validate all memory protections work on it + */ +bool shadow_stack_protection_test(unsigned long test_num, void *ctx) +{ + unsigned long shdw_addr; + unsigned long *write_addr = NULL; + int ret = 0, pid = 0, child_status = 0; + + shdw_addr = my_syscall3(__NR_map_shadow_stack, NULL, SHADOW_STACK_ALLOC_SIZE, 0); + + if (((long) shdw_addr) <= 0) { + printf("map_shadow_stack failed with error code %d\n", (int) shdw_addr); + return false; + } + + write_addr = (unsigned long *) shdw_addr; + pid = fork(); + + /* no child was created, return false */ + if (pid == -1) + return false; + + /* + * try to perform a store from child on shadow stack memory + * it should result in SIGSEGV + */ + if (!pid) { + /* below write must lead to SIGSEGV */ + *write_addr = 0xdeadbeef; + } else { + wait(&child_status); + } + + /* test fail, if 0xdeadbeef present on shadow stack address */ + if (*write_addr == 0xdeadbeef) { + printf("write suceeded\n"); + return false; + } + + /* if child reached here, then fail */ + if (!pid) { + printf("child reached unreachable state\n"); + return false; + } + + /* if child 
exited via signal handler but not for write on ss */ + if (WIFEXITED(child_status) && + WEXITSTATUS(child_status) != CHILD_EXIT_CODE_SSWRITE) { + printf("child wasn't signaled for write on shadow stack\n"); + return false; + } + + ret = munmap(write_addr, SHADOW_STACK_ALLOC_SIZE); + if (ret) { + printf("munmap failed with error code %d\n", ret); + return false; + } + + return true; +} + +#define SS_MAGIC_WRITE_VAL 0xbeefdead + +int gup_tests(int mem_fd, unsigned long *shdw_addr) +{ + unsigned long val = 0; + + lseek(mem_fd, (unsigned long)shdw_addr, SEEK_SET); + if (read(mem_fd, &val, sizeof(val)) < 0) { + printf("reading shadow stack mem via gup failed\n"); + return 1; + } + + val = SS_MAGIC_WRITE_VAL; + lseek(mem_fd, (unsigned long)shdw_addr, SEEK_SET); + if (write(mem_fd, &val, sizeof(val)) < 0) { + printf("writing shadow stack mem via gup failed\n"); + return 1; + } + + if (*shdw_addr != SS_MAGIC_WRITE_VAL) { + printf("GUP write to shadow stack memory didn't happen\n"); + return 1; + } + + return 0; +} + +bool shadow_stack_gup_tests(unsigned long test_num, void *ctx) +{ + unsigned long shdw_addr = 0; + unsigned long *write_addr = NULL; + int fd = 0; + bool ret = false; + + shdw_addr = my_syscall3(__NR_map_shadow_stack, NULL, SHADOW_STACK_ALLOC_SIZE, 0); + + if (((long) shdw_addr) <= 0) { + printf("map_shadow_stack failed with error code %d\n", (int) shdw_addr); + return false; + } + + write_addr = (unsigned long *) shdw_addr; + + fd = open("/proc/self/mem", O_RDWR); + if (fd == -1) + return false; + + if (gup_tests(fd, write_addr)) { + printf("gup tests failed\n"); + goto out; + } + + ret = true; +out: + if (shdw_addr && munmap(write_addr, SHADOW_STACK_ALLOC_SIZE)) { + printf("munmap failed with error code %d\n", ret); + ret = false; + } + + return ret; +} + +volatile bool break_loop; + +void sigusr1_handler(int signo) +{ + printf("In sigusr1 handler\n"); + break_loop = true; +} + +bool sigusr1_signal_test(void) +{ + struct sigaction sa = {}; + + sa.sa_handler 
= sigusr1_handler; + sa.sa_flags = 0; + sigemptyset(&sa.sa_mask); + if (sigaction(SIGUSR1, &sa, NULL)) { + printf("registering signal handler for SIGUSR1 failed\n"); + return false; + } + + return true; +} +/* + * shadow stack signal test. shadow stack must be enabled. + * register a signal, fork another thread which is waiting + * on signal. Send a signal from parent to child, verify + * that signal was received by child. If not test fails + */ +bool shadow_stack_signal_test(unsigned long test_num, void *ctx) +{ + int pid = 0, child_status = 0, ret = 0; + unsigned long ss_status = 0; + + ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &ss_status, 0, 0, 0); + if (ret) { + printf("shadow stack get status prctl failed with errorcode %d\n", ret); + return false; + } + + if (!(ss_status & PR_SHADOW_STACK_ENABLE)) + ksft_exit_skip("shadow stack is not enabled, should be enabled via glibc\n"); + + /* this should be caught by signal handler and do an exit */ + if (!sigusr1_signal_test()) { + printf("registering sigusr1 handler failed\n"); + exit(-1); + } + + pid = fork(); + + if (pid == -1) { + printf("signal test: fork failed\n"); + goto out; + } + + if (pid == 0) { + while (!break_loop) + sleep(1); + + exit(11); + /* child shouldn't go beyond here */ + } + + /* send SIGUSR1 to child */ + kill(pid, SIGUSR1); + wait(&child_status); + +out: + + return (WIFEXITED(child_status) && + WEXITSTATUS(child_status) == 11); +} + +int execute_shadow_stack_tests(void) +{ + int ret = 0; + unsigned long test_count = 0; + unsigned long shstk_status = 0; + + printf("Executing RISC-V shadow stack self tests\n"); + + ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &shstk_status, 0, 0, 0); + + if (ret != 0) + ksft_exit_skip("Get shadow stack status failed with %d\n", ret); + + /* + * If we are here that means get shadow stack status succeeded and + * thus shadow stack support is baked in the kernel. 
+ */ + while (test_count < ARRAY_SIZE(shstk_tests)) { + ksft_test_result((*shstk_tests[test_count].t_func)(test_count, NULL), + shstk_tests[test_count].name); + test_count++; + } + + return 0; +} + +#pragma GCC pop_options diff --git a/tools/testing/selftests/riscv/cfi/shadowstack.h b/tools/testing/selftests/riscv/cfi/shadowstack.h new file mode 100644 index 000000000000..b43e74136a26 --- /dev/null +++ b/tools/testing/selftests/riscv/cfi/shadowstack.h @@ -0,0 +1,37 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ + +#ifndef SELFTEST_SHADOWSTACK_TEST_H +#define SELFTEST_SHADOWSTACK_TEST_H +#include <stddef.h> +#include <linux/prctl.h> + +/* + * a cfi test returns true for success or false for fail + * takes a number for test number to index into array and void pointer. + */ +typedef bool (*shstk_test_func)(unsigned long test_num, void *); + +struct shadow_stack_tests { + char *name; + shstk_test_func t_func; +}; + +bool shadow_stack_fork_test(unsigned long test_num, void *ctx); +bool shadow_stack_map_test(unsigned long test_num, void *ctx); +bool shadow_stack_protection_test(unsigned long test_num, void *ctx); +bool shadow_stack_gup_tests(unsigned long test_num, void *ctx); +bool shadow_stack_signal_test(unsigned long test_num, void *ctx); + +static struct shadow_stack_tests shstk_tests[] = { + { "shstk fork test\n", shadow_stack_fork_test }, + { "map shadow stack syscall\n", shadow_stack_map_test }, + { "shadow stack gup tests\n", shadow_stack_gup_tests }, + { "shadow stack signal tests\n", shadow_stack_signal_test}, + { "memory protections of shadow stack memory\n", shadow_stack_protection_test } +}; + +#define RISCV_SHADOW_STACK_TESTS ARRAY_SIZE(shstk_tests) + +int execute_shadow_stack_tests(void); + +#endif
On Wed, Apr 03, 2024 at 04:35:17PM -0700, Deepak Gupta wrote:
Adds kselftest for RISC-V control flow integrity implementation for user mode. There is not a lot going on in kernel for enabling landing pad for user mode. cfi selftests are intended to be compiled with a zicfilp and zicfiss enabled compiler. Thus kselftest simply checks if landing pad and shadow stack for the binary and process are enabled or not. selftest then registers a signal handler for SIGSEGV. Any control flow violation is reported as SIGSEGV with si_code = SEGV_CPERR. Test will fail on receiving any SEGV_CPERR. Shadow stack part has more changes in kernel and thus there are separate tests for that
- Exercise `map_shadow_stack` syscall
- `fork` test to make sure COW works for shadow stack pages
- gup tests As of today kernel uses FOLL_FORCE when access happens to memory via /proc/<pid>/mem. Not breaking that for shadow stack
- signal test. Make sure signal delivery results in token creation on shadow stack and consumes (and verifies) token on sigreturn
- shadow stack protection test. attempts to write using regular store
instruction on shadow stack memory must result in access faults
Signed-off-by: Deepak Gupta debug@rivosinc.com
 tools/testing/selftests/riscv/Makefile | 2 +-
 tools/testing/selftests/riscv/cfi/.gitignore | 3 +
 tools/testing/selftests/riscv/cfi/Makefile | 10 +
 .../testing/selftests/riscv/cfi/cfi_rv_test.h | 83 ++++
 .../selftests/riscv/cfi/riscv_cfi_test.c | 82 ++++
 .../testing/selftests/riscv/cfi/shadowstack.c | 362 ++++++++++++++++++
 .../testing/selftests/riscv/cfi/shadowstack.h | 37 ++
 7 files changed, 578 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/riscv/cfi/.gitignore
 create mode 100644 tools/testing/selftests/riscv/cfi/Makefile
 create mode 100644 tools/testing/selftests/riscv/cfi/cfi_rv_test.h
 create mode 100644 tools/testing/selftests/riscv/cfi/riscv_cfi_test.c
 create mode 100644 tools/testing/selftests/riscv/cfi/shadowstack.c
 create mode 100644 tools/testing/selftests/riscv/cfi/shadowstack.h
diff --git a/tools/testing/selftests/riscv/Makefile b/tools/testing/selftests/riscv/Makefile
index 4a9ff515a3a0..867e5875b7ce 100644
--- a/tools/testing/selftests/riscv/Makefile
+++ b/tools/testing/selftests/riscv/Makefile
@@ -5,7 +5,7 @@ ARCH ?= $(shell uname -m 2>/dev/null || echo not)
 ifneq (,$(filter $(ARCH),riscv))
-RISCV_SUBTARGETS ?= hwprobe vector mm
+RISCV_SUBTARGETS ?= hwprobe vector mm cfi
 else
 RISCV_SUBTARGETS :=
 endif
diff --git a/tools/testing/selftests/riscv/cfi/.gitignore b/tools/testing/selftests/riscv/cfi/.gitignore
new file mode 100644
index 000000000000..ce7623f9da28
--- /dev/null
+++ b/tools/testing/selftests/riscv/cfi/.gitignore
@@ -0,0 +1,3 @@
+cfitests
+riscv_cfi_test
+shadowstack
\ No newline at end of file
diff --git a/tools/testing/selftests/riscv/cfi/Makefile b/tools/testing/selftests/riscv/cfi/Makefile
new file mode 100644
index 000000000000..b65f7ff38a32
--- /dev/null
+++ b/tools/testing/selftests/riscv/cfi/Makefile
@@ -0,0 +1,10 @@
+CFLAGS += -I$(top_srcdir)/tools/include
+
+CFLAGS += -march=rv64gc_zicfilp_zicfiss
+
+TEST_GEN_PROGS := cfitests
+
+include ../../lib.mk
+
+$(OUTPUT)/cfitests: riscv_cfi_test.c shadowstack.c
+	$(CC) -o$@ $(CFLAGS) $(LDFLAGS) $^
diff --git a/tools/testing/selftests/riscv/cfi/cfi_rv_test.h b/tools/testing/selftests/riscv/cfi/cfi_rv_test.h new file mode 100644 index 000000000000..fa1cf7183672 --- /dev/null +++ b/tools/testing/selftests/riscv/cfi/cfi_rv_test.h @@ -0,0 +1,83 @@ +/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef SELFTEST_RISCV_CFI_H +#define SELFTEST_RISCV_CFI_H +#include <stddef.h> +#include <sys/types.h> +#include "shadowstack.h"
+#define RISCV_CFI_SELFTEST_COUNT RISCV_SHADOW_STACK_TESTS
+#define CHILD_EXIT_CODE_SSWRITE 10 +#define CHILD_EXIT_CODE_SIG_TEST 11
+#define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \ +({ \
- register long _num __asm__ ("a7") = (num); \
- register long _arg1 __asm__ ("a0") = (long)(arg1); \
- register long _arg2 __asm__ ("a1") = (long)(arg2); \
- register long _arg3 __asm__ ("a2") = (long)(arg3); \
- register long _arg4 __asm__ ("a3") = (long)(arg4); \
- register long _arg5 __asm__ ("a4") = (long)(arg5); \
\
- __asm__ volatile ( \
"ecall\n" \
: "+r"(_arg1) \
: "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \
"r"(_num) \
: "memory", "cc" \
- ); \
- _arg1; \
+})
+#define my_syscall3(num, arg1, arg2, arg3) \ +({ \
- register long _num __asm__ ("a7") = (num); \
- register long _arg1 __asm__ ("a0") = (long)(arg1); \
- register long _arg2 __asm__ ("a1") = (long)(arg2); \
- register long _arg3 __asm__ ("a2") = (long)(arg3); \
\
- __asm__ volatile ( \
"ecall\n" \
: "+r"(_arg1) \
: "r"(_arg2), "r"(_arg3), \
"r"(_num) \
: "memory", "cc" \
- ); \
- _arg1; \
+})
+#ifndef __NR_prctl +#define __NR_prctl 167 +#endif
+#ifndef __NR_map_shadow_stack +#define __NR_map_shadow_stack 453 +#endif
+#define CSR_SSP 0x011
+#ifdef __ASSEMBLY__ +#define __ASM_STR(x) x +#else +#define __ASM_STR(x) #x +#endif
+#define csr_read(csr) \ +({ \
- register unsigned long __v; \
- __asm__ __volatile__ ("csrr %0, " __ASM_STR(csr) \
: "=r" (__v) : \
: "memory"); \
- __v; \
+})
+#define csr_write(csr, val) \ +({ \
- unsigned long __v = (unsigned long) (val); \
- __asm__ __volatile__ ("csrw " __ASM_STR(csr) ", %0" \
: : "rK" (__v) \
: "memory"); \
+})
+#endif diff --git a/tools/testing/selftests/riscv/cfi/riscv_cfi_test.c b/tools/testing/selftests/riscv/cfi/riscv_cfi_test.c new file mode 100644 index 000000000000..f22b3f0f24de --- /dev/null +++ b/tools/testing/selftests/riscv/cfi/riscv_cfi_test.c @@ -0,0 +1,82 @@ +// SPDX-License-Identifier: GPL-2.0-only
+#include "../../kselftest.h" +#include <signal.h> +#include <asm/ucontext.h> +#include <linux/prctl.h> +#include "cfi_rv_test.h"
+/* do not optimize cfi related test functions */ +#pragma GCC push_options +#pragma GCC optimize("O0")
+void sigsegv_handler(int signum, siginfo_t *si, void *uc) +{
- struct ucontext *ctx = (struct ucontext *) uc;
- if (si->si_code == SEGV_CPERR) {
printf("Control flow violation happened somewhere\n");
printf("pc where violation happened %lx\n", ctx->uc_mcontext.gregs[0]);
exit(-1);
- }
- printf("In sigsegv handler\n");
- /* all other cases are expected to be of shadow stack write case */
- exit(CHILD_EXIT_CODE_SSWRITE);
+}
+bool register_signal_handler(void) +{
- struct sigaction sa = {};
- sa.sa_sigaction = sigsegv_handler;
- sa.sa_flags = SA_SIGINFO;
- if (sigaction(SIGSEGV, &sa, NULL)) {
printf("registering signal handler for landing pad violation failed\n");
return false;
- }
- return true;
+}
+int main(int argc, char *argv[]) +{
- int ret = 0;
- unsigned long lpad_status = 0, ss_status = 0;
- ksft_print_header();
- ksft_set_plan(RISCV_CFI_SELFTEST_COUNT);
- ksft_print_msg("starting risc-v tests\n");
- /*
* Landing pad test. Not a lot of kernel changes to support landing
* pad for user mode except lighting up a bit in senvcfg via a prctl
* Enable landing pad through out the execution of test binary
*/
- ret = my_syscall5(__NR_prctl, PR_GET_INDIR_BR_LP_STATUS, &lpad_status, 0, 0, 0);
There is an assumption here that the libc supports setting INDIR_BR_LP_STATUS but does not support the standard prctl interface defined in <sys/prctl.h>. my_syscall5() is defined to fill in gaps in the libc, so this test case should also set the status manually rather than relying on the libc.
I don't think it's necessary to define my_syscall5(), since every libc should have a prctl() definition. However, these CFI prctls are very new and glibc does not yet support them (correct me if I am wrong), so these prctls should be enabled by the test cases.
- Charlie
- if (ret)
ksft_exit_skip("Get landing pad status failed with %d\n", ret);
- if (!(lpad_status & PR_INDIR_BR_LP_ENABLE))
ksft_exit_skip("landing pad is not enabled, should be enabled via glibc\n");
- ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &ss_status, 0, 0, 0);
- if (ret)
ksft_exit_skip("Get shadow stack failed with %d\n", ret);
- if (!(ss_status & PR_SHADOW_STACK_ENABLE))
ksft_exit_skip("shadow stack is not enabled, should be enabled via glibc\n");
- if (!register_signal_handler())
ksft_exit_skip("registering signal handler for SIGSEGV failed\n");
- ksft_print_msg("landing pad and shadow stack are enabled for binary\n");
- ksft_print_msg("starting risc-v shadow stack tests\n");
- execute_shadow_stack_tests();
- ksft_finished();
+}
+#pragma GCC pop_options diff --git a/tools/testing/selftests/riscv/cfi/shadowstack.c b/tools/testing/selftests/riscv/cfi/shadowstack.c new file mode 100644 index 000000000000..2f65eb970c44 --- /dev/null +++ b/tools/testing/selftests/riscv/cfi/shadowstack.c @@ -0,0 +1,362 @@ +// SPDX-License-Identifier: GPL-2.0-only
+#include "../../kselftest.h" +#include <sys/wait.h> +#include <signal.h> +#include <fcntl.h> +#include <asm-generic/unistd.h> +#include <sys/mman.h> +#include "shadowstack.h" +#include "cfi_rv_test.h"
+/* do not optimize shadow stack related test functions */ +#pragma GCC push_options +#pragma GCC optimize("O0")
+void zar(void) +{
- unsigned long ssp = 0;
- ssp = csr_read(CSR_SSP);
- printf("inside %s and shadow stack ptr is %lx\n", __func__, ssp);
+}
+void bar(void) +{
- printf("inside %s\n", __func__);
- zar();
+}
+void foo(void) +{
- printf("inside %s\n", __func__);
- bar();
+}
+void zar_child(void) +{
- unsigned long ssp = 0;
- ssp = csr_read(CSR_SSP);
- printf("inside %s and shadow stack ptr is %lx\n", __func__, ssp);
+}
+void bar_child(void) +{
- printf("inside %s\n", __func__);
- zar_child();
+}
+void foo_child(void) +{
- printf("inside %s\n", __func__);
- bar_child();
+}
+typedef void (call_func_ptr)(void); +/*
- call couple of functions to test push pop.
- */
+int shadow_stack_call_tests(call_func_ptr fn_ptr, bool parent) +{
- if (parent)
printf("call test for parent\n");
- else
printf("call test for child\n");
- (fn_ptr)();
- return 0;
+}
+/* forks a thread, and ensure shadow stacks fork out */ +bool shadow_stack_fork_test(unsigned long test_num, void *ctx) +{
- int pid = 0, child_status = 0, parent_pid = 0, ret = 0;
- unsigned long ss_status = 0;
- printf("exercising shadow stack fork test\n");
- ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &ss_status, 0, 0, 0);
- if (ret) {
printf("shadow stack get status prctl failed with errorcode %d\n", ret);
return false;
- }
- if (!(ss_status & PR_SHADOW_STACK_ENABLE))
ksft_exit_skip("shadow stack is not enabled, should be enabled via glibc\n");
- parent_pid = getpid();
- pid = fork();
- if (pid) {
printf("Parent pid %d and child pid %d\n", parent_pid, pid);
shadow_stack_call_tests(&foo, true);
- } else
shadow_stack_call_tests(&foo_child, false);
- if (pid) {
printf("waiting on child to finish\n");
wait(&child_status);
- } else {
/* exit child gracefully */
exit(0);
- }
- if (pid && WIFSIGNALED(child_status)) {
printf("child faulted\n");
return false;
- }
- return true;
+}
+/* exercise `map_shadow_stack`, pivot to it and call some functions to ensure it works */ +#define SHADOW_STACK_ALLOC_SIZE 4096 +bool shadow_stack_map_test(unsigned long test_num, void *ctx) +{
- unsigned long shdw_addr;
- int ret = 0;
- shdw_addr = my_syscall3(__NR_map_shadow_stack, NULL, SHADOW_STACK_ALLOC_SIZE, 0);
- if (((long) shdw_addr) <= 0) {
printf("map_shadow_stack failed with error code %d\n", (int) shdw_addr);
return false;
- }
- ret = munmap((void *) shdw_addr, SHADOW_STACK_ALLOC_SIZE);
- if (ret) {
printf("munmap failed with error code %d\n", ret);
return false;
- }
- return true;
+}
+/*
- shadow stack protection tests. map a shadow stack and
- validate all memory protections work on it
- */
+bool shadow_stack_protection_test(unsigned long test_num, void *ctx) +{
- unsigned long shdw_addr;
- unsigned long *write_addr = NULL;
- int ret = 0, pid = 0, child_status = 0;
- shdw_addr = my_syscall3(__NR_map_shadow_stack, NULL, SHADOW_STACK_ALLOC_SIZE, 0);
- if (((long) shdw_addr) <= 0) {
printf("map_shadow_stack failed with error code %d\n", (int) shdw_addr);
return false;
- }
- write_addr = (unsigned long *) shdw_addr;
- pid = fork();
- /* no child was created, return false */
- if (pid == -1)
return false;
- /*
* try to perform a store from child on shadow stack memory
* it should result in SIGSEGV
*/
- if (!pid) {
/* below write must lead to SIGSEGV */
*write_addr = 0xdeadbeef;
- } else {
wait(&child_status);
- }
- /* test fail, if 0xdeadbeef present on shadow stack address */
- if (*write_addr == 0xdeadbeef) {
printf("write succeeded\n");
return false;
- }
- /* if child reached here, then fail */
- if (!pid) {
printf("child reached unreachable state\n");
return false;
- }
- /* if child exited via signal handler but not for write on ss */
- if (WIFEXITED(child_status) &&
WEXITSTATUS(child_status) != CHILD_EXIT_CODE_SSWRITE) {
printf("child wasn't signaled for write on shadow stack\n");
return false;
- }
- ret = munmap(write_addr, SHADOW_STACK_ALLOC_SIZE);
- if (ret) {
printf("munmap failed with error code %d\n", ret);
return false;
- }
- return true;
+}
+#define SS_MAGIC_WRITE_VAL 0xbeefdead
+int gup_tests(int mem_fd, unsigned long *shdw_addr) +{
- unsigned long val = 0;
- lseek(mem_fd, (unsigned long)shdw_addr, SEEK_SET);
- if (read(mem_fd, &val, sizeof(val)) < 0) {
printf("reading shadow stack mem via gup failed\n");
return 1;
- }
- val = SS_MAGIC_WRITE_VAL;
- lseek(mem_fd, (unsigned long)shdw_addr, SEEK_SET);
- if (write(mem_fd, &val, sizeof(val)) < 0) {
printf("writing shadow stack mem via gup failed\n");
return 1;
- }
- if (*shdw_addr != SS_MAGIC_WRITE_VAL) {
printf("GUP write to shadow stack memory didn't happen\n");
return 1;
- }
- return 0;
+}
+bool shadow_stack_gup_tests(unsigned long test_num, void *ctx) +{
- unsigned long shdw_addr = 0;
- unsigned long *write_addr = NULL;
- int fd = 0;
- bool ret = false;
- shdw_addr = my_syscall3(__NR_map_shadow_stack, NULL, SHADOW_STACK_ALLOC_SIZE, 0);
- if (((long) shdw_addr) <= 0) {
printf("map_shadow_stack failed with error code %d\n", (int) shdw_addr);
return false;
- }
- write_addr = (unsigned long *) shdw_addr;
- fd = open("/proc/self/mem", O_RDWR);
- if (fd == -1)
return false;
- if (gup_tests(fd, write_addr)) {
printf("gup tests failed\n");
goto out;
- }
- ret = true;
+out:
- if (shdw_addr && munmap(write_addr, SHADOW_STACK_ALLOC_SIZE)) {
printf("munmap failed with error code %d\n", ret);
ret = false;
- }
- return ret;
+}
+volatile bool break_loop;
+void sigusr1_handler(int signo) +{
- printf("In sigusr1 handler\n");
- break_loop = true;
+}
+bool sigusr1_signal_test(void) +{
- struct sigaction sa = {};
- sa.sa_handler = sigusr1_handler;
- sa.sa_flags = 0;
- sigemptyset(&sa.sa_mask);
- if (sigaction(SIGUSR1, &sa, NULL)) {
printf("registering signal handler for SIGUSR1 failed\n");
return false;
- }
- return true;
+} +/*
- shadow stack signal test. shadow stack must be enabled.
- register a signal, fork another thread which is waiting
- on signal. Send a signal from parent to child, verify
- that signal was received by child. If not test fails
- */
+bool shadow_stack_signal_test(unsigned long test_num, void *ctx) +{
- int pid = 0, child_status = 0, ret = 0;
- unsigned long ss_status = 0;
- ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &ss_status, 0, 0, 0);
- if (ret) {
printf("shadow stack get status prctl failed with errorcode %d\n", ret);
return false;
- }
- if (!(ss_status & PR_SHADOW_STACK_ENABLE))
ksft_exit_skip("shadow stack is not enabled, should be enabled via glibc\n");
- /* this should be caught by signal handler and do an exit */
- if (!sigusr1_signal_test()) {
printf("registering sigusr1 handler failed\n");
exit(-1);
- }
- pid = fork();
- if (pid == -1) {
printf("signal test: fork failed\n");
goto out;
- }
- if (pid == 0) {
while (!break_loop)
sleep(1);
exit(11);
/* child shouldn't go beyond here */
- }
- /* send SIGUSR1 to child */
- kill(pid, SIGUSR1);
- wait(&child_status);
+out:
- return (WIFEXITED(child_status) &&
WEXITSTATUS(child_status) == 11);
+}
+int execute_shadow_stack_tests(void) +{
- int ret = 0;
- unsigned long test_count = 0;
- unsigned long shstk_status = 0;
- printf("Executing RISC-V shadow stack self tests\n");
- ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &shstk_status, 0, 0, 0);
- if (ret != 0)
ksft_exit_skip("Get shadow stack status failed with %d\n", ret);
- /*
* If we are here that means get shadow stack status succeeded and
* thus shadow stack support is baked in the kernel.
*/
- while (test_count < ARRAY_SIZE(shstk_tests)) {
ksft_test_result((*shstk_tests[test_count].t_func)(test_count, NULL),
shstk_tests[test_count].name);
test_count++;
- }
- return 0;
+}
+#pragma GCC pop_options diff --git a/tools/testing/selftests/riscv/cfi/shadowstack.h b/tools/testing/selftests/riscv/cfi/shadowstack.h new file mode 100644 index 000000000000..b43e74136a26 --- /dev/null +++ b/tools/testing/selftests/riscv/cfi/shadowstack.h @@ -0,0 +1,37 @@ +/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef SELFTEST_SHADOWSTACK_TEST_H +#define SELFTEST_SHADOWSTACK_TEST_H +#include <stddef.h> +#include <linux/prctl.h>
+/*
- a cfi test returns true for success or false for fail
- takes a number for test number to index into array and void pointer.
- */
+typedef bool (*shstk_test_func)(unsigned long test_num, void *);
+struct shadow_stack_tests {
- char *name;
- shstk_test_func t_func;
+};
+bool shadow_stack_fork_test(unsigned long test_num, void *ctx); +bool shadow_stack_map_test(unsigned long test_num, void *ctx); +bool shadow_stack_protection_test(unsigned long test_num, void *ctx); +bool shadow_stack_gup_tests(unsigned long test_num, void *ctx); +bool shadow_stack_signal_test(unsigned long test_num, void *ctx);
+static struct shadow_stack_tests shstk_tests[] = {
- { "shstk fork test\n", shadow_stack_fork_test },
- { "map shadow stack syscall\n", shadow_stack_map_test },
- { "shadow stack gup tests\n", shadow_stack_gup_tests },
- { "shadow stack signal tests\n", shadow_stack_signal_test},
- { "memory protections of shadow stack memory\n", shadow_stack_protection_test }
+};
+#define RISCV_SHADOW_STACK_TESTS ARRAY_SIZE(shstk_tests)
+int execute_shadow_stack_tests(void);
+#endif
2.43.2
On Thu, May 09, 2024 at 11:21:15AM -0700, Charlie Jenkins wrote:
On Wed, Apr 03, 2024 at 04:35:17PM -0700, Deepak Gupta wrote:
+int main(int argc, char *argv[]) +{
- int ret = 0;
- unsigned long lpad_status = 0, ss_status = 0;
- ksft_print_header();
- ksft_set_plan(RISCV_CFI_SELFTEST_COUNT);
- ksft_print_msg("starting risc-v tests\n");
- /*
* Landing pad test. Not a lot of kernel changes to support landing
* pad for user mode except lighting up a bit in senvcfg via a prctl
* Enable landing pad through out the execution of test binary
*/
- ret = my_syscall5(__NR_prctl, PR_GET_INDIR_BR_LP_STATUS, &lpad_status, 0, 0, 0);
There is an assumption here that the libc supports setting INDIR_BR_LP_STATUS but does not support the standard prctl interface defined in <sys/prctl.h>. my_syscall5() is defined to fill in gaps in the libc, so this test case should also set the status manually rather than relying on the libc.
I don't think it's necessary to define my_syscall5(), since every libc should have a prctl() definition. However, these CFI prctls are very new and glibc does not yet support them (correct me if I am wrong), so these prctls should be enabled by the test cases.
In one of my previous patches, the test enabled landing pad and shadow stack directly via a handcrafted prctl macro. I changed it to only check the status, for the following reasons:
- If this binary is compiled with the landing pad and shadow stack options, then the toolchain being used already has a libc that enables shadow stack and landing pad.
- Upstream glibc doesn't have this support yet, but the libc that comes with the CFI toolchain does.
When enabling shadow stack, the macro is needed and the `prctl` function can't be used: you enter the `prctl` function without a shadow stack but return from it with one, which leads to a fault in its epilogue.
For all these reasons, the kselftests have to be compiled with a toolchain that has cfi codegen, and thus its libc should have the support to light these features up. The test only checks whether they are already lit up; if not, it fails.
You're spot on about one thing, though: since this test assumes libc has already lit up landing pad and shadow stack, it doesn't need the macro for the status check and can simply use the `prctl` syscall interface.
- Charlie
- if (ret)
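The status check agreed on in the reply above could go through the libc `prctl()` wrapper instead of `my_syscall5()`. What follows is a minimal sketch under stated assumptions, not the code from the series. The helper name `shadow_stack_enabled` is hypothetical, and the fallback value 74 for `PR_GET_SHADOW_STACK_STATUS` is an assumption taken from upstream `<linux/prctl.h>`; the series under review may number it differently.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/prctl.h>

/*
 * Assumption: fallback definitions for older headers. The values follow
 * upstream <linux/prctl.h>; the series under review may differ.
 */
#ifndef PR_GET_SHADOW_STACK_STATUS
#define PR_GET_SHADOW_STACK_STATUS 74
#endif
#ifndef PR_SHADOW_STACK_ENABLE
#define PR_SHADOW_STACK_ENABLE (1UL << 0)
#endif

/*
 * Hypothetical helper: returns 1 if shadow stack is enabled for the calling
 * thread, 0 if the kernel knows the prctl but the feature is off, and -1 if
 * the prctl itself was refused (errno is left as set by prctl()).
 */
int shadow_stack_enabled(void)
{
	unsigned long status = 0;

	if (prctl(PR_GET_SHADOW_STACK_STATUS, (unsigned long)&status,
		  0UL, 0UL, 0UL))
		return -1;

	return (status & PR_SHADOW_STACK_ENABLE) ? 1 : 0;
}
```

A caller in the selftest would map -1 and 0 to `ksft_exit_skip()`. Only the status query is done this way; enabling shadow stack from inside a libc function runs into the epilogue fault described above.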
On Wed, Apr 03, 2024 at 04:35:17PM -0700, Deepak Gupta wrote:
Adds kselftests for the RISC-V control flow integrity implementation for user mode. There is not a lot going on in the kernel to enable landing pad for user mode. The cfi selftests are intended to be compiled with a zicfilp and zicfiss enabled compiler, so the kselftest simply checks whether landing pad and shadow stack are enabled for the binary and process. The selftest then registers a signal handler for SIGSEGV. Any control flow violation is reported as SIGSEGV with si_code = SEGV_CPERR, and the test fails on receiving any SEGV_CPERR. The shadow stack part has more changes in the kernel, and thus there are separate tests for it:
- Exercise the `map_shadow_stack` syscall
- `fork` test to make sure COW works for shadow stack pages
- gup tests. As of today the kernel uses FOLL_FORCE when memory is accessed via /proc/<pid>/mem. Not breaking that for shadow stack
- signal test. Make sure signal delivery results in token creation on the shadow stack and that sigreturn consumes (and verifies) the token
- shadow stack protection test. Attempts to write to shadow stack memory using a regular store instruction must result in access faults
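The signal test in the list above boils down to a parent/child round trip: the child waits for SIGUSR1, runs the handler, returns from it through sigreturn, and exits with a known code. The sketch below shows only that plumbing. The function name `signal_round_trip` is hypothetical, and on a machine without zicfiss it exercises fork and signal delivery only, not the shadow stack token handling. The exit code 11 matches `CHILD_EXIT_CODE_SIG_TEST` from the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define CHILD_EXIT_CODE_SIG_TEST 11	/* same value as in cfi_rv_test.h */

static volatile sig_atomic_t got_usr1;

/* Only sets a flag; the child notices it after the handler has returned. */
static void usr1_handler(int signo)
{
	(void)signo;
	got_usr1 = 1;
}

/* Returns true if the child saw SIGUSR1 and exited with the expected code. */
bool signal_round_trip(void)
{
	struct sigaction sa = { 0 };
	struct timespec nap = { 0, 10 * 1000 * 1000 };	/* 10 ms */
	int status = 0;
	pid_t pid;

	/* Install the handler before fork() so the child cannot miss it. */
	sa.sa_handler = usr1_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL))
		return false;

	pid = fork();
	if (pid == -1)
		return false;

	if (pid == 0) {
		while (!got_usr1)
			nanosleep(&nap, NULL);
		_exit(CHILD_EXIT_CODE_SIG_TEST);
	}

	if (kill(pid, SIGUSR1)) {
		/* Do not leave a spinning child behind. */
		kill(pid, SIGKILL);
		waitpid(pid, &status, 0);
		return false;
	}

	if (waitpid(pid, &status, 0) != pid)
		return false;

	return WIFEXITED(status) &&
	       WEXITSTATUS(status) == CHILD_EXIT_CODE_SIG_TEST;
}
```

Using a named constant for the exit code on both sides keeps the child and the parent check from drifting apart, which a bare `11` in two places does not.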
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/cfi/.gitignore | 3 +
tools/testing/selftests/riscv/cfi/Makefile | 10 +
.../testing/selftests/riscv/cfi/cfi_rv_test.h | 83 ++++
.../selftests/riscv/cfi/riscv_cfi_test.c | 82 ++++
.../testing/selftests/riscv/cfi/shadowstack.c | 362 ++++++++++++++++++
.../testing/selftests/riscv/cfi/shadowstack.h | 37 ++
7 files changed, 578 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/riscv/cfi/.gitignore
create mode 100644 tools/testing/selftests/riscv/cfi/Makefile
create mode 100644 tools/testing/selftests/riscv/cfi/cfi_rv_test.h
create mode 100644 tools/testing/selftests/riscv/cfi/riscv_cfi_test.c
create mode 100644 tools/testing/selftests/riscv/cfi/shadowstack.c
create mode 100644 tools/testing/selftests/riscv/cfi/shadowstack.h
diff --git a/tools/testing/selftests/riscv/Makefile b/tools/testing/selftests/riscv/Makefile
index 4a9ff515a3a0..867e5875b7ce 100644
--- a/tools/testing/selftests/riscv/Makefile
+++ b/tools/testing/selftests/riscv/Makefile
@@ -5,7 +5,7 @@
 ARCH ?= $(shell uname -m 2>/dev/null || echo not)

 ifneq (,$(filter $(ARCH),riscv))
-RISCV_SUBTARGETS ?= hwprobe vector mm
+RISCV_SUBTARGETS ?= hwprobe vector mm cfi
 else
 RISCV_SUBTARGETS :=
 endif
diff --git a/tools/testing/selftests/riscv/cfi/.gitignore b/tools/testing/selftests/riscv/cfi/.gitignore
new file mode 100644
index 000000000000..ce7623f9da28
--- /dev/null
+++ b/tools/testing/selftests/riscv/cfi/.gitignore
@@ -0,0 +1,3 @@
+cfitests
+riscv_cfi_test
+shadowstack
\ No newline at end of file
diff --git a/tools/testing/selftests/riscv/cfi/Makefile b/tools/testing/selftests/riscv/cfi/Makefile
new file mode 100644
index 000000000000..b65f7ff38a32
--- /dev/null
+++ b/tools/testing/selftests/riscv/cfi/Makefile
@@ -0,0 +1,10 @@
+CFLAGS += -I$(top_srcdir)/tools/include
+
+CFLAGS += -march=rv64gc_zicfilp_zicfiss
+
+TEST_GEN_PROGS := cfitests
+
+include ../../lib.mk
+
+$(OUTPUT)/cfitests: riscv_cfi_test.c shadowstack.c
+	$(CC) -o$@ $(CFLAGS) $(LDFLAGS) $^
diff --git a/tools/testing/selftests/riscv/cfi/cfi_rv_test.h b/tools/testing/selftests/riscv/cfi/cfi_rv_test.h new file mode 100644 index 000000000000..fa1cf7183672 --- /dev/null +++ b/tools/testing/selftests/riscv/cfi/cfi_rv_test.h @@ -0,0 +1,83 @@ +/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef SELFTEST_RISCV_CFI_H +#define SELFTEST_RISCV_CFI_H +#include <stddef.h> +#include <sys/types.h> +#include "shadowstack.h"
+#define RISCV_CFI_SELFTEST_COUNT RISCV_SHADOW_STACK_TESTS
+#define CHILD_EXIT_CODE_SSWRITE 10 +#define CHILD_EXIT_CODE_SIG_TEST 11
+#define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \ +({ \
- register long _num __asm__ ("a7") = (num); \
- register long _arg1 __asm__ ("a0") = (long)(arg1); \
- register long _arg2 __asm__ ("a1") = (long)(arg2); \
- register long _arg3 __asm__ ("a2") = (long)(arg3); \
- register long _arg4 __asm__ ("a3") = (long)(arg4); \
- register long _arg5 __asm__ ("a4") = (long)(arg5); \
\
- __asm__ volatile ( \
"ecall\n" \
: "+r"(_arg1) \
: "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \
"r"(_num) \
: "memory", "cc" \
- ); \
- _arg1; \
+})
+#define my_syscall3(num, arg1, arg2, arg3) \ +({ \
- register long _num __asm__ ("a7") = (num); \
- register long _arg1 __asm__ ("a0") = (long)(arg1); \
- register long _arg2 __asm__ ("a1") = (long)(arg2); \
- register long _arg3 __asm__ ("a2") = (long)(arg3); \
\
- __asm__ volatile ( \
"ecall\n" \
: "+r"(_arg1) \
: "r"(_arg2), "r"(_arg3), \
"r"(_num) \
: "memory", "cc" \
- ); \
- _arg1; \
+})
+#ifndef __NR_prctl +#define __NR_prctl 167 +#endif
+#ifndef __NR_map_shadow_stack +#define __NR_map_shadow_stack 453 +#endif
+#define CSR_SSP 0x011
+#ifdef __ASSEMBLY__ +#define __ASM_STR(x) x +#else +#define __ASM_STR(x) #x +#endif
+#define csr_read(csr) \ +({ \
- register unsigned long __v; \
- __asm__ __volatile__ ("csrr %0, " __ASM_STR(csr) \
: "=r" (__v) : \
: "memory"); \
- __v; \
+})
+#define csr_write(csr, val) \ +({ \
- unsigned long __v = (unsigned long) (val); \
- __asm__ __volatile__ ("csrw " __ASM_STR(csr) ", %0" \
: : "rK" (__v) \
: "memory"); \
+})
+#endif diff --git a/tools/testing/selftests/riscv/cfi/riscv_cfi_test.c b/tools/testing/selftests/riscv/cfi/riscv_cfi_test.c new file mode 100644 index 000000000000..f22b3f0f24de --- /dev/null +++ b/tools/testing/selftests/riscv/cfi/riscv_cfi_test.c @@ -0,0 +1,82 @@ +// SPDX-License-Identifier: GPL-2.0-only
+#include "../../kselftest.h" +#include <signal.h> +#include <asm/ucontext.h> +#include <linux/prctl.h> +#include "cfi_rv_test.h"
+/* do not optimize cfi related test functions */ +#pragma GCC push_options +#pragma GCC optimize("O0")
+void sigsegv_handler(int signum, siginfo_t *si, void *uc) +{
- struct ucontext *ctx = (struct ucontext *) uc;
- if (si->si_code == SEGV_CPERR) {
printf("Control flow violation happened somewhere\n");
printf("pc where violation happened %lx\n", ctx->uc_mcontext.gregs[0]);
exit(-1);
- }
- printf("In sigsegv handler\n");
- /* all other cases are expected to be of shadow stack write case */
- exit(CHILD_EXIT_CODE_SSWRITE);
+}
+bool register_signal_handler(void) +{
- struct sigaction sa = {};
- sa.sa_sigaction = sigsegv_handler;
- sa.sa_flags = SA_SIGINFO;
- if (sigaction(SIGSEGV, &sa, NULL)) {
printf("registering signal handler for landing pad violation failed\n");
return false;
- }
- return true;
+}
+int main(int argc, char *argv[]) +{
- int ret = 0;
- unsigned long lpad_status = 0, ss_status = 0;
- ksft_print_header();
- ksft_set_plan(RISCV_CFI_SELFTEST_COUNT);
- ksft_print_msg("starting risc-v tests\n");
- /*
* Landing pad test. Not a lot of kernel changes to support landing
* pad for user mode except lighting up a bit in senvcfg via a prctl
* Enable landing pad through out the execution of test binary
*/
- ret = my_syscall5(__NR_prctl, PR_GET_INDIR_BR_LP_STATUS, &lpad_status, 0, 0, 0);
- if (ret)
ksft_exit_skip("Get landing pad status failed with %d\n", ret);
- if (!(lpad_status & PR_INDIR_BR_LP_ENABLE))
ksft_exit_skip("landing pad is not enabled, should be enabled via glibc\n");
- ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &ss_status, 0, 0, 0);
- if (ret)
ksft_exit_skip("Get shadow stack failed with %d\n", ret);
- if (!(ss_status & PR_SHADOW_STACK_ENABLE))
ksft_exit_skip("shadow stack is not enabled, should be enabled via glibc\n");
- if (!register_signal_handler())
ksft_exit_skip("registering signal handler for SIGSEGV failed\n");
- ksft_print_msg("landing pad and shadow stack are enabled for binary\n");
- ksft_print_msg("starting risc-v shadow stack tests\n");
- execute_shadow_stack_tests();
- ksft_finished();
The test case framework keeps its state in static variables, so each .c file that includes kselftest.h gets its own copy of the counters. As a result, these tests actually report that nothing passed, because the setup is in this file and the actual test cases are in a different file. This can be remedied by moving ksft_set_plan(RISCV_CFI_SELFTEST_COUNT) and ksft_finished() into execute_shadow_stack_tests().
There are two versions of the kselftest framework, and the one this is using is the low-level version, which has this note in its header:
"kselftest.h: low-level kselftest framework to include from selftest programs. When possible, please use kselftest_harness.h instead."
That is not a good enough reason for you to change this code to use kselftest_harness.h instead, but it is something to think about for any future test cases you may write.
- Charlie
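The restructuring suggested above can be sketched as follows. This is a sketch under stated assumptions, not kselftest code: the `counts` struct is a stand-in for the static counters that kselftest.h defines per translation unit, and `demo_pass`, `demo_fail`, and `run_all` are hypothetical names. The point it illustrates is that the plan, the per-test results, and the final summary all touch the same file-local state because they live in one function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the file-local (static) counters kept by kselftest.h. */
static struct {
	unsigned int plan;
	unsigned int pass;
	unsigned int fail;
} counts;

struct test_case {
	const char *name;
	bool (*fn)(void);
};

/* Hypothetical placeholder tests. */
static bool demo_pass(void) { return true; }
static bool demo_fail(void) { return false; }

static const struct test_case cases[] = {
	{ "demo pass", demo_pass },
	{ "demo fail", demo_fail },
};

/*
 * Plan, results and summary are all produced here, so they all see the
 * same copy of `counts`. Returns the number of passing tests.
 */
unsigned int run_all(void)
{
	size_t i, n = sizeof(cases) / sizeof(cases[0]);

	counts.plan = (unsigned int)n;
	counts.pass = 0;
	counts.fail = 0;
	printf("1..%u\n", counts.plan);

	for (i = 0; i < n; i++) {
		bool ok = cases[i].fn();

		if (ok)
			counts.pass++;
		else
			counts.fail++;
		printf("%s %zu %s\n", ok ? "ok" : "not ok", i + 1,
		       cases[i].name);
	}

	printf("# Totals: pass:%u fail:%u\n", counts.pass, counts.fail);
	return counts.pass;
}
```

If the plan were set in one .c file and the results recorded in another, each file would update its own static copy and the summary would show zero passes, which is the behavior described in the comment above.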
+}
+#pragma GCC pop_options diff --git a/tools/testing/selftests/riscv/cfi/shadowstack.c b/tools/testing/selftests/riscv/cfi/shadowstack.c new file mode 100644 index 000000000000..2f65eb970c44 --- /dev/null +++ b/tools/testing/selftests/riscv/cfi/shadowstack.c @@ -0,0 +1,362 @@ +// SPDX-License-Identifier: GPL-2.0-only
+#include "../../kselftest.h" +#include <sys/wait.h> +#include <signal.h> +#include <fcntl.h> +#include <asm-generic/unistd.h> +#include <sys/mman.h> +#include "shadowstack.h" +#include "cfi_rv_test.h"
+/* do not optimize shadow stack related test functions */ +#pragma GCC push_options +#pragma GCC optimize("O0")
+void zar(void) +{
- unsigned long ssp = 0;
- ssp = csr_read(CSR_SSP);
- printf("inside %s and shadow stack ptr is %lx\n", __func__, ssp);
+}
+void bar(void) +{
- printf("inside %s\n", __func__);
- zar();
+}
+void foo(void) +{
- printf("inside %s\n", __func__);
- bar();
+}
+void zar_child(void) +{
- unsigned long ssp = 0;
- ssp = csr_read(CSR_SSP);
- printf("inside %s and shadow stack ptr is %lx\n", __func__, ssp);
+}
+void bar_child(void) +{
- printf("inside %s\n", __func__);
- zar_child();
+}
+void foo_child(void) +{
- printf("inside %s\n", __func__);
- bar_child();
+}
+typedef void (call_func_ptr)(void); +/*
- call couple of functions to test push pop.
- */
+int shadow_stack_call_tests(call_func_ptr fn_ptr, bool parent) +{
- if (parent)
printf("call test for parent\n");
- else
printf("call test for child\n");
- (fn_ptr)();
- return 0;
+}
+/* forks a thread, and ensure shadow stacks fork out */ +bool shadow_stack_fork_test(unsigned long test_num, void *ctx) +{
- int pid = 0, child_status = 0, parent_pid = 0, ret = 0;
- unsigned long ss_status = 0;
- printf("exercising shadow stack fork test\n");
- ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &ss_status, 0, 0, 0);
- if (ret) {
printf("shadow stack get status prctl failed with errorcode %d\n", ret);
return false;
- }
- if (!(ss_status & PR_SHADOW_STACK_ENABLE))
ksft_exit_skip("shadow stack is not enabled, should be enabled via glibc\n");
- parent_pid = getpid();
- pid = fork();
- if (pid) {
printf("Parent pid %d and child pid %d\n", parent_pid, pid);
shadow_stack_call_tests(&foo, true);
- } else
shadow_stack_call_tests(&foo_child, false);
- if (pid) {
printf("waiting on child to finish\n");
wait(&child_status);
- } else {
/* exit child gracefully */
exit(0);
- }
- if (pid && WIFSIGNALED(child_status)) {
printf("child faulted\n");
return false;
- }
- return true;
+}
+/* exercise `map_shadow_stack`, pivot to it and call some functions to ensure it works */ +#define SHADOW_STACK_ALLOC_SIZE 4096 +bool shadow_stack_map_test(unsigned long test_num, void *ctx) +{
- unsigned long shdw_addr;
- int ret = 0;
- shdw_addr = my_syscall3(__NR_map_shadow_stack, NULL, SHADOW_STACK_ALLOC_SIZE, 0);
- if (((long) shdw_addr) <= 0) {
printf("map_shadow_stack failed with error code %d\n", (int) shdw_addr);
return false;
- }
- ret = munmap((void *) shdw_addr, SHADOW_STACK_ALLOC_SIZE);
- if (ret) {
printf("munmap failed with error code %d\n", ret);
return false;
- }
- return true;
+}
+/*
- shadow stack protection tests. map a shadow stack and
- validate all memory protections work on it
- */
+bool shadow_stack_protection_test(unsigned long test_num, void *ctx) +{
- unsigned long shdw_addr;
- unsigned long *write_addr = NULL;
- int ret = 0, pid = 0, child_status = 0;
- shdw_addr = my_syscall3(__NR_map_shadow_stack, NULL, SHADOW_STACK_ALLOC_SIZE, 0);
- if (((long) shdw_addr) <= 0) {
printf("map_shadow_stack failed with error code %d\n", (int) shdw_addr);
return false;
- }
- write_addr = (unsigned long *) shdw_addr;
- pid = fork();
- /* no child was created, return false */
- if (pid == -1)
return false;
- /*
* try to perform a store from child on shadow stack memory
* it should result in SIGSEGV
*/
- if (!pid) {
/* below write must lead to SIGSEGV */
*write_addr = 0xdeadbeef;
- } else {
wait(&child_status);
- }
- /* test fail, if 0xdeadbeef present on shadow stack address */
- if (*write_addr == 0xdeadbeef) {
printf("write succeeded\n");
return false;
- }
- /* if child reached here, then fail */
- if (!pid) {
printf("child reached unreachable state\n");
return false;
- }
- /* if child exited via signal handler but not for write on ss */
- if (WIFEXITED(child_status) &&
WEXITSTATUS(child_status) != CHILD_EXIT_CODE_SSWRITE) {
printf("child wasn't signaled for write on shadow stack\n");
return false;
- }
- ret = munmap(write_addr, SHADOW_STACK_ALLOC_SIZE);
- if (ret) {
printf("munmap failed with error code %d\n", ret);
return false;
- }
- return true;
+}
+#define SS_MAGIC_WRITE_VAL 0xbeefdead
+int gup_tests(int mem_fd, unsigned long *shdw_addr) +{
- unsigned long val = 0;
- lseek(mem_fd, (unsigned long)shdw_addr, SEEK_SET);
- if (read(mem_fd, &val, sizeof(val)) < 0) {
printf("reading shadow stack mem via gup failed\n");
return 1;
- }
- val = SS_MAGIC_WRITE_VAL;
- lseek(mem_fd, (unsigned long)shdw_addr, SEEK_SET);
- if (write(mem_fd, &val, sizeof(val)) < 0) {
printf("writing shadow stack mem via gup failed\n");
return 1;
- }
- if (*shdw_addr != SS_MAGIC_WRITE_VAL) {
printf("GUP write to shadow stack memory didn't happen\n");
return 1;
- }
- return 0;
+}
+bool shadow_stack_gup_tests(unsigned long test_num, void *ctx) +{
- unsigned long shdw_addr = 0;
- unsigned long *write_addr = NULL;
- int fd = 0;
- bool ret = false;
- shdw_addr = my_syscall3(__NR_map_shadow_stack, NULL, SHADOW_STACK_ALLOC_SIZE, 0);
- if (((long) shdw_addr) <= 0) {
printf("map_shadow_stack failed with error code %d\n", (int) shdw_addr);
return false;
- }
- write_addr = (unsigned long *) shdw_addr;
- fd = open("/proc/self/mem", O_RDWR);
- if (fd == -1)
return false;
- if (gup_tests(fd, write_addr)) {
printf("gup tests failed\n");
goto out;
- }
- ret = true;
+out:
- if (shdw_addr && munmap(write_addr, SHADOW_STACK_ALLOC_SIZE)) {
printf("munmap failed with error code %d\n", ret);
ret = false;
- }
- return ret;
+}
+
+volatile bool break_loop;
+
+void sigusr1_handler(int signo)
+{
+	printf("In sigusr1 handler\n");
+	break_loop = true;
+}
+
+bool sigusr1_signal_test(void)
+{
+	struct sigaction sa = {};
+
+	sa.sa_handler = sigusr1_handler;
+	sa.sa_flags = 0;
+	sigemptyset(&sa.sa_mask);
+	if (sigaction(SIGUSR1, &sa, NULL)) {
+		printf("registering signal handler for SIGUSR1 failed\n");
+		return false;
+	}
+
+	return true;
+}
+/*
+ * shadow stack signal test. shadow stack must be enabled.
+ * register a signal, fork another thread which is waiting
+ * on signal. Send a signal from parent to child, verify
+ * that signal was received by child. If not test fails
+ */
+bool shadow_stack_signal_test(unsigned long test_num, void *ctx)
+{
+	int pid = 0, child_status = 0, ret = 0;
+	unsigned long ss_status = 0;
+
+	ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &ss_status, 0, 0, 0);
+	if (ret) {
+		printf("shadow stack get status prctl failed with errorcode %d\n", ret);
+		return false;
+	}
+
+	if (!(ss_status & PR_SHADOW_STACK_ENABLE))
+		ksft_exit_skip("shadow stack is not enabled, should be enabled via glibc\n");
+
+	/* this should be caught by signal handler and do an exit */
+	if (!sigusr1_signal_test()) {
+		printf("registering sigusr1 handler failed\n");
+		exit(-1);
+	}
+
+	pid = fork();
+	if (pid == -1) {
+		printf("signal test: fork failed\n");
+		goto out;
+	}
+
+	if (pid == 0) {
+		while (!break_loop)
+			sleep(1);
+
+		exit(11);
+		/* child shouldn't go beyond here */
+	}
+
+	/* send SIGUSR1 to child */
+	kill(pid, SIGUSR1);
+	wait(&child_status);
+
+out:
+	return (WIFEXITED(child_status) &&
+		WEXITSTATUS(child_status) == 11);
+}
+
+int execute_shadow_stack_tests(void)
+{
+	int ret = 0;
+	unsigned long test_count = 0;
+	unsigned long shstk_status = 0;
+
+	printf("Executing RISC-V shadow stack self tests\n");
+
+	ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &shstk_status, 0, 0, 0);
+	if (ret != 0)
+		ksft_exit_skip("Get shadow stack status failed with %d\n", ret);
+
+	/*
+	 * If we are here that means get shadow stack status succeeded and
+	 * thus shadow stack support is baked in the kernel.
+	 */
+	while (test_count < ARRAY_SIZE(shstk_tests)) {
+		ksft_test_result((*shstk_tests[test_count].t_func)(test_count, NULL),
+				 shstk_tests[test_count].name);
+		test_count++;
+	}
+
+	return 0;
+}
+
+#pragma GCC pop_options
diff --git a/tools/testing/selftests/riscv/cfi/shadowstack.h b/tools/testing/selftests/riscv/cfi/shadowstack.h
new file mode 100644
index 000000000000..b43e74136a26
--- /dev/null
+++ b/tools/testing/selftests/riscv/cfi/shadowstack.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef SELFTEST_SHADOWSTACK_TEST_H
+#define SELFTEST_SHADOWSTACK_TEST_H
+#include <stddef.h>
+#include <linux/prctl.h>
+
+/*
+ * a cfi test returns true for success or false for fail
+ * takes a number for test number to index into array and void pointer.
+ */
+typedef bool (*shstk_test_func)(unsigned long test_num, void *);
+
+struct shadow_stack_tests {
+	char *name;
+	shstk_test_func t_func;
+};
+
+bool shadow_stack_fork_test(unsigned long test_num, void *ctx);
+bool shadow_stack_map_test(unsigned long test_num, void *ctx);
+bool shadow_stack_protection_test(unsigned long test_num, void *ctx);
+bool shadow_stack_gup_tests(unsigned long test_num, void *ctx);
+bool shadow_stack_signal_test(unsigned long test_num, void *ctx);
+
+static struct shadow_stack_tests shstk_tests[] = {
+	{ "shstk fork test\n", shadow_stack_fork_test },
+	{ "map shadow stack syscall\n", shadow_stack_map_test },
+	{ "shadow stack gup tests\n", shadow_stack_gup_tests },
+	{ "shadow stack signal tests\n", shadow_stack_signal_test},
+	{ "memory protections of shadow stack memory\n", shadow_stack_protection_test }
+};
+
+#define RISCV_SHADOW_STACK_TESTS ARRAY_SIZE(shstk_tests)
+
+int execute_shadow_stack_tests(void);
+
+#endif
2.43.2
On Wed, Apr 03, 2024 at 04:34:48PM -0700, Deepak Gupta wrote:
Sending out v3 for cpu assisted riscv user mode control flow integrity.
v2 [9] was sent a week ago for this riscv usermode control flow integrity enabling. The RFC patchset (v1) was sent early this year (January) [7].
changes in v3
envcfg: logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been picked on per task basis, even though CPU didn't implement it. Fixed in this series.
dt-bindings: As suggested, split into separate commit. fixed the messaging that spec is in public review
arch_is_shadow_stack change: arch_is_shadow_stack changed to vma_is_shadow_stack
hwprobe: zicfiss / zicfilp if present will get enumerated in hwprobe
selftests: As suggested, added object and binary filenames to .gitignore. The selftest binary needs to be compiled with a cfi enabled compiler anyway, which makes sure that landing pad and shadow stack are enabled. Thus removed the separate enable/disable tests. Cleaned up tests a bit.
changes in v2
As part of the testing effort, compiled a rootfs with shadow stack and landing pad enabled (libraries and binaries) and booted to shell. As part of long running tests, I have been able to run some spec 2006 benchmarks [8]. (The link is provided only for the list of benchmarks used in the long running tests; the spreadsheet itself holds static stats such as code size growth on spec binaries.) Thus converting from RFC to regular patchset.
Securing control-flow integrity for usermode requires the following:
- Securing forward control flow: all callsites must reach a target that they actually intend to reach.
- Securing backward control flow: all function returns must return to the location they were called from.
This patch series uses the riscv cpu extension `zicfilp` [2] to secure forward control flow and `zicfiss` [2] to secure backward control flow. `zicfilp` enforces that all indirect calls or jmps must land on a landing pad instr, and the label embedded in the landing pad instr must match a value programmed in the `x7` register (at the callsite, via the compiler). `zicfiss` introduces a shadow stack which is writeable only via shadow stack instructions (sspush and ssamoswap) and thus can't be tampered with via inadvertent stores. More details about the extensions can be read from [2], and there are details in the documentation as well (in this patch series).
Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow integrity for user mode programs can be compiled in the kernel.
Enabling control flow integrity for user programs is left to the user runtime (specifically, it is expected from the dynamic loader). There has been a lot of earlier discussion on the enabling topic around x86 shadow stack enabling [3, 4, 5], and the overall consensus has been to let the dynamic loader (or usermode) decide whether to enable the feature.
This patch series introduces arch agnostic `prctls` to enable shadow stack and indirect branch tracking, and implements them on riscv. arm64 is expected to implement the shadow stack part of these arch agnostic `prctls` [6].
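As a rough illustration of how a runtime might query the shadow stack `prctl`, here is a minimal sketch. It uses the same `PR_GET_SHADOW_STACK_STATUS` / `PR_SHADOW_STACK_ENABLE` names as the selftests in this series. The numeric fallback values are assumptions taken from the arm64 GCS proposal [6] and may not match what this series or a given kernel uses, and the helper name `shadow_stack_enabled` is made up for the example.

```c
#include <sys/prctl.h>

/*
 * Fallback values are ASSUMPTIONS (borrowed from the arm64 GCS proposal);
 * prefer the definitions from <linux/prctl.h> when the headers have them.
 */
#ifndef PR_GET_SHADOW_STACK_STATUS
#define PR_GET_SHADOW_STACK_STATUS 74
#endif
#ifndef PR_SHADOW_STACK_ENABLE
#define PR_SHADOW_STACK_ENABLE (1UL << 0)
#endif

/*
 * Hypothetical helper: returns 1 if shadow stack is enabled for the
 * calling task, 0 if the prctl works but the feature is off, and -1 if
 * the prctl itself fails (for example, no kernel or CPU support).
 */
int shadow_stack_enabled(void)
{
	unsigned long status = 0;

	if (prctl(PR_GET_SHADOW_STACK_STATUS, (unsigned long)&status, 0UL, 0UL, 0UL) != 0)
		return -1;

	return (status & PR_SHADOW_STACK_ENABLE) ? 1 : 0;
}
```

On a kernel without this support the helper is expected to return -1, which mirrors how the selftests skip when the status prctl fails.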
Changes since last time
Spec changes
The forward cfi spec has become much simpler. The `lpad` instruction is a pseudo-instruction for `auipc rd, <20bit_imm>`. `lpad` checks `x7` against the 20-bit label embedded in the instr. Thus the label width is 20 bits.
Shadow stack management instructions are reduced to:
- sspush - pushes x1/x5 on the shadow stack
- sspopchk - pops from the shadow stack and compares with x1/x5
- ssamoswap - atomically swaps a value on the shadow stack
- rdssp - reads the current shadow stack pointer
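The label check can be modelled in a few lines of C. This is only a sketch of the comparison described above, not the hardware definition: it assumes the label sits in the upper-immediate field (bits 31:12) of the instruction, as in any `auipc`-style encoding, and that the callsite programs the same bit range of `x7`. Any special handling the spec defines (such as particular label values) is left out, and the function names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define LPAD_LABEL_SHIFT 12        /* upper-immediate field starts at bit 12 */
#define LPAD_LABEL_MASK  0xFFFFFu  /* 20-bit label */

/* Extract the 20-bit label from a 32-bit auipc-style encoding. */
static uint32_t lpad_label(uint32_t insn)
{
	return (insn >> LPAD_LABEL_SHIFT) & LPAD_LABEL_MASK;
}

/*
 * Model of the check: true when the label the caller placed in x7
 * (ASSUMED to live in bits 31:12) equals the label in the landing pad.
 * A false result stands in for a control flow violation.
 */
static bool lpad_label_matches(uint32_t insn, uint64_t x7)
{
	uint32_t expected = (uint32_t)(x7 >> LPAD_LABEL_SHIFT) & LPAD_LABEL_MASK;

	return lpad_label(insn) == expected;
}
```

For instance, an encoding built as `(0x12345u << 12) | 0x17` matches `x7 = 0x12345000` and fails against any other label.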
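The push/pop-and-check behaviour of `sspush` and `sspopchk` can be sketched as a small software model. This is an illustration only: a real shadow stack lives in specially protected memory and is addressed through the shadow stack pointer, whereas the model below uses a plain array with an index. The struct, the function names and the depth are all made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SS_DEPTH 64 /* arbitrary depth for the model */

struct shadow_stack_model {
	uint64_t slots[SS_DEPTH];
	size_t ssp; /* entries in use; loosely stands in for what rdssp reads */
};

/* sspush: record the return address (x1/x5) at function entry. */
static bool ss_push(struct shadow_stack_model *ss, uint64_t ra)
{
	if (ss->ssp >= SS_DEPTH)
		return false;
	ss->slots[ss->ssp++] = ra;
	return true;
}

/*
 * sspopchk: pop the saved value and compare it with the return address
 * about to be used. false models a software check exception.
 */
static bool ss_popchk(struct shadow_stack_model *ss, uint64_t ra)
{
	if (ss->ssp == 0)
		return false;
	return ss->slots[--ss->ssp] == ra;
}
```

A return address that was overwritten on the regular stack no longer matches the value saved by `ss_push`, so `ss_popchk` reports the mismatch.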
Shadow stack accesses on readonly memory always raise an AMO/store page fault. `sspopchk` is a load, but if the underlying page is readonly, it'll raise a store page fault. This simplifies hardware and kernel COW handling for shadow stack pages.
riscv defines a new exception type `software check exception` and control flow violations raise software check exception.
Enabling controls for shadow stack and landing pad are in the xenvcfg CSR, which controls enabling for the next lower privilege mode. As an example, senvcfg controls enabling for U mode and menvcfg controls enabling for S mode.
core mm shadow stack enabling
Shadow stack support for x86 usermode is now in mainline, and thus this patch series builds on top of that for arch-agnostic mm related changes. Big thanks and shout out to Rick Edgecombe for that.
selftests
Created some minimal selftests to test the patch series.
[1] - https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/
[2] - https://github.com/riscv/riscv-cfi
[3] - https://lore.kernel.org/lkml/ZWHcBq0bJ+15eeKs@finisterre.sirena.org.uk/T/#mb...
[4] - https://lore.kernel.org/all/20220130211838.8382-1-rick.p.edgecombe@intel.com...
[5] - https://lore.kernel.org/lkml/CAHk-=wgP5mk3poVeejw16Asbid0ghDt4okHnWaWKLBkRhQ...
[6] - https://lore.kernel.org/linux-mm/20231122-arm64-gcs-v7-2-201c483bd775@kernel...
[7] - https://lore.kernel.org/lkml/20240125062739.1339782-1-debug@rivosinc.com/
[8] - https://docs.google.com/spreadsheets/d/1_cHGH4ctNVvFRiS7hW9dEGKtXLAJ3aX4Z_iT...
[9] - https://lore.kernel.org/lkml/20240329044459.3990638-1-debug@rivosinc.com/
This is a note for people wanting to test this series.
1. Need a toolchain that has CFI support
$ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev
$ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static"
$ make -j$(nproc)
2. QEMU
$ git clone git@github.com:deepak0414/qemu.git -b zicfilp_zicfiss_mar24_spec_v8.1.1
$ cd qemu
$ mkdir build
$ cd build
$ ../configure --target-list=riscv64-softmmu
$ make -j$(nproc)
3. OpenSBI
$ git clone git@github.com:deepak0414/opensbi.git -b cfi_spec_split_opensbi
$ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic
4. Linux
Running defconfig is fine. CFI is enabled by default if the toolchain supports it.
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc)
5. Running
Modify your qemu command to have:
-bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin
-cpu rv64,zicfilp=true,zicfiss=true
- Charlie