When SME was initially merged we did not add support for TPIDR2_EL0 to
the ptrace interface, creating difficulties for debuggers in accessing
lazy save state for ZA. This series implements that support, extending
the existing NT_ARM_TLS regset to support the register when available,
and adds kselftest coverage for the existing and new NT_ARM_TLS
functionality.
Existing programs that query the size of the register set will observe
its increased size. Programs that assume the register set is a single
register will see no change. On systems that do not support SME,
TPIDR2_EL0 will read as 0 and writes will be ignored; support for SME
should be queried via hwcaps as normal.
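A minimal sketch of the debugger side, assuming the layout described above: TPIDR_EL0 sits in the first 64-bit slot of NT_ARM_TLS and, on kernels with this series, TPIDR2_EL0 in the second, while older kernels return only 8 bytes. PTRACE_GETREGSET shrinks iov_len to the amount of data actually returned, so the helper decides from iov_len whether a second register is present. The struct and function names are hypothetical. Because TPIDR2_EL0 reads as 0 without SME, a 16-byte result does not prove SME is available; that still has to be checked via hwcaps (HWCAP2_SME).

```c
#include <elf.h>        /* NT_ARM_TLS */
#include <stddef.h>
#include <stdint.h>
#include <sys/ptrace.h> /* ptrace(), PTRACE_GETREGSET */
#include <sys/types.h>  /* pid_t */
#include <sys/uio.h>    /* struct iovec */

/* Hypothetical container for the decoded NT_ARM_TLS contents. */
struct arm64_tls_regs {
	uint64_t tpidr;   /* TPIDR_EL0, always present */
	uint64_t tpidr2;  /* TPIDR2_EL0, 0 if the kernel did not report it */
	int has_tpidr2;   /* nonzero if a second register was returned */
};

/*
 * Decode a buffer filled in by PTRACE_GETREGSET/NT_ARM_TLS.
 * Returns 0 on success, -1 if it is too short to hold TPIDR_EL0.
 */
int decode_nt_arm_tls(const struct iovec *iov, struct arm64_tls_regs *out)
{
	const uint64_t *regs = iov->iov_base;

	if (iov->iov_len < sizeof(uint64_t))
		return -1;

	out->tpidr = regs[0];
	out->has_tpidr2 = iov->iov_len >= 2 * sizeof(uint64_t);
	out->tpidr2 = out->has_tpidr2 ? regs[1] : 0;
	return 0;
}

/* Read NT_ARM_TLS from a stopped tracee; only meaningful on arm64. */
int read_arm64_tls(pid_t pid, struct arm64_tls_regs *out)
{
	uint64_t buf[2] = { 0, 0 };
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };

	/* On success the kernel sets iov_len to the number of bytes written. */
	if (ptrace(PTRACE_GETREGSET, pid, (void *)(uintptr_t)NT_ARM_TLS, &iov) == -1)
		return -1;
	return decode_nt_arm_tls(&iov, out);
}
```

Asking for 16 bytes is safe on older kernels: they simply return 8, and the helper then reports no TPIDR2_EL0.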
v4:
- Rebase onto v6.0-rc3.
v3:
- Fix copyright date on test program.
v2:
- Rebase onto v6.0-rc1.
Mark Brown (4):
kselftest/arm64: Add test coverage for NT_ARM_TLS
arm64/ptrace: Document extension of NT_ARM_TLS to cover TPIDR2_EL0
arm64/ptrace: Support access to TPIDR2_EL0
kselftest/arm64: Add coverage of TPIDR2_EL0 ptrace interface
Documentation/arm64/sme.rst | 3 +
arch/arm64/kernel/ptrace.c | 25 +-
tools/testing/selftests/arm64/abi/.gitignore | 1 +
tools/testing/selftests/arm64/abi/Makefile | 2 +-
tools/testing/selftests/arm64/abi/ptrace.c | 241 +++++++++++++++++++
5 files changed, 266 insertions(+), 6 deletions(-)
create mode 100644 tools/testing/selftests/arm64/abi/ptrace.c
base-commit: b90cb1053190353cc30f0fef0ef1f378ccc063c5
--
2.30.2
From: Kyle Huey <me(a)kylehuey.com>
When management of the PKRU register was moved away from XSTATE, emulation
of PKRU's existence in XSTATE was added for reading PKRU through ptrace,
but not for writing PKRU through ptrace. This can be seen by running gdb
and executing `p $pkru`, `set $pkru = 42`, and `p $pkru`. On affected
kernels (5.14+) the write to the PKRU register (which gdb performs through
ptrace) is ignored.
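For illustration, a debugger write such as `set $pkru = 42` amounts to reading NT_X86_XSTATE with PTRACE_GETREGSET, patching PKRU in the standard-format XSAVE buffer, and writing it back with PTRACE_SETREGSET. The sketch below shows only the patching step and is a hypothetical helper, not gdb's code. It assumes the caller already knows the PKRU offset (for example from CPUID leaf 0xD, sub-leaf 9, EBX), that the XSAVE header starts at byte 512 of the buffer with xfeatures as its first 64-bit field, and that PKRU is state component 9. It also sets the PKRU bit in xfeatures, because with this fix a ptrace write whose header leaves that bit clear resets PKRU to 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XSAVE_HDR_OFFSET   512  /* XSAVE header in the standard format */
#define XFEATURE_PKRU      9    /* PKRU is XSAVE state component 9 */
#define XFEATURE_MASK_PKRU (1ULL << XFEATURE_PKRU)

/*
 * Hypothetical helper: store @value as PKRU at @pkru_off in a
 * standard-format XSAVE buffer and mark the PKRU component as present
 * in the header's xfeatures bitmap.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int xsave_buf_set_pkru(uint8_t *buf, size_t len, size_t pkru_off, uint32_t value)
{
	uint64_t xfeatures;

	if (len < XSAVE_HDR_OFFSET + sizeof(xfeatures) ||
	    pkru_off > len || len - pkru_off < sizeof(value))
		return -1;

	/* memcpy avoids unaligned or type-punned accesses into the buffer. */
	memcpy(&xfeatures, buf + XSAVE_HDR_OFFSET, sizeof(xfeatures));
	xfeatures |= XFEATURE_MASK_PKRU;
	memcpy(buf + XSAVE_HDR_OFFSET, &xfeatures, sizeof(xfeatures));

	memcpy(buf + pkru_off, &value, sizeof(value));
	return 0;
}
```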
There are three APIs that write PKRU: sigreturn, PTRACE_SETREGSET with
NT_X86_XSTATE, and KVM_SET_XSAVE. sigreturn still uses XRSTOR to write to
PKRU. KVM_SET_XSAVE has its own special handling to make PKRU writes take
effect (in fpu_copy_uabi_to_guest_fpstate). Push that down into
copy_uabi_to_xstate and have PTRACE_SETREGSET with NT_X86_XSTATE pass in
a pointer to the appropriate PKRU slot. copy_sigframe_from_user_to_xstate
depends on copy_uabi_to_xstate populating the PKRU field in the task's
XSTATE so that __fpu_restore_sig can do an XRSTOR from it, so continue doing
that.
This also adds code to initialize the PKRU value to the hardware init value
(namely 0) if the PKRU bit is not set in the XSTATE header provided to
ptrace, to match XRSTOR.
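The resulting rule for the PKRU destination can be modelled in user space as follows. This is an illustrative model, not the kernel function: the name and the `pkru_in_buf` parameter (the value carried in the PKRU component of the incoming buffer) are hypothetical. A NULL destination stands for KVM's legacy behaviour of leaving PKRU untouched, a clear PKRU bit yields the hardware init value 0 as XRSTOR would, and a set bit takes the value from the buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define XFEATURE_PKRU      9
#define XFEATURE_MASK_PKRU (1ULL << XFEATURE_PKRU)

/*
 * Model of how the PKRU destination is updated after copying a
 * user-supplied XSTATE buffer:
 *   - pkru == NULL: leave the destination alone (KVM legacy ABI);
 *   - PKRU bit clear in xfeatures: reset to the init value 0;
 *   - PKRU bit set: use the value carried in the buffer.
 */
void model_update_pkru(uint64_t xfeatures, uint32_t pkru_in_buf, uint32_t *pkru)
{
	if (!pkru)
		return;

	*pkru = (xfeatures & XFEATURE_MASK_PKRU) ? pkru_in_buf : 0;
}
```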
Fixes: e84ba47e313d ("x86/fpu: Hook up PKRU into ptrace()")
Signed-off-by: Kyle Huey <me(a)kylehuey.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: stable(a)vger.kernel.org # 5.14+
---
arch/x86/kernel/fpu/core.c | 20 +++++++++-----------
arch/x86/kernel/fpu/regset.c | 2 +-
arch/x86/kernel/fpu/signal.c | 2 +-
arch/x86/kernel/fpu/xstate.c | 25 ++++++++++++++++++++-----
arch/x86/kernel/fpu/xstate.h | 4 ++--
5 files changed, 33 insertions(+), 20 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 3b28c5b25e12..c273669e8a00 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -391,8 +391,6 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf,
{
struct fpstate *kstate = gfpu->fpstate;
const union fpregs_state *ustate = buf;
- struct pkru_state *xpkru;
- int ret;
if (!cpu_feature_enabled(X86_FEATURE_XSAVE)) {
if (ustate->xsave.header.xfeatures & ~XFEATURE_MASK_FPSSE)
@@ -406,16 +404,16 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf,
if (ustate->xsave.header.xfeatures & ~xcr0)
return -EINVAL;
- ret = copy_uabi_from_kernel_to_xstate(kstate, ustate);
- if (ret)
- return ret;
+ /*
+ * Nullify @vpkru to preserve its current value if PKRU's bit isn't set
+ * in the header. KVM's odd ABI is to leave PKRU untouched in this
+ * case (all other components are eventually re-initialized).
+ * (Not clear that this is actually necessary for compat).
+ */
+ if (!(ustate->xsave.header.xfeatures & XFEATURE_MASK_PKRU))
+ vpkru = NULL;
- /* Retrieve PKRU if not in init state */
- if (kstate->regs.xsave.header.xfeatures & XFEATURE_MASK_PKRU) {
- xpkru = get_xsave_addr(&kstate->regs.xsave, XFEATURE_PKRU);
- *vpkru = xpkru->pkru;
- }
- return 0;
+ return copy_uabi_from_kernel_to_xstate(kstate, ustate, vpkru);
}
EXPORT_SYMBOL_GPL(fpu_copy_uabi_to_guest_fpstate);
#endif /* CONFIG_KVM */
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 75ffaef8c299..6d056b68f4ed 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -167,7 +167,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
}
fpu_force_restore(fpu);
- ret = copy_uabi_from_kernel_to_xstate(fpu->fpstate, kbuf ?: tmpbuf);
+ ret = copy_uabi_from_kernel_to_xstate(fpu->fpstate, kbuf ?: tmpbuf, &target->thread.pkru);
out:
vfree(tmpbuf);
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 91d4b6de58ab..558076dbde5b 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -396,7 +396,7 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,
fpregs = &fpu->fpstate->regs;
if (use_xsave() && !fx_only) {
- if (copy_sigframe_from_user_to_xstate(fpu->fpstate, buf_fx))
+ if (copy_sigframe_from_user_to_xstate(tsk, buf_fx))
return false;
} else {
if (__copy_from_user(&fpregs->fxsave, buf_fx,
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c8340156bfd2..8f14981a3936 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1197,7 +1197,7 @@ static int copy_from_buffer(void *dst, unsigned int offset, unsigned int size,
static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
- const void __user *ubuf)
+ const void __user *ubuf, u32 *pkru)
{
struct xregs_state *xsave = &fpstate->regs.xsave;
unsigned int offset, size;
@@ -1246,6 +1246,21 @@ static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
}
}
+ /*
+ * Update the user protection key storage. Allow KVM to
+ * pass in a NULL pkru pointer if the mask bit is unset
+ * for its legacy ABI behavior.
+ */
+ if (pkru)
+ *pkru = 0;
+
+ if (hdr.xfeatures & XFEATURE_MASK_PKRU) {
+ struct pkru_state *xpkru;
+
+ xpkru = __raw_xsave_addr(xsave, XFEATURE_PKRU);
+ *pkru = xpkru->pkru;
+ }
+
/*
* The state that came in from userspace was user-state only.
* Mask all the user states out of 'xfeatures':
@@ -1264,9 +1279,9 @@ static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
* Convert from a ptrace standard-format kernel buffer to kernel XSAVE[S]
* format and copy to the target thread. Used by ptrace and KVM.
*/
-int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf)
+int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u32 *pkru)
{
- return copy_uabi_to_xstate(fpstate, kbuf, NULL);
+ return copy_uabi_to_xstate(fpstate, kbuf, NULL, pkru);
}
/*
@@ -1274,10 +1289,10 @@ int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf)
* XSAVE[S] format and copy to the target thread. This is called from the
* sigreturn() and rt_sigreturn() system calls.
*/
-int copy_sigframe_from_user_to_xstate(struct fpstate *fpstate,
+int copy_sigframe_from_user_to_xstate(struct task_struct *tsk,
const void __user *ubuf)
{
- return copy_uabi_to_xstate(fpstate, NULL, ubuf);
+ return copy_uabi_to_xstate(tsk->thread.fpu.fpstate, NULL, ubuf, &tsk->thread.pkru);
}
static bool validate_independent_components(u64 mask)
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 5ad47031383b..a4ecb04d8d64 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -46,8 +46,8 @@ extern void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
u32 pkru_val, enum xstate_copy_mode copy_mode);
extern void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
enum xstate_copy_mode mode);
-extern int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf);
-extern int copy_sigframe_from_user_to_xstate(struct fpstate *fpstate, const void __user *ubuf);
+extern int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u32 *pkru);
+extern int copy_sigframe_from_user_to_xstate(struct task_struct *tsk, const void __user *ubuf);
extern void fpu__init_cpu_xstate(void);
--
2.37.2
Changelog since v5:
- Avoids a second copy from the uabi buffer as suggested.
- Preserves old KVM_SET_XSAVE behavior where leaving the PKRU bit clear in
the XSTATE header results in PKRU remaining unchanged instead of
reinitializing it.
- Fixed up patch metadata as requested.
Changelog since v4:
- Selftest additionally checks PKRU readbacks through ptrace.
- Selftest flips all PKRU bits (except the default key).
Changelog since v3:
- The v3 patch is now part 1 of 2.
- Adds a selftest in part 2 of 2.
Changelog since v2:
- Removed now unused variables in fpu_copy_uabi_to_guest_fpstate
Changelog since v1:
- Handles the error case of copy_to_buffer().
Currently our SVE syscall ABI documentation does not reflect the actual
implemented ABI: it says that register state not shared with FPSIMD
becomes undefined on syscall when in reality we always clear it. Since
taking advantage of the documented ABI would change the observed kernel
behaviour, there is a substantial desire to avoid doing so. Instead,
let's document what we actually do, so that it is clear that this
behaviour is in reality part of the ABI.
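As a model of the behaviour being documented, the sketch below applies the syscall-return rule to a saved copy of SVE state: bits [127:0] of each Z register (the part shared with the FPSIMD V registers) are preserved, while the remaining Z bits, all of P0..P15 and FFR read back as zero. The struct and function names are hypothetical and the register file is represented as plain byte arrays; this is not how the kernel stores SVE state.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SVE_VL_MAX_BYTES 256  /* 2048-bit architectural maximum */
#define FPSIMD_REG_BYTES 16   /* bits [127:0], shared with Vn */

/* Hypothetical snapshot of one task's SVE registers at vector length vl. */
struct sve_snapshot {
	size_t vl;                                /* bytes, multiple of 16 */
	uint8_t z[32][SVE_VL_MAX_BYTES];
	uint8_t p[16][SVE_VL_MAX_BYTES / 8];      /* predicates: vl / 8 bytes */
	uint8_t ffr[SVE_VL_MAX_BYTES / 8];
};

/*
 * Apply the syscall-return rule to @s.
 * Returns 0 on success, -1 if the vector length is out of range.
 */
int sve_model_syscall_return(struct sve_snapshot *s)
{
	size_t preg_bytes;

	if (s->vl < FPSIMD_REG_BYTES || s->vl > SVE_VL_MAX_BYTES)
		return -1;
	preg_bytes = s->vl / 8;

	/* Keep the low 128 bits of each Z register, clear the rest up to VL. */
	for (int i = 0; i < 32; i++)
		memset(&s->z[i][FPSIMD_REG_BYTES], 0, s->vl - FPSIMD_REG_BYTES);
	/* Predicate registers and FFR are cleared entirely. */
	for (int i = 0; i < 16; i++)
		memset(s->p[i], 0, preg_bytes);
	memset(s->ffr, 0, preg_bytes);
	return 0;
}
```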
There has been some pushback on tightening the documentation in the past,
but it is hard to see who keeping the looser wording helps: it makes the
implementation decisions less clear and makes it harder for people to
discover and make use of the actual ABI. The main practical concern is
that qemu's user mode does not currently flush the registers.
v3:
- Rebase onto v6.0-rc3.
v2:
- Rebase onto v6.0-rc1.
Mark Brown (3):
kselftest/arm64: Correct buffer allocation for SVE Z registers
arm64/sve: Document our actual ABI for clearing registers on syscall
kselftest/arm64: Enforce actual ABI for SVE syscalls
Documentation/arm64/sve.rst | 2 +-
.../testing/selftests/arm64/abi/syscall-abi.c | 61 ++++++++++++-------
2 files changed, 41 insertions(+), 22 deletions(-)
base-commit: b90cb1053190353cc30f0fef0ef1f378ccc063c5
This series makes a few small enhancements to the existing standalone
floating point stress tests and then builds on them with a
kselftest-integrated program which gives them a very quick spin from
within kselftest, with an option to set a custom timeout for longer soak
testing. This makes it much easier to get thorough testing of the
floating point state management logic, rather than requiring custom
setup for coverage of the various vector lengths in the system as is
needed at present.
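Covering every vector length starts with discovering which ones the system supports. The sketch below shows one way to enumerate them, assuming a setter that behaves like prctl(PR_SVE_SET_VL) or prctl(PR_SME_SET_VL): it returns the vector length actually configured, rounding a request down to the nearest supported length and, we assume, up to the smallest one when no shorter length exists. The setter is passed in as a callback (a hypothetical abstraction, as are all names here) so the logic can be exercised without SVE hardware; on arm64 the callback would wrap prctl() and mask the result with PR_SVE_VL_LEN_MASK.

```c
#include <stddef.h>

#define VQ_BYTES 16  /* one vector quadword = 128 bits */
#define VQ_MAX   16  /* 2048-bit architectural maximum */

/* Callback: request @vl bytes, return the length actually set or -1. */
typedef int (*set_vl_fn)(int vl, void *ctx);

/*
 * Fill @vls (largest first) with each distinct supported vector length.
 * Returns the number found, or -1 if the setter fails.
 */
int enumerate_vls(set_vl_fn set_vl, void *ctx, int *vls, int max_vls)
{
	int count = 0;

	for (int vq = VQ_MAX; vq > 0 && count < max_vls; vq--) {
		int vl = set_vl(vq * VQ_BYTES, ctx);

		if (vl < 0)
			return -1;
		/* Rounded up: nothing at or below this request is supported. */
		if (vl / VQ_BYTES > vq)
			break;
		vq = vl / VQ_BYTES;  /* skip unsupported lengths in between */
		vls[count++] = vl;
	}
	return count;
}

/*
 * Fake setter for testing without hardware: @ctx points to an unsigned
 * bitmask where bit (vq - 1) means vq * 16 bytes is supported.
 */
int fake_set_vl(int vl, void *ctx)
{
	unsigned int supported = *(const unsigned int *)ctx;
	int vq = vl / VQ_BYTES;

	if (vq > VQ_MAX)
		vq = VQ_MAX;
	for (int q = vq; q > 0; q--)              /* round down if possible */
		if (supported & (1u << (q - 1)))
			return q * VQ_BYTES;
	for (int q = vq + 1; q <= VQ_MAX; q++)    /* otherwise round up */
		if (supported & (1u << (q - 1)))
			return q * VQ_BYTES;
	return -1;
}
```

Because each returned length is no larger than the request, and each request is smaller than the previous result, no length is recorded twice.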
It might be nice in future to extend this to attach to some or all of
the test programs with ptrace and read/write their registers as another
means of potentially triggering race conditions or corruption but that's
definitely another step.
v2:
- Rebase onto v6.0-rc3.
- Announce the results of enumeration before we start everything.
Mark Brown (4):
kselftest/arm64: Always encourage preemption for za-test
kselftest/arm64: Count SIGUSR2 deliveries in FP stress tests
kselftest/arm64: Install signal handlers before output in FP stress
tests
kselftest/arm64: kselftest harness for FP stress tests
tools/testing/selftests/arm64/fp/.gitignore | 1 +
tools/testing/selftests/arm64/fp/Makefile | 5 +-
.../testing/selftests/arm64/fp/asm-offsets.h | 1 +
tools/testing/selftests/arm64/fp/fp-stress.c | 535 ++++++++++++++++++
.../testing/selftests/arm64/fp/fpsimd-test.S | 51 +-
tools/testing/selftests/arm64/fp/sve-test.S | 51 +-
tools/testing/selftests/arm64/fp/za-test.S | 58 +-
7 files changed, 641 insertions(+), 61 deletions(-)
create mode 100644 tools/testing/selftests/arm64/fp/fp-stress.c
base-commit: b90cb1053190353cc30f0fef0ef1f378ccc063c5
These patches improve the coverage of ZA signal contexts a bit, adding
some validation that the actual data is correct and covering the case
where ZA is not enabled.
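The two cases the tests distinguish can be summarised as a size calculation. The sketch assumes the arm64 uapi layout as we understand it: a 16-byte za_context header (an 8-byte _aarch64_ctx plus the vector length and padding), followed, only when ZA is enabled, by a VL x VL byte matrix of ZA rows. The names are local, hypothetical stand-ins for ZA_SIG_REGS_OFFSET and ZA_SIG_CONTEXT_SIZE(), so check <asm/sigcontext.h> before relying on the constants.

```c
#include <stddef.h>
#include <stdint.h>

#define ZA_CTX_HDR_BYTES 16  /* assumed sizeof(struct za_context) */

/* Expected za_context record size for streaming vector length @vl bytes. */
size_t za_record_size(size_t vl, int za_enabled)
{
	/* ZA is a VL x VL byte matrix: vl rows of vl bytes each. */
	return ZA_CTX_HDR_BYTES + (za_enabled ? vl * vl : 0);
}

/*
 * Classify a record by its size: 1 if it carries ZA data, 0 if ZA was
 * disabled (header only), -1 if the size matches neither case.
 */
int za_record_has_data(uint32_t size, size_t vl)
{
	if (size == za_record_size(vl, 0))
		return 0;
	if (size == za_record_size(vl, 1))
		return 1;
	return -1;
}
```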
v2:
- Rebase onto v6.0-rc3.
Mark Brown (2):
kselftest/arm64: Tighten up validation of ZA signal context
kselftest/arm64: Add a test for signal frames with ZA disabled
.../arm64/signal/testcases/za_no_regs.c | 119 ++++++++++++++++++
.../arm64/signal/testcases/za_regs.c | 16 ++-
2 files changed, 134 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/arm64/signal/testcases/za_no_regs.c
base-commit: b90cb1053190353cc30f0fef0ef1f378ccc063c5
The arm64 architecture originally made the signal context a fixed size
structure containing a linked list of records with the various kinds of
register and other state which may be present. When SVE was implemented
it was realised that it supported implementations with more state than
could fit in that structure so a new record type EXTRA_CONTEXT was
introduced allowing the signal context to be extended beyond the
original size. Unfortunately the signal handling tests cannot cope with
these EXTRA_CONTEXT records at all - some support was implemented but it
simply never worked.
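The record walk the tests need to get right can be sketched as follows. Each record starts with a {magic, size} header. An EXTRA_CONTEXT record carries the address (datap) and size of an additional block, and once the list in the fixed area reaches its zero terminator, parsing continues in that block until its own terminator. The struct definitions are local mirrors written for this sketch (layout as we understand the arm64 uapi, with EXTRA_MAGIC 0x45585401), at most one EXTRA_CONTEXT is accepted, and the walker trusts datap to point at readable memory; it is not the selftest's implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXTRA_MAGIC 0x45585401u

/* Local mirrors of the arm64 signal frame record headers. */
struct ctx_head {
	uint32_t magic;
	uint32_t size;
};

struct extra_ctx {
	struct ctx_head head;
	uint64_t datap;        /* address of the extra block */
	uint32_t size;         /* size of the extra block in bytes */
	uint32_t reserved[3];
};

/*
 * Count the non-terminator records in @area (@len bytes), continuing
 * into the block described by an EXTRA_CONTEXT record. Returns -1 on a
 * malformed list (bad size, duplicate EXTRA_CONTEXT, no terminator).
 */
int count_sig_records(const uint8_t *area, size_t len)
{
	const uint8_t *extra = NULL;
	size_t extra_len = 0, off = 0;
	int count = 0, seen_extra = 0;

	for (;;) {
		struct ctx_head head;

		if (len - off < sizeof(head))
			return -1;              /* ran off the end */
		memcpy(&head, area + off, sizeof(head));

		if (head.magic == 0 && head.size == 0) {
			if (!extra)
				return count;   /* end of the whole list */
			area = extra;           /* resume in the extra block */
			len = extra_len;
			off = 0;
			extra = NULL;
			continue;
		}
		if (head.size < sizeof(head) || head.size > len - off)
			return -1;
		if (head.magic == EXTRA_MAGIC) {
			struct extra_ctx ec;

			if (seen_extra || head.size < sizeof(ec))
				return -1;
			memcpy(&ec, area + off, sizeof(ec));
			if (!ec.datap)
				return -1;
			extra = (const uint8_t *)(uintptr_t)ec.datap;
			extra_len = ec.size;
			seen_extra = 1;
		}
		count++;
		off += head.size;
	}
}
```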
v2:
- Rebase onto v6.0-rc3
Mark Brown (10):
kselftest/arm64: Enumerate SME rather than SVE vector lengths for
za_regs
kselftest/arm64: Validate signal ucontext in place
kselftest/arm64: Fix validation of termination record after
EXTRA_CONTEXT
kselftest/arm64: Fix validation of EXTRA_CONTEXT signal context
location
kselftest/arm64: Remove unneeded prototype for validate_extra_context()
kselftest/arm64: Only validate each signal context once
kselftest/arm64: Validate contents of EXTRA_CONTEXT blocks
kselftest/arm64: Preserve any EXTRA_CONTEXT in handle_signal_copyctx()
kselftest/arm64: Allow larger buffers in get_signal_context()
kselftest/arm64: Include larger SVE and SME VLs in signal tests
.../arm64/signal/test_signals_utils.c | 59 +++++++++++++++++--
.../arm64/signal/test_signals_utils.h | 5 +-
.../testcases/fake_sigreturn_bad_magic.c | 2 +-
.../testcases/fake_sigreturn_bad_size.c | 2 +-
.../fake_sigreturn_bad_size_for_magic0.c | 2 +-
.../fake_sigreturn_duplicated_fpsimd.c | 2 +-
.../testcases/fake_sigreturn_misaligned_sp.c | 2 +-
.../testcases/fake_sigreturn_missing_fpsimd.c | 2 +-
.../testcases/fake_sigreturn_sme_change_vl.c | 2 +-
.../testcases/fake_sigreturn_sve_change_vl.c | 2 +-
.../selftests/arm64/signal/testcases/sme_vl.c | 2 +-
.../arm64/signal/testcases/ssve_regs.c | 25 +++-----
.../arm64/signal/testcases/sve_regs.c | 23 +++-----
.../selftests/arm64/signal/testcases/sve_vl.c | 2 +-
.../arm64/signal/testcases/testcases.c | 48 +++++++++++----
.../arm64/signal/testcases/testcases.h | 9 ++-
.../arm64/signal/testcases/za_regs.c | 28 ++++-----
17 files changed, 137 insertions(+), 80 deletions(-)
base-commit: b90cb1053190353cc30f0fef0ef1f378ccc063c5
While user namespaces do not make the kernel more vulnerable, they are, however,
used to initiate exploits. Some users do not want to block namespace creation
for the entirety of the system, which some distributions provide. Instead, we
needed a way to have some applications be blocked, and others allowed. This is
not possible with those tools. Managing hierarchies also did not fit our case
because we're determining which tasks are allowed based on their attributes.
While exploring a solution, we first leveraged the LSM cred_prepare hook
because that is the closest hook to prevent a call to create_user_ns().
The calls look something like this:
cred = prepare_creds()
    security_prepare_creds()
        call_int_hook(cred_prepare, ...
if (cred)
    create_user_ns(cred)
We noticed that error codes were not propagated from this hook and
introduced a patch [1] to propagate those errors.
The discussion notes that security_prepare_creds() is not appropriate for
MAC policies, and instead the hook is meant for LSM authors to prepare
credentials for mutation. [2]
Additionally, the cred_prepare hook is not without problems. Handling the
clone3 case is trickier due to the user space pointer passed to it, which
makes checking the syscall subject to a possible TOCTTOU attack.
Ultimately, we concluded that a better course of action is to introduce
a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function
and userns_create LSM hook, then marks the hook as sleepable in BPF. The
following patches include a BPF test and a patch for an SELinux
implementation.
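From user space the new hook is reached through ordinary user namespace creation; the selftests in this series use unshare() rather than clone(). The sketch below probes whether the calling context may create a user namespace by attempting it in a forked child, so the parent's own namespaces are unaffected. It invokes unshare through syscall(2) with SYS_unshare so that no _GNU_SOURCE feature macro is needed. The helper name is hypothetical. A failure such as EPERM may come from an LSM policy attached to userns_create, but it can equally come from other restrictions (for instance the user.max_user_namespaces sysctl or seccomp), so the result alone does not identify the cause.

```c
#include <errno.h>
#include <linux/sched.h>  /* CLONE_NEWUSER */
#include <sys/syscall.h>  /* SYS_unshare */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>       /* fork, syscall, _exit */

/*
 * Try to create a user namespace in a child process.
 * Returns 0 if creation succeeded, a positive errno value if unshare
 * failed (e.g. EPERM), or -1 if the probe itself could not run.
 */
int probe_userns_create(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: report unshare's errno (0 on success) as exit status. */
		_exit(syscall(SYS_unshare, CLONE_NEWUSER) == 0 ? 0 : (errno & 0xff));
	}
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```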
We want to encourage use of user namespaces, and also cater to the needs
of users/administrators to observe and/or control access. There is no
expectation of an impact on user space applications because access control
is opt-in, and users wishing to observe within an LSM context
Links:
1. https://lore.kernel.org/all/20220608150942.776446-1-fred@cloudflare.com/
2. https://lore.kernel.org/all/87y1xzyhub.fsf@email.froward.int.ebiederm.org/
3. https://lore.kernel.org/all/9fe9cd9f-1ded-a179-8ded-5fde8960a586@cloudflare…
Past discussions:
V4: https://lore.kernel.org/all/20220801180146.1157914-1-fred@cloudflare.com/
V3: https://lore.kernel.org/all/20220721172808.585539-1-fred@cloudflare.com/
V2: https://lore.kernel.org/all/20220707223228.1940249-1-fred@cloudflare.com/
V1: https://lore.kernel.org/all/20220621233939.993579-1-fred@cloudflare.com/
Changes since v4:
- Update commit description
- Update cover letter
Changes since v3:
- Explicitly set CAP_SYS_ADMIN to test that the namespace is created when
permission is given
- Simplify BPF test to use sleepable hook only
- Prefer unshare() over clone() for tests
Changes since v2:
- Rename create_user_ns hook to userns_create
- Use user_namespace as an object as opposed to a generic namespace object
- s/domB_t/domA_t in commit message
Changes since v1:
- Add selftests/bpf: Add tests verifying bpf lsm create_user_ns hook patch
- Add selinux: Implement create_user_ns hook patch
- Change function signature of security_create_user_ns() to only take
struct cred
- Move security_create_user_ns() call after id mapping check in
create_user_ns()
- Update documentation to reflect changes
Frederick Lawler (4):
security, lsm: Introduce security_create_user_ns()
bpf-lsm: Make bpf_lsm_userns_create() sleepable
selftests/bpf: Add tests verifying bpf lsm userns_create hook
selinux: Implement userns_create hook
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/security.h | 6 ++
kernel/bpf/bpf_lsm.c | 1 +
kernel/user_namespace.c | 5 +
security/security.c | 5 +
security/selinux/hooks.c | 9 ++
security/selinux/include/classmap.h | 2 +
.../selftests/bpf/prog_tests/deny_namespace.c | 102 ++++++++++++++++++
.../selftests/bpf/progs/test_deny_namespace.c | 33 ++++++
10 files changed, 168 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c
create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c