=== Overview
arm64 has a feature called Top Byte Ignore, which allows pointer tags to be embedded into the top byte of each pointer. Userspace programs (such as HWASan, a memory debugging tool [1]) might use this feature and pass tagged user pointers to the kernel through syscalls or other interfaces.
Right now the kernel is already able to handle user faults with tagged pointers, due to these patches:
1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a tagged pointer")
2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged pointers")
3. 276e9327 ("arm64: entry: improve data abort handling of tagged pointers")
This patchset extends tagged pointer support to syscall arguments.
As per the proposed ABI change [3], tagged pointers are only allowed to be passed to syscalls when they point to memory ranges obtained by anonymous mmap() or sbrk() (see the patchset [3] for more details).
For non-memory syscalls this is done by untagging user pointers when the kernel performs pointer checking to find out whether the pointer comes from userspace (most notably in access_ok). The untagging is done only when the pointer is being checked; the tag is preserved as the pointer makes its way through the kernel, and the pointer stays tagged when the kernel dereferences it while performing user memory accesses.
The mmap and mremap (only new_addr) syscalls do not currently accept tagged addresses. Architectures may interpret the tag as a background colour for the corresponding vma.
Other memory syscalls (mprotect, etc.) don't do user memory accesses but rather deal with memory ranges, and untagged pointers are better suited to describe memory ranges internally. Thus for memory syscalls we untag pointers completely when they enter the kernel.
=== Other approaches
One of the alternative approaches to untagging that was considered is to completely strip the pointer tag as the pointer enters the kernel with some kind of a syscall wrapper, but that won't work with the countless number of different ioctl calls. With this approach we would need a custom wrapper for each ioctl variation, which doesn't seem practical.
An alternative approach to untagging pointers in memory syscall prologues is to instead allow tagged pointers to be passed to find_vma() (and other vma related functions) and untag them there. Unfortunately, a lot of find_vma() callers then compare or subtract the returned vma start and end fields against the pointer that was being searched. Thus this approach would still require changing all find_vma() callers.
=== Testing
The following testing approaches have been taken to find potential issues with user pointer untagging:
1. Static testing (with sparse [2] and separately with a custom static analyzer based on Clang) to track casts of __user pointers to integer types to find places where untagging needs to be done.
2. Static testing with grep to find parts of the kernel that call find_vma() (and other similar functions) or directly compare against vm_start/vm_end fields of vma.
3. Static testing with grep to find parts of the kernel that compare user pointers with TASK_SIZE or other similar consts and macros.
4. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running a modified syzkaller version that passes tagged pointers to the kernel.
Based on the results of the testing, the required patches have been added to the patchset.
=== Notes
This patchset is meant to be merged together with "arm64 relaxed ABI" [3].
This patchset is a prerequisite for ARM's memory tagging hardware feature support [4].
This patchset has been merged into the Pixel 2 & 3 kernel trees and is now being used to enable testing of Pixel phones with HWASan.
Thanks!
[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
[2] https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060e...
[3] https://lkml.org/lkml/2019/6/12/745
[4] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architecture...
=== History
Changes in v19: - Rebased onto 7b5cf701 (5.3-rc1+).
Changes in v18: - Reverted the selftest back to not using the LD_PRELOAD approach. - Added prctl(PR_SET_TAGGED_ADDR_CTRL) call to the selftest. - Reworded the patch descriptions to make them less oriented on arm64 only. - Catalin's patch: "I added a Kconfig option and dropped the prctl args zero check. There is some minor clean-up as well".
Changes in v17: - The "uaccess: add noop untagged_addr definition" patch is dropped, as it was merged into upstream named as "uaccess: add noop untagged_addr definition". - Merged "mm, arm64: untag user pointers in do_pages_move" into "mm, arm64: untag user pointers passed to memory syscalls". - Added "arm64: Introduce prctl() options to control the tagged user addresses ABI" patch from Catalin. - Add tags_lib.so to tools/testing/selftests/arm64/.gitignore. - Added a comment clarifying untagged in mremap. - Moved untagging back into mlx4_get_umem_mr() for the IB patch.
Changes in v16: - Moved untagging for memory syscalls from arm64 wrappers back to generic code. - Dropped untagging for the following memory syscalls: brk, mmap, munmap; mremap (only dropped for new_address); mmap_pgoff (not used on arm64); remap_file_pages (deprecated); shmat, shmdt (work on shared memory). - Changed kselftest to LD_PRELOAD a shared library that overrides malloc to return tagged pointers. - Rebased onto 5.2-rc3.
Changes in v15: - Removed unnecessary untagging from radeon_ttm_tt_set_userptr(). - Removed unnecessary untagging from amdgpu_ttm_tt_set_userptr(). - Moved untagging to validate_range() in userfaultfd code. - Moved untagging to ib_uverbs_(re)reg_mr() from mlx4_get_umem_mr(). - Rebased onto 5.1.
Changes in v14: - Moved untagging for most memory syscalls to an arm64 specific implementation, instead of doing that in the common code. - Dropped "net, arm64: untag user pointers in tcp_zerocopy_receive", since the provided user pointers don't come from an anonymous map and thus are not covered by this ABI relaxation. - Dropped "kernel, arm64: untag user pointers in prctl_set_mm*". - Moved untagging from __check_mem_type() to tee_shm_register(). - Updated untagging for the amdgpu and radeon drivers to cover the MMU notifier, as suggested by Felix. - Since this ABI relaxation doesn't actually allow tagged instruction pointers, dropped the following patches: - Dropped "tracing, arm64: untag user pointers in seq_print_user_ip". - Dropped "uprobes, arm64: untag user pointers in find_active_uprobe". - Dropped "bpf, arm64: untag user pointers in stack_map_get_build_id_offset". - Rebased onto 5.1-rc7 (37624b58).
Changes in v13: - Simplified untagging in tcp_zerocopy_receive(). - Looked at find_vma() callers in drivers/, which identified a few other places where untagging is needed. - Added patch "mm, arm64: untag user pointers in get_vaddr_frames". - Added patch "drm/amdgpu, arm64: untag user pointers in amdgpu_ttm_tt_get_user_pages". - Added patch "drm/radeon, arm64: untag user pointers in radeon_ttm_tt_pin_userptr". - Added patch "IB/mlx4, arm64: untag user pointers in mlx4_get_umem_mr". - Added patch "media/v4l2-core, arm64: untag user pointers in videobuf_dma_contig_user_get". - Added patch "tee/optee, arm64: untag user pointers in check_mem_type". - Added patch "vfio/type1, arm64: untag user pointers".
Changes in v12: - Changed untagging in tcp_zerocopy_receive() to also untag zc->address. - Fixed untagging in prctl_set_mm* to only untag pointers for vma lookups and validity checks, but leave them as is for actual user space accesses. - Updated the link to the v2 of the "arm64 relaxed ABI" patchset [3]. - Dropped the documentation patch, as the "arm64 relaxed ABI" patchset [3] handles that.
Changes in v11: - Added "uprobes, arm64: untag user pointers in find_active_uprobe" patch. - Added "bpf, arm64: untag user pointers in stack_map_get_build_id_offset" patch. - Fixed "tracing, arm64: untag user pointers in seq_print_user_ip" to correctly perform subtraction with a tagged addr. - Moved untagged_addr() from SYSCALL_DEFINE3(mprotect) and SYSCALL_DEFINE4(pkey_mprotect) to do_mprotect_pkey(). - Moved untagged_addr() definition for other arches from include/linux/memory.h to include/linux/mm.h. - Changed untagging in strn*_user() to perform userspace accesses through tagged pointers. - Updated the documentation to mention that passing tagged pointers to memory syscalls is allowed. - Updated the test to use malloc'ed memory instead of stack memory.
Changes in v10: - Added "mm, arm64: untag user pointers passed to memory syscalls" back. - New patch "fs, arm64: untag user pointers in fs/userfaultfd.c". - New patch "net, arm64: untag user pointers in tcp_zerocopy_receive". - New patch "kernel, arm64: untag user pointers in prctl_set_mm*". - New patch "tracing, arm64: untag user pointers in seq_print_user_ip".
Changes in v9: - Rebased onto 4.20-rc6. - Used u64 instead of __u64 in type casts in the untagged_addr macro for arm64. - Added braces around (addr) in the untagged_addr macro for other arches.
Changes in v8: - Rebased onto 65102238 (4.20-rc1). - Added a note to the cover letter on why syscall wrappers/shims that untag user pointers won't work. - Added a note to the cover letter that this patchset has been merged into the Pixel 2 kernel tree. - Documentation fixes, in particular added a list of syscalls that don't support tagged user pointers.
Changes in v7: - Rebased onto 17b57b18 (4.19-rc6). - Dropped the "arm64: untag user address in __do_user_fault" patch, since the existing patches already handle user faults properly. - Dropped the "usb, arm64: untag user addresses in devio" patch, since the passed pointer must come from a vma and therefore be untagged. - Dropped the "arm64: annotate user pointers casts detected by sparse" patch (see the discussion to the replies of the v6 of this patchset). - Added more context to the cover letter. - Updated Documentation/arm64/tagged-pointers.txt.
Changes in v6: - Added annotations for user pointer casts found by sparse. - Rebased onto 050cdc6c (4.19-rc1+).
Changes in v5: - Added 3 new patches that add untagging to places found with static analysis. - Rebased onto 44c929e1 (4.18-rc8).
Changes in v4: - Added a selftest for checking that passing tagged pointers to the kernel succeeds. - Rebased onto 81e97f013 (4.18-rc1+).
Changes in v3: - Rebased onto e5c51f30 (4.17-rc6+). - Added linux-arch@ to the list of recipients.
Changes in v2: - Rebased onto 2d618bdf (4.17-rc3+). - Removed excessive untagging in gup.c. - Removed untagging pointers returned from __uaccess_mask_ptr.
Changes in v1: - Rebased onto 4.17-rc1.
Changes in RFC v2: - Added "#ifndef untagged_addr..." fallback in linux/uaccess.h instead of defining it for each arch individually. - Updated Documentation/arm64/tagged-pointers.txt. - Dropped "mm, arm64: untag user addresses in memory syscalls". - Rebased onto 3eb2ce82 (4.16-rc7).
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Andrey Konovalov (14):
  arm64: untag user pointers in access_ok and __uaccess_mask_ptr
  lib: untag user pointers in strn*_user
  mm: untag user pointers passed to memory syscalls
  mm: untag user pointers in mm/gup.c
  mm: untag user pointers in get_vaddr_frames
  fs/namespace: untag user pointers in copy_mount_options
  userfaultfd: untag user pointers
  drm/amdgpu: untag user pointers
  drm/radeon: untag user pointers in radeon_gem_userptr_ioctl
  IB/mlx4: untag user pointers in mlx4_get_umem_mr
  media/v4l2-core: untag user pointers in videobuf_dma_contig_user_get
  tee/shm: untag user pointers in tee_shm_register
  vfio/type1: untag user pointers in vaddr_get_pfn
  selftests, arm64: add a selftest for passing tagged pointers to kernel
Catalin Marinas (1): arm64: Introduce prctl() options to control the tagged user addresses ABI
arch/arm64/Kconfig | 9 +++ arch/arm64/include/asm/processor.h | 8 ++ arch/arm64/include/asm/thread_info.h | 1 + arch/arm64/include/asm/uaccess.h | 12 ++- arch/arm64/kernel/process.c | 73 +++++++++++++++++++ .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 + drivers/gpu/drm/radeon/radeon_gem.c | 2 + drivers/infiniband/hw/mlx4/mr.c | 7 +- drivers/media/v4l2-core/videobuf-dma-contig.c | 9 ++- drivers/tee/tee_shm.c | 1 + drivers/vfio/vfio_iommu_type1.c | 2 + fs/namespace.c | 2 +- fs/userfaultfd.c | 22 +++--- include/uapi/linux/prctl.h | 5 ++ kernel/sys.c | 12 +++ lib/strncpy_from_user.c | 3 +- lib/strnlen_user.c | 3 +- mm/frame_vector.c | 2 + mm/gup.c | 4 + mm/madvise.c | 2 + mm/mempolicy.c | 3 + mm/migrate.c | 2 +- mm/mincore.c | 2 + mm/mlock.c | 4 + mm/mprotect.c | 2 + mm/mremap.c | 7 ++ mm/msync.c | 2 + tools/testing/selftests/arm64/.gitignore | 1 + tools/testing/selftests/arm64/Makefile | 11 +++ .../testing/selftests/arm64/run_tags_test.sh | 12 +++ tools/testing/selftests/arm64/tags_test.c | 29 ++++++++ 32 files changed, 233 insertions(+), 25 deletions(-) create mode 100644 tools/testing/selftests/arm64/.gitignore create mode 100644 tools/testing/selftests/arm64/Makefile create mode 100755 tools/testing/selftests/arm64/run_tags_test.sh create mode 100644 tools/testing/selftests/arm64/tags_test.c
This patch is part of a series that extends the kernel ABI to allow tagged user pointers (with the top byte set to something other than 0x00) to be passed as syscall arguments.
copy_from_user (and a few other similar functions) are used to copy data from user memory into the kernel memory or vice versa. Since a user can provide a tagged pointer to one of the syscalls that use copy_from_user, we need to correctly handle such pointers.
Do this by untagging user pointers in access_ok and in __uaccess_mask_ptr, before performing access validity checks.
Note that this patch only temporarily untags the pointers to perform the checks, but then passes them as is into the kernel internals.
Reviewed-by: Vincenzo Frascino vincenzo.frascino@arm.com Reviewed-by: Kees Cook keescook@chromium.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Andrey Konovalov andreyknvl@google.com --- arch/arm64/include/asm/uaccess.h | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h index 5a1c32260c1f..a138e3b4f717 100644 --- a/arch/arm64/include/asm/uaccess.h +++ b/arch/arm64/include/asm/uaccess.h @@ -62,6 +62,8 @@ static inline unsigned long __range_ok(const void __user *addr, unsigned long si { unsigned long ret, limit = current_thread_info()->addr_limit;
+ addr = untagged_addr(addr); + __chk_user_ptr(addr); asm volatile( // A + B <= C + 1 for all A,B,C, in four easy steps: @@ -215,7 +217,8 @@ static inline void uaccess_enable_not_uao(void)
/* * Sanitise a uaccess pointer such that it becomes NULL if above the - * current addr_limit. + * current addr_limit. In case the pointer is tagged (has the top byte set), + * untag the pointer before checking. */ #define uaccess_mask_ptr(ptr) (__typeof__(ptr))__uaccess_mask_ptr(ptr) static inline void __user *__uaccess_mask_ptr(const void __user *ptr) @@ -223,10 +226,11 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr) void __user *safe_ptr;
asm volatile( - " bics xzr, %1, %2\n" + " bics xzr, %3, %2\n" " csel %0, %1, xzr, eq\n" : "=&r" (safe_ptr) - : "r" (ptr), "r" (current_thread_info()->addr_limit) + : "r" (ptr), "r" (current_thread_info()->addr_limit), + "r" (untagged_addr(ptr)) : "cc");
csdb();
From: Catalin Marinas catalin.marinas@arm.com
It is not desirable to relax the ABI to allow tagged user addresses into the kernel indiscriminately. This patch introduces a prctl() interface for enabling or disabling the tagged ABI with a global sysctl control for preventing applications from enabling the relaxed ABI (meant for testing user-space prctl() return error checking without reconfiguring the kernel). The ABI properties are inherited by threads of the same application and fork()'ed children but cleared on execve(). A Kconfig option allows the overall disabling of the relaxed ABI.
The PR_SET_TAGGED_ADDR_CTRL will be expanded in the future to handle MTE-specific settings like imprecise vs precise exceptions.
Reviewed-by: Kees Cook keescook@chromium.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Andrey Konovalov andreyknvl@google.com --- arch/arm64/Kconfig | 9 ++++ arch/arm64/include/asm/processor.h | 8 +++ arch/arm64/include/asm/thread_info.h | 1 + arch/arm64/include/asm/uaccess.h | 4 +- arch/arm64/kernel/process.c | 73 ++++++++++++++++++++++++++++ include/uapi/linux/prctl.h | 5 ++ kernel/sys.c | 12 +++++ 7 files changed, 111 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 3adcec05b1f6..5d254178b9ca 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1110,6 +1110,15 @@ config ARM64_SW_TTBR0_PAN zeroed area and reserved ASID. The user access routines restore the valid TTBR0_EL1 temporarily.
+config ARM64_TAGGED_ADDR_ABI + bool "Enable the tagged user addresses syscall ABI" + default y + help + When this option is enabled, user applications can opt in to a + relaxed ABI via prctl() allowing tagged addresses to be passed + to system calls as pointer arguments. For details, see + Documentation/arm64/tagged-address-abi.txt. + menuconfig COMPAT bool "Kernel support for 32-bit EL0" depends on ARM64_4K_PAGES || EXPERT diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index fd5b1a4efc70..ee86070a28d4 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -296,6 +296,14 @@ extern void __init minsigstksz_setup(void); /* PR_PAC_RESET_KEYS prctl */ #define PAC_RESET_KEYS(tsk, arg) ptrauth_prctl_reset_keys(tsk, arg)
+#ifdef CONFIG_ARM64_TAGGED_ADDR_ABI +/* PR_{SET,GET}_TAGGED_ADDR_CTRL prctl */ +long set_tagged_addr_ctrl(unsigned long arg); +long get_tagged_addr_ctrl(void); +#define SET_TAGGED_ADDR_CTRL(arg) set_tagged_addr_ctrl(arg) +#define GET_TAGGED_ADDR_CTRL() get_tagged_addr_ctrl() +#endif + /* * For CONFIG_GCC_PLUGIN_STACKLEAK * diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h index 180b34ec5965..012238d8e58d 100644 --- a/arch/arm64/include/asm/thread_info.h +++ b/arch/arm64/include/asm/thread_info.h @@ -90,6 +90,7 @@ void arch_release_task_struct(struct task_struct *tsk); #define TIF_SVE 23 /* Scalable Vector Extension in use */ #define TIF_SVE_VL_INHERIT 24 /* Inherit sve_vl_onexec across exec */ #define TIF_SSBD 25 /* Wants SSB mitigation */ +#define TIF_TAGGED_ADDR 26 /* Allow tagged user addresses */
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING) #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h index a138e3b4f717..097d6bfac0b7 100644 --- a/arch/arm64/include/asm/uaccess.h +++ b/arch/arm64/include/asm/uaccess.h @@ -62,7 +62,9 @@ static inline unsigned long __range_ok(const void __user *addr, unsigned long si { unsigned long ret, limit = current_thread_info()->addr_limit;
- addr = untagged_addr(addr); + if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) && + test_thread_flag(TIF_TAGGED_ADDR)) + addr = untagged_addr(addr);
__chk_user_ptr(addr); asm volatile( diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index 6a869d9f304f..ef06a303bda0 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -19,6 +19,7 @@ #include <linux/kernel.h> #include <linux/mm.h> #include <linux/stddef.h> +#include <linux/sysctl.h> #include <linux/unistd.h> #include <linux/user.h> #include <linux/delay.h> @@ -38,6 +39,7 @@ #include <trace/events/power.h> #include <linux/percpu.h> #include <linux/thread_info.h> +#include <linux/prctl.h>
#include <asm/alternative.h> #include <asm/arch_gicv3.h> @@ -307,11 +309,18 @@ static void tls_thread_flush(void) } }
+static void flush_tagged_addr_state(void) +{ + if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI)) + clear_thread_flag(TIF_TAGGED_ADDR); +} + void flush_thread(void) { fpsimd_flush_thread(); tls_thread_flush(); flush_ptrace_hw_breakpoint(current); + flush_tagged_addr_state(); }
void release_thread(struct task_struct *dead_task) @@ -541,3 +550,67 @@ void arch_setup_new_exec(void)
ptrauth_thread_init_user(current); } + +#ifdef CONFIG_ARM64_TAGGED_ADDR_ABI +/* + * Control the relaxed ABI allowing tagged user addresses into the kernel. + */ +static unsigned int tagged_addr_prctl_allowed = 1; + +long set_tagged_addr_ctrl(unsigned long arg) +{ + if (!tagged_addr_prctl_allowed) + return -EINVAL; + if (is_compat_task()) + return -EINVAL; + if (arg & ~PR_TAGGED_ADDR_ENABLE) + return -EINVAL; + + update_thread_flag(TIF_TAGGED_ADDR, arg & PR_TAGGED_ADDR_ENABLE); + + return 0; +} + +long get_tagged_addr_ctrl(void) +{ + if (!tagged_addr_prctl_allowed) + return -EINVAL; + if (is_compat_task()) + return -EINVAL; + + if (test_thread_flag(TIF_TAGGED_ADDR)) + return PR_TAGGED_ADDR_ENABLE; + + return 0; +} + +/* + * Global sysctl to disable the tagged user addresses support. This control + * only prevents the tagged address ABI enabling via prctl() and does not + * disable it for tasks that already opted in to the relaxed ABI. + */ +static int zero; +static int one = 1; + +static struct ctl_table tagged_addr_sysctl_table[] = { + { + .procname = "tagged_addr", + .mode = 0644, + .data = &tagged_addr_prctl_allowed, + .maxlen = sizeof(int), + .proc_handler = proc_dointvec_minmax, + .extra1 = &zero, + .extra2 = &one, + }, + { } +}; + +static int __init tagged_addr_init(void) +{ + if (!register_sysctl("abi", tagged_addr_sysctl_table)) + return -EINVAL; + return 0; +} + +core_initcall(tagged_addr_init); +#endif /* CONFIG_ARM64_TAGGED_ADDR_ABI */ diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index 094bb03b9cc2..2e927b3e9d6c 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -229,4 +229,9 @@ struct prctl_mm_map { # define PR_PAC_APDBKEY (1UL << 3) # define PR_PAC_APGAKEY (1UL << 4)
+/* Tagged user address controls for arm64 */ +#define PR_SET_TAGGED_ADDR_CTRL 55 +#define PR_GET_TAGGED_ADDR_CTRL 56 +# define PR_TAGGED_ADDR_ENABLE (1UL << 0) + #endif /* _LINUX_PRCTL_H */ diff --git a/kernel/sys.c b/kernel/sys.c index 2969304c29fe..c6c4d5358bd3 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -124,6 +124,12 @@ #ifndef PAC_RESET_KEYS # define PAC_RESET_KEYS(a, b) (-EINVAL) #endif +#ifndef SET_TAGGED_ADDR_CTRL +# define SET_TAGGED_ADDR_CTRL(a) (-EINVAL) +#endif +#ifndef GET_TAGGED_ADDR_CTRL +# define GET_TAGGED_ADDR_CTRL() (-EINVAL) +#endif
/* * this is where the system-wide overflow UID and GID are defined, for @@ -2492,6 +2498,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, return -EINVAL; error = PAC_RESET_KEYS(me, arg2); break; + case PR_SET_TAGGED_ADDR_CTRL: + error = SET_TAGGED_ADDR_CTRL(arg2); + break; + case PR_GET_TAGGED_ADDR_CTRL: + error = GET_TAGGED_ADDR_CTRL(); + break; default: error = -EINVAL; break;
This patch is part of a series that extends the kernel ABI to allow tagged user pointers (with the top byte set to something other than 0x00) to be passed as syscall arguments.
strncpy_from_user and strnlen_user accept user addresses as arguments, and do not go through the same path as copy_from_user and others, so here we need to handle the case of tagged user addresses separately.
Untag user pointers passed to these functions.
Note that this patch only temporarily untags the pointers to perform validity checks, but then uses them as is to perform user memory accesses.
Reviewed-by: Vincenzo Frascino vincenzo.frascino@arm.com Reviewed-by: Khalid Aziz khalid.aziz@oracle.com Acked-by: Kees Cook keescook@chromium.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Andrey Konovalov andreyknvl@google.com --- lib/strncpy_from_user.c | 3 ++- lib/strnlen_user.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c index 023ba9f3b99f..dccb95af6003 100644 --- a/lib/strncpy_from_user.c +++ b/lib/strncpy_from_user.c @@ -6,6 +6,7 @@ #include <linux/uaccess.h> #include <linux/kernel.h> #include <linux/errno.h> +#include <linux/mm.h>
#include <asm/byteorder.h> #include <asm/word-at-a-time.h> @@ -108,7 +109,7 @@ long strncpy_from_user(char *dst, const char __user *src, long count) return 0;
max_addr = user_addr_max(); - src_addr = (unsigned long)src; + src_addr = (unsigned long)untagged_addr(src); if (likely(src_addr < max_addr)) { unsigned long max = max_addr - src_addr; long retval; diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c index 7f2db3fe311f..28ff554a1be8 100644 --- a/lib/strnlen_user.c +++ b/lib/strnlen_user.c @@ -2,6 +2,7 @@ #include <linux/kernel.h> #include <linux/export.h> #include <linux/uaccess.h> +#include <linux/mm.h>
#include <asm/word-at-a-time.h>
@@ -109,7 +110,7 @@ long strnlen_user(const char __user *str, long count) return 0;
max_addr = user_addr_max(); - src_addr = (unsigned long)str; + src_addr = (unsigned long)untagged_addr(str); if (likely(src_addr < max_addr)) { unsigned long max = max_addr - src_addr; long retval;
This patch is part of a series that extends the kernel ABI to allow tagged user pointers (with the top byte set to something other than 0x00) to be passed as syscall arguments.
This patch allows tagged pointers to be passed to the following memory syscalls: get_mempolicy, madvise, mbind, mincore, mlock, mlock2, mprotect, mremap, msync, munlock, move_pages.
The mmap and mremap (only new_addr) syscalls do not currently accept tagged addresses. Architectures may interpret the tag as a background colour for the corresponding vma.
Reviewed-by: Khalid Aziz khalid.aziz@oracle.com Reviewed-by: Vincenzo Frascino vincenzo.frascino@arm.com Reviewed-by: Catalin Marinas catalin.marinas@arm.com Reviewed-by: Kees Cook keescook@chromium.org Signed-off-by: Andrey Konovalov andreyknvl@google.com --- mm/madvise.c | 2 ++ mm/mempolicy.c | 3 +++ mm/migrate.c | 2 +- mm/mincore.c | 2 ++ mm/mlock.c | 4 ++++ mm/mprotect.c | 2 ++ mm/mremap.c | 7 +++++++ mm/msync.c | 2 ++ 8 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/mm/madvise.c b/mm/madvise.c index 968df3aa069f..4b102a61bfbf 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -810,6 +810,8 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior) size_t len; struct blk_plug plug;
+ start = untagged_addr(start); + if (!madvise_behavior_valid(behavior)) return error;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c index f48693f75b37..74c134d0c84e 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1360,6 +1360,7 @@ static long kernel_mbind(unsigned long start, unsigned long len, int err; unsigned short mode_flags;
+ start = untagged_addr(start); mode_flags = mode & MPOL_MODE_FLAGS; mode &= ~MPOL_MODE_FLAGS; if (mode >= MPOL_MAX) @@ -1517,6 +1518,8 @@ static int kernel_get_mempolicy(int __user *policy, int uninitialized_var(pval); nodemask_t nodes;
+ addr = untagged_addr(addr); + if (nmask != NULL && maxnode < nr_node_ids) return -EINVAL;
diff --git a/mm/migrate.c b/mm/migrate.c index 8992741f10aa..8f59c24cb141 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1609,7 +1609,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes, goto out_flush; if (get_user(node, nodes + i)) goto out_flush; - addr = (unsigned long)p; + addr = (unsigned long)untagged_addr(p);
err = -ENODEV; if (node < 0 || node >= MAX_NUMNODES) diff --git a/mm/mincore.c b/mm/mincore.c index 4fe91d497436..3cfa120ba0eb 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -257,6 +257,8 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len, unsigned long pages; unsigned char *tmp;
+ start = untagged_addr(start); + /* Check the start address: needs to be page-aligned.. */ if (start & ~PAGE_MASK) return -EINVAL; diff --git a/mm/mlock.c b/mm/mlock.c index a90099da4fb4..a72c1eeded77 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -674,6 +674,8 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla unsigned long lock_limit; int error = -ENOMEM;
+ start = untagged_addr(start); + if (!can_do_mlock()) return -EPERM;
@@ -735,6 +737,8 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len) { int ret;
+ start = untagged_addr(start); + len = PAGE_ALIGN(len + (offset_in_page(start))); start &= PAGE_MASK;
diff --git a/mm/mprotect.c b/mm/mprotect.c index bf38dfbbb4b4..19f981b733bc 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -465,6 +465,8 @@ static int do_mprotect_pkey(unsigned long start, size_t len, const bool rier = (current->personality & READ_IMPLIES_EXEC) && (prot & PROT_READ);
+ start = untagged_addr(start); + prot &= ~(PROT_GROWSDOWN|PROT_GROWSUP); if (grows == (PROT_GROWSDOWN|PROT_GROWSUP)) /* can't be both */ return -EINVAL; diff --git a/mm/mremap.c b/mm/mremap.c index fc241d23cd97..64c9a3b8be0a 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -606,6 +606,13 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len, LIST_HEAD(uf_unmap_early); LIST_HEAD(uf_unmap);
+ /* + * Architectures may interpret the tag passed to mmap as a background + * colour for the corresponding vma. For mremap we don't allow tagged + * new_addr to preserve similar behaviour to mmap. + */ + addr = untagged_addr(addr); + if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE)) return ret;
diff --git a/mm/msync.c b/mm/msync.c index ef30a429623a..c3bd3e75f687 100644 --- a/mm/msync.c +++ b/mm/msync.c @@ -37,6 +37,8 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags) int unmapped_error = 0; int error = -EINVAL;
+ start = untagged_addr(start); + if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC)) goto out; if (offset_in_page(start))
This patch is part of a series that extends the kernel ABI to allow tagged user pointers (with the top byte set to something other than 0x00) to be passed as syscall arguments.
mm/gup.c provides a kernel interface that accepts user addresses and manipulates user pages directly (for example get_user_pages, which is used by the futex syscall). Since a user can provide tagged addresses, we need to handle this case.
Add untagging to gup.c functions that use user addresses for vma lookups.
Reviewed-by: Khalid Aziz khalid.aziz@oracle.com Reviewed-by: Vincenzo Frascino vincenzo.frascino@arm.com Reviewed-by: Kees Cook keescook@chromium.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Andrey Konovalov andreyknvl@google.com --- mm/gup.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/mm/gup.c b/mm/gup.c index 98f13ab37bac..1c1c97ec63df 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -799,6 +799,8 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, if (!nr_pages) return 0;
+ start = untagged_addr(start); + VM_BUG_ON(!!pages != !!(gup_flags & FOLL_GET));
/* @@ -961,6 +963,8 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm, struct vm_area_struct *vma; vm_fault_t ret, major = 0;
+ address = untagged_addr(address); + if (unlocked) fault_flags |= FAULT_FLAG_ALLOW_RETRY;
This patch is part of a series that extends the kernel ABI to allow tagged user pointers (with the top byte set to something other than 0x00) to be passed as syscall arguments.
get_vaddr_frames uses provided user pointers for vma lookups, which can only be done with untagged pointers. Instead of locating and changing all callers of this function, perform untagging in it.
Reviewed-by: Khalid Aziz khalid.aziz@oracle.com
Reviewed-by: Vincenzo Frascino vincenzo.frascino@arm.com
Acked-by: Catalin Marinas catalin.marinas@arm.com
Reviewed-by: Kees Cook keescook@chromium.org
Signed-off-by: Andrey Konovalov andreyknvl@google.com
---
 mm/frame_vector.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index c64dca6e27c2..c431ca81dad5 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -46,6 +46,8 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 	if (WARN_ON_ONCE(nr_frames > vec->nr_allocated))
 		nr_frames = vec->nr_allocated;
 
+	start = untagged_addr(start);
+
 	down_read(&mm->mmap_sem);
 	locked = 1;
 	vma = find_vma_intersection(mm, start, start + 1);
This patch is part of a series that extends the kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
In copy_mount_options() a user address is subtracted from TASK_SIZE. If the address is lower than TASK_SIZE, the size is calculated so that the exact_copy_from_user() call does not cross the TASK_SIZE boundary. However, if the address is tagged, the size is calculated incorrectly.
Untag the address before subtracting.
Reviewed-by: Khalid Aziz khalid.aziz@oracle.com
Reviewed-by: Vincenzo Frascino vincenzo.frascino@arm.com
Reviewed-by: Kees Cook keescook@chromium.org
Reviewed-by: Catalin Marinas catalin.marinas@arm.com
Signed-off-by: Andrey Konovalov andreyknvl@google.com
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 6464ea4acba9..b32eb26af8bf 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2994,7 +2994,7 @@ void *copy_mount_options(const void __user * data)
 	 * the remainder of the page.
 	 */
 	/* copy_from_user cannot cross TASK_SIZE ! */
-	size = TASK_SIZE - (unsigned long)data;
+	size = TASK_SIZE - (unsigned long)untagged_addr(data);
 	if (size > PAGE_SIZE)
 		size = PAGE_SIZE;
 
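The incorrect size calculation can be worked through in userspace. The sketch below is only an illustration: DEMO_TASK_SIZE and DEMO_PAGE_SIZE are hypothetical values (the real ones depend on the architecture and configuration), and untag() is an assumed helper that clears the top byte by sign-extending from bit 55.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values for illustration; real ones are per-architecture. */
#define DEMO_TASK_SIZE (1ULL << 48)
#define DEMO_PAGE_SIZE 4096ULL

/* Illustrative untagging: sign-extend from bit 55 (assumes arithmetic shift). */
static uint64_t untag(uint64_t addr)
{
	return (uint64_t)((int64_t)(addr << 8) >> 8);
}

/* Mirrors the size clamp: bytes left below the limit, capped at one page. */
static uint64_t copy_size(uint64_t data)
{
	uint64_t size = DEMO_TASK_SIZE - data;

	if (size > DEMO_PAGE_SIZE)
		size = DEMO_PAGE_SIZE;
	return size;
}

static void copy_size_selfcheck(void)
{
	uint64_t data = DEMO_TASK_SIZE - 0x100; /* 256 bytes below the limit */
	uint64_t tagged = data | (0x42ULL << 56);

	assert(copy_size(data) == 0x100);
	/* With a tag the subtraction wraps around, so the clamp picks a full page. */
	assert(copy_size(tagged) == DEMO_PAGE_SIZE);
	assert(copy_size(untag(tagged)) == 0x100);
}
```

With the tagged address the unsigned subtraction wraps to a huge value, so the clamp selects a whole page even though only 256 bytes remain below the limit; untagging first restores the intended result.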
This patch is part of a series that extends the kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
The userfaultfd code uses provided user pointers for vma lookups, which can only be done with untagged pointers.
Untag user pointers in validate_range().
Reviewed-by: Mike Rapoport rppt@linux.ibm.com
Reviewed-by: Vincenzo Frascino vincenzo.frascino@arm.com
Reviewed-by: Catalin Marinas catalin.marinas@arm.com
Reviewed-by: Kees Cook keescook@chromium.org
Signed-off-by: Andrey Konovalov andreyknvl@google.com
---
 fs/userfaultfd.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index ccbdbd62f0d8..6284a4e719cb 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1271,21 +1271,23 @@ static __always_inline void wake_userfault(struct userfaultfd_ctx *ctx,
 }
 
 static __always_inline int validate_range(struct mm_struct *mm,
-					  __u64 start, __u64 len)
+					  __u64 *start, __u64 len)
 {
 	__u64 task_size = mm->task_size;
 
-	if (start & ~PAGE_MASK)
+	*start = untagged_addr(*start);
+
+	if (*start & ~PAGE_MASK)
 		return -EINVAL;
 	if (len & ~PAGE_MASK)
 		return -EINVAL;
 	if (!len)
 		return -EINVAL;
-	if (start < mmap_min_addr)
+	if (*start < mmap_min_addr)
 		return -EINVAL;
-	if (start >= task_size)
+	if (*start >= task_size)
 		return -EINVAL;
-	if (len > task_size - start)
+	if (len > task_size - *start)
 		return -EINVAL;
 	return 0;
 }
@@ -1335,7 +1337,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		goto out;
 	}
 
-	ret = validate_range(mm, uffdio_register.range.start,
+	ret = validate_range(mm, &uffdio_register.range.start,
 			     uffdio_register.range.len);
 	if (ret)
 		goto out;
@@ -1524,7 +1526,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 	if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister)))
 		goto out;
 
-	ret = validate_range(mm, uffdio_unregister.start,
+	ret = validate_range(mm, &uffdio_unregister.start,
 			     uffdio_unregister.len);
 	if (ret)
 		goto out;
@@ -1675,7 +1677,7 @@ static int userfaultfd_wake(struct userfaultfd_ctx *ctx,
 	if (copy_from_user(&uffdio_wake, buf, sizeof(uffdio_wake)))
 		goto out;
 
-	ret = validate_range(ctx->mm, uffdio_wake.start, uffdio_wake.len);
+	ret = validate_range(ctx->mm, &uffdio_wake.start, uffdio_wake.len);
 	if (ret)
 		goto out;
 
@@ -1715,7 +1717,7 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
 			   sizeof(uffdio_copy)-sizeof(__s64)))
 		goto out;
 
-	ret = validate_range(ctx->mm, uffdio_copy.dst, uffdio_copy.len);
+	ret = validate_range(ctx->mm, &uffdio_copy.dst, uffdio_copy.len);
 	if (ret)
 		goto out;
 	/*
@@ -1771,7 +1773,7 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
 			   sizeof(uffdio_zeropage)-sizeof(__s64)))
 		goto out;
 
-	ret = validate_range(ctx->mm, uffdio_zeropage.range.start,
+	ret = validate_range(ctx->mm, &uffdio_zeropage.range.start,
 			     uffdio_zeropage.range.len);
 	if (ret)
 		goto out;
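The reason for passing start by pointer is that the caller keeps using the address after validation, so it should see the untagged value as well. A rough userspace sketch of that pattern follows; validate_range_demo(), DEMO_PAGE_SIZE and untag() are hypothetical names, and the mmap_min_addr check from the real function is left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL /* hypothetical page size */

/* Illustrative untagging: sign-extend from bit 55 (assumes arithmetic shift). */
static uint64_t untag(uint64_t addr)
{
	return (uint64_t)((int64_t)(addr << 8) >> 8);
}

/* Untags *start in place, then runs the range checks on the untagged value. */
static int validate_range_demo(uint64_t task_size, uint64_t *start, uint64_t len)
{
	*start = untag(*start);

	if (*start & (DEMO_PAGE_SIZE - 1))
		return -EINVAL;
	if (len & (DEMO_PAGE_SIZE - 1))
		return -EINVAL;
	if (!len)
		return -EINVAL;
	if (*start >= task_size)
		return -EINVAL;
	if (len > task_size - *start)
		return -EINVAL;
	return 0;
}

static void validate_range_selfcheck(void)
{
	uint64_t start = 0x0000ffff80000000ULL | (0x42ULL << 56);

	assert(validate_range_demo(1ULL << 48, &start, DEMO_PAGE_SIZE) == 0);
	/* The caller's copy is now untagged and safe to use for lookups. */
	assert(start == 0x0000ffff80000000ULL);
}
```

Untagging in one place like this avoids repeating the call in every ioctl handler, at the cost of the validation helper modifying its argument.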
This patch is part of a series that extends the kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
In amdgpu_gem_userptr_ioctl() and amdgpu_amdkfd_gpuvm.c/init_user_pages() an MMU notifier is set up with a (tagged) userspace pointer. The untagged address should be used so that MMU notifiers for the untagged address get correctly matched up with the right BO. This patch untags user pointers in amdgpu_gem_userptr_ioctl() for the GEM case and in amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu() for the KFD case. This also makes sure that an untagged pointer is passed to amdgpu_ttm_tt_get_user_pages(), which uses it for vma lookups.
Reviewed-by: Kees Cook keescook@chromium.org
Suggested-by: Felix Kuehling Felix.Kuehling@amd.com
Acked-by: Felix Kuehling Felix.Kuehling@amd.com
Signed-off-by: Andrey Konovalov andreyknvl@google.com
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 1d3ee9c42f7e..00468ebf8b76 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1103,7 +1103,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 		alloc_flags = 0;
 		if (!offset || !*offset)
 			return -EINVAL;
-		user_addr = *offset;
+		user_addr = untagged_addr(*offset);
 	} else if (flags & (ALLOC_MEM_FLAGS_DOORBELL |
 			ALLOC_MEM_FLAGS_MMIO_REMAP)) {
 		domain = AMDGPU_GEM_DOMAIN_GTT;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 939f8305511b..d7855842fd51 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -291,6 +291,8 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void *data,
 	uint32_t handle;
 	int r;
 
+	args->addr = untagged_addr(args->addr);
+
 	if (offset_in_page(args->addr | args->size))
 		return -EINVAL;
 
This patch is part of a series that extends the kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
In radeon_gem_userptr_ioctl() an MMU notifier is set up with a (tagged) userspace pointer. The untagged address should be used so that MMU notifiers for the untagged address get correctly matched up with the right BO. This function also calls radeon_ttm_tt_pin_userptr(), which uses provided user pointers for vma lookups, which can only be done with untagged pointers.
This patch untags user pointers in radeon_gem_userptr_ioctl().
Reviewed-by: Khalid Aziz khalid.aziz@oracle.com
Reviewed-by: Kees Cook keescook@chromium.org
Suggested-by: Felix Kuehling Felix.Kuehling@amd.com
Acked-by: Felix Kuehling Felix.Kuehling@amd.com
Signed-off-by: Andrey Konovalov andreyknvl@google.com
---
 drivers/gpu/drm/radeon/radeon_gem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index d8bc5d2dfd61..89353098b627 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -296,6 +296,8 @@ int radeon_gem_userptr_ioctl(struct drm_device *dev, void *data,
 	uint32_t handle;
 	int r;
 
+	args->addr = untagged_addr(args->addr);
+
 	if (offset_in_page(args->addr | args->size))
 		return -EINVAL;
 
This patch is part of a series that extends the kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
mlx4_get_umem_mr() uses provided user pointers for vma lookups, which can only be done with untagged pointers.
Untag user pointers in this function.
Reviewed-by: Jason Gunthorpe jgg@mellanox.com
Acked-by: Catalin Marinas catalin.marinas@arm.com
Reviewed-by: Kees Cook keescook@chromium.org
Signed-off-by: Andrey Konovalov andreyknvl@google.com
---
 drivers/infiniband/hw/mlx4/mr.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 753479285ce9..6ae503cfc526 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -377,6 +377,7 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata *udata, u64 start,
 	 * again
 	 */
 	if (!ib_access_writable(access_flags)) {
+		unsigned long untagged_start = untagged_addr(start);
 		struct vm_area_struct *vma;
 
 		down_read(&current->mm->mmap_sem);
@@ -385,9 +386,9 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata *udata, u64 start,
 		 * cover the memory, but for now it requires a single vma to
 		 * entirely cover the MR to support RO mappings.
 		 */
-		vma = find_vma(current->mm, start);
-		if (vma && vma->vm_end >= start + length &&
-		    vma->vm_start <= start) {
+		vma = find_vma(current->mm, untagged_start);
+		if (vma && vma->vm_end >= untagged_start + length &&
+		    vma->vm_start <= untagged_start) {
 			if (vma->vm_flags & VM_WRITE)
 				access_flags |= IB_ACCESS_LOCAL_WRITE;
 		} else {
This patch is part of a series that extends the kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
videobuf_dma_contig_user_get() uses provided user pointers for vma lookups, which can only be done with untagged pointers.
Untag the pointers in this function.
Reviewed-by: Khalid Aziz khalid.aziz@oracle.com
Reviewed-by: Kees Cook keescook@chromium.org
Acked-by: Mauro Carvalho Chehab mchehab+samsung@kernel.org
Signed-off-by: Andrey Konovalov andreyknvl@google.com
---
 drivers/media/v4l2-core/videobuf-dma-contig.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
index 76b4ac7b1678..aeb2f497c683 100644
--- a/drivers/media/v4l2-core/videobuf-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
@@ -157,6 +157,7 @@ static void videobuf_dma_contig_user_put(struct videobuf_dma_contig_memory *mem)
 static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
 					struct videobuf_buffer *vb)
 {
+	unsigned long untagged_baddr = untagged_addr(vb->baddr);
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	unsigned long prev_pfn, this_pfn;
@@ -164,22 +165,22 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
 	unsigned int offset;
 	int ret;
 
-	offset = vb->baddr & ~PAGE_MASK;
+	offset = untagged_baddr & ~PAGE_MASK;
 	mem->size = PAGE_ALIGN(vb->size + offset);
 	ret = -EINVAL;
 
 	down_read(&mm->mmap_sem);
 
-	vma = find_vma(mm, vb->baddr);
+	vma = find_vma(mm, untagged_baddr);
 	if (!vma)
 		goto out_up;
 
-	if ((vb->baddr + mem->size) > vma->vm_end)
+	if ((untagged_baddr + mem->size) > vma->vm_end)
 		goto out_up;
 
 	pages_done = 0;
 	prev_pfn = 0; /* kill warning */
-	user_address = vb->baddr;
+	user_address = untagged_baddr;
 
 	while (pages_done < (mem->size >> PAGE_SHIFT)) {
 		ret = follow_pfn(vma, user_address, &this_pfn);
This patch is part of a series that extends the kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
tee_shm_register()->optee_shm_unregister()->check_mem_type() uses provided user pointers for vma lookups (via __check_mem_type()), which can only be done with untagged pointers.
Untag user pointers in this function.
Reviewed-by: Kees Cook keescook@chromium.org
Acked-by: Jens Wiklander jens.wiklander@linaro.org
Signed-off-by: Andrey Konovalov andreyknvl@google.com
---
 drivers/tee/tee_shm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
index 2da026fd12c9..09ddcd06c715 100644
--- a/drivers/tee/tee_shm.c
+++ b/drivers/tee/tee_shm.c
@@ -254,6 +254,7 @@ struct tee_shm *tee_shm_register(struct tee_context *ctx, unsigned long addr,
 	shm->teedev = teedev;
 	shm->ctx = ctx;
 	shm->id = -1;
+	addr = untagged_addr(addr);
 	start = rounddown(addr, PAGE_SIZE);
 	shm->offset = addr - start;
 	shm->size = length;
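As a small worked example of the arithmetic that follows the untagging in this hunk, the sketch below derives a page-aligned start and an in-page offset from a tagged address. struct shm_layout, shm_layout_of(), DEMO_PAGE_SIZE and untag() are hypothetical names used only for illustration; they are not part of the tee driver.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL /* hypothetical page size */

/* Illustrative untagging: sign-extend from bit 55 (assumes arithmetic shift). */
static uint64_t untag(uint64_t addr)
{
	return (uint64_t)((int64_t)(addr << 8) >> 8);
}

/* Hypothetical result type: page-aligned start plus offset within the page. */
struct shm_layout {
	uint64_t start;
	uint64_t offset;
};

static struct shm_layout shm_layout_of(uint64_t addr)
{
	struct shm_layout l;

	addr = untag(addr);                        /* untag before any arithmetic */
	l.start = addr - (addr % DEMO_PAGE_SIZE);  /* round down to a page boundary */
	l.offset = addr - l.start;
	return l;
}

static void shm_layout_selfcheck(void)
{
	struct shm_layout l = shm_layout_of(0x4200ffff80000123ULL);

	assert(l.start == 0x0000ffff80000000ULL);
	assert(l.offset == 0x123);
}
```

Rounding down alone would leave the tag in the start address, since the tag sits far above the page-offset bits; untagging first means every value derived from addr is tag-free.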
This patch is part of a series that extends the kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
vaddr_get_pfn() uses provided user pointers for vma lookups, which can only be done with untagged pointers.
Untag user pointers in this function.
Reviewed-by: Eric Auger eric.auger@redhat.com
Reviewed-by: Vincenzo Frascino vincenzo.frascino@arm.com
Reviewed-by: Catalin Marinas catalin.marinas@arm.com
Reviewed-by: Kees Cook keescook@chromium.org
Signed-off-by: Andrey Konovalov andreyknvl@google.com
---
 drivers/vfio/vfio_iommu_type1.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 054391f30fa8..67a24b4d0fa4 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -368,6 +368,8 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 
 	down_read(&mm->mmap_sem);
 
+	vaddr = untagged_addr(vaddr);
+
 	vma = find_vma_intersection(mm, vaddr, vaddr + 1);
 
 	if (vma && vma->vm_flags & VM_PFNMAP) {
This patch is part of a series that extends the kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
This patch adds a simple test that calls the uname syscall with a tagged user pointer as an argument. If the kernel does not accept tagged user pointers, the test fails with EFAULT.
Reviewed-by: Catalin Marinas catalin.marinas@arm.com
Acked-by: Kees Cook keescook@chromium.org
Signed-off-by: Andrey Konovalov andreyknvl@google.com
---
 tools/testing/selftests/arm64/.gitignore | 1 +
 tools/testing/selftests/arm64/Makefile | 11 +++++++
 .../testing/selftests/arm64/run_tags_test.sh | 12 ++++++++
 tools/testing/selftests/arm64/tags_test.c | 29 +++++++++++++++++++
 4 files changed, 53 insertions(+)
 create mode 100644 tools/testing/selftests/arm64/.gitignore
 create mode 100644 tools/testing/selftests/arm64/Makefile
 create mode 100755 tools/testing/selftests/arm64/run_tags_test.sh
 create mode 100644 tools/testing/selftests/arm64/tags_test.c

diff --git a/tools/testing/selftests/arm64/.gitignore b/tools/testing/selftests/arm64/.gitignore
new file mode 100644
index 000000000000..e8fae8d61ed6
--- /dev/null
+++ b/tools/testing/selftests/arm64/.gitignore
@@ -0,0 +1 @@
+tags_test
diff --git a/tools/testing/selftests/arm64/Makefile b/tools/testing/selftests/arm64/Makefile
new file mode 100644
index 000000000000..a61b2e743e99
--- /dev/null
+++ b/tools/testing/selftests/arm64/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# ARCH can be overridden by the user for cross compiling
+ARCH ?= $(shell uname -m 2>/dev/null || echo not)
+
+ifneq (,$(filter $(ARCH),aarch64 arm64))
+TEST_GEN_PROGS := tags_test
+TEST_PROGS := run_tags_test.sh
+endif
+
+include ../lib.mk
diff --git a/tools/testing/selftests/arm64/run_tags_test.sh b/tools/testing/selftests/arm64/run_tags_test.sh
new file mode 100755
index 000000000000..745f11379930
--- /dev/null
+++ b/tools/testing/selftests/arm64/run_tags_test.sh
@@ -0,0 +1,12 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+echo "--------------------"
+echo "running tags test"
+echo "--------------------"
+./tags_test
+if [ $? -ne 0 ]; then
+	echo "[FAIL]"
+else
+	echo "[PASS]"
+fi
diff --git a/tools/testing/selftests/arm64/tags_test.c b/tools/testing/selftests/arm64/tags_test.c
new file mode 100644
index 000000000000..22a1b266e373
--- /dev/null
+++ b/tools/testing/selftests/arm64/tags_test.c
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <sys/prctl.h>
+#include <sys/utsname.h>
+
+#define SHIFT_TAG(tag)		((uint64_t)(tag) << 56)
+#define SET_TAG(ptr, tag)	(((uint64_t)(ptr) & ~SHIFT_TAG(0xff)) | \
+					SHIFT_TAG(tag))
+
+int main(void)
+{
+	static int tbi_enabled = 0;
+	struct utsname *ptr, *tagged_ptr;
+	int err;
+
+	if (prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0) == 0)
+		tbi_enabled = 1;
+	ptr = (struct utsname *)malloc(sizeof(*ptr));
+	if (tbi_enabled)
+		tagged_ptr = (struct utsname *)SET_TAG(ptr, 0x42);
+	err = uname(tagged_ptr);
+	free(ptr);
+
+	return err;
+}
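The tag arithmetic used by the selftest can be checked on any 64-bit host without dereferencing anything. In the sketch below SHIFT_TAG and SET_TAG are copied from the test, while GET_TAG and the self-check function are hypothetical additions for illustration; the addresses are plain integers, not real mappings.

```c
#include <assert.h>
#include <stdint.h>

#define SHIFT_TAG(tag)    ((uint64_t)(tag) << 56)
#define SET_TAG(ptr, tag) (((uint64_t)(ptr) & ~SHIFT_TAG(0xff)) | SHIFT_TAG(tag))
/* Hypothetical helper, not part of the selftest: read the top byte back. */
#define GET_TAG(ptr)      ((uint8_t)((uint64_t)(ptr) >> 56))

static void tag_macros_selfcheck(void)
{
	uint64_t addr = 0x0000ffff80001000ULL;
	uint64_t tagged = SET_TAG(addr, 0x42);

	assert(tagged == 0x4200ffff80001000ULL);
	assert(GET_TAG(tagged) == 0x42);
	/* Setting a tag replaces the old one instead of OR-ing into it. */
	assert(SET_TAG(tagged, 0x17) == 0x1700ffff80001000ULL);
	/* A zero tag gives back the original address. */
	assert(SET_TAG(tagged, 0) == addr);
}
```

Because SET_TAG masks out the old top byte before inserting the new one, it is safe to apply to a pointer that already carries a tag.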
On Tue, Jul 23, 2019 at 7:59 PM Andrey Konovalov andreyknvl@google.com wrote:
=== Overview
arm64 has a feature called Top Byte Ignore, which allows to embed pointer tags into the top byte of each pointer. Userspace programs (such as HWASan, a memory debugging tool [1]) might use this feature and pass tagged user pointers to the kernel through syscalls or other interfaces.
Right now the kernel is already able to handle user faults with tagged pointers, due to these patches:
- 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a tagged pointer")
- 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged pointers")
- 276e9327 ("arm64: entry: improve data abort handling of tagged pointers")
This patchset extends tagged pointer support to syscall arguments.
As per the proposed ABI change [3], tagged pointers are only allowed to be passed to syscalls when they point to memory ranges obtained by anonymous mmap() or sbrk() (see the patchset [3] for more details).
For non-memory syscalls this is done by untagging user pointers when the kernel performs pointer checking to find out whether the pointer comes from userspace (most notably in access_ok). The untagging is done only when the pointer is being checked; the tag is preserved as the pointer makes its way through the kernel and stays tagged when the kernel dereferences the pointer when performing user memory accesses.
The mmap and mremap (only new_addr) syscalls do not currently accept tagged addresses. Architectures may interpret the tag as a background colour for the corresponding vma.
Other memory syscalls (mprotect, etc.) don't do user memory accesses but rather deal with memory ranges, and untagged pointers are better suited to describe memory ranges internally. Thus for memory syscalls we untag pointers completely when they enter the kernel.
=== Other approaches
One of the alternative approaches to untagging that was considered is to completely strip the pointer tag as the pointer enters the kernel with some kind of a syscall wrapper, but that won't work with the countless number of different ioctl calls. With this approach we would need a custom wrapper for each ioctl variation, which doesn't seem practical.
An alternative approach to untagging pointers in memory syscalls prologues is to instead allow tagged pointers to be passed to find_vma() (and other vma related functions) and untag them there. Unfortunately, a lot of find_vma() callers then compare or subtract the returned vma start and end fields against the pointer that was being searched. Thus this approach would still require changing all find_vma() callers.
=== Testing
The following testing approaches have been taken to find potential issues with user pointer untagging:
Static testing (with sparse [2] and separately with a custom static analyzer based on Clang) to track casts of __user pointers to integer types to find places where untagging needs to be done.
Static testing with grep to find parts of the kernel that call find_vma() (and other similar functions) or directly compare against vm_start/vm_end fields of vma.
Static testing with grep to find parts of the kernel that compare user pointers with TASK_SIZE or other similar consts and macros.
Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running a modified syzkaller version that passes tagged pointers to the kernel.
Based on the results of the testing the required patches have been added to the patchset.
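The has_tag() check mentioned in the dynamic testing item is not shown in this series. A minimal userspace sketch of what such a check could look like follows, assuming the tag lives in the top byte of a user address; both function names are hypothetical, and assert() only plays the role that BUG_ON() has in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: true when the top byte of a user address is non-zero. */
static bool has_tag(uint64_t addr)
{
	return (addr >> 56) != 0;
}

/* Stand-in for a lookup entry point that refuses tagged addresses. */
static uint64_t checked_lookup_key(uint64_t addr)
{
	assert(!has_tag(addr)); /* plays the role of BUG_ON(has_tag(addr)) */
	return addr;
}
```

A check like this turns a silently failed lookup into an immediate, visible failure, which is what makes it useful together with a fuzzer that feeds tagged pointers to the kernel.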
=== Notes
This patchset is meant to be merged together with "arm64 relaxed ABI" [3].
This patchset is a prerequisite for ARM's memory tagging hardware feature support [4].
This patchset has been merged into the Pixel 2 & 3 kernel trees and is now being used to enable testing of Pixel phones with HWASan.
Thanks!
[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
[2] https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060e...
[3] https://lkml.org/lkml/2019/6/12/745
[4] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architecture...
Hi Andrew and Catalin,
Do you think this is ready to be merged?
Should this go through the mm or the arm tree?
Thanks!
=== History
Changes in v19:
- Rebased onto 7b5cf701 (5.3-rc1+).
Changes in v18:
- Reverted the selftest back to not using the LD_PRELOAD approach.
- Added prctl(PR_SET_TAGGED_ADDR_CTRL) call to the selftest.
- Reworded the patch descriptions to make them less oriented on arm64 only.
- Catalin's patch: "I added a Kconfig option and dropped the prctl args zero check. There is some minor clean-up as well".
Changes in v17:
- The "uaccess: add noop untagged_addr definition" patch is dropped, as it was merged into upstream named as "uaccess: add noop untagged_addr definition".
- Merged "mm, arm64: untag user pointers in do_pages_move" into "mm, arm64: untag user pointers passed to memory syscalls".
- Added "arm64: Introduce prctl() options to control the tagged user addresses ABI" patch from Catalin.
- Add tags_lib.so to tools/testing/selftests/arm64/.gitignore.
- Added a comment clarifying untagged in mremap.
- Moved untagging back into mlx4_get_umem_mr() for the IB patch.
Changes in v16:
- Moved untagging for memory syscalls from arm64 wrappers back to generic code.
- Dropped untagging for the following memory syscalls: brk, mmap, munmap; mremap (only dropped for new_address); mmap_pgoff (not used on arm64); remap_file_pages (deprecated); shmat, shmdt (work on shared memory).
- Changed kselftest to LD_PRELOAD a shared library that overrides malloc to return tagged pointers.
- Rebased onto 5.2-rc3.
Changes in v15:
- Removed unnecessary untagging from radeon_ttm_tt_set_userptr().
- Removed unnecessary untagging from amdgpu_ttm_tt_set_userptr().
- Moved untagging to validate_range() in userfaultfd code.
- Moved untagging to ib_uverbs_(re)reg_mr() from mlx4_get_umem_mr().
- Rebased onto 5.1.
Changes in v14:
- Moved untagging for most memory syscalls to an arm64 specific implementation, instead of doing that in the common code.
- Dropped "net, arm64: untag user pointers in tcp_zerocopy_receive", since the provided user pointers don't come from an anonymous map and thus are not covered by this ABI relaxation.
- Dropped "kernel, arm64: untag user pointers in prctl_set_mm*".
- Moved untagging from __check_mem_type() to tee_shm_register().
- Updated untagging for the amdgpu and radeon drivers to cover the MMU notifier, as suggested by Felix.
- Since this ABI relaxation doesn't actually allow tagged instruction pointers, dropped the following patches:
- Dropped "tracing, arm64: untag user pointers in seq_print_user_ip".
- Dropped "uprobes, arm64: untag user pointers in find_active_uprobe".
- Dropped "bpf, arm64: untag user pointers in stack_map_get_build_id_offset".
- Rebased onto 5.1-rc7 (37624b58).
Changes in v13:
- Simplified untagging in tcp_zerocopy_receive().
- Looked at find_vma() callers in drivers/, which allowed to identify a few other places where untagging is needed.
- Added patch "mm, arm64: untag user pointers in get_vaddr_frames".
- Added patch "drm/amdgpu, arm64: untag user pointers in amdgpu_ttm_tt_get_user_pages".
- Added patch "drm/radeon, arm64: untag user pointers in radeon_ttm_tt_pin_userptr".
- Added patch "IB/mlx4, arm64: untag user pointers in mlx4_get_umem_mr".
- Added patch "media/v4l2-core, arm64: untag user pointers in videobuf_dma_contig_user_get".
- Added patch "tee/optee, arm64: untag user pointers in check_mem_type".
- Added patch "vfio/type1, arm64: untag user pointers".
Changes in v12:
- Changed untagging in tcp_zerocopy_receive() to also untag zc->address.
- Fixed untagging in prctl_set_mm* to only untag pointers for vma lookups and validity checks, but leave them as is for actual user space accesses.
- Updated the link to the v2 of the "arm64 relaxed ABI" patchset [3].
- Dropped the documentation patch, as the "arm64 relaxed ABI" patchset [3] handles that.
Changes in v11:
- Added "uprobes, arm64: untag user pointers in find_active_uprobe" patch.
- Added "bpf, arm64: untag user pointers in stack_map_get_build_id_offset" patch.
- Fixed "tracing, arm64: untag user pointers in seq_print_user_ip" to correctly perform subtraction with a tagged addr.
- Moved untagged_addr() from SYSCALL_DEFINE3(mprotect) and SYSCALL_DEFINE4(pkey_mprotect) to do_mprotect_pkey().
- Moved untagged_addr() definition for other arches from include/linux/memory.h to include/linux/mm.h.
- Changed untagging in strn*_user() to perform userspace accesses through tagged pointers.
- Updated the documentation to mention that passing tagged pointers to memory syscalls is allowed.
- Updated the test to use malloc'ed memory instead of stack memory.
Changes in v10:
- Added "mm, arm64: untag user pointers passed to memory syscalls" back.
- New patch "fs, arm64: untag user pointers in fs/userfaultfd.c".
- New patch "net, arm64: untag user pointers in tcp_zerocopy_receive".
- New patch "kernel, arm64: untag user pointers in prctl_set_mm*".
- New patch "tracing, arm64: untag user pointers in seq_print_user_ip".
Changes in v9:
- Rebased onto 4.20-rc6.
- Used u64 instead of __u64 in type casts in the untagged_addr macro for arm64.
- Added braces around (addr) in the untagged_addr macro for other arches.
Changes in v8:
- Rebased onto 65102238 (4.20-rc1).
- Added a note to the cover letter on why syscall wrappers/shims that untag user pointers won't work.
- Added a note to the cover letter that this patchset has been merged into the Pixel 2 kernel tree.
- Documentation fixes, in particular added a list of syscalls that don't support tagged user pointers.
Changes in v7:
- Rebased onto 17b57b18 (4.19-rc6).
- Dropped the "arm64: untag user address in __do_user_fault" patch, since the existing patches already handle user faults properly.
- Dropped the "usb, arm64: untag user addresses in devio" patch, since the passed pointer must come from a vma and therefore be untagged.
- Dropped the "arm64: annotate user pointers casts detected by sparse" patch (see the discussion to the replies of the v6 of this patchset).
- Added more context to the cover letter.
- Updated Documentation/arm64/tagged-pointers.txt.
Changes in v6:
- Added annotations for user pointer casts found by sparse.
- Rebased onto 050cdc6c (4.19-rc1+).
Changes in v5:
- Added 3 new patches that add untagging to places found with static analysis.
- Rebased onto 44c929e1 (4.18-rc8).
Changes in v4:
- Added a selftest for checking that passing tagged pointers to the kernel succeeds.
- Rebased onto 81e97f013 (4.18-rc1+).
Changes in v3:
- Rebased onto e5c51f30 (4.17-rc6+).
- Added linux-arch@ to the list of recipients.
Changes in v2:
- Rebased onto 2d618bdf (4.17-rc3+).
- Removed excessive untagging in gup.c.
- Removed untagging pointers returned from __uaccess_mask_ptr.
Changes in v1:
- Rebased onto 4.17-rc1.
Changes in RFC v2:
- Added "#ifndef untagged_addr..." fallback in linux/uaccess.h instead of defining it for each arch individually.
- Updated Documentation/arm64/tagged-pointers.txt.
- Dropped "mm, arm64: untag user addresses in memory syscalls".
- Rebased onto 3eb2ce82 (4.16-rc7).
Signed-off-by: Andrey Konovalov andreyknvl@google.com
Andrey Konovalov (14):
  arm64: untag user pointers in access_ok and __uaccess_mask_ptr
  lib: untag user pointers in strn*_user
  mm: untag user pointers passed to memory syscalls
  mm: untag user pointers in mm/gup.c
  mm: untag user pointers in get_vaddr_frames
  fs/namespace: untag user pointers in copy_mount_options
  userfaultfd: untag user pointers
  drm/amdgpu: untag user pointers
  drm/radeon: untag user pointers in radeon_gem_userptr_ioctl
  IB/mlx4: untag user pointers in mlx4_get_umem_mr
  media/v4l2-core: untag user pointers in videobuf_dma_contig_user_get
  tee/shm: untag user pointers in tee_shm_register
  vfio/type1: untag user pointers in vaddr_get_pfn
  selftests, arm64: add a selftest for passing tagged pointers to kernel

Catalin Marinas (1):
  arm64: Introduce prctl() options to control the tagged user addresses ABI

 arch/arm64/Kconfig | 9 +++
 arch/arm64/include/asm/processor.h | 8 ++
 arch/arm64/include/asm/thread_info.h | 1 +
 arch/arm64/include/asm/uaccess.h | 12 ++-
 arch/arm64/kernel/process.c | 73 +++++++++++++++++++
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +
 drivers/gpu/drm/radeon/radeon_gem.c | 2 +
 drivers/infiniband/hw/mlx4/mr.c | 7 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c | 9 ++-
 drivers/tee/tee_shm.c | 1 +
 drivers/vfio/vfio_iommu_type1.c | 2 +
 fs/namespace.c | 2 +-
 fs/userfaultfd.c | 22 +++---
 include/uapi/linux/prctl.h | 5 ++
 kernel/sys.c | 12 +++
 lib/strncpy_from_user.c | 3 +-
 lib/strnlen_user.c | 3 +-
 mm/frame_vector.c | 2 +
 mm/gup.c | 4 +
 mm/madvise.c | 2 +
 mm/mempolicy.c | 3 +
 mm/migrate.c | 2 +-
 mm/mincore.c | 2 +
 mm/mlock.c | 4 +
 mm/mprotect.c | 2 +
 mm/mremap.c | 7 ++
 mm/msync.c | 2 +
 tools/testing/selftests/arm64/.gitignore | 1 +
 tools/testing/selftests/arm64/Makefile | 11 +++
 .../testing/selftests/arm64/run_tags_test.sh | 12 +++
 tools/testing/selftests/arm64/tags_test.c | 29 ++++++++
 32 files changed, 233 insertions(+), 25 deletions(-)
 create mode 100644 tools/testing/selftests/arm64/.gitignore
 create mode 100644 tools/testing/selftests/arm64/Makefile
 create mode 100755 tools/testing/selftests/arm64/run_tags_test.sh
 create mode 100644 tools/testing/selftests/arm64/tags_test.c
-- 2.22.0.709.g102302147b-goog
Hi Andrey,
On Tue, Jul 23, 2019 at 08:03:29PM +0200, Andrey Konovalov wrote:
On Tue, Jul 23, 2019 at 7:59 PM Andrey Konovalov andreyknvl@google.com wrote:
[...]
Do you think this is ready to be merged?
Should this go through the mm or the arm tree?
I would certainly prefer to take at least the arm64 bits via the arm64 tree (i.e. patches 1, 2 and 15). We also need a Documentation patch describing the new ABI.
Will
On Wed, Jul 24, 2019 at 4:02 PM Will Deacon will@kernel.org wrote:
Hi Andrey,
On Tue, Jul 23, 2019 at 08:03:29PM +0200, Andrey Konovalov wrote:
On Tue, Jul 23, 2019 at 7:59 PM Andrey Konovalov andreyknvl@google.com wrote:
[...]
Do you think this is ready to be merged?
Should this go through the mm or the arm tree?
I would certainly prefer to take at least the arm64 bits via the arm64 tree (i.e. patches 1, 2 and 15). We also need a Documentation patch describing the new ABI.
Sounds good! Should I post those patches together with the Documentation patches from Vincenzo as a separate patchset?
Vincenzo, could you share the last version of the Documentation patches?
Thanks!
Will
On Wed, Jul 24, 2019 at 04:16:49PM +0200, Andrey Konovalov wrote:
On Wed, Jul 24, 2019 at 4:02 PM Will Deacon will@kernel.org wrote:
I would certainly prefer to take at least the arm64 bits via the arm64 tree (i.e. patches 1, 2 and 15). We also need a Documentation patch describing the new ABI.
Sounds good! Should I post those patches together with the Documentation patches from Vincenzo as a separate patchset?
Yes, please (although as you say below, we need a new version of those patches from Vincenzo to address the feedback on v5). The other thing I should say is that I'd be happy to queue the other patches in the series too, but some of them are missing acks from the relevant maintainers (e.g. the mm/ and fs/ changes).
Will
Hi Will and Andrey,
On 24/07/2019 15:20, Will Deacon wrote:
On Wed, Jul 24, 2019 at 04:16:49PM +0200, Andrey Konovalov wrote:
On Wed, Jul 24, 2019 at 4:02 PM Will Deacon will@kernel.org wrote:
I would certainly prefer to take at least the arm64 bits via the arm64 tree (i.e. patches 1, 2 and 15). We also need a Documentation patch describing the new ABI.
Sounds good! Should I post those patches together with the Documentation patches from Vincenzo as a separate patchset?
Yes, please (although as you say below, we need a new version of those patches from Vincenzo to address the feedback on v5). The other thing I should say is that I'd be happy to queue the other patches in the series too, but some of them are missing acks from the relevant maintainers (e.g. the mm/ and fs/ changes).
I am actively working on the document and will share v6 with the requested changes in the next few days.
Will
On Tue, Jul 23, 2019 at 07:58:48PM +0200, Andrey Konovalov wrote:
This patch is part of a series that extends the kernel ABI to allow tagged user pointers (with the top byte set to something other than 0x00) to be passed as syscall arguments.
mlx4_get_umem_mr() uses provided user pointers for vma lookups, which can only be done with untagged pointers.
Untag user pointers in this function.
Reviewed-by: Jason Gunthorpe jgg@mellanox.com
Acked-by: Catalin Marinas catalin.marinas@arm.com
Reviewed-by: Kees Cook keescook@chromium.org
Signed-off-by: Andrey Konovalov andreyknvl@google.com

drivers/infiniband/hw/mlx4/mr.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
Applied to rdma for-next, please don't send it via other trees :)
Thanks, Jason
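For readers unfamiliar with what "untagging" means here, the following is a minimal userspace sketch, not the kernel's implementation. It assumes the tag occupies bits 63:56 and that an untagged address has those bits equal to bit 55 (sign extension), which is the convention the arm64 discussion in this thread relies on. The helper names `untag_addr` and `addr_tag` are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: drop the tag byte of an
 * address by making bits 63:56 a copy of bit 55. User addresses (bit 55
 * clear) end up with a zero top byte; kernel addresses (bit 55 set) end
 * up with 0xff in the top byte.
 */
static inline uint64_t untag_addr(uint64_t addr)
{
    const uint64_t tag_mask = 0xFF00000000000000ULL;

    if (addr & (1ULL << 55))
        return addr | tag_mask;
    return addr & ~tag_mask;
}

/* Hypothetical helper: extract the tag stored in bits 63:56. */
static inline uint8_t addr_tag(uint64_t addr)
{
    return (uint8_t)(addr >> 56);
}
```

A vma lookup keyed on `untag_addr(ptr)` would then see the same address regardless of which tag userspace placed in the top byte.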
On Wed, Jul 24, 2019 at 9:25 PM Jason Gunthorpe jgg@ziepe.ca wrote:
Applied to rdma for-next, please don't send it via other trees :)
Sure, thanks!
Thanks, Jason
On arm64 the TCR_EL1.TBI0 bit has always been enabled by the kernel, hence userspace (EL0) is allowed to set a non-zero value in the top byte, but the resulting pointers are not allowed at the user-kernel syscall ABI boundary.
This patchset proposes a relaxation of the ABI that makes it possible to pass tagged pointers to syscalls when these pointers are in memory ranges obtained as described in tagged-address-abi.rst, contained in this patch series.
Since it is not desirable to relax the ABI to allow tagged user addresses into the kernel indiscriminately, this patchset documents a new sysctl interface (/proc/sys/abi/tagged_addr) that is used to prevent applications from enabling the relaxed ABI, and a new prctl() interface that can be used to enable or disable the relaxed ABI.
This patchset should be merged together with [1].
[1] https://patchwork.kernel.org/cover/10674351/
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Will Deacon will.deacon@arm.com
CC: Andrey Konovalov andreyknvl@google.com
Cc: Szabolcs Nagy szabolcs.nagy@arm.com
Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com

Vincenzo Frascino (2):
  arm64: Define Documentation/arm64/tagged-address-abi.rst
  arm64: Relax Documentation/arm64/tagged-pointers.rst

 Documentation/arm64/tagged-address-abi.rst | 148 +++++++++++++++++++++
 Documentation/arm64/tagged-pointers.rst    |  23 +++-
 2 files changed, 164 insertions(+), 7 deletions(-)
 create mode 100644 Documentation/arm64/tagged-address-abi.rst
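As a rough illustration of the two interfaces the cover letter describes, the sketch below queries both from userspace. It is only a sketch under stated assumptions: the sysctl path is the one named in this posting and may not exist on a given kernel, the fallback numeric values for `PR_GET_TAGGED_ADDR_CTRL` and `PR_TAGGED_ADDR_ENABLE` are the ones found in later mainline headers and are assumptions relative to this series, and the function names are hypothetical.

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Fallbacks for older headers; values assumed from later mainline uapi. */
#ifndef PR_GET_TAGGED_ADDR_CTRL
#define PR_GET_TAGGED_ADDR_CTRL 56
#endif
#ifndef PR_TAGGED_ADDR_ENABLE
#define PR_TAGGED_ADDR_ENABLE (1UL << 0)
#endif

/*
 * Hypothetical helper: returns 1 if the relaxed ABI is enabled for the
 * caller, 0 if it is not, and -1 if the prctl() is rejected (for example
 * on a kernel or architecture without this interface).
 */
static int tagged_addr_abi_status(void)
{
    int ret = prctl(PR_GET_TAGGED_ADDR_CTRL, 0UL, 0UL, 0UL, 0UL);

    if (ret < 0)
        return -1;
    return (ret & PR_TAGGED_ADDR_ENABLE) ? 1 : 0;
}

/*
 * Hypothetical helper: reads the sysctl named in this series. Returns the
 * integer stored in the file, or -1 if it is missing or unreadable.
 */
static int tagged_addr_sysctl(void)
{
    FILE *f = fopen("/proc/sys/abi/tagged_addr", "r");
    int val = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &val) != 1)
        val = -1;
    fclose(f);
    return val;
}
```

Both helpers degrade to -1 rather than failing hard, so a caller can treat "unsupported" the same way as "not opted in" and keep using untagged pointers.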
On arm64 the TCR_EL1.TBI0 bit has always been enabled, hence userspace (EL0) is allowed to set a non-zero value in the top byte, but the resulting pointers are not allowed at the user-kernel syscall ABI boundary.
With the relaxed ABI proposed through this document, it is now possible to pass tagged pointers to syscalls when these pointers are in memory ranges obtained by an anonymous (MAP_ANONYMOUS) mmap().
This change in the ABI requires a mechanism that lets userspace opt in to it.
Specify and document the way in which sysctl and prctl() can be used in combination to allow userspace to opt in to this feature.
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Will Deacon will.deacon@arm.com
CC: Andrey Konovalov andreyknvl@google.com
Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com
Acked-by: Szabolcs Nagy szabolcs.nagy@arm.com
---
 Documentation/arm64/tagged-address-abi.rst | 148 +++++++++++++++++++++
 1 file changed, 148 insertions(+)
 create mode 100644 Documentation/arm64/tagged-address-abi.rst
diff --git a/Documentation/arm64/tagged-address-abi.rst b/Documentation/arm64/tagged-address-abi.rst
new file mode 100644
index 000000000000..a8ecb991de82
--- /dev/null
+++ b/Documentation/arm64/tagged-address-abi.rst
@@ -0,0 +1,148 @@
+========================
+ARM64 TAGGED ADDRESS ABI
+========================
+
+Author: Vincenzo Frascino vincenzo.frascino@arm.com
+
+Date: 25 July 2019
+
+This document describes the usage and semantics of the Tagged Address
+ABI on arm64.
+
+1. Introduction
+---------------
+
+On arm64 the TCR_EL1.TBI0 bit has always been enabled on the kernel, hence
+the userspace (EL0) is entitled to perform a user memory access through a
+64-bit pointer with a non-zero top byte but the resulting pointers are not
+allowed at the user-kernel syscall ABI boundary.
+
+This document describes a relaxation of the ABI that makes it possible to
+to pass tagged pointers to the syscalls, when these pointers are in memory
+ranges obtained as described in section 2.
+
+Since it is not desirable to relax the ABI to allow tagged user addresses
+into the kernel indiscriminately, arm64 provides a new sysctl interface
+(/proc/sys/abi/tagged_addr) that is used to prevent the applications from
+enabling the relaxed ABI and a new prctl() interface that can be used to
+enable or disable the relaxed ABI.
+A detailed description of the newly introduced mechanisms will be provided
+in section 2.
+
+2. ARM64 Tagged Address ABI
+---------------------------
+
+From the kernel syscall interface perspective, we define, for the purposes
+of this document, a "valid tagged pointer" as a pointer that either has a
+zero value set in the top byte or has a non-zero value, is in memory ranges
+privately owned by a userspace process and is obtained in one of the
+following ways:
+- mmap() done by the process itself, where either:
+
+  - flags have **MAP_PRIVATE** and **MAP_ANONYMOUS**
+  - flags have **MAP_PRIVATE** and the file descriptor refers to a regular
+    file or **/dev/zero**
+
+- brk() system call done by the process itself (i.e. the heap area between
+  the initial location of the program break at process creation and its
+  current location).
+- any memory mapped by the kernel in the process's address space during
+  creation and with the same restrictions as for mmap() (e.g. data, bss,
+  stack).
+
+The ARM64 Tagged Address ABI is an opt-in feature, and an application can
+control it using the following:
+
+- **/proc/sys/abi/tagged_addr**: a new sysctl interface that can be used to
+  prevent the applications from enabling the access to the relaxed ABI.
+  The sysctl supports the following configuration options:
+
+  - **0**: Disable the access to the ARM64 Tagged Address ABI for all
+    the applications.
+  - **1** (Default): Enable the access to the ARM64 Tagged Address ABI for
+    all the applications.
+
+  If the access to the ARM64 Tagged Address ABI is disabled at a certain
+  point in time, all the applications that were using tagging before this
+  event occurs, will continue to use tagging.
+- **prctl()s**:
+
+  - **PR_SET_TAGGED_ADDR_CTRL**: Invoked by a process, can be used to enable or
+    disable its access to the ARM64 Tagged Address ABI.
+
+    The (unsigned int) arg2 argument is a bit mask describing the control mode
+    used:
+
+    - **PR_TAGGED_ADDR_ENABLE**: Enable ARM64 Tagged Address ABI.
+
+    The prctl(PR_SET_TAGGED_ADDR_CTRL, ...) will return -EINVAL if the ARM64
+    Tagged Address ABI is not available.
+
+    The arguments arg3, arg4, and arg5 are ignored.
+  - **PR_GET_TAGGED_ADDR_CTRL**: can be used to check the status of the Tagged
+    Address ABI.
+
+    The arguments arg2, arg3, arg4, and arg5 are ignored.
+
+The ABI properties set by the mechanisms described above are inherited by threads
+of the same application and fork()'ed children but cleared by execve().
+
+When a process has successfully opted into the new ABI by invoking
+PR_SET_TAGGED_ADDR_CTRL prctl(), this guarantees the following behaviours:
+
+ - Every currently available syscall, except the cases mentioned in section 3, can
+   accept any valid tagged pointer. The same rule is applicable to any syscall
+   introduced in the future.
+ - If a non valid tagged pointer is passed to a syscall then the behaviour
+   is undefined.
+ - Every valid tagged pointer is expected to work as an untagged one.
+ - The kernel preserves any valid tagged pointer and returns it to the
+   userspace unchanged (i.e. on syscall return) in all the cases except the
+   ones documented in the "Preserving tags" section of tagged-pointers.txt.
+
+A definition of the meaning of tagged pointers on arm64 can be found in:
+Documentation/arm64/tagged-pointers.txt.
+
+3. ARM64 Tagged Address ABI Exceptions
+--------------------------------------
+
+The behaviours described in section 2, with particular reference to the
+acceptance by the syscalls of any valid tagged pointer are not applicable
+to the following cases:
+
+ - mmap() addr parameter.
+ - mremap() new_address parameter.
+ - prctl(PR_SET_MM, PR_SET_MM_MAP, ...) struct prctl_mm_map fields.
+ - prctl(PR_SET_MM, PR_SET_MM_MAP_SIZE, ...) struct prctl_mm_map fields.
+
+Any attempt to use non-zero tagged pointers will lead to undefined behaviour.
+
+4. Example of correct usage
+---------------------------
+.. code-block:: c
+
+  void main(void)
+  {
+          static int tbi_enabled = 0;
+          unsigned long tag = 0;
+
+          char *ptr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
+                           MAP_ANONYMOUS, -1, 0);
+
+          if (prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE,
+                    0, 0, 0) == 0)
+                  tbi_enabled = 1;
+
+          if (ptr == (void *)-1) /* MAP_FAILED */
+                  return -1;
+
+          if (tbi_enabled)
+                  tag = rand() & 0xff;
+
+          ptr = (char *)((unsigned long)ptr | (tag << TAG_SHIFT));
+
+          *ptr = 'a';
+
+          ...
+  }
+
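The example in the patch is abbreviated and does not build as posted (no headers, `void main` returning a value, `MAP_ANONYMOUS` without `MAP_PRIVATE`). The following is one way it could be filled in, offered as a sketch rather than as the author's intended code. Assumptions: `TAG_SHIFT` is 56, the fallback `PR_*` values are those of later mainline headers, and `tagged_write_demo` is a hypothetical name. On kernels or architectures where the opt-in is rejected, the tag stays zero and the pointer is used untagged.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Fallbacks for older headers; values assumed from later mainline uapi. */
#ifndef PR_SET_TAGGED_ADDR_CTRL
#define PR_SET_TAGGED_ADDR_CTRL 55
#endif
#ifndef PR_TAGGED_ADDR_ENABLE
#define PR_TAGGED_ADDR_ENABLE (1UL << 0)
#endif

#define TAG_SHIFT 56 /* assumption: the tag lives in bits 63:56 */

/* Hypothetical demo: returns 0 if the (possibly tagged) write succeeded. */
static int tagged_write_demo(void)
{
    uint64_t tag = 0;
    long page_size = sysconf(_SC_PAGESIZE);
    char *ptr, *tagged;
    int ok;

    if (page_size <= 0)
        return -1;

    /* Private anonymous mapping: one of the "valid" sources in section 2. */
    ptr = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED)
        return -1;

    /* Only pick a non-zero tag if the opt-in prctl() succeeded. */
    if (prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE,
              0UL, 0UL, 0UL) == 0)
        tag = (uint64_t)rand() & 0xff;

    tagged = (char *)(uintptr_t)((uint64_t)(uintptr_t)ptr | (tag << TAG_SHIFT));
    *tagged = 'a';

    /* Read back through the untagged alias, then unmap with it as well. */
    ok = (*ptr == 'a');
    munmap(ptr, (size_t)page_size);
    return ok ? 0 : -1;
}
```

Unmapping through the original untagged pointer keeps the sketch valid whether or not the opt-in took effect.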
On arm64 the TCR_EL1.TBI0 bit has always been enabled, hence userspace (EL0) is allowed to set a non-zero value in the top byte, but the resulting pointers are not allowed at the user-kernel syscall ABI boundary.
With the relaxed ABI proposed in this set, it is now possible to pass tagged pointers to the syscalls, when these pointers are in memory ranges obtained by an anonymous (MAP_ANONYMOUS) mmap().
Relax the requirements described in tagged-pointers.rst to be compliant with the behaviours guaranteed by the ARM64 Tagged Address ABI.
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Will Deacon will.deacon@arm.com
CC: Andrey Konovalov andreyknvl@google.com
Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com
Acked-by: Szabolcs Nagy szabolcs.nagy@arm.com
---
 Documentation/arm64/tagged-pointers.rst | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/Documentation/arm64/tagged-pointers.rst b/Documentation/arm64/tagged-pointers.rst
index 2acdec3ebbeb..933aaef8d52f 100644
--- a/Documentation/arm64/tagged-pointers.rst
+++ b/Documentation/arm64/tagged-pointers.rst
@@ -20,7 +20,8 @@ Passing tagged addresses to the kernel
 --------------------------------------

 All interpretation of userspace memory addresses by the kernel assumes
-an address tag of 0x00.
+an address tag of 0x00, unless the userspace opts-in the ARM64 Tagged
+Address ABI via the PR_SET_TAGGED_ADDR_CTRL prctl().

 This includes, but is not limited to, addresses found in:

@@ -33,18 +34,23 @@ This includes, but is not limited to, addresses found in:
  - the frame pointer (x29) and frame records, e.g. when interpreting
    them to generate a backtrace or call graph.

-Using non-zero address tags in any of these locations may result in an
-error code being returned, a (fatal) signal being raised, or other modes
-of failure.
+Using non-zero address tags in any of these locations when the
+userspace application did not opt-in to the ARM64 Tagged Address ABI
+may result in an error code being returned, a (fatal) signal being raised,
+or other modes of failure.

-For these reasons, passing non-zero address tags to the kernel via
-system calls is forbidden, and using a non-zero address tag for sp is
-strongly discouraged.
+For these reasons, when the userspace application did not opt-in, passing
+non-zero address tags to the kernel via system calls is forbidden, and using
+a non-zero address tag for sp is strongly discouraged.

 Programs maintaining a frame pointer and frame records that use non-zero
 address tags may suffer impaired or inaccurate debug and profiling
 visibility.

+A definition of the meaning of ARM64 Tagged Address ABI and of the
+guarantees that the ABI provides when the userspace opts-in via prctl()
+can be found in: Documentation/arm64/tagged-address-abi.rst.
+
 Preserving tags
 ---------------
@@ -59,6 +65,9 @@ be preserved.
 The architecture prevents the use of a tagged PC, so the upper byte will
 be set to a sign-extension of bit 55 on exception return.

+These behaviours are preserved even when the userspace opts-in to the ARM64
+Tagged Address ABI via the PR_SET_TAGGED_ADDR_CTRL prctl().
+
 Other considerations
 --------------------
Some more comments. Mostly minor wording issues, except the prctl() exclusion at the end.
On 25/07/2019 14:50, Vincenzo Frascino wrote:
diff --git a/Documentation/arm64/tagged-address-abi.rst b/Documentation/arm64/tagged-address-abi.rst
new file mode 100644
index 000000000000..a8ecb991de82
--- /dev/null
+++ b/Documentation/arm64/tagged-address-abi.rst
@@ -0,0 +1,148 @@
+========================
+ARM64 TAGGED ADDRESS ABI
+========================
+
+Author: Vincenzo Frascino vincenzo.frascino@arm.com
+
+Date: 25 July 2019
+
+This document describes the usage and semantics of the Tagged Address
+ABI on arm64.
+
+1. Introduction
+---------------
+
+On arm64 the TCR_EL1.TBI0 bit has always been enabled on the kernel, hence
+the userspace (EL0) is entitled to perform a user memory access through a
+64-bit pointer with a non-zero top byte but the resulting pointers are not
+allowed at the user-kernel syscall ABI boundary.
+
+This document describes a relaxation of the ABI that makes it possible to
+to pass tagged pointers to the syscalls, when these pointers are in memory
One too many "to" (at the end of the previous line).
+ranges obtained as described in section 2.
+
+Since it is not desirable to relax the ABI to allow tagged user addresses
+into the kernel indiscriminately, arm64 provides a new sysctl interface
+(/proc/sys/abi/tagged_addr) that is used to prevent the applications from
+enabling the relaxed ABI and a new prctl() interface that can be used to
+enable or disable the relaxed ABI.
+A detailed description of the newly introduced mechanisms will be provided
+in section 2.
+
+2. ARM64 Tagged Address ABI
+---------------------------
+
+From the kernel syscall interface perspective, we define, for the purposes
+of this document, a "valid tagged pointer" as a pointer that either has a
+zero value set in the top byte or has a non-zero value, is in memory ranges
+privately owned by a userspace process and is obtained in one of the
+following ways:
+- mmap() done by the process itself, where either:
+
+  - flags have **MAP_PRIVATE** and **MAP_ANONYMOUS**
+  - flags have **MAP_PRIVATE** and the file descriptor refers to a regular
+    file or **/dev/zero**
+
+- brk() system call done by the process itself (i.e. the heap area between
+  the initial location of the program break at process creation and its
+  current location).
+- any memory mapped by the kernel in the process's address space during
+  creation and with the same restrictions as for mmap() (e.g. data, bss,
+  stack).
+
+The ARM64 Tagged Address ABI is an opt-in feature, and an application can
+control it using the following:
+
+- **/proc/sys/abi/tagged_addr**: a new sysctl interface that can be used to
+  prevent the applications from enabling the access to the relaxed ABI.
+  The sysctl supports the following configuration options:
+
+  - **0**: Disable the access to the ARM64 Tagged Address ABI for all
+    the applications.
+  - **1** (Default): Enable the access to the ARM64 Tagged Address ABI for
+    all the applications.
+
+  If the access to the ARM64 Tagged Address ABI is disabled at a certain
+  point in time, all the applications that were using tagging before this
+  event occurs, will continue to use tagging.
"tagging" may be misinterpreted here. I would be more explicit by saying that the tagged address ABI remains enabled in processes that opted in before the access got disabled.
+- **prctl()s**:
+
+  - **PR_SET_TAGGED_ADDR_CTRL**: Invoked by a process, can be used to enable or
+    disable its access to the ARM64 Tagged Address ABI.
I still find the wording confusing, because "access to the ABI" is not used consistently. The "tagged_addr" sysctl enables *access to the ABI*, that's fine. However, PR_SET_TAGGED_ADDR_CTRL enables *the ABI itself* (which is only possible if access to the ABI is enabled).
+    The (unsigned int) arg2 argument is a bit mask describing the control mode
+    used:
+
+    - **PR_TAGGED_ADDR_ENABLE**: Enable ARM64 Tagged Address ABI.
+
+    The prctl(PR_SET_TAGGED_ADDR_CTRL, ...) will return -EINVAL if the ARM64
+    Tagged Address ABI is not available.
For clarity, it would be good to mention that one possible reason for the ABI not to be available is tagged_addr == 0.
+    The arguments arg3, arg4, and arg5 are ignored.
+  - **PR_GET_TAGGED_ADDR_CTRL**: can be used to check the status of the Tagged
+    Address ABI.
+
+    The arguments arg2, arg3, arg4, and arg5 are ignored.
+
+The ABI properties set by the mechanisms described above are inherited by threads
+of the same application and fork()'ed children but cleared by execve().
+
+When a process has successfully opted into the new ABI by invoking
+PR_SET_TAGGED_ADDR_CTRL prctl(), this guarantees the following behaviours:
+
+ - Every currently available syscall, except the cases mentioned in section 3, can
+   accept any valid tagged pointer. The same rule is applicable to any syscall
+   introduced in the future.
I thought Catalin wanted to drop this guarantee?
+ - If a non valid tagged pointer is passed to a syscall then the behaviour
+   is undefined.
+ - Every valid tagged pointer is expected to work as an untagged one.
+ - The kernel preserves any valid tagged pointer and returns it to the
+   userspace unchanged (i.e. on syscall return) in all the cases except the
+   ones documented in the "Preserving tags" section of tagged-pointers.txt.
+
+A definition of the meaning of tagged pointers on arm64 can be found in:
+Documentation/arm64/tagged-pointers.txt.
+
+3. ARM64 Tagged Address ABI Exceptions
+--------------------------------------
+
+The behaviours described in section 2, with particular reference to the
+acceptance by the syscalls of any valid tagged pointer are not applicable
+to the following cases:
+
+ - mmap() addr parameter.
+ - mremap() new_address parameter.
+ - prctl(PR_SET_MM, PR_SET_MM_MAP, ...) struct prctl_mm_map fields.
+ - prctl(PR_SET_MM, PR_SET_MM_MAP_SIZE, ...) struct prctl_mm_map fields.
All the PR_SET_MM options that specify pointers (PR_SET_MM_START_CODE, PR_SET_MM_END_CODE, ...) should be excluded as well. AFAICT (but don't take my word for it), that's all of them except PR_SET_MM_EXE_FILE. Conversely, PR_SET_MM_MAP_SIZE should not be excluded (it does not pass a prctl_mm_map struct, and the pointer to unsigned int can be tagged).
Kevin
+Any attempt to use non-zero tagged pointers will lead to undefined behaviour.
+
+4. Example of correct usage
+---------------------------
+.. code-block:: c
+
+  void main(void)
+  {
+          static int tbi_enabled = 0;
+          unsigned long tag = 0;
+
+          char *ptr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
+                           MAP_ANONYMOUS, -1, 0);
+
+          if (prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE,
+                    0, 0, 0) == 0)
+                  tbi_enabled = 1;
+
+          if (ptr == (void *)-1) /* MAP_FAILED */
+                  return -1;
+
+          if (tbi_enabled)
+                  tag = rand() & 0xff;
+
+          ptr = (char *)((unsigned long)ptr | (tag << TAG_SHIFT));
+
+          *ptr = 'a';
+
+          ...
+  }
Hi Kevin,
On 7/30/19 11:32 AM, Kevin Brodsky wrote:
Some more comments. Mostly minor wording issues, except the prctl() exclusion at the end.
On 25/07/2019 14:50, Vincenzo Frascino wrote:
diff --git a/Documentation/arm64/tagged-address-abi.rst b/Documentation/arm64/tagged-address-abi.rst
new file mode 100644
index 000000000000..a8ecb991de82
--- /dev/null
+++ b/Documentation/arm64/tagged-address-abi.rst
@@ -0,0 +1,148 @@
+========================
+ARM64 TAGGED ADDRESS ABI
+========================
+
+Author: Vincenzo Frascino vincenzo.frascino@arm.com
+
+Date: 25 July 2019
+
+This document describes the usage and semantics of the Tagged Address
+ABI on arm64.
+
+1. Introduction
+---------------
+
+On arm64 the TCR_EL1.TBI0 bit has always been enabled on the kernel, hence
+the userspace (EL0) is entitled to perform a user memory access through a
+64-bit pointer with a non-zero top byte but the resulting pointers are not
+allowed at the user-kernel syscall ABI boundary.
+
+This document describes a relaxation of the ABI that makes it possible to
+to pass tagged pointers to the syscalls, when these pointers are in memory
One too many "to" (at the end of the previous line).
Yep will fix in v7.
+ranges obtained as described in section 2.
+
+Since it is not desirable to relax the ABI to allow tagged user addresses
+into the kernel indiscriminately, arm64 provides a new sysctl interface
+(/proc/sys/abi/tagged_addr) that is used to prevent the applications from
+enabling the relaxed ABI and a new prctl() interface that can be used to
+enable or disable the relaxed ABI.
+A detailed description of the newly introduced mechanisms will be provided
+in section 2.
+
+2. ARM64 Tagged Address ABI
+---------------------------
+
+From the kernel syscall interface perspective, we define, for the purposes
+of this document, a "valid tagged pointer" as a pointer that either has a
+zero value set in the top byte or has a non-zero value, is in memory ranges
+privately owned by a userspace process and is obtained in one of the
+following ways:
+- mmap() done by the process itself, where either:
+
+  - flags have **MAP_PRIVATE** and **MAP_ANONYMOUS**
+  - flags have **MAP_PRIVATE** and the file descriptor refers to a regular
+    file or **/dev/zero**
+
+- brk() system call done by the process itself (i.e. the heap area between
+  the initial location of the program break at process creation and its
+  current location).
+- any memory mapped by the kernel in the process's address space during
+  creation and with the same restrictions as for mmap() (e.g. data, bss,
+  stack).
+
+The ARM64 Tagged Address ABI is an opt-in feature, and an application can
+control it using the following:
+
+- **/proc/sys/abi/tagged_addr**: a new sysctl interface that can be used to
+  prevent the applications from enabling the access to the relaxed ABI.
+  The sysctl supports the following configuration options:
+
+  - **0**: Disable the access to the ARM64 Tagged Address ABI for all
+    the applications.
+  - **1** (Default): Enable the access to the ARM64 Tagged Address ABI for
+    all the applications.
+
+  If the access to the ARM64 Tagged Address ABI is disabled at a certain
+  point in time, all the applications that were using tagging before this
+  event occurs, will continue to use tagging.
"tagging" may be misinterpreted here. I would be more explicit by saying that the tagged address ABI remains enabled in processes that opted in before the access got disabled.
Assuming that ARM64 Tagged Address ABI gives access to "tagging" and since it is what this document is talking about, I do not see how it can be misinterpreted ;)
+- **prctl()s**:
+
+  - **PR_SET_TAGGED_ADDR_CTRL**: Invoked by a process, can be used to enable or
+    disable its access to the ARM64 Tagged Address ABI.
I still find the wording confusing, because "access to the ABI" is not used consistently. The "tagged_addr" sysctl enables *access to the ABI*, that's fine. However, PR_SET_TAGGED_ADDR_CTRL enables *the ABI itself* (which is only possible if access to the ABI is enabled).
As it stands, it enables or disables the ABI itself when used with PR_TAGGED_ADDR_ENABLE, or can enable other things in future. IMHO the only thing that these features have in common is the access to the ABI which is granted by this prctl().
+    The (unsigned int) arg2 argument is a bit mask describing the control mode
+    used:
+
+    - **PR_TAGGED_ADDR_ENABLE**: Enable ARM64 Tagged Address ABI.
+
+    The prctl(PR_SET_TAGGED_ADDR_CTRL, ...) will return -EINVAL if the ARM64
+    Tagged Address ABI is not available.
For clarity, it would be good to mention that one possible reason for the ABI not to be available is tagged_addr == 0.
The logical implication is already quite clear tagged_addr == 0 (Disabled) => Tagged Address ABI not available => return -EINVAL. I do not see the need to repeat the concept twice.
+    The arguments arg3, arg4, and arg5 are ignored.
+  - **PR_GET_TAGGED_ADDR_CTRL**: can be used to check the status of the Tagged
+    Address ABI.
+
+    The arguments arg2, arg3, arg4, and arg5 are ignored.
+
+The ABI properties set by the mechanisms described above are inherited by threads
+of the same application and fork()'ed children but cleared by execve().
+
+When a process has successfully opted into the new ABI by invoking
+PR_SET_TAGGED_ADDR_CTRL prctl(), this guarantees the following behaviours:
+
+ - Every currently available syscall, except the cases mentioned in section 3, can
+   accept any valid tagged pointer. The same rule is applicable to any syscall
+   introduced in the future.
I thought Catalin wanted to drop this guarantee?
The guarantee is changed and explicitly includes the syscalls that can be added in the future. IMHO since we are defining an ABI, we cannot leave that topic in an uncharted territory, we need to address it.
+ - If a non valid tagged pointer is passed to a syscall then the behaviour
+   is undefined.
+ - Every valid tagged pointer is expected to work as an untagged one.
+ - The kernel preserves any valid tagged pointer and returns it to the
+   userspace unchanged (i.e. on syscall return) in all the cases except the
+   ones documented in the "Preserving tags" section of tagged-pointers.txt.
+
+A definition of the meaning of tagged pointers on arm64 can be found in:
+Documentation/arm64/tagged-pointers.txt.
+
+3. ARM64 Tagged Address ABI Exceptions
+--------------------------------------
+
+The behaviours described in section 2, with particular reference to the
+acceptance by the syscalls of any valid tagged pointer are not applicable
+to the following cases:
+
+ - mmap() addr parameter.
+ - mremap() new_address parameter.
+ - prctl(PR_SET_MM, PR_SET_MM_MAP, ...) struct prctl_mm_map fields.
+ - prctl(PR_SET_MM, PR_SET_MM_MAP_SIZE, ...) struct prctl_mm_map fields.
All the PR_SET_MM options that specify pointers (PR_SET_MM_START_CODE, PR_SET_MM_END_CODE, ...) should be excluded as well. AFAICT (but don't take my word for it), that's all of them except PR_SET_MM_EXE_FILE. Conversely, PR_SET_MM_MAP_SIZE should not be excluded (it does not pass a prctl_mm_map struct, and the pointer to unsigned int can be tagged).
Agreed, I clearly misread the prctl() man page here. Will fix in v7. PR_SET_MM_MAP_SIZE _returns_ struct prctl_mm_map, does not take it as a parameter.
Vincenzo
Kevin
+Any attempt to use non-zero tagged pointers will lead to undefined behaviour.
+4. Example of correct usage +--------------------------- +.. code-block:: c
+  int main(void)
+  {
+          static int tbi_enabled = 0;
+          unsigned long tag = 0;
+
+          /* MAP_PRIVATE is needed: MAP_ANONYMOUS alone is rejected by mmap() */
+          char *ptr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
+                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+
+          /* opt in to the tagged address ABI */
+          if (prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE,
+                    0, 0, 0) == 0)
+                  tbi_enabled = 1;
+
+          if (ptr == (void *)-1) /* MAP_FAILED */
+                  return -1;
+
+          /* only set a non-zero tag if the ABI was enabled */
+          if (tbi_enabled)
+                  tag = rand() & 0xff;
+
+          ptr = (char *)((unsigned long)ptr | (tag << TAG_SHIFT));
+
+          *ptr = 'a';
+
+          ...
+  }
linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On 30/07/2019 14:25, Vincenzo Frascino wrote:
Hi Kevin,
On 7/30/19 11:32 AM, Kevin Brodsky wrote:
Some more comments. Mostly minor wording issues, except the prctl() exclusion at the end.
On 25/07/2019 14:50, Vincenzo Frascino wrote:
On arm64 the TCR_EL1.TBI0 bit has been always enabled hence the userspace (EL0) is allowed to set a non-zero value in the top byte but the resulting pointers are not allowed at the user-kernel syscall ABI boundary.
With the relaxed ABI proposed through this document, it is now possible to pass tagged pointers to the syscalls, when these pointers are in memory ranges obtained by an anonymous (MAP_ANONYMOUS) mmap().
This change in the ABI requires a mechanism that requires the userspace to opt in to such an option.
Specify and document the way in which sysctl and prctl() can be used in combination to allow the userspace to opt in to this feature.
Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com CC: Andrey Konovalov andreyknvl@google.com Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com Acked-by: Szabolcs Nagy szabolcs.nagy@arm.com
Documentation/arm64/tagged-address-abi.rst | 148 +++++++++++++++++++++ 1 file changed, 148 insertions(+) create mode 100644 Documentation/arm64/tagged-address-abi.rst
diff --git a/Documentation/arm64/tagged-address-abi.rst b/Documentation/arm64/tagged-address-abi.rst new file mode 100644 index 000000000000..a8ecb991de82 --- /dev/null +++ b/Documentation/arm64/tagged-address-abi.rst @@ -0,0 +1,148 @@ +======================== +ARM64 TAGGED ADDRESS ABI +========================
+Author: Vincenzo Frascino vincenzo.frascino@arm.com
+Date: 25 July 2019
+This document describes the usage and semantics of the Tagged Address +ABI on arm64.
+1. Introduction +---------------
+On arm64 the TCR_EL1.TBI0 bit has always been enabled on the kernel, hence +the userspace (EL0) is entitled to perform a user memory access through a +64-bit pointer with a non-zero top byte but the resulting pointers are not +allowed at the user-kernel syscall ABI boundary.
+This document describes a relaxation of the ABI that makes it possible to +to pass tagged pointers to the syscalls, when these pointers are in memory
One too many "to" (at the end of the previous line).
Yep will fix in v7.
+ranges obtained as described in section 2.
+Since it is not desirable to relax the ABI to allow tagged user addresses +into the kernel indiscriminately, arm64 provides a new sysctl interface +(/proc/sys/abi/tagged_addr) that is used to prevent the applications from +enabling the relaxed ABI and a new prctl() interface that can be used to +enable or disable the relaxed ABI. +A detailed description of the newly introduced mechanisms will be provided +in section 2.
+2. ARM64 Tagged Address ABI +---------------------------
+From the kernel syscall interface perspective, we define, for the purposes +of this document, a "valid tagged pointer" as a pointer that either has a +zero value set in the top byte or has a non-zero value, is in memory ranges +privately owned by a userspace process and is obtained in one of the +following ways: +- mmap() done by the process itself, where either:
+ - flags have **MAP_PRIVATE** and **MAP_ANONYMOUS** + - flags have **MAP_PRIVATE** and the file descriptor refers to a regular +   file or **/dev/zero**
+- brk() system call done by the process itself (i.e. the heap area between + the initial location of the program break at process creation and its + current location). +- any memory mapped by the kernel in the process's address space during + creation and with the same restrictions as for mmap() (e.g. data, bss, + stack).
+The ARM64 Tagged Address ABI is an opt-in feature, and an application can +control it using the following:
+- **/proc/sys/abi/tagged_addr**: a new sysctl interface that can be used to + prevent the applications from enabling the access to the relaxed ABI. + The sysctl supports the following configuration options:
+ - **0**: Disable the access to the ARM64 Tagged Address ABI for all +   the applications. + - **1** (Default): Enable the access to the ARM64 Tagged Address ABI for +   all the applications.
+  If the access to the ARM64 Tagged Address ABI is disabled at a certain +  point in time, all the applications that were using tagging before this +  event occurs, will continue to use tagging.
"tagging" may be misinterpreted here. I would be more explicit by saying that the tagged address ABI remains enabled in processes that opted in before the access got disabled.
Assuming that ARM64 Tagged Address ABI gives access to "tagging" and since it is what this document is talking about, I do not see how it can be misinterpreted ;)
"tagging" is a confusing term ("using tagging" even more so), it could be interpreted as memory tagging (especially in the presence of MTE). This document does not use "tagging" anywhere else, which is good. Let's stick to the same name for the ABI throughout the document, repetition is less problematic than vague wording.
+- **prctl()s**:
+ - **PR_SET_TAGGED_ADDR_CTRL**: Invoked by a process, can be used to enable or +   disable its access to the ARM64 Tagged Address ABI.
I still find the wording confusing, because "access to the ABI" is not used consistently. The "tagged_addr" sysctl enables *access to the ABI*, that's fine. However, PR_SET_TAGGED_ADDR_CTRL enables *the ABI itself* (which is only possible if access to the ABI is enabled).
As it stands, it enables or disables the ABI itself when used with PR_TAGGED_ADDR_ENABLE, or can enable other things in future. IMHO the only thing that these features have in common is the access to the ABI which is granted by this prctl().
I see your point, you could have other bits controlling other aspects. However, I would really avoid saying that this prctl is used to enable or disable access to the new ABI, because it isn't (either you have access to the new ABI and this prctl can be used, or you don't and this prctl will fail).
+   The (unsigned int) arg2 argument is a bit mask describing the control mode +   used:
+   - **PR_TAGGED_ADDR_ENABLE**: Enable ARM64 Tagged Address ABI.
+   The prctl(PR_SET_TAGGED_ADDR_CTRL, ...) will return -EINVAL if the ARM64 +   Tagged Address ABI is not available.
For clarity, it would be good to mention that one possible reason for the ABI not to be available is tagged_addr == 0.
The logical implication is already quite clear tagged_addr == 0 (Disabled) => Tagged Address ABI not available => return -EINVAL. I do not see the need to repeat the concept twice.
+   The arguments arg3, arg4, and arg5 are ignored. + - **PR_GET_TAGGED_ADDR_CTRL**: can be used to check the status of the Tagged +   Address ABI.
+   The arguments arg2, arg3, arg4, and arg5 are ignored.
+The ABI properties set by the mechanisms described above are inherited by threads +of the same application and fork()'ed children but cleared by execve().
+When a process has successfully opted into the new ABI by invoking +PR_SET_TAGGED_ADDR_CTRL prctl(), this guarantees the following behaviours:
- Every currently available syscall, except the cases mentioned in section 3, can +  accept any valid tagged pointer. The same rule is applicable to any syscall +  introduced in the future.
I thought Catalin wanted to drop this guarantee?
The guarantee is changed and explicitly includes the syscalls that can be added in the future. IMHO since we are defining an ABI, we cannot leave that topic in an uncharted territory, we need to address it.
It makes sense to me, just wanted to be sure that Catalin is on the same page.
- If a non valid tagged pointer is passed to a syscall then the behaviour +  is undefined.
- Every valid tagged pointer is expected to work as an untagged one.
- The kernel preserves any valid tagged pointer and returns it to the +  userspace unchanged (i.e. on syscall return) in all the cases except the +  ones documented in the "Preserving tags" section of tagged-pointers.txt.
+A definition of the meaning of tagged pointers on arm64 can be found in: +Documentation/arm64/tagged-pointers.txt.
+3. ARM64 Tagged Address ABI Exceptions +--------------------------------------
+The behaviours described in section 2, with particular reference to the +acceptance by the syscalls of any valid tagged pointer are not applicable +to the following cases:
- mmap() addr parameter.
- mremap() new_address parameter.
- prctl(PR_SET_MM, PR_SET_MM_MAP, ...) struct prctl_mm_map fields.
- prctl(PR_SET_MM, PR_SET_MM_MAP_SIZE, ...) struct prctl_mm_map fields.
All the PR_SET_MM options that specify pointers (PR_SET_MM_START_CODE, PR_SET_MM_END_CODE, ...) should be excluded as well. AFAICT (but don't take my word for it), that's all of them except PR_SET_MM_EXE_FILE. Conversely, PR_SET_MM_MAP_SIZE should not be excluded (it does not pass a prctl_mm_map struct, and the pointer to unsigned int can be tagged).
Agreed, I clearly misread the prctl() man page here. Will fix in v7. PR_SET_MM_MAP_SIZE _returns_ struct prctl_mm_map, does not take it as a parameter.
OK. About PR_SET_MM_MAP_SIZE, it neither takes nor returns struct prctl_mm_map. It writes the size of struct prctl_mm_map to the int pointed to by arg3, and does nothing else. Therefore, there's no need to exclude it.
BTW I've just realised that the man page is wrong about PR_SET_MM_MAP_SIZE, the pointer to int is passed in arg3, not arg4. Does anyone know where to report that?
Thanks, Kevin
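To make the PR_SET_MM_MAP_SIZE point concrete, here is a minimal sketch of the call as described above: the pointer to an unsigned int goes in arg3, and arg4/arg5 are zero. The helper name query_mm_map_size() is made up for illustration. The call may well fail (for instance with EINVAL) on kernels or sandboxes that do not provide this option, so the sketch only reports the result.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/prctl.h>

/*
 * Hypothetical helper: ask the kernel for the size of struct prctl_mm_map.
 * Returns 0 and stores the size in *size on success, or -1 with errno set
 * if the running kernel does not support the request.
 */
static int query_mm_map_size(unsigned int *size)
{
	/* The pointer is passed in arg3; arg4 and arg5 are left as zero. */
	return prctl(PR_SET_MM, PR_SET_MM_MAP_SIZE, (unsigned long)size,
		     0UL, 0UL);
}
```

A caller would declare an unsigned int, pass its address, and print the value only when the helper returns 0. No struct prctl_mm_map is passed in either direction, which is the reason given above for not excluding this option.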
Vincenzo
Kevin
+Any attempt to use non-zero tagged pointers will lead to undefined behaviour.
+4. Example of correct usage +--------------------------- +.. code-block:: c
+  int main(void)
+  {
+          static int tbi_enabled = 0;
+          unsigned long tag = 0;
+
+          /* MAP_PRIVATE is needed: MAP_ANONYMOUS alone is rejected by mmap() */
+          char *ptr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
+                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+
+          /* opt in to the tagged address ABI */
+          if (prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE,
+                    0, 0, 0) == 0)
+                  tbi_enabled = 1;
+
+          if (ptr == (void *)-1) /* MAP_FAILED */
+                  return -1;
+
+          /* only set a non-zero tag if the ABI was enabled */
+          if (tbi_enabled)
+                  tag = rand() & 0xff;
+
+          ptr = (char *)((unsigned long)ptr | (tag << TAG_SHIFT));
+
+          *ptr = 'a';
+
+          ...
+  }
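The example in the patch uses TAG_SHIFT without defining it. As a rough sketch, the tag arithmetic could be wrapped in small helpers like the ones below. This assumes the tag lives in bits 63:56 (the top byte that Top Byte Ignore skips), so TAG_SHIFT is taken to be 56. The helper names set_tag(), get_tag() and clear_tag() are invented here and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the tag occupies the top byte (bits 63:56) of a 64-bit pointer. */
#define TAG_SHIFT 56
#define TAG_MASK  ((uint64_t)0xff << TAG_SHIFT)

_Static_assert(sizeof(void *) == 8, "sketch assumes 64-bit pointers");

/* Return ptr with its top byte replaced by tag. */
static inline void *set_tag(void *ptr, uint8_t tag)
{
	uint64_t addr = (uint64_t)(uintptr_t)ptr & ~TAG_MASK;

	return (void *)(uintptr_t)(addr | ((uint64_t)tag << TAG_SHIFT));
}

/* Extract the top byte of ptr. */
static inline uint8_t get_tag(const void *ptr)
{
	return (uint8_t)((uint64_t)(uintptr_t)ptr >> TAG_SHIFT);
}

/* Return ptr with a zero top byte. */
static inline void *clear_tag(void *ptr)
{
	return set_tag(ptr, 0);
}
```

Unlike the plain OR in the example, set_tag() clears any previous tag first, so it can also be used to re-tag a pointer. Dereferencing the result with a non-zero tag is only meaningful on arm64.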
Hi Kevin,
On 7/30/19 2:57 PM, Kevin Brodsky wrote:
On 30/07/2019 14:25, Vincenzo Frascino wrote:
Hi Kevin,
On 7/30/19 11:32 AM, Kevin Brodsky wrote:
Some more comments. Mostly minor wording issues, except the prctl() exclusion at the end.
On 25/07/2019 14:50, Vincenzo Frascino wrote:
On arm64 the TCR_EL1.TBI0 bit has been always enabled hence the userspace (EL0) is allowed to set a non-zero value in the top byte but the resulting pointers are not allowed at the user-kernel syscall ABI boundary.
With the relaxed ABI proposed through this document, it is now possible to pass tagged pointers to the syscalls, when these pointers are in memory ranges obtained by an anonymous (MAP_ANONYMOUS) mmap().
This change in the ABI requires a mechanism that requires the userspace to opt in to such an option.
Specify and document the way in which sysctl and prctl() can be used in combination to allow the userspace to opt in to this feature.
Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com CC: Andrey Konovalov andreyknvl@google.com Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com Acked-by: Szabolcs Nagy szabolcs.nagy@arm.com
Documentation/arm64/tagged-address-abi.rst | 148 +++++++++++++++++++++ 1 file changed, 148 insertions(+) create mode 100644 Documentation/arm64/tagged-address-abi.rst
diff --git a/Documentation/arm64/tagged-address-abi.rst b/Documentation/arm64/tagged-address-abi.rst new file mode 100644 index 000000000000..a8ecb991de82 --- /dev/null +++ b/Documentation/arm64/tagged-address-abi.rst @@ -0,0 +1,148 @@ +======================== +ARM64 TAGGED ADDRESS ABI +========================
+Author: Vincenzo Frascino vincenzo.frascino@arm.com
+Date: 25 July 2019
+This document describes the usage and semantics of the Tagged Address +ABI on arm64.
+1. Introduction +---------------
+On arm64 the TCR_EL1.TBI0 bit has always been enabled on the kernel, hence +the userspace (EL0) is entitled to perform a user memory access through a +64-bit pointer with a non-zero top byte but the resulting pointers are not +allowed at the user-kernel syscall ABI boundary.
+This document describes a relaxation of the ABI that makes it possible to +to pass tagged pointers to the syscalls, when these pointers are in memory
One too many "to" (at the end of the previous line).
Yep will fix in v7.
+ranges obtained as described in section 2.
+Since it is not desirable to relax the ABI to allow tagged user addresses +into the kernel indiscriminately, arm64 provides a new sysctl interface +(/proc/sys/abi/tagged_addr) that is used to prevent the applications from +enabling the relaxed ABI and a new prctl() interface that can be used to +enable or disable the relaxed ABI. +A detailed description of the newly introduced mechanisms will be provided +in section 2.
+2. ARM64 Tagged Address ABI +---------------------------
+From the kernel syscall interface perspective, we define, for the purposes +of this document, a "valid tagged pointer" as a pointer that either has a +zero value set in the top byte or has a non-zero value, is in memory ranges +privately owned by a userspace process and is obtained in one of the +following ways: +- mmap() done by the process itself, where either:
+ - flags have **MAP_PRIVATE** and **MAP_ANONYMOUS** + - flags have **MAP_PRIVATE** and the file descriptor refers to a regular +   file or **/dev/zero**
+- brk() system call done by the process itself (i.e. the heap area between + the initial location of the program break at process creation and its + current location). +- any memory mapped by the kernel in the process's address space during + creation and with the same restrictions as for mmap() (e.g. data, bss, + stack).
+The ARM64 Tagged Address ABI is an opt-in feature, and an application can +control it using the following:
+- **/proc/sys/abi/tagged_addr**: a new sysctl interface that can be used to + prevent the applications from enabling the access to the relaxed ABI. + The sysctl supports the following configuration options:
+ - **0**: Disable the access to the ARM64 Tagged Address ABI for all +   the applications. + - **1** (Default): Enable the access to the ARM64 Tagged Address ABI for +   all the applications.
+  If the access to the ARM64 Tagged Address ABI is disabled at a certain +  point in time, all the applications that were using tagging before this +  event occurs, will continue to use tagging.
"tagging" may be misinterpreted here. I would be more explicit by saying that the tagged address ABI remains enabled in processes that opted in before the access got disabled.
Assuming that ARM64 Tagged Address ABI gives access to "tagging" and since it is what this document is talking about, I do not see how it can be misinterpreted ;)
"tagging" is a confusing term ("using tagging" even more so), it could be interpreted as memory tagging (especially in the presence of MTE). This document does not use "tagging" anywhere else, which is good. Let's stick to the same name for the ABI throughout the document, repetition is less problematic than vague wording.
This document does not cover MTE, it covers the "ARM64 Tagged Address ABI" hence "tagging" has a precise semantical meaning in this context. Still I do not see how it can be confused.
+- **prctl()s**:
+ - **PR_SET_TAGGED_ADDR_CTRL**: Invoked by a process, can be used to enable or +   disable its access to the ARM64 Tagged Address ABI.
I still find the wording confusing, because "access to the ABI" is not used consistently. The "tagged_addr" sysctl enables *access to the ABI*, that's fine. However, PR_SET_TAGGED_ADDR_CTRL enables *the ABI itself* (which is only possible if access to the ABI is enabled).
As it stands, it enables or disables the ABI itself when used with PR_TAGGED_ADDR_ENABLE, or can enable other things in future. IMHO the only thing that these features have in common is the access to the ABI which is granted by this prctl().
I see your point, you could have other bits controlling other aspects. However, I would really avoid saying that this prctl is used to enable or disable access to the new ABI, because it isn't (either you have access to the new ABI and this prctl can be used, or you don't and this prctl will fail).
What is the system wide evidence that the access to the ABI is denied? Or what is the system wide evidence that it is granted?
In other words, is it enough for a process to have the sysctl set (system wide) to know that the ABI is enabled and that it has been granted access to it? Or does it need to do something else?
+   The (unsigned int) arg2 argument is a bit mask describing the control mode +   used:
+   - **PR_TAGGED_ADDR_ENABLE**: Enable ARM64 Tagged Address ABI.
+   The prctl(PR_SET_TAGGED_ADDR_CTRL, ...) will return -EINVAL if the ARM64 +   Tagged Address ABI is not available.
For clarity, it would be good to mention that one possible reason for the ABI not to be available is tagged_addr == 0.
The logical implication is already quite clear tagged_addr == 0 (Disabled) => Tagged Address ABI not available => return -EINVAL. I do not see the need to repeat the concept twice.
+   The arguments arg3, arg4, and arg5 are ignored. + - **PR_GET_TAGGED_ADDR_CTRL**: can be used to check the status of the Tagged +   Address ABI.
+   The arguments arg2, arg3, arg4, and arg5 are ignored.
+The ABI properties set by the mechanisms described above are inherited by threads +of the same application and fork()'ed children but cleared by execve().
+When a process has successfully opted into the new ABI by invoking +PR_SET_TAGGED_ADDR_CTRL prctl(), this guarantees the following behaviours:
- Every currently available syscall, except the cases mentioned in section 3, can +  accept any valid tagged pointer. The same rule is applicable to any syscall +  introduced in the future.
I thought Catalin wanted to drop this guarantee?
The guarantee is changed and explicitly includes the syscalls that can be added in the future. IMHO since we are defining an ABI, we cannot leave that topic in an uncharted territory, we need to address it.
It makes sense to me, just wanted to be sure that Catalin is on the same page.
- If a non valid tagged pointer is passed to a syscall then the behaviour +  is undefined.
- Every valid tagged pointer is expected to work as an untagged one.
- The kernel preserves any valid tagged pointer and returns it to the +  userspace unchanged (i.e. on syscall return) in all the cases except the +  ones documented in the "Preserving tags" section of tagged-pointers.txt.
+A definition of the meaning of tagged pointers on arm64 can be found in: +Documentation/arm64/tagged-pointers.txt.
+3. ARM64 Tagged Address ABI Exceptions +--------------------------------------
+The behaviours described in section 2, with particular reference to the +acceptance by the syscalls of any valid tagged pointer are not applicable +to the following cases:
- mmap() addr parameter.
- mremap() new_address parameter.
- prctl(PR_SET_MM, PR_SET_MM_MAP, ...) struct prctl_mm_map fields.
- prctl(PR_SET_MM, PR_SET_MM_MAP_SIZE, ...) struct prctl_mm_map fields.
All the PR_SET_MM options that specify pointers (PR_SET_MM_START_CODE, PR_SET_MM_END_CODE, ...) should be excluded as well. AFAICT (but don't take my word for it), that's all of them except PR_SET_MM_EXE_FILE. Conversely, PR_SET_MM_MAP_SIZE should not be excluded (it does not pass a prctl_mm_map struct, and the pointer to unsigned int can be tagged).
Agreed, I clearly misread the prctl() man page here. Will fix in v7. PR_SET_MM_MAP_SIZE _returns_ struct prctl_mm_map, does not take it as a parameter.
OK. About PR_SET_MM_MAP_SIZE, it neither takes nor returns struct prctl_mm_map. It writes the size of struct prctl_mm_map to the int pointed to by arg3, and does nothing else. Therefore, there's no need to exclude it.
Agreed, I missed the word size in my reply: s/_returns_ struct prctl_mm_map/_returns_ the size of struct prctl_mm_map/
BTW I've just realised that the man page is wrong about PR_SET_MM_MAP_SIZE, the pointer to int is passed in arg3, not arg4. Does anyone know where to report that?
Thanks, Kevin
Vincenzo
Kevin
+Any attempt to use non-zero tagged pointers will lead to undefined behaviour.
+4. Example of correct usage +--------------------------- +.. code-block:: c
+  int main(void)
+  {
+          static int tbi_enabled = 0;
+          unsigned long tag = 0;
+
+          /* MAP_PRIVATE is needed: MAP_ANONYMOUS alone is rejected by mmap() */
+          char *ptr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
+                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+
+          /* opt in to the tagged address ABI */
+          if (prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE,
+                    0, 0, 0) == 0)
+                  tbi_enabled = 1;
+
+          if (ptr == (void *)-1) /* MAP_FAILED */
+                  return -1;
+
+          /* only set a non-zero tag if the ABI was enabled */
+          if (tbi_enabled)
+                  tag = rand() & 0xff;
+
+          ptr = (char *)((unsigned long)ptr | (tag << TAG_SHIFT));
+
+          *ptr = 'a';
+
+          ...
+  }
On 30/07/2019 15:24, Vincenzo Frascino wrote:
Hi Kevin,
On 7/30/19 2:57 PM, Kevin Brodsky wrote:
On 30/07/2019 14:25, Vincenzo Frascino wrote:
Hi Kevin,
On 7/30/19 11:32 AM, Kevin Brodsky wrote:
Some more comments. Mostly minor wording issues, except the prctl() exclusion at the end.
On 25/07/2019 14:50, Vincenzo Frascino wrote:
On arm64 the TCR_EL1.TBI0 bit has been always enabled hence the userspace (EL0) is allowed to set a non-zero value in the top byte but the resulting pointers are not allowed at the user-kernel syscall ABI boundary.
With the relaxed ABI proposed through this document, it is now possible to pass tagged pointers to the syscalls, when these pointers are in memory ranges obtained by an anonymous (MAP_ANONYMOUS) mmap().
This change in the ABI requires a mechanism that requires the userspace to opt in to such an option.
Specify and document the way in which sysctl and prctl() can be used in combination to allow the userspace to opt in to this feature.
Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com CC: Andrey Konovalov andreyknvl@google.com Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com Acked-by: Szabolcs Nagy szabolcs.nagy@arm.com
Documentation/arm64/tagged-address-abi.rst | 148 +++++++++++++++++++++ 1 file changed, 148 insertions(+) create mode 100644 Documentation/arm64/tagged-address-abi.rst
diff --git a/Documentation/arm64/tagged-address-abi.rst b/Documentation/arm64/tagged-address-abi.rst new file mode 100644 index 000000000000..a8ecb991de82 --- /dev/null +++ b/Documentation/arm64/tagged-address-abi.rst @@ -0,0 +1,148 @@ +======================== +ARM64 TAGGED ADDRESS ABI +========================
+Author: Vincenzo Frascino vincenzo.frascino@arm.com
+Date: 25 July 2019
+This document describes the usage and semantics of the Tagged Address +ABI on arm64.
+1. Introduction +---------------
+On arm64 the TCR_EL1.TBI0 bit has always been enabled on the kernel, hence +the userspace (EL0) is entitled to perform a user memory access through a +64-bit pointer with a non-zero top byte but the resulting pointers are not +allowed at the user-kernel syscall ABI boundary.
+This document describes a relaxation of the ABI that makes it possible to +to pass tagged pointers to the syscalls, when these pointers are in memory
One too many "to" (at the end of the previous line).
Yep will fix in v7.
+ranges obtained as described in section 2.
+Since it is not desirable to relax the ABI to allow tagged user addresses +into the kernel indiscriminately, arm64 provides a new sysctl interface +(/proc/sys/abi/tagged_addr) that is used to prevent the applications from +enabling the relaxed ABI and a new prctl() interface that can be used to +enable or disable the relaxed ABI. +A detailed description of the newly introduced mechanisms will be provided +in section 2.
+2. ARM64 Tagged Address ABI +---------------------------
+From the kernel syscall interface perspective, we define, for the purposes +of this document, a "valid tagged pointer" as a pointer that either has a +zero value set in the top byte or has a non-zero value, is in memory ranges +privately owned by a userspace process and is obtained in one of the +following ways: +- mmap() done by the process itself, where either:
+ - flags have **MAP_PRIVATE** and **MAP_ANONYMOUS** + - flags have **MAP_PRIVATE** and the file descriptor refers to a regular +   file or **/dev/zero**
+- brk() system call done by the process itself (i.e. the heap area between + the initial location of the program break at process creation and its + current location). +- any memory mapped by the kernel in the process's address space during + creation and with the same restrictions as for mmap() (e.g. data, bss, + stack).
+The ARM64 Tagged Address ABI is an opt-in feature, and an application can +control it using the following:
+- **/proc/sys/abi/tagged_addr**: a new sysctl interface that can be used to + prevent the applications from enabling the access to the relaxed ABI. + The sysctl supports the following configuration options:
+ - **0**: Disable the access to the ARM64 Tagged Address ABI for all +   the applications. + - **1** (Default): Enable the access to the ARM64 Tagged Address ABI for +   all the applications.
+  If the access to the ARM64 Tagged Address ABI is disabled at a certain +  point in time, all the applications that were using tagging before this +  event occurs, will continue to use tagging.
"tagging" may be misinterpreted here. I would be more explicit by saying that the tagged address ABI remains enabled in processes that opted in before the access got disabled.
Assuming that ARM64 Tagged Address ABI gives access to "tagging" and since it is what this document is talking about, I do not see how it can be misinterpreted ;)
"tagging" is a confusing term ("using tagging" even more so), it could be interpreted as memory tagging (especially in the presence of MTE). This document does not use "tagging" anywhere else, which is good. Let's stick to the same name for the ABI throughout the document, repetition is less problematic than vague wording.
This document does not cover MTE, it covers the "ARM64 Tagged Address ABI" hence "tagging" has a precise semantical meaning in this context. Still I do not see how it can be confused.
+- **prctl()s**:
+ - **PR_SET_TAGGED_ADDR_CTRL**: Invoked by a process, can be used to enable or +   disable its access to the ARM64 Tagged Address ABI.
I still find the wording confusing, because "access to the ABI" is not used consistently. The "tagged_addr" sysctl enables *access to the ABI*, that's fine. However, PR_SET_TAGGED_ADDR_CTRL enables *the ABI itself* (which is only possible if access to the ABI is enabled).
As it stands, it enables or disables the ABI itself when used with PR_TAGGED_ADDR_ENABLE, or can enable other things in future. IMHO the only thing that these features have in common is the access to the ABI which is granted by this prctl().
I see your point, you could have other bits controlling other aspects. However, I would really avoid saying that this prctl is used to enable or disable access to the new ABI, because it isn't (either you have access to the new ABI and this prctl can be used, or you don't and this prctl will fail).
What is the system wide evidence that the access to the ABI is denied? Or what is the system wide evidence that it is granted?
In other words, is it enough for a process to have the sysctl set (system wide) to know that the ABI is enabled and that it has been granted access to it? Or does it need to do something else?
I think we really have a wording problem here, which is why this part of the document and this discussion are confusing.
tagged_addr=1 (system-wide) allows processes to enable the tagged address ABI by calling prctl(PR_SET_TAGGED_ADDR_CTRL). It does not alter the state of any running process, and does not enable the ABI by default for new processes either. Conversely, when tagged_addr=0, that prctl() is always denied.
The current description of the sysctl and prctl does not make that clear. I think that it would be much more obvious if that section were reorganised as follows:
- prctl() first; the current wording is fine.
- sysctl second, described *only* in terms of the prctl() (denying PR_SET_TAGGED_ADDR_CTRL or not), and nothing else, to avoid wording issues.
It's certainly not the only way to do it, but that would be much clearer to me :)
Kevin
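To illustrate the split described above (the sysctl only decides whether the prctl() is allowed, and the prctl() is what actually enables the ABI for the calling process), here is a minimal sketch. The sysctl path is the one named in this version of the patch. The fallback #define values are an assumption taken from later mainline headers, and the helper names are invented for this example. On a kernel without the feature both helpers simply report failure.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/prctl.h>

/* Assumed fallback values, for headers that predate these definitions. */
#ifndef PR_SET_TAGGED_ADDR_CTRL
#define PR_SET_TAGGED_ADDR_CTRL 55
#endif
#ifndef PR_TAGGED_ADDR_ENABLE
#define PR_TAGGED_ADDR_ENABLE (1UL << 0)
#endif

/*
 * Read the system-wide switch. Returns the value found in the file, or -1
 * if the file is missing or unreadable. The value says nothing about the
 * state of the calling process.
 */
static int read_tagged_addr_sysctl(void)
{
	FILE *f = fopen("/proc/sys/abi/tagged_addr", "r");
	int value = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}

/*
 * Try to enable the tagged address ABI for the calling process.
 * Returns 1 on success, 0 if the kernel refused (errno is set; EINVAL is
 * what the patch documents when the ABI is not available).
 */
static int enable_tagged_addr_abi(void)
{
	if (prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE,
		  0UL, 0UL, 0UL) != 0)
		return 0;
	return 1;
}
```

With this split, a process never needs to read the sysctl at all: it calls enable_tagged_addr_abi() and only passes tagged pointers to syscalls if that returned 1.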
+   The (unsigned int) arg2 argument is a bit mask describing the control mode +   used:
+   - **PR_TAGGED_ADDR_ENABLE**: Enable ARM64 Tagged Address ABI.
+   The prctl(PR_SET_TAGGED_ADDR_CTRL, ...) will return -EINVAL if the ARM64 +   Tagged Address ABI is not available.
For clarity, it would be good to mention that one possible reason for the ABI not to be available is tagged_addr == 0.
The logical implication is already quite clear tagged_addr == 0 (Disabled) => Tagged Address ABI not available => return -EINVAL. I do not see the need to repeat the concept twice.
+   The arguments arg3, arg4, and arg5 are ignored.
+ - **PR_GET_TAGGED_ADDR_CTRL**: can be used to check the status of the Tagged
+   Address ABI.
+   The arguments arg2, arg3, arg4, and arg5 are ignored.
+
+The ABI properties set by the mechanisms described above are inherited by threads
+of the same application and fork()'ed children but cleared by execve().
+
+When a process has successfully opted into the new ABI by invoking
+PR_SET_TAGGED_ADDR_CTRL prctl(), this guarantees the following behaviours:
+
+ - Every currently available syscall, except the cases mentioned in section 3, can
+   accept any valid tagged pointer. The same rule is applicable to any syscall
+   introduced in the future.
I thought Catalin wanted to drop this guarantee?
The guarantee is changed and explicitly includes the syscalls that can be added in the future. IMHO since we are defining an ABI, we cannot leave that topic in an uncharted territory, we need to address it.
It makes sense to me, just wanted to be sure that Catalin is on the same page.
+ - If a non valid tagged pointer is passed to a syscall then the behaviour
+   is undefined.
+ - Every valid tagged pointer is expected to work as an untagged one.
+ - The kernel preserves any valid tagged pointer and returns it to the
+   userspace unchanged (i.e. on syscall return) in all the cases except the
+   ones documented in the "Preserving tags" section of tagged-pointers.txt.
+
+A definition of the meaning of tagged pointers on arm64 can be found in:
+Documentation/arm64/tagged-pointers.txt.
+3. ARM64 Tagged Address ABI Exceptions
+--------------------------------------
+
+The behaviours described in section 2, with particular reference to the
+acceptance by the syscalls of any valid tagged pointer are not applicable
+to the following cases:
+
+ - mmap() addr parameter.
+ - mremap() new_address parameter.
+ - prctl(PR_SET_MM, PR_SET_MM_MAP, ...) struct prctl_mm_map fields.
+ - prctl(PR_SET_MM, PR_SET_MM_MAP_SIZE, ...) struct prctl_mm_map fields.
All the PR_SET_MM options that specify pointers (PR_SET_MM_START_CODE, PR_SET_MM_END_CODE, ...) should be excluded as well. AFAICT (but don't take my word for it), that's all of them except PR_SET_MM_EXE_FILE. Conversely, PR_SET_MM_MAP_SIZE should not be excluded (it does not pass a prctl_mm_map struct, and the pointer to unsigned int can be tagged).
Agreed, I clearly misread the prctl() man page here. Will fix in v7. PR_SET_MM_MAP_SIZE _returns_ struct prctl_mm_map, does not take it as a parameter.
OK. About PR_SET_MM_MAP_SIZE, it neither takes nor returns struct prctl_mm_map. It writes the size of prctl_map to the int pointed to by arg3, and does nothing else. Therefore, there's no need to exclude it.
Agreed, I missed the word size in my reply: s/_returns_ struct prctl_mm_map/_returns_ the size of struct prctl_mm_map/
BTW I've just realised that the man page is wrong about PR_SET_MM_MAP_SIZE: the pointer to int is passed in arg3, not arg4. Does anyone know where to report that?
Thanks, Kevin
Vincenzo
Kevin
+Any attempt to use non-zero tagged pointers will lead to undefined behaviour.
+4. Example of correct usage
+---------------------------
+.. code-block:: c
+
+  void main(void)
+  {
+          static int tbi_enabled = 0;
+          unsigned long tag = 0;
+
+          char *ptr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
+                           MAP_ANONYMOUS, -1, 0);
+
+          if (prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE,
+                    0, 0, 0) == 0)
+                  tbi_enabled = 1;
+
+          if (ptr == (void *)-1) /* MAP_FAILED */
+                  return -1;
+
+          if (tbi_enabled)
+                  tag = rand() & 0xff;
+
+          ptr = (char *)((unsigned long)ptr | (tag << TAG_SHIFT));
+
+          *ptr = 'a';
+
+          ...
+  }
On 7/25/19 6:50 AM, Vincenzo Frascino wrote:
With the relaxed ABI proposed through this document, it is now possible to pass tagged pointers to the syscalls, when these pointers are in memory ranges obtained by an anonymous (MAP_ANONYMOUS) mmap().
I don't see a lot of description of why this restriction is necessary. What's the problem with supporting MAP_SHARED?
On 7/23/19 10:58 AM, Andrey Konovalov wrote:
+long set_tagged_addr_ctrl(unsigned long arg)
+{
+	if (!tagged_addr_prctl_allowed)
+		return -EINVAL;
+	if (is_compat_task())
+		return -EINVAL;
+	if (arg & ~PR_TAGGED_ADDR_ENABLE)
+		return -EINVAL;
+
+	update_thread_flag(TIF_TAGGED_ADDR, arg & PR_TAGGED_ADDR_ENABLE);
+
+	return 0;
+}
Instead of a plain enable/disable, a more flexible ABI would be to have the tag mask be passed in. That way, an implementation that has a flexible tag size can select it. It also ensures that userspace actually knows what the tag size is and isn't surprised if a hardware implementation changes the tag size or position.
Also, this whole set deals with tagging/untagging, but there's an effective loss of address space when you do this. Is that dealt with anywhere? How do we ensure that allocations don't get placed at a tagged address before this gets turned on? Where's that checking?
On 31/07/2019 17:50, Dave Hansen wrote:
On 7/23/19 10:58 AM, Andrey Konovalov wrote:
The mmap and mremap (only new_addr) syscalls do not currently accept tagged addresses. Architectures may interpret the tag as a background colour for the corresponding vma.
What the heck is a "background colour"? :)
Good point, this is some jargon that we started using for MTE, the idea being that the kernel could set a tag value (specified during mmap()) as "background colour" for anonymous pages allocated in that range.
Anyway, this patch series is not about MTE. Andrey, for v20 (if any), I think it's best to drop this last sentence to avoid any confusion.
Kevin
On 31/07/2019 18:05, Dave Hansen wrote:
On 7/23/19 10:58 AM, Andrey Konovalov wrote:
+long set_tagged_addr_ctrl(unsigned long arg)
+{
+	if (!tagged_addr_prctl_allowed)
+		return -EINVAL;
+	if (is_compat_task())
+		return -EINVAL;
+	if (arg & ~PR_TAGGED_ADDR_ENABLE)
+		return -EINVAL;
+
+	update_thread_flag(TIF_TAGGED_ADDR, arg & PR_TAGGED_ADDR_ENABLE);
+
+	return 0;
+}
Instead of a plain enable/disable, a more flexible ABI would be to have the tag mask be passed in. That way, an implementation that has a flexible tag size can select it. It also ensures that userspace actually knows what the tag size is and isn't surprised if a hardware implementation changes the tag size or position.
Also, this whole set deals with tagging/untagging, but there's an effective loss of address space when you do this. Is that dealt with anywhere? How do we ensure that allocations don't get placed at a tagged address before this gets turned on? Where's that checking?
This patch series only changes what is allowed or not at the syscall interface. It does not change the address space size. On arm64, TBI (Top Byte Ignore) has always been enabled for userspace, so it has never been possible to use the upper 8 bits of user pointers for addressing.
If other architectures were to support a similar functionality, then I agree that a common and more generic interface (if needed) would be helpful, but as it stands this is an arm64-specific prctl, and on arm64 the address tag is defined by the architecture as bits [63:56].
Kevin
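
To make the "bits [63:56]" layout concrete, here is a small userspace sketch of how a tag can be placed in, read from and stripped from a 64-bit address. The helper names (addr_set_tag, addr_get_tag, addr_clear_tag) and the TAG_SHIFT/TAG_MASK macros are hypothetical and only illustrate the bit arithmetic; they are not kernel or libc interfaces.

```c
#include <stdint.h>

#define TAG_SHIFT 56				/* tag occupies bits [63:56] */
#define TAG_MASK  ((uint64_t)0xff << TAG_SHIFT)

/* Replace whatever is in the top byte with the given tag. */
static inline uint64_t addr_set_tag(uint64_t addr, uint8_t tag)
{
	return (addr & ~TAG_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Extract the top byte. */
static inline uint8_t addr_get_tag(uint64_t addr)
{
	return (uint8_t)(addr >> TAG_SHIFT);
}

/* Clear the top byte; only meaningful for user addresses in this sketch. */
static inline uint64_t addr_clear_tag(uint64_t addr)
{
	return addr & ~TAG_MASK;
}
```

Since the lower 56 bits are untouched, setting and then clearing a tag gives back the original user address.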
On Thu, Aug 1, 2019 at 2:11 PM Kevin Brodsky kevin.brodsky@arm.com wrote:
On 31/07/2019 17:50, Dave Hansen wrote:
On 7/23/19 10:58 AM, Andrey Konovalov wrote:
The mmap and mremap (only new_addr) syscalls do not currently accept tagged addresses. Architectures may interpret the tag as a background colour for the corresponding vma.
What the heck is a "background colour"? :)
Good point, this is some jargon that we started using for MTE, the idea being that the kernel could set a tag value (specified during mmap()) as "background colour" for anonymous pages allocated in that range.
Anyway, this patch series is not about MTE. Andrey, for v20 (if any), I think it's best to drop this last sentence to avoid any confusion.
Sure, thanks!
Kevin
On 8/1/19 5:48 AM, Andrey Konovalov wrote:
On Thu, Aug 1, 2019 at 2:11 PM Kevin Brodsky kevin.brodsky@arm.com wrote:
On 31/07/2019 17:50, Dave Hansen wrote:
On 7/23/19 10:58 AM, Andrey Konovalov wrote:
The mmap and mremap (only new_addr) syscalls do not currently accept tagged addresses. Architectures may interpret the tag as a background colour for the corresponding vma.
What the heck is a "background colour"? :)
Good point, this is some jargon that we started using for MTE, the idea being that the kernel could set a tag value (specified during mmap()) as "background colour" for anonymous pages allocated in that range.
Anyway, this patch series is not about MTE. Andrey, for v20 (if any), I think it's best to drop this last sentence to avoid any confusion.
Sure, thanks!
OK, but what does that mean for tagged addresses getting passed to mmap/mremap? That sentence read to me like "architectures might allow tags for ...something...". So do we accept tagged addresses into those syscalls?
On 8/1/19 5:38 AM, Kevin Brodsky wrote:
This patch series only changes what is allowed or not at the syscall interface. It does not change the address space size. On arm64, TBI (Top Byte Ignore) has always been enabled for userspace, so it has never been possible to use the upper 8 bits of user pointers for addressing.
Oh, so does the address space that's available already chop that out?
If other architectures were to support a similar functionality, then I agree that a common and more generic interface (if needed) would be helpful, but as it stands this is an arm64-specific prctl, and on arm64 the address tag is defined by the architecture as bits [63:56].
It should then be an arch_prctl(), no?
Hi Dave,
On Wed, Jul 31, 2019 at 09:43:46AM -0700, Dave Hansen wrote:
On 7/25/19 6:50 AM, Vincenzo Frascino wrote:
With the relaxed ABI proposed through this document, it is now possible to pass tagged pointers to the syscalls, when these pointers are in memory ranges obtained by an anonymous (MAP_ANONYMOUS) mmap().
I don't see a lot of description of why this restriction is necessary. What's the problem with supporting MAP_SHARED?
We could support MAP_SHARED | MAP_ANONYMOUS (and based on some internal discussions, this would be fine with the hardware memory tagging as well). What we don't want in the ABI is to support file mmap() for top-byte-ignore (or MTE). If you see a use-case, please let us know.
On Thu, Aug 01, 2019 at 08:36:47AM -0700, Dave Hansen wrote:
On 8/1/19 5:48 AM, Andrey Konovalov wrote:
On Thu, Aug 1, 2019 at 2:11 PM Kevin Brodsky kevin.brodsky@arm.com wrote:
On 31/07/2019 17:50, Dave Hansen wrote:
On 7/23/19 10:58 AM, Andrey Konovalov wrote:
The mmap and mremap (only new_addr) syscalls do not currently accept tagged addresses. Architectures may interpret the tag as a background colour for the corresponding vma.
What the heck is a "background colour"? :)
Good point, this is some jargon that we started using for MTE, the idea being that the kernel could set a tag value (specified during mmap()) as "background colour" for anonymous pages allocated in that range.
Anyway, this patch series is not about MTE. Andrey, for v20 (if any), I think it's best to drop this last sentence to avoid any confusion.
Indeed, the part with the "background colour" and even the "currently" adverb should be dropped.
Also, if we merge the patches via different trees anyway, I don't think there is a need for Andrey to integrate them with his series. We can pick them up directly in the arm64 tree (once the review finished).
OK, but what does that mean for tagged addresses getting passed to mmap/mremap? That sentence read to me like "architectures might allow tags for ...something...". So do we accept tagged addresses into those syscalls?
If mmap() does not return a tagged address, the reasoning is that it should not accept one as an address hint (with or without MAP_FIXED). Note that these docs should only describe the top-byte-ignore ABI while leaving the memory tagging for a future patchset.
In that future patchset, we may want to update the mmap() ABI to allow, only in conjunction with PROT_MTE, a tagged pointer as an address argument. In such case mmap() will return a tagged address and the pages pre-coloured (on fault) with the tag requested by the user. As I said, that's to be discussed later in the year.
On Thu, Aug 01, 2019 at 09:45:05AM -0700, Dave Hansen wrote:
On 8/1/19 5:38 AM, Kevin Brodsky wrote:
This patch series only changes what is allowed or not at the syscall interface. It does not change the address space size. On arm64, TBI (Top Byte Ignore) has always been enabled for userspace, so it has never been possible to use the upper 8 bits of user pointers for addressing.
Oh, so does the address space that's available already chop that out?
Yes. Currently the hardware only supports 52-bit virtual addresses. It could be expanded (though it needs a 5th page table level) to 56-bit VA but it's not currently on our (hardware) plans. Beyond 56-bit, it cannot be done without breaking the software expectations (and hopefully I'll retire before we need this ;)).
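
The arithmetic behind this answer can be written down as a one-line check. This is only an illustrative sketch with hypothetical names: the tag byte starts at bit 56, so any virtual address width up to 56 bits leaves it free, and anything wider would overlap it.

```c
#include <stdbool.h>

#define TAG_LSB 56	/* lowest bit of the tag byte, bits [63:56] */

/* True if a VA of the given width does not reach into the tag byte. */
static bool va_width_leaves_tag_free(unsigned int va_bits)
{
	return va_bits <= TAG_LSB;
}
```

With this check, 48-bit, 52-bit and 56-bit VA sizes all leave the tag free, while a 57-bit VA would not.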
If other architectures were to support a similar functionality, then I agree that a common and more generic interface (if needed) would be helpful, but as it stands this is an arm64-specific prctl, and on arm64 the address tag is defined by the architecture as bits [63:56].
It should then be an arch_prctl(), no?
I guess you just want to rename SET_TAGGED_ADDR_CTRL() to arch_prctl_tagged_addr_ctrl_set()? (similarly for 'get')
On Wed, Jul 24, 2019 at 03:20:59PM +0100, Will Deacon wrote:
On Wed, Jul 24, 2019 at 04:16:49PM +0200, Andrey Konovalov wrote:
On Wed, Jul 24, 2019 at 4:02 PM Will Deacon will@kernel.org wrote:
On Tue, Jul 23, 2019 at 08:03:29PM +0200, Andrey Konovalov wrote:
Should this go through the mm or the arm tree?
I would certainly prefer to take at least the arm64 bits via the arm64 tree (i.e. patches 1, 2 and 15). We also need a Documentation patch describing the new ABI.
Sounds good! Should I post those patches together with the Documentation patches from Vincenzo as a separate patchset?
Yes, please (although as you say below, we need a new version of those patches from Vincenzo to address the feedback on v5). The other thing I should say is that I'd be happy to queue the other patches in the series too, but some of them are missing acks from the relevant maintainers (e.g. the mm/ and fs/ changes).
Ok, I've queued patches 1, 2, and 15 on a stable branch here:
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-n...
which should find its way into -next shortly via our for-next/core branch. If you want to make changes, please send additional patches on top.
This is targeting 5.4, but I will drop it before the merge window if we don't have both of the following in place:
* Updated ABI documentation with Acks from Catalin and Kevin
* The other patches in the series either Acked (so I can pick them up) or queued via some other tree(s) for 5.4.
Make sense?
Cheers,
Will
On Tue, Aug 6, 2019 at 7:13 PM Will Deacon will@kernel.org wrote:
On Wed, Jul 24, 2019 at 03:20:59PM +0100, Will Deacon wrote:
On Wed, Jul 24, 2019 at 04:16:49PM +0200, Andrey Konovalov wrote:
On Wed, Jul 24, 2019 at 4:02 PM Will Deacon will@kernel.org wrote:
On Tue, Jul 23, 2019 at 08:03:29PM +0200, Andrey Konovalov wrote:
Should this go through the mm or the arm tree?
I would certainly prefer to take at least the arm64 bits via the arm64 tree (i.e. patches 1, 2 and 15). We also need a Documentation patch describing the new ABI.
Sounds good! Should I post those patches together with the Documentation patches from Vincenzo as a separate patchset?
Yes, please (although as you say below, we need a new version of those patches from Vincenzo to address the feedback on v5). The other thing I should say is that I'd be happy to queue the other patches in the series too, but some of them are missing acks from the relevant maintainers (e.g. the mm/ and fs/ changes).
Ok, I've queued patches 1, 2, and 15 on a stable branch here:
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-n...
which should find its way into -next shortly via our for-next/core branch. If you want to make changes, please send additional patches on top.
This is targeting 5.4, but I will drop it before the merge window if we don't have both of the following in place:
- Updated ABI documentation with Acks from Catalin and Kevin
Catalin has posted a new version today.
- The other patches in the series either Acked (so I can pick them up) or queued via some other tree(s) for 5.4.
So we have the following patches in this series:
1. arm64: untag user pointers in access_ok and __uaccess_mask_ptr
2. arm64: Introduce prctl() options to control the tagged user addresses ABI
3. lib: untag user pointers in strn*_user
4. mm: untag user pointers passed to memory syscalls
5. mm: untag user pointers in mm/gup.c
6. mm: untag user pointers in get_vaddr_frames
7. fs/namespace: untag user pointers in copy_mount_options
8. userfaultfd: untag user pointers
9. drm/amdgpu: untag user pointers
10. drm/radeon: untag user pointers in radeon_gem_userptr_ioctl
11. IB/mlx4: untag user pointers in mlx4_get_umem_mr
12. media/v4l2-core: untag user pointers in videobuf_dma_contig_user_get
13. tee/shm: untag user pointers in tee_shm_register
14. vfio/type1: untag user pointers in vaddr_get_pfn
15. selftests, arm64: add a selftest for passing tagged pointers to kernel
1, 2 and 15 have been picked by Will.
11 has been picked up by Jason.
9, 10, 12, 13 and 14 have acks from their subsystem maintainers.
3 touches generic lib code, I'm not sure if there's a dedicated maintainer for that.
The ones that are left are the mm ones: 4, 5, 6, 7 and 8.
Andrew, could you take a look and give your Acked-by or pick them up directly?
Make sense?
Cheers,
Will
Thanks!
On Wed, Aug 07, 2019 at 07:17:35PM +0200, Andrey Konovalov wrote:
On Tue, Aug 6, 2019 at 7:13 PM Will Deacon will@kernel.org wrote:
On Wed, Jul 24, 2019 at 03:20:59PM +0100, Will Deacon wrote:
On Wed, Jul 24, 2019 at 04:16:49PM +0200, Andrey Konovalov wrote:
On Wed, Jul 24, 2019 at 4:02 PM Will Deacon will@kernel.org wrote:
On Tue, Jul 23, 2019 at 08:03:29PM +0200, Andrey Konovalov wrote:
Should this go through the mm or the arm tree?
I would certainly prefer to take at least the arm64 bits via the arm64 tree (i.e. patches 1, 2 and 15). We also need a Documentation patch describing the new ABI.
Sounds good! Should I post those patches together with the Documentation patches from Vincenzo as a separate patchset?
Yes, please (although as you say below, we need a new version of those patches from Vincenzo to address the feedback on v5). The other thing I should say is that I'd be happy to queue the other patches in the series too, but some of them are missing acks from the relevant maintainers (e.g. the mm/ and fs/ changes).
Ok, I've queued patches 1, 2, and 15 on a stable branch here:
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-n...
which should find its way into -next shortly via our for-next/core branch. If you want to make changes, please send additional patches on top.
This is targeting 5.4, but I will drop it before the merge window if we don't have both of the following in place:
- Updated ABI documentation with Acks from Catalin and Kevin
Catalin has posted a new version today.
- The other patches in the series either Acked (so I can pick them up) or queued via some other tree(s) for 5.4.
So we have the following patches in this series:
1. arm64: untag user pointers in access_ok and __uaccess_mask_ptr
2. arm64: Introduce prctl() options to control the tagged user addresses ABI
3. lib: untag user pointers in strn*_user
4. mm: untag user pointers passed to memory syscalls
5. mm: untag user pointers in mm/gup.c
6. mm: untag user pointers in get_vaddr_frames
7. fs/namespace: untag user pointers in copy_mount_options
8. userfaultfd: untag user pointers
9. drm/amdgpu: untag user pointers
10. drm/radeon: untag user pointers in radeon_gem_userptr_ioctl
11. IB/mlx4: untag user pointers in mlx4_get_umem_mr
12. media/v4l2-core: untag user pointers in videobuf_dma_contig_user_get
13. tee/shm: untag user pointers in tee_shm_register
14. vfio/type1: untag user pointers in vaddr_get_pfn
15. selftests, arm64: add a selftest for passing tagged pointers to kernel
1, 2 and 15 have been picked by Will.
11 has been picked up by Jason.
9, 10, 12, 13 and 14 have acks from their subsystem maintainers.
3 touches generic lib code, I'm not sure if there's a dedicated maintainer for that.
Andrew tends to pick up lib/ patches.
The ones that are left are the mm ones: 4, 5, 6, 7 and 8.
Andrew, could you take a look and give your Acked-by or pick them up directly?
Given the subsystem Acks, it seems like 3-10 and 12 could all just go via Andrew? I hope he agrees. :)
On Thu, 8 Aug 2019 14:12:19 -0700 Kees Cook keescook@chromium.org wrote:
The ones that are left are the mm ones: 4, 5, 6, 7 and 8.
Andrew, could you take a look and give your Acked-by or pick them up directly?
Given the subsystem Acks, it seems like 3-10 and 12 could all just go via Andrew? I hope he agrees. :)
I'll grab everything that has not yet appeared in linux-next. If more of these patches appear in linux-next I'll drop those as well.
The review discussion against " [PATCH v19 02/15] arm64: Introduce prctl() options to control the tagged user addresses ABI" has petered out inconclusively. prctl() vs arch_prctl().
On Thu, Aug 08, 2019 at 03:33:00PM -0700, Andrew Morton wrote:
On Thu, 8 Aug 2019 14:12:19 -0700 Kees Cook keescook@chromium.org wrote:
The ones that are left are the mm ones: 4, 5, 6, 7 and 8.
Andrew, could you take a look and give your Acked-by or pick them up directly?
Given the subsystem Acks, it seems like 3-10 and 12 could all just go via Andrew? I hope he agrees. :)
I'll grab everything that has not yet appeared in linux-next. If more of these patches appear in linux-next I'll drop those as well.
The review discussion against " [PATCH v19 02/15] arm64: Introduce prctl() options to control the tagged user addresses ABI" has petered out inconclusively. prctl() vs arch_prctl().
I've always disliked arch_prctl() existing at all. Given that tagging is likely to be a multi-architectural feature, it seems like the controls should live in prctl() to me.
On Thu, Aug 08, 2019 at 04:09:04PM -0700, Kees Cook wrote:
On Thu, Aug 08, 2019 at 03:33:00PM -0700, Andrew Morton wrote:
On Thu, 8 Aug 2019 14:12:19 -0700 Kees Cook keescook@chromium.org wrote:
The ones that are left are the mm ones: 4, 5, 6, 7 and 8.
Andrew, could you take a look and give your Acked-by or pick them up directly?
Given the subsystem Acks, it seems like 3-10 and 12 could all just go via Andrew? I hope he agrees. :)
I'll grab everything that has not yet appeared in linux-next. If more of these patches appear in linux-next I'll drop those as well.
The review discussion against " [PATCH v19 02/15] arm64: Introduce prctl() options to control the tagged user addresses ABI" has petered out inconclusively. prctl() vs arch_prctl().
I've always disliked arch_prctl() existing at all. Given that tagging is likely to be a multi-architectural feature, it seems like the controls should live in prctl() to me.
It took a bit of grep'ing to figure out what Dave H meant by arch_prctl(). It's an x86-specific syscall which we do not have on arm64 (and possibly any other architecture). Actually, we don't have any arm64 specific syscalls, only the generic unistd.h, hence the confusion. For other arm64-specific prctls like SVE we used the generic sys_prctl() and I can see x86 not being consistent either (PR_MPX_ENABLE_MANAGEMENT).
In general I disagree with adding any arm64-specific syscalls but in this instance it can't even be justified. I'd rather see some clean-up similar to arch_ptrace/ptrace_request than introducing new syscall numbers (but as I suggested in my reply to Dave, that's for another patch series).
On Fri, Aug 09, 2019 at 10:00:17AM +0100, Catalin Marinas wrote:
On Thu, Aug 08, 2019 at 04:09:04PM -0700, Kees Cook wrote:
On Thu, Aug 08, 2019 at 03:33:00PM -0700, Andrew Morton wrote:
On Thu, 8 Aug 2019 14:12:19 -0700 Kees Cook keescook@chromium.org wrote:
The ones that are left are the mm ones: 4, 5, 6, 7 and 8.
Andrew, could you take a look and give your Acked-by or pick them up directly?
Given the subsystem Acks, it seems like 3-10 and 12 could all just go via Andrew? I hope he agrees. :)
I'll grab everything that has not yet appeared in linux-next. If more of these patches appear in linux-next I'll drop those as well.
The review discussion against " [PATCH v19 02/15] arm64: Introduce prctl() options to control the tagged user addresses ABI" has petered out inconclusively. prctl() vs arch_prctl().
I've always disliked arch_prctl() existing at all. Given that tagging is likely to be a multi-architectural feature, it seems like the controls should live in prctl() to me.
It took a bit of grep'ing to figure out what Dave H meant by arch_prctl(). It's an x86-specific syscall which we do not have on arm64 (and possibly any other architecture). Actually, we don't have any arm64 specific syscalls, only the generic unistd.h, hence the confusion. For other arm64-specific prctls like SVE we used the generic sys_prctl() and I can see x86 not being consistent either (PR_MPX_ENABLE_MANAGEMENT).
In general I disagree with adding any arm64-specific syscalls but in this instance it can't even be justified. I'd rather see some clean-up similar to arch_ptrace/ptrace_request than introducing new syscall numbers (but as I suggested in my reply to Dave, that's for another patch series).
I had a go at refactoring this a while ago, but it fell by the wayside.
I can try to resurrect it if it's still considered worthwhile.
Cheers
---Dave
On Tue, Jul 23, 2019 at 07:58:41PM +0200, Andrey Konovalov wrote:
This patch is part of a series that extends the kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
This patch allows tagged pointers to be passed to the following memory syscalls: get_mempolicy, madvise, mbind, mincore, mlock, mlock2, mprotect, mremap, msync, munlock, move_pages.
The mmap and mremap syscalls do not currently accept tagged addresses. Architectures may interpret the tag as a background colour for the corresponding vma.
Reviewed-by: Khalid Aziz khalid.aziz@oracle.com
Reviewed-by: Vincenzo Frascino vincenzo.frascino@arm.com
Reviewed-by: Catalin Marinas catalin.marinas@arm.com
Reviewed-by: Kees Cook keescook@chromium.org
Signed-off-by: Andrey Konovalov andreyknvl@google.com

 mm/madvise.c   | 2 ++
 mm/mempolicy.c | 3 +++
 mm/migrate.c   | 2 +-
 mm/mincore.c   | 2 ++
 mm/mlock.c     | 4 ++++
 mm/mprotect.c  | 2 ++
 mm/mremap.c    | 7 +++++++
 mm/msync.c     | 2 ++
 8 files changed, 23 insertions(+), 1 deletion(-)
More back and forth discussions on how to specify the exceptions here. I'm proposing just dropping the exceptions and folding in the diff below.
Andrew, if you prefer a standalone patch instead, please let me know:
------------------8<----------------------------
From 9a5286acaa638c6a917d96986bf28dad35e24a0c Mon Sep 17 00:00:00 2001
From: Catalin Marinas catalin.marinas@arm.com
Date: Fri, 9 Aug 2019 14:21:33 +0100
Subject: [PATCH] fixup! mm: untag user pointers passed to memory syscalls
mmap, mremap, munmap, brk added to the list of syscalls that accept tagged pointers.
Signed-off-by: Catalin Marinas catalin.marinas@arm.com
---
 mm/mmap.c   | 5 +++++
 mm/mremap.c | 6 +-----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 7e8c3e8ae75f..b766b633b7ae 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -201,6 +201,8 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 	bool downgraded = false;
 	LIST_HEAD(uf);

+	brk = untagged_addr(brk);
+
 	if (down_write_killable(&mm->mmap_sem))
 		return -EINTR;

@@ -1573,6 +1575,8 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
 	struct file *file = NULL;
 	unsigned long retval;

+	addr = untagged_addr(addr);
+
 	if (!(flags & MAP_ANONYMOUS)) {
 		audit_mmap_fd(fd, flags);
 		file = fget(fd);
@@ -2874,6 +2878,7 @@ EXPORT_SYMBOL(vm_munmap);

 SYSCALL_DEFINE2(munmap, unsigned long, addr, size_t, len)
 {
+	addr = untagged_addr(addr);
 	profile_munmap(addr);
 	return __vm_munmap(addr, len, true);
 }
diff --git a/mm/mremap.c b/mm/mremap.c
index 64c9a3b8be0a..1fc8a29fbe3f 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -606,12 +606,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	LIST_HEAD(uf_unmap_early);
 	LIST_HEAD(uf_unmap);

-	/*
-	 * Architectures may interpret the tag passed to mmap as a background
-	 * colour for the corresponding vma.  For mremap we don't allow tagged
-	 * new_addr to preserve similar behaviour to mmap.
-	 */
 	addr = untagged_addr(addr);
+	new_addr = untagged_addr(new_addr);

 	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
 		return ret;
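
The effect of the untagged_addr() calls added above can be pictured with a small userspace model. This is only a sketch: model_untagged_addr() is a hypothetical name, and it assumes that the arm64 helper behaves like a sign extension from bit 55, so that user addresses (bit 55 clear) lose their tag while kernel addresses (bit 55 set) end up with an all-ones top byte.

```c
#include <stdint.h>

/*
 * Userspace model (assumption, not the kernel macro): copy bit 55 into
 * bits [63:56], which is what a sign extension from bit 55 does.
 */
static uint64_t model_untagged_addr(uint64_t addr)
{
	const uint64_t top_byte = (uint64_t)0xff << 56;

	if (addr & ((uint64_t)1 << 55))
		return addr | top_byte;		/* kernel-style address */
	return addr & ~top_byte;		/* user address: drop the tag */
}
```

An already untagged user address passes through unchanged, so calling the helper on every syscall entry path is harmless for existing users.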
On Tue, Jul 23, 2019 at 07:58:39PM +0200, Andrey Konovalov wrote:
From: Catalin Marinas catalin.marinas@arm.com
It is not desirable to relax the ABI to allow tagged user addresses into the kernel indiscriminately. This patch introduces a prctl() interface for enabling or disabling the tagged ABI with a global sysctl control for preventing applications from enabling the relaxed ABI (meant for testing user-space prctl() return error checking without reconfiguring the kernel). The ABI properties are inherited by threads of the same application and fork()'ed children but cleared on execve(). A Kconfig option allows the overall disabling of the relaxed ABI.
The PR_SET_TAGGED_ADDR_CTRL will be expanded in the future to handle MTE-specific settings like imprecise vs precise exceptions.
Reviewed-by: Kees Cook keescook@chromium.org
Signed-off-by: Catalin Marinas catalin.marinas@arm.com
Signed-off-by: Andrey Konovalov andreyknvl@google.com
Following several discussions on the list and in private, I'm proposing the update below. I can send it as a patch on top of the current series since Will has already queued this.
---------------8<-------------------------------------
From 1b3f57ab0c2c51f8b31c19fb34d270e1f3ee57fe Mon Sep 17 00:00:00 2001
From: Catalin Marinas catalin.marinas@arm.com
Date: Fri, 9 Aug 2019 15:09:15 +0100
Subject: [PATCH] fixup! arm64: Introduce prctl() options to control the tagged user addresses ABI
Rename abi.tagged_addr sysctl control to abi.tagged_addr_disabled, defaulting to 0. Only prevent prctl(PR_TAGGED_ADDR_ENABLE) from being called when abi.tagged_addr_disabled==1.
Force unused arg* of the new prctl() to 0.
Signed-off-by: Catalin Marinas catalin.marinas@arm.com
---
 arch/arm64/kernel/process.c | 17 ++++++++++-------
 kernel/sys.c                |  4 ++++
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 76b7c55026aa..03689c0beb34 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -579,17 +579,22 @@ void arch_setup_new_exec(void)
 /*
  * Control the relaxed ABI allowing tagged user addresses into the kernel.
  */
-static unsigned int tagged_addr_prctl_allowed = 1;
+static unsigned int tagged_addr_disabled;

 long set_tagged_addr_ctrl(unsigned long arg)
 {
-	if (!tagged_addr_prctl_allowed)
-		return -EINVAL;
 	if (is_compat_task())
 		return -EINVAL;
 	if (arg & ~PR_TAGGED_ADDR_ENABLE)
 		return -EINVAL;

+	/*
+	 * Do not allow the enabling of the tagged address ABI if globally
+	 * disabled via sysctl abi.tagged_addr_disabled.
+	 */
+	if (arg & PR_TAGGED_ADDR_ENABLE && tagged_addr_disabled)
+		return -EINVAL;
+
 	update_thread_flag(TIF_TAGGED_ADDR, arg & PR_TAGGED_ADDR_ENABLE);

 	return 0;
@@ -597,8 +602,6 @@ long set_tagged_addr_ctrl(unsigned long arg)

 long get_tagged_addr_ctrl(void)
 {
-	if (!tagged_addr_prctl_allowed)
-		return -EINVAL;
 	if (is_compat_task())
 		return -EINVAL;

@@ -618,9 +621,9 @@ static int one = 1;

 static struct ctl_table tagged_addr_sysctl_table[] = {
 	{
-		.procname	= "tagged_addr",
+		.procname	= "tagged_addr_disabled",
 		.mode		= 0644,
-		.data		= &tagged_addr_prctl_allowed,
+		.data		= &tagged_addr_disabled,
 		.maxlen		= sizeof(int),
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= &zero,
diff --git a/kernel/sys.c b/kernel/sys.c
index c6c4d5358bd3..ec48396b4943 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2499,9 +2499,13 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			error = PAC_RESET_KEYS(me, arg2);
 		break;
 	case PR_SET_TAGGED_ADDR_CTRL:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
 		error = SET_TAGGED_ADDR_CTRL(arg2);
 		break;
 	case PR_GET_TAGGED_ADDR_CTRL:
+		if (arg2 || arg3 || arg4 || arg5)
+			return -EINVAL;
 		error = GET_TAGGED_ADDR_CTRL();
 		break;
 	default:
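
The resulting behaviour of PR_SET_TAGGED_ADDR_CTRL after this fixup can be summarised with a userspace model. This is a sketch under stated assumptions, not kernel code: struct model_state, model_prctl_set() and MODEL_PR_TAGGED_ADDR_ENABLE are hypothetical names that mirror the checks in the diff above, with the thread flag and the sysctl reduced to plain booleans.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_PR_TAGGED_ADDR_ENABLE (1UL << 0)	/* stand-in for the real flag */

/* Hypothetical state: one sysctl, one task property, one thread flag. */
struct model_state {
	bool sysctl_tagged_addr_disabled;	/* abi.tagged_addr_disabled */
	bool is_compat_task;
	bool thread_tagged_addr;		/* models TIF_TAGGED_ADDR */
};

/* Mirrors the order of checks in kernel/sys.c and set_tagged_addr_ctrl(). */
static long model_prctl_set(struct model_state *s, unsigned long arg2,
			    unsigned long arg3, unsigned long arg4,
			    unsigned long arg5)
{
	if (arg3 || arg4 || arg5)
		return -EINVAL;		/* unused arguments must be 0 */
	if (s->is_compat_task)
		return -EINVAL;
	if (arg2 & ~MODEL_PR_TAGGED_ADDR_ENABLE)
		return -EINVAL;		/* unknown control bits */
	if ((arg2 & MODEL_PR_TAGGED_ADDR_ENABLE) &&
	    s->sysctl_tagged_addr_disabled)
		return -EINVAL;		/* enabling is globally denied */

	s->thread_tagged_addr = (arg2 & MODEL_PR_TAGGED_ADDR_ENABLE) != 0;
	return 0;
}
```

Note that in this model, as in the diff, the sysctl only blocks enabling: a thread that already opted in keeps its flag when the sysctl is flipped, and it can still opt out afterwards.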