=== Overview
arm64 has a feature called Top Byte Ignore, which allows embedding pointer tags into the top byte of each pointer. Userspace programs (such as HWASan, a memory debugging tool [1]) might use this feature and pass tagged user pointers to the kernel through syscalls or other interfaces.
Right now the kernel is already able to handle user faults with tagged pointers, due to these patches:
1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a tagged pointer")
2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged pointers")
3. 276e9327 ("arm64: entry: improve data abort handling of tagged pointers")
This patchset extends tagged pointer support to syscall arguments.
As per the proposed ABI change [3], tagged pointers are only allowed to be passed to syscalls when they point to memory ranges obtained by anonymous mmap() or brk().
For non-memory syscalls this is done by untagging user pointers when the kernel performs pointer checking to find out whether the pointer comes from userspace (most notably in access_ok). The untagging is done only when the pointer is being checked; the tag is preserved as the pointer makes its way through the kernel, and the pointer stays tagged when the kernel dereferences it while performing user memory accesses.
Memory syscalls (mmap, mprotect, etc.) don't do user memory accesses but rather deal with memory ranges, and untagged pointers are better suited to describe memory ranges internally. Thus for memory syscalls we untag pointers completely when they enter the kernel.
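As a rough illustration of the scheme described above, the following userspace sketch models a tagged pointer as a 64-bit integer. The helper names (set_tag, get_tag, untag_addr, tag_demo) are hypothetical and do not come from the patchset, and the untagging rule (copying bit 55 into the top byte) is an assumption about how an arm64-style untag might behave, not the kernel's actual macro.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_SHIFT 56
#define TAG_MASK  (0xffULL << TAG_SHIFT)

/* Hypothetical helper: place an 8-bit tag into the top byte of an address. */
static inline uint64_t set_tag(uint64_t addr, uint8_t tag)
{
	return (addr & ~TAG_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Hypothetical helper: read the tag back out of the top byte. */
static inline uint8_t get_tag(uint64_t addr)
{
	return (uint8_t)(addr >> TAG_SHIFT);
}

/*
 * Hypothetical model of untagging: overwrite the top byte with copies of
 * bit 55, so user addresses get 0x00 and kernel addresses keep 0xff.
 */
static inline uint64_t untag_addr(uint64_t addr)
{
	if (addr & (1ULL << 55))
		return addr | TAG_MASK;
	return addr & ~TAG_MASK;
}

void tag_demo(void)
{
	uint64_t plain = 0x0000007fdeadb000ULL;
	uint64_t tagged = set_tag(plain, 0x2a);

	/* The tag lives only in the top byte; untagging recovers the address. */
	assert(get_tag(tagged) == 0x2a);
	assert(untag_addr(tagged) == plain);
}
```

In this model a check would be performed on untag_addr(p), while the access itself would still go through the tagged value p.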
=== Other approaches
One of the alternative approaches to untagging that was considered is to completely strip the pointer tag as the pointer enters the kernel with some kind of a syscall wrapper, but that won't work with the countless number of different ioctl calls. With this approach we would need a custom wrapper for each ioctl variation, which doesn't seem practical.
An alternative approach to untagging pointers in memory syscall prologues is to instead allow tagged pointers to be passed to find_vma() (and other vma-related functions) and untag them there. Unfortunately, a lot of find_vma() callers then compare or subtract the returned vma start and end fields against the pointer that was being searched. Thus this approach would still require changing all find_vma() callers.
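The problem with untagging only inside the lookup can be sketched in userspace as follows. The struct and helper names below are hypothetical stand-ins for a vma's start/end fields and for caller-side checks; the sketch only assumes that the caller keeps using the original (tagged) value after the lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the vm_start/vm_end fields of a vma. */
struct range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/* Clear the top byte; sufficient for user addresses in this sketch. */
static inline uint64_t strip_tag(uint64_t addr)
{
	return addr & ~(0xffULL << 56);
}

/* The kind of comparison a lookup caller performs on the result. */
static inline bool range_contains(const struct range *r, uint64_t addr)
{
	return addr >= r->start && addr < r->end;
}

/* The kind of subtraction a caller performs to get an offset. */
static inline uint64_t range_offset(const struct range *r, uint64_t addr)
{
	return addr - r->start;
}

void range_demo(void)
{
	struct range r = { 0x7f0000000000ULL, 0x7f0000001000ULL };
	uint64_t tagged = (r.start + 0x10) | (0x2aULL << 56);

	/* With the tag left in place, the caller's own checks go wrong. */
	assert(!range_contains(&r, tagged));
	assert(range_offset(&r, tagged) != 0x10);

	/* After untagging up front, they behave as expected. */
	assert(range_contains(&r, strip_tag(tagged)));
	assert(range_offset(&r, strip_tag(tagged)) == 0x10);
}
```

Even if the lookup itself tolerated the tag, every such comparison or subtraction in a caller would still need its own untagging, which is the cost the paragraph above describes.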
=== Testing
The following testing approaches have been taken to find potential issues with user pointer untagging:
1. Static testing (with sparse [2] and separately with a custom static analyzer based on Clang) to track casts of __user pointers to integer types to find places where untagging needs to be done.
2. Static testing with grep to find parts of the kernel that call find_vma() (and other similar functions) or directly compare against vm_start/vm_end fields of vma.
3. Static testing with grep to find parts of the kernel that compare user pointers with TASK_SIZE or other similar consts and macros.
4. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running a modified syzkaller version that passes tagged pointers to the kernel.
Based on the results of the testing, the required patches have been added to the patchset.
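The dynamic check from item 4 can be modelled in userspace roughly like this. The definition of has_tag() is an assumption (the cover letter does not show it), and checked_lookup() and lookup_demo() are hypothetical; the model counts violations instead of crashing the way BUG_ON() would.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed definition: a user address is "tagged" if its top byte is non-zero. */
static inline bool has_tag(uint64_t addr)
{
	return (addr >> 56) != 0;
}

static unsigned long tagged_lookups;	/* how many times the check fired */

/*
 * Hypothetical lookup wrapper: where the kernel experiment used BUG_ON(),
 * this model only counts the violation so that a test can observe it.
 */
static inline uint64_t checked_lookup(uint64_t addr)
{
	if (has_tag(addr))
		tagged_lookups++;
	return addr;
}

void lookup_demo(void)
{
	tagged_lookups = 0;

	checked_lookup(0x00007f0000001000ULL);	/* untagged: check stays quiet */
	assert(tagged_lookups == 0);

	checked_lookup(0x2a007f0000001000ULL);	/* tagged: check fires */
	assert(tagged_lookups == 1);
}
```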
=== Notes
This patchset is meant to be merged together with "arm64 relaxed ABI" [3].
This patchset is a prerequisite for ARM's memory tagging hardware feature support [4].
This patchset has been merged into the Pixel 2 kernel tree and is now being used to enable testing of Pixel 2 phones with HWASan.
Thanks!
[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
[2] https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060e...
[3] https://lkml.org/lkml/2018/12/10/402
[4] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architecture...
Changes in v11:
- Added "uprobes, arm64: untag user pointers in find_active_uprobe" patch.
- Added "bpf, arm64: untag user pointers in stack_map_get_build_id_offset" patch.
- Fixed "tracing, arm64: untag user pointers in seq_print_user_ip" to correctly perform subtraction with a tagged addr.
- Moved untagged_addr() from SYSCALL_DEFINE3(mprotect) and SYSCALL_DEFINE4(pkey_mprotect) to do_mprotect_pkey().
- Moved untagged_addr() definition for other arches from include/linux/memory.h to include/linux/mm.h.
- Changed untagging in strn*_user() to perform userspace accesses through tagged pointers.
- Updated the documentation to mention that passing tagged pointers to memory syscalls is allowed.
- Updated the test to use malloc'ed memory instead of stack memory.
Changes in v10:
- Added "mm, arm64: untag user pointers passed to memory syscalls" back.
- New patch "fs, arm64: untag user pointers in fs/userfaultfd.c".
- New patch "net, arm64: untag user pointers in tcp_zerocopy_receive".
- New patch "kernel, arm64: untag user pointers in prctl_set_mm*".
- New patch "tracing, arm64: untag user pointers in seq_print_user_ip".
Changes in v9:
- Rebased onto 4.20-rc6.
- Used u64 instead of __u64 in type casts in the untagged_addr macro for arm64.
- Added braces around (addr) in the untagged_addr macro for other arches.
Changes in v8:
- Rebased onto 65102238 (4.20-rc1).
- Added a note to the cover letter on why syscall wrappers/shims that untag user pointers won't work.
- Added a note to the cover letter that this patchset has been merged into the Pixel 2 kernel tree.
- Documentation fixes, in particular added a list of syscalls that don't support tagged user pointers.
Changes in v7:
- Rebased onto 17b57b18 (4.19-rc6).
- Dropped the "arm64: untag user address in __do_user_fault" patch, since the existing patches already handle user faults properly.
- Dropped the "usb, arm64: untag user addresses in devio" patch, since the passed pointer must come from a vma and therefore be untagged.
- Dropped the "arm64: annotate user pointers casts detected by sparse" patch (see the discussion to the replies of the v6 of this patchset).
- Added more context to the cover letter.
- Updated Documentation/arm64/tagged-pointers.txt.
Changes in v6:
- Added annotations for user pointer casts found by sparse.
- Rebased onto 050cdc6c (4.19-rc1+).
Changes in v5:
- Added 3 new patches that add untagging to places found with static analysis.
- Rebased onto 44c929e1 (4.18-rc8).
Changes in v4:
- Added a selftest for checking that passing tagged pointers to the kernel succeeds.
- Rebased onto 81e97f013 (4.18-rc1+).
Changes in v3:
- Rebased onto e5c51f30 (4.17-rc6+).
- Added linux-arch@ to the list of recipients.
Changes in v2:
- Rebased onto 2d618bdf (4.17-rc3+).
- Removed excessive untagging in gup.c.
- Removed untagging pointers returned from __uaccess_mask_ptr.
Changes in v1:
- Rebased onto 4.17-rc1.
Changes in RFC v2:
- Added "#ifndef untagged_addr..." fallback in linux/uaccess.h instead of defining it for each arch individually.
- Updated Documentation/arm64/tagged-pointers.txt.
- Dropped "mm, arm64: untag user addresses in memory syscalls".
- Rebased onto 3eb2ce82 (4.16-rc7).
Andrey Konovalov (14):
  uaccess: add untagged_addr definition for other arches
  arm64: untag user pointers in access_ok and __uaccess_mask_ptr
  lib, arm64: untag user pointers in strn*_user
  mm, arm64: untag user pointers passed to memory syscalls
  mm, arm64: untag user pointers in mm/gup.c
  fs, arm64: untag user pointers in copy_mount_options
  fs, arm64: untag user pointers in fs/userfaultfd.c
  net, arm64: untag user pointers in tcp_zerocopy_receive
  kernel, arm64: untag user pointers in prctl_set_mm*
  tracing, arm64: untag user pointers in seq_print_user_ip
  uprobes, arm64: untag user pointers in find_active_uprobe
  bpf, arm64: untag user pointers in stack_map_get_build_id_offset
  arm64: update Documentation/arm64/tagged-pointers.txt
  selftests, arm64: add a selftest for passing tagged pointers to kernel
 Documentation/arm64/tagged-pointers.txt      | 18 +++++++---------
 arch/arm64/include/asm/uaccess.h             | 10 +++++----
 fs/namespace.c                               |  2 +-
 fs/userfaultfd.c                             |  5 +++++
 include/linux/mm.h                           |  4 ++++
 ipc/shm.c                                    |  2 ++
 kernel/bpf/stackmap.c                        |  6 ++++--
 kernel/events/uprobes.c                      |  2 ++
 kernel/sys.c                                 | 14 +++++++++++++
 kernel/trace/trace_output.c                  |  5 +++--
 lib/strncpy_from_user.c                      |  3 ++-
 lib/strnlen_user.c                           |  3 ++-
 mm/gup.c                                     |  4 ++++
 mm/madvise.c                                 |  2 ++
 mm/mempolicy.c                               |  5 +++++
 mm/migrate.c                                 |  1 +
 mm/mincore.c                                 |  2 ++
 mm/mlock.c                                   |  5 +++++
 mm/mmap.c                                    |  7 +++++++
 mm/mprotect.c                                |  1 +
 mm/mremap.c                                  |  2 ++
 mm/msync.c                                   |  2 ++
 net/ipv4/tcp.c                               |  2 ++
 tools/testing/selftests/arm64/.gitignore     |  1 +
 tools/testing/selftests/arm64/Makefile       | 11 ++++++++++
 .../testing/selftests/arm64/run_tags_test.sh | 12 +++++++++++
 tools/testing/selftests/arm64/tags_test.c    | 21 +++++++++++++++++++
 27 files changed, 131 insertions(+), 21 deletions(-)
 create mode 100644 tools/testing/selftests/arm64/.gitignore
 create mode 100644 tools/testing/selftests/arm64/Makefile
 create mode 100755 tools/testing/selftests/arm64/run_tags_test.sh
 create mode 100644 tools/testing/selftests/arm64/tags_test.c
To allow arm64 syscalls to accept tagged pointers from userspace, we must untag them when they are passed to the kernel. Since untagging is done in generic parts of the kernel, the untagged_addr macro needs to be defined for all architectures.
Define it as a noop for architectures other than arm64.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 include/linux/mm.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 76769749b5a5..4d674518d392 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -99,6 +99,10 @@ extern int mmap_rnd_compat_bits __read_mostly;
 #include <asm/pgtable.h>
 #include <asm/processor.h>
 
+#ifndef untagged_addr
+#define untagged_addr(addr) (addr)
+#endif
+
 #ifndef __pa_symbol
 #define __pa_symbol(x) __pa(RELOC_HIDE((unsigned long)(x), 0))
 #endif
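The fallback pattern used by this patch can be tried out in userspace with a sketch like the one below. The MODEL_TBI_ARCH switch, the override body, and the generic_range_start() and fallback_demo() helpers are hypothetical; only the #ifndef fallback itself mirrors the patch. The override uses the GNU __typeof__ extension to keep the argument's type.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical arch override, enabled with -DMODEL_TBI_ARCH. A real arch
 * header would define untagged_addr before the generic fallback is seen.
 */
#ifdef MODEL_TBI_ARCH
#define untagged_addr(addr) \
	((__typeof__(addr))((uint64_t)(addr) & ~(0xffULL << 56)))
#endif

/* Generic fallback, as in the patch: a no-op that keeps the argument's type. */
#ifndef untagged_addr
#define untagged_addr(addr) (addr)
#endif

/* Generic code can then call untagged_addr() unconditionally. */
static inline uint64_t generic_range_start(uint64_t start)
{
	return untagged_addr(start);
}

void fallback_demo(void)
{
#ifdef MODEL_TBI_ARCH
	/* With the override, the top byte is cleared. */
	assert(generic_range_start(0x2a00000000001000ULL) == 0x1000ULL);
#else
	/* Without an override, the value passes through untouched. */
	assert(generic_range_start(0x2a00000000001000ULL) == 0x2a00000000001000ULL);
#endif
}
```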
This patch is a part of a series that extends arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
copy_from_user (and a few other similar functions) are used to copy data from user memory into the kernel memory or vice versa. Since a user can provide a tagged pointer to one of the syscalls that use copy_from_user, we need to correctly handle such pointers.
Do this by untagging user pointers in access_ok and in __uaccess_mask_ptr, before performing access validity checks.
Note that this patch only temporarily untags the pointers to perform the checks, but then passes them as is into the kernel internals.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 arch/arm64/include/asm/uaccess.h | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index e5d5f31c6d36..9164ecb5feca 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -94,7 +94,7 @@ static inline unsigned long __range_ok(const void __user *addr, unsigned long si
 	return ret;
 }
 
-#define access_ok(addr, size) __range_ok(addr, size)
+#define access_ok(addr, size) __range_ok(untagged_addr(addr), size)
 #define user_addr_max get_fs
 
 #define _ASM_EXTABLE(from, to) \
@@ -226,7 +226,8 @@ static inline void uaccess_enable_not_uao(void)
 
 /*
  * Sanitise a uaccess pointer such that it becomes NULL if above the
- * current addr_limit.
+ * current addr_limit. In case the pointer is tagged (has the top byte set),
+ * untag the pointer before checking.
  */
 #define uaccess_mask_ptr(ptr) (__typeof__(ptr))__uaccess_mask_ptr(ptr)
 static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
@@ -234,10 +235,11 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
 	void __user *safe_ptr;
 
 	asm volatile(
-	" bics xzr, %1, %2\n"
+	" bics xzr, %3, %2\n"
 	" csel %0, %1, xzr, eq\n"
 	: "=&r" (safe_ptr)
-	: "r" (ptr), "r" (current_thread_info()->addr_limit)
+	: "r" (ptr), "r" (current_thread_info()->addr_limit),
+	  "r" (untagged_addr(ptr))
 	: "cc");
 
 	csdb();
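The effect of the changed asm operands can be modelled in plain C roughly as follows. This is a userspace sketch, not the kernel code: untag_addr() and mask_user_ptr() are hypothetical names, the bit-55 untagging rule is an assumption, and addr_limit is assumed to be an all-ones mask covering a 48-bit user address space.

```c
#include <assert.h>
#include <stdint.h>

#define TOP_BYTE (0xffULL << 56)

/* Hypothetical model of untagging: copy bit 55 into the top byte. */
static inline uint64_t untag_addr(uint64_t addr)
{
	return (addr & (1ULL << 55)) ? (addr | TOP_BYTE) : (addr & ~TOP_BYTE);
}

/*
 * Model of the patched __uaccess_mask_ptr(): "bics" tests whether the
 * untagged address has bits outside addr_limit, and "csel" then yields
 * either the original (still tagged) pointer or NULL.
 */
static inline uint64_t mask_user_ptr(uint64_t ptr, uint64_t addr_limit)
{
	uint64_t outside = untag_addr(ptr) & ~addr_limit;

	return outside == 0 ? ptr : 0;
}

void mask_demo(void)
{
	const uint64_t limit = (1ULL << 48) - 1;	/* assumed 48-bit user VA */

	/* A tagged user pointer passes the check and keeps its tag. */
	assert(mask_user_ptr(0x2a00007fdeadb000ULL, limit) == 0x2a00007fdeadb000ULL);
	/* A kernel address is rejected. */
	assert(mask_user_ptr(0xffff000012345678ULL, limit) == 0);
	/* So is a tagged value whose bit 55 marks it as a kernel address. */
	assert(mask_user_ptr(0x2a80000000001000ULL, limit) == 0);
}
```

The point of the model is that only the comparison sees the untagged value; the pointer that comes out of the check is the one the caller passed in.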
This patch is a part of a series that extends arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
strncpy_from_user and strnlen_user accept user addresses as arguments, and do not go through the same path as copy_from_user and others, so here we need to handle the case of tagged user addresses separately.
Untag user pointers passed to these functions.
Note that this patch only temporarily untags the pointers to perform validity checks, but then uses them as is to perform user memory accesses.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 lib/strncpy_from_user.c | 3 ++-
 lib/strnlen_user.c      | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
index 58eacd41526c..6209bb9507c7 100644
--- a/lib/strncpy_from_user.c
+++ b/lib/strncpy_from_user.c
@@ -6,6 +6,7 @@
 #include <linux/uaccess.h>
 #include <linux/kernel.h>
 #include <linux/errno.h>
+#include <linux/mm.h>
 
 #include <asm/byteorder.h>
 #include <asm/word-at-a-time.h>
@@ -107,7 +108,7 @@ long strncpy_from_user(char *dst, const char __user *src, long count)
 		return 0;
 
 	max_addr = user_addr_max();
-	src_addr = (unsigned long)src;
+	src_addr = (unsigned long)untagged_addr(src);
 	if (likely(src_addr < max_addr)) {
 		unsigned long max = max_addr - src_addr;
 		long retval;
diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c
index 1c1a1b0e38a5..8ca3d2ac32ec 100644
--- a/lib/strnlen_user.c
+++ b/lib/strnlen_user.c
@@ -2,6 +2,7 @@
 #include <linux/kernel.h>
 #include <linux/export.h>
 #include <linux/uaccess.h>
+#include <linux/mm.h>
 
 #include <asm/word-at-a-time.h>
 
@@ -109,7 +110,7 @@ long strnlen_user(const char __user *str, long count)
 		return 0;
 
 	max_addr = user_addr_max();
-	src_addr = (unsigned long)str;
+	src_addr = (unsigned long)untagged_addr(str);
 	if (likely(src_addr < max_addr)) {
 		unsigned long max = max_addr - src_addr;
 		long retval;
On 15/03/2019 19:51, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
> strncpy_from_user and strnlen_user accept user addresses as arguments, and do not go through the same path as copy_from_user and others, so here we need to handle the case of tagged user addresses separately.
> Untag user pointers passed to these functions.
> Note that this patch only temporarily untags the pointers to perform validity checks, but then uses them as is to perform user memory accesses.
Thank you for this new version, looks good to me.
To give a bit of context to the readers, I asked Andrey to make this change, because it makes a difference with hardware memory tagging. Indeed, in that situation, it is always preferable to access the memory using the user-provided tag, so that tag checking can take place; if there is a mismatch, a tag fault will occur (which is handled in a way similar to a page fault). It is also preferable not to assume that an untagged user pointer (tag 0x0) bypasses tag checks.
Kevin
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> ---
>  lib/strncpy_from_user.c | 3 ++-
>  lib/strnlen_user.c      | 3 ++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
> index 58eacd41526c..6209bb9507c7 100644
> --- a/lib/strncpy_from_user.c
> +++ b/lib/strncpy_from_user.c
> @@ -6,6 +6,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/kernel.h>
>  #include <linux/errno.h>
> +#include <linux/mm.h>
>  
>  #include <asm/byteorder.h>
>  #include <asm/word-at-a-time.h>
> @@ -107,7 +108,7 @@ long strncpy_from_user(char *dst, const char __user *src, long count)
>  		return 0;
>  
>  	max_addr = user_addr_max();
> -	src_addr = (unsigned long)src;
> +	src_addr = (unsigned long)untagged_addr(src);
>  	if (likely(src_addr < max_addr)) {
>  		unsigned long max = max_addr - src_addr;
>  		long retval;
> diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c
> index 1c1a1b0e38a5..8ca3d2ac32ec 100644
> --- a/lib/strnlen_user.c
> +++ b/lib/strnlen_user.c
> @@ -2,6 +2,7 @@
>  #include <linux/kernel.h>
>  #include <linux/export.h>
>  #include <linux/uaccess.h>
> +#include <linux/mm.h>
>  
>  #include <asm/word-at-a-time.h>
>  
> @@ -109,7 +110,7 @@ long strnlen_user(const char __user *str, long count)
>  		return 0;
>  
>  	max_addr = user_addr_max();
> -	src_addr = (unsigned long)str;
> +	src_addr = (unsigned long)untagged_addr(str);
>  	if (likely(src_addr < max_addr)) {
>  		unsigned long max = max_addr - src_addr;
>  		long retval;
This patch is a part of a series that extends arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
This patch allows tagged pointers to be passed to the following memory syscalls: madvise, mbind, get_mempolicy, mincore, mlock, mlock2, brk, mmap_pgoff, old_mmap, munmap, remap_file_pages, mprotect, pkey_mprotect, mremap, msync and shmdt.
This is done by untagging pointers passed to these syscalls in the prologues of their handlers.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 ipc/shm.c      | 2 ++
 mm/madvise.c   | 2 ++
 mm/mempolicy.c | 5 +++++
 mm/migrate.c   | 1 +
 mm/mincore.c   | 2 ++
 mm/mlock.c     | 5 +++++
 mm/mmap.c      | 7 +++++++
 mm/mprotect.c  | 1 +
 mm/mremap.c    | 2 ++
 mm/msync.c     | 2 ++
 10 files changed, 29 insertions(+)

diff --git a/ipc/shm.c b/ipc/shm.c
index ce1ca9f7c6e9..7af8951e6c41 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1593,6 +1593,7 @@ SYSCALL_DEFINE3(shmat, int, shmid, char __user *, shmaddr, int, shmflg)
 	unsigned long ret;
 	long err;
 
+	shmaddr = untagged_addr(shmaddr);
 	err = do_shmat(shmid, shmaddr, shmflg, &ret, SHMLBA);
 	if (err)
 		return err;
@@ -1732,6 +1733,7 @@ long ksys_shmdt(char __user *shmaddr)
 
 SYSCALL_DEFINE1(shmdt, char __user *, shmaddr)
 {
+	shmaddr = untagged_addr(shmaddr);
 	return ksys_shmdt(shmaddr);
 }
 
diff --git a/mm/madvise.c b/mm/madvise.c
index 21a7881a2db4..64e6d34a7f9b 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -809,6 +809,8 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
 	size_t len;
 	struct blk_plug plug;
 
+	start = untagged_addr(start);
+
 	if (!madvise_behavior_valid(behavior))
 		return error;
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index af171ccb56a2..31691737c59c 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1334,6 +1334,7 @@ static long kernel_mbind(unsigned long start, unsigned long len,
 	int err;
 	unsigned short mode_flags;
 
+	start = untagged_addr(start);
 	mode_flags = mode & MPOL_MODE_FLAGS;
 	mode &= ~MPOL_MODE_FLAGS;
 	if (mode >= MPOL_MAX)
@@ -1491,6 +1492,8 @@ static int kernel_get_mempolicy(int __user *policy,
 	int uninitialized_var(pval);
 	nodemask_t nodes;
 
+	addr = untagged_addr(addr);
+
 	if (nmask != NULL && maxnode < nr_node_ids)
 		return -EINVAL;
 
@@ -1576,6 +1579,8 @@ COMPAT_SYSCALL_DEFINE6(mbind, compat_ulong_t, start, compat_ulong_t, len,
 	unsigned long nr_bits, alloc_size;
 	nodemask_t bm;
 
+	start = untagged_addr(start);
+
 	nr_bits = min_t(unsigned long, maxnode-1, MAX_NUMNODES);
 	alloc_size = ALIGN(nr_bits, BITS_PER_LONG) / 8;
 
diff --git a/mm/migrate.c b/mm/migrate.c
index ac6f4939bb59..ecc6dcdefb1f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1612,6 +1612,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 		if (get_user(node, nodes + i))
 			goto out_flush;
 		addr = (unsigned long)p;
+		addr = untagged_addr(addr);
 
 		err = -ENODEV;
 		if (node < 0 || node >= MAX_NUMNODES)
diff --git a/mm/mincore.c b/mm/mincore.c
index 218099b5ed31..c4a3f4484b6b 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -228,6 +228,8 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
 	unsigned long pages;
 	unsigned char *tmp;
 
+	start = untagged_addr(start);
+
 	/* Check the start address: needs to be page-aligned.. */
 	if (start & ~PAGE_MASK)
 		return -EINVAL;
diff --git a/mm/mlock.c b/mm/mlock.c
index 080f3b36415b..6934ec92bf39 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -715,6 +715,7 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla
 
 SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
 {
+	start = untagged_addr(start);
 	return do_mlock(start, len, VM_LOCKED);
 }
 
@@ -722,6 +723,8 @@ SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
 {
 	vm_flags_t vm_flags = VM_LOCKED;
 
+	start = untagged_addr(start);
+
 	if (flags & ~MLOCK_ONFAULT)
 		return -EINVAL;
 
@@ -735,6 +738,8 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
 {
 	int ret;
 
+	start = untagged_addr(start);
+
 	len = PAGE_ALIGN(len + (offset_in_page(start)));
 	start &= PAGE_MASK;
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 41eb48d9b527..512c679c7f33 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -199,6 +199,8 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 	bool downgraded = false;
 	LIST_HEAD(uf);
 
+	brk = untagged_addr(brk);
+
 	if (down_write_killable(&mm->mmap_sem))
 		return -EINTR;
 
@@ -1571,6 +1573,8 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
 	struct file *file = NULL;
 	unsigned long retval;
 
+	addr = untagged_addr(addr);
+
 	if (!(flags & MAP_ANONYMOUS)) {
 		audit_mmap_fd(fd, flags);
 		file = fget(fd);
@@ -2867,6 +2871,7 @@ EXPORT_SYMBOL(vm_munmap);
 
 SYSCALL_DEFINE2(munmap, unsigned long, addr, size_t, len)
 {
+	addr = untagged_addr(addr);
 	profile_munmap(addr);
 	return __vm_munmap(addr, len, true);
 }
@@ -2885,6 +2890,8 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 	unsigned long ret = -EINVAL;
 	struct file *file;
 
+	start = untagged_addr(start);
+
 	pr_warn_once("%s (%d) uses deprecated remap_file_pages() syscall. See Documentation/vm/remap_file_pages.rst.\n",
 		     current->comm, current->pid);
 
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 028c724dcb1a..3c2b11629f89 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -468,6 +468,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 	if (grows == (PROT_GROWSDOWN|PROT_GROWSUP)) /* can't be both */
 		return -EINVAL;
 
+	start = untagged_addr(start);
 	if (start & ~PAGE_MASK)
 		return -EINVAL;
 	if (!len)
diff --git a/mm/mremap.c b/mm/mremap.c
index e3edef6b7a12..6422aeee65bb 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -605,6 +605,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	LIST_HEAD(uf_unmap_early);
 	LIST_HEAD(uf_unmap);
 
+	addr = untagged_addr(addr);
+
 	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
 		return ret;
 
diff --git a/mm/msync.c b/mm/msync.c
index ef30a429623a..c3bd3e75f687 100644
--- a/mm/msync.c
+++ b/mm/msync.c
@@ -37,6 +37,8 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags)
 	int unmapped_error = 0;
 	int error = -EINVAL;
 
+	start = untagged_addr(start);
+
 	if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC))
 		goto out;
 	if (offset_in_page(start))
This patch is a part of a series that extends arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
mm/gup.c provides a kernel interface that accepts user addresses and manipulates user pages directly (for example get_user_pages, which is used by the futex syscall). Since a user can provide tagged addresses, we need to handle this case.
Add untagging to gup.c functions that use user addresses for vma lookups.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 mm/gup.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index f84e22685aaa..3192741e0b3a 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -686,6 +686,8 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 	if (!nr_pages)
 		return 0;
 
+	start = untagged_addr(start);
+
 	VM_BUG_ON(!!pages != !!(gup_flags & FOLL_GET));
 
 	/*
@@ -848,6 +850,8 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 	struct vm_area_struct *vma;
 	vm_fault_t ret, major = 0;
 
+	address = untagged_addr(address);
+
 	if (unlocked)
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY;
 
This patch is a part of a series that extends arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
In copy_mount_options a user address is being subtracted from TASK_SIZE. If the address is lower than TASK_SIZE, the size is calculated to not allow the exact_copy_from_user() call to cross the TASK_SIZE boundary. However, if the address is tagged, then the size will be calculated incorrectly.
Untag the address before subtracting.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index c9cab307fa77..c27e5713bf04 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2825,7 +2825,7 @@ void *copy_mount_options(const void __user * data)
 	 * the remainder of the page.
 	 */
 	/* copy_from_user cannot cross TASK_SIZE ! */
-	size = TASK_SIZE - (unsigned long)data;
+	size = TASK_SIZE - (unsigned long)untagged_addr(data);
 	if (size > PAGE_SIZE)
 		size = PAGE_SIZE;
 
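To make the miscalculation concrete, here is a small userspace sketch of the size computation. The constants MODEL_TASK_SIZE and MODEL_PAGE_SIZE and the helpers strip_tag(), copy_size() and copy_size_demo() are assumptions chosen for illustration (a 48-bit user address space and 4 KiB pages); they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_TASK_SIZE (1ULL << 48)	/* assumed user address space size */
#define MODEL_PAGE_SIZE 4096ULL		/* assumed page size */

/* Clear the top byte; sufficient for user addresses in this sketch. */
static inline uint64_t strip_tag(uint64_t addr)
{
	return addr & ~(0xffULL << 56);
}

/* The size computation from copy_mount_options(), applied to a raw value. */
static inline uint64_t copy_size(uint64_t addr)
{
	uint64_t size = MODEL_TASK_SIZE - addr;	/* wraps if addr > TASK_SIZE */

	if (size > MODEL_PAGE_SIZE)
		size = MODEL_PAGE_SIZE;
	return size;
}

void copy_size_demo(void)
{
	/* 0x100 bytes below the end of the user address space, tag 0x2a. */
	uint64_t tagged = (MODEL_TASK_SIZE - 0x100) | (0x2aULL << 56);

	/* Tagged: the subtraction wraps around and the clamp hides it. */
	assert(copy_size(tagged) == MODEL_PAGE_SIZE);
	/* Untagged: only the 0x100 bytes up to TASK_SIZE are allowed. */
	assert(copy_size(strip_tag(tagged)) == 0x100);
}
```

With the tag in place the unsigned subtraction wraps to a huge value, so the clamp silently permits a full page even though only 0x100 bytes remain below the boundary.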
This patch is a part of a series that extends arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
userfaultfd_register() and userfaultfd_unregister() use provided user pointers for vma lookups, which can only be done with untagged pointers.
Untag user pointers in these functions.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 fs/userfaultfd.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 89800fc7dc9d..a3b70e0d9756 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1320,6 +1320,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		goto out;
 	}
 
+	uffdio_register.range.start =
+		untagged_addr(uffdio_register.range.start);
+
 	ret = validate_range(mm, uffdio_register.range.start,
 			     uffdio_register.range.len);
 	if (ret)
@@ -1507,6 +1510,8 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 	if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister)))
 		goto out;
 
+	uffdio_unregister.start = untagged_addr(uffdio_unregister.start);
+
 	ret = validate_range(mm, uffdio_unregister.start,
 			     uffdio_unregister.len);
 	if (ret)
This patch is a part of a series that extends arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
tcp_zerocopy_receive() uses provided user pointers for vma lookups, which can only be done with untagged pointers.
Untag user pointers in this function.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 net/ipv4/tcp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6baa6dc1b13b..89db3b4fc753 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1758,6 +1758,8 @@ static int tcp_zerocopy_receive(struct sock *sk,
 	int inq;
 	int ret;
 
+	address = untagged_addr(address);
+
 	if (address & (PAGE_SIZE - 1) || address != zc->address)
 		return -EINVAL;
 
On 03/15/2019 12:51 PM, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
> tcp_zerocopy_receive() uses provided user pointers for vma lookups, which can only be done with untagged pointers.
> Untag user pointers in this function.
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> ---
>  net/ipv4/tcp.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 6baa6dc1b13b..89db3b4fc753 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1758,6 +1758,8 @@ static int tcp_zerocopy_receive(struct sock *sk,
>  	int inq;
>  	int ret;
>  
> +	address = untagged_addr(address);
> +
>  	if (address & (PAGE_SIZE - 1) || address != zc->address)
The second test will fail, if the top bits are changed in address but not in zc->address
>  		return -EINVAL;
On Fri, Mar 15, 2019 at 9:03 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On 03/15/2019 12:51 PM, Andrey Konovalov wrote:
>> This patch is a part of a series that extends arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
>> tcp_zerocopy_receive() uses provided user pointers for vma lookups, which can only be done with untagged pointers.
>> Untag user pointers in this function.
>> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
>> ---
>>  net/ipv4/tcp.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> index 6baa6dc1b13b..89db3b4fc753 100644
>> --- a/net/ipv4/tcp.c
>> +++ b/net/ipv4/tcp.c
>> @@ -1758,6 +1758,8 @@ static int tcp_zerocopy_receive(struct sock *sk,
>>  	int inq;
>>  	int ret;
>>  
>> +	address = untagged_addr(address);
>> +
>>  	if (address & (PAGE_SIZE - 1) || address != zc->address)
> The second test will fail, if the top bits are changed in address but not in zc->address
Will fix in v12, thanks Eric!
>>  		return -EINVAL;
On Mon, Mar 18, 2019 at 2:14 PM Andrey Konovalov <andreyknvl@google.com> wrote:
> On Fri, Mar 15, 2019 at 9:03 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On 03/15/2019 12:51 PM, Andrey Konovalov wrote:
>>> This patch is a part of a series that extends arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
>>> tcp_zerocopy_receive() uses provided user pointers for vma lookups, which can only be done with untagged pointers.
>>> Untag user pointers in this function.
>>> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
>>> ---
>>>  net/ipv4/tcp.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>> index 6baa6dc1b13b..89db3b4fc753 100644
>>> --- a/net/ipv4/tcp.c
>>> +++ b/net/ipv4/tcp.c
>>> @@ -1758,6 +1758,8 @@ static int tcp_zerocopy_receive(struct sock *sk,
>>>  	int inq;
>>>  	int ret;
>>>  
>>> +	address = untagged_addr(address);
>>> +
>>>  	if (address & (PAGE_SIZE - 1) || address != zc->address)
>> The second test will fail, if the top bits are changed in address but not in zc->address
> Will fix in v12, thanks Eric!
Looking at the code, what's the point of this address != zc->address check? Should I just remove it?
>>>  		return -EINVAL;
On Mon, Mar 18, 2019 at 6:17 AM Andrey Konovalov <andreyknvl@google.com> wrote:
> Looking at the code, what's the point of this address != zc->address check? Should I just remove it?
No you must not remove it.
The test detects if a u64 ->unsigned long conversion might have truncated bits.
Quite surprisingly some people still use 32bit kernels.
The ABI is 64bit only, because we did not want to have yet another compat layer.
struct tcp_zerocopy_receive {
	__u64 address;		/* in: address of mapping */
	__u32 length;		/* in/out: number of bytes to map/mapped */
	__u32 recv_skip_hint;	/* out: amount of bytes to skip */
};
On Mon, Mar 18, 2019 at 3:45 PM Eric Dumazet <edumazet@google.com> wrote:
> On Mon, Mar 18, 2019 at 6:17 AM Andrey Konovalov <andreyknvl@google.com> wrote:
>> Looking at the code, what's the point of this address != zc->address check? Should I just remove it?
> No you must not remove it.
> The test detects if a u64 ->unsigned long conversion might have truncated bits.
> Quite surprisingly some people still use 32bit kernels.
> The ABI is 64bit only, because we did not want to have yet another compat layer.
> struct tcp_zerocopy_receive {
> 	__u64 address;		/* in: address of mapping */
> 	__u32 length;		/* in/out: number of bytes to map/mapped */
> 	__u32 recv_skip_hint;	/* out: amount of bytes to skip */
> };
Ah, got it, thanks! I'll add a comment here then, otherwise this looks confusing.
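The interaction between the truncation check and untagging can be sketched in userspace as below. This is only one possible reordering and is not necessarily what v12 of the patch does; to_ulong(), strip_tag(), check_untag_first(), check_truncation_first() and zc_demo() are hypothetical, and the width of "unsigned long" is passed in as a parameter so that a 32-bit kernel can be modelled on any host.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL	/* assumed page size */

/* Truncate a 64-bit value to the width of a modelled "unsigned long". */
static inline uint64_t to_ulong(uint64_t v, unsigned int ulong_bits)
{
	return ulong_bits >= 64 ? v : (v & ((1ULL << ulong_bits) - 1));
}

/* Clear the top byte; sufficient for user addresses in this sketch. */
static inline uint64_t strip_tag(uint64_t addr)
{
	return addr & ~(0xffULL << 56);
}

/* Order used in the patch under review: untag first, then compare. */
static inline int check_untag_first(uint64_t zc_address, unsigned int ulong_bits)
{
	uint64_t address = strip_tag(to_ulong(zc_address, ulong_bits));

	if ((address & (MODEL_PAGE_SIZE - 1)) || address != zc_address)
		return -EINVAL;
	return 0;
}

/* One possible reordering: detect truncation first, untag afterwards. */
static inline int check_truncation_first(uint64_t zc_address, unsigned int ulong_bits)
{
	uint64_t address = to_ulong(zc_address, ulong_bits);

	if (address != zc_address)
		return -EINVAL;		/* u64 -> unsigned long lost bits */
	address = strip_tag(address);
	if (address & (MODEL_PAGE_SIZE - 1))
		return -EINVAL;
	return 0;
}

void zc_demo(void)
{
	uint64_t tagged = 0x2a00007f00001000ULL;

	/* Untagging before the comparison rejects every tagged address. */
	assert(check_untag_first(tagged, 64) == -EINVAL);
	/* Comparing before untagging accepts it on a 64-bit kernel... */
	assert(check_truncation_first(tagged, 64) == 0);
	/* ...and still catches truncation on a modelled 32-bit kernel. */
	assert(check_truncation_first(0x0000000100001000ULL, 32) == -EINVAL);
}
```

The sketch keeps the comparison Eric describes intact and only moves the untagging after it, so the truncation check keeps its original meaning.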
This patch is a part of a series that extends arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
prctl_set_mm() and prctl_set_mm_map() use provided user pointers for vma lookups, which can only be done with untagged pointers.
Untag user pointers in these functions.
Signed-off-by: Andrey Konovalov andreyknvl@google.com
---
 kernel/sys.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/kernel/sys.c b/kernel/sys.c
index 12df0e5434b8..8e56d87cc6db 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1993,6 +1993,18 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
 	if (copy_from_user(&prctl_map, addr, sizeof(prctl_map)))
 		return -EFAULT;
 
+	prctl_map->start_code	= untagged_addr(prctl_map.start_code);
+	prctl_map->end_code	= untagged_addr(prctl_map.end_code);
+	prctl_map->start_data	= untagged_addr(prctl_map.start_data);
+	prctl_map->end_data	= untagged_addr(prctl_map.end_data);
+	prctl_map->start_brk	= untagged_addr(prctl_map.start_brk);
+	prctl_map->brk		= untagged_addr(prctl_map.brk);
+	prctl_map->start_stack	= untagged_addr(prctl_map.start_stack);
+	prctl_map->arg_start	= untagged_addr(prctl_map.arg_start);
+	prctl_map->arg_end	= untagged_addr(prctl_map.arg_end);
+	prctl_map->env_start	= untagged_addr(prctl_map.env_start);
+	prctl_map->env_end	= untagged_addr(prctl_map.env_end);
+
 	error = validate_prctl_map(&prctl_map);
 	if (error)
 		return error;
@@ -2106,6 +2118,8 @@ static int prctl_set_mm(int opt, unsigned long addr,
 			      opt != PR_SET_MM_MAP_SIZE)))
 		return -EINVAL;
 
+	addr = untagged_addr(addr);
+
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	if (opt == PR_SET_MM_MAP || opt == PR_SET_MM_MAP_SIZE)
 		return prctl_set_mm_map(opt, (const void __user *)addr, arg4);
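As a rough userspace model of why the patch above untags addresses before range lookups, the sketch below clears the top byte of a user address and compares it against a range. All names (`sketch_untag()`, `sketch_set_tag()`, `struct sketch_range`, `sketch_in_range()`) are hypothetical, and the mask-based untagging is an assumption that only holds for user addresses, whose bit 55 is zero; the kernel's untagged_addr() may be implemented differently.

```c
#include <stdint.h>

#define SKETCH_TAG_SHIFT	56
#define SKETCH_TAG_MASK		(0xffULL << SKETCH_TAG_SHIFT)

/* Hypothetical model of untagging a user address: clear bits 63..56. */
static inline uint64_t sketch_untag(uint64_t addr)
{
	return addr & ~SKETCH_TAG_MASK;
}

/* Hypothetical helper: place `tag` into the top byte of `addr`. */
static inline uint64_t sketch_set_tag(uint64_t addr, uint8_t tag)
{
	return sketch_untag(addr) | ((uint64_t)tag << SKETCH_TAG_SHIFT);
}

/* Simplified stand-in for a vma: a half-open range [start, end). */
struct sketch_range {
	uint64_t start;
	uint64_t end;
};

/* Range membership test, similar in spirit to a vma lookup check. */
static inline int sketch_in_range(const struct sketch_range *r, uint64_t addr)
{
	return r->start <= addr && addr < r->end;
}
```

With a range of [0x7f00000000, 0x7f00002000), the tagged value 0x4200007f00001000 compares as far above the range and the check fails, while the untagged value 0x7f00001000 falls inside it.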
On 15/03/2019 19:51, Andrey Konovalov wrote:
This patch is part of a series that extends the arm64 kernel ABI to allow tagged user pointers (with the top byte set to something other than 0x00) to be passed as syscall arguments.
prctl_set_mm() and prctl_set_mm_map() use provided user pointers for vma lookups, which can only be done with untagged pointers.
Untag user pointers in these functions.
Signed-off-by: Andrey Konovalov andreyknvl@google.com
 kernel/sys.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/kernel/sys.c b/kernel/sys.c
index 12df0e5434b8..8e56d87cc6db 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1993,6 +1993,18 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
 	if (copy_from_user(&prctl_map, addr, sizeof(prctl_map)))
 		return -EFAULT;
 
+	prctl_map->start_code	= untagged_addr(prctl_map.start_code);
+	prctl_map->end_code	= untagged_addr(prctl_map.end_code);
+	prctl_map->start_data	= untagged_addr(prctl_map.start_data);
+	prctl_map->end_data	= untagged_addr(prctl_map.end_data);
+	prctl_map->start_brk	= untagged_addr(prctl_map.start_brk);
+	prctl_map->brk		= untagged_addr(prctl_map.brk);
+	prctl_map->start_stack	= untagged_addr(prctl_map.start_stack);
+	prctl_map->arg_start	= untagged_addr(prctl_map.arg_start);
+	prctl_map->arg_end	= untagged_addr(prctl_map.arg_end);
+	prctl_map->env_start	= untagged_addr(prctl_map.env_start);
+	prctl_map->env_end	= untagged_addr(prctl_map.env_end);
As the buildbot suggests, those -> should be . instead :) You might want to check your local build with CONFIG_CHECKPOINT_RESTORE=y.
+
 	error = validate_prctl_map(&prctl_map);
 	if (error)
 		return error;
@@ -2106,6 +2118,8 @@ static int prctl_set_mm(int opt, unsigned long addr,
 			      opt != PR_SET_MM_MAP_SIZE)))
 		return -EINVAL;
 
+	addr = untagged_addr(addr);
This is a bit too coarse: addr is indeed used for find_vma() later on, but it is also used to access memory, by prctl_set_mm_map() and prctl_set_auxv().
Kevin
+
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	if (opt == PR_SET_MM_MAP || opt == PR_SET_MM_MAP_SIZE)
 		return prctl_set_mm_map(opt, (const void __user *)addr, arg4);
On Mon, Mar 18, 2019 at 12:47 PM Kevin Brodsky kevin.brodsky@arm.com wrote:
On 15/03/2019 19:51, Andrey Konovalov wrote:
This patch is part of a series that extends the arm64 kernel ABI to allow tagged user pointers (with the top byte set to something other than 0x00) to be passed as syscall arguments.
prctl_set_mm() and prctl_set_mm_map() use provided user pointers for vma lookups, which can only be done with untagged pointers.
Untag user pointers in these functions.
Signed-off-by: Andrey Konovalov andreyknvl@google.com
 kernel/sys.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/kernel/sys.c b/kernel/sys.c
index 12df0e5434b8..8e56d87cc6db 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1993,6 +1993,18 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
 	if (copy_from_user(&prctl_map, addr, sizeof(prctl_map)))
 		return -EFAULT;
 
+	prctl_map->start_code	= untagged_addr(prctl_map.start_code);
+	prctl_map->end_code	= untagged_addr(prctl_map.end_code);
+	prctl_map->start_data	= untagged_addr(prctl_map.start_data);
+	prctl_map->end_data	= untagged_addr(prctl_map.end_data);
+	prctl_map->start_brk	= untagged_addr(prctl_map.start_brk);
+	prctl_map->brk		= untagged_addr(prctl_map.brk);
+	prctl_map->start_stack	= untagged_addr(prctl_map.start_stack);
+	prctl_map->arg_start	= untagged_addr(prctl_map.arg_start);
+	prctl_map->arg_end	= untagged_addr(prctl_map.arg_end);
+	prctl_map->env_start	= untagged_addr(prctl_map.env_start);
+	prctl_map->env_end	= untagged_addr(prctl_map.env_end);
As the buildbot suggests, those -> should be . instead :) You might want to check your local build with CONFIG_CHECKPOINT_RESTORE=y.
Oops :)
+
 	error = validate_prctl_map(&prctl_map);
 	if (error)
 		return error;
@@ -2106,6 +2118,8 @@ static int prctl_set_mm(int opt, unsigned long addr,
 			      opt != PR_SET_MM_MAP_SIZE)))
 		return -EINVAL;
 
+	addr = untagged_addr(addr);
This is a bit too coarse: addr is indeed used for find_vma() later on, but it is also used to access memory, by prctl_set_mm_map() and prctl_set_auxv().
Yes, I wrote this patch before our Friday discussion and forgot about it. I'll fix it in v12, thanks!
Kevin
+
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	if (opt == PR_SET_MM_MAP || opt == PR_SET_MM_MAP_SIZE)
 		return prctl_set_mm_map(opt, (const void __user *)addr, arg4);
This patch is part of a series that extends the arm64 kernel ABI to allow tagged user pointers (with the top byte set to something other than 0x00) to be passed as syscall arguments.
seq_print_user_ip() uses provided user pointers for vma lookups, which can only be done with untagged pointers.
Untag user pointers in this function.
Signed-off-by: Andrey Konovalov andreyknvl@google.com
---
 kernel/trace/trace_output.c |  5 +++--
 p                           | 45 +++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+), 2 deletions(-)
 create mode 100644 p

diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 54373d93e251..6376bee93c84 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -370,6 +370,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
 {
 	struct file *file = NULL;
 	unsigned long vmstart = 0;
+	unsigned long untagged_ip = untagged_addr(ip);
 	int ret = 1;
 
 	if (s->full)
@@ -379,7 +380,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
 		const struct vm_area_struct *vma;
 
 		down_read(&mm->mmap_sem);
-		vma = find_vma(mm, ip);
+		vma = find_vma(mm, untagged_ip);
 		if (vma) {
 			file = vma->vm_file;
 			vmstart = vma->vm_start;
@@ -388,7 +389,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
 				ret = trace_seq_path(s, &file->f_path);
 				if (ret)
 					trace_seq_printf(s, "[+0x%lx]",
-							 ip - vmstart);
+							 untagged_ip - vmstart);
 			}
 			up_read(&mm->mmap_sem);
 		}
diff --git a/p b/p
new file mode 100644
index 000000000000..9d6fa5386e55
--- /dev/null
+++ b/p
@@ -0,0 +1,45 @@
+commit 1fa6fadf644859e8a6a8ecce258444b49be8c7ee
+Author: Andrey Konovalov andreyknvl@google.com
+Date:   Mon Mar 4 17:20:32 2019 +0100
+
+    kasan: fix coccinelle warnings in kasan_p*_table
+
+    kasan_p4d_table, kasan_pmd_table and kasan_pud_table are declared as
+    returning bool, but return 0 instead of false, which produces a coccinelle
+    warning. Fix it.
+
+    Fixes: 0207df4fa1a8 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
+    Reported-by: kbuild test robot lkp@intel.com
+    Signed-off-by: Andrey Konovalov andreyknvl@google.com
+
+diff --git a/mm/kasan/init.c b/mm/kasan/init.c
+index 45a1b5e38e1e..fcaa1ca03175 100644
+--- a/mm/kasan/init.c
++++ b/mm/kasan/init.c
+@@ -42,7 +42,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
+ #else
+ static inline bool kasan_p4d_table(pgd_t pgd)
+ {
+-	return 0;
++	return false;
+ }
+ #endif
+ #if CONFIG_PGTABLE_LEVELS > 3
+@@ -54,7 +54,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
+ #else
+ static inline bool kasan_pud_table(p4d_t p4d)
+ {
+-	return 0;
++	return false;
+ }
+ #endif
+ #if CONFIG_PGTABLE_LEVELS > 2
+@@ -66,7 +66,7 @@ static inline bool kasan_pmd_table(pud_t pud)
+ #else
+ static inline bool kasan_pmd_table(pud_t pud)
+ {
+-	return 0;
++	return false;
+ }
+ #endif
+ pte_t kasan_early_shadow_pte[PTRS_PER_PTE] __page_aligned_bss;
On Fri, 15 Mar 2019 20:51:34 +0100 Andrey Konovalov andreyknvl@google.com wrote:
This patch is part of a series that extends the arm64 kernel ABI to allow tagged user pointers (with the top byte set to something other than 0x00) to be passed as syscall arguments.
seq_print_user_ip() uses provided user pointers for vma lookups, which can only be done with untagged pointers.
Untag user pointers in this function.
Signed-off-by: Andrey Konovalov andreyknvl@google.com
 kernel/trace/trace_output.c |  5 +++--
 p                           | 45 +++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+), 2 deletions(-)
 create mode 100644 p

diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 54373d93e251..6376bee93c84 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -370,6 +370,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
 {
 	struct file *file = NULL;
 	unsigned long vmstart = 0;
+	unsigned long untagged_ip = untagged_addr(ip);
 	int ret = 1;
 
 	if (s->full)
@@ -379,7 +380,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
 		const struct vm_area_struct *vma;
 
 		down_read(&mm->mmap_sem);
-		vma = find_vma(mm, ip);
+		vma = find_vma(mm, untagged_ip);
 		if (vma) {
 			file = vma->vm_file;
 			vmstart = vma->vm_start;
@@ -388,7 +389,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
 				ret = trace_seq_path(s, &file->f_path);
 				if (ret)
 					trace_seq_printf(s, "[+0x%lx]",
-							 ip - vmstart);
+							 untagged_ip - vmstart);
 			}
 			up_read(&mm->mmap_sem);
 		}
diff --git a/p b/p
new file mode 100644
index 000000000000..9d6fa5386e55
--- /dev/null
+++ b/p
@@ -0,0 +1,45 @@
+commit 1fa6fadf644859e8a6a8ecce258444b49be8c7ee
+Author: Andrey Konovalov andreyknvl@google.com
+Date:   Mon Mar 4 17:20:32 2019 +0100
+
+    kasan: fix coccinelle warnings in kasan_p*_table
+
+    kasan_p4d_table, kasan_pmd_table and kasan_pud_table are declared as
+    returning bool, but return 0 instead of false, which produces a coccinelle
+    warning. Fix it.
+
+    Fixes: 0207df4fa1a8 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
+    Reported-by: kbuild test robot lkp@intel.com
+    Signed-off-by: Andrey Konovalov andreyknvl@google.com
Did you mean to append this commit to this patch?
-- Steve
+
+diff --git a/mm/kasan/init.c b/mm/kasan/init.c
+index 45a1b5e38e1e..fcaa1ca03175 100644
+--- a/mm/kasan/init.c
++++ b/mm/kasan/init.c
+@@ -42,7 +42,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
+ #else
+ static inline bool kasan_p4d_table(pgd_t pgd)
+ {
+-	return 0;
++	return false;
+ }
+ #endif
+ #if CONFIG_PGTABLE_LEVELS > 3
+@@ -54,7 +54,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
+ #else
+ static inline bool kasan_pud_table(p4d_t p4d)
+ {
+-	return 0;
++	return false;
+ }
+ #endif
+ #if CONFIG_PGTABLE_LEVELS > 2
+@@ -66,7 +66,7 @@ static inline bool kasan_pmd_table(pud_t pud)
+ #else
+ static inline bool kasan_pmd_table(pud_t pud)
+ {
+-	return 0;
++	return false;
+ }
+ #endif
+ pte_t kasan_early_shadow_pte[PTRS_PER_PTE] __page_aligned_bss;
On Fri, Mar 15, 2019 at 9:14 PM Steven Rostedt rostedt@goodmis.org wrote:
On Fri, 15 Mar 2019 20:51:34 +0100 Andrey Konovalov andreyknvl@google.com wrote:
This patch is part of a series that extends the arm64 kernel ABI to allow tagged user pointers (with the top byte set to something other than 0x00) to be passed as syscall arguments.
seq_print_user_ip() uses provided user pointers for vma lookups, which can only be done with untagged pointers.
Untag user pointers in this function.
Signed-off-by: Andrey Konovalov andreyknvl@google.com
 kernel/trace/trace_output.c |  5 +++--
 p                           | 45 +++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+), 2 deletions(-)
 create mode 100644 p

diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 54373d93e251..6376bee93c84 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -370,6 +370,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
 {
 	struct file *file = NULL;
 	unsigned long vmstart = 0;
+	unsigned long untagged_ip = untagged_addr(ip);
 	int ret = 1;
 
 	if (s->full)
@@ -379,7 +380,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
 		const struct vm_area_struct *vma;
 
 		down_read(&mm->mmap_sem);
-		vma = find_vma(mm, ip);
+		vma = find_vma(mm, untagged_ip);
 		if (vma) {
 			file = vma->vm_file;
 			vmstart = vma->vm_start;
@@ -388,7 +389,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
 				ret = trace_seq_path(s, &file->f_path);
 				if (ret)
 					trace_seq_printf(s, "[+0x%lx]",
-							 ip - vmstart);
+							 untagged_ip - vmstart);
 			}
 			up_read(&mm->mmap_sem);
 		}
diff --git a/p b/p
new file mode 100644
index 000000000000..9d6fa5386e55
--- /dev/null
+++ b/p
@@ -0,0 +1,45 @@
+commit 1fa6fadf644859e8a6a8ecce258444b49be8c7ee
+Author: Andrey Konovalov andreyknvl@google.com
+Date:   Mon Mar 4 17:20:32 2019 +0100
+
+    kasan: fix coccinelle warnings in kasan_p*_table
+
+    kasan_p4d_table, kasan_pmd_table and kasan_pud_table are declared as
+    returning bool, but return 0 instead of false, which produces a coccinelle
+    warning. Fix it.
+
+    Fixes: 0207df4fa1a8 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
+    Reported-by: kbuild test robot lkp@intel.com
+    Signed-off-by: Andrey Konovalov andreyknvl@google.com
Did you mean to append this commit to this patch?
No, did it by mistake. Will remove in v12, thanks for noticing!
-- Steve
+
+diff --git a/mm/kasan/init.c b/mm/kasan/init.c
+index 45a1b5e38e1e..fcaa1ca03175 100644
+--- a/mm/kasan/init.c
++++ b/mm/kasan/init.c
+@@ -42,7 +42,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
+ #else
+ static inline bool kasan_p4d_table(pgd_t pgd)
+ {
+-	return 0;
++	return false;
+ }
+ #endif
+ #if CONFIG_PGTABLE_LEVELS > 3
+@@ -54,7 +54,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
+ #else
+ static inline bool kasan_pud_table(p4d_t p4d)
+ {
+-	return 0;
++	return false;
+ }
+ #endif
+ #if CONFIG_PGTABLE_LEVELS > 2
+@@ -66,7 +66,7 @@ static inline bool kasan_pmd_table(pud_t pud)
+ #else
+ static inline bool kasan_pmd_table(pud_t pud)
+ {
+-	return 0;
++	return false;
+ }
+ #endif
+ pte_t kasan_early_shadow_pte[PTRS_PER_PTE] __page_aligned_bss;
This patch is part of a series that extends the arm64 kernel ABI to allow tagged user pointers (with the top byte set to something other than 0x00) to be passed as syscall arguments.
find_active_uprobe() uses the provided user pointer (obtained via instruction_pointer(regs)) for vma lookups, which can only be done with untagged pointers.
Untag the user pointer in this function.
Signed-off-by: Andrey Konovalov andreyknvl@google.com
---
 kernel/events/uprobes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c5cde87329c7..d3a2716a813a 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1992,6 +1992,8 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
 	struct uprobe *uprobe = NULL;
 	struct vm_area_struct *vma;
 
+	bp_vaddr = untagged_addr(bp_vaddr);
+
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, bp_vaddr);
 	if (vma && vma->vm_start <= bp_vaddr) {
This patch is part of a series that extends the arm64 kernel ABI to allow tagged user pointers (with the top byte set to something other than 0x00) to be passed as syscall arguments.
stack_map_get_build_id_offset() uses provided user pointers for vma lookups, which can only be done with untagged pointers.
Untag the user pointer in this function for doing the lookup and calculating the offset, but save it as is in the bpf_stack_build_id struct.
Signed-off-by: Andrey Konovalov andreyknvl@google.com
---
 kernel/bpf/stackmap.c |  6 ++++--
 p                     | 45 -------------------------------------------
 2 files changed, 4 insertions(+), 47 deletions(-)
 delete mode 100644 p

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 950ab2f28922..bb89341d3faf 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -320,7 +320,9 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 	}
 
 	for (i = 0; i < trace_nr; i++) {
-		vma = find_vma(current->mm, ips[i]);
+		u64 untagged_ip = untagged_addr(ips[i]);
+
+		vma = find_vma(current->mm, untagged_ip);
 		if (!vma || stack_map_get_build_id(vma, id_offs[i].build_id)) {
 			/* per entry fall back to ips */
 			id_offs[i].status = BPF_STACK_BUILD_ID_IP;
@@ -328,7 +330,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 			memset(id_offs[i].build_id, 0, BPF_BUILD_ID_SIZE);
 			continue;
 		}
-		id_offs[i].offset = (vma->vm_pgoff << PAGE_SHIFT) + ips[i]
+		id_offs[i].offset = (vma->vm_pgoff << PAGE_SHIFT) + untagged_ip
 			- vma->vm_start;
 		id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
 	}
diff --git a/p b/p
deleted file mode 100644
index 9d6fa5386e55..000000000000
--- a/p
+++ /dev/null
@@ -1,45 +0,0 @@
-commit 1fa6fadf644859e8a6a8ecce258444b49be8c7ee
-Author: Andrey Konovalov andreyknvl@google.com
-Date:   Mon Mar 4 17:20:32 2019 +0100
-
-    kasan: fix coccinelle warnings in kasan_p*_table
-
-    kasan_p4d_table, kasan_pmd_table and kasan_pud_table are declared as
-    returning bool, but return 0 instead of false, which produces a coccinelle
-    warning. Fix it.
-
-    Fixes: 0207df4fa1a8 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
-    Reported-by: kbuild test robot lkp@intel.com
-    Signed-off-by: Andrey Konovalov andreyknvl@google.com
-
-diff --git a/mm/kasan/init.c b/mm/kasan/init.c
-index 45a1b5e38e1e..fcaa1ca03175 100644
---- a/mm/kasan/init.c
-+++ b/mm/kasan/init.c
-@@ -42,7 +42,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
- #else
- static inline bool kasan_p4d_table(pgd_t pgd)
- {
--	return 0;
-+	return false;
- }
- #endif
- #if CONFIG_PGTABLE_LEVELS > 3
-@@ -54,7 +54,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
- #else
- static inline bool kasan_pud_table(p4d_t p4d)
- {
--	return 0;
-+	return false;
- }
- #endif
- #if CONFIG_PGTABLE_LEVELS > 2
-@@ -66,7 +66,7 @@ static inline bool kasan_pmd_table(pud_t pud)
- #else
- static inline bool kasan_pmd_table(pud_t pud)
- {
--	return 0;
-+	return false;
- }
- #endif
- pte_t kasan_early_shadow_pte[PTRS_PER_PTE] __page_aligned_bss;
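The offset calculation changed by the stackmap patch above can be sketched in userspace to show what goes wrong when the tag is left in place. This is only an illustration: `SKETCH_PAGE_SHIFT`, `sketch_file_offset()` and `sketch_file_offset_tagged()` are hypothetical names, a 4 KiB page size is assumed, and untagging is modelled by clearing the top byte, which is only valid for user addresses.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as in a common arm64 configuration. */
#define SKETCH_PAGE_SHIFT	12

/*
 * Hypothetical model of the fixed computation: the tag is stripped before
 * the ip is turned into an offset within the mapped file.
 */
static uint64_t sketch_file_offset(uint64_t vm_start, uint64_t vm_pgoff,
				   uint64_t ip)
{
	uint64_t untagged_ip = ip & ~(0xffULL << 56);

	return (vm_pgoff << SKETCH_PAGE_SHIFT) + untagged_ip - vm_start;
}

/* The same arithmetic without untagging; the tag leaks into the result. */
static uint64_t sketch_file_offset_tagged(uint64_t vm_start, uint64_t vm_pgoff,
					  uint64_t ip)
{
	return (vm_pgoff << SKETCH_PAGE_SHIFT) + ip - vm_start;
}
```

For a mapping that starts at 0x7f00000000 with a page offset of 2, the tagged ip 0x4200007f00000123 yields 0x2123 once untagged, whereas the unmodified arithmetic yields 0x4200000000002123.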
This patch is part of a series that extends the arm64 kernel ABI to allow tagged user pointers (with the top byte set to something other than 0x00) to be passed as syscall arguments.
Document the ABI changes in Documentation/arm64/tagged-pointers.txt.
Signed-off-by: Andrey Konovalov andreyknvl@google.com
---
 Documentation/arm64/tagged-pointers.txt | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/Documentation/arm64/tagged-pointers.txt b/Documentation/arm64/tagged-pointers.txt
index a25a99e82bb1..07fdddeacad0 100644
--- a/Documentation/arm64/tagged-pointers.txt
+++ b/Documentation/arm64/tagged-pointers.txt
@@ -17,13 +17,15 @@ this byte for application use.
 Passing tagged addresses to the kernel
 --------------------------------------
 
-All interpretation of userspace memory addresses by the kernel assumes
-an address tag of 0x00.
+The kernel supports tags in pointer arguments (including pointers in
+structures) of syscalls, however such pointers must point to memory ranges
+obtained by anonymous mmap() or brk().
 
-This includes, but is not limited to, addresses found in:
+The kernel supports tags in user fault addresses. However the fault_address
+field in the sigcontext struct will contain an untagged address.
 
- - pointer arguments to system calls, including pointers in structures
-   passed to system calls,
+All other interpretations of userspace memory addresses by the kernel
+assume an address tag of 0x00, in particular:
 
  - the stack pointer (sp), e.g. when interpreting it to deliver a
    signal,
@@ -33,11 +35,7 @@ This includes, but is not limited to, addresses found in:
 
 Using non-zero address tags in any of these locations may result in an
 error code being returned, a (fatal) signal being raised, or other modes
-of failure.
-
-For these reasons, passing non-zero address tags to the kernel via
-system calls is forbidden, and using a non-zero address tag for sp is
-strongly discouraged.
+of failure. Using a non-zero address tag for sp is strongly discouraged.
 
 Programs maintaining a frame pointer and frame records that use non-zero
 address tags may suffer impaired or inaccurate debug and profiling
On 15/03/2019 19:51, Andrey Konovalov wrote:
This patch is part of a series that extends the arm64 kernel ABI to allow tagged user pointers (with the top byte set to something other than 0x00) to be passed as syscall arguments.
Document the ABI changes in Documentation/arm64/tagged-pointers.txt.
Signed-off-by: Andrey Konovalov andreyknvl@google.com
Documentation/arm64/tagged-pointers.txt | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/Documentation/arm64/tagged-pointers.txt b/Documentation/arm64/tagged-pointers.txt index a25a99e82bb1..07fdddeacad0 100644 --- a/Documentation/arm64/tagged-pointers.txt +++ b/Documentation/arm64/tagged-pointers.txt @@ -17,13 +17,15 @@ this byte for application use. Passing tagged addresses to the kernel
-All interpretation of userspace memory addresses by the kernel assumes -an address tag of 0x00. +The kernel supports tags in pointer arguments (including pointers in +structures) of syscalls, however such pointers must point to memory ranges +obtained by anonymous mmap() or brk(). -This includes, but is not limited to, addresses found in: +The kernel supports tags in user fault addresses. However the fault_address +field in the sigcontext struct will contain an untagged address.
- pointer arguments to system calls, including pointers in structures
- passed to system calls,
+All other interpretations of userspace memory addresses by the kernel +assume an address tag of 0x00, in particular:
- the stack pointer (sp), e.g. when interpreting it to deliver a signal,
@@ -33,11 +35,7 @@ This includes, but is not limited to, addresses found in: Using non-zero address tags in any of these locations may result in an error code being returned, a (fatal) signal being raised, or other modes -of failure.
-For these reasons, passing non-zero address tags to the kernel via -system calls is forbidden, and using a non-zero address tag for sp is -strongly discouraged. +of failure. Using a non-zero address tag for sp is strongly discouraged.
I don't understand why we should keep such a limitation. For MTE, tagging SP is something we are definitely considering. This does bother userspace software in some rare cases, but I'm not sure in what way it bothers the kernel.
Kevin
Programs maintaining a frame pointer and frame records that use non-zero address tags may suffer impaired or inaccurate debug and profiling
On Mon, Mar 18, 2019 at 2:26 PM Kevin Brodsky kevin.brodsky@arm.com wrote:
On 15/03/2019 19:51, Andrey Konovalov wrote:
This patch is part of a series that extends the arm64 kernel ABI to allow tagged user pointers (with the top byte set to something other than 0x00) to be passed as syscall arguments.
Document the ABI changes in Documentation/arm64/tagged-pointers.txt.
Signed-off-by: Andrey Konovalov andreyknvl@google.com
Documentation/arm64/tagged-pointers.txt | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/Documentation/arm64/tagged-pointers.txt b/Documentation/arm64/tagged-pointers.txt index a25a99e82bb1..07fdddeacad0 100644 --- a/Documentation/arm64/tagged-pointers.txt +++ b/Documentation/arm64/tagged-pointers.txt @@ -17,13 +17,15 @@ this byte for application use. Passing tagged addresses to the kernel
-All interpretation of userspace memory addresses by the kernel assumes -an address tag of 0x00. +The kernel supports tags in pointer arguments (including pointers in +structures) of syscalls, however such pointers must point to memory ranges +obtained by anonymous mmap() or brk().
-This includes, but is not limited to, addresses found in: +The kernel supports tags in user fault addresses. However the fault_address +field in the sigcontext struct will contain an untagged address.
- pointer arguments to system calls, including pointers in structures
- passed to system calls,
+All other interpretations of userspace memory addresses by the kernel +assume an address tag of 0x00, in particular:
- the stack pointer (sp), e.g. when interpreting it to deliver a signal,
@@ -33,11 +35,7 @@ This includes, but is not limited to, addresses found in:
Using non-zero address tags in any of these locations may result in an error code being returned, a (fatal) signal being raised, or other modes -of failure.
-For these reasons, passing non-zero address tags to the kernel via -system calls is forbidden, and using a non-zero address tag for sp is -strongly discouraged. +of failure. Using a non-zero address tag for sp is strongly discouraged.
I don't understand why we should keep such a limitation. For MTE, tagging SP is something we are definitely considering. This does bother userspace software in some rare cases, but I'm not sure in what way it bothers the kernel.
I don't mind allowing tagged sp as well, but it seems that it's another ABI relaxation that needs to be handled separately. I'm not sure if we want to include that into this patchset, which is supposed to allow tagged pointers to be passed to syscalls.
Kevin
Programs maintaining a frame pointer and frame records that use non-zero address tags may suffer impaired or inaccurate debug and profiling
This patch is part of a series that extends the arm64 kernel ABI to allow tagged user pointers (with the top byte set to something other than 0x00) to be passed as syscall arguments.
This patch adds a simple test that calls the uname syscall with a tagged user pointer as an argument. If the kernel does not accept tagged user pointers, the test fails with EFAULT.
Signed-off-by: Andrey Konovalov andreyknvl@google.com
---
 tools/testing/selftests/arm64/.gitignore      |  1 +
 tools/testing/selftests/arm64/Makefile        | 11 ++++++++++
 .../testing/selftests/arm64/run_tags_test.sh  | 12 +++++++++++
 tools/testing/selftests/arm64/tags_test.c     | 21 +++++++++++++++++++
 4 files changed, 45 insertions(+)
 create mode 100644 tools/testing/selftests/arm64/.gitignore
 create mode 100644 tools/testing/selftests/arm64/Makefile
 create mode 100755 tools/testing/selftests/arm64/run_tags_test.sh
 create mode 100644 tools/testing/selftests/arm64/tags_test.c

diff --git a/tools/testing/selftests/arm64/.gitignore b/tools/testing/selftests/arm64/.gitignore
new file mode 100644
index 000000000000..e8fae8d61ed6
--- /dev/null
+++ b/tools/testing/selftests/arm64/.gitignore
@@ -0,0 +1 @@
+tags_test
diff --git a/tools/testing/selftests/arm64/Makefile b/tools/testing/selftests/arm64/Makefile
new file mode 100644
index 000000000000..a61b2e743e99
--- /dev/null
+++ b/tools/testing/selftests/arm64/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# ARCH can be overridden by the user for cross compiling
+ARCH ?= $(shell uname -m 2>/dev/null || echo not)
+
+ifneq (,$(filter $(ARCH),aarch64 arm64))
+TEST_GEN_PROGS := tags_test
+TEST_PROGS := run_tags_test.sh
+endif
+
+include ../lib.mk
diff --git a/tools/testing/selftests/arm64/run_tags_test.sh b/tools/testing/selftests/arm64/run_tags_test.sh
new file mode 100755
index 000000000000..745f11379930
--- /dev/null
+++ b/tools/testing/selftests/arm64/run_tags_test.sh
@@ -0,0 +1,12 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+echo "--------------------"
+echo "running tags test"
+echo "--------------------"
+./tags_test
+if [ $? -ne 0 ]; then
+	echo "[FAIL]"
+else
+	echo "[PASS]"
+fi
diff --git a/tools/testing/selftests/arm64/tags_test.c b/tools/testing/selftests/arm64/tags_test.c
new file mode 100644
index 000000000000..2bd1830a7ebe
--- /dev/null
+++ b/tools/testing/selftests/arm64/tags_test.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <sys/utsname.h>
+
+#define SHIFT_TAG(tag)		((uint64_t)(tag) << 56)
+#define SET_TAG(ptr, tag)	(((uint64_t)(ptr) & ~SHIFT_TAG(0xff)) | \
+					SHIFT_TAG(tag))
+
+int main(void)
+{
+	struct utsname *ptr = (struct utsname *)malloc(sizeof(*ptr));
+	void *tagged_ptr = (void *)SET_TAG(ptr, 0x42);
+	int err = uname(tagged_ptr);
+
+	free(ptr);
+	return err;
+}
On arm64 the TCR_EL1.TBI0 bit has always been enabled in the Linux kernel, so userspace (EL0) is allowed to set a non-zero value in the top byte of a pointer, but the resulting pointers are not allowed at the user-kernel syscall ABI boundary.
This patchset proposes a relaxation of the ABI and a mechanism to advertise it to userspace via AT_FLAGS.
The rationale behind the choice of AT_FLAGS is that the Unix System V ABI defines AT_FLAGS as "flags", leaving some degree of freedom in interpretation. There have been two previous attempts to use AT_FLAGS in the Linux kernel for different reasons: the first was more generic and was used to expose support for the GNU STACK NX feature [1], and the second was done for the MIPS architecture and was used to expose support for the "MIPS ABI Extension for IEEE Std 754 Non-Compliant Interlinking" [2]. Neither change is currently merged in mainline. The only architecture that currently reserves some of the bits in AT_FLAGS is MIPS, which introduced the concept of a platform specific ABI (psABI) reserving the top byte [3].
When ARM64_AT_FLAGS_SYSCALL_TBI is set, the kernel advertises to userspace that a relaxed ABI is supported, so tagged pointers may be passed to syscalls when they point to memory ranges obtained by anonymous mmap() or brk().
Userspace _must_ verify that the flag is set before passing tagged pointers to the syscalls allowed by this relaxation.
More generally, exposing the ARM64_AT_FLAGS_SYSCALL_TBI flag and requiring software to check that the feature is present before using the associated functionality provides a degree of control over a future decision to disable the feature, without breaking userspace as a consequence.
The change required a modification of the common ELF code, because in Linux AT_FLAGS is currently set to zero by default by the kernel.
The newly added flag has been verified on arm64 using the code below.

#include <stdio.h>
#include <stdbool.h>
#include <sys/auxv.h>

/* Bit 0 of AT_FLAGS advertises the relaxed syscall ABI. */
#define ARM64_AT_FLAGS_SYSCALL_TBI	(1 << 0)

/* Read AT_FLAGS from the auxiliary vector and test the TBI bit. */
bool arm64_syscall_tbi_is_present(void)
{
	unsigned long at_flags = getauxval(AT_FLAGS);
	if (at_flags & ARM64_AT_FLAGS_SYSCALL_TBI)
		return true;

	return false;
}

int main(void)
{
	if (arm64_syscall_tbi_is_present())
		printf("ARM64_AT_FLAGS_SYSCALL_TBI is present\n");

	return 0;
}
This patchset should be merged together with [4].
[1] https://patchwork.ozlabs.org/patch/579578/
[2] https://lore.kernel.org/patchwork/cover/618280/
[3] ftp://www.linux-mips.org/pub/linux/mips/doc/ABI/psABI_mips3.0.pdf
[4] https://patchwork.kernel.org/cover/10674351/

ABI References:
---------------
Sco SysV ABI: http://www.sco.com/developers/gabi/2003-12-17/contents.html
PowerPC AUXV: http://openpowerfoundation.org/wp-content/uploads/resources/leabi/content/db...
AMD64 ABI: https://www.cs.tufts.edu/comp/40-2012f/readings/amd64-abi.pdf
x86 ABI: https://www.uclibc.org/docs/psABI-i386.pdf
MIPS ABI: ftp://www.linux-mips.org/pub/linux/mips/doc/ABI/psABI_mips3.0.pdf
ARM ABI: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044f/IHI0044F_aaelf.pdf
SPARC ABI: http://math-atlas.sourceforge.net/devel/assembly/abi_sysV_sparc.pdf
Cc: Alexander Viro viro@zeniv.linux.org.uk
Cc: Alexei Starovoitov ast@kernel.org
Cc: Andrew Morton akpm@linux-foundation.org
Cc: Andrey Konovalov andreyknvl@google.com
Cc: Arnaldo Carvalho de Melo acme@kernel.org
Cc: Branislav Rankov Branislav.Rankov@arm.com
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Chintan Pandya cpandya@codeaurora.org
Cc: Daniel Borkmann daniel@iogearbox.net
Cc: Dave Martin Dave.Martin@arm.com
Cc: "David S. Miller" davem@davemloft.net
Cc: Dmitry Vyukov dvyukov@google.com
Cc: Eric Dumazet edumazet@google.com
Cc: Evgeniy Stepanov eugenis@google.com
Cc: Graeme Barnes Graeme.Barnes@arm.com
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org
Cc: Ingo Molnar mingo@kernel.org
Cc: Jacob Bramley Jacob.Bramley@arm.com
Cc: Kate Stewart kstewart@linuxfoundation.org
Cc: Kees Cook keescook@chromium.org
Cc: Kevin Brodsky kevin.brodsky@arm.com
Cc: "Kirill A . Shutemov" kirill.shutemov@linux.intel.com
Cc: Kostya Serebryany kcc@google.com
Cc: Lee Smith Lee.Smith@arm.com
Cc: Luc Van Oostenryck luc.vanoostenryck@gmail.com
Cc: Mark Rutland mark.rutland@arm.com
Cc: Peter Zijlstra peterz@infradead.org
Cc: Ramana Radhakrishnan Ramana.Radhakrishnan@arm.com
Cc: Robin Murphy robin.murphy@arm.com
Cc: Ruben Ayrapetyan Ruben.Ayrapetyan@arm.com
Cc: Shuah Khan shuah@kernel.org
Cc: Steven Rostedt rostedt@goodmis.org
Cc: Szabolcs Nagy Szabolcs.Nagy@arm.com
Cc: Will Deacon will.deacon@arm.com
Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com
Changes:
--------
v2:
  - Rebased on 5.1-rc1
  - Addressed review comments
  - Modified tagged-pointers.txt to be compliant with the new ABI relaxation

Vincenzo Frascino (4):
  elf: Make AT_FLAGS arch configurable
  arm64: Define Documentation/arm64/elf_at_flags.txt
  arm64: Relax Documentation/arm64/tagged-pointers.txt
  arm64: elf: Advertise relaxed ABI

 Documentation/arm64/elf_at_flags.txt    | 133 ++++++++++++++++++++++++
 Documentation/arm64/tagged-pointers.txt |  23 ++--
 arch/arm64/include/asm/atflags.h        |   7 ++
 arch/arm64/include/asm/elf.h            |   5 +
 arch/arm64/include/uapi/asm/atflags.h   |   8 ++
 fs/binfmt_elf.c                         |   6 +-
 fs/binfmt_elf_fdpic.c                   |   6 +-
 fs/compat_binfmt_elf.c                  |   5 +
 8 files changed, 184 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/arm64/elf_at_flags.txt
 create mode 100644 arch/arm64/include/asm/atflags.h
 create mode 100644 arch/arm64/include/uapi/asm/atflags.h
Currently, the kernel sets AT_FLAGS in the ELF auxiliary vector to 0 by default. Some architectures might need to expose a non-zero value to userspace in order to advertise platform-specific ABI features.
Make AT_FLAGS configurable by the architectures that require it.
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Will Deacon will.deacon@arm.com
CC: Andrey Konovalov andreyknvl@google.com
CC: Alexander Viro viro@zeniv.linux.org.uk
Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com
---
 fs/binfmt_elf.c        | 6 +++++-
 fs/binfmt_elf_fdpic.c  | 6 +++++-
 fs/compat_binfmt_elf.c | 5 +++++
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 7d09d125f148..f699a9ef5112 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -84,6 +84,10 @@ static int elf_core_dump(struct coredump_params *cprm);
 #define ELF_CORE_EFLAGS 0
 #endif

+#ifndef ELF_AT_FLAGS
+#define ELF_AT_FLAGS 0
+#endif
+
 #define ELF_PAGESTART(_v) ((_v) & ~(unsigned long)(ELF_MIN_ALIGN-1))
 #define ELF_PAGEOFFSET(_v) ((_v) & (ELF_MIN_ALIGN-1))
 #define ELF_PAGEALIGN(_v) (((_v) + ELF_MIN_ALIGN - 1) & ~(ELF_MIN_ALIGN - 1))
@@ -249,7 +253,7 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
 	NEW_AUX_ENT(AT_PHENT, sizeof(struct elf_phdr));
 	NEW_AUX_ENT(AT_PHNUM, exec->e_phnum);
 	NEW_AUX_ENT(AT_BASE, interp_load_addr);
-	NEW_AUX_ENT(AT_FLAGS, 0);
+	NEW_AUX_ENT(AT_FLAGS, ELF_AT_FLAGS);
 	NEW_AUX_ENT(AT_ENTRY, exec->e_entry);
 	NEW_AUX_ENT(AT_UID, from_kuid_munged(cred->user_ns, cred->uid));
 	NEW_AUX_ENT(AT_EUID, from_kuid_munged(cred->user_ns, cred->euid));
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index b53bb3729ac1..cf1e680a6b88 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -82,6 +82,10 @@ static int elf_fdpic_map_file_by_direct_mmap(struct elf_fdpic_params *,
 static int elf_fdpic_core_dump(struct coredump_params *cprm);
 #endif

+#ifndef ELF_AT_FLAGS
+#define ELF_AT_FLAGS 0
+#endif
+
 static struct linux_binfmt elf_fdpic_format = {
 	.module = THIS_MODULE,
 	.load_binary = load_elf_fdpic_binary,
@@ -651,7 +655,7 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm,
 	NEW_AUX_ENT(AT_PHENT, sizeof(struct elf_phdr));
 	NEW_AUX_ENT(AT_PHNUM, exec_params->hdr.e_phnum);
 	NEW_AUX_ENT(AT_BASE, interp_params->elfhdr_addr);
-	NEW_AUX_ENT(AT_FLAGS, 0);
+	NEW_AUX_ENT(AT_FLAGS, ELF_AT_FLAGS);
 	NEW_AUX_ENT(AT_ENTRY, exec_params->entry_addr);
 	NEW_AUX_ENT(AT_UID, (elf_addr_t) from_kuid_munged(cred->user_ns, cred->uid));
 	NEW_AUX_ENT(AT_EUID, (elf_addr_t) from_kuid_munged(cred->user_ns, cred->euid));
diff --git a/fs/compat_binfmt_elf.c b/fs/compat_binfmt_elf.c
index 15f6e96b3bd9..a21cf99701ae 100644
--- a/fs/compat_binfmt_elf.c
+++ b/fs/compat_binfmt_elf.c
@@ -79,6 +79,11 @@
 #define ELF_HWCAP2 COMPAT_ELF_HWCAP2
 #endif

+#ifdef COMPAT_ELF_AT_FLAGS
+#undef ELF_AT_FLAGS
+#define ELF_AT_FLAGS COMPAT_ELF_AT_FLAGS
+#endif
+
 #ifdef COMPAT_ARCH_DLINFO
 #undef ARCH_DLINFO
 #define ARCH_DLINFO COMPAT_ARCH_DLINFO
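The fallback pattern used by this patch can be tried outside the kernel. What follows is a minimal userspace sketch, not kernel code: the "architecture header" part, the helpers elf_at_flags_value() and check_at_flags(), and the value chosen for ARM64_AT_FLAGS_SYSCALL_TBI (bit 0, taken from the "bit[0]" wording later in this series) are all assumptions made for illustration.

```c
#include <assert.h>

/*
 * Stand-in for an architecture header (assumption: the arm64 patch in this
 * series provides something equivalent). Bit 0 follows the "bit[0]"
 * description in elf_at_flags.txt.
 */
#define ARM64_AT_FLAGS_SYSCALL_TBI	(1UL << 0)
#define ELF_AT_FLAGS			ARM64_AT_FLAGS_SYSCALL_TBI

/* Generic fallback, mirroring the lines added to fs/binfmt_elf.c. */
#ifndef ELF_AT_FLAGS
#define ELF_AT_FLAGS	0
#endif

/* Hypothetical helper: the value that would end up in the AT_FLAGS entry. */
static inline unsigned long elf_at_flags_value(void)
{
	return ELF_AT_FLAGS;
}

/* Hypothetical self-check: with the override in place, bit 0 is set. */
static inline void check_at_flags(void)
{
	assert(elf_at_flags_value() & ARM64_AT_FLAGS_SYSCALL_TBI);
}
```

Removing the two lines of the stand-in header makes the #ifndef branch take effect, and elf_at_flags_value() then returns 0, which matches the behaviour before this patch.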
On arm64 the TCR_EL1.TBI0 bit has always been enabled, hence userspace (EL0) is allowed to set a non-zero value in the top byte, but the resulting pointers are not allowed at the user-kernel syscall ABI boundary.
With the relaxed ABI proposed through this document, it is now possible to pass tagged pointers to the syscalls, when these pointers are in memory ranges obtained by an anonymous (MAP_ANONYMOUS) mmap() or brk().
This change in the ABI requires a mechanism to inform the userspace that such an option is available.
Specify and document the way in which AT_FLAGS can be used to advertise this feature to the userspace.
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Will Deacon will.deacon@arm.com
CC: Andrey Konovalov andreyknvl@google.com
Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com

Squash with "arm64: Define Documentation/arm64/elf_at_flags.txt"
---
 Documentation/arm64/elf_at_flags.txt | 133 +++++++++++++++++++++++++++
 1 file changed, 133 insertions(+)
 create mode 100644 Documentation/arm64/elf_at_flags.txt

diff --git a/Documentation/arm64/elf_at_flags.txt b/Documentation/arm64/elf_at_flags.txt
new file mode 100644
index 000000000000..9b3494207c14
--- /dev/null
+++ b/Documentation/arm64/elf_at_flags.txt
@@ -0,0 +1,133 @@
+ARM64 ELF AT_FLAGS
+==================
+
+This document describes the usage and semantics of AT_FLAGS on arm64.
+
+1. Introduction
+---------------
+
+AT_FLAGS is part of the Auxiliary Vector, contains the flags and it
+is set to zero by the kernel on arm64 unless one or more of the
+features detailed in paragraph 2 are present.
+
+The auxiliary vector can be accessed by the userspace using the
+getauxval() API provided by the C library.
+getauxval() returns an unsigned long and when a flag is present in
+the AT_FLAGS, the corresponding bit in the returned value is set to 1.
+
+The AT_FLAGS with a "defined semantics" on arm64 are exposed to the
+userspace via user API (uapi/asm/atflags.h).
+The AT_FLAGS bits with "undefined semantics" are set to zero by default.
+This means that the AT_FLAGS bits to which this document does not assign
+an explicit meaning are to be intended reserved for future use.
+The kernel will populate all such bits with zero until meanings are
+assigned to them. If and when meanings are assigned, it is guaranteed
+that they will not impact the functional operation of existing userspace
+software. Userspace software should ignore any AT_FLAGS bit whose meaning
+is not defined when the software is written.
+
+The userspace software can test for features by acquiring the AT_FLAGS
+entry of the auxiliary vector, and testing whether a relevant flag
+is set.
+
+Example of a userspace test function:
+
+bool feature_x_is_present(void)
+{
+	unsigned long at_flags = getauxval(AT_FLAGS);
+	if (at_flags & FEATURE_X)
+		return true;
+
+	return false;
+}
+
+Where the software relies on a feature advertised by AT_FLAGS, it
+must check that the feature is present before attempting to
+use it.
+
+2. Features exposed via AT_FLAGS
+--------------------------------
+
+bit[0]: ARM64_AT_FLAGS_SYSCALL_TBI
+
+    On arm64 the TCR_EL1.TBI0 bit has been always enabled on the arm64
+    kernel, hence the userspace (EL0) is allowed to set a non-zero value
+    in the top byte but the resulting pointers are not allowed at the
+    user-kernel syscall ABI boundary.
+    When bit[0] is set to 1 the kernel is advertising to the userspace
+    that a relaxed ABI is supported hence this type of pointers are now
+    allowed to be passed to the syscalls, when these pointers are in
+    memory ranges privately owned by a process and obtained by the
+    process in accordance with the definition of "valid tagged pointer"
+    in paragraph 3.
+    In these cases the tag is preserved as the pointer goes through the
+    kernel. Only when the kernel needs to check if a pointer is coming
+    from userspace an untag operation is required.
+
+3. ARM64_AT_FLAGS_SYSCALL_TBI
+-----------------------------
+
+From the kernel syscall interface prospective, we define, for the purposes
+of this document, a "valid tagged pointer" as a pointer that either it has
+a zero value set in the top byte or it has a non-zero value, it is in memory
+ranges privately owned by a userspace process and it is obtained in one of
+the following ways:
+ - mmap() done by the process itself, where either:
+    * flags = MAP_PRIVATE | MAP_ANONYMOUS
+    * flags = MAP_PRIVATE and the file descriptor refers to a regular
+      file or "/dev/zero"
+ - a mapping below sbrk(0) done by the process itself
+ - any memory mapped by the kernel in the process's address space during
+   creation and following the restrictions presented above (i.e. data, bss,
+   stack).
+
+When the ARM64_AT_FLAGS_SYSCALL_TBI flag is set by the kernel, the following
+behaviours are guaranteed by the ABI:
+
+ - Every current or newly introduced syscall can accept any valid tagged
+   pointers.
+
+ - If a non valid tagged pointer is passed to a syscall then the behaviour
+   is undefined.
+
+ - Every valid tagged pointer is expected to work as an untagged one.
+
+ - The kernel preserves any valid tagged pointers and returns them to the
+   userspace unchanged in all the cases except the ones documented in the
+   "Preserving tags" paragraph of tagged-pointers.txt.
+
+A definition of the meaning of tagged pointers on arm64 can be found in:
+Documentation/arm64/tagged-pointers.txt.
+
+Example of correct usage (pseudo-code) for a userspace application:
+
+bool arm64_syscall_tbi_is_present(void)
+{
+	unsigned long at_flags = getauxval(AT_FLAGS);
+	if (at_flags & ARM64_AT_FLAGS_SYSCALL_TBI)
+		return true;
+
+	return false;
+}
+
+void main(void)
+{
+	char *addr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
+			  MAP_ANONYMOUS, -1, 0);
+
+	int fd = open("test.txt", O_WRONLY);
+
+	/* Check if the relaxed ABI is supported */
+	if (arm64_syscall_tbi_is_present()) {
+		/* Add a tag to the pointer */
+		addr = tag_pointer(addr);
+	}
+
+	strcpy("Hello World\n", addr);
+
+	/* Write to a file */
+	write(fd, addr, sizeof(addr));
+
+	close(fd);
+}
+
Hi Vincenzo,
On Mon, Mar 18, 2019 at 10:06 PM Vincenzo Frascino vincenzo.frascino@arm.com wrote:
On arm64 the TCR_EL1.TBI0 bit has always been enabled, hence userspace (EL0) is allowed to set a non-zero value in the top byte, but the resulting pointers are not allowed at the user-kernel syscall ABI boundary.
With the relaxed ABI proposed through this document, it is now possible to pass tagged pointers to the syscalls, when these pointers are in memory ranges obtained by an anonymous (MAP_ANONYMOUS) mmap() or brk().
This change in the ABI requires a mechanism to inform the userspace that such an option is available.
Specify and document the way in which AT_FLAGS can be used to advertise this feature to the userspace.
Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com CC: Andrey Konovalov andreyknvl@google.com Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com
Squash with "arm64: Define Documentation/arm64/elf_at_flags.txt"
Documentation/arm64/elf_at_flags.txt | 133 +++++++++++++++++++++++++++ 1 file changed, 133 insertions(+) create mode 100644 Documentation/arm64/elf_at_flags.txt
diff --git a/Documentation/arm64/elf_at_flags.txt b/Documentation/arm64/elf_at_flags.txt new file mode 100644 index 000000000000..9b3494207c14 --- /dev/null +++ b/Documentation/arm64/elf_at_flags.txt @@ -0,0 +1,133 @@ +ARM64 ELF AT_FLAGS +==================
+This document describes the usage and semantics of AT_FLAGS on arm64.
+1. Introduction +---------------
+AT_FLAGS is part of the Auxiliary Vector, contains the flags and it +is set to zero by the kernel on arm64 unless one or more of the +features detailed in paragraph 2 are present.
+The auxiliary vector can be accessed by the userspace using the +getauxval() API provided by the C library. +getauxval() returns an unsigned long and when a flag is present in +the AT_FLAGS, the corresponding bit in the returned value is set to 1.
+The AT_FLAGS with a "defined semantics" on arm64 are exposed to the +userspace via user API (uapi/asm/atflags.h). +The AT_FLAGS bits with "undefined semantics" are set to zero by default. +This means that the AT_FLAGS bits to which this document does not assign +an explicit meaning are to be intended reserved for future use. +The kernel will populate all such bits with zero until meanings are +assigned to them. If and when meanings are assigned, it is guaranteed +that they will not impact the functional operation of existing userspace +software. Userspace software should ignore any AT_FLAGS bit whose meaning +is not defined when the software is written.
+The userspace software can test for features by acquiring the AT_FLAGS +entry of the auxiliary vector, and testing whether a relevant flag +is set.
+Example of a userspace test function:
+bool feature_x_is_present(void) +{
unsigned long at_flags = getauxval(AT_FLAGS);
if (at_flags & FEATURE_X)
return true;
return false;
+}
+Where the software relies on a feature advertised by AT_FLAGS, it +must check that the feature is present before attempting to +use it.
+2. Features exposed via AT_FLAGS +--------------------------------
+bit[0]: ARM64_AT_FLAGS_SYSCALL_TBI
- On arm64 the TCR_EL1.TBI0 bit has been always enabled on the arm64
- kernel, hence the userspace (EL0) is allowed to set a non-zero value
- in the top byte but the resulting pointers are not allowed at the
- user-kernel syscall ABI boundary.
- When bit[0] is set to 1 the kernel is advertising to the userspace
- that a relaxed ABI is supported hence this type of pointers are now
- allowed to be passed to the syscalls, when these pointers are in
- memory ranges privately owned by a process and obtained by the
- process in accordance with the definition of "valid tagged pointer"
- in paragraph 3.
- In these cases the tag is preserved as the pointer goes through the
- kernel. Only when the kernel needs to check if a pointer is coming
- from userspace an untag operation is required.
+3. ARM64_AT_FLAGS_SYSCALL_TBI +-----------------------------
+From the kernel syscall interface prospective, we define, for the purposes +of this document, a "valid tagged pointer" as a pointer that either it has +a zero value set in the top byte or it has a non-zero value, it is in memory +ranges privately owned by a userspace process and it is obtained in one of +the following ways:
- mmap() done by the process itself, where either:
- flags = MAP_PRIVATE | MAP_ANONYMOUS
- flags = MAP_PRIVATE and the file descriptor refers to a regular
file or "/dev/zero"
- a mapping below sbrk(0) done by the process itself
- any memory mapped by the kernel in the process's address space during
- creation and following the restrictions presented above (i.e. data, bss,
- stack).
+When the ARM64_AT_FLAGS_SYSCALL_TBI flag is set by the kernel, the following +behaviours are guaranteed by the ABI:
- Every current or newly introduced syscall can accept any valid tagged
- pointers.
- If a non valid tagged pointer is passed to a syscall then the behaviour
- is undefined.
- Every valid tagged pointer is expected to work as an untagged one.
- The kernel preserves any valid tagged pointers and returns them to the
- userspace unchanged in all the cases except the ones documented in the
- "Preserving tags" paragraph of tagged-pointers.txt.
+A definition of the meaning of tagged pointers on arm64 can be found in: +Documentation/arm64/tagged-pointers.txt.
+Example of correct usage (pseudo-code) for a userspace application:
+bool arm64_syscall_tbi_is_present(void) +{
unsigned long at_flags = getauxval(AT_FLAGS);
if (at_flags & ARM64_AT_FLAGS_SYSCALL_TBI)
return true;
return false;
+}
+void main(void) +{
char *addr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
MAP_ANONYMOUS, -1, 0);
int fd = open("test.txt", O_WRONLY);
/* Check if the relaxed ABI is supported */
if (arm64_syscall_tbi_is_present()) {
/* Add a tag to the pointer */
addr = tag_pointer(addr);
}
strcpy("Hello World\n", addr);
Nit: s/strcpy("Hello World\n", addr)/strcpy(addr, "Hello World\n")
Thanks, Amit D
/* Write to a file */
write(fd, addr, sizeof(addr));
close(fd);
+}
-- 2.21.0
linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Fri, Mar 22, 2019 at 11:52:37AM +0530, Amit Daniel Kachhap wrote:
On Mon, Mar 18, 2019 at 10:06 PM Vincenzo Frascino vincenzo.frascino@arm.com wrote:
+Example of correct usage (pseudo-code) for a userspace application:
+bool arm64_syscall_tbi_is_present(void) +{
unsigned long at_flags = getauxval(AT_FLAGS);
if (at_flags & ARM64_AT_FLAGS_SYSCALL_TBI)
return true;
return false;
+}
+void main(void) +{
char *addr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
MAP_ANONYMOUS, -1, 0);
int fd = open("test.txt", O_WRONLY);
/* Check if the relaxed ABI is supported */
if (arm64_syscall_tbi_is_present()) {
/* Add a tag to the pointer */
addr = tag_pointer(addr);
}
strcpy("Hello World\n", addr);
Nit: s/strcpy("Hello World\n", addr)/strcpy(addr, "Hello World\n")
Not exactly a nit ;).
/* Write to a file */
write(fd, addr, sizeof(addr));
I presume this was supposed to write "Hello World\n" to a file, but sizeof(addr) is the size of the pointer (8 bytes on arm64), not the length of the string.
Since we already support tagged pointers in user space (as long as they are not passed into the kernel), the above example could tag the pointer unconditionally and only clear it before write() if !arm64_syscall_tbi_is_present().
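The suggestion above could look roughly like the sketch below. It is only an illustration under stated assumptions: tag_pointer(), untag_pointer() and write_tagged() are hypothetical helpers, the tag is assumed to live in bits 63:56, and ARM64_AT_FLAGS_SYSCALL_TBI is assumed to be bit 0 of AT_FLAGS as proposed in this series. The length is passed explicitly instead of using sizeof() on the pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>
#include <unistd.h>

/* Assumption: bit 0 of AT_FLAGS, as proposed in elf_at_flags.txt. */
#define ARM64_AT_FLAGS_SYSCALL_TBI	(1UL << 0)

#define TAG_SHIFT	56
#define TAG_MASK	((uint64_t)0xff << TAG_SHIFT)

/* Hypothetical helper: put 'tag' in the top byte of a user pointer. */
static inline void *tag_pointer(void *ptr, uint8_t tag)
{
	uint64_t addr = (uint64_t)(uintptr_t)ptr & ~TAG_MASK;

	return (void *)(uintptr_t)(addr | ((uint64_t)tag << TAG_SHIFT));
}

/* Hypothetical helper: clear the top byte of a user pointer. */
static inline void *untag_pointer(void *ptr)
{
	return (void *)(uintptr_t)((uint64_t)(uintptr_t)ptr & ~TAG_MASK);
}

static inline bool arm64_syscall_tbi_is_present(void)
{
	return (getauxval(AT_FLAGS) & ARM64_AT_FLAGS_SYSCALL_TBI) != 0;
}

/*
 * Pass a (possibly tagged) buffer to write(). The tag is kept when the
 * kernel advertises the relaxed ABI and cleared otherwise.
 */
static inline ssize_t write_tagged(int fd, void *buf, size_t len)
{
	if (!arm64_syscall_tbi_is_present())
		buf = untag_pointer(buf);

	return write(fd, buf, len);
}
```

With these helpers, the example in the document would tag addr right after mmap(), use strcpy(addr, "Hello World\n"), and call write_tagged(fd, addr, strlen(addr)). Dereferencing the tagged pointer in userspace relies on TBI, so that part only works on arm64.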
On 18/03/2019 16:35, Vincenzo Frascino wrote:
On arm64 the TCR_EL1.TBI0 bit has always been enabled, hence userspace (EL0) is allowed to set a non-zero value in the top byte, but the resulting pointers are not allowed at the user-kernel syscall ABI boundary.
With the relaxed ABI proposed through this document, it is now possible to pass tagged pointers to the syscalls, when these pointers are in memory ranges obtained by an anonymous (MAP_ANONYMOUS) mmap() or brk().
This change in the ABI requires a mechanism to inform the userspace that such an option is available.
Specify and document the way in which AT_FLAGS can be used to advertise this feature to the userspace.
Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com CC: Andrey Konovalov andreyknvl@google.com Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com
Squash with "arm64: Define Documentation/arm64/elf_at_flags.txt"
Documentation/arm64/elf_at_flags.txt | 133 +++++++++++++++++++++++++++ 1 file changed, 133 insertions(+) create mode 100644 Documentation/arm64/elf_at_flags.txt
diff --git a/Documentation/arm64/elf_at_flags.txt b/Documentation/arm64/elf_at_flags.txt new file mode 100644 index 000000000000..9b3494207c14 --- /dev/null +++ b/Documentation/arm64/elf_at_flags.txt @@ -0,0 +1,133 @@ +ARM64 ELF AT_FLAGS +==================
+This document describes the usage and semantics of AT_FLAGS on arm64.
+1. Introduction +---------------
+AT_FLAGS is part of the Auxiliary Vector, contains the flags and it +is set to zero by the kernel on arm64 unless one or more of the +features detailed in paragraph 2 are present.
+The auxiliary vector can be accessed by the userspace using the +getauxval() API provided by the C library. +getauxval() returns an unsigned long and when a flag is present in +the AT_FLAGS, the corresponding bit in the returned value is set to 1.
+The AT_FLAGS with a "defined semantics" on arm64 are exposed to the +userspace via user API (uapi/asm/atflags.h). +The AT_FLAGS bits with "undefined semantics" are set to zero by default. +This means that the AT_FLAGS bits to which this document does not assign +an explicit meaning are to be intended reserved for future use. +The kernel will populate all such bits with zero until meanings are +assigned to them. If and when meanings are assigned, it is guaranteed +that they will not impact the functional operation of existing userspace +software. Userspace software should ignore any AT_FLAGS bit whose meaning +is not defined when the software is written.
+The userspace software can test for features by acquiring the AT_FLAGS +entry of the auxiliary vector, and testing whether a relevant flag +is set.
+Example of a userspace test function:
+bool feature_x_is_present(void) +{
- unsigned long at_flags = getauxval(AT_FLAGS);
- if (at_flags & FEATURE_X)
return true;
- return false;
+}
+Where the software relies on a feature advertised by AT_FLAGS, it +must check that the feature is present before attempting to +use it.
+2. Features exposed via AT_FLAGS +--------------------------------
+bit[0]: ARM64_AT_FLAGS_SYSCALL_TBI
- On arm64 the TCR_EL1.TBI0 bit has been always enabled on the arm64
- kernel, hence the userspace (EL0) is allowed to set a non-zero value
- in the top byte but the resulting pointers are not allowed at the
- user-kernel syscall ABI boundary.
- When bit[0] is set to 1 the kernel is advertising to the userspace
- that a relaxed ABI is supported hence this type of pointers are now
- allowed to be passed to the syscalls, when these pointers are in
- memory ranges privately owned by a process and obtained by the
- process in accordance with the definition of "valid tagged pointer"
- in paragraph 3.
- In these cases the tag is preserved as the pointer goes through the
- kernel. Only when the kernel needs to check if a pointer is coming
- from userspace an untag operation is required.
I would leave this last sentence out, because:
1. It is an implementation detail that doesn't impact this user ABI.
2. It is not entirely accurate: untagging the pointer may be needed for various kinds of address lookup (like finding the corresponding VMA), at which point the kernel usually already knows it is a userspace pointer.
+3. ARM64_AT_FLAGS_SYSCALL_TBI +-----------------------------
+From the kernel syscall interface prospective, we define, for the purposes +of this document, a "valid tagged pointer" as a pointer that either it has +a zero value set in the top byte or it has a non-zero value, it is in memory +ranges privately owned by a userspace process and it is obtained in one of +the following ways:
- mmap() done by the process itself, where either:
- flags = MAP_PRIVATE | MAP_ANONYMOUS
- flags = MAP_PRIVATE and the file descriptor refers to a regular
file or "/dev/zero"
- a mapping below sbrk(0) done by the process itself
I don't think that's very clear, this doesn't say how the mapping is obtained. Maybe "a mapping obtained by the process using brk() or sbrk()"?
- any memory mapped by the kernel in the process's address space during
- creation and following the restrictions presented above (i.e. data, bss,
- stack).
With the rules above, the code section is included as well. Replacing "i.e." with "e.g." would avoid having to list every single section (which is probably not a good idea anyway).
Kevin
+When the ARM64_AT_FLAGS_SYSCALL_TBI flag is set by the kernel, the following +behaviours are guaranteed by the ABI:
- Every current or newly introduced syscall can accept any valid tagged
- pointers.
- If a non valid tagged pointer is passed to a syscall then the behaviour
- is undefined.
- Every valid tagged pointer is expected to work as an untagged one.
- The kernel preserves any valid tagged pointers and returns them to the
- userspace unchanged in all the cases except the ones documented in the
- "Preserving tags" paragraph of tagged-pointers.txt.
+A definition of the meaning of tagged pointers on arm64 can be found in: +Documentation/arm64/tagged-pointers.txt.
+Example of correct usage (pseudo-code) for a userspace application:
+bool arm64_syscall_tbi_is_present(void) +{
- unsigned long at_flags = getauxval(AT_FLAGS);
- if (at_flags & ARM64_AT_FLAGS_SYSCALL_TBI)
return true;
- return false;
+}
+void main(void) +{
- char *addr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
MAP_ANONYMOUS, -1, 0);
- int fd = open("test.txt", O_WRONLY);
- /* Check if the relaxed ABI is supported */
- if (arm64_syscall_tbi_is_present()) {
/* Add a tag to the pointer */
addr = tag_pointer(addr);
- }
- strcpy("Hello World\n", addr);
- /* Write to a file */
- write(fd, addr, sizeof(addr));
- close(fd);
+}
On Fri, Mar 22, 2019 at 03:52:49PM +0000, Kevin Brodsky wrote:
On 18/03/2019 16:35, Vincenzo Frascino wrote:
+2. Features exposed via AT_FLAGS +--------------------------------
+bit[0]: ARM64_AT_FLAGS_SYSCALL_TBI
- On arm64 the TCR_EL1.TBI0 bit has been always enabled on the arm64
- kernel, hence the userspace (EL0) is allowed to set a non-zero value
- in the top byte but the resulting pointers are not allowed at the
- user-kernel syscall ABI boundary.
- When bit[0] is set to 1 the kernel is advertising to the userspace
- that a relaxed ABI is supported hence this type of pointers are now
- allowed to be passed to the syscalls, when these pointers are in
- memory ranges privately owned by a process and obtained by the
- process in accordance with the definition of "valid tagged pointer"
- in paragraph 3.
- In these cases the tag is preserved as the pointer goes through the
- kernel. Only when the kernel needs to check if a pointer is coming
- from userspace an untag operation is required.
I would leave this last sentence out, because:
- It is an implementation detail that doesn't impact this user ABI.
- It is not entirely accurate: untagging the pointer may be needed for various kinds of address lookup (like finding the corresponding VMA), at which point the kernel usually already knows it is a userspace pointer.
I fully agree, the above paragraph should not be part of the user ABI document.
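To illustrate why an address lookup needs the untagged value, here is a small standalone sketch. It assumes the tag is removed by sign-extending from bit 55, which is the same idea as the kernel's sign_extend64() helper; struct addr_range and addr_in_range() are hypothetical names that stand in for a VMA and its start/end comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sign-extend a 64-bit value from bit 'index'. This relies on arithmetic
 * right shift of signed values, which GCC and Clang provide.
 */
static inline int64_t sign_extend64(uint64_t value, int index)
{
	int shift = 63 - index;

	return (int64_t)(value << shift) >> shift;
}

/* Drop the tag by copying bit 55 into bits 63:56. */
static inline uint64_t untagged_addr(uint64_t addr)
{
	return (uint64_t)sign_extend64(addr, 55);
}

/* Hypothetical stand-in for a VMA: the half-open range [start, end). */
struct addr_range {
	uint64_t start;
	uint64_t end;
};

/* Range comparisons only make sense on the untagged address. */
static inline bool addr_in_range(const struct addr_range *r, uint64_t addr)
{
	uint64_t a = untagged_addr(addr);

	return a >= r->start && a < r->end;
}
```

A tagged address such as 0x2a00007fffff1000 compares as larger than any user range end, so comparing it directly against start and end would fail even though the untagged address 0x0000007fffff1000 lies inside the range.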
+3. ARM64_AT_FLAGS_SYSCALL_TBI +-----------------------------
+From the kernel syscall interface prospective, we define, for the purposes +of this document, a "valid tagged pointer" as a pointer that either it has +a zero value set in the top byte or it has a non-zero value, it is in memory +ranges privately owned by a userspace process and it is obtained in one of +the following ways:
- mmap() done by the process itself, where either:
- flags = MAP_PRIVATE | MAP_ANONYMOUS
- flags = MAP_PRIVATE and the file descriptor refers to a regular
file or "/dev/zero"
- a mapping below sbrk(0) done by the process itself
I don't think that's very clear, this doesn't say how the mapping is obtained. Maybe "a mapping obtained by the process using brk() or sbrk()"?
I think what we mean here is anything in the "[heap]" section as per /proc/*/maps (in the kernel this would be start_brk to brk).
- any memory mapped by the kernel in the process's address space during
- creation and following the restrictions presented above (i.e. data, bss,
- stack).
With the rules above, the code section is included as well. Replacing "i.e." with "e.g." would avoid having to list every single section (which is probably not a good idea anyway).
We could mention [stack] explicitly as that's documented in the Documentation/filesystems/proc.txt and it's likely considered ABI already.
The code section is MAP_PRIVATE, and can be done by the dynamic loader (user process), so it falls under the mmap() rules listed above. I guess we could simply drop "done by the process itself" here and allow MAP_PRIVATE|MAP_ANONYMOUS or MAP_PRIVATE of regular file. This would cover the [heap] and [stack] and we won't have to debate the brk() case at all.
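For reference, the two kinds of mapping mentioned here can be obtained from userspace as in the sketch below. The helper names map_private_anon() and map_private_file() are made up for this illustration, and the sketch only shows how such mappings are created, not which of them the final ABI text will allow.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Anonymous private mapping: flags = MAP_PRIVATE | MAP_ANONYMOUS. */
static inline void *map_private_anon(size_t len)
{
	return mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Private mapping of a regular file: flags = MAP_PRIVATE. */
static inline void *map_private_file(const char *path, size_t len)
{
	void *p;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return MAP_FAILED;

	p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping remains valid after close() */

	return p;
}
```

Note that mmap() requires one of MAP_SHARED or MAP_PRIVATE in flags, so a call that passes MAP_ANONYMOUS alone fails with EINVAL.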
We probably mention somewhere (or we should in the tagged pointers doc) that we don't support tagged PC.
On 03/04/2019 17:50, Catalin Marinas wrote:
On Fri, Mar 22, 2019 at 03:52:49PM +0000, Kevin Brodsky wrote:
On 18/03/2019 16:35, Vincenzo Frascino wrote:
+2. Features exposed via AT_FLAGS +--------------------------------
+bit[0]: ARM64_AT_FLAGS_SYSCALL_TBI
- On arm64 the TCR_EL1.TBI0 bit has been always enabled on the arm64
- kernel, hence the userspace (EL0) is allowed to set a non-zero value
- in the top byte but the resulting pointers are not allowed at the
- user-kernel syscall ABI boundary.
- When bit[0] is set to 1 the kernel is advertising to the userspace
- that a relaxed ABI is supported hence this type of pointers are now
- allowed to be passed to the syscalls, when these pointers are in
- memory ranges privately owned by a process and obtained by the
- process in accordance with the definition of "valid tagged pointer"
- in paragraph 3.
- In these cases the tag is preserved as the pointer goes through the
- kernel. Only when the kernel needs to check if a pointer is coming
- from userspace an untag operation is required.
I would leave this last sentence out, because:
- It is an implementation detail that doesn't impact this user ABI.
- It is not entirely accurate: untagging the pointer may be needed for various kinds of address lookup (like finding the corresponding VMA), at which point the kernel usually already knows it is a userspace pointer.
I fully agree, the above paragraph should not be part of the user ABI document.
+3. ARM64_AT_FLAGS_SYSCALL_TBI +-----------------------------
+From the kernel syscall interface prospective, we define, for the purposes +of this document, a "valid tagged pointer" as a pointer that either it has +a zero value set in the top byte or it has a non-zero value, it is in memory +ranges privately owned by a userspace process and it is obtained in one of +the following ways:
- mmap() done by the process itself, where either:
- flags = MAP_PRIVATE | MAP_ANONYMOUS
- flags = MAP_PRIVATE and the file descriptor refers to a regular
file or "/dev/zero"
- a mapping below sbrk(0) done by the process itself
I don't think that's very clear, this doesn't say how the mapping is obtained. Maybe "a mapping obtained by the process using brk() or sbrk()"?
I think what we mean here is anything in the "[heap]" section as per /proc/*/maps (in the kernel this would be start_brk to brk).
- any memory mapped by the kernel in the process's address space during
- creation and following the restrictions presented above (i.e. data, bss,
- stack).
With the rules above, the code section is included as well. Replacing "i.e." with "e.g." would avoid having to list every single section (which is probably not a good idea anyway).
We could mention [stack] explicitly as that's documented in the Documentation/filesystems/proc.txt and it's likely considered ABI already.
The code section is MAP_PRIVATE, and can be done by the dynamic loader (user process), so it falls under the mmap() rules listed above. I guess we could simply drop "done by the process itself" here and allow MAP_PRIVATE|MAP_ANONYMOUS or MAP_PRIVATE of regular file. This would cover the [heap] and [stack] and we won't have to debate the brk() case at all.
That's probably the best option. I initially used this wording because I was worried that there could be cases where the kernel allocates "magic" memory for userspace that is MAP_PRIVATE|MAP_ANONYMOUS, but in fact it's probably not the case (presumably such mapping should always be done via install_special_mapping(), which is definitely not MAP_PRIVATE).
We probably mention somewhere (or we should in the tagged pointers doc) that we don't support tagged PC.
I think that Documentation/arm64/tagged-pointers.txt already makes it reasonably clear (anyway, with the architecture not supporting it, you can't expect much from the kernel).
Kevin
On arm64 the TCR_EL1.TBI0 bit has always been enabled, hence userspace (EL0) is allowed to set a non-zero value in the top byte, but the resulting pointers are not allowed at the user-kernel syscall ABI boundary.
With the relaxed ABI proposed in this set, it is now possible to pass tagged pointers to the syscalls, when these pointers are in memory ranges obtained by an anonymous (MAP_ANONYMOUS) mmap() or sbrk().
Relax the requirements described in tagged-pointers.txt to be compliant with the behaviours guaranteed by the ABI deriving from the introduction of the ARM64_AT_FLAGS_SYSCALL_TBI flag.
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Will Deacon will.deacon@arm.com
CC: Andrey Konovalov andreyknvl@google.com
Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com
---
 Documentation/arm64/tagged-pointers.txt | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/Documentation/arm64/tagged-pointers.txt b/Documentation/arm64/tagged-pointers.txt
index a25a99e82bb1..df27188b9433 100644
--- a/Documentation/arm64/tagged-pointers.txt
+++ b/Documentation/arm64/tagged-pointers.txt
@@ -18,7 +18,8 @@ Passing tagged addresses to the kernel
 --------------------------------------
 
 All interpretation of userspace memory addresses by the kernel assumes
-an address tag of 0x00.
+an address tag of 0x00, unless the ARM64_AT_FLAGS_SYSCALL_TBI flag is
+set by the kernel.
 
 This includes, but is not limited to, addresses found in:
 
@@ -31,18 +32,23 @@ This includes, but is not limited to, addresses found in:
  - the frame pointer (x29) and frame records, e.g. when interpreting
    them to generate a backtrace or call graph.
 
-Using non-zero address tags in any of these locations may result in an
-error code being returned, a (fatal) signal being raised, or other modes
-of failure.
+Using non-zero address tags in any of these locations when the
+ARM64_AT_FLAGS_SYSCALL_TBI flag is not set by the kernel, may result in
+an error code being returned, a (fatal) signal being raised, or other
+modes of failure.
 
-For these reasons, passing non-zero address tags to the kernel via
-system calls is forbidden, and using a non-zero address tag for sp is
-strongly discouraged.
+For these reasons, when the flag is not set, passing non-zero address
+tags to the kernel via system calls is forbidden, and using a non-zero
+address tag for sp is strongly discouraged.
 
 Programs maintaining a frame pointer and frame records that use non-zero
 address tags may suffer impaired or inaccurate debug and profiling
 visibility.
 
+A definition of the meaning of ARM64_AT_FLAGS_SYSCALL_TBI and of the
+guarantees that the ABI provides when the flag is set by the kernel can
+be found in: Documentation/arm64/elf_at_flags.txt.
+
 
 Preserving tags
 ---------------
@@ -57,6 +63,9 @@ be preserved.
 The architecture prevents the use of a tagged PC, so the upper byte will
 be set to a sign-extension of bit 55 on exception return.
 
+This behaviours are preserved even when the ARM64_AT_FLAGS_SYSCALL_TBI flag
+is set by the kernel.
+
 
 Other considerations
 --------------------
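For readers less familiar with Top Byte Ignore, the tag placement and the bit-55 sign extension that the documentation patch refers to can be sketched as plain address arithmetic. The helper names below (set_tag, get_tag, sign_extend_bit55) are hypothetical and do not appear in the patch; the sketch only manipulates integers and never dereferences a tagged address, so it says nothing about what a given kernel will accept.

```c
#include <stdint.h>

/* Under Top Byte Ignore, bits 63:56 of a 64-bit virtual address carry the tag. */
#define TAG_SHIFT	56
#define TAG_MASK	(UINT64_C(0xff) << TAG_SHIFT)

/* Hypothetical helper: replace the top byte of addr with the given tag. */
static inline uint64_t set_tag(uint64_t addr, uint8_t tag)
{
	return (addr & ~TAG_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Hypothetical helper: read back the tag stored in the top byte. */
static inline uint8_t get_tag(uint64_t addr)
{
	return (uint8_t)(addr >> TAG_SHIFT);
}

/*
 * Hypothetical helper: overwrite the top byte with a sign extension of
 * bit 55, which is what the documentation says happens to a tagged PC
 * on exception return. User addresses (bit 55 clear) get 0x00 in the
 * top byte, kernel addresses (bit 55 set) get 0xff.
 */
static inline uint64_t sign_extend_bit55(uint64_t addr)
{
	uint64_t low = addr & ~TAG_MASK;

	return (addr & (UINT64_C(1) << 55)) ? (low | TAG_MASK) : low;
}
```

For a user address, sign_extend_bit55(set_tag(addr, tag)) yields addr again, which is why stripping the tag this way is safe for pointers that really come from userspace.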
On arm64 the TCR_EL1.TBI0 bit has always been enabled, hence userspace (EL0) is allowed to set a non-zero value in the top byte, but the resulting pointers are not allowed at the user-kernel syscall ABI boundary.
Set ARM64_AT_FLAGS_SYSCALL_TBI (bit[0]) in AT_FLAGS to advertise the relaxation of the ABI to userspace.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
CC: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
---
 arch/arm64/include/asm/atflags.h      | 7 +++++++
 arch/arm64/include/asm/elf.h          | 5 +++++
 arch/arm64/include/uapi/asm/atflags.h | 8 ++++++++
 3 files changed, 20 insertions(+)
 create mode 100644 arch/arm64/include/asm/atflags.h
 create mode 100644 arch/arm64/include/uapi/asm/atflags.h
diff --git a/arch/arm64/include/asm/atflags.h b/arch/arm64/include/asm/atflags.h
new file mode 100644
index 000000000000..b20093d61bf2
--- /dev/null
+++ b/arch/arm64/include/asm/atflags.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_ATFLAGS_H
+#define __ASM_ATFLAGS_H
+
+#include <uapi/asm/atflags.h>
+
+#endif
diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index 6adc1a90e7e6..73d5184a4dd9 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -16,6 +16,7 @@
 #ifndef __ASM_ELF_H
 #define __ASM_ELF_H
 
+#include <asm/atflags.h>
 #include <asm/hwcap.h>
 
 /*
@@ -167,6 +168,10 @@ do { \
 	NEW_AUX_ENT(AT_IGNORE, 0); \
 } while (0)
 
+/* Platform specific AT_FLAGS */
+#define ELF_AT_FLAGS ARM64_AT_FLAGS_SYSCALL_TBI
+#define COMPAT_ELF_AT_FLAGS 0
+
 #define ARCH_HAS_SETUP_ADDITIONAL_PAGES
 struct linux_binprm;
 extern int arch_setup_additional_pages(struct linux_binprm *bprm,
diff --git a/arch/arm64/include/uapi/asm/atflags.h b/arch/arm64/include/uapi/asm/atflags.h
new file mode 100644
index 000000000000..1cf25692ffd6
--- /dev/null
+++ b/arch/arm64/include/uapi/asm/atflags.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __UAPI_ASM_ATFLAGS_H
+#define __UAPI_ASM_ATFLAGS_H
+
+/* Platform specific AT_FLAGS */
+#define ARM64_AT_FLAGS_SYSCALL_TBI (1 << 0)
+
+#endif
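A userspace program could test for the advertised flag roughly as follows. This is a minimal sketch, assuming the flag is exposed exactly as proposed in this patch: ARM64_AT_FLAGS_SYSCALL_TBI is defined locally because it comes from this proposal and is not expected to be present in installed headers, and both helper names are hypothetical. getauxval() and AT_FLAGS are the existing libc/ELF auxiliary-vector interfaces.

```c
#include <elf.h>	/* AT_FLAGS */
#include <stdbool.h>
#include <sys/auxv.h>	/* getauxval() */

/*
 * Assumption: value taken from the proposed uapi header in this patch.
 * Defined locally because installed headers are not expected to have it.
 */
#ifndef ARM64_AT_FLAGS_SYSCALL_TBI
#define ARM64_AT_FLAGS_SYSCALL_TBI	(1 << 0)
#endif

/* Hypothetical helper: decode an AT_FLAGS value (split out so it can be tested). */
static bool flags_allow_tagged_syscalls(unsigned long at_flags)
{
	return (at_flags & ARM64_AT_FLAGS_SYSCALL_TBI) != 0;
}

/*
 * Hypothetical helper: ask the running kernel. getauxval() returns 0 when
 * the entry is absent, which decodes as "relaxed ABI not advertised".
 */
static bool kernel_allows_tagged_syscalls(void)
{
	return flags_allow_tagged_syscalls(getauxval(AT_FLAGS));
}
```

A runtime that tags pointers would presumably call kernel_allows_tagged_syscalls() once at startup and, when it returns false, keep stripping tags before every syscall as the current ABI requires.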