=== Overview
arm64 has a feature called Top Byte Ignore (TBI), which allows pointer tags to be embedded in the top byte of each pointer. Userspace programs (such as HWASan, a memory debugging tool [1]) might use this feature and pass tagged user pointers to the kernel through syscalls or other interfaces.
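To make the "tag in the top byte" idea concrete, here is a minimal userspace sketch. It assumes 64-bit addresses with an 8-bit tag in bits 63:56; the sketch_* names are made up for illustration and are not part of this patchset. Addresses are handled as plain integers so the example does not depend on TBI hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64-bit addresses, 8-bit tag in bits 63:56 (the byte TBI ignores). */
#define SKETCH_TAG_SHIFT 56
#define SKETCH_TAG_MASK  (UINT64_C(0xff) << SKETCH_TAG_SHIFT)

/* Put `tag` into the top byte of `addr`, replacing whatever was there. */
static inline uint64_t sketch_set_tag(uint64_t addr, uint8_t tag)
{
	return (addr & ~SKETCH_TAG_MASK) | ((uint64_t)tag << SKETCH_TAG_SHIFT);
}

/* Read the tag back out of the top byte. */
static inline uint8_t sketch_get_tag(uint64_t addr)
{
	return (uint8_t)(addr >> SKETCH_TAG_SHIFT);
}

/* Clear the top byte, leaving the address bits untouched. */
static inline uint64_t sketch_clear_tag(uint64_t addr)
{
	return addr & ~SKETCH_TAG_MASK;
}

/* Round trip: tagging and untagging leave the address bits unchanged. */
void sketch_tag_roundtrip(void)
{
	uint64_t addr = UINT64_C(0x0000007fdeadb000);
	uint64_t tagged = sketch_set_tag(addr, 0x2a);

	assert(tagged == UINT64_C(0x2a00007fdeadb000));
	assert(sketch_get_tag(tagged) == 0x2a);
	assert(sketch_clear_tag(tagged) == addr);
}
```

Clearing the top byte is only a faithful "untag" for user addresses, where the upper address bits are zero; the sketch does not try to model kernel addresses.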
Right now the kernel is already able to handle user faults with tagged pointers, due to these patches:
1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a tagged pointer")
2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged pointers")
3. 276e9327 ("arm64: entry: improve data abort handling of tagged pointers")
This patchset extends tagged pointer support to syscall arguments.
As per the proposed ABI change [3], tagged pointers are only allowed to be passed to syscalls when they point to memory ranges obtained by anonymous mmap() or sbrk() (see the patchset [3] for more details).
For non-memory syscalls this is done by untagging user pointers when the kernel performs pointer checking to find out whether the pointer comes from userspace (most notably in access_ok). The untagging is done only when the pointer is being checked; the tag is preserved as the pointer makes its way through the kernel, and the pointer stays tagged when the kernel dereferences it while performing user memory accesses.
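The "check untagged, access tagged" rule can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel's code: the 48-bit limit, the simple mask used for untagging, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TAG_MASK   (UINT64_C(0xff) << 56)
/* Hypothetical user address limit (48-bit VA); not taken from the patchset. */
#define SKETCH_USER_LIMIT (UINT64_C(1) << 48)

static inline uint64_t sketch_untag(uint64_t addr)
{
	return addr & ~SKETCH_TAG_MASK;
}

/* Validity check: performed on the untagged value only. */
static bool sketch_access_ok(uint64_t addr, uint64_t size)
{
	uint64_t start = sketch_untag(addr);

	/* Written to avoid overflow in start + size. */
	return size <= SKETCH_USER_LIMIT && start <= SKETCH_USER_LIMIT - size;
}

/*
 * Returns the address the access would use: the original, still tagged
 * pointer on success, or 0 if the range check fails.
 */
static uint64_t sketch_checked_addr(uint64_t addr, uint64_t size)
{
	if (!sketch_access_ok(addr, size))
		return 0;
	return addr; /* tag preserved for the actual access */
}

void sketch_check_demo(void)
{
	uint64_t tagged = UINT64_C(0x2a00007fdeadb000);

	/* The check passes, and the tag survives it. */
	assert(sketch_checked_addr(tagged, 16) == tagged);
	/* An address that is still out of range after untagging is rejected. */
	assert(sketch_checked_addr(UINT64_C(0xffff000000001000), 16) == 0);
}
```

The point of the split is that the hardware ignores the tag on the access itself, so only the software range check needs the untagged value.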
Memory syscalls (mmap, mprotect, etc.) don't do user memory accesses but rather deal with memory ranges, and untagged pointers are better suited to describe memory ranges internally. Thus for memory syscalls we untag pointers completely when they enter the kernel.
=== Other approaches
One of the alternative approaches to untagging that was considered is to completely strip the pointer tag as the pointer enters the kernel with some kind of a syscall wrapper, but that won't work with the countless number of different ioctl calls. With this approach we would need a custom wrapper for each ioctl variation, which doesn't seem practical.
An alternative approach to untagging pointers in memory syscall prologues is to instead allow tagged pointers to be passed to find_vma() (and other vma related functions) and untag them there. Unfortunately, a lot of find_vma() callers then compare the pointer that was being searched against the returned vma start and end fields, or subtract those fields from it. Thus this approach would still require changing all find_vma() callers.
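A small userspace sketch of why untagging only inside the lookup would not be enough: the caller's own comparisons and subtractions still see the tagged value. struct sketch_vma and the helpers below are hypothetical stand-ins, not the kernel's vm_area_struct or find_vma().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TAG_MASK (UINT64_C(0xff) << 56)

/* Hypothetical stand-in for the start/end fields of a vma. */
struct sketch_vma {
	uint64_t vm_start;
	uint64_t vm_end;
};

static inline uint64_t sketch_untag(uint64_t addr)
{
	return addr & ~SKETCH_TAG_MASK;
}

/* The kind of range comparison a lookup caller typically does afterwards. */
static bool sketch_addr_in_vma(const struct sketch_vma *vma, uint64_t addr)
{
	return addr >= vma->vm_start && addr < vma->vm_end;
}

void sketch_vma_demo(void)
{
	const struct sketch_vma vma = {
		UINT64_C(0x0000007fdeadb000), UINT64_C(0x0000007fdeadc000)
	};
	uint64_t tagged = UINT64_C(0x2a00007fdeadb010);

	/* With the tag still set, the comparison and the offset are both wrong. */
	assert(!sketch_addr_in_vma(&vma, tagged));
	assert(tagged - vma.vm_start != 0x10);

	/* After untagging in the caller, both come out as expected. */
	assert(sketch_addr_in_vma(&vma, sketch_untag(tagged)));
	assert(sketch_untag(tagged) - vma.vm_start == 0x10);
}
```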
=== Testing
The following testing approaches have been taken to find potential issues with user pointer untagging:
1. Static testing (with sparse [2] and separately with a custom static analyzer based on Clang) to track casts of __user pointers to integer types to find places where untagging needs to be done.
2. Static testing with grep to find parts of the kernel that call find_vma() (and other similar functions) or directly compare against vm_start/vm_end fields of vma.
3. Static testing with grep to find parts of the kernel that compare user pointers with TASK_SIZE or other similar consts and macros.
4. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running a modified syzkaller version that passes tagged pointers to the kernel.
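The has_tag() check used in item 4 is not shown in this cover letter. One plausible shape for such a debugging helper, written as a userspace sketch, is below; the definition is an assumption, and sketch_lookup_checked() is a hypothetical stand-in that returns an error where the instrumented kernel would hit BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed definition: a user address "has a tag" if its top byte is non-zero. */
static inline bool sketch_has_tag(uint64_t addr)
{
	return (addr >> 56) != 0;
}

/*
 * Hypothetical stand-in for an instrumented lookup: report -1 for a tagged
 * address (where the debugging kernel would BUG), 0 otherwise.
 */
static int sketch_lookup_checked(uint64_t addr)
{
	if (sketch_has_tag(addr))
		return -1;
	return 0;
}

void sketch_has_tag_demo(void)
{
	assert(sketch_lookup_checked(UINT64_C(0x0000007fdeadb000)) == 0);
	assert(sketch_lookup_checked(UINT64_C(0x2a00007fdeadb000)) == -1);
}
```

A fuzzer that passes tagged pointers then turns every missed untagging site on a lookup path into an immediate, easy-to-find failure.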
Based on the results of the testing, the required patches have been added to the patchset.
=== Notes
This patchset is meant to be merged together with "arm64 relaxed ABI" [3].
This patchset is a prerequisite for ARM's memory tagging hardware feature support [4].
This patchset has been merged into the Pixel 2 & 3 kernel trees and is now being used to enable testing of Pixel phones with HWASan.
Thanks!
[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
[2] https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060e...
[3] https://lkml.org/lkml/2019/3/18/819
[4] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architecture...
Changes in v14:
- Moved untagging for most memory syscalls to an arm64 specific implementation, instead of doing that in the common code.
- Dropped "net, arm64: untag user pointers in tcp_zerocopy_receive", since the provided user pointers don't come from an anonymous map and thus are not covered by this ABI relaxation.
- Dropped "kernel, arm64: untag user pointers in prctl_set_mm*".
- Moved untagging from __check_mem_type() to tee_shm_register().
- Updated untagging for the amdgpu and radeon drivers to cover the MMU notifier, as suggested by Felix.
- Since this ABI relaxation doesn't actually allow tagged instruction pointers, dropped the following patches:
  - Dropped "tracing, arm64: untag user pointers in seq_print_user_ip".
  - Dropped "uprobes, arm64: untag user pointers in find_active_uprobe".
  - Dropped "bpf, arm64: untag user pointers in stack_map_get_build_id_offset".
- Rebased onto 5.1-rc7 (37624b58).
Changes in v13:
- Simplified untagging in tcp_zerocopy_receive().
- Looked at find_vma() callers in drivers/, which helped identify a few other places where untagging is needed.
- Added patch "mm, arm64: untag user pointers in get_vaddr_frames".
- Added patch "drm/amdgpu, arm64: untag user pointers in amdgpu_ttm_tt_get_user_pages".
- Added patch "drm/radeon, arm64: untag user pointers in radeon_ttm_tt_pin_userptr".
- Added patch "IB/mlx4, arm64: untag user pointers in mlx4_get_umem_mr".
- Added patch "media/v4l2-core, arm64: untag user pointers in videobuf_dma_contig_user_get".
- Added patch "tee/optee, arm64: untag user pointers in check_mem_type".
- Added patch "vfio/type1, arm64: untag user pointers".
Changes in v12:
- Changed untagging in tcp_zerocopy_receive() to also untag zc->address.
- Fixed untagging in prctl_set_mm* to only untag pointers for vma lookups and validity checks, but leave them as is for actual user space accesses.
- Updated the link to the v2 of the "arm64 relaxed ABI" patchset [3].
- Dropped the documentation patch, as the "arm64 relaxed ABI" patchset [3] handles that.
Changes in v11:
- Added "uprobes, arm64: untag user pointers in find_active_uprobe" patch.
- Added "bpf, arm64: untag user pointers in stack_map_get_build_id_offset" patch.
- Fixed "tracing, arm64: untag user pointers in seq_print_user_ip" to correctly perform subtraction with a tagged addr.
- Moved untagged_addr() from SYSCALL_DEFINE3(mprotect) and SYSCALL_DEFINE4(pkey_mprotect) to do_mprotect_pkey().
- Moved untagged_addr() definition for other arches from include/linux/memory.h to include/linux/mm.h.
- Changed untagging in strn*_user() to perform userspace accesses through tagged pointers.
- Updated the documentation to mention that passing tagged pointers to memory syscalls is allowed.
- Updated the test to use malloc'ed memory instead of stack memory.
Changes in v10:
- Added "mm, arm64: untag user pointers passed to memory syscalls" back.
- New patch "fs, arm64: untag user pointers in fs/userfaultfd.c".
- New patch "net, arm64: untag user pointers in tcp_zerocopy_receive".
- New patch "kernel, arm64: untag user pointers in prctl_set_mm*".
- New patch "tracing, arm64: untag user pointers in seq_print_user_ip".
Changes in v9:
- Rebased onto 4.20-rc6.
- Used u64 instead of __u64 in type casts in the untagged_addr macro for arm64.
- Added braces around (addr) in the untagged_addr macro for other arches.
Changes in v8:
- Rebased onto 65102238 (4.20-rc1).
- Added a note to the cover letter on why syscall wrappers/shims that untag user pointers won't work.
- Added a note to the cover letter that this patchset has been merged into the Pixel 2 kernel tree.
- Documentation fixes, in particular added a list of syscalls that don't support tagged user pointers.
Changes in v7:
- Rebased onto 17b57b18 (4.19-rc6).
- Dropped the "arm64: untag user address in __do_user_fault" patch, since the existing patches already handle user faults properly.
- Dropped the "usb, arm64: untag user addresses in devio" patch, since the passed pointer must come from a vma and therefore be untagged.
- Dropped the "arm64: annotate user pointers casts detected by sparse" patch (see the discussion in the replies to v6 of this patchset).
- Added more context to the cover letter.
- Updated Documentation/arm64/tagged-pointers.txt.
Changes in v6:
- Added annotations for user pointer casts found by sparse.
- Rebased onto 050cdc6c (4.19-rc1+).
Changes in v5:
- Added 3 new patches that add untagging to places found with static analysis.
- Rebased onto 44c929e1 (4.18-rc8).
Changes in v4:
- Added a selftest for checking that passing tagged pointers to the kernel succeeds.
- Rebased onto 81e97f013 (4.18-rc1+).
Changes in v3:
- Rebased onto e5c51f30 (4.17-rc6+).
- Added linux-arch@ to the list of recipients.
Changes in v2:
- Rebased onto 2d618bdf (4.17-rc3+).
- Removed excessive untagging in gup.c.
- Removed untagging pointers returned from __uaccess_mask_ptr.
Changes in v1:
- Rebased onto 4.17-rc1.
Changes in RFC v2:
- Added "#ifndef untagged_addr..." fallback in linux/uaccess.h instead of defining it for each arch individually.
- Updated Documentation/arm64/tagged-pointers.txt.
- Dropped "mm, arm64: untag user addresses in memory syscalls".
- Rebased onto 3eb2ce82 (4.16-rc7).
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Andrey Konovalov (17):
  uaccess: add untagged_addr definition for other arches
  arm64: untag user pointers in access_ok and __uaccess_mask_ptr
  lib, arm64: untag user pointers in strn*_user
  mm: add ksys_ wrappers to memory syscalls
  arm64: untag user pointers passed to memory syscalls
  mm: untag user pointers in do_pages_move
  mm, arm64: untag user pointers in mm/gup.c
  mm, arm64: untag user pointers in get_vaddr_frames
  fs, arm64: untag user pointers in copy_mount_options
  fs, arm64: untag user pointers in fs/userfaultfd.c
  drm/amdgpu, arm64: untag user pointers
  drm/radeon, arm64: untag user pointers
  IB/mlx4, arm64: untag user pointers in mlx4_get_umem_mr
  media/v4l2-core, arm64: untag user pointers in videobuf_dma_contig_user_get
  tee, arm64: untag user pointers in tee_shm_register
  vfio/type1, arm64: untag user pointers in vaddr_get_pfn
  selftests, arm64: add a selftest for passing tagged pointers to kernel
 arch/arm64/include/asm/uaccess.h              |  10 +-
 arch/arm64/kernel/sys.c                       | 128 ++++++++++++++++-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |   2 +-
 drivers/gpu/drm/radeon/radeon_gem.c           |   2 +
 drivers/gpu/drm/radeon/radeon_ttm.c           |   2 +-
 drivers/infiniband/hw/mlx4/mr.c               |   7 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c |   9 +-
 drivers/tee/tee_shm.c                         |   1 +
 drivers/vfio/vfio_iommu_type1.c               |   2 +
 fs/namespace.c                                |   2 +-
 fs/userfaultfd.c                              |   5 +
 include/linux/mm.h                            |   4 +
 include/linux/syscalls.h                      |  22 +++
 ipc/shm.c                                     |   7 +-
 lib/strncpy_from_user.c                       |   3 +-
 lib/strnlen_user.c                            |   3 +-
 mm/frame_vector.c                             |   2 +
 mm/gup.c                                      |   4 +
 mm/madvise.c                                  | 129 +++++++++---------
 mm/mempolicy.c                                |  21 ++-
 mm/migrate.c                                  |   1 +
 mm/mincore.c                                  |  57 ++++----
 mm/mlock.c                                    |  20 ++-
 mm/mmap.c                                     |  30 +++-
 mm/mprotect.c                                 |   6 +-
 mm/mremap.c                                   |  27 ++--
 mm/msync.c                                    |  35 +++--
 tools/testing/selftests/arm64/.gitignore      |   1 +
 tools/testing/selftests/arm64/Makefile        |  11 ++
 .../testing/selftests/arm64/run_tags_test.sh  |  12 ++
 tools/testing/selftests/arm64/tags_test.c     |  21 +++
 33 files changed, 431 insertions(+), 159 deletions(-)
 create mode 100644 tools/testing/selftests/arm64/.gitignore
 create mode 100644 tools/testing/selftests/arm64/Makefile
 create mode 100755 tools/testing/selftests/arm64/run_tags_test.sh
 create mode 100644 tools/testing/selftests/arm64/tags_test.c
To allow arm64 syscalls to accept tagged pointers from userspace, we must untag them when they are passed to the kernel. Since untagging is done in generic parts of the kernel, the untagged_addr macro needs to be defined for all architectures.
Define it as a noop for architectures other than arm64.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 include/linux/mm.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6b10c21630f5..44041df804a6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -99,6 +99,10 @@ extern int mmap_rnd_compat_bits __read_mostly;
 #include <asm/pgtable.h>
 #include <asm/processor.h>
 
+#ifndef untagged_addr
+#define untagged_addr(addr) (addr)
+#endif
+
 #ifndef __pa_symbol
 #define __pa_symbol(x)		__pa(RELOC_HIDE((unsigned long)(x), 0))
 #endif
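The effect of the #ifndef fallback can be tried out in a standalone file. In the sketch below, SKETCH_ARCH_HAS_TAGS is a hypothetical switch standing in for "an architecture that provides its own untagged_addr"; the mask-based definition under it is an illustrative assumption, not the arm64 definition from this series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical arch opt-in: compile with -DSKETCH_ARCH_HAS_TAGS to simulate
 * an architecture that supplies its own definition before the fallback.
 */
#ifdef SKETCH_ARCH_HAS_TAGS
#define untagged_addr(addr) ((addr) & ~(UINT64_C(0xff) << 56))
#endif

/* Same fallback pattern as the patch: a no-op when no arch definition exists. */
#ifndef untagged_addr
#define untagged_addr(addr) (addr)
#endif

/* "Generic" code can call the macro unconditionally, with no arch #ifdefs. */
uint64_t sketch_range_start(uint64_t addr)
{
	return untagged_addr(addr);
}

void sketch_fallback_demo(void)
{
#ifdef SKETCH_ARCH_HAS_TAGS
	assert(sketch_range_start(UINT64_C(0x2a00000000001000)) == UINT64_C(0x1000));
#else
	/* With the no-op fallback the value passes through unchanged. */
	assert(sketch_range_start(UINT64_C(0x2a00000000001000)) ==
	       UINT64_C(0x2a00000000001000));
#endif
}
```

Because the fallback expands to its argument, it adds no code on architectures that do not use tags.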
This patch is a part of a series that extends the arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
copy_from_user (and a few other similar functions) are used to copy data from user memory into kernel memory or vice versa. Since a user can provide a tagged pointer to one of the syscalls that use copy_from_user, we need to correctly handle such pointers.
Do this by untagging user pointers in access_ok and in __uaccess_mask_ptr, before performing access validity checks.
Note that this patch only temporarily untags the pointers to perform the checks, but then passes them as is into the kernel internals.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 arch/arm64/include/asm/uaccess.h | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index e5d5f31c6d36..9164ecb5feca 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -94,7 +94,7 @@ static inline unsigned long __range_ok(const void __user *addr, unsigned long si
 	return ret;
 }
 
-#define access_ok(addr, size)	__range_ok(addr, size)
+#define access_ok(addr, size)	__range_ok(untagged_addr(addr), size)
 #define user_addr_max			get_fs
 
 #define _ASM_EXTABLE(from, to)						\
@@ -226,7 +226,8 @@ static inline void uaccess_enable_not_uao(void)
 
 /*
  * Sanitise a uaccess pointer such that it becomes NULL if above the
- * current addr_limit.
+ * current addr_limit. In case the pointer is tagged (has the top byte set),
+ * untag the pointer before checking.
  */
 #define uaccess_mask_ptr(ptr) (__typeof__(ptr))__uaccess_mask_ptr(ptr)
 static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
@@ -234,10 +235,11 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
 	void __user *safe_ptr;
 
 	asm volatile(
-	"	bics	xzr, %1, %2\n"
+	"	bics	xzr, %3, %2\n"
 	"	csel	%0, %1, xzr, eq\n"
 	: "=&r" (safe_ptr)
-	: "r" (ptr), "r" (current_thread_info()->addr_limit)
+	: "r" (ptr), "r" (current_thread_info()->addr_limit),
+	  "r" (untagged_addr(ptr))
 	: "cc");
 
 	csdb();
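For readers less familiar with the bics/csel pair, the data flow of the changed asm can be written in plain C. This is a userspace sketch only: it assumes addr_limit is a mask-like value such as 2^48 - 1, uses a simple top-byte mask for untagging, and does not model the speculation barrier (csdb). The sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TAG_MASK (UINT64_C(0xff) << 56)

/*
 * Before the change: test ptr & ~addr_limit (the bics), then select ptr if
 * the result was zero and NULL otherwise (the csel).
 */
static uint64_t sketch_mask_ptr_old(uint64_t ptr, uint64_t addr_limit)
{
	return (ptr & ~addr_limit) == 0 ? ptr : 0;
}

/*
 * After the change: the test uses the untagged value (operand %3), while the
 * value selected on success is still the original, tagged pointer (%1).
 */
static uint64_t sketch_mask_ptr_new(uint64_t ptr, uint64_t addr_limit)
{
	uint64_t untagged = ptr & ~SKETCH_TAG_MASK;

	return (untagged & ~addr_limit) == 0 ? ptr : 0;
}

void sketch_mask_ptr_demo(void)
{
	const uint64_t limit = (UINT64_C(1) << 48) - 1; /* assumed addr_limit */
	const uint64_t tagged = UINT64_C(0x2a00007fdeadb000);

	/* Old behaviour: the tag bits make a valid user pointer look too high. */
	assert(sketch_mask_ptr_old(tagged, limit) == 0);
	/* New behaviour: accepted, and returned with its tag intact. */
	assert(sketch_mask_ptr_new(tagged, limit) == tagged);
	/* A pointer that is above the limit even after untagging is still nulled. */
	assert(sketch_mask_ptr_new(UINT64_C(0xffff000012345678), limit) == 0);
}
```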
This patch is a part of a series that extends the arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
strncpy_from_user and strnlen_user accept user addresses as arguments, and do not go through the same path as copy_from_user and others, so here we need to handle the case of tagged user addresses separately.
Untag user pointers passed to these functions.
Note that this patch only temporarily untags the pointers to perform validity checks, but then uses them as is to perform user memory accesses.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 lib/strncpy_from_user.c | 3 ++-
 lib/strnlen_user.c      | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
index 58eacd41526c..6209bb9507c7 100644
--- a/lib/strncpy_from_user.c
+++ b/lib/strncpy_from_user.c
@@ -6,6 +6,7 @@
 #include <linux/uaccess.h>
 #include <linux/kernel.h>
 #include <linux/errno.h>
+#include <linux/mm.h>
 
 #include <asm/byteorder.h>
 #include <asm/word-at-a-time.h>
@@ -107,7 +108,7 @@ long strncpy_from_user(char *dst, const char __user *src, long count)
 		return 0;
 
 	max_addr = user_addr_max();
-	src_addr = (unsigned long)src;
+	src_addr = (unsigned long)untagged_addr(src);
 	if (likely(src_addr < max_addr)) {
 		unsigned long max = max_addr - src_addr;
 		long retval;
diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c
index 1c1a1b0e38a5..8ca3d2ac32ec 100644
--- a/lib/strnlen_user.c
+++ b/lib/strnlen_user.c
@@ -2,6 +2,7 @@
 #include <linux/kernel.h>
 #include <linux/export.h>
 #include <linux/uaccess.h>
+#include <linux/mm.h>
 
 #include <asm/word-at-a-time.h>
 
@@ -109,7 +110,7 @@ long strnlen_user(const char __user *str, long count)
 		return 0;
 
 	max_addr = user_addr_max();
-	src_addr = (unsigned long)str;
+	src_addr = (unsigned long)untagged_addr(str);
 	if (likely(src_addr < max_addr)) {
 		unsigned long max = max_addr - src_addr;
 		long retval;
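The bounds computation touched by these two hunks can be sketched in userspace as below. The sketch assumes a top-byte mask for untagging and a caller-supplied max_addr; the sketch_* names are hypothetical, and only the arithmetic mirrors the diff (src_addr compared against max_addr, then max = max_addr - src_addr).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TAG_MASK (UINT64_C(0xff) << 56)

/* Pre-patch shape: the raw pointer value is compared against max_addr. */
static uint64_t sketch_max_bytes_raw(uint64_t src, uint64_t max_addr)
{
	uint64_t src_addr = src;

	return src_addr < max_addr ? max_addr - src_addr : 0;
}

/* Post-patch shape: only the value used for the bounds check is untagged. */
static uint64_t sketch_max_bytes_untagged(uint64_t src, uint64_t max_addr)
{
	uint64_t src_addr = src & ~SKETCH_TAG_MASK;

	return src_addr < max_addr ? max_addr - src_addr : 0;
}

void sketch_strn_demo(void)
{
	const uint64_t max_addr = UINT64_C(1) << 48; /* assumed user limit */
	const uint64_t tagged = UINT64_C(0x2a00007fdeadb000);

	/* A tagged pointer compares as >= max_addr, so nothing may be read. */
	assert(sketch_max_bytes_raw(tagged, max_addr) == 0);
	/* Untagged for the check, the usable span is computed as usual. */
	assert(sketch_max_bytes_untagged(tagged, max_addr) ==
	       max_addr - UINT64_C(0x0000007fdeadb000));
}
```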
This patch is a part of a series that extends the arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
This patch adds ksys_ wrappers to the following memory syscalls:
brk, get_mempolicy (renamed kernel_get_mempolicy -> ksys_get_mempolicy), madvise, mbind (renamed kernel_mbind -> ksys_mbind), mincore, mlock (renamed do_mlock -> ksys_mlock), mlock2, mmap_pgoff, mprotect (renamed do_mprotect_pkey -> ksys_mprotect_pkey), mremap, msync, munlock, munmap, remap_file_pages, shmat, shmdt.
The next patch in this series will add a custom implementation for these syscalls that makes them accept tagged pointers on arm64.
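The reason for splitting each syscall into a ksys_ helper plus a thin entry point can be illustrated with a userspace sketch. Everything here is hypothetical (names, the 48-bit range check, the error value); it only shows the shape: one shared body, a generic entry point that forwards arguments unchanged, and an architecture-specific entry point that untags first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TAG_MASK  (UINT64_C(0xff) << 56)
#define SKETCH_PAGE_SIZE UINT64_C(4096)
#define SKETCH_EINVAL    22

/* Stand-in for a ksys_ helper: the shared body, callable from any entry point. */
static int sketch_ksys_munmap(uint64_t addr, size_t len)
{
	if ((addr & (SKETCH_PAGE_SIZE - 1)) || len == 0)
		return -SKETCH_EINVAL;
	/* Reject anything outside a hypothetical 48-bit user range. */
	if (addr >> 48)
		return -SKETCH_EINVAL;
	return 0;
}

/* Generic entry point: forwards its arguments unchanged. */
int sketch_sys_munmap(uint64_t addr, size_t len)
{
	return sketch_ksys_munmap(addr, len);
}

/* Hypothetical arch-specific entry point: untags first, then reuses the body. */
int sketch_arch_sys_munmap(uint64_t addr, size_t len)
{
	return sketch_ksys_munmap(addr & ~SKETCH_TAG_MASK, len);
}

void sketch_wrapper_demo(void)
{
	const uint64_t tagged = UINT64_C(0x2a00007fdeadb000);

	/* The generic path sees the tag as part of the address and rejects it. */
	assert(sketch_sys_munmap(tagged, 4096) == -SKETCH_EINVAL);
	/* The untagging entry point accepts the same argument. */
	assert(sketch_arch_sys_munmap(tagged, 4096) == 0);
}
```

With this split, the untagging lives entirely in the architecture's entry point and the shared body does not need to know about tags.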
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 include/linux/syscalls.h |  22 +++++++
 ipc/shm.c                |   7 ++-
 mm/madvise.c             | 129 ++++++++++++++++++++-------------------
 mm/mempolicy.c           |  21 +++----
 mm/mincore.c             |  57 +++++++++--------
 mm/mlock.c               |  20 ++++--
 mm/mmap.c                |  30 ++++++---
 mm/mprotect.c            |   6 +-
 mm/mremap.c              |  27 +++++---
 mm/msync.c               |  35 ++++++-----
 10 files changed, 213 insertions(+), 141 deletions(-)
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e446806a561f..70008f5ed84f 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1260,6 +1260,28 @@ int ksys_ipc(unsigned int call, int first, unsigned long second,
 	unsigned long third, void __user * ptr, long fifth);
 int compat_ksys_ipc(u32 call, int first, int second,
 	u32 third, u32 ptr, u32 fifth);
+unsigned long ksys_mremap(unsigned long addr, unsigned long old_len,
+		unsigned long new_len, unsigned long flags,
+		unsigned long new_addr);
+int ksys_munmap(unsigned long addr, size_t len);
+unsigned long ksys_brk(unsigned long brk);
+int ksys_get_mempolicy(int __user *policy, unsigned long __user *nmask,
+		unsigned long maxnode, unsigned long addr, unsigned long flags);
+int ksys_madvise(unsigned long start, size_t len_in, int behavior);
+long ksys_mbind(unsigned long start, unsigned long len,
+		unsigned long mode, const unsigned long __user *nmask,
+		unsigned long maxnode, unsigned int flags);
+__must_check int ksys_mlock(unsigned long start, size_t len, vm_flags_t flags);
+__must_check int ksys_mlock2(unsigned long start, size_t len, vm_flags_t flags);
+int ksys_munlock(unsigned long start, size_t len);
+int ksys_mprotect_pkey(unsigned long start, size_t len,
+		unsigned long prot, int pkey);
+int ksys_msync(unsigned long start, size_t len, int flags);
+long ksys_mincore(unsigned long start, size_t len, unsigned char __user *vec);
+unsigned long ksys_remap_file_pages(unsigned long start, unsigned long size,
+		unsigned long prot, unsigned long pgoff, unsigned long flags);
+long ksys_shmat(int shmid, char __user *shmaddr, int shmflg);
+long ksys_shmdt(char __user *shmaddr);
 
 /*
  * The following kernel syscall equivalents are just wrappers to fs-internal
diff --git a/ipc/shm.c b/ipc/shm.c
index ce1ca9f7c6e9..557b43968c0e 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1588,7 +1588,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
 	return err;
 }
 
-SYSCALL_DEFINE3(shmat, int, shmid, char __user *, shmaddr, int, shmflg)
+long ksys_shmat(int shmid, char __user *shmaddr, int shmflg)
 {
 	unsigned long ret;
 	long err;
@@ -1600,6 +1600,11 @@ SYSCALL_DEFINE3(shmat, int, shmid, char __user *, shmaddr, int, shmflg)
 	return (long)ret;
 }
 
+SYSCALL_DEFINE3(shmat, int, shmid, char __user *, shmaddr, int, shmflg)
+{
+	return ksys_shmat(shmid, shmaddr, shmflg);
+}
+
 #ifdef CONFIG_COMPAT
#ifndef COMPAT_SHMLBA diff --git a/mm/madvise.c b/mm/madvise.c index 21a7881a2db4..c27f5f14e2ee 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -738,68 +738,7 @@ madvise_behavior_valid(int behavior) } }
-/* - * The madvise(2) system call. - * - * Applications can use madvise() to advise the kernel how it should - * handle paging I/O in this VM area. The idea is to help the kernel - * use appropriate read-ahead and caching techniques. The information - * provided is advisory only, and can be safely disregarded by the - * kernel without affecting the correct operation of the application. - * - * behavior values: - * MADV_NORMAL - the default behavior is to read clusters. This - * results in some read-ahead and read-behind. - * MADV_RANDOM - the system should read the minimum amount of data - * on any access, since it is unlikely that the appli- - * cation will need more than what it asks for. - * MADV_SEQUENTIAL - pages in the given range will probably be accessed - * once, so they can be aggressively read ahead, and - * can be freed soon after they are accessed. - * MADV_WILLNEED - the application is notifying the system to read - * some pages ahead. - * MADV_DONTNEED - the application is finished with the given range, - * so the kernel can free resources associated with it. - * MADV_FREE - the application marks pages in the given range as lazy free, - * where actual purges are postponed until memory pressure happens. - * MADV_REMOVE - the application wants to free up the given range of - * pages and associated backing store. - * MADV_DONTFORK - omit this area from child's address space when forking: - * typically, to avoid COWing pages pinned by get_user_pages(). - * MADV_DOFORK - cancel MADV_DONTFORK: no longer omit this area when forking. - * MADV_WIPEONFORK - present the child process with zero-filled memory in this - * range after a fork. - * MADV_KEEPONFORK - undo the effect of MADV_WIPEONFORK - * MADV_HWPOISON - trigger memory error handler as if the given memory range - * were corrupted by unrecoverable hardware memory failure. - * MADV_SOFT_OFFLINE - try to soft-offline the given range of memory. 
- * MADV_MERGEABLE - the application recommends that KSM try to merge pages in - * this area with pages of identical content from other such areas. - * MADV_UNMERGEABLE- cancel MADV_MERGEABLE: no longer merge pages with others. - * MADV_HUGEPAGE - the application wants to back the given range by transparent - * huge pages in the future. Existing pages might be coalesced and - * new pages might be allocated as THP. - * MADV_NOHUGEPAGE - mark the given range as not worth being backed by - * transparent huge pages so the existing pages will not be - * coalesced into THP and new pages will not be allocated as THP. - * MADV_DONTDUMP - the application wants to prevent pages in the given range - * from being included in its core dump. - * MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump. - * - * return values: - * zero - success - * -EINVAL - start + len < 0, start is not page-aligned, - * "behavior" is not a valid value, or application - * is attempting to release locked or shared pages, - * or the specified address range includes file, Huge TLB, - * MAP_SHARED or VMPFNMAP range. - * -ENOMEM - addresses in the specified range are not currently - * mapped, or are outside the AS of the process. - * -EIO - an I/O error occurred while paging in data. - * -EBADF - map exists, but area maps something that isn't a file. - * -EAGAIN - a kernel resource was temporarily unavailable. - */ -SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior) +int ksys_madvise(unsigned long start, size_t len_in, int behavior) { unsigned long end, tmp; struct vm_area_struct *vma, *prev; @@ -894,3 +833,69 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
return error; } + +/* + * The madvise(2) system call. + * + * Applications can use madvise() to advise the kernel how it should + * handle paging I/O in this VM area. The idea is to help the kernel + * use appropriate read-ahead and caching techniques. The information + * provided is advisory only, and can be safely disregarded by the + * kernel without affecting the correct operation of the application. + * + * behavior values: + * MADV_NORMAL - the default behavior is to read clusters. This + * results in some read-ahead and read-behind. + * MADV_RANDOM - the system should read the minimum amount of data + * on any access, since it is unlikely that the appli- + * cation will need more than what it asks for. + * MADV_SEQUENTIAL - pages in the given range will probably be accessed + * once, so they can be aggressively read ahead, and + * can be freed soon after they are accessed. + * MADV_WILLNEED - the application is notifying the system to read + * some pages ahead. + * MADV_DONTNEED - the application is finished with the given range, + * so the kernel can free resources associated with it. + * MADV_FREE - the application marks pages in the given range as lazy free, + * where actual purges are postponed until memory pressure happens. + * MADV_REMOVE - the application wants to free up the given range of + * pages and associated backing store. + * MADV_DONTFORK - omit this area from child's address space when forking: + * typically, to avoid COWing pages pinned by get_user_pages(). + * MADV_DOFORK - cancel MADV_DONTFORK: no longer omit this area when forking. + * MADV_WIPEONFORK - present the child process with zero-filled memory in this + * range after a fork. + * MADV_KEEPONFORK - undo the effect of MADV_WIPEONFORK + * MADV_HWPOISON - trigger memory error handler as if the given memory range + * were corrupted by unrecoverable hardware memory failure. + * MADV_SOFT_OFFLINE - try to soft-offline the given range of memory. 
+ * MADV_MERGEABLE - the application recommends that KSM try to merge pages in + * this area with pages of identical content from other such areas. + * MADV_UNMERGEABLE- cancel MADV_MERGEABLE: no longer merge pages with others. + * MADV_HUGEPAGE - the application wants to back the given range by transparent + * huge pages in the future. Existing pages might be coalesced and + * new pages might be allocated as THP. + * MADV_NOHUGEPAGE - mark the given range as not worth being backed by + * transparent huge pages so the existing pages will not be + * coalesced into THP and new pages will not be allocated as THP. + * MADV_DONTDUMP - the application wants to prevent pages in the given range + * from being included in its core dump. + * MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump. + * + * return values: + * zero - success + * -EINVAL - start + len < 0, start is not page-aligned, + * "behavior" is not a valid value, or application + * is attempting to release locked or shared pages, + * or the specified address range includes file, Huge TLB, + * MAP_SHARED or VMPFNMAP range. + * -ENOMEM - addresses in the specified range are not currently + * mapped, or are outside the AS of the process. + * -EIO - an I/O error occurred while paging in data. + * -EBADF - map exists, but area maps something that isn't a file. + * -EAGAIN - a kernel resource was temporarily unavailable. + */ +SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior) +{ + return ksys_madvise(start, len_in, behavior); +} diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 2219e747df49..c2f82a045ceb 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1352,9 +1352,9 @@ static int copy_nodes_to_user(unsigned long __user *mask, unsigned long maxnode, return copy_to_user(mask, nodes_addr(*nodes), copy) ? -EFAULT : 0; }
-static long kernel_mbind(unsigned long start, unsigned long len, - unsigned long mode, const unsigned long __user *nmask, - unsigned long maxnode, unsigned int flags) +long ksys_mbind(unsigned long start, unsigned long len, + unsigned long mode, const unsigned long __user *nmask, + unsigned long maxnode, unsigned int flags) { nodemask_t nodes; int err; @@ -1377,7 +1377,7 @@ SYSCALL_DEFINE6(mbind, unsigned long, start, unsigned long, len, unsigned long, mode, const unsigned long __user *, nmask, unsigned long, maxnode, unsigned int, flags) { - return kernel_mbind(start, len, mode, nmask, maxnode, flags); + return ksys_mbind(start, len, mode, nmask, maxnode, flags); }
/* Set the process memory policy */ @@ -1507,11 +1507,8 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
/* Retrieve NUMA policy */ -static int kernel_get_mempolicy(int __user *policy, - unsigned long __user *nmask, - unsigned long maxnode, - unsigned long addr, - unsigned long flags) +int ksys_get_mempolicy(int __user *policy, unsigned long __user *nmask, + unsigned long maxnode, unsigned long addr, unsigned long flags) { int err; int uninitialized_var(pval); @@ -1538,7 +1535,7 @@ SYSCALL_DEFINE5(get_mempolicy, int __user *, policy, unsigned long __user *, nmask, unsigned long, maxnode, unsigned long, addr, unsigned long, flags) { - return kernel_get_mempolicy(policy, nmask, maxnode, addr, flags); + return ksys_get_mempolicy(policy, nmask, maxnode, addr, flags); }
#ifdef CONFIG_COMPAT @@ -1559,7 +1556,7 @@ COMPAT_SYSCALL_DEFINE5(get_mempolicy, int __user *, policy, if (nmask) nm = compat_alloc_user_space(alloc_size);
- err = kernel_get_mempolicy(policy, nm, nr_bits+1, addr, flags); + err = ksys_get_mempolicy(policy, nm, nr_bits+1, addr, flags);
if (!err && nmask) { unsigned long copy_size; @@ -1613,7 +1610,7 @@ COMPAT_SYSCALL_DEFINE6(mbind, compat_ulong_t, start, compat_ulong_t, len, return -EFAULT; }
- return kernel_mbind(start, len, mode, nm, nr_bits+1, flags); + return ksys_mbind(start, len, mode, nm, nr_bits+1, flags); }
COMPAT_SYSCALL_DEFINE4(migrate_pages, compat_pid_t, pid, diff --git a/mm/mincore.c b/mm/mincore.c index 218099b5ed31..a609bd8128da 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -197,32 +197,7 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v return (end - addr) >> PAGE_SHIFT; }
-/* - * The mincore(2) system call. - * - * mincore() returns the memory residency status of the pages in the - * current process's address space specified by [addr, addr + len). - * The status is returned in a vector of bytes. The least significant - * bit of each byte is 1 if the referenced page is in memory, otherwise - * it is zero. - * - * Because the status of a page can change after mincore() checks it - * but before it returns to the application, the returned vector may - * contain stale information. Only locked pages are guaranteed to - * remain in memory. - * - * return values: - * zero - success - * -EFAULT - vec points to an illegal address - * -EINVAL - addr is not a multiple of PAGE_SIZE - * -ENOMEM - Addresses in the range [addr, addr + len] are - * invalid for the address space of this process, or - * specify one or more pages which are not currently - * mapped - * -EAGAIN - A kernel resource was temporarily unavailable. - */ -SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len, - unsigned char __user *, vec) +long ksys_mincore(unsigned long start, size_t len, unsigned char __user *vec) { long retval; unsigned long pages; @@ -271,3 +246,33 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len, free_page((unsigned long) tmp); return retval; } + +/* + * The mincore(2) system call. + * + * mincore() returns the memory residency status of the pages in the + * current process's address space specified by [addr, addr + len). + * The status is returned in a vector of bytes. The least significant + * bit of each byte is 1 if the referenced page is in memory, otherwise + * it is zero. + * + * Because the status of a page can change after mincore() checks it + * but before it returns to the application, the returned vector may + * contain stale information. Only locked pages are guaranteed to + * remain in memory. 
+ * + * return values: + * zero - success + * -EFAULT - vec points to an illegal address + * -EINVAL - addr is not a multiple of PAGE_SIZE + * -ENOMEM - Addresses in the range [addr, addr + len] are + * invalid for the address space of this process, or + * specify one or more pages which are not currently + * mapped + * -EAGAIN - A kernel resource was temporarily unavailable. + */ +SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len, + unsigned char __user *, vec) +{ + return ksys_mincore(start, len, vec); +} diff --git a/mm/mlock.c b/mm/mlock.c index 080f3b36415b..09e449447539 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -668,7 +668,7 @@ static int count_mm_mlocked_page_nr(struct mm_struct *mm, return count >> PAGE_SHIFT; }
-static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t flags) +__must_check int ksys_mlock(unsigned long start, size_t len, vm_flags_t flags) { unsigned long locked; unsigned long lock_limit; @@ -715,10 +715,10 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla
SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len) { - return do_mlock(start, len, VM_LOCKED); + return ksys_mlock(start, len, VM_LOCKED); }
-SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags) +__must_check int ksys_mlock2(unsigned long start, size_t len, vm_flags_t flags) { vm_flags_t vm_flags = VM_LOCKED;
@@ -728,10 +728,15 @@ SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags) if (flags & MLOCK_ONFAULT) vm_flags |= VM_LOCKONFAULT;
- return do_mlock(start, len, vm_flags); + return ksys_mlock(start, len, vm_flags); }
-SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len) +SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags) +{ + return ksys_mlock2(start, len, flags); +} + +int ksys_munlock(unsigned long start, size_t len) { int ret;
@@ -746,6 +751,11 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len) return ret; }
+SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len) +{ + return ksys_munlock(start, len); +} + /* * Take the MCL_* flags passed into mlockall (or 0 if called from munlockall) * and translate into the appropriate modifications to mm->def_flags and/or the diff --git a/mm/mmap.c b/mm/mmap.c index bd7b9f293b39..09bfaf36b961 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -189,7 +189,8 @@ static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long flags, struct list_head *uf); -SYSCALL_DEFINE1(brk, unsigned long, brk) + +unsigned long ksys_brk(unsigned long brk) { unsigned long retval; unsigned long newbrk, oldbrk, origbrk; @@ -288,6 +289,11 @@ SYSCALL_DEFINE1(brk, unsigned long, brk) return retval; }
+SYSCALL_DEFINE1(brk, unsigned long, brk) +{ + return ksys_brk(brk); +} + static long vma_compute_subtree_gap(struct vm_area_struct *vma) { unsigned long max, prev_end, subtree_gap; @@ -2870,18 +2876,19 @@ int vm_munmap(unsigned long start, size_t len) } EXPORT_SYMBOL(vm_munmap);
-SYSCALL_DEFINE2(munmap, unsigned long, addr, size_t, len) +int ksys_munmap(unsigned long addr, size_t len) { profile_munmap(addr); return __vm_munmap(addr, len, true); }
+SYSCALL_DEFINE2(munmap, unsigned long, addr, size_t, len) +{ + return ksys_munmap(addr, len); +}
-/* - * Emulation of deprecated remap_file_pages() syscall. - */ -SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, - unsigned long, prot, unsigned long, pgoff, unsigned long, flags) +unsigned long ksys_remap_file_pages(unsigned long start, unsigned long size, + unsigned long prot, unsigned long pgoff, unsigned long flags) {
struct mm_struct *mm = current->mm; @@ -2976,6 +2983,15 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, return ret; }
+/* + * Emulation of deprecated remap_file_pages() syscall. + */ +SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, + unsigned long, prot, unsigned long, pgoff, unsigned long, flags) +{ + return ksys_remap_file_pages(start, size, prot, pgoff, flags); +} + /* * this is really a simplified "do_mmap". it only handles * anonymous maps. eventually we may be able to do some diff --git a/mm/mprotect.c b/mm/mprotect.c index 028c724dcb1a..07344bdd7a04 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -454,7 +454,7 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev, /* * pkey==-1 when doing a legacy mprotect() */ -static int do_mprotect_pkey(unsigned long start, size_t len, +int ksys_mprotect_pkey(unsigned long start, size_t len, unsigned long prot, int pkey) { unsigned long nstart, end, tmp, reqprot; @@ -578,7 +578,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len, SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len, unsigned long, prot) { - return do_mprotect_pkey(start, len, prot, -1); + return ksys_mprotect_pkey(start, len, prot, -1); }
#ifdef CONFIG_ARCH_HAS_PKEYS @@ -586,7 +586,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len, SYSCALL_DEFINE4(pkey_mprotect, unsigned long, start, size_t, len, unsigned long, prot, int, pkey) { - return do_mprotect_pkey(start, len, prot, pkey); + return ksys_mprotect_pkey(start, len, prot, pkey); }
SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val) diff --git a/mm/mremap.c b/mm/mremap.c index e3edef6b7a12..fec1f9911388 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -584,16 +584,9 @@ static int vma_expandable(struct vm_area_struct *vma, unsigned long delta) return 1; }
-/* - * Expand (or shrink) an existing mapping, potentially moving it at the - * same time (controlled by the MREMAP_MAYMOVE flag and available VM space) - * - * MREMAP_FIXED option added 5-Dec-1999 by Benjamin LaHaise - * This option implies MREMAP_MAYMOVE. - */ -SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len, - unsigned long, new_len, unsigned long, flags, - unsigned long, new_addr) +unsigned long ksys_mremap(unsigned long addr, unsigned long old_len, + unsigned long new_len, unsigned long flags, + unsigned long new_addr) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma; @@ -726,3 +719,17 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len, userfaultfd_unmap_complete(mm, &uf_unmap); return ret; } + +/* + * Expand (or shrink) an existing mapping, potentially moving it at the + * same time (controlled by the MREMAP_MAYMOVE flag and available VM space) + * + * MREMAP_FIXED option added 5-Dec-1999 by Benjamin LaHaise + * This option implies MREMAP_MAYMOVE. + */ +SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len, + unsigned long, new_len, unsigned long, flags, + unsigned long, new_addr) +{ + return ksys_mremap(addr, old_len, new_len, flags, new_addr); +} diff --git a/mm/msync.c b/mm/msync.c index ef30a429623a..b5a013549626 100644 --- a/mm/msync.c +++ b/mm/msync.c @@ -15,21 +15,7 @@ #include <linux/syscalls.h> #include <linux/sched.h>
-/* - * MS_SYNC syncs the entire file - including mappings. - * - * MS_ASYNC does not start I/O (it used to, up to 2.5.67). - * Nor does it marks the relevant pages dirty (it used to up to 2.6.17). - * Now it doesn't do anything, since dirty pages are properly tracked. - * - * The application may now run fsync() to - * write out the dirty pages and wait on the writeout and check the result. - * Or the application may run fadvise(FADV_DONTNEED) against the fd to start - * async writeout immediately. - * So by _not_ starting I/O in MS_ASYNC we provide complete flexibility to - * applications. - */ -SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags) +int ksys_msync(unsigned long start, size_t len, int flags) { unsigned long end; struct mm_struct *mm = current->mm; @@ -106,3 +92,22 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags) out: return error ? : unmapped_error; } + +/* + * MS_SYNC syncs the entire file - including mappings. + * + * MS_ASYNC does not start I/O (it used to, up to 2.5.67). + * Nor does it marks the relevant pages dirty (it used to up to 2.6.17). + * Now it doesn't do anything, since dirty pages are properly tracked. + * + * The application may now run fsync() to + * write out the dirty pages and wait on the writeout and check the result. + * Or the application may run fadvise(FADV_DONTNEED) against the fd to start + * async writeout immediately. + * So by _not_ starting I/O in MS_ASYNC we provide complete flexibility to + * applications. + */ +SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags) +{ + return ksys_msync(start, len, flags); +}
This patch is part of a series that extends the arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
This patch allows tagged pointers to be passed to the following memory syscalls: brk, get_mempolicy, madvise, mbind, mincore, mlock, mlock2, mmap, mmap_pgoff, mprotect, mremap, msync, munlock, munmap, remap_file_pages, shmat and shmdt.
This is done by untagging pointers passed to these syscalls in the prologues of their handlers.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 arch/arm64/kernel/sys.c | 128 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 127 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/sys.c b/arch/arm64/kernel/sys.c
index b44065fb1616..933bb9f3d6ec 100644
--- a/arch/arm64/kernel/sys.c
+++ b/arch/arm64/kernel/sys.c
@@ -35,10 +35,33 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
 {
 	if (offset_in_page(off) != 0)
 		return -EINVAL;
-
+	addr = untagged_addr(addr);
 	return ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
 }
 
+SYSCALL_DEFINE6(arm64_mmap_pgoff, unsigned long, addr, unsigned long, len,
+		unsigned long, prot, unsigned long, flags,
+		unsigned long, fd, unsigned long, pgoff)
+{
+	addr = untagged_addr(addr);
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
+}
+
+SYSCALL_DEFINE5(arm64_mremap, unsigned long, addr, unsigned long, old_len,
+		unsigned long, new_len, unsigned long, flags,
+		unsigned long, new_addr)
+{
+	addr = untagged_addr(addr);
+	new_addr = untagged_addr(new_addr);
+	return ksys_mremap(addr, old_len, new_len, flags, new_addr);
+}
+
+SYSCALL_DEFINE2(arm64_munmap, unsigned long, addr, size_t, len)
+{
+	addr = untagged_addr(addr);
+	return ksys_munmap(addr, len);
+}
+
 SYSCALL_DEFINE1(arm64_personality, unsigned int, personality)
 {
 	if (personality(personality) == PER_LINUX32 &&
@@ -47,10 +70,113 @@ SYSCALL_DEFINE1(arm64_personality, unsigned int, personality)
 	return ksys_personality(personality);
 }
 
+SYSCALL_DEFINE1(arm64_brk, unsigned long, brk)
+{
+	brk = untagged_addr(brk);
+	return ksys_brk(brk);
+}
+
+SYSCALL_DEFINE5(arm64_get_mempolicy, int __user *, policy,
+		unsigned long __user *, nmask, unsigned long, maxnode,
+		unsigned long, addr, unsigned long, flags)
+{
+	addr = untagged_addr(addr);
+	return ksys_get_mempolicy(policy, nmask, maxnode, addr, flags);
+}
+
+SYSCALL_DEFINE3(arm64_madvise, unsigned long, start,
+		size_t, len_in, int, behavior)
+{
+	start = untagged_addr(start);
+	return ksys_madvise(start, len_in, behavior);
+}
+
+SYSCALL_DEFINE6(arm64_mbind, unsigned long, start, unsigned long, len,
+		unsigned long, mode, const unsigned long __user *, nmask,
+		unsigned long, maxnode, unsigned int, flags)
+{
+	start = untagged_addr(start);
+	return ksys_mbind(start, len, mode, nmask, maxnode, flags);
+}
+
+SYSCALL_DEFINE2(arm64_mlock, unsigned long, start, size_t, len)
+{
+	start = untagged_addr(start);
+	return ksys_mlock(start, len, VM_LOCKED);
+}
+
+SYSCALL_DEFINE3(arm64_mlock2, unsigned long, start, size_t, len, int, flags)
+{
+	start = untagged_addr(start);
+	return ksys_mlock2(start, len, flags);
+}
+
+SYSCALL_DEFINE2(arm64_munlock, unsigned long, start, size_t, len)
+{
+	start = untagged_addr(start);
+	return ksys_munlock(start, len);
+}
+
+SYSCALL_DEFINE3(arm64_mprotect, unsigned long, start, size_t, len,
+		unsigned long, prot)
+{
+	start = untagged_addr(start);
+	return ksys_mprotect_pkey(start, len, prot, -1);
+}
+
+SYSCALL_DEFINE3(arm64_msync, unsigned long, start, size_t, len, int, flags)
+{
+	start = untagged_addr(start);
+	return ksys_msync(start, len, flags);
+}
+
+SYSCALL_DEFINE3(arm64_mincore, unsigned long, start, size_t, len,
+		unsigned char __user *, vec)
+{
+	start = untagged_addr(start);
+	return ksys_mincore(start, len, vec);
+}
+
+SYSCALL_DEFINE5(arm64_remap_file_pages, unsigned long, start,
+		unsigned long, size, unsigned long, prot,
+		unsigned long, pgoff, unsigned long, flags)
+{
+	start = untagged_addr(start);
+	return ksys_remap_file_pages(start, size, prot, pgoff, flags);
+}
+
+SYSCALL_DEFINE3(arm64_shmat, int, shmid, char __user *, shmaddr, int, shmflg)
+{
+	shmaddr = untagged_addr(shmaddr);
+	return ksys_shmat(shmid, shmaddr, shmflg);
+}
+
+SYSCALL_DEFINE1(arm64_shmdt, char __user *, shmaddr)
+{
+	shmaddr = untagged_addr(shmaddr);
+	return ksys_shmdt(shmaddr);
+}
+
 /*
  * Wrappers to pass the pt_regs argument.
  */
 #define sys_personality	sys_arm64_personality
+#define sys_mmap_pgoff	sys_arm64_mmap_pgoff
+#define sys_mremap	sys_arm64_mremap
+#define sys_munmap	sys_arm64_munmap
+#define sys_brk	sys_arm64_brk
+#define sys_get_mempolicy	sys_arm64_get_mempolicy
+#define sys_madvise	sys_arm64_madvise
+#define sys_mbind	sys_arm64_mbind
+#define sys_mlock	sys_arm64_mlock
+#define sys_mlock2	sys_arm64_mlock2
+#define sys_munlock	sys_arm64_munlock
+#define sys_mprotect	sys_arm64_mprotect
+#define sys_msync	sys_arm64_msync
+#define sys_mincore	sys_arm64_mincore
+#define sys_remap_file_pages	sys_arm64_remap_file_pages
+#define sys_shmat	sys_arm64_shmat
+#define sys_shmdt	sys_arm64_shmdt
 
 asmlinkage long sys_ni_syscall(const struct pt_regs *);
 #define __arm64_sys_ni_syscall sys_ni_syscall
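For readers following the series, here is a minimal userspace sketch of the "untag in the syscall prologue" pattern described in this patch. It assumes that untagging means copying bit 55 into the top byte, which zeroes the tag for user addresses. The names untag(), model_ksys_munmap() and model_arm64_munmap() are hypothetical stand-ins and not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of untagging: replace bits 63..56 with copies of bit 55.
 * User addresses have bit 55 clear, so their tag byte becomes 0x00.
 */
static uint64_t untag(uint64_t addr)
{
	uint64_t low = addr & 0x00ffffffffffffffULL;

	if (addr & (1ULL << 55))
		return low | 0xff00000000000000ULL;
	return low;
}

/* Hypothetical backend that, like the mm code, only understands untagged ranges. */
static long model_ksys_munmap(uint64_t addr, uint64_t len)
{
	(void)len;
	return (addr >> 56) == 0 ? 0 : -22; /* -22 stands in for -EINVAL */
}

/* Hypothetical arch wrapper: strip the tag first, then delegate. */
static long model_arm64_munmap(uint64_t addr, uint64_t len)
{
	addr = untag(addr);
	return model_ksys_munmap(addr, len);
}
```

Calling the backend directly with a tagged address fails in this model, while going through the wrapper succeeds, which is the behaviour the arm64-specific syscall wrappers are meant to provide.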
This patch is part of a series that extends the arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
do_pages_move() is used in the implementation of the move_pages syscall.
Untag user pointers in this function.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 mm/migrate.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index 663a5449367a..c014a07135f0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1617,6 +1617,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 		if (get_user(node, nodes + i))
 			goto out_flush;
 		addr = (unsigned long)p;
+		addr = untagged_addr(addr);
 
 		err = -ENODEV;
 		if (node < 0 || node >= MAX_NUMNODES)
This patch is part of a series that extends the arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
mm/gup.c provides a kernel interface that accepts user addresses and manipulates user pages directly (for example get_user_pages(), which is used by the futex syscall). Since a user can provide tagged addresses, we need to handle this case.
Add untagging to gup.c functions that use user addresses for vma lookups.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 mm/gup.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index 91819b8ad9cc..2f477a0a7180 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -696,6 +696,8 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 	if (!nr_pages)
 		return 0;
 
+	start = untagged_addr(start);
+
 	VM_BUG_ON(!!pages != !!(gup_flags & FOLL_GET));
 
 	/*
@@ -858,6 +860,8 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 	struct vm_area_struct *vma;
 	vm_fault_t ret, major = 0;
 
+	address = untagged_addr(address);
+
 	if (unlocked)
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY;
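To illustrate why vma lookups need untagged addresses, here is a small userspace sketch. It assumes regions are tracked as untagged [start, end) ranges, as the vma tree does. struct model_vma, model_find() and untag() are hypothetical names used for illustration only, and the linear search stands in for the real lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor: an untagged half-open range [start, end). */
struct model_vma {
	uint64_t start;
	uint64_t end;
};

/* Model of untagging: replace bits 63..56 with copies of bit 55. */
static uint64_t untag(uint64_t addr)
{
	uint64_t low = addr & 0x00ffffffffffffffULL;

	if (addr & (1ULL << 55))
		return low | 0xff00000000000000ULL;
	return low;
}

/* Return the region containing addr, or NULL if no region contains it. */
static const struct model_vma *model_find(const struct model_vma *map,
					  size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= map[i].start && addr < map[i].end)
			return &map[i];
	return NULL;
}
```

A tagged address compares as larger than every untagged range end, so the lookup misses unless the caller untags first. That is why the untagging is placed at the top of the gup entry points.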
This patch is part of a series that extends the arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
get_vaddr_frames() uses provided user pointers for vma lookups, which can only be done with untagged pointers. Instead of locating and changing all callers of this function, perform untagging in it.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 mm/frame_vector.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index c64dca6e27c2..c431ca81dad5 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -46,6 +46,8 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 	if (WARN_ON_ONCE(nr_frames > vec->nr_allocated))
 		nr_frames = vec->nr_allocated;
 
+	start = untagged_addr(start);
+
 	down_read(&mm->mmap_sem);
 	locked = 1;
 	vma = find_vma_intersection(mm, start, start + 1);
On Tue, Apr 30, 2019 at 03:25:04PM +0200, Andrey Konovalov wrote:
Is this some buffer that the user may have malloc'ed? I got lost when trying to track down the provenience of this buffer.
On Fri, May 3, 2019 at 6:51 PM Catalin Marinas catalin.marinas@arm.com wrote:
Is this some buffer that the user may have malloc'ed? I got lost when trying to track down the provenience of this buffer.
The caller that I found when I was looking at this:
drivers/gpu/drm/exynos/exynos_drm_g2d.c:482 exynos_g2d_set_cmdlist_ioctl()->g2d_map_cmdlist_gem()->g2d_userptr_get_dma_addr()->get_vaddr_frames()
This patch is part of a series that extends the arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
In copy_mount_options() a user address is subtracted from TASK_SIZE. If the address is lower than TASK_SIZE, the size is calculated so that the exact_copy_from_user() call does not cross the TASK_SIZE boundary. However, if the address is tagged, the size is calculated incorrectly.
Untag the address before subtracting.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index c9cab307fa77..c27e5713bf04 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2825,7 +2825,7 @@ void *copy_mount_options(const void __user * data)
 	 * the remainder of the page.
 	 */
 	/* copy_from_user cannot cross TASK_SIZE ! */
-	size = TASK_SIZE - (unsigned long)data;
+	size = TASK_SIZE - (unsigned long)untagged_addr(data);
 	if (size > PAGE_SIZE)
 		size = PAGE_SIZE;
 
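The size miscalculation can be reproduced with plain unsigned arithmetic. The sketch below assumes a 48-bit user address space (TASK_SIZE of 1 << 48) and 4 KiB pages. MODEL_TASK_SIZE, MODEL_PAGE_SIZE, untag() and model_copy_size() are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_TASK_SIZE	(1ULL << 48)	/* assumed 48-bit user VA space */
#define MODEL_PAGE_SIZE	4096ULL		/* assumed 4 KiB pages */

/* Model of untagging: replace bits 63..56 with copies of bit 55. */
static uint64_t untag(uint64_t addr)
{
	uint64_t low = addr & 0x00ffffffffffffffULL;

	if (addr & (1ULL << 55))
		return low | 0xff00000000000000ULL;
	return low;
}

/*
 * Model of the size computation in copy_mount_options(): copy at most
 * one page, and never past the end of the user address space.
 */
static uint64_t model_copy_size(uint64_t data, int do_untag)
{
	uint64_t size;

	if (do_untag)
		data = untag(data);

	/* For a tagged address this subtraction wraps around to a huge value. */
	size = MODEL_TASK_SIZE - data;
	if (size > MODEL_PAGE_SIZE)
		size = MODEL_PAGE_SIZE;
	return size;
}
```

For a buffer that starts 0x800 bytes below the top of the address space, the untagged computation yields 0x800. With a tag in the top byte the subtraction wraps, so the result is clamped to a full page and the copy would be sized to run past TASK_SIZE.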
This patch is part of a series that extends the arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
userfaultfd_register() and userfaultfd_unregister() use provided user pointers for vma lookups, which can only be done with untagged pointers.
Untag user pointers in these functions.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 fs/userfaultfd.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index f5de1e726356..fdee0db0e847 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1325,6 +1325,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		goto out;
 	}
 
+	uffdio_register.range.start =
+		untagged_addr(uffdio_register.range.start);
+
 	ret = validate_range(mm, uffdio_register.range.start,
 			     uffdio_register.range.len);
 	if (ret)
@@ -1514,6 +1517,8 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 	if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister)))
 		goto out;
 
+	uffdio_unregister.start = untagged_addr(uffdio_unregister.start);
+
 	ret = validate_range(mm, uffdio_unregister.start,
 			     uffdio_unregister.len);
 	if (ret)
On Tue, Apr 30, 2019 at 03:25:06PM +0200, Andrey Konovalov wrote:
Wouldn't it be easier to do this in validate_range()? There are a few more calls in this file, though I didn't check whether a tagged address would cause issues.
On Fri, May 3, 2019 at 6:56 PM Catalin Marinas catalin.marinas@arm.com wrote:
Wouldn't it be easier to do this in validate_range()? There are a few more calls in this file, though I didn't check whether a tagged address would cause issues.
Yes, I think it makes more sense, will do in v15, thanks!
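A rough userspace sketch of what moving the untagging into the range validator could look like follows. It assumes the validator takes the start address by pointer so that every caller sees the untagged value afterwards. model_validate_range() is a hypothetical, simplified model (alignment, non-zero length and task-size bounds only) and does not reproduce the actual validate_range() in fs/userfaultfd.c.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE	4096ULL	/* assumed 4 KiB pages */
#define MODEL_EINVAL	22

/* Model of untagging: replace bits 63..56 with copies of bit 55. */
static uint64_t untag(uint64_t addr)
{
	uint64_t low = addr & 0x00ffffffffffffffULL;

	if (addr & (1ULL << 55))
		return low | 0xff00000000000000ULL;
	return low;
}

/*
 * Hypothetical validator that untags *start in place before checking it,
 * so callers do not have to untag on their own.
 */
static int model_validate_range(uint64_t task_size, uint64_t *start,
				uint64_t len)
{
	*start = untag(*start);

	if (*start & (MODEL_PAGE_SIZE - 1))
		return -MODEL_EINVAL;	/* start not page aligned */
	if (len & (MODEL_PAGE_SIZE - 1))
		return -MODEL_EINVAL;	/* length not page aligned */
	if (!len)
		return -MODEL_EINVAL;	/* empty range */
	if (*start >= task_size)
		return -MODEL_EINVAL;	/* start outside user space */
	if (len > task_size - *start)
		return -MODEL_EINVAL;	/* range runs past user space */
	return 0;
}
```

With this shape, the register and unregister paths (and any other caller in the file) get the same untagged start without repeating the untagging at each call site.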
This patch is part of a series that extends the arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
amdgpu_ttm_tt_get_user_pages() uses provided user pointers for vma lookups, which can only be done with untagged pointers. This patch untags user pointers when they are being set in amdgpu_ttm_tt_set_userptr().
In amdgpu_gem_userptr_ioctl() and amdgpu_amdkfd_gpuvm.c/init_user_pages() an MMU notifier is set up with a (tagged) userspace pointer. The untagged address should be used so that MMU notifiers for the untagged address get correctly matched up with the right BO. This patch untags user pointers in amdgpu_gem_userptr_ioctl() for the GEM case and in amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu() for the KFD case.
Suggested-by: Kuehling, Felix <Felix.Kuehling@amd.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c          | 2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c          | 2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 1921dec3df7a..20cac44ed449 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1121,7 +1121,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 		alloc_flags = 0;
 		if (!offset || !*offset)
 			return -EINVAL;
-		user_addr = *offset;
+		user_addr = untagged_addr(*offset);
 	} else if (flags & ALLOC_MEM_FLAGS_DOORBELL) {
 		domain = AMDGPU_GEM_DOMAIN_GTT;
 		alloc_domain = AMDGPU_GEM_DOMAIN_CPU;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index d21dd2f369da..985cb82b2aa6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -286,6 +286,8 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void *data,
 	uint32_t handle;
 	int r;
 
+	args->addr = untagged_addr(args->addr);
+
 	if (offset_in_page(args->addr | args->size))
 		return -EINVAL;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 73e71e61dc99..1d30e97ac2c4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1248,7 +1248,7 @@ int amdgpu_ttm_tt_set_userptr(struct ttm_tt *ttm, uint64_t addr,
 	if (gtt == NULL)
 		return -EINVAL;
 
-	gtt->userptr = addr;
+	gtt->userptr = untagged_addr(addr);
 	gtt->userflags = flags;
 
 	if (gtt->usertask)
On 2019-04-30 9:25 a.m., Andrey Konovalov wrote:
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 73e71e61dc99..1d30e97ac2c4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1248,7 +1248,7 @@ int amdgpu_ttm_tt_set_userptr(struct ttm_tt *ttm, uint64_t addr,
>  	if (gtt == NULL)
>  		return -EINVAL;
>
> -	gtt->userptr = addr;
> +	gtt->userptr = untagged_addr(addr);
Doing this here seems unnecessary. You already untagged the address in both callers of this function. Untagging in the two callers ensures that the userptr and MMU notifier are in sync, using the same untagged address. Doing it again here is redundant.
Regards, Felix
On Tue, Apr 30, 2019 at 8:03 PM Kuehling, Felix Felix.Kuehling@amd.com wrote:
Doing this here seems unnecessary. You already untagged the address in both callers of this function. Untagging in the two callers ensures that the userptr and MMU notifier are in sync, using the same untagged address. Doing it again here is redundant.
Will fix in v15, thanks!
Regards, Felix
gtt->userflags = flags; if (gtt->usertask)
-- 2.21.0.593.g511ec345e18-goog
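For readers following the review above, the reason the userptr and the MMU notifier must agree on one untagged address can be sketched in plain user-space C. The helper sketch_untagged_addr() is a hypothetical stand-in for the kernel's untagged_addr(), not its implementation; it assumes the tag lives in bits 63:56 and that bit 55 distinguishes user from kernel addresses. All names and the sample address are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for untagged_addr(): sign-extend from bit 55, so a
 * user address (bit 55 clear) gets a zero top byte and a kernel address
 * (bit 55 set) keeps 0xff in the top byte.
 */
static uint64_t sketch_untagged_addr(uint64_t addr)
{
	if (addr & (UINT64_C(1) << 55))
		return addr | (UINT64_C(0xff) << 56);
	return addr & ~(UINT64_C(0xff) << 56);
}

/* Returns 1 when two addresses refer to the same location once untagged. */
static int sketch_same_location(uint64_t a, uint64_t b)
{
	return sketch_untagged_addr(a) == sketch_untagged_addr(b);
}

/* A tagged userptr differs numerically from its untagged form but matches after untagging. */
static void sketch_untag_demo(void)
{
	uint64_t plain = UINT64_C(0x0000007fdeadb000);
	uint64_t tagged = plain | (UINT64_C(0x42) << 56);

	assert(tagged != plain);
	assert(sketch_same_location(tagged, plain));
}
```

If one side stored the tagged value and the other the untagged one, a direct comparison would fail even though both describe the same buffer, which is why untagging once in the callers is enough.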
This patch is part of a series that extends the arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
radeon_ttm_tt_pin_userptr() uses provided user pointers for vma lookups, which can only be done with untagged pointers. This patch untags user pointers when they are being set in radeon_ttm_tt_set_userptr().
In radeon_gem_userptr_ioctl() an MMU notifier is set up with a (tagged) userspace pointer. The untagged address should be used so that MMU notifiers for the untagged address get correctly matched up with the right BO. This patch untags user pointers in radeon_gem_userptr_ioctl().
Signed-off-by: Andrey Konovalov andreyknvl@google.com --- drivers/gpu/drm/radeon/radeon_gem.c | 2 ++ drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c index 44617dec8183..90eb78fb5eb2 100644 --- a/drivers/gpu/drm/radeon/radeon_gem.c +++ b/drivers/gpu/drm/radeon/radeon_gem.c @@ -291,6 +291,8 @@ int radeon_gem_userptr_ioctl(struct drm_device *dev, void *data, uint32_t handle; int r;
+ args->addr = untagged_addr(args->addr); + if (offset_in_page(args->addr | args->size)) return -EINVAL;
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 9920a6fc11bf..dce722c494c1 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -742,7 +742,7 @@ int radeon_ttm_tt_set_userptr(struct ttm_tt *ttm, uint64_t addr, if (gtt == NULL) return -EINVAL;
- gtt->userptr = addr; + gtt->userptr = untagged_addr(addr); gtt->usermm = current->mm; gtt->userflags = flags; return 0;
On 2019-04-30 9:25 a.m., Andrey Konovalov wrote:
This patch is part of a series that extends the arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
radeon_ttm_tt_pin_userptr() uses provided user pointers for vma lookups, which can only be done with untagged pointers. This patch untags user pointers when they are being set in radeon_ttm_tt_set_userptr().
In radeon_gem_userptr_ioctl() an MMU notifier is set up with a (tagged) userspace pointer. The untagged address should be used so that MMU notifiers for the untagged address get correctly matched up with the right BO. This patch untags user pointers in radeon_gem_userptr_ioctl().
Signed-off-by: Andrey Konovalov andreyknvl@google.com
drivers/gpu/drm/radeon/radeon_gem.c | 2 ++ drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c index 44617dec8183..90eb78fb5eb2 100644 --- a/drivers/gpu/drm/radeon/radeon_gem.c +++ b/drivers/gpu/drm/radeon/radeon_gem.c @@ -291,6 +291,8 @@ int radeon_gem_userptr_ioctl(struct drm_device *dev, void *data, uint32_t handle; int r;
+ args->addr = untagged_addr(args->addr);
+
if (offset_in_page(args->addr | args->size)) return -EINVAL;
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 9920a6fc11bf..dce722c494c1 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -742,7 +742,7 @@ int radeon_ttm_tt_set_userptr(struct ttm_tt *ttm, uint64_t addr, if (gtt == NULL) return -EINVAL;
- gtt->userptr = addr;
+ gtt->userptr = untagged_addr(addr);
Doing this here seems unnecessary, because you already untagged the address in the only caller of this function in radeon_gem_userptr_ioctl. The change there will affect both the userptr and MMU notifier setup and makes sure that both are in sync, using the same untagged address.
Regards, Felix
gtt->usermm = current->mm; gtt->userflags = flags; return 0;
-- 2.21.0.593.g511ec345e18-goog
This patch is part of a series that extends the arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
mlx4_get_umem_mr() uses provided user pointers for vma lookups, which can only be done with untagged pointers.
Untag user pointers in this function.
Signed-off-by: Andrey Konovalov andreyknvl@google.com Reviewed-by: Leon Romanovsky leonro@mellanox.com --- drivers/infiniband/hw/mlx4/mr.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c index 395379a480cb..9a35ed2c6a6f 100644 --- a/drivers/infiniband/hw/mlx4/mr.c +++ b/drivers/infiniband/hw/mlx4/mr.c @@ -378,6 +378,7 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata *udata, u64 start, * again */ if (!ib_access_writable(access_flags)) { + unsigned long untagged_start = untagged_addr(start); struct vm_area_struct *vma;
down_read(&current->mm->mmap_sem); @@ -386,9 +387,9 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata *udata, u64 start, * cover the memory, but for now it requires a single vma to * entirely cover the MR to support RO mappings. */ - vma = find_vma(current->mm, start); - if (vma && vma->vm_end >= start + length && - vma->vm_start <= start) { + vma = find_vma(current->mm, untagged_start); + if (vma && vma->vm_end >= untagged_start + length && + vma->vm_start <= untagged_start) { if (vma->vm_flags & VM_WRITE) access_flags |= IB_ACCESS_LOCAL_WRITE; } else {
On Tue, Apr 30, 2019 at 03:25:09PM +0200, Andrey Konovalov wrote:
This patch is part of a series that extends the arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
mlx4_get_umem_mr() uses provided user pointers for vma lookups, which can only be done with untagged pointers.
Untag user pointers in this function.
Signed-off-by: Andrey Konovalov andreyknvl@google.com Reviewed-by: Leon Romanovsky leonro@mellanox.com
drivers/infiniband/hw/mlx4/mr.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c index 395379a480cb..9a35ed2c6a6f 100644 --- a/drivers/infiniband/hw/mlx4/mr.c +++ b/drivers/infiniband/hw/mlx4/mr.c @@ -378,6 +378,7 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata *udata, u64 start, * again */ if (!ib_access_writable(access_flags)) {
+ unsigned long untagged_start = untagged_addr(start); struct vm_area_struct *vma;
down_read(&current->mm->mmap_sem); @@ -386,9 +387,9 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata *udata, u64 start, * cover the memory, but for now it requires a single vma to * entirely cover the MR to support RO mappings. */
- vma = find_vma(current->mm, start);
- if (vma && vma->vm_end >= start + length &&
- vma->vm_start <= start) {
+ vma = find_vma(current->mm, untagged_start);
+ if (vma && vma->vm_end >= untagged_start + length &&
+ vma->vm_start <= untagged_start) { if (vma->vm_flags & VM_WRITE) access_flags |= IB_ACCESS_LOCAL_WRITE; } else {
Discussion ongoing on the previous version of the patch but I'm more inclined to do this in ib_uverbs_(re)reg_mr() on cmd.start.
On Fri, May 3, 2019 at 7:03 PM Catalin Marinas catalin.marinas@arm.com wrote:
On Tue, Apr 30, 2019 at 03:25:09PM +0200, Andrey Konovalov wrote:
This patch is part of a series that extends the arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
mlx4_get_umem_mr() uses provided user pointers for vma lookups, which can only be done with untagged pointers.
Untag user pointers in this function.
Signed-off-by: Andrey Konovalov andreyknvl@google.com Reviewed-by: Leon Romanovsky leonro@mellanox.com
drivers/infiniband/hw/mlx4/mr.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c index 395379a480cb..9a35ed2c6a6f 100644 --- a/drivers/infiniband/hw/mlx4/mr.c +++ b/drivers/infiniband/hw/mlx4/mr.c @@ -378,6 +378,7 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata *udata, u64 start, * again */ if (!ib_access_writable(access_flags)) {
+ unsigned long untagged_start = untagged_addr(start); struct vm_area_struct *vma; down_read(&current->mm->mmap_sem);
@@ -386,9 +387,9 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata *udata, u64 start, * cover the memory, but for now it requires a single vma to * entirely cover the MR to support RO mappings. */
- vma = find_vma(current->mm, start);
- if (vma && vma->vm_end >= start + length &&
- vma->vm_start <= start) {
+ vma = find_vma(current->mm, untagged_start);
+ if (vma && vma->vm_end >= untagged_start + length &&
+ vma->vm_start <= untagged_start) { if (vma->vm_flags & VM_WRITE) access_flags |= IB_ACCESS_LOCAL_WRITE; } else {
Discussion ongoing on the previous version of the patch but I'm more inclined to do this in ib_uverbs_(re)reg_mr() on cmd.start.
OK, I want to publish v15 sooner to fix the issue with email addresses, so I'll implement this approach there for now.
-- Catalin
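Wherever the untagging ends up, the range check in the diff above only works on an untagged start address. A minimal user-space sketch of why, using a hypothetical struct sketch_vma in place of the kernel's vm_area_struct and assuming the tag occupies bits 63:56 of a user address:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the vm_start/vm_end fields of a vma. */
struct sketch_vma {
	uint64_t vm_start; /* inclusive, always untagged */
	uint64_t vm_end;   /* exclusive, always untagged */
};

/* Same shape as the check in the diff: one vma must cover [start, start + length). */
static int sketch_vma_covers(const struct sketch_vma *vma, uint64_t start,
			     uint64_t length)
{
	return vma->vm_end >= start + length && vma->vm_start <= start;
}

/* Clear the tag byte of a user address (assumes bits 63:56 hold the tag). */
static uint64_t sketch_untag_user(uint64_t addr)
{
	return addr & ~(UINT64_C(0xff) << 56);
}

/* The tagged start fails the coverage check; the untagged start passes it. */
static void sketch_vma_demo(void)
{
	struct sketch_vma vma = { UINT64_C(0x7f0000000000), UINT64_C(0x7f0000010000) };
	uint64_t tagged = UINT64_C(0x7f0000001000) | (UINT64_C(0x42) << 56);

	assert(!sketch_vma_covers(&vma, tagged, 0x1000));
	assert(sketch_vma_covers(&vma, sketch_untag_user(tagged), 0x1000));
}
```

With the tag in place, start + length compares as far larger than any vm_end, so a valid read-only mapping would be rejected.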
This patch is part of a series that extends the arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
videobuf_dma_contig_user_get() uses provided user pointers for vma lookups, which can only be done with untagged pointers.
Untag the pointers in this function.
Signed-off-by: Andrey Konovalov andreyknvl@google.com --- drivers/media/v4l2-core/videobuf-dma-contig.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c index e1bf50df4c70..8a1ddd146b17 100644 --- a/drivers/media/v4l2-core/videobuf-dma-contig.c +++ b/drivers/media/v4l2-core/videobuf-dma-contig.c @@ -160,6 +160,7 @@ static void videobuf_dma_contig_user_put(struct videobuf_dma_contig_memory *mem) static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem, struct videobuf_buffer *vb) { + unsigned long untagged_baddr = untagged_addr(vb->baddr); struct mm_struct *mm = current->mm; struct vm_area_struct *vma; unsigned long prev_pfn, this_pfn; @@ -167,22 +168,22 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem, unsigned int offset; int ret;
- offset = vb->baddr & ~PAGE_MASK; + offset = untagged_baddr & ~PAGE_MASK; mem->size = PAGE_ALIGN(vb->size + offset); ret = -EINVAL;
down_read(&mm->mmap_sem);
- vma = find_vma(mm, vb->baddr); + vma = find_vma(mm, untagged_baddr); if (!vma) goto out_up;
- if ((vb->baddr + mem->size) > vma->vm_end) + if ((untagged_baddr + mem->size) > vma->vm_end) goto out_up;
pages_done = 0; prev_pfn = 0; /* kill warning */ - user_address = vb->baddr; + user_address = untagged_baddr;
while (pages_done < (mem->size >> PAGE_SHIFT)) { ret = follow_pfn(vma, user_address, &this_pfn);
This patch is part of a series that extends the arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
tee_shm_register()->optee_shm_unregister()->check_mem_type() uses provided user pointers for vma lookups (via __check_mem_type()), which can only be done with untagged pointers.
Untag user pointers in this function.
Signed-off-by: Andrey Konovalov andreyknvl@google.com --- drivers/tee/tee_shm.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c index 0b9ab1d0dd45..8e7b52ab6c63 100644 --- a/drivers/tee/tee_shm.c +++ b/drivers/tee/tee_shm.c @@ -263,6 +263,7 @@ struct tee_shm *tee_shm_register(struct tee_context *ctx, unsigned long addr, shm->teedev = teedev; shm->ctx = ctx; shm->id = -1; + addr = untagged_addr(addr); start = rounddown(addr, PAGE_SIZE); shm->offset = addr - start; shm->size = length;
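The ordering in the hunk above (untag first, then derive the page-aligned start and the in-page offset) can be sketched in user-space C as follows. The struct, the function name and the 4096-byte page size are assumptions made for illustration, and the helper only handles user addresses with the tag in bits 63:56.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(4096) /* assumed page size for illustration */

struct sketch_shm_range {
	uint64_t start;  /* page-aligned base of the registered range */
	uint64_t offset; /* offset of the buffer within its first page */
};

/* Mirrors the order in the diff: untag, round down, then take the offset. */
static struct sketch_shm_range sketch_shm_range_of(uint64_t addr)
{
	struct sketch_shm_range r;

	addr &= ~(UINT64_C(0xff) << 56);            /* drop the tag byte */
	r.start = addr - (addr % SKETCH_PAGE_SIZE); /* rounddown(addr, PAGE_SIZE) */
	r.offset = addr - r.start;
	return r;
}

/* The derived start carries no tag, so later vma lookups see a plain address. */
static void sketch_shm_demo(void)
{
	struct sketch_shm_range r = sketch_shm_range_of(UINT64_C(0x4200007f00001234));

	assert((r.start >> 56) == 0);
	assert(r.offset < SKETCH_PAGE_SIZE);
}
```

The offset would come out the same without untagging, since the tag sits above the page-offset bits; it is the start value passed on to later lookups that needs the tag removed.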
This patch is part of a series that extends the arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
vaddr_get_pfn() uses provided user pointers for vma lookups, which can only be done with untagged pointers.
Untag user pointers in this function.
Signed-off-by: Andrey Konovalov andreyknvl@google.com --- drivers/vfio/vfio_iommu_type1.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index d0f731c9920a..5daa966d799e 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -382,6 +382,8 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
down_read(&mm->mmap_sem);
+ vaddr = untagged_addr(vaddr); + vma = find_vma_intersection(mm, vaddr, vaddr + 1);
if (vma && vma->vm_flags & VM_PFNMAP) {
This patch is part of a series that extends the arm64 kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.
This patch adds a simple test that calls the uname syscall with a tagged user pointer as an argument. Without the kernel accepting tagged user pointers, the test fails with EFAULT.
Signed-off-by: Andrey Konovalov andreyknvl@google.com --- tools/testing/selftests/arm64/.gitignore | 1 + tools/testing/selftests/arm64/Makefile | 11 ++++++++++ .../testing/selftests/arm64/run_tags_test.sh | 12 +++++++++++ tools/testing/selftests/arm64/tags_test.c | 21 +++++++++++++++++++ 4 files changed, 45 insertions(+) create mode 100644 tools/testing/selftests/arm64/.gitignore create mode 100644 tools/testing/selftests/arm64/Makefile create mode 100755 tools/testing/selftests/arm64/run_tags_test.sh create mode 100644 tools/testing/selftests/arm64/tags_test.c
diff --git a/tools/testing/selftests/arm64/.gitignore b/tools/testing/selftests/arm64/.gitignore new file mode 100644 index 000000000000..e8fae8d61ed6 --- /dev/null +++ b/tools/testing/selftests/arm64/.gitignore @@ -0,0 +1 @@ +tags_test diff --git a/tools/testing/selftests/arm64/Makefile b/tools/testing/selftests/arm64/Makefile new file mode 100644 index 000000000000..a61b2e743e99 --- /dev/null +++ b/tools/testing/selftests/arm64/Makefile @@ -0,0 +1,11 @@ +# SPDX-License-Identifier: GPL-2.0 + +# ARCH can be overridden by the user for cross compiling +ARCH ?= $(shell uname -m 2>/dev/null || echo not) + +ifneq (,$(filter $(ARCH),aarch64 arm64)) +TEST_GEN_PROGS := tags_test +TEST_PROGS := run_tags_test.sh +endif + +include ../lib.mk diff --git a/tools/testing/selftests/arm64/run_tags_test.sh b/tools/testing/selftests/arm64/run_tags_test.sh new file mode 100755 index 000000000000..745f11379930 --- /dev/null +++ b/tools/testing/selftests/arm64/run_tags_test.sh @@ -0,0 +1,12 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 + +echo "--------------------" +echo "running tags test" +echo "--------------------" +./tags_test +if [ $? -ne 0 ]; then + echo "[FAIL]" +else + echo "[PASS]" +fi diff --git a/tools/testing/selftests/arm64/tags_test.c b/tools/testing/selftests/arm64/tags_test.c new file mode 100644 index 000000000000..2bd1830a7ebe --- /dev/null +++ b/tools/testing/selftests/arm64/tags_test.c @@ -0,0 +1,21 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <stdio.h> +#include <stdlib.h> +#include <unistd.h> +#include <stdint.h> +#include <sys/utsname.h> + +#define SHIFT_TAG(tag) ((uint64_t)(tag) << 56) +#define SET_TAG(ptr, tag) (((uint64_t)(ptr) & ~SHIFT_TAG(0xff)) | \ + SHIFT_TAG(tag)) + +int main(void) +{ + struct utsname *ptr = (struct utsname *)malloc(sizeof(*ptr)); + void *tagged_ptr = (void *)SET_TAG(ptr, 0x42); + int err = uname(tagged_ptr); + + free(ptr); + return err; +}
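What the two macros in tags_test.c compute can be checked on any 64-bit host without dereferencing anything. The sketch below reuses the macros verbatim; the helpers sketch_tag_of() and sketch_address_bits_of() and the sample address are hypothetical and exist only for inspection.

```c
#include <assert.h>
#include <stdint.h>

/* Same macros as in tags_test.c above. */
#define SHIFT_TAG(tag)		((uint64_t)(tag) << 56)
#define SET_TAG(ptr, tag)	(((uint64_t)(ptr) & ~SHIFT_TAG(0xff)) | \
				 SHIFT_TAG(tag))

/* Hypothetical helper: extract the top byte of a tagged value. */
static unsigned int sketch_tag_of(uint64_t value)
{
	return (unsigned int)(value >> 56);
}

/* Hypothetical helper: extract the low 56 address bits of a tagged value. */
static uint64_t sketch_address_bits_of(uint64_t value)
{
	return value & ~SHIFT_TAG(0xff);
}

/* SET_TAG replaces any previous tag and leaves the low 56 bits untouched. */
static void sketch_set_tag_demo(void)
{
	uint64_t plain = UINT64_C(0x0000007fdeadb000);
	uint64_t tagged = SET_TAG(plain, 0x42);

	assert(sketch_tag_of(tagged) == 0x42);
	assert(sketch_address_bits_of(tagged) == plain);
	assert(sketch_tag_of(SET_TAG(tagged, 0x17)) == 0x17);
}
```

Because SET_TAG first masks out the old top byte, applying it twice does not OR the tags together; the second tag simply replaces the first.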