PTRACE_GET_SYSCALL_INFO is a generic ptrace API that lets a ptracer obtain
details of the syscall the tracee is blocked in.
There are two reasons for a special syscall-related ptrace request.
Firstly, with the current ptrace API there are cases when a ptracer cannot
retrieve necessary information about syscalls. Some examples include:
* The notorious int-0x80-from-64-bit-task issue. See [1] for details.
In short, if a 64-bit task performs a syscall through int 0x80, its tracer
has no reliable means to find out that the syscall was, in fact,
a compat syscall, and misidentifies it.
* Syscall-enter-stop and syscall-exit-stop look the same for the tracer.
Common practice is to keep track of the sequence of ptrace-stops in order
not to mix the two syscall-stops up. But it is not as simple as it looks;
for example, strace had a (just recently fixed) long-standing bug where
attaching strace to a tracee that is performing the execve system call
led to the tracer identifying the following syscall-exit-stop as
syscall-enter-stop, which messed up all the state tracking.
* Since the introduction of commit 84d77d3f06e7e8dea057d10e8ec77ad71f721be3
("ptrace: Don't allow accessing an undumpable mm"), both PTRACE_PEEKDATA
and process_vm_readv become unavailable when the process dumpable flag
is cleared. On such architectures as ia64 this results in all syscall
arguments being unavailable for the tracer.
Secondly, ptracers also have to support a lot of arch-specific code for
obtaining information about the tracee. For some architectures, this
requires a ptrace(PTRACE_PEEKUSER, ...) invocation for every syscall
argument and return value.
PTRACE_GET_SYSCALL_INFO returns the following structure:
struct ptrace_syscall_info {
__u8 op; /* PTRACE_SYSCALL_INFO_* */
__u32 arch __attribute__((__aligned__(sizeof(__u32))));
__u64 instruction_pointer;
__u64 stack_pointer;
union {
struct {
__u64 nr;
__u64 args[6];
} entry;
struct {
__s64 rval;
__u8 is_error;
} exit;
struct {
__u64 nr;
__u64 args[6];
__u32 ret_data;
} seccomp;
};
};
The structure was chosen according to [2], except for the following
changes:
* seccomp substructure was added as a superset of entry substructure;
* the type of nr field was changed from int to __u64 because syscall
numbers are, as a practical matter, 64 bits;
* stack_pointer field was added along with instruction_pointer field
since it is readily available and can save the tracer from extra
PTRACE_GETREGS/PTRACE_GETREGSET calls;
* arch is always initialized to aid with tracing system calls
such as execve();
* instruction_pointer and stack_pointer are always initialized
so they could be easily obtained for non-syscall stops;
* a boolean is_error field was added along with the rval field so that
the tracer can more reliably distinguish a return value
from an error value.
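To illustrate how a tracer might consume this structure, here is a minimal sketch that dispatches on the op field and uses is_error instead of guessing from the sign of rval. The struct and the INFO_* constants are local mirrors written for this example (real code would take struct ptrace_syscall_info and PTRACE_SYSCALL_INFO_* from <linux/ptrace.h>), describe_stop() is a hypothetical helper, and the sketch assumes that rval carries a negative errno value when is_error is set.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>	/* strcmp() in the usage shown below */

/* Local mirror of the PTRACE_SYSCALL_INFO_* op values (assumption: 0..3). */
enum { INFO_NONE = 0, INFO_ENTRY = 1, INFO_EXIT = 2, INFO_SECCOMP = 3 };

/* Local mirror of struct ptrace_syscall_info as described above. */
struct syscall_info_mirror {
	uint8_t op;
	uint32_t arch __attribute__((__aligned__(sizeof(uint32_t))));
	uint64_t instruction_pointer;
	uint64_t stack_pointer;
	union {
		struct { uint64_t nr; uint64_t args[6]; } entry;
		struct { int64_t rval; uint8_t is_error; } exit;
		struct { uint64_t nr; uint64_t args[6]; uint32_t ret_data; } seccomp;
	};
};

/* Hypothetical helper: render one stop as text, keyed on the op field. */
static int describe_stop(const struct syscall_info_mirror *si,
			 char *buf, size_t len)
{
	assert(si != NULL && buf != NULL);

	switch (si->op) {
	case INFO_ENTRY:
		return snprintf(buf, len, "enter nr=%" PRIu64 " arg0=%#" PRIx64,
				si->entry.nr, si->entry.args[0]);
	case INFO_EXIT:
		/* is_error tells an error apart from a large return value. */
		if (si->exit.is_error)
			return snprintf(buf, len, "exit error=%" PRId64,
					-si->exit.rval);
		return snprintf(buf, len, "exit rval=%" PRId64, si->exit.rval);
	case INFO_SECCOMP:
		return snprintf(buf, len, "seccomp nr=%" PRIu64 " ret_data=%" PRIu32,
				si->seccomp.nr, si->seccomp.ret_data);
	default:
		/* Not a syscall stop: only the common fields are meaningful. */
		return snprintf(buf, len, "none ip=%#" PRIx64,
				si->instruction_pointer);
	}
}
```

With op set to INFO_EXIT, rval set to -2 and is_error set to 1, the helper produces "exit error=2"; the tracer never has to remember whether the previous stop was an entry or an exit.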
strace has been ported to PTRACE_GET_SYSCALL_INFO.
Starting with release 4.26, strace uses PTRACE_GET_SYSCALL_INFO API
as the preferred mechanism of obtaining syscall information.
[1] https://lore.kernel.org/lkml/CA+55aFzcSVmdDj9Lh_gdbz1OzHyEm6ZrGPBDAJnywm2LF…
[2] https://lore.kernel.org/lkml/CAObL_7GM0n80N7J_DFw_eQyfLyzq+sf4y2AvsCCV88Tb3…
---
Notes:
v8:
* Moved syscall_get_arch() specific patches to a separate patchset
which is now merged into audit/next tree.
* Rebased to linux-next.
* Moved ptrace_get_syscall_info code under #ifdef CONFIG_HAVE_ARCH_TRACEHOOK,
narrowing down the set of architectures supported by this implementation
back to those 19 that enable CONFIG_HAVE_ARCH_TRACEHOOK because
I failed to get all syscall_get_*(), instruction_pointer(),
and user_stack_pointer() functions implemented on some niche
architectures. This leaves the following architectures out:
alpha, h8300, m68k, microblaze, and unicore32.
v7:
* Rebased to v5.0-rc1.
* 5 arch-specific preparatory patches out of 25 have been merged
into v5.0-rc1 via arch trees.
v6:
* Add syscall_get_arguments and syscall_set_arguments wrappers
to asm-generic/syscall.h, requested by Geert.
* Change PTRACE_GET_SYSCALL_INFO return code: do not take trailing paddings
into account, use the end of the last field of the structure being written.
* Change struct ptrace_syscall_info:
* remove .frame_pointer field, it is not needed and not portable;
* make .arch field explicitly aligned, remove no longer needed
padding before .arch field;
* remove trailing pads, they are no longer needed.
v5:
* Merge separate series and patches into the single series.
* Change PTRACE_EVENTMSG_SYSCALL_{ENTRY,EXIT} values as requested by Oleg.
* Change struct ptrace_syscall_info: generalize instruction_pointer,
stack_pointer, and frame_pointer fields by moving them from
ptrace_syscall_info.{entry,seccomp} substructures to ptrace_syscall_info
and initializing them for all stops.
* Add PTRACE_SYSCALL_INFO_NONE, set it when not in a syscall stop,
so e.g. "strace -i" could use PTRACE_GET_SYSCALL_INFO to obtain
instruction_pointer when the tracee is in a signal stop.
* Patch all remaining architectures to provide all necessary
syscall_get_* functions.
* Make available for all architectures: do not conditionalize on
CONFIG_HAVE_ARCH_TRACEHOOK since all syscall_get_* functions
are implemented on all architectures.
* Add a test for PTRACE_GET_SYSCALL_INFO to selftests/ptrace.
v4:
* Do not introduce task_struct.ptrace_event,
use child->last_siginfo->si_code instead.
* Implement PTRACE_SYSCALL_INFO_SECCOMP and ptrace_syscall_info.seccomp
support along with PTRACE_SYSCALL_INFO_{ENTRY,EXIT} and
ptrace_syscall_info.{entry,exit}.
v3:
* Change struct ptrace_syscall_info.
* Support PTRACE_EVENT_SECCOMP by adding ptrace_event to task_struct.
* Add proper defines for ptrace_syscall_info.op values.
* Rename PT_SYSCALL_IS_ENTERING and PT_SYSCALL_IS_EXITING to
PTRACE_EVENTMSG_SYSCALL_ENTRY and PTRACE_EVENTMSG_SYSCALL_EXIT
and move them to uapi.
v2:
* Do not use task->ptrace.
* Replace entry_info.is_compat with entry_info.arch, use syscall_get_arch().
* Use addr argument of sys_ptrace to get expected size of the struct;
return full size of the struct.
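The size-related notes above (v2: the tracer passes its expected size in addr; v6: the returned size ends at the last field written, ignoring trailing padding) can be made concrete with a small offset calculation. This is only a sketch: the struct is a local mirror of the layout shown earlier, END_OF() and bytes_written() are hypothetical names, and the numeric op values 1, 2 and 3 are assumed to stand for entry, exit and seccomp stops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct ptrace_syscall_info (see the layout above). */
struct syscall_info_mirror {
	uint8_t op;
	uint32_t arch __attribute__((__aligned__(sizeof(uint32_t))));
	uint64_t instruction_pointer;
	uint64_t stack_pointer;
	union {
		struct { uint64_t nr; uint64_t args[6]; } entry;
		struct { int64_t rval; uint8_t is_error; } exit;
		struct { uint64_t nr; uint64_t args[6]; uint32_t ret_data; } seccomp;
	};
};

/* Offset of the first byte after a member: its offset plus its size. */
#define END_OF(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Hypothetical helper: how many bytes are meaningful for a given op. */
static size_t bytes_written(uint8_t op)
{
	switch (op) {
	case 1:	/* entry */
		return END_OF(struct syscall_info_mirror, entry.args);
	case 2:	/* exit */
		return END_OF(struct syscall_info_mirror, exit.is_error);
	case 3:	/* seccomp */
		return END_OF(struct syscall_info_mirror, seccomp.ret_data);
	default: /* none: only the common header fields */
		return END_OF(struct syscall_info_mirror, stack_pointer);
	}
}
```

On common ABIs the exit variant ends at byte 33 while sizeof() of the whole structure is 88, which is why counting trailing padding would overstate how much data is valid.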
Dmitry V. Levin (6):
nds32: fix asm/syscall.h
hexagon: define syscall_get_error() and syscall_get_return_value()
mips: define syscall_get_error()
parisc: define syscall_get_error()
powerpc: define syscall_get_error()
selftests/ptrace: add a test case for PTRACE_GET_SYSCALL_INFO
Elvira Khabirova (1):
ptrace: add PTRACE_GET_SYSCALL_INFO request
arch/hexagon/include/asm/syscall.h | 14 +
arch/mips/include/asm/syscall.h | 6 +
arch/nds32/include/asm/syscall.h | 29 +-
arch/parisc/include/asm/syscall.h | 7 +
arch/powerpc/include/asm/syscall.h | 10 +
include/linux/tracehook.h | 9 +-
include/uapi/linux/ptrace.h | 35 +++
kernel/ptrace.c | 103 ++++++-
tools/testing/selftests/ptrace/.gitignore | 1 +
tools/testing/selftests/ptrace/Makefile | 2 +-
.../selftests/ptrace/get_syscall_info.c | 271 ++++++++++++++++++
11 files changed, 471 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/selftests/ptrace/get_syscall_info.c
--
ldv
Not all compilers have __builtin_bswap16() and __builtin_bswap32(),
thus not all compilers are able to compile the following code:
(__builtin_constant_p(x) ? \
___constant_swab16(x) : __builtin_bswap16(x))
That's the reason why bpf_ntohs() doesn't work on GCC < 4.8, for
instance:
error: implicit declaration of function '__builtin_bswap16'
We can use __builtin_bswap16() only if the compiler has this built-in,
that is, only if __HAVE_BUILTIN_BSWAP16__ is defined. Standard UAPI
__swab16()/__swab32() take care of that, and, additionally, handle
__builtin_constant_p() cases as well:
#ifdef __HAVE_BUILTIN_BSWAP16__
#define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
#else
#define __swab16(x) \
(__builtin_constant_p((__u16)(x)) ? \
___constant_swab16(x) : \
__fswab16(x))
#endif
So we can tweak selftests/bpf/bpf_endian.h and use UAPI
__swab16()/__swab32().
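The pattern described above (use the built-in when it exists, otherwise fall back to a plain expression that also works for constants) can be sketched in standalone C as follows. The names CONST_SWAB16, CONST_SWAB32, swab16_portable and swab32_portable are made up for this illustration, and __has_builtin is used here as a stand-in for the kernel's __HAVE_BUILTIN_BSWAP16__ check; the real UAPI macros live in linux/swab.h.

```c
#include <assert.h>
#include <stdint.h>

/* Pure expressions: usable in constant initializers, no built-in needed. */
#define CONST_SWAB16(x) ((uint16_t)( \
	(((uint16_t)(x) & 0x00ffU) << 8) | \
	(((uint16_t)(x) & 0xff00U) >> 8)))

#define CONST_SWAB32(x) ((uint32_t)( \
	(((uint32_t)(x) & 0x000000ffUL) << 24) | \
	(((uint32_t)(x) & 0x0000ff00UL) <<  8) | \
	(((uint32_t)(x) & 0x00ff0000UL) >>  8) | \
	(((uint32_t)(x) & 0xff000000UL) >> 24)))

/* Compilers without __has_builtin are treated as having no built-ins. */
#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

static inline uint16_t swab16_portable(uint16_t x)
{
#if __has_builtin(__builtin_bswap16)
	return __builtin_bswap16(x);
#else
	return CONST_SWAB16(x);	/* fallback when the built-in is missing */
#endif
}

static inline uint32_t swab32_portable(uint32_t x)
{
#if __has_builtin(__builtin_bswap32)
	return __builtin_bswap32(x);
#else
	return CONST_SWAB32(x);
#endif
}

/* The constant form can initialize static data, e.g. a port number. */
static const uint16_t port_8080_swapped = CONST_SWAB16(0x1f90);
```

Either path gives the same result, so callers do not need to know which one the compiler selected.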
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky(a)gmail.com>
---
v2: fixed build error, reshuffled patches (Stanislav Fomichev)
tools/testing/selftests/bpf/bpf_endian.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/bpf/bpf_endian.h b/tools/testing/selftests/bpf/bpf_endian.h
index b25595ea4a78..1ed268b2002b 100644
--- a/tools/testing/selftests/bpf/bpf_endian.h
+++ b/tools/testing/selftests/bpf/bpf_endian.h
@@ -20,12 +20,12 @@
* use different targets.
*/
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
-# define __bpf_ntohs(x) __builtin_bswap16(x)
-# define __bpf_htons(x) __builtin_bswap16(x)
+# define __bpf_ntohs(x) __swab16(x)
+# define __bpf_htons(x) __swab16(x)
# define __bpf_constant_ntohs(x) ___constant_swab16(x)
# define __bpf_constant_htons(x) ___constant_swab16(x)
-# define __bpf_ntohl(x) __builtin_bswap32(x)
-# define __bpf_htonl(x) __builtin_bswap32(x)
+# define __bpf_ntohl(x) __swab32(x)
+# define __bpf_htonl(x) __swab32(x)
# define __bpf_constant_ntohl(x) ___constant_swab32(x)
# define __bpf_constant_htonl(x) ___constant_swab32(x)
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
--
2.21.0
After some experiments I found that urandom_read does not need to be
linked statically. When the 'read' syscall is moved to a separate
non-inlined function, bpf_get_stackid() is able to find
the executable in the stack trace and extract its build_id from it.
Signed-off-by: Ivan Vecera <ivecera(a)redhat.com>
---
tools/testing/selftests/bpf/Makefile | 2 +-
tools/testing/selftests/bpf/urandom_read.c | 15 +++++++++++----
2 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 2aed37ea61a4..c33900a8fec0 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -69,7 +69,7 @@ TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read
all: $(TEST_CUSTOM_PROGS)
$(OUTPUT)/urandom_read: $(OUTPUT)/%: %.c
- $(CC) -o $@ -static $< -Wl,--build-id
+ $(CC) -o $@ $< -Wl,--build-id
BPFOBJ := $(OUTPUT)/libbpf.a
diff --git a/tools/testing/selftests/bpf/urandom_read.c b/tools/testing/selftests/bpf/urandom_read.c
index 9de8b7cb4e6d..db781052758d 100644
--- a/tools/testing/selftests/bpf/urandom_read.c
+++ b/tools/testing/selftests/bpf/urandom_read.c
@@ -7,11 +7,19 @@
#define BUF_SIZE 256
+static __attribute__((noinline))
+void urandom_read(int fd, int count)
+{
+ char buf[BUF_SIZE];
+ int i;
+
+ for (i = 0; i < count; ++i)
+ read(fd, buf, BUF_SIZE);
+}
+
int main(int argc, char *argv[])
{
int fd = open("/dev/urandom", O_RDONLY);
- int i;
- char buf[BUF_SIZE];
int count = 4;
if (fd < 0)
@@ -20,8 +28,7 @@ int main(int argc, char *argv[])
if (argc == 2)
count = atoi(argv[1]);
- for (i = 0; i < count; ++i)
- read(fd, buf, BUF_SIZE);
+ urandom_read(fd, count);
close(fd);
return 0;
--
2.19.2
=== Overview
arm64 has a feature called Top Byte Ignore, which allows embedding pointer
tags into the top byte of each pointer. Userspace programs (such as
HWASan, a memory debugging tool [1]) might use this feature and pass
tagged user pointers to the kernel through syscalls or other interfaces.
Right now the kernel is already able to handle user faults with tagged
pointers, due to these patches:
1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a
tagged pointer")
2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged
pointers")
3. 276e9327 ("arm64: entry: improve data abort handling of tagged
pointers")
This patchset extends tagged pointer support to syscall arguments.
As per the proposed ABI change [3], tagged pointers are only allowed to be
passed to syscalls when they point to memory ranges obtained by anonymous
mmap() or sbrk() (see the patchset [3] for more details).
For non-memory syscalls this is done by untagging user pointers when the
kernel performs pointer checking to find out whether the pointer comes
from userspace (most notably in access_ok). The untagging is done only
when the pointer is being checked; the tag is preserved as the pointer
makes its way through the kernel and stays tagged when the kernel
dereferences the pointer when performing user memory accesses.
Memory syscalls (mmap, mprotect, etc.) don't do user memory accesses but
rather deal with memory ranges, and untagged pointers are better suited to
describe memory ranges internally. Thus for memory syscalls we untag
pointers completely when they enter the kernel.
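As a rough userspace illustration of the "untag only for the check" idea, the sketch below places a tag in the top byte of a 64-bit address and strips it again before a range check. All names (set_tag, get_tag, untag, range_ok) and the 48-bit limit are hypothetical, and untagging by sign-extending from bit 55 is an assumption about one reasonable definition, not a quote of the kernel macro.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_SHIFT 56
#define TAG_MASK  ((uint64_t)0xff << TAG_SHIFT)

/* Store a tag in the top byte, leaving the low 56 bits untouched. */
static uint64_t set_tag(uint64_t addr, uint8_t tag)
{
	return (addr & ~TAG_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

static uint8_t get_tag(uint64_t addr)
{
	return (uint8_t)(addr >> TAG_SHIFT);
}

/*
 * Assumed untagging rule: copy bit 55 into the top byte, so user
 * addresses get 0x00 there and kernel-style addresses keep 0xff.
 */
static uint64_t untag(uint64_t addr)
{
	if (addr & ((uint64_t)1 << 55))
		return addr | TAG_MASK;
	return addr & ~TAG_MASK;
}

/*
 * Hypothetical stand-in for an access_ok-style check: the comparison
 * uses the untagged value, while the caller keeps the tagged pointer.
 */
static int range_ok(uint64_t addr, uint64_t size, uint64_t limit)
{
	uint64_t plain = untag(addr);

	return plain < limit && size <= limit - plain;
}
```

A tagged address such as set_tag(0x00007ffff7a00000, 0x42) passes range_ok() against a 48-bit limit, whereas comparing the tagged value directly against the same limit would reject it.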
=== Other approaches
One of the alternative approaches to untagging that was considered is to
completely strip the pointer tag as the pointer enters the kernel with
some kind of a syscall wrapper, but that won't work with the countless
number of different ioctl calls. With this approach we would need a custom
wrapper for each ioctl variation, which doesn't seem practical.
An alternative approach to untagging pointers in memory syscall prologues
is to instead allow tagged pointers to be passed to find_vma() (and other
vma related functions) and untag them there. Unfortunately, a lot of
find_vma() callers then compare or subtract the returned vma start and end
fields against the pointer that was being searched. Thus this approach
would still require changing all find_vma() callers.
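The find_vma() problem can be shown with a toy model. In the sketch below, mock_find_vma() is a hypothetical lookup that strips the tag internally, and caller_sees_mapped() mimics the common caller pattern of comparing the original address against the returned bounds; struct mock_vma and both functions are inventions for this example, and the tag is cleared with a simple mask, which is only valid for userspace-style addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAG_MASK ((uint64_t)0xff << 56)

/* Toy stand-in for a memory area: [vm_start, vm_end). */
struct mock_vma {
	uint64_t vm_start;
	uint64_t vm_end;
};

/*
 * Hypothetical tag-tolerant lookup: strips the tag, then returns the
 * first area that ends above the address.
 */
static const struct mock_vma *mock_find_vma(const struct mock_vma *areas,
					    size_t n, uint64_t addr)
{
	uint64_t plain = addr & ~TAG_MASK;

	for (size_t i = 0; i < n; i++)
		if (plain < areas[i].vm_end)
			return &areas[i];
	return NULL;
}

/* Typical caller: re-checks the address it searched for against the area. */
static int caller_sees_mapped(const struct mock_vma *areas, size_t n,
			      uint64_t addr)
{
	const struct mock_vma *vma = mock_find_vma(areas, n, addr);

	return vma && vma->vm_start <= addr && addr < vma->vm_end;
}
```

Even though the lookup itself copes with the tag, the caller's own comparison still uses the tagged value and fails, so every such caller would need to be changed anyway.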
=== Testing
The following testing approaches have been taken to find potential issues
with user pointer untagging:
1. Static testing (with sparse [2] and separately with a custom static
analyzer based on Clang) to track casts of __user pointers to integer
types to find places where untagging needs to be done.
2. Static testing with grep to find parts of the kernel that call
find_vma() (and other similar functions) or directly compare against
vm_start/vm_end fields of vma.
3. Static testing with grep to find parts of the kernel that compare
user pointers with TASK_SIZE or other similar consts and macros.
4. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running
a modified syzkaller version that passes tagged pointers to the kernel.
Based on the results of the testing the required patches have been added
to the patchset.
=== Notes
This patchset is meant to be merged together with "arm64 relaxed ABI" [3].
This patchset is a prerequisite for ARM's memory tagging hardware feature
support [4].
This patchset has been merged into the Pixel 2 kernel tree and is now
being used to enable testing of Pixel 2 phones with HWASan.
Thanks!
[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
[2] https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060…
[3] https://lkml.org/lkml/2019/3/18/819
[4] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architectur…
Changes in v13:
- Simplified untagging in tcp_zerocopy_receive().
- Looked at find_vma() callers in drivers/, which allowed identifying a
few other places where untagging is needed.
- Added patch "mm, arm64: untag user pointers in get_vaddr_frames".
- Added patch "drm/amdgpu, arm64: untag user pointers in
amdgpu_ttm_tt_get_user_pages".
- Added patch "drm/radeon, arm64: untag user pointers in
radeon_ttm_tt_pin_userptr".
- Added patch "IB/mlx4, arm64: untag user pointers in mlx4_get_umem_mr".
- Added patch "media/v4l2-core, arm64: untag user pointers in
videobuf_dma_contig_user_get".
- Added patch "tee/optee, arm64: untag user pointers in check_mem_type".
- Added patch "vfio/type1, arm64: untag user pointers".
Changes in v12:
- Changed untagging in tcp_zerocopy_receive() to also untag zc->address.
- Fixed untagging in prctl_set_mm* to only untag pointers for vma lookups
and validity checks, but leave them as is for actual user space accesses.
- Updated the link to the v2 of the "arm64 relaxed ABI" patchset [3].
- Dropped the documentation patch, as the "arm64 relaxed ABI" patchset [3]
handles that.
Changes in v11:
- Added "uprobes, arm64: untag user pointers in find_active_uprobe" patch.
- Added "bpf, arm64: untag user pointers in stack_map_get_build_id_offset"
patch.
- Fixed "tracing, arm64: untag user pointers in seq_print_user_ip" to
correctly perform subtraction with a tagged addr.
- Moved untagged_addr() from SYSCALL_DEFINE3(mprotect) and
SYSCALL_DEFINE4(pkey_mprotect) to do_mprotect_pkey().
- Moved untagged_addr() definition for other arches from
include/linux/memory.h to include/linux/mm.h.
- Changed untagging in strn*_user() to perform userspace accesses through
tagged pointers.
- Updated the documentation to mention that passing tagged pointers to
memory syscalls is allowed.
- Updated the test to use malloc'ed memory instead of stack memory.
Changes in v10:
- Added "mm, arm64: untag user pointers passed to memory syscalls" back.
- New patch "fs, arm64: untag user pointers in fs/userfaultfd.c".
- New patch "net, arm64: untag user pointers in tcp_zerocopy_receive".
- New patch "kernel, arm64: untag user pointers in prctl_set_mm*".
- New patch "tracing, arm64: untag user pointers in seq_print_user_ip".
Changes in v9:
- Rebased onto 4.20-rc6.
- Used u64 instead of __u64 in type casts in the untagged_addr macro for
arm64.
- Added braces around (addr) in the untagged_addr macro for other arches.
Changes in v8:
- Rebased onto 65102238 (4.20-rc1).
- Added a note to the cover letter on why syscall wrappers/shims that untag
user pointers won't work.
- Added a note to the cover letter that this patchset has been merged into
the Pixel 2 kernel tree.
- Documentation fixes, in particular added a list of syscalls that don't
support tagged user pointers.
Changes in v7:
- Rebased onto 17b57b18 (4.19-rc6).
- Dropped the "arm64: untag user address in __do_user_fault" patch, since
the existing patches already handle user faults properly.
- Dropped the "usb, arm64: untag user addresses in devio" patch, since the
passed pointer must come from a vma and therefore be untagged.
- Dropped the "arm64: annotate user pointers casts detected by sparse"
patch (see the discussion to the replies of the v6 of this patchset).
- Added more context to the cover letter.
- Updated Documentation/arm64/tagged-pointers.txt.
Changes in v6:
- Added annotations for user pointer casts found by sparse.
- Rebased onto 050cdc6c (4.19-rc1+).
Changes in v5:
- Added 3 new patches that add untagging to places found with static
analysis.
- Rebased onto 44c929e1 (4.18-rc8).
Changes in v4:
- Added a selftest for checking that passing tagged pointers to the
kernel succeeds.
- Rebased onto 81e97f013 (4.18-rc1+).
Changes in v3:
- Rebased onto e5c51f30 (4.17-rc6+).
- Added linux-arch@ to the list of recipients.
Changes in v2:
- Rebased onto 2d618bdf (4.17-rc3+).
- Removed excessive untagging in gup.c.
- Removed untagging pointers returned from __uaccess_mask_ptr.
Changes in v1:
- Rebased onto 4.17-rc1.
Changes in RFC v2:
- Added "#ifndef untagged_addr..." fallback in linux/uaccess.h instead of
defining it for each arch individually.
- Updated Documentation/arm64/tagged-pointers.txt.
- Dropped "mm, arm64: untag user addresses in memory syscalls".
- Rebased onto 3eb2ce82 (4.16-rc7).
Signed-off-by: Andrey Konovalov <andreyknvl(a)google.com>
Andrey Konovalov (20):
uaccess: add untagged_addr definition for other arches
arm64: untag user pointers in access_ok and __uaccess_mask_ptr
lib, arm64: untag user pointers in strn*_user
mm, arm64: untag user pointers passed to memory syscalls
mm, arm64: untag user pointers in mm/gup.c
mm, arm64: untag user pointers in get_vaddr_frames
fs, arm64: untag user pointers in copy_mount_options
fs, arm64: untag user pointers in fs/userfaultfd.c
net, arm64: untag user pointers in tcp_zerocopy_receive
kernel, arm64: untag user pointers in prctl_set_mm*
tracing, arm64: untag user pointers in seq_print_user_ip
uprobes, arm64: untag user pointers in find_active_uprobe
bpf, arm64: untag user pointers in stack_map_get_build_id_offset
drm/amdgpu, arm64: untag user pointers in amdgpu_ttm_tt_get_user_pages
drm/radeon, arm64: untag user pointers in radeon_ttm_tt_pin_userptr
IB/mlx4, arm64: untag user pointers in mlx4_get_umem_mr
media/v4l2-core, arm64: untag user pointers in
videobuf_dma_contig_user_get
tee/optee, arm64: untag user pointers in check_mem_type
vfio/type1, arm64: untag user pointers in vaddr_get_pfn
selftests, arm64: add a selftest for passing tagged pointers to kernel
arch/arm64/include/asm/uaccess.h | 10 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 5 ++-
drivers/gpu/drm/radeon/radeon_ttm.c | 5 ++-
drivers/infiniband/hw/mlx4/mr.c | 7 +--
drivers/media/v4l2-core/videobuf-dma-contig.c | 9 ++--
drivers/tee/optee/call.c | 1 +
drivers/vfio/vfio_iommu_type1.c | 2 +
fs/namespace.c | 2 +-
fs/userfaultfd.c | 5 +++
include/linux/mm.h | 4 ++
ipc/shm.c | 2 +
kernel/bpf/stackmap.c | 6 ++-
kernel/events/uprobes.c | 2 +
kernel/sys.c | 44 +++++++++++++------
kernel/trace/trace_output.c | 5 ++-
lib/strncpy_from_user.c | 3 +-
lib/strnlen_user.c | 3 +-
mm/frame_vector.c | 2 +
mm/gup.c | 4 ++
mm/madvise.c | 2 +
mm/mempolicy.c | 5 +++
mm/migrate.c | 1 +
mm/mincore.c | 2 +
mm/mlock.c | 5 +++
mm/mmap.c | 7 +++
mm/mprotect.c | 1 +
mm/mremap.c | 2 +
mm/msync.c | 2 +
net/ipv4/tcp.c | 2 +
tools/testing/selftests/arm64/.gitignore | 1 +
tools/testing/selftests/arm64/Makefile | 11 +++++
.../testing/selftests/arm64/run_tags_test.sh | 12 +++++
tools/testing/selftests/arm64/tags_test.c | 21 +++++++++
33 files changed, 159 insertions(+), 36 deletions(-)
create mode 100644 tools/testing/selftests/arm64/.gitignore
create mode 100644 tools/testing/selftests/arm64/Makefile
create mode 100755 tools/testing/selftests/arm64/run_tags_test.sh
create mode 100644 tools/testing/selftests/arm64/tags_test.c
--
2.21.0.225.g810b269d1ac-goog
Hi Mimi,
Thank you for the pointer about IMA testing.
Probably I should cc list as well since we are talking about the patch
itself. For the ima test itself I could still ask for help in a private
email thread.
On 03/18/19 at 02:09pm, Mimi Zohar wrote:
> On Mon, 2019-03-18 at 22:06 +0800, Dave Young wrote:
> > Hi Mimi,
> >
> > On 03/14/19 at 02:41pm, Mimi Zohar wrote:
> > > The kernel may be configured or an IMA policy specified on the boot
> > > command line requiring the kexec kernel image signature to be verified.
> > > At runtime a custom IMA policy may be loaded, replacing the policy
> > > specified on the boot command line. In addition, the arch specific
> > > policy rules are dynamically defined based on the secure boot mode that
> > > may require the kernel image signature to be verified.
> > >
> > > The kernel image may have a PE signature, an IMA signature, or both. In
> > > addition, there are two kexec syscalls - kexec_load and kexec_file_load
> > > - but only the kexec_file_load syscall can verify signatures.
> > >
> > > These kexec selftests verify that only properly signed kernel images are
> > > loaded as required, based on the kernel config, the secure boot mode,
> > > and the IMA runtime policy.
> > >
> > > Loading a kernel image or kernel module requires root privileges. To
> > > run just the KEXEC selftests: sudo make TARGETS=kexec kselftest
> > >
> > > Changelog v4:
> > > - Moved the kexec tests to selftests/kexec, as requested by Dave Young.
> > > - Removed the kernel module selftest from this patch set.
> > > - Rewritten cover letter, removing reference to kernel modules.
> > >
> > > Changelog v3:
> > > - Updated tests based on Petr's review, including the defining a common
> > > test to check for root privileges.
> > > - Modified config, removing the CONFIG_KEXEC_VERIFY_SIG requirement.
> > > - Updated the SPDX license to GPL-2.0 based on Shuah's review.
> > > - Updated the secureboot mode test to check the SetupMode as well, based
> > > on David Young's review.
> > >
> > >
> > I was trying to review the patches although I'm slow due to something
> > else.
> >
> > But I still did not setup a IMA testable system, need check your old
> > email about how to setup it.
>
> (The ima-evm-utils package contains a README with directions.)
>
> >
> > A quick testing gives me below results
> >
> > /* test #1, my default kconfig
> > # NO CONFIG_INTEGRITY compiled in
> > */
> >
> > make[1]: Nothing to be done for 'all'.
> > make[1]: Leaving directory '/home/dyoung/git/github/linux/tools/testing/selftests/kexec'
> > make[1]: Entering directory '/home/dyoung/git/github/linux/tools/testing/selftests/kexec'
> > TAP version 13
> > selftests: kexec: test_kexec_load.sh
> > ========================================
> > selftests: kexec: test_kexec_load.sh: Warning: file
> > test_kexec_load.sh is not executable, correct this.
> > not ok 1..1 selftests: kexec: test_kexec_load.sh [FAIL]
>
> That's really weird. Both before and after applying these patches
> test_kexec_load.sh is executable (stable linux-5.0.y). Could
> something else be preventing it from executing?
>
> > selftests: kexec: test_kexec_file_load.sh
> > ========================================
> > [INFO] kexec_file_load is enabled
> > [INFO] secure boot mode not enabled
> > [INFO] kexec kernel image PE signed
> > [INFO] kexec kernel image not IMA signed
> > kexec_file_load succeeded (possibly missing IMA sig) [FAIL]
> > not ok 1..2 selftests: kexec: test_kexec_file_load.sh [FAIL]
> > make[1]: Leaving directory '/home/dyoung/git/github/linux/tools/testing/selftests/kexec'
> > make: Leaving directory '/home/dyoung/git/github/linux/tools/testing/selftests'
>
> This message is because neither CONFIG_KEXEC_BZIMAGE_VERIFY_SIG or an
> IMA signature is required. It couldn't read the IMA runtime policy
> rules to determine if an IMA signature is required. So, it's trying
> to provide a hint as to what happened.
>
> I'll update the test to see if CONFIG_IMA_APPRAISE is enabled, before
> emitting this message.
>
> >
> > /* test #2, enabled IMA kconfigs, simply test without other ima
> > setup eg. use a policy etc. need to follow up some guide to test the
> > ima functionality (TODO..)
> > */
> >
> >
> > [root@dhcp-128-65 linux-x86]# make -C tools/testing/selftests TARGETS=kexec run_tests
> > make: Entering directory '/home/dyoung/git/github/linux/tools/testing/selftests'
> > make[1]: Entering directory '/home/dyoung/git/github/linux/tools/testing/selftests/kexec'
> > make[1]: Nothing to be done for 'all'.
> > make[1]: Leaving directory '/home/dyoung/git/github/linux/tools/testing/selftests/kexec'
> > make[1]: Entering directory '/home/dyoung/git/github/linux/tools/testing/selftests/kexec'
> > TAP version 13
> > selftests: kexec: test_kexec_load.sh
> > ========================================
> > selftests: kexec: test_kexec_load.sh: Warning: file test_kexec_load.sh is not executable, correct this.
> > not ok 1..1 selftests: kexec: test_kexec_load.sh [FAIL]
> > selftests: kexec: test_kexec_file_load.sh
> > ========================================
> > [INFO] kexec_file_load is enabled
> > [INFO] reading IMA policy permitted
> > [INFO] secure boot mode not enabled
> > No signature verification required
> > not ok 1..2 selftests: kexec: test_kexec_file_load.sh [SKIP]
> > make[1]: Leaving directory '/home/dyoung/git/github/linux/tools/testing/selftests/kexec'
> > make: Leaving directory '/home/dyoung/git/github/linux/tools/testing/selftests'
>
> The purpose of these tests was to coordinate kernel image signature
> verification.
>
> If you require a PE signature, load an IMA policy requiring an IMA
> signature, or even enable CONFIG_IMA_ARCH_POLICY, the test would
> require some form of signature verification.
Did a test with a embedded ima key in kernel, with secure boot disabled,
but with Secure Boot enabled, but failed to sign the kernel with both
pesign and evmctl, will continue to see how to work on it and ask in
private email if needed :)
About the patch itself, as we talked in another email, I would expect it
can work with other test cases eg. without IMA/secure boot. But if that
is not easy, maybe you can change the test script filename to something
like: test_kexec_load_sigcheck.sh and test_kexec_file_load_sigcheck.sh
then we can add other non-sigcheck related cases to other test scripts
later. But ideally if we can handle them in current files it would be
better.
Another issue I noticed is that even when booting with ima_appraise=off,
kexec load still checks the conditions. I will see whether I got something
wrong in my test steps.
Thanks
Dave
=== Overview
arm64 has a feature called Top Byte Ignore, which allows embedding pointer
tags into the top byte of each pointer. Userspace programs (such as
HWASan, a memory debugging tool [1]) might use this feature and pass
tagged user pointers to the kernel through syscalls or other interfaces.
Right now the kernel is already able to handle user faults with tagged
pointers, due to these patches:
1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a
tagged pointer")
2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged
pointers")
3. 276e9327 ("arm64: entry: improve data abort handling of tagged
pointers")
This patchset extends tagged pointer support to syscall arguments.
As per the proposed ABI change [3], tagged pointers are only allowed to be
passed to syscalls when they point to memory ranges obtained by anonymous
mmap() or sbrk() (see the patchset [3] for more details).
For non-memory syscalls this is done by untagging user pointers when the
kernel performs pointer checking to find out whether the pointer comes
from userspace (most notably in access_ok). The untagging is done only
when the pointer is being checked; the tag is preserved as the pointer
makes its way through the kernel and stays tagged when the kernel
dereferences the pointer when performing user memory accesses.
Memory syscalls (mmap, mprotect, etc.) don't do user memory accesses but
rather deal with memory ranges, and untagged pointers are better suited to
describe memory ranges internally. Thus for memory syscalls we untag
pointers completely when they enter the kernel.
=== Other approaches
One of the alternative approaches to untagging that was considered is to
completely strip the pointer tag as the pointer enters the kernel with
some kind of a syscall wrapper, but that won't work with the countless
number of different ioctl calls. With this approach we would need a custom
wrapper for each ioctl variation, which doesn't seem practical.
An alternative approach to untagging pointers in memory syscall prologues
is to instead allow tagged pointers to be passed to find_vma() (and other
vma related functions) and untag them there. Unfortunately, a lot of
find_vma() callers then compare or subtract the returned vma start and end
fields against the pointer that was being searched. Thus this approach
would still require changing all find_vma() callers.
=== Testing
The following testing approaches have been taken to find potential issues
with user pointer untagging:
1. Static testing (with sparse [2] and separately with a custom static
analyzer based on Clang) to track casts of __user pointers to integer
types to find places where untagging needs to be done.
2. Static testing with grep to find parts of the kernel that call
find_vma() (and other similar functions) or directly compare against
vm_start/vm_end fields of vma.
3. Static testing with grep to find parts of the kernel that compare
user pointers with TASK_SIZE or other similar consts and macros.
4. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running
a modified syzkaller version that passes tagged pointers to the kernel.
Based on the results of the testing the required patches have been added
to the patchset.
=== Notes
This patchset is meant to be merged together with "arm64 relaxed ABI" [3].
This patchset is a prerequisite for ARM's memory tagging hardware feature
support [4].
This patchset has been merged into the Pixel 2 kernel tree and is now
being used to enable testing of Pixel 2 phones with HWASan.
Thanks!
[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
[2] https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060…
[3] https://lkml.org/lkml/2019/3/18/819
[4] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architectur…
Changes in v12:
- Changed untagging in tcp_zerocopy_receive() to also untag zc->address.
- Fixed untagging in prctl_set_mm* to only untag pointers for vma lookups
and validity checks, but leave them as is for actual user space accesses.
- Updated the link to the v2 of the "arm64 relaxed ABI" patchset [3].
- Dropped the documentation patch, as the "arm64 relaxed ABI" patchset [3]
handles that.
Changes in v11:
- Added "uprobes, arm64: untag user pointers in find_active_uprobe" patch.
- Added "bpf, arm64: untag user pointers in stack_map_get_build_id_offset"
patch.
- Fixed "tracing, arm64: untag user pointers in seq_print_user_ip" to
correctly perform subtraction with a tagged addr.
- Moved untagged_addr() from SYSCALL_DEFINE3(mprotect) and
SYSCALL_DEFINE4(pkey_mprotect) to do_mprotect_pkey().
- Moved untagged_addr() definition for other arches from
include/linux/memory.h to include/linux/mm.h.
- Changed untagging in strn*_user() to perform userspace accesses through
tagged pointers.
- Updated the documentation to mention that passing tagged pointers to
memory syscalls is allowed.
- Updated the test to use malloc'ed memory instead of stack memory.
Changes in v10:
- Added "mm, arm64: untag user pointers passed to memory syscalls" back.
- New patch "fs, arm64: untag user pointers in fs/userfaultfd.c".
- New patch "net, arm64: untag user pointers in tcp_zerocopy_receive".
- New patch "kernel, arm64: untag user pointers in prctl_set_mm*".
- New patch "tracing, arm64: untag user pointers in seq_print_user_ip".
Changes in v9:
- Rebased onto 4.20-rc6.
- Used u64 instead of __u64 in type casts in the untagged_addr macro for
arm64.
- Added braces around (addr) in the untagged_addr macro for other arches.
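For readers unfamiliar with the macro mentioned in the v9 notes, the
following userspace sketch shows one plausible shape of the two variants.
It is an approximation written for illustration, not the text of the
patches: the names untagged_addr_arm64 and untagged_addr_generic are made
up here, uint64_t stands in for the kernel's u64, and sign_extend64() is
re-implemented locally. It assumes a GNU-compatible compiler (__typeof__,
arithmetic right shift of signed values).

```c
#include <assert.h>
#include <stdint.h>

/* Local re-implementation: sign-extend 'value' from bit 'index'. */
static inline int64_t sign_extend64(uint64_t value, int index)
{
	int shift = 63 - index;

	return (int64_t)(value << shift) >> shift;
}

/*
 * arm64-style sketch: replicate bit 55 into bits 63:56, so a user
 * address loses its tag while a kernel address keeps its 0xff top byte.
 */
#define untagged_addr_arm64(addr) \
	((__typeof__(addr))sign_extend64((uint64_t)(addr), 55))

/* Sketch for other arches: identity, with the argument parenthesized. */
#define untagged_addr_generic(addr) (addr)

void untagged_addr_demo(void)
{
	uint64_t user = 0x4200007fff001000ULL;	/* tag 0x42, bit 55 clear */
	uint64_t kern = 0xffff000012345678ULL;	/* bit 55 set */

	assert(untagged_addr_arm64(user) == 0x0000007fff001000ULL);
	assert(untagged_addr_arm64(kern) == kern);
	assert(untagged_addr_generic(user) == user);
}
```

Parenthesizing the macro argument keeps an expression such as
untagged_addr_generic(a + b) from being regrouped by a surrounding
operator after expansion.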
Changes in v8:
- Rebased onto 65102238 (4.20-rc1).
- Added a note to the cover letter on why syscall wrappers/shims that untag
user pointers won't work.
- Added a note to the cover letter that this patchset has been merged into
the Pixel 2 kernel tree.
- Documentation fixes, in particular added a list of syscalls that don't
support tagged user pointers.
Changes in v7:
- Rebased onto 17b57b18 (4.19-rc6).
- Dropped the "arm64: untag user address in __do_user_fault" patch, since
the existing patches already handle user faults properly.
- Dropped the "usb, arm64: untag user addresses in devio" patch, since the
passed pointer must come from a vma and therefore be untagged.
- Dropped the "arm64: annotate user pointers casts detected by sparse"
patch (see the discussion in the replies to v6 of this patchset).
- Added more context to the cover letter.
- Updated Documentation/arm64/tagged-pointers.txt.
Changes in v6:
- Added annotations for user pointer casts found by sparse.
- Rebased onto 050cdc6c (4.19-rc1+).
Changes in v5:
- Added 3 new patches that add untagging to places found with static
analysis.
- Rebased onto 44c929e1 (4.18-rc8).
Changes in v4:
- Added a selftest for checking that passing tagged pointers to the
kernel succeeds.
- Rebased onto 81e97f013 (4.18-rc1+).
Changes in v3:
- Rebased onto e5c51f30 (4.17-rc6+).
- Added linux-arch@ to the list of recipients.
Changes in v2:
- Rebased onto 2d618bdf (4.17-rc3+).
- Removed excessive untagging in gup.c.
- Removed untagging pointers returned from __uaccess_mask_ptr.
Changes in v1:
- Rebased onto 4.17-rc1.
Changes in RFC v2:
- Added "#ifndef untagged_addr..." fallback in linux/uaccess.h instead of
defining it for each arch individually.
- Updated Documentation/arm64/tagged-pointers.txt.
- Dropped "mm, arm64: untag user addresses in memory syscalls".
- Rebased onto 3eb2ce82 (4.16-rc7).
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Andrey Konovalov (13):
uaccess: add untagged_addr definition for other arches
arm64: untag user pointers in access_ok and __uaccess_mask_ptr
lib, arm64: untag user pointers in strn*_user
mm, arm64: untag user pointers passed to memory syscalls
mm, arm64: untag user pointers in mm/gup.c
fs, arm64: untag user pointers in copy_mount_options
fs, arm64: untag user pointers in fs/userfaultfd.c
net, arm64: untag user pointers in tcp_zerocopy_receive
kernel, arm64: untag user pointers in prctl_set_mm*
tracing, arm64: untag user pointers in seq_print_user_ip
uprobes, arm64: untag user pointers in find_active_uprobe
bpf, arm64: untag user pointers in stack_map_get_build_id_offset
selftests, arm64: add a selftest for passing tagged pointers to kernel
arch/arm64/include/asm/uaccess.h | 10 +++--
fs/namespace.c | 2 +-
fs/userfaultfd.c | 5 +++
include/linux/mm.h | 4 ++
ipc/shm.c | 2 +
kernel/bpf/stackmap.c | 6 ++-
kernel/events/uprobes.c | 2 +
kernel/sys.c | 44 +++++++++++++------
kernel/trace/trace_output.c | 5 ++-
lib/strncpy_from_user.c | 3 +-
lib/strnlen_user.c | 3 +-
mm/gup.c | 4 ++
mm/madvise.c | 2 +
mm/mempolicy.c | 5 +++
mm/migrate.c | 1 +
mm/mincore.c | 2 +
mm/mlock.c | 5 +++
mm/mmap.c | 7 +++
mm/mprotect.c | 1 +
mm/mremap.c | 2 +
mm/msync.c | 2 +
net/ipv4/tcp.c | 9 +++-
tools/testing/selftests/arm64/.gitignore | 1 +
tools/testing/selftests/arm64/Makefile | 11 +++++
.../testing/selftests/arm64/run_tags_test.sh | 12 +++++
tools/testing/selftests/arm64/tags_test.c | 21 +++++++++
26 files changed, 144 insertions(+), 27 deletions(-)
create mode 100644 tools/testing/selftests/arm64/.gitignore
create mode 100644 tools/testing/selftests/arm64/Makefile
create mode 100755 tools/testing/selftests/arm64/run_tags_test.sh
create mode 100644 tools/testing/selftests/arm64/tags_test.c
--
2.21.0.225.g810b269d1ac-goog